Analytical Data Clustering

2018-04-02

Analytical clustering is a quick and automatic way to cluster data while preserving certain features of the input data. The method is analytical, deterministic, unsupervised, automatic, and noniterative.

A Monothetic Clustering Method

By Chih-Ling Hsu

2018-03-29

Monothetic clustering is often used in taxonomy. For example, when you see a strange animal, how do you know whether it has been reported before? You may need to ask $N$ true-or-false questions.

Q1. Is it an animal? (yes/no)
Q2. Does it have legs? (yes/no)
…

So "monothetic" means that at each step we use only a single attribute (variable) to cluster.
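As a minimal sketch of this idea, the snippet below partitions a few records using one yes/no attribute at a time. The function name `monothetic_split`, the attribute names, and the sample animals are all hypothetical and only serve as an illustration; this is not the method from the full post.

```python
from collections import defaultdict


def monothetic_split(items, attribute):
    """Partition items into groups using the value of ONE attribute only."""
    groups = defaultdict(list)
    for item in items:
        groups[item[attribute]].append(item["name"])
    return dict(groups)


# Hypothetical sample data: each record answers a few yes/no questions.
animals = [
    {"name": "sparrow", "has_legs": True, "can_fly": True},
    {"name": "snake", "has_legs": False, "can_fly": False},
    {"name": "dog", "has_legs": True, "can_fly": False},
]

# Each call looks at a single attribute, which is what makes the split monothetic.
by_legs = monothetic_split(animals, "has_legs")
by_flight = monothetic_split(animals, "can_fly")
print(by_legs)
print(by_flight)
```

Applying such splits one after another, each on a different attribute, gives a tree of yes/no questions like the one above.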

Fuzzy Clustering and Fuzzy k-Means

By Chih-Ling Hsu

2018-03-25

Fuzzy clustering is the opposite of “Hard Clustering” (i.e., “Crisp Clustering”).

For example, every data point $x$ reports its degree of membership in every cluster $C_i$ ($1 \leq i \leq K$, where $K$ is the number of clusters). The drawback is that the report becomes very long with this type of clustering representation.
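To make the idea of membership degrees concrete, here is a small sketch that computes them with the commonly used fuzzy c-means membership formula, $u_i(x) = 1 / \sum_{j} (d_i / d_j)^{2/(m-1)}$, where $d_i$ is the distance from $x$ to the center of $C_i$. The excerpt does not spell out a formula, so the formula, the fuzzifier `m = 2`, and the two sample centers are assumptions made for illustration.

```python
import math


def fuzzy_memberships(x, centers, m=2.0):
    """Return the degree of membership of point x in each cluster (sums to 1)."""
    dists = [math.dist(x, c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


# Hypothetical cluster centers and a point that lies closer to the first one.
centers = [(0.0, 0.0), (4.0, 0.0)]
memberships = fuzzy_memberships((1.0, 0.0), centers)
print(memberships)  # roughly 0.9 for the first cluster and 0.1 for the second
```

With $K$ clusters, every point carries $K$ such numbers, which is why the full report grows so quickly.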

A Java Implementation of Association Rule Mining

By Chih-Ling Hsu

2018-03-25

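Association rule mining is one of the most commonly used techniques in data mining, and Apriori and FP-Growth are its best-known algorithms. In this post, I introduce my implementations of these two algorithms and the experimental results obtained with them.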

Peak-Climbing Data Clustering

By Chih-Ling Hsu

2018-03-15

Peak-climbing is also called “mode-seeking” or “valley-seeking”.

Graph-Theoretical Method for Clustering

By Chih-Ling Hsu

2018-03-15

In general, there are two steps in Graph Methods.

Step 1. Construct a graph that connects all data points (e.g., Minimum Spanning Tree, Relative Neighborhood Graph, Gabriel Graph, Delaunay Triangulation, …)

Step 2. Delete some edges which are too long (inconsistent edges)
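The two steps can be sketched as follows, using a minimum spanning tree for Step 1. The rule used here for "too long" (an edge longer than `factor` times the mean MST edge length) is only one possible choice and is an assumption, as are the function names and the sample points.

```python
import math
from itertools import combinations


def find(parent, i):
    """Return the root of i in a union-find forest, compressing the path."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def mst_clusters(points, factor=2.0):
    """Cluster points by building an MST and cutting its overly long edges."""
    n = len(points)
    # Step 1: build the minimum spanning tree with Kruskal's algorithm.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))
    mst = []
    for w, i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))
    if not mst:
        return list(range(n))
    # Step 2: keep only edges that are not "too long" (assumed threshold).
    mean_w = sum(w for w, _, _ in mst) / len(mst)
    parent = list(range(n))
    for w, i, j in mst:
        if w <= factor * mean_w:
            parent[find(parent, i)] = find(parent, j)
    # Points that remain connected share the same root, i.e. the same label.
    return [find(parent, i) for i in range(n)]


# Hypothetical data: two well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
labels = mst_clusters(points)
print(labels)
```

The connected components that remain after the cut are the clusters. Other inconsistency criteria, such as comparing an edge with its neighboring edges, fit into the same two-step structure.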

Clustering on New York City Bike Dataset

By Chih-Ling Hsu

2018-01-02

Our major task here is to turn the data into different clusters and explain what each cluster means. We will try spatial clustering, temporal clustering, and the combination of both.

For each method of clustering, we will

try at least 2 values for each parameter in every algorithm.
explain the clustering result.
make some observations and compare the different methods and parameters.

Mining Association Rules on New York City Bike Dataset

By Chih-Ling Hsu

2018-01-01

What we want to do here is to design 3 mining tasks with their definitions of transactions and find some rules behind them.

For each task, we should

Try at least two discretization methods (divided by 10, divided by 20, …)
Try at least two algorithms (Apriori, FP-growth, …) to find association rules.
List the interesting rules.
Compare the differences between them.
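As a rough sketch of the discretization step, the snippet below reads "divided by 10" and "divided by 20" as fixed bin widths of 10 and 20. That reading is an assumption, and the helper `discretize` and the sample values are hypothetical rather than taken from the dataset.

```python
def discretize(value, width):
    """Map a numeric value to the label of its fixed-width bin."""
    low = int(value // width) * width
    return f"[{low}, {low + width})"


# Hypothetical numeric attribute values (for example, trip durations in minutes).
durations = [3, 12, 18, 25, 47]

bins_10 = [discretize(v, 10) for v in durations]
bins_20 = [discretize(v, 20) for v in durations]
print(bins_10)
print(bins_20)
```

Each bin label can then be treated as an item in a transaction, so a coarser width merges more values into the same item and changes which rules reach the support threshold.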

Data Preprocessing and Exploring the New York City Bike Dataset

By Chih-Ling Hsu

2017-12-20

In this report, I will do some data preprocessing and then get some basic information about the dataset, New York Citi Bike Trip Histories, via tools.

Data Mining - Anomaly Detection

By Chih-Ling Hsu

2017-11-07

Anomalies, or outliers, are the data points that are considerably different from the remainder of the data. Common applications of anomaly detection include credit card fraud detection, telecommunication fraud detection, network intrusion detection, and fault detection.

The working assumption of anomaly detection is:

There are considerably more “normal” observations than “abnormal” observations (outliers/anomalies) in the data.
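A simple way to see this assumption at work is a median-based detector: because normal observations dominate, the median and the median absolute deviation (MAD) describe the normal behavior, and points far from them are flagged. The sketch below uses the modified z-score with the conventional cutoff of 3.5. Both the choice of technique and the sample readings are assumptions made for illustration.

```python
from statistics import median


def mad_outliers(values, cutoff=3.5):
    """Return the values whose modified z-score exceeds the cutoff."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half of the values equal the median; the score is undefined.
        return []
    return [v for v in values if 0.6745 * abs(v - center) / mad > cutoff]


# Hypothetical readings: mostly "normal" values plus one abnormal value.
readings = [10, 11, 9, 10, 12, 10, 95]
outliers = mad_outliers(readings)
print(outliers)
```

If abnormal observations were as common as normal ones, the median would no longer represent normal behavior and this kind of detector would break down.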

An Explorer of Things

Hello, my name is Chihling :)
