Basic Functions in Natural Language Processing

2017-10-30

In this article, I will introduce some basic functions used in Natural Language Processing, using the Python package nltk.

To use nltk, you should download the library and the corpora you need beforehand.

Online Conference Notes and Takeaways: "Demystifying Data Science"

By Chih-Ling Hsu

2017-09-28

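"Demystifying Data Science" is an excellent 12-hour free live-streamed conference held in the United States. It featured 28 senior data scientists from well-known companies such as Facebook, Airbnb, Quora, Etsy, and Fast.ai, who shared how to transition into a career as a data analyst.

The live stream ran from 10 a.m. to 10 p.m. US time, that is, from 10 p.m. to 10 a.m. the next day in Taiwan, so I only watched the five talks between 10 p.m. and 12:30 a.m. and took notes on what the speakers shared. Some content that I could not write down in time was filled in afterward from memory, so the wording may be imprecise in places; thank you for your understanding.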

Paper Notes: Deep Learning at Alibaba

By Chih-Ling Hsu

2017-09-27

These notes are taken from the paper

Rong Jin. Deep Learning at Alibaba. IJCAI 2017

In this keynote, the presenter discussed the limitations of existing deep learning techniques and shared some of the solutions that Alibaba chose to address these problems.

My Trip To Germany in September 2017

By Chih-Ling Hsu

2017-09-25

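The main reason for this trip to Germany was that my father was going to Hanover to visit the smart machinery exhibition (EMO), so we took the opportunity to travel around southern Germany as well. Including the three days at the exhibition, we spent about ten days in total. I had assumed that nothing could be more beautiful than Switzerland, which I had visited before, but to my surprise, Germany still opened my eyes.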

Data Manipulation and Visualization Using Elasticsearch and Kibana

By Chih-Ling Hsu

2017-09-24

Elasticsearch is a distributed, real-time search and analytics platform.

Using a RESTful API, Elasticsearch stores data and indexes it automatically. It assigns types to fields, so searches can be run intelligently and quickly using filters and different kinds of queries.

It runs on the JVM to be as fast as possible. It distributes each index across “shards” of data and replicates those shards on different nodes, so a cluster can keep functioning even when not all nodes are operational. Adding nodes is easy, and that is what makes it so scalable.

ES uses Lucene to execute searches. This is quite an advantage compared with, for example, Django query strings. A RESTful API call lets us perform searches using JSON objects as parameters, which is much more flexible and allows each search parameter within the object to be given a different weight, importance, or priority.
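
As a rough illustration of searching with a JSON object whose parameters carry different weights, the sketch below builds a `bool` query in which each `match` clause has its own `boost`. The field names (`title`, `content`) and the helper `build_weighted_query` are hypothetical and chosen only for this example; the resulting body is the kind of object that could be sent to an index's `_search` endpoint.

```python
import json


def build_weighted_query(text, field_boosts):
    """Build a bool/should query where every field gets its own boost.

    `field_boosts` maps a (hypothetical) field name to its relative weight.
    """
    should = [
        {"match": {field: {"query": text, "boost": boost}}}
        for field, boost in field_boosts.items()
    ]
    return {"query": {"bool": {"should": should}}}


# Matches in the title count three times as much as matches in the content.
body = build_weighted_query("clustering", {"title": 3.0, "content": 1.0})

# Serialized form, ready to be used as the body of a search request.
payload = json.dumps(body, indent=2)
print(payload)
```

Nothing here talks to a cluster; it only shows how the weights live inside the JSON parameter object itself.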

Data Mining - Advanced Concepts and Algorithms of Cluster Analysis

By Chih-Ling Hsu

2017-09-24

Agglomerative clustering algorithms vary in how the proximity of two clusters is computed. However, MIN (single link) is susceptible to noise and outliers, while MAX and GROUP AVERAGE may not work well with non-globular clusters.

So how can we deal with these two types of problems?
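
As a reminder of the definitions involved, here is a minimal sketch of the three cluster proximities named above, using Euclidean distance on small hand-made 2-D points. The data and function names are made up for illustration. The example also shows the noise sensitivity of MIN: a single stray point between two clusters changes the single-link proximity sharply, while the MAX proximity stays the same.

```python
from itertools import product
from math import dist


def single_link(a, b):
    """MIN: distance between the closest pair of points across clusters."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_link(a, b):
    """MAX: distance between the farthest pair of points across clusters."""
    return max(dist(p, q) for p, q in product(a, b))


def group_average(a, b):
    """GROUP AVERAGE: mean distance over all cross-cluster pairs."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]
noisy_a = cluster_a + [(2.0, 0.0)]  # one outlier drifting toward cluster_b

for name, proximity in [("MIN", single_link),
                        ("MAX", complete_link),
                        ("AVG", group_average)]:
    print(name, proximity(cluster_a, cluster_b), proximity(noisy_a, cluster_b))
```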

Connect to Elastic Cloud with R Client

By Chih-Ling Hsu

2017-09-20

The elastic package (part of the rOpenSci project) is a general-purpose R interface to Elasticsearch.

Divisive Method for Hierarchical Clustering and Minimum Spanning Tree Clustering

By Chih-Ling Hsu

2017-09-01

Divisive clustering starts with one all-inclusive cluster. At each step, it splits a cluster, until each cluster contains a single point (or there are k clusters).
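
One way to realize this splitting, suggested by the title, is through a minimum spanning tree: build the MST over all points, then repeatedly cut the heaviest remaining edge. The sketch below assumes Euclidean distance and uses hypothetical helper names (`mst_edges`, `mst_divisive`). Cutting the heaviest edge k-1 times is the same as dropping the k-1 heaviest MST edges at once, which is what the code does.

```python
from math import dist


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the MST as a list of (weight, i, j) edges.
    """
    # best[v] = (cheapest known distance from v to the tree, tree endpoint)
    best = {v: (dist(points[0], points[v]), 0) for v in range(1, len(points))}
    edges = []
    while best:
        j = min(best, key=lambda v: best[v][0])
        weight, i = best.pop(j)
        edges.append((weight, i, j))
        for v in best:  # relax the remaining vertices against the new node
            d = dist(points[j], points[v])
            if d < best[v][0]:
                best[v] = (d, j)
    return edges


def mst_divisive(points, k):
    """Split points into k clusters by dropping the k-1 heaviest MST edges."""
    kept = sorted(mst_edges(points))[: len(points) - k]
    parent = list(range(len(points)))  # union-find over the kept edges

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, i, j in kept:
        parent[find(i)] = find(j)
    ids = {}  # component root -> cluster label, in order of first appearance
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
labels = mst_divisive(points, k=3)
print(labels)  # [0, 0, 0, 1, 1, 2]
```

With k equal to the number of points, every edge is dropped and each point ends up in its own cluster, matching the stopping condition described above.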

Data Mining - Basic Cluster Analysis

By Chih-Ling Hsu

2017-09-01

“The validation of clustering structures is the most difficult and frustrating part of cluster analysis. Without a strong effort in this direction, cluster analysis will remain a black art accessible only to those true believers who have experience and great courage.”

– Algorithms for Clustering Data, Jain and Dubes

Cluster analysis finds groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups, so that:

Intra-cluster distances are minimized
Inter-cluster distances are maximized
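
To make these two criteria concrete, the sketch below compares two groupings of the same four points using the mean pairwise Euclidean distance within clusters and between clusters. The points and function names are illustrative only, and mean pairwise distance is just one of several ways these quantities could be measured.

```python
from itertools import combinations, product
from math import dist
from statistics import mean


def mean_intra_distance(clusters):
    """Average distance between points that share a cluster."""
    return mean(dist(p, q) for c in clusters for p, q in combinations(c, 2))


def mean_inter_distance(clusters):
    """Average distance between points that lie in different clusters."""
    return mean(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p, q in product(a, b)
    )


# The same four points, grouped sensibly and grouped badly.
good = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
bad = [[(0, 0), (5, 0)], [(0, 1), (5, 1)]]

for name, grouping in [("good", good), ("bad", bad)]:
    print(name, mean_intra_distance(grouping), mean_inter_distance(grouping))
```

The sensible grouping has the smaller intra-cluster value and the larger inter-cluster value, which is exactly the pair of goals listed above.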

Cluster Center Initialization Algorithms (CCIA)

By Chih-Ling Hsu

2017-09-01

In iterative clustering algorithms, the procedure used to choose the initial cluster centers is extremely important, as it has a direct impact on the formation of the final clusters. Selecting outliers as initial centers is risky, since they lie far from the normal samples.

The Cluster Center Initialization Algorithm (CCIA) is based on density-based multi-scale data condensation. The procedure is applicable to clustering algorithms for continuous data. In CCIA, we assume that an individual attribute may provide some information about the initial cluster centers.
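
The idea that a single attribute can hint at initial centers can be sketched as follows. This is not the full CCIA; it only illustrates one plausible per-attribute step, under the assumption (not stated in the text above) that the attribute is roughly normally distributed, so that k seed values can be placed at the midpoints of k equal-probability intervals of the fitted normal curve. The function name `attribute_seed_centers` is hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def attribute_seed_centers(values, k):
    """Place k seed values for one attribute at normal-curve percentiles.

    Assumption for illustration: the attribute is approximately normal, and
    seed s (1..k) sits at the (2s - 1) / (2k) quantile of the fitted normal.
    """
    mu, sigma = mean(values), pstdev(values)
    standard_normal = NormalDist()
    return [
        mu + sigma * standard_normal.inv_cdf((2 * s - 1) / (2 * k))
        for s in range(1, k + 1)
    ]


ages = [21, 23, 25, 30, 32, 35, 41, 44, 48, 52]
seeds = attribute_seed_centers(ages, k=3)
print(seeds)
```

Because the seeds come from interior quantiles of the fitted distribution rather than from individual samples, extreme outliers are not picked directly as starting centers.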

An Explorer of Things

Hello, my name is Chihling :)
