What is Database Management System(DBMS)? In short, a DBMS should support the following functionalities:

  • store data (large amount of data!)
  • query data
  • modify data (add, update, delete, …)
  • enable durability (recover from failure)
  • control access of many users
    • isolation (each transaction appears to execute in isolation from other transactions)
    • atomicity (prevents updates to the database occurring only partially)
  • Allow users to create new databases and specify their schemas using a data definition language

Read More

In a graph structure, what can be called a community? The notion of community structure captures the tendency of nodes to be organized into groups and members within a community are more similar among each other. Typically, a community in graphs/networks is a set of nodes with more/better/stronger connections between its members, than to the rest of the network. However, there is no widely accepted single definition. It depends heavily on the application domain and the properies of the graph.

Read More

Labelling/Annotating data is very expensive. Therefore, usualy we have only small amounts of labelled data and large amounts of unlabelled data. To predict the labels fo the unlabelled data, we can use the graph-based semi-supervised machine learning technique called Label Propagation.

So what is label propagation?

Label Propagation Algorithm (LPA) is an iterative algorithm where we assign labels to unlabelled points by propagating labels through the dataset. For example, if we are given a graph with partially labelled nodes, for any unlabelled node, we can either adopt the dominant label in its neighborhood or wait until a label “propagates” to it if its neighborhood does not have any label.

Read More

A random walk is known as a stochastic or random process which describes a path that consists of a succession of random steps on some mathematical space:

  • given a graph and a starting point, select a neighbour at random
  • move to the selected neighbour and repeat the same process till a termination condition is verified
  • the random sequence of points selected in this way is a random walk of the graph

Read More

In June 2006, MSN Messenger had 30 billion conversations among 240 million people. From the data, the MSN network is constructed as a communication graph with $n = 180 \text{ million}$ nodes and $m = 1.3 \text{ billion}$ undirected edges. To investigate the properties of large-scale networks, we take the real-world data, MSN network, as an example and discover its macroscopic properties:

  • Diameter - What is the degrees of separation in this network?
  • Clustering coefficient - What fraction of my friends are also friends themselves?
  • Degree distribution - Are there many “hubs” in the network?
  • Connectivity - How many “islands” and how big are they?

Read More

Kleinberg’s model presents the infinite family of navigable Small-World networks that generalizes Watts-Strogatz model. Moreover, with Kleinberg’s model it is shown that short paths not only exist but can be found with limited knowledge of the global network. Decentralized search algorithms can find short paths with high probability.

Read More