Summary
Within this project, I came across different search engines as follows:
- Escape: An engine that focuses on finding graphlet in small networks.[7]
- GraphLift: An engine that finds motifs in large networks.
- PGD: An engine that finds graphlets in large networks.
I tested all of them and figured out the PGD works the best in this case. However, each of these search engines has value and can be used in different cases. By the system I represent, we can implement any type of search engine for the end-users to simply use and get the result in less than a second. This user-friendly system is tested on local machines and can be deployed on the cloud as well.
Experimental details
All of the search engines were installed and tested with a Linux(Ubuntu) operating system on a Huawei matebook pro machine. The framework that I used for building this user-friendly system is Flask. Flask is a micro and light web framework written in Python for simple web applications. Since our purpose is to make a system that users can use the search engine without installing it with just one webpage then this framework fits the best. With Flask we communicate between back and front end via Python language.
Flask supports extensions that can add application features as if they were implemented in Flask itself. Extensions exist for object-relational mappers, form validation, upload handling, various open authentication technologies and several common framework related tools.
Flask was created by Armin Ronacher of Pocoo, an international group of Python enthusiasts formed in 2004. According to Ronacher, the idea was originally an April Fool’s joke that was popular enough to make into a serious application.
Result
Graphlet is used in different fields that the use of computer technology is not vital. Hence, there should be a user friendly system that could skip all the tackling parts with technical issues that come from installing the search engine on an operating system. Therefore, I represent a user-friendly website for the end-users to be able to use a search algorithm (here we used PGD which I will talk about later) to search for any type of graphlets in the network. Users can use this webpage to upload their large or small graph/dataset and get the number of the graphlet they are looking for. The different types of graphlet that a user can look for can be found in the table here. Here, we currently have 17 different types of graphlet that can be detected and counted in any given network.
In this system, users do not need to install anything. This search engine works on the backend side and they can use it via any browser. On the website, there are instructions on how to upload the file which contains their data sets and once they upload the file, and choose their target graphlet, they can see the result by clicking on the search button. This search engine system shows the result in less than a second no matter how large the network is. This system itself is runnable both in local machines and can be deployed in the cloud.
The body
A network consists of a set of objects such as nodes and vertices that are connected together. The connections between the nodes are called edges or links. In mathematics, networks are often referred to as graphs[2]. An example of a network is the Internet, which connects millions of people all over the world. That could be considered as Facebook or other social media in which people are vertices and the friendship between them describes the edges.
There are a variety of different types of networks such as social network, biological network, technological network, and information network. Scientists, based on their field of a subject named the networks and its components differently. A network or a graph that consists of vertices and edges is called nodes and links (in Technology), sites and bonds (in Physics), or actors and ties (in Sociology). The point of using networks for real-world phenomena is to better understand the phenomena and to forecast their behavior with experiments[1].
Detecting and counting subgraphs in networks is one of the fundamental network analysis techniques that is used on a variety of networks such as bioinformatics, social science, or infrastructure networks. Another fundamental network analysis technique is to classify vertex roles in networks. All of these could be obtained by finding graphlets/motifs in networks.
Graphlets are small connected induced subgraphs of a large network. A close task to graphlet is motifs detection. Graphlets differ from network motifs, since they must be induced subgraphs, whereas motifs are partial subgraphs. An induced subgraph must contain all edges between its nodes that are present in the large network, while a partial subgraph may contain only some of these edges. Moreover, graphlets do not need to be over-represented in the data when compared with randomized networks, while motifs do. Detecting and counting graphlets are used in different subjects; from social science to biology mostly in the building blocks of network analysis. Graphlet analysis in social science also known as k-subgraph census is widely adopted in sociometric studies.
This is mostly focused on analyzing triadic tendencies as important structural features of social networks, analyzing triadic configurations. In social networks, discovering transitivity which is basically the tendency of friends of friends to be friends themselves is a feature. In biology, graphlet is used for protein function prediction, network alignment (is a method to align nodes that belong to the same entity from different networks) , and phylogeny (which In biology, phylogenetics is a part of systematics that addresses the derivation of the evolutionary history and relationships within groups of organisms ).[8]
By the way graphlet detection was used for studying Covid-19. In one of the research related to Covid-19, they use graphlets as coding elements for encoding topological structures and dynamic changes of live networks. Graphlets are mostly used for statistical characterization and modeling of entire networks. In one of the studies on that they introduced how to differentiate the collaboration networks at three granularity levels, and associate them as well, via topological encoding with graphlets. [9]
Graphlet analysis is also used in many other cases such as computer networking, chemoinformatics, image segmentation, and many others[6].
The recursive decomposition of networks is one of the common ways in network analysis to categorize the large and complex graphs in result to find small subgraph patterns of size k nodes. This approach is used in Parallel Parameterized Graphlet Decomposition(PGD).
The graphlet counting is computationally intensive since the number of possible k-subgraphs in a graph G increases exponentially with k and can be computed in the multiplication of the number of vertices and maximum degree of the graph O(|V |.∆k−1 )[4]. To address these problems, PGD proposes a fast, efficient, and parallel algorithm for counting graphlets of size up to 4 nodes that take only a fraction of the time to compute when compared with the different methods used. The proposed graphlet counting algorithm leverages a number of proven combinatorial arguments for different graphlets. For each edge, we count a few graphlets, and with these counts along with the combinatorial arguments, we obtain the exact counts of others in constant time. On a large collection of more than 300 networks from a variety of domains, these graphlet counting strategies are on average 480 times faster than current methods. This brings new opportunities to investigate the use of graphlets on much larger networks and newer applications[5]. Furthermore, a number of important machine learning tasks are likely to benefit from such an approach.
Conclusion
Graphlet detection did impact in a variety of domains from social science to biology. The search engine for graphlet counting is still under the process of getting faster and faster. The system that I implemented is a way to use this essential search engine for any kind of research subject.
Recommendation
The search engine that I used in here is not necessarily the latest search engine. One thing that could potentially enhance the search engine is to investigate the latest search engines and find which is a better version of PGD and implement that. The other thing could be to make the system to install any search engine autonomously on the backend and make it usable for the user.
References:
- 1.Networks an introduction (by M. E. J. Newman)
- 2.mathinsight.org
- 3.Graft: An Efficient Graphlet Counting Method for Large Graph Analysis (Mahmudur Rahman, Mansurul Alam Bhuiyan, Mohammad Al Hasan), In IEEE Trans. Knowl. Data Eng., volume 26, 2014.
- 4.“Efficient graphlet kernels for large graph comparison,” (N. Shervashidze, T. Petri, K. Mehlhorn, K. M. Borgwardt, and S. Vishwanathan), in AISTATS, 2009
- 5. “Efficient Graphlet Counting for Large Networks” (Nesreen K. Ahmed, Jennifer Neville, Ryan A. Rossi, Nick Duffield), in IEEE International Conference on Data Mining, 2015
- 6. “Graphlet Decomposition: Framework, Algorithms, and Applications” (Nesreen K. Ahmed, Jennifer Neville, Ryan Rossi, Nick G. Duffield, and Theodore L. Willke), in Springer-Verlag London, 2016
- 7.”ESCAPE: Efficiently Counting All 5-Vertex Subgraphs” (Ali Pinar, C. Seshadhri, V. Vishal), in Creative Commons CC BY 4.0 License, 2017
- 8.”Efficient estimation of graphlet frequency distributions in protein–protein interaction networks” (N. Pržulj, D. G. Corneil, I. Jurisica), by Oxford University Press, 2006
- 9.”Using Graphlet Spectrograms for Temporal Pattern Analysis of Virus-Research Collaboration Networks” (Dimitris Floros, Tiancheng Liu, Nikos Pitsianis, Xiaobai Sun), 2020