TY - JOUR
T1 - Community Detection on Citation Network of DBLP Data Sample Set Using LinkRank Algorithm
AU - Yudhoatmojo, Satrio Baskoro
AU - Samuar, Muhammad Arvin
N1 - Funding Information:
The authors would like to thank the Faculty of Computer Science, Universitas Indonesia, for funding this publication and the accommodation for attending the Information System International Conference 2017 (ISICO 2017) at Sanur Paradise Plaza Hotel in Bali, Indonesia, from November 6, 2017 to November 8, 2017.
Publisher Copyright:
© 2018 The Authors.
PY - 2017
Y1 - 2017
AB - This paper describes the application of a community detection algorithm, the LinkRank algorithm, to a citation network. Community detection is a network analysis task that aims to find sets of tightly connected nodes that are only loosely connected to nodes outside those sets. Our study focuses on a citation network, which represents the relationships between cited papers and the papers that cite them. The objectives of the study are to identify communities of papers based on citation relationships and to analyze the similarity of topics within each community. To meet these objectives, we applied the LinkRank algorithm to a citation network. We chose LinkRank because it can be applied to a directed network, whereas the other algorithms we surveyed can only be used on undirected networks. The citation network used in our study was obtained from the Aminer website. To apply the algorithm, we ported the original source code from C to Python for convenience in running the experiment. The results show that the algorithm was able to detect 10,442 communities among 188,514 nodes. Once the communities had been detected, we sampled the three largest communities (those with the most members) and took the 10 nodes with the highest PageRank scores in each. The samples show that most nodes in a community share a similar topic, but some nodes with different topics are mixed into the same community. The ratio of nodes with similar topics to nodes with different topics was 7 to 3; that is, 70% of the sampled nodes have a similar topic and the other 30% have different topics. Thus, the homophily of each community does not reach 100%. Nevertheless, our study confirms that the LinkRank algorithm can be used for community detection on directed networks.
KW - Citation Network
KW - Community Detection
KW - Complex Network
KW - Directed Network
KW - LinkRank Algorithm
KW - Social Network Analysis
UR - http://www.scopus.com/inward/record.url?scp=85041507869&partnerID=8YFLogxK
U2 - 10.1016/j.procs.2017.12.126
DO - 10.1016/j.procs.2017.12.126
M3 - Conference article
AN - SCOPUS:85041507869
VL - 124
SP - 29
EP - 37
JO - Procedia Computer Science
JF - Procedia Computer Science
SN - 1877-0509
T2 - 4th Information Systems International Conference 2017, ISICO 2017
Y2 - 6 November 2017 through 8 November 2017
ER -