Evaluation of the implementation of Indonesian electronic journals citation system using regex technique and PDF extraction tool

Riri Fitri Sari, Agung Kurniawan

Research output: Contribution to journal › Article › peer-review


Research papers produced by researchers worldwide build on previous academic publications written by other researchers. Many research papers published in electronic and new media also refer to previous publications. Advances in technology have made the internet the most widely used medium. Research papers are published in many formats, including journals. Relations among journals can be traced through their citations, and the number of citations to a journal study can be counted to show the contribution of that study. To identify the relations among journal articles published on the internet, we designed a system that automatically produces relationship information between articles from different journals hosted on different websites. In this research we created a mashup that extracts the web pages and then picks the required files automatically. The system stores the extracted files in a database and then finds the relations among them. The results are shown in a web portal whose interface supports searching by user-supplied keywords. Together, these components form a mashup. We created an automatic extraction system for Indonesian electronic journals using data from fourteen university e-journal sites. We built the system using the PHP language and a MySQL database, after carefully studying the algorithm in Openkapow Robomaker. The system successfully extracts information from journal providers' web pages, including special types of PDF pages, and saves it in the database. The resulting system shows the connections and relations among the journals. The test results report processing time and memory usage for a random number of files. The evaluation results show that execution time depends on the number of journal series, volumes, and articles on the related e-journal sites.
The system has been complemented with user-interface functionality that reports the total number of journal articles extracted automatically from different sites. Several approaches, including the DOM tree, regular expression techniques, and PDF extraction tools, have been used to improve the system's extraction of web pages and its retrieval of full journal articles for processing.
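
As a rough illustration of how the DOM-based and regular-expression techniques named above might fit together, the following minimal sketch collects PDF links from an e-journal listing page and then parses reference entries from text that has already been extracted from a PDF. It is not the authors' implementation: the original system was built in PHP, and Python's standard library is used here only for brevity. The names `PdfLinkCollector`, `find_pdf_links`, `parse_references`, and `REFERENCE_RE` are hypothetical, and the assumed reference layout ("[n] Authors (year). Rest.") is only one of many styles a real journal might use.

```python
import re
from html.parser import HTMLParser


class PdfLinkCollector(HTMLParser):
    """Walk the HTML tag stream and collect hrefs of <a> tags ending in .pdf."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Case-insensitive check so ".PDF" links are kept as well.
        if href and href.lower().endswith(".pdf"):
            self.links.append(href)


# Assumed (hypothetical) reference layout: "[1] Doe, J. (2009). Title. Journal."
REFERENCE_RE = re.compile(
    r"^\[(\d+)\]\s+(.+?)\s+\((\d{4})\)\.\s+(.+)$",
    re.MULTILINE,
)


def find_pdf_links(html):
    """Return the PDF links found in an HTML page, in document order."""
    collector = PdfLinkCollector()
    collector.feed(html)
    return collector.links


def parse_references(text):
    """Return one dict per reference line that matches the assumed layout."""
    return [
        {
            "index": int(m.group(1)),
            "authors": m.group(2),
            "year": int(m.group(3)),
            "rest": m.group(4),
        }
        for m in REFERENCE_RE.finditer(text)
    ]
```

In a pipeline like the one described, the parsed entries could then be stored in a database table and compared against the titles of other extracted articles to find citation relations. That matching step is left out here because the abstract does not specify how it was done.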

Original language: English
Pages (from-to): 259-270
Number of pages: 12
Journal: Asian Journal of Information Technology
Issue number: 7
Publication status: Published - 2011


  • Citation index
  • Journal
  • PDF extraction tools
  • Regular expression technique
  • Web extraction

