Harvesting bibliography multi-thread, safe and ethical web crawling

Harry Tursulistyono Yani Achsan, Wahyu Catur Wibowo, Wahyuningdiah Trisari Harsanti Putri, M. Muhtar Baswara Achsan, Quintin Kumia Dikara Barcah

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

Web mining is an important technique because it enables the extraction of data and information from the web for further use. Although many web databases hold bibliographic data, the Online Computer Library Center (OCLC) owns the largest web database of bibliographic data in the world. Such a large volume of bibliographic data cannot practically be downloaded manually. In this research, we conduct an experiment to harvest bibliographic data using a multi-threaded process that is fast, safe, and ethical. Using the C# programming language and the Visual Studio IDE, we successfully harvested five million bibliographic records without being penalized by the source site.

Original language: English
Title of host publication: 2018 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 355-360
Number of pages: 6
ISBN (Electronic): 9781728101354
DOIs: https://doi.org/10.1109/ICACSIS.2018.8618262
Publication status: Published - 17 Jan 2019
Event: 10th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2018 - Yogyakarta, Indonesia
Duration: 27 Oct 2018 to 28 Oct 2018

Publication series

Name: 2018 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2018

Conference

Conference: 10th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2018
Country: Indonesia
City: Yogyakarta
Period: 27/10/18 to 28/10/18


  • Cite this

    Achsan, H. T. Y., Wibowo, W. C., Putri, W. T. H., Achsan, M. M. B., & Barcah, Q. K. D. (2019). Harvesting bibliography multi-thread, safe and ethical web crawling. In 2018 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2018 (pp. 355-360). [8618262] (2018 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICACSIS.2018.8618262