Twitter dataset on public sentiments towards biodiversity policy in Indonesia

Mohammad Teduh Uliniansyah, Indra Budi, Elvira Nurfadhilah, Dian Isnaeni Nurul Afra, Agung Santosa, Andi Djalal Latief, Asril Jarin, Gunarso, Meganingrum Arista Jiwanggi, Nuraisa Novia Hidayati, Radhiyatul Fajri, Ryan Randy Suryono, Siska Pebiana, Siti Shaleha, Tosan Wiar Ramdhani, Tri Sampurno

Research output: Contribution to journal, Article, peer-review

2 Citations (Scopus)


In recent years, biodiversity has emerged as a prominent and pressing topic due to the urgent need to address biodiversity loss and the recognition of its connections to climate change and sustainable development. Increased public awareness and economic considerations have further underscored the significance of biodiversity conservation. To investigate the sentiment of the Indonesian people towards biodiversity, we collected data from Twitter using a predefined set of keywords. We amassed a dataset of 500,000 Indonesian tweets posted between January 2020 and March 2023. These tweets cover a wide range of discussions on biodiversity, including subdomains such as food security, health, and environmental management. Before labeling, a team of 18 experts jointly developed a labeling guide, which served as the reference document for annotation. Three annotators labeled each tweet with a sentiment class (positive, negative, or neutral), or with the label none if the tweet was unrelated. The final label was determined by majority voting. Tweets whose final label was none, and those whose sentiment class remained undecided, were considered invalid and excluded from subsequent processing. After a series of steps, including cleaning (removing duplicates, irrelevant tweets, and tweets not written in Indonesian) and preprocessing, we prepared a dataset of 13,435 tweets. We measured the inter-annotator agreement, built several models using different algorithms with K-fold cross-validation, and evaluated them. The inter-annotator agreement, measured by Fleiss' Kappa, was 0.62187, and the best model, based on the pre-trained IndoBERT model, achieved an F1-score of 0.7959.
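The label-aggregation rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes that an "undecided" tweet is one where no label is chosen by more than half of the three annotators, and the function names and sample annotations are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

# Sentiment classes named in the dataset description; "none" marks an unrelated tweet.
SENTIMENTS = {"positive", "negative", "neutral"}


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the label chosen by more than half of the annotators, or None if there is no majority."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def final_sentiment(votes: Sequence[str]) -> Optional[str]:
    """Return the final sentiment, or None if the tweet is invalid and should be excluded."""
    label = majority_label(votes)
    # Invalid (assumed rule): the majority label is "none", or no label reached a majority.
    return label if label in SENTIMENTS else None


def filter_dataset(annotations: Dict[str, List[str]]) -> Dict[str, str]:
    """Keep only tweets that receive a valid majority sentiment label."""
    kept = {}
    for tweet_id, votes in annotations.items():
        label = final_sentiment(votes)
        if label is not None:
            kept[tweet_id] = label
    return kept


if __name__ == "__main__":
    # Hypothetical annotations: three annotators per tweet.
    annotations = {
        "t1": ["positive", "positive", "neutral"],
        "t2": ["none", "none", "negative"],
        "t3": ["positive", "negative", "neutral"],
        "t4": ["negative", "negative", "negative"],
    }
    print(filter_dataset(annotations))  # {'t1': 'positive', 't4': 'negative'}
```

With three annotators, a strict majority means at least two matching votes, so a three-way split such as `t3` is dropped as undecided, and `t2` is dropped because its majority label is none.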
The Fleiss' Kappa value indicates that the annotators shared a substantial understanding of, and agreement on, how to label a tweet, which supports the consistency and reliability of the dataset. The F1-score suggests that the dataset is suitable for reuse in further sentiment analysis research on biodiversity. The dataset can benefit various research tasks, such as topic modeling, sentiment analysis, and public opinion analysis on Twitter, especially concerning biodiversity-related policies.
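For readers who want to reproduce an agreement figure of this kind, the following is a minimal Python sketch of the standard Fleiss' kappa formula, kappa = (P_bar - P_e) / (1 - P_e), where P_bar is the mean per-item observed agreement and P_e is the agreement expected by chance. The abstract does not say whether agreement was computed over all four labels (including none) or only over valid tweets, so the category list passed in is an assumption, and the sample ratings are hypothetical. On the widely used Landis and Koch scale, values from 0.61 to 0.80 are read as substantial agreement, which is where the reported 0.62187 falls.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]], categories: Sequence[str]) -> float:
    """Fleiss' kappa for items that were each rated by the same number of raters."""
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("need at least one rated item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    allowed = set(categories)
    if any(label not in allowed for r in ratings for label in r):
        raise ValueError("rating outside the given categories")

    # n_ij: how many raters assigned item i to category j.
    counts = [Counter(r) for r in ratings]

    # p_j: share of all assignments that went to category j.
    p = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]

    # P_i: share of agreeing rater pairs for item i.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]

    p_bar = sum(per_item) / n_items  # mean observed agreement
    p_e = sum(pj ** 2 for pj in p)   # agreement expected by chance
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    labels = ["positive", "negative", "neutral", "none"]
    # Hypothetical ratings: three annotators per tweet.
    ratings = [
        ["positive", "positive", "positive"],
        ["negative", "negative", "neutral"],
        ["neutral", "neutral", "neutral"],
        ["none", "none", "positive"],
    ]
    print(round(fleiss_kappa(ratings, labels), 4))  # 0.5385 (exactly 7/13)
```

In practice, a maintained implementation such as the one in `statsmodels` is preferable; this version only makes the formula explicit.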

Original language: English
Article number: 109890
Journal: Data in Brief
Publication status: Published - Feb 2024


  • Environmental management
  • Food security
  • Health
  • Indonesian
  • Natural language processing
  • Sentiment analysis
