TY - GEN
T1 - How Resilient is Privacy-preserving Machine Learning Towards Data-Driven Policy? Jakarta COVID-19 Patient Study Case
AU - Nasution, Bahrul Ilmi
AU - Nugraha, Yudhistira
AU - Bhaswara, Irfan Dwiki
AU - Aminanto, Muhamad Erza
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/11/30
Y1 - 2023/11/30
N2 - With the rise of personal data protection laws in various countries, data privacy has become an essential issue. Differential privacy is one of the best-known techniques for addressing privacy concerns during analysis. However, many studies have shown that differential privacy degrades machine learning model performance. This makes it difficult for organizations such as governments to draw policy from accurate insights into citizen statistics while preserving citizen privacy. This study reviews differential privacy in machine learning algorithms and evaluates its performance on real COVID-19 patient data, using Jakarta, Indonesia as a case study. We further validate our study with two additional datasets: the public Adult dataset from the University of California, Irvine, and an Indonesian socioeconomic dataset. We find that differential privacy tends to reduce accuracy and may lead to model failure on imbalanced data, particularly in more complex models such as random forests. These findings suggest that using differential privacy in government is practical for building trustworthy government, albeit with distinct challenges. We discuss limitations and recommendations for organizations working with personal data to leverage differential privacy in the future.
AB - With the rise of personal data protection laws in various countries, data privacy has become an essential issue. Differential privacy is one of the best-known techniques for addressing privacy concerns during analysis. However, many studies have shown that differential privacy degrades machine learning model performance. This makes it difficult for organizations such as governments to draw policy from accurate insights into citizen statistics while preserving citizen privacy. This study reviews differential privacy in machine learning algorithms and evaluates its performance on real COVID-19 patient data, using Jakarta, Indonesia as a case study. We further validate our study with two additional datasets: the public Adult dataset from the University of California, Irvine, and an Indonesian socioeconomic dataset. We find that differential privacy tends to reduce accuracy and may lead to model failure on imbalanced data, particularly in more complex models such as random forests. These findings suggest that using differential privacy in government is practical for building trustworthy government, albeit with distinct challenges. We discuss limitations and recommendations for organizations working with personal data to leverage differential privacy in the future.
KW - covid-19
KW - data-driven policy
KW - machine learning
KW - privacy-preserving
UR - http://www.scopus.com/inward/record.url?scp=85179551377&partnerID=8YFLogxK
U2 - 10.1145/3605772.3624003
DO - 10.1145/3605772.3624003
M3 - Conference contribution
AN - SCOPUS:85179551377
T3 - ARTMAN 2023 - Proceedings of the 2023 Workshop on Recent Advances in Resilient and Trustworthy ML Systems in Autonomous Networks
SP - 5
EP - 10
BT - ARTMAN 2023 - Proceedings of the 2023 Workshop on Recent Advances in Resilient and Trustworthy ML Systems in Autonomous Networks
PB - Association for Computing Machinery, Inc
T2 - 1st Workshop on Recent Advances in Resilient and Trustworthy ML Systems in Autonomous Networks, ARTMAN 2023, co-located with ACM CCS 2023
Y2 - 30 November 2023
ER -