TY - GEN
T1 - Understanding the Use of e-Prints on Reddit and 4chan's Politically Incorrect Board
AU - Yudhoatmojo, Satrio
AU - De Cristofaro, Emiliano
AU - Blackburn, Jeremy
N1 - Funding Information:
Satrio Yudhoatmojo would like to thank DIKTI-funded Fulbright Grants for supporting his doctoral study at Binghamton University. This research was supported by the National Science Foundation under Grant No. IIS-2046590. We would also like to thank Media Ecosystems Analysis Group and the Bill & Melinda Gates Foundation for their generous support.
Publisher Copyright:
© 2023 ACM.
PY - 2023/4/30
Y1 - 2023/4/30
AB - The dissemination and reach of scientific knowledge have increased at a blistering pace. In this context, e-Print servers have played a central role by providing scientists with a rapid and open mechanism for disseminating research without waiting for the (lengthy) peer review process. While helping the scientific community in several ways, e-Print servers also give science communicators and the general public access to a wealth of knowledge without paying hefty subscription fees. This motivates us to study how e-Prints are positioned within Web community discussions. In this paper, we analyze data from two Web communities: 14 years of Reddit data and over 4 years of data from 4chan's Politically Incorrect board. Our findings highlight the presence of e-Prints in both science-enthusiast and general-audience communities. Real-world events and distinct factors influence people's discussions of e-Prints; e.g., there was a surge of COVID-19-related research publications during the early months of the outbreak, along with increased references to e-Prints in online discussions. The text of e-Prints and the text of online discussions referencing them have low similarity, suggesting that the discussions are not exclusively about the e-Prints' findings. Further, our analysis of a sample of threads highlights: 1) misinterpretation and overgeneralization of research findings, 2) early research findings being amplified as a source for future predictions, and 3) users questioning the findings of a pseudoscientific e-Print. Overall, our work emphasizes the need to quickly and effectively validate non-peer-reviewed e-Prints that get substantial press/social media coverage, to help mitigate misinterpretations of scientific outputs.
KW - 4chan
KW - doc2vec
KW - e-print
KW - preprint
KW - quantitative
KW - reddit
KW - top2vec
UR - http://www.scopus.com/inward/record.url?scp=85159233407&partnerID=8YFLogxK
U2 - 10.1145/3578503.3583627
DO - 10.1145/3578503.3583627
M3 - Conference contribution
AN - SCOPUS:85159233407
T3 - ACM International Conference Proceeding Series
SP - 117
EP - 127
BT - WebSci 2023 - Proceedings of the 15th ACM Web Science Conference
PB - Association for Computing Machinery
T2 - 15th ACM Web Science Conference, WebSci 2023
Y2 - 30 April 2023 through 1 May 2023
ER -