Feb 2, 2012

Justin Bieber falsely correlates with Influenza

We just became aware of a scientific paper by Aron Culotta (2010) that compares data from the U.S. Centers for Disease Control and Prevention (CDC) on influenza-like illnesses (ILI) with the frequency of specific influenza-related keyphrases on Twitter (flu, cough, headache, sore throat...). The correlation of the Twitter-based ILI predictions (after a training phase to optimize the algorithm) with the real data is remarkable and serves as a proof of concept for data mining from social-media streams. While the results were comparably good for a variety of analyzed phrases, the author adds a word of caution: "These results show extremely strong correlations for all queries except for fever, which appears frequently in figurative phrases such as “I’ve got Bieber fever”."
Besides the elegance of the demonstrated algorithms, the paper gives a helpful overview of the fundamental literature in this young field.
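
To make the idea concrete, here is a minimal sketch of how a keyphrase's weekly frequency could be compared with ILI rates. It is not the paper's actual method: the helper names (`keyword_fraction`, `pearson`), the plain substring matching, and the use of a simple Pearson correlation are all our own assumptions for illustration. It also shows why a naive match on "fever" picks up figurative uses.

```python
import math


def keyword_fraction(messages, keyword):
    """Fraction of messages containing the keyword (naive substring match).

    Hypothetical helper: a plain substring test cannot tell a literal
    "fever" from a figurative one such as "Bieber fever".
    """
    if not messages:
        return 0.0
    hits = sum(1 for m in messages if keyword in m.lower())
    return hits / len(messages)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up example messages, not real tweets.
sample = ["I have the flu", "I've got Bieber fever", "nice weather today"]
print(keyword_fraction(sample, "fever"))  # the figurative use counts as a hit
```

In practice one would compute `keyword_fraction` per week for each keyphrase and pass the resulting series, together with the CDC's weekly ILI rates, to `pearson`. A keyphrase dominated by figurative use would be expected to track the ILI series poorly.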
