We have just become aware of a scientific paper by Aron Culotta (2010) that compares data from the U.S. Centers for Disease Control and Prevention (CDC) on influenza-like illnesses (ILI) with the frequency of specific influenza-related keyphrases on Twitter (flu, cough, headache, sore throat...). After a training phase to optimize the algorithm, the Twitter-based predictions of ILI development correlate remarkably well with the real data, supporting the concept of data mining from social media streams. While the results were comparably good for a variety of analyzed phrases, the authors add a word of caution: “These results show extremely strong correlations for all queries except for fever, which appears frequently in figurative phrases such as ‘I’ve got Bieber fever’.”
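To make the idea of "correlating keyword frequency with ILI rates" a bit more concrete, here is a minimal sketch in Python. It is not the paper's actual model or data: the weekly numbers are invented toy values, the function names `pearson` and `fit_line` are our own, and we use a plain least-squares line plus a Pearson correlation as a simple stand-in for whatever regression the author trained.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented toy values, NOT taken from the paper or from the CDC:
# fraction of weekly tweets mentioning "flu", and weekly ILI rate.
flu_tweet_fraction = [0.001, 0.002, 0.004, 0.003, 0.002]
ili_rate = [0.010, 0.018, 0.041, 0.029, 0.021]

r = pearson(flu_tweet_fraction, ili_rate)
slope, intercept = fit_line(flu_tweet_fraction, ili_rate)
print(f"correlation: {r:.3f}")
print(f"predicted ILI rate = {slope:.2f} * tweet_fraction + {intercept:.4f}")
```

A keyword like "fever" would be run through the same two functions. If many of its tweets are figurative, its weekly fraction would track the ILI rate less closely and the correlation would come out lower, which is the effect the authors warn about.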
Beyond the elegance of the demonstrated algorithms, the paper gives a helpful overview of the fundamental literature in this young field.