Final Project Data Science Project California Irvine University.pdf (220.57 kB)

Who has ears, listen: Citizen Listening Program for disease prevention.

software
posted on 2022-03-13, 07:51 authored by Ramiro García Pereira
This project was created as the final project of the Data Science specialization at the University of California, Irvine.
The main objective was to build a set of scripts and algorithms that run an end-to-end pipeline: collecting data from a specific web page, processing it, and detecting any indication that a new disease is emerging, using natural language processing and a Naive Bayes classifier.
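To make the classification step concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing, written with the standard library only. It is not the project's code: the class name `NaiveBayesText`, the labels `health_alert` and `other`, and the training sentences are all hypothetical and were invented for this illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.doc_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        for text, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy training data, invented for this illustration only.
train_texts = [
    "several patients report fever and cough",
    "hospital confirms outbreak of unknown fever",
    "city council approves new park budget",
    "local team wins the football match",
]
train_labels = ["health_alert", "health_alert", "other", "other"]

model = NaiveBayesText().fit(train_texts, train_labels)
prediction = model.predict("new fever outbreak reported by hospital")
print(prediction)
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing term keeps a single unseen word-class pair from zeroing out a class.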
In the final part, a simple word cloud algorithm is used to check for trends in keywords defined in advance as indicators of diseases considered likely to appear.
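A word cloud is essentially a visual rendering of term frequencies, so the underlying computation can be sketched as a keyword count. The keyword set, the sample texts, and the function name `keyword_frequencies` below are assumptions made for illustration; the project's actual indicator list is not given in this description.

```python
import re
from collections import Counter

# Hypothetical indicator keywords; the original list is not given here.
INDICATOR_KEYWORDS = {"fever", "cough", "outbreak", "rash"}


def keyword_frequencies(texts, keywords):
    """Count how often each indicator keyword occurs across the texts."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in keywords:
                counts[token] += 1
    return counts


# Hypothetical sample texts, invented for this illustration only.
sample_texts = [
    "Fever and cough reported in the north district",
    "Doctors investigate fever outbreak",
    "Traffic delays expected this weekend",
]

frequencies = keyword_frequencies(sample_texts, INDICATOR_KEYWORDS)

# A text stand-in for the word cloud: more frequent words get longer bars.
for word, count in frequencies.most_common():
    print(f"{word:<10}{'#' * count} ({count})")
```

These counts could then be passed to a plotting or word cloud library to size each keyword by its frequency.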
All the code is fully functional, including the creation of a data warehouse in MySQL to store the collected historical data.
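As a rough illustration of storing historical keyword counts, the sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a database server. The table name `keyword_counts`, its columns, and the sample rows are hypothetical and do not describe the project's actual schema.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE keyword_counts (
        collected_on TEXT NOT NULL,
        keyword      TEXT NOT NULL,
        occurrences  INTEGER NOT NULL,
        PRIMARY KEY (collected_on, keyword)
    )
    """
)

# Hypothetical daily counts, invented for this illustration only.
rows = [
    ("2022-03-01", "fever", 4),
    ("2022-03-01", "cough", 2),
    ("2022-03-02", "fever", 7),
]
conn.executemany("INSERT INTO keyword_counts VALUES (?, ?, ?)", rows)
conn.commit()

# Retrieve the history of one keyword, ordered by collection date.
history = conn.execute(
    "SELECT collected_on, occurrences FROM keyword_counts "
    "WHERE keyword = ? ORDER BY collected_on",
    ("fever",),
).fetchall()
print(history)
```

Keeping one row per date and keyword makes it straightforward to query how a given indicator evolves over time.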
The paper omits the final part, in which a script uses the "apriori" association rule algorithm to link the most frequent words together and so predict more accurately the type of disease that may be brewing.
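Since that script is not included in the paper, the following is only a generic sketch of how Apriori finds frequent word sets, not the author's implementation. The function name `apriori`, the `min_support` threshold, and the word sets are assumptions chosen for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while current:
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical word sets, e.g. keywords co-occurring in one collected text.
word_sets = [
    {"fever", "cough", "fatigue"},
    {"fever", "cough"},
    {"fever", "rash"},
    {"cough", "fatigue"},
]

frequent = apriori(word_sets, min_support=0.5)

# Confidence of the rule fatigue -> cough: support(both) / support(fatigue).
pair = frozenset({"cough", "fatigue"})
confidence = frequent[pair] / frequent[frozenset({"fatigue"})]
print(confidence)
```

The prune step is what distinguishes Apriori from brute-force enumeration: a word set is only counted if all of its smaller subsets were already frequent.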
If anyone is interested in this last part, please write to me at blackcatnemegmail.com and I will be happy to send it to you.