datas.zip (2.19 MB)
Datas of Disease Patterns
1. The file "dingxiang_datas.xls" contains all of the original data crawled from the DingXiang forum, together with the word segmentation result for each medical record.
2. The file "pmi_new_words.txt" lists the new medical words discovered by calculating mutual information.
3. The "association_rules" folder contains the association rules mined from the dataset, with the h-confidence threshold set to 0.3 and the support threshold set to 0.0001.
4. The file "network_communities.csv" describes the complication communities.
P.S. If you encounter a "d", it means the word belongs to the disease description vocabulary; a "z" or an "s" means it belongs to the symptom description vocabulary.
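
For readers unfamiliar with the measures named in items 2 and 3, the following is a minimal sketch of how they are commonly defined; it is not the code used to produce these files. It assumes that support is the fraction of records containing an itemset, that h-confidence is the itemset support divided by the largest single-item support, and that the mutual information in item 2 is pointwise mutual information (PMI) between adjacent text fragments. The toy records, the toy tokens, and all function names are hypothetical and are not taken from the dataset.

```python
import math
from collections import Counter

# Thresholds quoted in item 3 of the description.
H_CONFIDENCE_THRESHOLD = 0.3
SUPPORT_THRESHOLD = 0.0001

# Hypothetical toy records: the terms mentioned in each medical record.
records = [
    {"fever", "cough", "pneumonia"},
    {"fever", "cough"},
    {"fever", "headache"},
    {"cough", "pneumonia"},
]

# Hypothetical toy fragment sequence used only for the PMI illustration.
tokens = ["heart", "burn", "and", "heart", "burn", "or", "heart", "attack"]


def support(itemset, records):
    """Fraction of records that contain every term in itemset."""
    itemset = set(itemset)
    return sum(1 for r in records if itemset <= set(r)) / len(records)


def h_confidence(itemset, records):
    """Itemset support divided by the largest support of any single member."""
    return support(itemset, records) / max(support({i}, records) for i in itemset)


def pmi(left, right, tokens):
    """Pointwise mutual information (base 2) of the adjacent pair (left, right)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    p_joint = bigrams[(left, right)] / (len(tokens) - 1)
    if p_joint == 0:
        return float("-inf")  # the pair never occurs side by side
    p_left = unigrams[left] / len(tokens)
    p_right = unigrams[right] / len(tokens)
    return math.log2(p_joint / (p_left * p_right))


def passes_thresholds(itemset, records):
    """True if the itemset meets both thresholds quoted in item 3."""
    return (support(itemset, records) >= SUPPORT_THRESHOLD
            and h_confidence(itemset, records) >= H_CONFIDENCE_THRESHOLD)


pair = {"fever", "cough"}
print(support(pair, records), h_confidence(pair, records))
print(passes_thresholds(pair, records))
print(pmi("heart", "burn", tokens))
```

A high PMI indicates that two fragments occur side by side more often than their individual frequencies would suggest, which is why it is often used to propose candidate new words. A high h-confidence indicates that every item in the set tends to co-occur with all of the others.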