More than one million negative reviews from a Chinese e-commerce platform

2020-05-14T03:44:41Z (GMT) by Jichang Zhao
***Note: Because of COVID-19, the lab is shut down and the dataset cannot be uploaded yet. It will be made available once we can access the lab again.***

The dataset comes from a B2C e-commerce platform in China and contains a large number of negative product reviews from four representative sectors: Computers, Phone&Accessories, Gifts&Flowers and Clothing. Here, negative reviews are defined as reviews with a score of 1. After the raw data was collected, it was processed through deduplication, user anonymization and categorization, and text classification. Each record contains the following fields: comment ID, anonymous user ID, review text, posting timestamp, negative reason label and user level.

The dataset contains four JSON files, each named after the corresponding sector. In each JSON file, each line is one record of a negative review from that sector. The field ‘id’ is the unique code we created for the review, the field ‘userID’ is the unique code we created for the user, the field ‘userLevel’ is the user’s level on the platform, the field ‘creationTime’ is the timestamp at which the review was posted, the field ‘content’ is the review text in Chinese, and the field ‘label’ represents why the consumer posted the negative review: 0 for Logistics, 1 for Product function, 2 for Consumer Service and 3 for False Marketing.
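As a rough illustration of the layout described above, the following minimal Python sketch reads records line by line and counts the negative reason labels. It is not part of the released dataset: the helper names `iter_reviews` and `count_reasons`, the file name in the comment, and all sample values (including the value formats of ‘userLevel’ and ‘creationTime’) are assumptions made for illustration only.

```python
import json
from collections import Counter

# Label codes as listed in the dataset description.
LABEL_NAMES = {
    0: "Logistics",
    1: "Product function",
    2: "Consumer Service",
    3: "False Marketing",
}


def iter_reviews(lines):
    """Yield one dict per non-empty line, assuming one JSON object per line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_reasons(records):
    """Count records per negative reason; unknown codes are grouped together."""
    counts = Counter()
    for record in records:
        # int() is used because the stored type of 'label' is an assumption.
        code = int(record["label"])
        counts[LABEL_NAMES.get(code, "Unknown")] += 1
    return counts


# Invented sample lines that only mimic the described fields.
sample = [
    '{"id": "r1", "userID": "u1", "userLevel": "A", '
    '"creationTime": "2019-01-01 12:00:00", "content": "物流太慢", "label": 0}',
    '{"id": "r2", "userID": "u2", "userLevel": "B", '
    '"creationTime": "2019-01-02 08:30:00", "content": "屏幕有问题", "label": 1}',
    '{"id": "r3", "userID": "u1", "userLevel": "A", '
    '"creationTime": "2019-01-03 09:15:00", "content": "快递丢件", "label": 0}',
]

counts = count_reasons(iter_reviews(sample))
for reason, n in counts.most_common():
    print(reason, n)

# For a real file (hypothetical name), pass the open file object instead:
# with open("Computers.json", encoding="utf-8") as f:
#     counts = count_reasons(iter_reviews(f))
```

Reading the file line by line avoids loading a whole sector into memory, which may matter given the size of the dataset.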

The dataset comes from our paper:

Menghan Sun and Jichang Zhao. How do online consumers review negatively? arXiv:2004.13463, 2020.

If you find the dataset helpful, please cite the paper.

This work was supported by NSFC (Grant No. 71871006).