figshare

Comments on Fujian Tulou by Tourists on Ctrip.com.xlsx

Download (317.36 kB)
dataset
posted on 2025-01-06, 06:50 authored by Yang Chen

The research data for this paper were obtained from tourist comments regarding Fujian Tulou on Ctrip.com. As China's leading online travel service platform, Ctrip boasts an extensive user base and a substantial repository of tourism product reviews, positioning it as a pivotal resource for tourism information in China. With over 400 million registered members, Ctrip accumulates vast amounts of review data that encompass feedback from travelers across diverse regions. Given the breadth and depth of Ctrip's data collection and user coverage, this study selects Ctrip as the primary source for gathering tourist reviews to ensure the accuracy and scientific rigor of the research outcomes. Using the Octopus Collector to access the Ctrip website and by searching for "Fujian Tulou," this study identified "Fujian Tulou (Nanjing) Scenic Spot," "Yongding Tulou," "Yunshuiyao Scenic Spot," "Tianluokeng Tulou Cluster," and other attractions. The two attractions "Fujian Tulou (Nanjing) Scenic Spot" and "Yongding Tulou" were selected as the research objects because they are more comprehensive and representative.

The Octopus Collector software was utilized to crawl visitor comments, including comment posting time and IP address information, covering the period from October 2015 to May 2024. The captured web text was preprocessed to remove emoticons and duplicate and irrelevant comments, ultimately yielding 2,923 web comments totaling 155,790 characters, along with the corresponding comment posting time and IP address information. This article focuses solely on the analysis of textual comments and does not encompass any photographs or selfies that may have been uploaded alongside the comments. The textual data collected in this study is entirely in Chinese. Specifically, 652 entries contain 15 or fewer characters, constituting 22.3% of the total; 1,941 entries have between 16 and 100 characters, representing 66.4%; and 326 entries exceed 100 characters, accounting for 11.2%.
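
The preprocessing and length breakdown described above could be reproduced along the following lines. This is only a minimal sketch, not the authors' actual pipeline: the emoji pattern, the function names (clean_comments, length_bucket, length_distribution), and the choice to treat exact text matches as duplicates are all assumptions made for illustration. Filtering of irrelevant comments is not covered, since the description does not say how relevance was judged.

```python
import re
from collections import Counter

# Rough emoji/pictograph ranges (an assumption, not the authors' filter).
# These ranges do not overlap with CJK characters, so Chinese text is kept.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")


def clean_comments(comments):
    """Strip emoji, trim whitespace, drop empty and exact-duplicate comments."""
    seen = set()
    cleaned = []
    for text in comments:
        text = EMOJI_RE.sub("", text).strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)  # first occurrence wins, order preserved
    return cleaned


def length_bucket(text):
    """Assign a comment to one of the three character-length bands."""
    n = len(text)
    if n <= 15:
        return "<=15"
    if n <= 100:
        return "16-100"
    return ">100"


def length_distribution(comments):
    """Return {band: (count, percent of total rounded to one decimal)}."""
    counts = Counter(length_bucket(c) for c in comments)
    total = len(comments)
    return {
        band: (counts[band], round(100 * counts[band] / total, 1) if total else 0.0)
        for band in ("<=15", "16-100", ">100")
    }
```

Calling length_distribution(clean_comments(raw_comments)) on the list of comment strings read from the spreadsheet (raw_comments is a hypothetical name) would give the count and share for each length band.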
