Python code for hierarchical cluster analysis of detected R-strategies from rule-based NLP on 500 circular economy definitions
The dataset used in this analysis consists of 500 peer-reviewed circular economy (CE) definitions systematically collected from key academic sources. The definitions were processed using a rule-based NLP model to extract the presence of R-strategies (R0-R9), which operationalize circularity in CE frameworks. Each definition was analyzed for the presence of these strategies, and the results were structured into a binary format (1 if detected, 0 if not) for statistical and clustering analysis.
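The binary encoding described above can be illustrated with a minimal sketch. The strategy names follow the common R0-R9 ladder (Refuse through Recover); the regular-expression patterns, the `R_PATTERNS` dictionary, and the `detect_strategies` helper are hypothetical stand-ins, since the actual rule set is not given in this description.

```python
import re

# Hypothetical keyword patterns, one per R-strategy. These are assumptions for
# illustration only; the real rule-based model may use different or richer rules.
R_PATTERNS = {
    "R0_Refuse": r"\brefus\w*",
    "R1_Rethink": r"\brethink\w*",
    "R2_Reduce": r"\breduc\w*",
    "R3_Reuse": r"\bre-?us\w*",
    "R4_Repair": r"\brepair\w*",
    "R5_Refurbish": r"\brefurbish\w*",
    "R6_Remanufacture": r"\bremanufactur\w*",
    "R7_Repurpose": r"\brepurpos\w*",
    "R8_Recycle": r"\brecycl\w*",
    "R9_Recover": r"\brecover\w*",
}


def detect_strategies(definition):
    """Return a dict mapping each R-strategy to 1 (detected) or 0 (not detected)."""
    text = definition.lower()
    return {
        name: int(re.search(pattern, text) is not None)
        for name, pattern in R_PATTERNS.items()
    }


# One made-up definition, encoded as a single binary row.
example = "A circular economy aims to reduce, reuse and recycle materials."
row = detect_strategies(example)
```

Applying such a function to every definition and stacking the resulting rows would give a 500 x 10 binary matrix of the kind the analysis uses.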
Hierarchical cluster analysis was performed on this binary dataset to reveal co-occurrence patterns among the R-strategies, using Ward’s method as the linkage criterion and Euclidean distance as the dissimilarity measure. The resulting dendrogram shows how closely the strategies are related conceptually, based on how often they co-occur in CE definitions: strategies that merge at lower heights tend to appear in the same definitions.
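As a rough sketch of this clustering step, the snippet below runs SciPy's Ward linkage on a synthetic binary matrix. The random data, the variable names, and the choice to transpose the matrix (so that the ten strategies, rather than the 500 definitions, become the dendrogram leaves) are assumptions made for illustration and are not taken from the original code.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

labels = [f"R{i}" for i in range(10)]

# Synthetic stand-in for the real data: 500 definitions x 10 strategies,
# each cell 1 if the strategy was detected and 0 otherwise.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 10)).astype(float)

# Transpose so that each R-strategy is one observation, described by its
# 500 detection flags. Ward linkage in SciPy requires Euclidean distance.
Z = linkage(X.T, method="ward", metric="euclidean")

# Compute the dendrogram layout without drawing it; "ivl" holds the leaf
# labels in left-to-right order.
tree = dendrogram(Z, labels=labels, no_plot=True)
leaf_order = tree["ivl"]
```

To draw the figure, one could drop `no_plot=True` and call `dendrogram` with a Matplotlib axes active. With random input the tree carries no meaning; it only demonstrates the mechanics.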
This Python code was optimized and debugged using ChatGPT-4o to ensure implementation efficiency, accuracy, and clarity.