R Libraries {dendextend} and {magrittr} and Clustering Package scipy.cluster of Python For Modelling Diagrams of Dendrogram Trees

journal contribution
posted on 2020-08-03, 08:17, authored by Polina Lemenkova

The paper compares the classification tools of two languages, Python and R, and demonstrates the differences in their syntax and graphical output. It presents the R package {dendextend} and the Python package scipy.cluster as effective tools for dendrogram modelling by algorithms that sort and rank datasets. Both languages were tested on a sample dataset of marine geological measurements. The work aims to detect how bathymetric data change along 25 bathymetric profiles digitized across the Mariana Trench. The methodology comprises hierarchical cluster analysis with dendrograms and a clustermap plotted with marginal dendrograms. The statistical libraries include Matplotlib, SciPy, NumPy and Pandas in Python, and {dendextend}, {pvclust} and {magrittr} in R. The dendrograms were compared using the model-simulated clusters of the bathymetric ranges. The results show three distinct groups of profiles sorted by elevation range, with the maximal depths detected in the group of profiles 19-21. Dendrogram visualization in cluster analysis effectively represents how machine learning algorithms sort, group and classify data. The code presented in this study can be reused in similar research that groups data by similarity of attributes. Effective visualization by dendrograms is a useful modelling tool for geospatial management where data ranking is required. Compared with Python, plotting dendrograms in R offered functional and sophisticated algorithms, refined design control and fine graphical output. The interdisciplinary nature of this work lies in the application of coding algorithms to spatial data analysis.
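The hierarchical clustering step described in the abstract can be illustrated with a minimal Python sketch based on scipy.cluster.hierarchy. This is not the paper's code. The depth values are synthetic stand-ins generated at random, and the Ward linkage method, the three-cluster cut, and all variable names are assumptions made for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Synthetic stand-in data (NOT the paper's measurements):
# 25 profiles, each with 50 depth samples in metres.
rng = np.random.default_rng(0)
n_profiles, n_samples = 25, 50

# Three hypothetical depth regimes, assigned to 9, 8 and 8 profiles.
group_means = np.repeat([-3000.0, -5000.0, -8000.0], [9, 8, 8])
depths = group_means[:, None] + rng.normal(0.0, 100.0, size=(n_profiles, n_samples))

# Agglomerative clustering of the profiles.
# Ward linkage on Euclidean distances is an assumed choice.
Z = linkage(depths, method="ward")

# Cut the tree into at most three flat clusters.
labels = fcluster(Z, t=3, criterion="maxclust")

# Compute the dendrogram layout without drawing it (no_plot=True),
# so the sketch runs without a plotting backend.
names = [f"profile {i + 1}" for i in range(n_profiles)]
tree = dendrogram(Z, no_plot=True, labels=names)
leaf_order = tree["ivl"]  # profile names in left-to-right leaf order

for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

Dropping no_plot=True draws the tree with Matplotlib. With real bathymetric profiles, the choice of linkage method and distance metric would need to be justified against the data rather than taken from this sketch.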

Funding

China Scholarship Council (CSC), State Oceanic Administration, Marine Scholarship of China, Grant Nr. 2016SOA002

History