figshare

ELMAS dataset

dataset
posted on 2023-09-03, 17:53, authored by Kevin BELLINGUER, Robin Girard, Alexis Bocquet, Antoine Chevalier

This dataset provides a set of 18 load profiles with an hourly temporal resolution that represent the main industrial and tertiary sectors in France for the year 2018.

The ELMAS dataset is derived from a total of 55,730 consumption time series initially split into 424 business sectors and three levels of subscribed capacity. The customer's field of activity follows the Statistical Classification of Economic Activities in the European Community (NACE), a four-digit industry standard classification used in the European Union that comprises 21 sections, 88 divisions, 272 groups, and 615 classes. To preserve anonymity, the initial time series are averaged according to their NACE code and level of subscribed capacity.
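The averaging step described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the input layout, the capacity labels ("low", "high") and the numbers are all assumptions made for the example.

```python
from collections import defaultdict

import numpy as np


def average_by_group(series, nace_codes, capacity_levels):
    """Average load curves that share a (NACE code, capacity level) pair.

    series: array-like of shape (n_customers, n_hours)
    nace_codes, capacity_levels: one label per customer (hypothetical inputs)
    Returns a dict mapping (nace_code, capacity_level) to the mean curve.
    """
    series = np.asarray(series, dtype=float)
    members = defaultdict(list)
    # Collect the row indices of the customers belonging to each group.
    for row, key in enumerate(zip(nace_codes, capacity_levels)):
        members[key].append(row)
    # One averaged curve per group, so no individual customer is exposed.
    return {key: series[rows].mean(axis=0) for key, rows in members.items()}


# Toy example: four customers, three hourly values each (made-up numbers).
curves = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [10.0, 10.0, 10.0], [0.0, 2.0, 4.0]]
codes = ["10.71", "10.71", "47.11", "10.71"]
levels = ["low", "low", "high", "high"]
group_means = average_by_group(curves, codes, levels)
```

In this toy case the two "low" customers with code 10.71 collapse into a single averaged curve, while the other two customers each form their own group.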

Discrepancies between the temporal patterns of customers belonging to the same NACE section call for a different clustering approach. A K-means algorithm is therefore used to gather business groups with similar temporal patterns into 18 clusters. In the resulting clustering, numerous NACE sections are scattered over several clusters, which increases the overall heterogeneity of the clustering and makes it harder to interpret. The share of these dispersed NACE classes in annual energy consumption remains low, which suggests that a manual reorganisation has little impact on the overall consistency of the clusters. This manual reclassification gathers scattered NACE classes in the cluster that holds the highest share of the NACE section concerned.

The energy consumption time series cover a limited panel of 55,730 customers, which may bias the output load profiles relative to the full French population of industrial and tertiary customers. To reduce this bias, Enedis provides the annual energy consumption of a wider range of customers for the year 2019. This annual energy consumption dataset is used to generate weights that are applied in the clustering approach and used to derive weighted average time series for the clusters.
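The clustering, reclassification and weighting steps described above can be sketched as follows. This is an illustrative reading, not the authors' implementation. The description does not say how the weights enter the clustering or how the "highest share" is measured, so the sketch assumes energy-weighted centroid updates and shares measured in annual energy. All function names and toy values are hypothetical.

```python
import numpy as np


def kmeans(profiles, k, weights, n_iter=50, seed=0):
    """Plain Lloyd's K-means; centroids are energy-weighted means (assumption)."""
    rng = np.random.default_rng(seed)
    centroids = profiles[rng.choice(len(profiles), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each profile to its nearest centroid (squared Euclidean distance).
        dist = ((profiles[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            mask = labels == c
            if mask.any():  # keep the previous centroid if a cluster empties
                centroids[c] = np.average(profiles[mask], axis=0, weights=weights[mask])
    return labels


def regroup_sections(labels, sections, weights):
    """Move each NACE section into the cluster holding most of its annual energy."""
    labels = labels.copy()
    for section in np.unique(sections):
        in_section = sections == section
        energy_per_cluster = np.bincount(labels[in_section], weights=weights[in_section])
        labels[in_section] = energy_per_cluster.argmax()
    return labels


def cluster_profiles(profiles, labels, weights):
    """Energy-weighted average time series of each cluster."""
    return {
        int(c): np.average(profiles[labels == c], axis=0, weights=weights[labels == c])
        for c in np.unique(labels)
    }


# Toy data: six business groups, four hourly values each (made-up numbers).
profiles = np.array([
    [1.0, 1.0, 5.0, 5.0],
    [1.0, 1.0, 5.0, 6.0],
    [5.0, 5.0, 1.0, 1.0],
    [5.0, 6.0, 1.0, 1.0],
    [1.0, 2.0, 5.0, 5.0],  # section "G" group whose pattern resembles section "C"
    [6.0, 5.0, 1.0, 1.0],
])
sections = np.array(["C", "C", "G", "G", "G", "G"])
annual_energy = np.array([10.0, 20.0, 30.0, 40.0, 5.0, 25.0])  # hypothetical weights

raw_labels = kmeans(profiles, k=2, weights=annual_energy)
final_labels = regroup_sections(raw_labels, sections, annual_energy)
final_profiles = cluster_profiles(profiles, final_labels, annual_energy)
```

In the toy data, the fifth group is clustered with section "C" on the basis of its shape, but it carries only a small part of the annual energy of section "G", so the reclassification moves it to the cluster that holds the rest of that section.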
