An Empirical Study on Energy Usage Patterns of Different Variants of Data Processing Libraries
As computing power grows, so does the need for data processing, which uses a lot of energy in steps like cleaning and analyzing data. This study looks at the energy and time efficiency of four common Python libraries—Pandas, Vaex, Scikit-learn, and NumPy—tested on five datasets across 21 tasks. We compared the energy use of the newest and older versions of each library. Our findings show that no single library always saves the most energy. Instead, energy use varies by task type, how often tasks are done, and the library version. In some cases, newer versions use less energy, pointing to the need for more research on making data processing more energy-efficient.
A zip file accompanying this study contains the scripts, datasets, and a README file for guidance. This setup allows for easy replication and testing of the experiments described, helping to further analyze energy efficiency across different libraries and tasks.