Easily Parallelising Python Across Nodes Using MPI
One of the key benefits of large-scale cluster computing is the ability to run massively parallelised tasks, which can reduce completion times by orders of magnitude. Yet custom scripts often fail to exploit multi-core, multi-node clusters. While interpreted languages such as Python readily support parallelisation across multiple CPU cores, parallelisation across cluster nodes is typically more complex and is usually implemented in lower-level languages such as C.
The ubiquity of Python in life science research presents an opportunity for large reductions in compute time, should researchers adopt parallelisation techniques. This is a brief introduction to the mpi4py library and a novel tool that exemplifies strategic use of mpi4py to straightforwardly parallelise custom Python scripts not only across multiple cores but simultaneously across multiple nodes at OSC.
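As a rough illustration of the kind of pattern mpi4py enables, the sketch below splits a list of work items across MPI ranks and gathers the results on rank 0. It is not the tool described above. The helper names (`get_comm`, `chunk_for_rank`, `process`, `main`) and the squaring workload are hypothetical stand-ins, and the fallback to a single process when mpi4py cannot be imported is an assumption added so the script still runs on a laptop.

```python
def get_comm():
    """Return (comm, rank, size); fall back to one process without mpi4py."""
    try:
        from mpi4py import MPI
    except ImportError:
        # Assumption: with no MPI available, behave as a single rank.
        return None, 0, 1
    comm = MPI.COMM_WORLD
    return comm, comm.Get_rank(), comm.Get_size()


def chunk_for_rank(items, rank, size):
    """Round-robin split: rank r takes items r, r + size, r + 2 * size, ..."""
    return items[rank::size]


def process(item):
    """Hypothetical per-item workload; replace with the real computation."""
    return item * item


def main(n_items=20):
    comm, rank, size = get_comm()

    # Every rank builds the same item list, then keeps only its own share.
    items = list(range(n_items))
    local = [process(x) for x in chunk_for_rank(items, rank, size)]

    # Collect each rank's partial results on rank 0.
    if comm is not None:
        gathered = comm.gather(local, root=0)
    else:
        gathered = [local]

    if rank == 0:
        results = sorted(v for part in gathered for v in part)
        print(results)
        return results
    return None


if __name__ == "__main__":
    main()
```

A script like this is normally started through an MPI launcher, for example `mpirun -n 8 python script.py`, where `script.py` is a placeholder name. The exact launcher and its options depend on the cluster and its scheduler. Because the ranks may sit on different nodes, the same code covers both multi-core and multi-node execution.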
Funding
From viromes to virocells: dissecting viral roles in terrestrial microbiomes and nutrient cycling
Office of Biological and Environmental Research
BII-Implementation: The EMERGE Institute: Identifying EMergent Ecosystem Responses through Genes-to-Ecosystems Integration
Directorate for Biological Sciences