Presenter
Simon Perkins
Radio astronomy software is entering a new era, driven by advanced interferometers such as MeerKAT, SKA, ngVLA, and DSA-2000. These instruments generate enormous data volumes: a four-hour SKA MID observation, for instance, may produce up to 3.3 PB of data. This introduces two key challenges. Firstly, processing such massive data requires a new generation of high-performance computing (HPC) software. Secondly, the increased sensitivity of modern instruments necessitates new techniques to handle previously inconsequential, or entirely unknown, artefacts.
This produces a critical tension in radio astronomy software development: a fully optimised HPC pipeline is desirable for producing science products in a tractable amount of time, but the design requirements for such a pipeline are unlikely to be understood upfront, given the artefacts unveiled by greater instrument sensitivity. New techniques must therefore be continuously developed to address these artefacts and integrated into the full pipeline, producing a fundamental trade-off between (1) flexibility, (2) ease of development, and (3) performance. At one extreme, rigid upfront design requirements are unlikely to capture the full scope of the problem; at the other, throw-away research code is unsuitable for production use.
In this talk, we introduce a framework that prioritises flexibility and ease of development without significantly compromising performance. Our approach adapts methodologies from the Pangeo project, which leverages open-source Python tools for large-scale data processing in climate science. Software and data formats from the Python open-source community are used to develop distributed radio astronomy applications that run on both cloud platforms and supercomputers. By combining Xarray, Dask, Zarr, NumPy, and SciPy, we unlock a plethora of algorithms for rapid development and testing.
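To illustrate the style of development this stack enables, the following minimal sketch shows how a visibility-like array might be expressed as a chunked Xarray dataset backed by Dask, so that reductions build a lazy task graph rather than computing eagerly. The dimension names, sizes, and data here are purely illustrative, not taken from any MeerKAT or SKA pipeline.

```python
import numpy as np
import xarray as xr

# Illustrative dimensions for a small visibility-amplitude cube
# (time x baseline x channel); real observations are vastly larger.
ntime, nbl, nchan = 10, 21, 64
rng = np.random.default_rng(0)

ds = xr.Dataset(
    {
        "amplitude": (
            ("time", "baseline", "chan"),
            np.abs(rng.normal(size=(ntime, nbl, nchan))),
        )
    }
)

# Chunking converts the in-memory array into a Dask-backed array;
# subsequent operations are recorded in a task graph, not executed.
ds = ds.chunk({"time": 5, "chan": 32})

# A lazy reduction over time and baseline, per channel.
mean_spectrum = ds.amplitude.mean(dim=("time", "baseline"))

# Nothing has computed yet; .compute() executes the graph
# (locally here, but the same code scales out on a Dask cluster).
result = mean_spectrum.compute()
print(result.dims, result.shape)
```

The same dataset could be persisted with `ds.to_zarr(...)`, giving a chunked, cloud-friendly on-disk representation that mirrors the in-memory chunking, which is the pattern the Pangeo ecosystem relies on for large archives.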
This approach culminates in the Africanus ecosystem—a suite of applications used to develop a radio transient detection pipeline destined for deployment on the multi-petabyte MeerKAT archive.