Presentation

Turbocharging Dask Apps: Accelerating Data Flow with ProxyStore
Description: Despite advances in distributed computing libraries, performance challenges such as data serialization and transfer persist. We focus on understanding data-handling limitations within Dask, a versatile and popular Python library for distributed and parallel computing, and then investigate whether the pass-by-proxy paradigm implemented by ProxyStore can address these inefficiencies. By integrating ProxyStore, we streamline data flow in Dask applications, reducing the overheads associated with data serialization and with routing data through the scheduler.
We evaluate the impact of proxies on data transfer times and overall computational efficiency, and find that our integration reduces task overheads by 5-6x on a real machine learning application.
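
To make the pass-by-proxy idea concrete, here is a minimal sketch using only the Python standard library. It is not ProxyStore's API or the presenters' integration: the `_STORE` dictionary, the `LazyProxy` class, and the `put` and `task` helpers are hypothetical names invented for illustration. The point is that a task receives a small reference in place of the data, and the data is fetched only when the task actually uses it.

```python
import pickle

# Hypothetical in-memory object store standing in for a real storage backend.
_STORE = {}


class LazyProxy:
    """Illustrative proxy: carries only a key and resolves its target on first use."""

    def __init__(self, key):
        self._key = key
        self._target = None

    def to_wire(self):
        # Only the key crosses the wire, never the target object.
        return pickle.dumps(self._key)

    @classmethod
    def from_wire(cls, blob):
        return cls(pickle.loads(blob))

    def resolve(self):
        # Fetch the target from the store the first time it is needed.
        if self._target is None:
            self._target = _STORE[self._key]
        return self._target


def put(key, obj):
    """Place obj in the store and hand back a lightweight proxy to it."""
    _STORE[key] = obj
    return LazyProxy(key)


def task(data):
    """A stand-in for a distributed task that consumes its input."""
    return sum(data.resolve())


payload = list(range(100_000))
proxy = put("payload", payload)

by_value = len(pickle.dumps(payload))  # bytes sent if the data is passed by value
by_proxy = len(proxy.to_wire())        # bytes sent if only the proxy is passed

# Simulate shipping the proxy to a worker and running the task there.
result = task(LazyProxy.from_wire(proxy.to_wire()))
```

In this sketch the serialized proxy is a few dozen bytes regardless of payload size, which is the kind of saving the abstract attributes to keeping large objects out of the task-submission path. A real deployment would replace the dictionary with a store that workers can reach over the network or a shared file system.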