Presentation
Turbocharging Dask Apps: Accelerating Data Flow with ProxyStore
Description
Despite advances in distributed computing libraries, performance challenges such as data serialization and transfer persist. We examine data-handling limitations in Dask, a popular Python library for distributed and parallel computing, and investigate whether the pass-by-proxy paradigm implemented by ProxyStore can address these inefficiencies. By integrating ProxyStore, we streamline data flow in Dask applications, reducing the overheads associated with data serialization and with the scheduler.
We evaluate the impact of proxies on data transfer times and overall computational efficiency, and find that our integration reduces task overheads by 5-6x on a real machine learning application.
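To make the pass-by-proxy idea concrete, the following is a minimal standard-library sketch, not the poster's implementation and not ProxyStore's API. The names `_STORE`, `LazyProxy`, `to_proxy`, and `task` are hypothetical, and an in-process dictionary stands in for whatever storage channel a real deployment would use. It shows the general pattern: the large object is written to a store once, only a small key travels with the task, and the consumer fetches the data when it first needs it.

```python
import pickle

# Hypothetical in-memory store standing in for a real storage channel.
# This is an illustration only, not ProxyStore's API.
_STORE = {}


class LazyProxy:
    """Lightweight reference that fetches its target on first use."""

    def __init__(self, key):
        self.key = key
        self._target = None
        self._resolved = False

    def to_wire(self):
        # Only the key is serialized, never the target data.
        return pickle.dumps(self.key)

    @classmethod
    def from_wire(cls, blob):
        # Rebuild an unresolved proxy on the receiving side.
        return cls(pickle.loads(blob))

    def resolve(self):
        # Fetch and deserialize the target once, then cache it.
        if not self._resolved:
            self._target = pickle.loads(_STORE[self.key])
            self._resolved = True
        return self._target


def to_proxy(key, obj):
    """Put obj in the store and return a proxy that refers to it."""
    _STORE[key] = pickle.dumps(obj)
    return LazyProxy(key)


def task(data):
    # The task resolves the proxy only where the data is consumed.
    return sum(data.resolve())


payload = list(range(100_000))
proxy = to_proxy("payload", payload)

direct_size = len(pickle.dumps(payload))  # bytes if passed by value
proxy_size = len(proxy.to_wire())         # bytes if passed by proxy

# Simulate shipping the proxy to a worker and running the task there.
result = task(LazyProxy.from_wire(proxy.to_wire()))
```

In this sketch the bytes that would pass through a scheduler shrink from the full pickled list to a pickled key, which is the kind of overhead reduction the abstract refers to. How the actual integration hooks into Dask is not described here.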

Event Type
ACM Student Research Competition: Graduate Poster
ACM Student Research Competition: Undergraduate Poster
Doctoral Showcase
Posters
Time
Tuesday, 19 November 2024, 12pm - 5pm EST
Location
B302-B305
TP
XO/EX