Presentation
Improving SQL Query Execution of Distributed Query Engines on Object-Based Computational Storage through Multi-Layered Offloading
Description
This paper presents an approach to optimizing SQL query execution in distributed query engines using Object-Based Computational Storage (OCS). Modern analytics platforms such as Presto often suffer from excessive data movement between compute and storage nodes, even when only a small subset of the data is required. Solutions such as S3 Select address this by allowing limited operations to be offloaded to storage, but they are restricted to simple queries. The OCS system overcomes these limitations by enabling the offloading of more complex, platform-independent query plans expressed in Substrait. This work introduces a multi-layered offloading strategy in which query plans are decomposed and distributed between the OCS Front-End (OCSFE) and the OCS Array (OCSA), improving resource utilization and reducing query latency. The paper also presents an integration with Presto that allows seamless query offloading, and a heuristic algorithm that dynamically manages query distribution across the OCS layers to ensure efficient execution and scalability.
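
The idea of decomposing a query plan across layers can be illustrated with a minimal sketch. This is not the paper's heuristic, which the description does not specify. It assumes a linear, bottom-up operator pipeline, assumes the OCSA is the layer closest to the data, and uses made-up capability sets (`ARRAY_OPS`, `FRONTEND_OPS`) and hypothetical names (`Op`, `split_plan`) purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical capability sets: the description does not say which
# operators each OCS layer supports. These are assumptions.
ARRAY_OPS = {"scan", "filter", "project"}
FRONTEND_OPS = ARRAY_OPS | {"aggregate", "sort", "limit"}


@dataclass(frozen=True)
class Op:
    kind: str         # operator type, e.g. "filter"
    detail: str = ""  # free-form description, unused by the split


def split_plan(plan):
    """Split a bottom-up operator pipeline into three contiguous segments:
    (array segment, front-end segment, remainder left to the query engine)."""
    i = 0
    # Longest prefix the storage array could run by itself.
    while i < len(plan) and plan[i].kind in ARRAY_OPS:
        i += 1
    j = i
    # Extend with operators the front-end could run on the array's output.
    while j < len(plan) and plan[j].kind in FRONTEND_OPS:
        j += 1
    return plan[:i], plan[i:j], plan[j:]


if __name__ == "__main__":
    plan = [Op("scan", "orders"), Op("filter", "price > 100"),
            Op("aggregate", "sum(price) by region"), Op("join", "regions")]
    array_part, frontend_part, engine_part = split_plan(plan)
    print([op.kind for op in array_part])     # ['scan', 'filter']
    print([op.kind for op in frontend_part])  # ['aggregate']
    print([op.kind for op in engine_part])    # ['join']
```

A static prefix split like this ignores load and data size. The dynamic heuristic mentioned in the description would presumably weigh such factors when choosing the cut points.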