Presentation
Improving Input-Step Performance
Description
In many “Big Data” problems, the data to be analyzed are stored in files; to solve such problems, an input step reads the data from a file into an array for processing. This input step has traditionally been performed sequentially, causing the time to perform that step to grow linearly with N, the number of values in the file. This paper explores different ways to reduce the time consumed by the input step, including the use of different file formats, as well as parallel I/O via MPI-IO. To make parallel I/O easier for students to use, we have created OO_MPI_IO, a new set of C++ abstractions that hide the complexity of MPI-IO. We also demonstrate how these OO_MPI_IO abstractions can (i) improve the scalability of data-intensive problem solutions, and (ii) provide a means of helping students understand Amdahl’s and Gustafson’s Laws.
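To illustrate why a sequential input step limits scalability, the following minimal C++ sketch evaluates the standard forms of Amdahl's and Gustafson's Laws. It is not part of OO_MPI_IO; the function names amdahlSpeedup and gustafsonSpeedup and the parameter s (the fraction of run time that remains sequential, such as a sequential file read) are assumptions made for illustration only.

```cpp
#include <cassert>

// Amdahl's Law: speedup on p processes for a fixed problem size,
// where s is the fraction of the sequential run time that cannot be
// parallelized (for example, a sequential input step).
double amdahlSpeedup(double s, int p) {
    assert(p >= 1 && s >= 0.0 && s <= 1.0);
    return 1.0 / (s + (1.0 - s) / p);
}

// Gustafson's Law: scaled speedup on p processes when the problem
// size grows with p, where s is the sequential fraction of the
// parallel run time.
double gustafsonSpeedup(double s, int p) {
    assert(p >= 1 && s >= 0.0 && s <= 1.0);
    return p - s * (p - 1);
}
```

For example, with an assumed sequential fraction s = 0.25 and p = 4 processes, amdahlSpeedup(0.25, 4) is about 2.29, while gustafsonSpeedup(0.25, 4) is 3.25. Under Amdahl's Law the speedup can never exceed 1/s, so shrinking s by parallelizing the input step raises that ceiling.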