Presentation
Exploring Data at Scale with Arkouda: A Practical Introduction to Scalable Data Science
Description
Data scientists can be thought of as modern-day explorers, venturing into the vast unknown of information. This journey is not without its hurdles, and one of the biggest is the sheer volume of data they encounter. Many modern data sets hold terabytes or even petabytes of information and cannot fit in a laptop's memory, so extracting meaningful insights from them requires specialized tools. Even as data sets grow ever larger, data science demands interactivity, so that scientists can learn while working with the data, and scalability, so that they can work with data sets in their entirety. Data scientists have naturally been drawn to Python because it provides interactivity through its read-eval-print loop (REPL) and performance through libraries written in other languages, such as C and Fortran. These libraries, however, are typically not designed for HPC and run into problems when attempting to scale.