 | Interactive and Tool-Agnostic ML-Driven Workflow for Automated HPC Performance Modeling | | |
 | ORCHA: A Performance Portability System for Flash-X — A Multiphysics Application Software | | |
 | Improving Polyhedral-Based Optimizations with Dynamic Coordinate Descent | | |
 | Performance Engineering and Mesoscale-Microscale Coupling for Wind Energy Simulations | | |
 | Establishing Best Practices for Applying Inline Compressed Arrays to Improve Performance in HPC | | |
 | Stalls and Memory Analysis on Fujitsu A64FX and NVIDIA Grace | | |
 | FortranX: Harnessing Code Generation, Portability, and Heterogeneity in Fortran | | |
 | Hardware-Independent Sampling Library for CPUs and (Multi-)GPUs: hws | | |
 | Fault-Tolerant Numerical Iterative Algorithms at Scale | | |
 | Exploration of Super-Resolution Techniques for Image Compression | | |
 | Seesaw: Elastic Scaling for Task-Based Distributed Programs | | |
 | SanQus: Staleness and Quantization-Aware Full-Graph Decentralized Training in GNNs | | |
 | Parallel Verification of Neural Networks Applied to Medical Imaging | | |
 | Power Patterns: Understanding the Energy Dynamics of I/O for Parallel Storage Configurations | | |
 | Meteorologic Real-Time Extreme Learning Machine for Pressure Prediction | | |
 | On the Accuracy and Efficiency of Approximate Triangle Counting via Randomized Numerical Linear Algebra | | |
 | Optimal Client Selection Algorithms for Federated Learning | | |
 | Machine Learning Applications for Early-Stage Ovarian Cancer Diagnosis | | |
 | A Comparison Study of Open Source LLMs for HPC Ticket Answering | | |
 | MatRIS: Performance Portable Math Library of IRIS Runtime for Multi-Device Heterogeneity | | |
 | Performance of Inline Compression with Software Caching for Reducing the Memory Footprint in pySDC | | |
 | GPU Compression (for Scientific Data) Done Right | | |
 | Design of Reliable and Efficient Syscall Hooking Library for a Parallel File System | | |
 | Computational Radiation Hydrodynamics with FleCSI | | |
 | Characterizing the Performance of the GENE-X Code for Gyrokinetic Turbulence Simulations | | |
 | Turbocharging Dask Apps: Accelerating Data Flow with ProxyStore | | |
 | NetCDFaster: A Geospatial Cyberinfrastructure Enhancing Multi-Dimensional Scientific Dataset Access and Visualization Through Machine Learning Optimization | | |
 | Assessing the Impact of Real-Time Traffic Updates on Traffic Flow: A High-Performance Computing Perspective on Scalability and Demand | | |
 | PINE: Efficient Yet Effective Piecewise Linear Trees | | |
 | JACC: HPC Meta-Programming and Performance Portability Ecosystem for Julia Language | | |
 | Profiling the Impact of Hyper-Threading on Pagosa Hydrocodes | | |
 | Neural Network Optimization and Performance Analysis for Real-Time Object Detection at the Edge | | |
 | Exploiting Data Compression and Low Precision for Exascale Fusion Turbulence Simulations | | |
 | Enhancing HPC Resource Management to Integrate Quantum Workflows | | |
 | LM-Offload: Performance Model-Guided Generative Inference of Large Language Models with Parallelism Control | | |
 | QDD: Multi-Node Implementation of Decision Diagram-Based Quantum Circuit Simulator with Ring Communication and Auto SWAP Insertion | | |
 | PerfFlowAspect: A User-Friendly Performance Tool for Scientific Workflows | | |
 | Predicting Dataset Popularity for Improved Distributed Content Caching in High Energy Physics | | |
 | Enhancing the Traditional Benchmarks for Parallel Computing Education | | |
 | FAS-GED: GPU-Accelerated Graph Edit Distance Computation | | |
 | Scored Non-Deterministic Finite Automata Processor for Sequence Alignment | | |
 | Large-Scale Randomized Program Generation with Large Language Models | | |
 | Comparing Cache Utilization Trends for Regional Scientific Caches with Transfer Learning Models | | |
 | Analyzing Alltoall Algorithms with SST | | |
 | Enhancing Performance Reproducibility on HPC Workflows | | |
 | A Novel Gradient Compression Design with Ultra-High Compression Ratio for Communication-Efficient Federated Learning | | |
 | PipeInfer: Accelerating LLM Inference Using Asynchronous Pipelined Speculation | | |
 | Breaking the Barriers to Effective Supercomputing: Web Dashboard for Job Accounting and Performance Metrics | | |
 | A Sparse Approach for Translation-Based Training of Knowledge Graph Embeddings | | |
 | Scalable Low-Latency Hardware Function Chaining with Chain Control Circuit | | |
 | HARVEST-2.0: High-Performance Vision Framework for End-to-End Preprocessing, Training, Inference, and Visualization | | |
 | Exploring Fine-Grained Memory Analysis for PIM Offloading | | |
 | Uncover the Overhead and Resource Usage for Handling KV Cache Overflow in LLM Inference | | |
 | Scalable Performance and Accuracy Analysis for Distributed and Extreme-Scale Systems | | |
 | Improving the Performance of Proof-of-Space in Blockchain Systems | | |
 | Communication Hiding for Matrix-Free Finite Element Operators of a Complex PDE: Nonlinear Stokes Flow of Earth’s Mantle | | |
 | Increasing the Efficiency of Neutral Atoms by Reducing Qubit Waste from Measurement-Related Ejections | | |
 | The P3 Explorer: Exploring the Performance, Portability, and Productivity Wilderness | | |
 | GNN-RL: An Intelligent HPC Resource Scheduler | | |
 | PcMINER: Mining Performance-Related Commits at Scale | | |
 | Profiling Communication Overhead in 3D Parallel Pretrain of Large Language Models | | |
 | An Accurate and Scalable Multidimensional Quantum Solver for Partial Differential Equations | | |
 | Generalizing ExaDigiT Datacenter Digital Twin Framework for Multiple Architectures | | |
 | Edge-Enabled Real-Time Data Processing in Power-Efficient Weather Stations Using IBIS | | |
 | CoVA: Compiler for Versatile Architectures | | |
 | Trusted Platform Provisioning for the OpenCHAMI Cluster Management Stack | | |
 | Parallelization of the Finite Element-Based Mesh Warping Algorithm Using Hybrid Parallel Programming | | |
 | QFw: A Quantum Framework for Large-Scale HPC Ecosystems | | |
 | Efficient Approaches to Analyzing Large Dynamic Networks | | |
 | JUmPER: Performance Data Monitoring, Instrumentation and Visualization for Jupyter Notebooks | | |
 | Memory Disaggregation in Serverless Computing | | |
 | Evolving a Multi-Population Evolutionary-QAOA on Distributed QPUs | | |
 | SWARM: Scientific Workflow Applications on Resilient Metasystem | | |
 | Benchmarking and Modeling of Producer-Consumer Data Movement Performance in Scientific Workflows | | |
 | Cluster-Based Methodology for Characterizing the Performance of Portable Applications | | |
 | DART-X: Software Infrastructure for Prototyping In-Memory Data Transfer Between Ensemble Data Assimilation and Coupled Earth Systems Models | | |
 | HPC Fastpass: Visualizing Descriptive and Predictive HPC Queue Time Data | | |
 | MIGnificient: Fast, Isolated, and GPU-Enabled Serverless Functions | | |
 | iSeeMore: Design of a 256-Node RPi Cluster to Visualize LLM Computation Through Light and Movement for Mass Audiences | | |
 | 5G in Practice: Measuring Emerging Wireless Technology in Rural Iowa for Edge Devices in Distributed Computation Workloads | | |
 | Trackable Agent-Based Evolution Models at Wafer Scale | | |
 | New Semi-Implicit Electrostatic Particle-In-Cell Method to Extend Scope of the Exascale WarpX Code | | |
 | KVSort: Drastically Improving LLM Inference Performance via KV Cache Compression | | |
 | Web-Based Simulator of Superscalar RISC-V Processors | | |
 | Quantum Volume Benchmarking Simulators on HPC Systems | | |
 | Simplifying HPC Resource Selection: A Tool for Optimizing Execution Time and Cost on Azure | | |
 | Exploring DAOS as a Burst Buffer for a 100 Gbps DAQ Real-Time Streaming System | | |
 | Persistent and Partitioned MPI for Stencil Communication | | |
 | An Adaptive Kernel Execution for Dynamic Applications on GPUs Using CUDA Graphs | | |
 | Mind Your Manners: Detoxifying Language Models via Attention Head Intervention | | |
 | Bringing It HOME: Analyzing Contention Hotspots Across the Memory Hierarchy with Low Overhead | | |
 | Active Learning for Metamaterial Optimization on HPC and QC Integrated Systems | | |
 | An Error-Bounded Lossy Compression Method with Bit-Adaptive Quantization for Particle Data | | |
 | Creating Code LLMs for HPC: It’s LLMs All the Way Down | | |
 | Towards Scalable Quantum Simulation on Wafer-Scale Engines | | |
 | I/O Characterization of Heterogeneous Workflows | | |
 | RAPIDS: Reduced API Data-Transfer Specifications | | |
 | A Zero-Copy Storage with Metadata-Driven File Management Using Persistent Memory | | |
 | Exploring Software-Defined Networking for Routing in Dragonfly Topology | | |
 | Cluster Management with Containerization on Switches | | |
 | Benchmarking Quantum-Inspired Optimization Platforms and Tools on an HPC Cluster | | |
 | Development of TEZip in PyTorch: Integrating New Prediction Models into an Existing Compression Framework | | |
 | Improvement of Bridges-2 Resource Utilization Through User Optimization | | |
 | Profiling and Bottleneck Identification for Large Language Model Optimizations | | |
 | A Survey-Based Evaluation of the Efficacy of a Girls Who Code Club at the University of Southern Indiana | | |
 | Improving SpGEMM Performance Through Reordering and Cluster-Wise Computation | | |
 | Prompt Phrase Ordering Using Large Language Models in HPC: Evaluating Prompt Sensitivity | | |
 | Proposal for a Parallel Automatic Tuning Using d-Spline According to the Operating State of the Computer System | | |
 | Assessing Matrix Multiplication Performance with Fully Homomorphic Encryption | | |
 | Generating Coupled Cluster Code for Modern Distributed Memory Tensor Software | | |
 | Formal Approaches to Characterize Emerging Arithmetic Realizations | | |
 | Integrating HPCToolkit with Tools for Automated Analysis | | |
 | Performance of LAMMPS-SNAP in Different Runtime Environments | | |
 | Algorithmic Patterns from Computational Biology for Proxy Application Development and Co-Design | | |
 | Performance of N10 Benchmarks with Different BLAS Implementations | | |
 | Eve: Less Memory, Same Might | | |
 | Lagrangian Particle-Tracking in GPU-Enabled Extreme Scale Turbulence Simulations | | |
 | Scalable Motif Counting on Large-Scale Dynamic Graphs | | |
 | Empowering Scientific Datasets with Large Language Models | | |
 | DisCostiC: Simulating MPI Applications Without Executing Code | | |
 | Identifying Regions of Non-Determinism in HPC Simulations Through Event Graph Alignment | | |
 | Fault Tolerance in Krylov Subspace Methods | | |
 | Prototype Development and Testing of a Smart Buoy System for Coastal and Marine Ecosystems Using IBIS | | |
 | AI-Based Scalable Analytics for Improving Performance and Resilience of HPC Systems | | |
 | Large Genomic Language Models: Towards Their Hyperparameter Optimization | | |
 | Poseidon: A Source-to-Source Translator for Holistic HPC Optimization of Ocean Models on Regular Grids | | |
| Algorithmic and Optimization Techniques for Graph Applications in Heterogeneous Systems at Scale | | |
| Efficient, Scalable, Robust Neuromorphic High Performance Computing | | |
| Going Beyond the Chicken and Egg Situation with Modern MPI Features | | |
| Effects of Lossy Compression Data on Machine Learning Models | | |
| Scalable Planning Platform for Orchestration of Autonomous Systems Across Edge-Cloud Continuum | | |
| Toward Performance & Portability & Productivity in Parallel Programming | | |
| Efficient Large Dynamic Graph Analysis on Emerging Storage Technology | | |
| Enhancing HPC I/O Performance: Leveraging Runtime and Offline I/O Optimization Frameworks | | |
| Q-NFSO: Exploring Quantum Applications, Noise Management, Fault Injection, Resource Scheduling and Optimization in the NISQ Era | | |
| Data Layout Optimizations for Tensor Applications | | |
| Designing Efficient Data Reduction Approaches for Multi-Resolution Simulations on HPC Systems | | |
| FFT-Based Spherical Harmonics and Radial Transforms on GPU | | |
| High-Performance Computing Resilience Analysis Using Large Language Models | | |
| Supporting End Users in Implementing Quantum Computing Applications | | |
| Accelerating Communications in High-Performance Scientific Workflows | | |
| Accelerating HPC Workflow Results and Performance Reproducibility Analytics | | |