Session

Event Type: Posters
Time: Wednesday, 20 November 2024, 10am - 5pm EST
Location: B302-B305
Registration Categories
TP
XO/EX
Presentations
MIGnificient: Fast, Isolated, and GPU-Enabled Serverless Functions
Identifying Regions of Non-Determinism in HPC Simulations Through Event Graph Alignment
Exploration of Super-Resolution Techniques for Image Compression
Stalls and Memory Analysis on Fujitsu A64FX and NVIDIA Grace
Mind Your Manners: Detoxifying Language Models via Attention Head Intervention
HARVEST-2.0: High-Performance Vision Framework for End-to-End Preprocessing, Training, Inference, and Visualization
GNN-RL: An Intelligent HPC Resource Scheduler
CoVA: Compiler for Versatile Architectures
Trusted Platform Provisioning for the OpenCHAMI Cluster Management Stack
RAPIDS: Reduced API Data-Transfer Specifications
Generating Coupled Cluster Code for Modern Distributed Memory Tensor Software
Efficient Approaches to Analyzing Large Dynamic Networks
Enhancing Performance Reproducibility on HPC Workflows
Communication Hiding for Matrix-Free Finite Element Operators of a Complex PDE: Nonlinear Stokes Flow of Earth’s Mantle
Optimal Client Selection Algorithms for Federated Learning
Performance of LAMMPS-SNAP in Different Runtime Environments
AI-Based Scalable Analytics for Improving Performance and Resilience of HPC Systems
Generalizing ExaDigiT Datacenter Digital Twin Framework for Multiple Architectures
Comparing Cache Utilization Trends for Regional Scientific Caches with Transfer Learning Models
5G in Practice: Measuring Emerging Wireless Technology in Rural Iowa for Edge Devices in Distributed Computation Workloads
Prompt Phrase Ordering Using Large Language Models in HPC: Evaluating Prompt Sensitivity
Performance of N10 Benchmarks with Different BLAS Implementations
Simplifying HPC Resource Selection: A Tool for Optimizing Execution Time and Cost on Azure
Improvement of Bridges-2 Resource Utilization Through User Optimization
Poseidon: A Source-to-Source Translator for Holistic HPC Optimization of Ocean Models on Regular Grids
Improving Polyhedral-Based Optimizations with Dynamic Coordinate Descent
Cluster-Based Methodology for Characterizing the Performance of Portable Applications
Benchmarking Quantum-Inspired Optimization Platforms and Tools on an HPC Cluster
Proposal for a Parallel Automatic Tuning Using d-Spline According to the Operating State of the Computer System
A Sparse Approach for Translation-Based Training of Knowledge Graph Embeddings
Development of TEZip in PyTorch: Integrating New Prediction Models into an Existing Compression Framework
Formal Approaches to Characterize Emerging Arithmetic Realizations
Scalable Low-Latency Hardware Function Chaining with Chain Control Circuit
Active Learning for Metamaterial Optimization on HPC and QC Integrated Systems
Quantum Volume Benchmarking Simulators on HPC Systems
Enhancing HPC Resource Management to Integrate Quantum Workflows
Improving the Performance of Proof-of-Space in Blockchain Systems
Fault-Tolerant Numerical Iterative Algorithms at Scale
An Adaptive Kernel Execution for Dynamic Applications on GPUs Using CUDA Graphs
Persistent and Partitioned MPI for Stencil Communication
FortranX: Harnessing Code Generation, Portability, and Heterogeneity in Fortran
KVSort: Drastically Improving LLM Inference Performance via KV Cache Compression
Scalable Motif Counting on Large-Scale Dynamic Graphs
Evolving a Multi-Population Evolutionary-QAOA on Distributed QPUs
Fault Tolerance in Krylov Subspace Methods
A Survey-Based Evaluation of the Efficacy of a Girls Who Code Club at the University of Southern Indiana
JACC: HPC Meta-Programming and Performance Portability Ecosystem for Julia Language
Performance Engineering and Mesoscale-Microscale Coupling for Wind Energy Simulations
I/O Characterization of Heterogeneous Workflows
Improving SpGEMM Performance Through Reordering and Cluster-Wise Computation
Bringing It HOME: Analyzing Contention Hotspots Across the Memory Hierarchy with Low Overhead
Algorithmic Patterns from Computational Biology for Proxy Application Development and Co-Design
A Comparison Study of Open Source LLMs for HPC Ticket Answering
Assessing the Impact of Real-Time Traffic Updates on Traffic Flow: A High-Performance Computing Perspective on Scalability and Demand
Exploring Software-Defined Networking for Routing in Dragonfly Topology
Computational Radiation Hydrodynamics with FleCSI
Scalable Performance and Accuracy Analysis for Distributed and Extreme-Scale Systems
Hardware-Independent Sampling Library for CPUs and (Multi-)GPUs: hws
Design of Reliable and Efficient Syscall Hooking Library for a Parallel File System
Benchmarking and Modeling of Producer-Consumer Data Movement Performance in Scientific Workflows
Exploring DAOS as a Burst Buffer for a 100 Gbps DAQ Real-Time Streaming System
Lagrangian Particle-Tracking in GPU-Enabled Extreme Scale Turbulence Simulations
Machine Learning Applications for Early-Stage Ovarian Cancer Diagnosis
Towards Scalable Quantum Simulation on Wafer-Scale Engines
PINE: Efficient Yet Effective Piecewise Linear Trees
Web-Based Simulator of Superscalar RISC-V Processors
SanQus: Staleness and Quantization-Aware Full-Graph Decentralized Training in GNNs
Exploiting Data Compression and Low Precision for Exascale Fusion Turbulence Simulations
Parallelization of the Finite Element-Based Mesh Warping Algorithm Using Hybrid Parallel Programming
A Zero-Copy Storage with Metadata-Driven File Management Using Persistent Memory
Establishing Best Practices for Applying Inline Compressed Arrays to Improve Performance in HPC
QDD: Multi-Node Implementation of Decision Diagram-Based Quantum Circuit Simulator with Ring Communication and Auto SWAP Insertion
The P3 Explorer: Exploring the Performance, Portability, and Productivity Wilderness
HPC Fastpass: Visualizing Descriptive and Predictive HPC Queue Time Data
Uncover the Overhead and Resource Usage for Handling KV Cache Overflow in LLM Inference
Empowering Scientific Datasets with Large Language Models
Interactive and Tool-Agnostic ML-Driven Workflow for Automated HPC Performance Modeling
Breaking the Barriers to Effective Supercomputing: Web Dashboard for Job Accounting and Performance Metrics
Performance of Inline Compression with Software Caching for Reducing the Memory Footprint in pySDC
PipeInfer: Accelerating LLM Inference Using Asynchronous Pipelined Speculation
Eve: Less Memory, Same Might
NetCDFaster: A Geospatial Cyberinfrastructure Enhancing Multi-Dimensional Scientific Dataset Access and Visualization Through Machine Learning Optimization
Memory Disaggregation in Serverless Computing
iSeeMore: Design of a 256-Node RPi Cluster to Visualize LLM Computation Through Light and Movement for Mass Audiences
Power Patterns: Understanding the Energy Dynamics of I/O for Parallel Storage Configurations
Profiling and Bottleneck Identification for Large Language Model Optimizations
Creating Code LLMs for HPC: It’s LLMs All the Way Down
Neural Network Optimization and Performance Analysis for Real-Time Object Detection at the Edge
Predicting Dataset Popularity for Improved Distributed Content Caching in High Energy Physics
A Novel Gradient Compression Design with Ultra-High Compression Ratio for Communication-Efficient Federated Learning
Cluster Management with Containerization on Switches
Seesaw: Elastic Scaling for Task-Based Distributed Programs
Integrating HPCToolkit with Tools for Automated Analysis
Increasing the Efficiency of Neutral Atoms by Reducing Qubit Waste from Measurement-Related Ejections
Analyzing Alltoall Algorithms with SST
Large-Scale Randomized Program Generation with Large Language Models
Turbocharging Dask Apps: Accelerating Data Flow with ProxyStore
ORCHA: A Performance Portability System for Flash-X — A Multiphysics Application Software
DART-X: Software Infrastructure for Prototyping In-Memory Data Transfer Between Ensemble Data Assimilation and Coupled Earth Systems Models
SWARM: Scientific Workflow Applications on Resilient Metasystem
Meteorologic Real-Time Extreme Learning Machine for Pressure Prediction
Profiling the Impact of Hyper-Threading on Pagosa Hydrocodes
Trackable Agent-Based Evolution Models at Wafer Scale
Profiling Communication Overhead in 3D Parallel Pretrain of Large Language Models
Prototype Development and Testing of a Smart Buoy System for Coastal and Marine Ecosystems Using IBIS
Exploring Fine-Grained Memory Analysis for PIM Offloading
GPU Compression (for Scientific Data) Done Right
JUmPER: Performance Data Monitoring, Instrumentation and Visualization for Jupyter Notebooks
Large Genomic Language Models: Towards Their Hyperparameter Optimization
DisCostiC: Simulating MPI Applications Without Executing Code
An Error-Bounded Lossy Compression Method with Bit-Adaptive Quantization for Particle Data
PerfFlowAspect: A User-Friendly Performance Tool for Scientific Workflows
MatRIS: Performance Portable Math Library of IRIS Runtime for Multi-Device Heterogeneity
Edge-Enabled Real-Time Data Processing in Power-Efficient Weather Stations Using IBIS
PcMINER: Mining Performance-Related Commits at Scale
Parallel Verification of Neural Networks Applied to Medical Imaging
Enhancing the Traditional Benchmarks for Parallel Computing Education
FAS-GED: GPU-Accelerated Graph Edit Distance Computation
QFw: A Quantum Framework for Large-Scale HPC Ecosystems
Scored Non-Deterministic Finite Automata Processor for Sequence Alignment
On the Accuracy and Efficiency of Approximate Triangle Counting via Randomized Numerical Linear Algebra
Assessing Matrix Multiplication Performance with Fully Homomorphic Encryption
Characterizing the Performance of the GENE-X Code for Gyrokinetic Turbulence Simulations
LM-Offload: Performance Model-Guided Generative Inference of Large Language Models with Parallelism Control
An Accurate and Scalable Multidimensional Quantum Solver for Partial Differential Equations
New Semi-Implicit Electrostatic Particle-In-Cell Method to Extend Scope of the Exascale WarpX Code
Toward Performance & Portability & Productivity in Parallel Programming
High-Performance Computing Resilience Analysis Using Large Language Models
Q-NFSO: Exploring Quantum Applications, Noise Management, Fault Injection, Resource Scheduling and Optimization in the NISQ Era
Effects of Lossy Compression Data on Machine Learning Models
Enhancing HPC I/O Performance: Leveraging Runtime and Offline I/O Optimization Frameworks
Efficient, Scalable, Robust Neuromorphic High Performance Computing
FFT-Based Spherical Harmonics and Radial Transforms on GPU
Accelerating Communications in High-Performance Scientific Workflows
Scalable Planning Platform for Orchestration of Autonomous Systems Across Edge-Cloud Continuum
Efficient Large Dynamic Graph Analysis on Emerging Storage Technology
Going Beyond the Chicken and Egg Situation with Modern MPI Features
Data Layout Optimizations for Tensor Applications
Algorithmic and Optimization Techniques for Graph Applications in Heterogeneous Systems at Scale
Accelerating HPC Workflow Results and Performance Reproducibility Analytics
Supporting End Users in Implementing Quantum Computing Applications
Designing Efficient Data Reduction Approaches for Multi-Resolution Simulations on HPC Systems