Session

Event Type: Posters
Time: Thursday, 21 November 2024, 10am - 5pm EST
Location: B302-B305
Registration Categories
TP
XO/EX
Presentations
Prompt Phrase Ordering Using Large Language Models in HPC: Evaluating Prompt Sensitivity
Trusted Platform Provisioning for the OpenCHAMI Cluster Management Stack
Lagrangian Particle-Tracking in GPU-Enabled Extreme Scale Turbulence Simulations
Machine Learning Applications for Early-Stage Ovarian Cancer Diagnosis
Exploration of Super-Resolution Techniques for Image Compression
Proposal for a Parallel Automatic Tuning Using d-Spline According to the Operating State of the Computer System
Predicting Dataset Popularity for Improved Distributed Content Caching in High Energy Physics
A Comparison Study of Open Source LLMs for HPC Ticket Answering
Generating Coupled Cluster Code for Modern Distributed Memory Tensor Software
Quantum Volume Benchmarking Simulators on HPC Systems
Scalable Performance and Accuracy Analysis for Distributed and Extreme-Scale Systems
Trackable Agent-Based Evolution Models at Wafer Scale
Meteorologic Real-Time Extreme Learning Machine for Pressure Prediction
Assessing Matrix Multiplication Performance with Fully Homomorphic Encryption
Large Genomic Language Models: Towards Their Hyperparameter Optimization
5G in Practice: Measuring Emerging Wireless Technology in Rural Iowa for Edge Devices in Distributed Computation Workloads
Breaking the Barriers to Effective Supercomputing: Web Dashboard for Job Accounting and Performance Metrics
PipeInfer: Accelerating LLM Inference Using Asynchronous Pipelined Speculation
Scalable Motif Counting on Large-Scale Dynamic Graphs
Profiling and Bottleneck Identification for Large Language Model Optimizations
Establishing Best Practices for Applying Inline Compressed Arrays to Improve Performance in HPC
Performance of LAMMPS-SNAP in Different Runtime Environments
Exploring Software-Defined Networking for Routing in Dragonfly Topology
Cluster-Based Methodology for Characterizing the Performance of Portable Applications
Hardware-Independent Sampling Library for CPUs and (Multi-)GPUs: hws
iSeeMore: Design of a 256-Node RPi Cluster to Visualize LLM Computation Through Light and Movement for Mass Audiences
FAS-GED: GPU-Accelerated Graph Edit Distance Computation
PcMINER: Mining Performance-Related Commits at Scale
Edge-Enabled Real-Time Data Processing in Power-Efficient Weather Stations Using IBIS
An Accurate and Scalable Multidimensional Quantum Solver for Partial Differential Equations
Exploiting Data Compression and Low Precision for Exascale Fusion Turbulence Simulations
Exploring DAOS as a Burst Buffer for a 100 Gbps DAQ Real-Time Streaming System
QDD: Multi-Node Implementation of Decision Diagram-Based Quantum Circuit Simulator with Ring Communication and Auto SWAP Insertion
Towards Scalable Quantum Simulation on Wafer-Scale Engines
Improving SpGEMM Performance Through Reordering and Cluster-Wise Computation
Design of Reliable and Efficient Syscall Hooking Library for a Parallel File System
Interactive and Tool-Agnostic ML-Driven Workflow for Automated HPC Performance Modeling
Parallelization of the Finite Element-Based Mesh Warping Algorithm Using Hybrid Parallel Programming
PINE: Efficient Yet Effective Piecewise Linear Trees
Poseidon: A Source-to-Source Translator for Holistic HPC Optimization of Ocean Models on Regular Grids
SanQus: Staleness and Quantization-Aware Full-Graph Decentralized Training in GNNs
LM-Offload: Performance Model-Guided Generative Inference of Large Language Models with Parallelism Control
An Error-Bounded Lossy Compression Method with Bit-Adaptive Quantization for Particle Data
Web-Based Simulator of Superscalar RISC-V Processors
A Survey-Based Evaluation of the Efficacy of a Girls Who Code Club at the University of Southern Indiana
Integrating HPCToolkit with Tools for Automated Analysis
GPU Compression (for Scientific Data) Done Right
AI-Based Scalable Analytics for Improving Performance and Resilience of HPC Systems
Profiling the Impact of Hyper-Threading on Pagosa Hydrocodes
Optimal Client Selection Algorithms for Federated Learning
MIGnificient: Fast, Isolated, and GPU-Enabled Serverless Functions
PerfFlowAspect: A User-Friendly Performance Tool for Scientific Workflows
Increasing the Efficiency of Neutral Atoms by Reducing Qubit Waste from Measurement-Related Ejections
Algorithmic Patterns from Computational Biology for Proxy Application Development and Co-Design
DART-X: Software Infrastructure for Prototyping In-Memory Data Transfer Between Ensemble Data Assimilation and Coupled Earth Systems Models
Scored Non-Deterministic Finite Automata Processor for Sequence Alignment
Parallel Verification of Neural Networks Applied to Medical Imaging
Benchmarking and Modeling of Producer-Consumer Data Movement Performance in Scientific Workflows
Efficient Approaches to Analyzing Large Dynamic Networks
Cluster Management with Containerization on Switches
JUmPER: Performance Data Monitoring, Instrumentation and Visualization for Jupyter Notebooks
QFw: A Quantum Framework for Large-Scale HPC Ecosystems
Seesaw: Elastic Scaling for Task-Based Distributed Programs
Improving Polyhedral-Based Optimizations with Dynamic Coordinate Descent
MatRIS: Performance Portable Math Library of IRIS Runtime for Multi-Device Heterogeneity
CoVA: Compiler for Versatile Architectures
I/O Characterization of Heterogeneous Workflows
The P3 Explorer: Exploring the Performance, Portability, and Productivity Wilderness
KVSort: Drastically Improving LLM Inference Performance via KV Cache Compression
New Semi-Implicit Electrostatic Particle-In-Cell Method to Extend Scope of the Exascale WarpX Code
Turbocharging Dask Apps: Accelerating Data Flow with ProxyStore
Enhancing Performance Reproducibility on HPC Workflows
HPC Fastpass: Visualizing Descriptive and Predictive HPC Queue Time Data
Uncover the Overhead and Resource Usage for Handling KV Cache Overflow in LLM Inference
Scalable Low-Latency Hardware Function Chaining with Chain Control Circuit
SWARM: Scientific Workflow Applications on Resilient Metasystem
RAPIDS: Reduced API Data-Transfer Specifications
JACC: HPC Meta-Programming and Performance Portability Ecosystem for Julia Language
GNN-RL: An Intelligent HPC Resource Scheduler
A Novel Gradient Compression Design with Ultra-High Compression Ratio for Communication-Efficient Federated Learning
Mind Your Manners: Detoxifying Language Models via Attention Head Intervention
Exploring Fine-Grained Memory Analysis for PIM Offloading
Characterizing the Performance of the GENE-X Code for Gyrokinetic Turbulence Simulations
NetCDFaster: A Geospatial Cyberinfrastructure Enhancing Multi-Dimensional Scientific Dataset Access and Visualization Through Machine Learning Optimization
Benchmarking Quantum-Inspired Optimization Platforms and Tools on an HPC Cluster
Improving the Performance of Proof-of-Space in Blockchain Systems
Assessing the Impact of Real-Time Traffic Updates on Traffic Flow: A High-Performance Computing Perspective on Scalability and Demand
Large-Scale Randomized Program Generation with Large Language Models
HARVEST-2.0: High-Performance Vision Framework for End-to-End Preprocessing, Training, Inference, and Visualization
Prototype Development and Testing of a Smart Buoy System for Coastal and Marine Ecosystems Using IBIS
Formal Approaches to Characterize Emerging Arithmetic Realizations
Performance of Inline Compression with Software Caching for Reducing the Memory Footprint in pySDC
Memory Disaggregation in Serverless Computing
Bringing It HOME: Analyzing Contention Hotspots Across the Memory Hierarchy with Low Overhead
Eve: Less Memory, Same Might
Performance of N10 Benchmarks with Different BLAS Implementations
On the Accuracy and Efficiency of Approximate Triangle Counting via Randomized Numerical Linear Algebra
Enhancing HPC Resource Management to Integrate Quantum Workflows
Simplifying HPC Resource Selection: A Tool for Optimizing Execution Time and Cost on Azure
Stalls and Memory Analysis on Fujitsu A64FX and NVIDIA Grace
Identifying Regions of Non-Determinism in HPC Simulations Through Event Graph Alignment
Persistent and Partitioned MPI for Stencil Communication
Computational Radiation Hydrodynamics with FleCSI
Improvement of Bridges-2 Resource Utilization Through User Optimization
Enhancing the Traditional Benchmarks for Parallel Computing Education
Active Learning for Metamaterial Optimization on HPC and QC Integrated Systems
Empowering Scientific Datasets with Large Language Models
Generalizing ExaDigiT Datacenter Digital Twin Framework for Multiple Architectures
Profiling Communication Overhead in 3D Parallel Pretrain of Large Language Models
Fault-Tolerant Numerical Iterative Algorithms at Scale
Evolving a Multi-Population Evolutionary-QAOA on Distributed QPUs
Development of TEZip in PyTorch: Integrating New Prediction Models into an Existing Compression Framework
Power Patterns: Understanding the Energy Dynamics of I/O for Parallel Storage Configurations
A Sparse Approach for Translation-Based Training of Knowledge Graph Embeddings
Comparing Cache Utilization Trends for Regional Scientific Caches with Transfer Learning Models
An Adaptive Kernel Execution for Dynamic Applications on GPUs Using CUDA Graphs
Analyzing Alltoall Algorithms with SST
Fault Tolerance in Krylov Subspace Methods
Communication Hiding for Matrix-Free Finite Element Operators of a Complex PDE: Nonlinear Stokes Flow of Earth’s Mantle
A Zero-Copy Storage with Metadata-Driven File Management Using Persistent Memory
Neural Network Optimization and Performance Analysis for Real-Time Object Detection at the Edge
Performance Engineering and Mesoscale-Microscale Coupling for Wind Energy Simulations
FortranX: Harnessing Code Generation, Portability, and Heterogeneity in Fortran
Creating Code LLMs for HPC: It’s LLMs All the Way Down
ORCHA: A Performance Portability System for Flash-X — A Multiphysics Application Software
DisCostiC: Simulating MPI Applications Without Executing Code
High-Performance Computing Resilience Analysis Using Large Language Models
Effects of Lossy Compression Data on Machine Learning Models
Data Layout Optimizations for Tensor Applications
FFT-Based Spherical Harmonics and Radial Transforms on GPU
Going Beyond the Chicken and Egg Situation with Modern MPI Features
Accelerating Communications in High-Performance Scientific Workflows
Supporting End Users in Implementing Quantum Computing Applications
Accelerating HPC Workflow Results and Performance Reproducibility Analytics
Scalable Planning Platform for Orchestration of Autonomous Systems Across Edge-Cloud Continuum
Toward Performance & Portability & Productivity in Parallel Programming
Efficient Large Dynamic Graph Analysis on Emerging Storage Technology
Efficient, Scalable, Robust Neuromorphic High Performance Computing
Enhancing HPC I/O Performance: Leveraging Runtime and Offline I/O Optimization Frameworks
Q-NFSO: Exploring Quantum Applications, Noise Management, Fault Injection, Resource Scheduling and Optimization in the NISQ Era
Algorithmic and Optimization Techniques for Graph Applications in Heterogeneous Systems at Scale
Designing Efficient Data Reduction Approaches for Multi-Resolution Simulations on HPC Systems