BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250626T233527Z
LOCATION:B302-B305
DTSTART;TZID=America/New_York:20241120T100000
DTEND;TZID=America/New_York:20241120T170000
UID:submissions.supercomputing.org_SC24_sess533_post256@linklings.com
SUMMARY:PipeInfer: Accelerating LLM Inference Using Asynchronous Pipelined
  Speculation
DESCRIPTION:Branden Butler and Sixing Yu (Iowa State University), Arya Maz
 aheri (Technical University Darmstadt), and Ali Jannesari (Iowa State Univ
 ersity)\n\nInference of large language models (LLMs) across computer clust
 ers has become a focal point of research in recent times, with many accele
 ration techniques taking inspiration from CPU speculative execution. These
  techniques reduce memory bandwidth requirements, but also increase laten
 cy per inference run, requiring high speculation acceptance rates to impro
 ve performance. As a remedy, we propose PipeInfer, a pipelined speculative
  acceleration technique to reduce inter-token latency and improve system u
 tilization while also improving tolerance to low speculation acceptance ra
 tes and low-bandwidth interconnects. PipeInfer exhibits up to a 2.15× impr
 ovement in generation speed over standard speculative inference. PipeInfer
  achieves its improvement through Continuous Asynchronous Speculation and 
 Early Inference Cancellation; the former improves latency and generation s
 peed by running single-token inference simultaneously with several specula
 tive runs, while the latter improves speed and latency by skipping the com
 putation of invalidated runs.\n\nRegistration Category: Tech Program Reg P
 ass, Exhibits Reg Pass\n\nSession Chairs: Ayesha Afzal (Friedrich-Alexande
 r University, Erlangen-Nuremberg; Erlangen National High Performance Compu
 ting Center); Sally Ellingson (University of Kentucky); and Alan Sussman (
 University of Maryland)\n\n
END:VEVENT
END:VCALENDAR
