Abstract:
The efficiency and latency of ML accelerators are limited by the memory bottleneck: moving inputs, model data, and intermediate results back and forth between memory and a few large on-chip computational units is expensive. Analog in-memory computing may alleviate the problem but is beset by the challenges of analog design, such as variability, limited reconfigurability, and compute imprecision. Stochastic computing is a radical alternative approach that represents numbers not in fixed- or floating-point form but as random binary streams. This “stochastic” representation greatly simplifies basic digital computing hardware, enabling massive parallelization that reduces data movement with attendant latency and energy improvements. This talk will summarize our recent efforts demonstrating substantial energy-delay product (EDP) improvements over fixed-point implementations in 65nm/14nm CMOS while achieving comparable inference accuracy in example ML applications. Circuit and micro-architectural techniques and training methods will be described.
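
As a minimal illustration of the representation (not taken from the talk itself): a number p in [0, 1] can be encoded as a Bernoulli bitstream whose fraction of 1s equals p, and multiplying two such numbers then reduces to a bitwise AND of independent streams, since the probability that both bits are 1 is the product of the individual probabilities. The Python sketch below, using assumed stream lengths and example values, shows this standard unipolar encoding; it is a conceptual sketch, not the hardware design discussed in the talk.

    import numpy as np

    rng = np.random.default_rng(0)

    def encode(p, n):
        """Encode a value p in [0, 1] as a Bernoulli bitstream of length n."""
        return (rng.random(n) < p).astype(np.uint8)

    def decode(stream):
        """Estimate the encoded value as the fraction of 1s in the stream."""
        return stream.mean()

    # Stochastic multiplication: bitwise AND of two independent streams.
    # E[a & b] = p_a * p_b, so the AND gate acts as a multiplier.
    n = 10_000
    a = encode(0.6, n)
    b = encode(0.3, n)
    product = a & b
    print(decode(product))  # ~0.18, approximating 0.6 * 0.3

In hardware, each multiplier is thus a single AND gate operating on serial bitstreams, which is what makes the massive parallelism described above feasible; the cost is that precision grows only with stream length.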