P42: TRIP: An Ultra-Low Latency, TeraOps/s Reconfigurable
Inference Processor for Multi-Layer Perceptrons
Session: Poster Reception
Event Type: ACM Student Research Competition Poster, Reception
Time: Tuesday, November 14th, 5:15pm - 7pm
Location: Four Seasons Ballroom
Description: The Multi-Layer Perceptron (MLP) is one of the most
commonly deployed Deep Neural Networks, representing 61% of the
workload in Google data centers. MLP inference, a memory-bound
problem, typically has hard response-time deadlines and favors
latency over throughput. In our work, we designed a TeraOps/s
Reconfigurable Inference Processor for MLPs (TRIP) on FPGAs that
alleviates the memory bottleneck by storing all application-specific
weights on-chip. It can be deployed in multiple configurations,
including host-independent operation. We have shown that TRIP
achieves 60x better performance than the current state-of-the-art
Google Tensor Processing Unit (TPU) for MLP inference. TRIP was
demonstrated on the cancer patient datasets used in the CANDLE
project of the Exascale Computing Project (ECP).
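
For context, a plain MLP inference pass is a chain of matrix-vector products with an activation between layers. The sketch below is illustrative only (the layer sizes and ReLU activation are assumptions, not TRIP's design); it shows why the workload is memory bound: each weight is read from memory but used only once per inference, which is the off-chip traffic TRIP avoids by holding all weights on-chip.

```python
# Minimal MLP inference sketch (illustrative only, not the TRIP implementation).
# Layer sizes and the ReLU activation are assumed for demonstration.
import numpy as np

layer_sizes = [4096, 2048, 2048, 1024]  # hypothetical MLP shape
rng = np.random.default_rng(0)
weights = [rng.standard_normal((m, n)).astype(np.float32)
           for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(m, dtype=np.float32) for m in layer_sizes[1:]]

def mlp_inference(x):
    # Each layer is a matrix-vector product: every weight is touched exactly
    # once per inference, so fetching weights from off-chip memory dominates
    # latency unless they are kept on-chip.
    for W, b in zip(weights, biases):
        x = np.maximum(W @ x + b, 0.0)  # ReLU activation (assumed)
    return x

y = mlp_inference(rng.standard_normal(layer_sizes[0]).astype(np.float32))
print(y.shape)  # (1024,)
```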




