Packing Neural Networks into End-User Client Devices: How Number Representation Shrinks the Footprint

Most of today’s neural networks can run only on high-performance servers. There is a strong push to change this by simplifying network processing until the algorithms can run on end-user client devices. One approach reduces complexity by replacing floating-point representation with fixed-point representation. We take a different approach and recommend a mix of the two, which reduces memory and power requirements while retaining accuracy.
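
To make the footprint argument concrete, here is a minimal sketch of the fixed-point side only. It assumes a signed 8-bit format with 7 fractional bits and a handful of made-up weight values; the function names `quantize_q7` and `dequantize_q7` are hypothetical, and this is not the mixed scheme the whitepaper recommends, which is not described on this page.

```python
from array import array


def quantize_q7(values, frac_bits=7):
    """Convert floats to signed 8-bit fixed-point (assumed format: 7 fractional bits)."""
    scale = 1 << frac_bits
    out = array("b")  # one signed byte per value
    for v in values:
        q = round(v * scale)
        q = max(-128, min(127, q))  # saturate to the 8-bit range
        out.append(q)
    return out


def dequantize_q7(q_values, frac_bits=7):
    """Map the stored integers back to floats for comparison."""
    scale = 1 << frac_bits
    return [q / scale for q in q_values]


# Illustrative weights only; not taken from any real network.
weights = [0.5, -0.25, 0.125, 0.9, -1.0]

float_store = array("f", weights)    # 32-bit floating-point storage
fixed_store = quantize_q7(weights)   # 8-bit fixed-point storage

float_bytes = len(float_store) * float_store.itemsize
fixed_bytes = len(fixed_store) * fixed_store.itemsize
max_error = max(abs(a - b) for a, b in zip(weights, dequantize_q7(fixed_store)))

print(f"float32: {float_bytes} bytes, fixed-point: {fixed_bytes} bytes")
print(f"largest rounding error: {max_error}")
```

Under these assumptions the fixed-point copy takes a quarter of the memory, at the cost of a small rounding error on values that are not exact multiples of 1/128. That loss of precision is the kind of trade-off a mixed representation is meant to manage.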


Arm Whitepaper

Download your copy of the report to learn more.