Fitting GPT into Edge Devices: Why and How

The NeuPro-M NPU IP is a family of AI processor engines designed for embedded applications. The pre-processing software that maps a model onto these engines can reduce effective model size by up to 20:1. Combined with Retro compression, this brings the total LLM size down to around a billion parameters, comfortably within the capacity of a modern edge AI engine such as the NeuPro-M IP.
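To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch of what a 20:1 reduction implies. The baseline parameter count is an illustrative assumption, not a figure from the source; the helper function is hypothetical.

```python
def compressed_params(base_params: float, ratio: float) -> float:
    """Effective parameter count after applying a compression ratio."""
    return base_params / ratio

# Assume (for illustration only) a ~20B-parameter GPT-class model
# and the up-to-20:1 reduction quoted for the mapping software.
base = 20e9
effective = compressed_params(base, 20.0)
print(f"{effective / 1e9:.1f}B effective parameters")
```

Under these assumed numbers, a 20B-parameter model lands at roughly one billion effective parameters, which matches the scale the text describes for a modern edge AI engine.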