The CFU Playground: Accelerate ML models on FPGAs
“CFU” stands for Custom Function Unit: accelerator hardware that is tightly coupled into the pipeline of a CPU core, to add new custom function instructions that complement the CPU’s standard functions (such as arithmetic/logic operations).
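To make the idea of a custom function instruction concrete, here is a minimal software model, written in Python purely for illustration. It assumes a hypothetical instruction that takes two 32-bit register values, treats each as four signed 8-bit lanes, and returns their dot product as one 32-bit result. The names `cfu_simd_mac`, `to_int8_lanes` and `pack_int8` are invented for this sketch and are not part of the CFU Playground; a real CFU is implemented in gateware, not Python.

```python
# Illustrative software model of a custom function instruction.
# All names here are hypothetical; this is not the CFU Playground API.

MASK32 = 0xFFFFFFFF


def to_int8_lanes(word):
    """Split a 32-bit word into four signed 8-bit lanes, least significant first."""
    lanes = []
    for shift in (0, 8, 16, 24):
        byte = (word >> shift) & 0xFF
        lanes.append(byte - 256 if byte >= 128 else byte)
    return lanes


def pack_int8(values):
    """Pack up to four signed 8-bit values into one 32-bit word."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def cfu_simd_mac(rs1, rs2):
    """Hypothetical custom instruction: dot product of four int8 lanes.

    Two 32-bit register inputs, one 32-bit register output.
    """
    total = sum(a * b for a, b in zip(to_int8_lanes(rs1), to_int8_lanes(rs2)))
    return total & MASK32  # wrap to 32 bits, as a hardware register would


if __name__ == "__main__":
    a = pack_int8([1, -2, 3, 4])
    b = pack_int8([5, 6, -7, 8])
    print(cfu_simd_mac(a, b))
```

The point of the sketch is the shape of the operation: work that would take several standard multiply and add instructions is folded into a single instruction with two inputs and one output.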
The CFU Playground is a collection of software, gateware and hardware configured to make it easy to:
- Run ML models
- Benchmark and profile performance
- Make incremental improvements
  - In software by modifying source code
  - In gateware with a CFU
- Measure the results of changes
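The measure, change, measure loop described by this list can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CFU Playground's tooling: the functions `dot_baseline`, `dot_four_wide` and `compare` are hypothetical, and a simple step count stands in for whatever cost metric (such as cycles) a real benchmark would report.

```python
# Illustrative measure-change-measure harness; all names are hypothetical.


def dot_baseline(xs, ws):
    """Baseline: one multiply-accumulate step per element."""
    acc, steps = 0, 0
    for x, w in zip(xs, ws):
        acc += x * w
        steps += 1
    return acc, steps


def dot_four_wide(xs, ws):
    """Candidate: models a 4-lane multiply-accumulate, one step per 4 elements."""
    acc, steps = 0, 0
    for i in range(0, len(xs), 4):
        acc += sum(x * w for x, w in zip(xs[i:i + 4], ws[i:i + 4]))
        steps += 1
    return acc, steps


def compare(xs, ws):
    """Check the candidate matches the baseline, then return both step counts."""
    expected, base_steps = dot_baseline(xs, ws)
    actual, new_steps = dot_four_wide(xs, ws)
    if actual != expected:
        raise ValueError("candidate changed the result")
    return base_steps, new_steps


if __name__ == "__main__":
    xs = list(range(-8, 8))   # 16 sample input values
    ws = [3, -1, 2, 5] * 4    # 16 sample filter values
    print(compare(xs, ws))
```

Checking that the candidate still produces the baseline result before comparing costs keeps each incremental change honest: a faster version that changes the output is rejected rather than measured.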
ML acceleration on microcontroller-class hardware is a new area, and one that, due to the expense of building ASICs, is currently dominated by hardware engineers. To encourage software engineers to join in the innovation, the CFU Playground aims to make experimentation as simple, fast and fun as possible.
As well as being a useful tool for accelerating ML inference, the CFU Playground is a relatively gentle introduction to using FPGAs for computation.
If you find that you need help or that anything is not working as you expect, please raise an issue and we’ll do our best to point you in the right direction.
Disclaimer: This is not an officially supported Google project. Support and/or new releases may be limited.
Learning and Using the CFU Playground
Begin with the Overview, which explains the various hardware, software and gateware components that make up the CFU Playground.
The Setup Guide gives detailed instructions for setting up an environment.
The Crash Course on Everything explains the basics of FPGAs, Verilog, Amaranth, RISC-V, Custom Function Units and TensorFlow Lite for Microcontrollers.
The Step-by-Step Guide to Building an ML Accelerator will guide you through creating your first accelerator.
Developing CFU-Playground with Renode can tell you more about simulating your project in Renode.
Site Index
- Overview
- Setup Guide
  - Step 1: Acquire an Arty A7-35T or other supported board
  - Step 2: Clone the CFU-Playground Repository
  - Step 3: Run the setup script
  - Step 4: Install Toolchain
    - Option 4a: Install Conda package for SymbiFlow (for Xilinx)
    - Option 4b: Install Conda packages for Lattice FPGAs
    - Option 4c: Use already-installed Yosys, Nextpnr, and other required tools
    - Option 4d: Install/Use Vivado
  - Step 5: Install RISC-V toolchain
  - Step 6: Test Run
- Crash Course on Everything
- The Step-by-Step Guide to Building an ML Accelerator
- Details and Use Cases of the CPU <-> CFU interface
- Developing CFU-Playground with Renode
- Vivado HLS WebPack Installation
- Writing Documentation
- MobileNetV2 “First” Accelerator
  - Objective
  - Background
  - Overview
  - A Staged Approach
  - Investigate
  - C-only Optimisations
  - Replacing Post Processing
  - Moving Filter Values and Input Values to the CFU
  - Use buffers with MACC instruction
  - Examining MACC efficiency and overhead
  - Connect Post Processing to Accumulator
  - Calculate Output Channel by Word
  - Stream Outputs
- Customizing the CPU itself
- FCCM 2021 Demo Night Material