The CFU Playground consists of an opinionated configuration of hardware, gateware and software. This page gives a short overview of each. If this is your first time working with FPGAs or machine learning, we advise skimming this page before moving on to Crash Course on Everything.

Overview showing hardware, gateware and software layers

Hardware

We originally supported only the Arty A7 35T board, and the docs below have not yet been updated. We now support many other boards – check the Wiki!

The hardware is based around the Xilinx Artix 7 35T FPGA. The Artix 7 has:

  • 33,000 logic cells, which is enough for the soft CPU with plenty of room to spare

  • 90 DSP ‘slices’, which can be used as multiply-accumulate units

  • 50x 36Kbit block RAMs
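As a quick sanity check on the block RAM figure above, the total on-chip capacity can be worked out directly. This is only a back-of-the-envelope sketch: it assumes "36Kbit" means 36 × 1024 bits, and the names `BLOCK_RAMS`, `BITS_PER_BLOCK_RAM` and `block_ram_bytes` are illustrative, not part of the CFU Playground.

```python
# Figures quoted in the list above for the Artix 7 35T.
BLOCK_RAMS = 50
BITS_PER_BLOCK_RAM = 36 * 1024  # "36Kbit", assuming K = 1024


def block_ram_bytes(count=BLOCK_RAMS, bits_each=BITS_PER_BLOCK_RAM):
    """Return the total block RAM capacity in bytes."""
    return count * bits_each // 8


total = block_ram_bytes()
print(total, "bytes =", total // 1024, "KiB")  # 230400 bytes = 225 KiB
```

So the block RAMs add up to roughly 225 KiB in total.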

We use the Arty A7 development board. As well as the Artix 7, it has:

  • 256MB external DDR DRAM

  • a convenient set of switches, buttons and LEDs

  • a USB serial connection for a host computer


Gateware and CFU

We use LiteX to build a standard SoC (System-on-Chip) that runs on the FPGA.

The SoC has:
  • a VexRiscV soft CPU with a Custom Function Unit (CFU) extension

  • access to the 256MB of DDR DRAM

  • access to the LEDs

  • a USB-UART to provide a serial terminal to the host

The CPU has an opcode allocated to the CFU. When the CPU executes an instruction with that opcode, it passes the contents of two registers to the CFU, waits for a response and writes the result to a third register. A notable feature of this architecture is that the CFU does not have direct access to memory. It relies on the CPU to move data back and forth.
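The register-in, register-out flow described above can be sketched as a small software model. This is a hedged illustration, not the CFU Playground's implementation: it assumes the allocated opcode is the RISC-V custom-0 opcode with a standard R-type field layout, and `encode_r_type`, `example_cfu` and `execute` are hypothetical names invented for this sketch. The made-up CFU adds or multiplies its two inputs depending on funct3.

```python
CUSTOM0 = 0b0001011  # RISC-V custom-0 major opcode (assumed to be the CFU's opcode)
MASK32 = 0xFFFFFFFF


def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode=CUSTOM0):
    """Pack the fields of a RISC-V R-type instruction into a 32-bit word."""
    return (((funct7 & 0x7F) << 25) | ((rs2 & 0x1F) << 20) | ((rs1 & 0x1F) << 15)
            | ((funct3 & 0x7) << 12) | ((rd & 0x1F) << 7) | (opcode & 0x7F))


def example_cfu(funct3, funct7, in0, in1):
    """Hypothetical CFU: it sees two operand values and nothing else (no memory)."""
    if funct3 == 0:
        return (in0 + in1) & MASK32
    if funct3 == 1:
        return (in0 * in1) & MASK32
    return 0


def execute(word, regs, cfu=example_cfu):
    """Model the CPU side: read two registers, call the CFU, write a third."""
    if (word & 0x7F) != CUSTOM0:
        raise ValueError("not a CFU instruction")
    rd = (word >> 7) & 0x1F
    funct3 = (word >> 12) & 0x7
    rs1 = (word >> 15) & 0x1F
    rs2 = (word >> 20) & 0x1F
    funct7 = (word >> 25) & 0x7F
    result = cfu(funct3, funct7, regs[rs1], regs[rs2])
    if rd != 0:  # x0 is hard-wired to zero in RISC-V
        regs[rd] = result
    return regs


regs = [0] * 32
regs[10], regs[11] = 6, 7
word = encode_r_type(funct7=0, rs2=11, rs1=10, funct3=1, rd=12)
execute(word, regs)
print(hex(word), regs[12])  # 0xb5160b 42
```

Because the model CFU only ever receives `in0` and `in1`, any data it works on must first be loaded into registers by the CPU, which mirrors the no-memory-access constraint described above.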

CFUs may be written in Verilog or with any tool that outputs Verilog. We prefer using Amaranth, because it has good support for composition, reuse and unit testing. The CFU Playground includes Amaranth library components to support CFU development.

We currently use Vivado to synthesize the FPGA bitstream, and intend to move to SymbiFlow in the near future.

Software

Programming the LiteX SoC is much the same as programming a traditional microcontroller. The VexRiscV CPU is a 32-bit RISC-V CPU and we use the GCC C/C++ toolchain. The supplied software includes:

  • TensorFlow Lite for Microcontrollers to do ML model inferencing

  • Models to accelerate, and test data for those models

  • Library functions to profile and benchmark

  • Hooks to allow customisation of TensorFlow and any other part of the system

Testing and simulation

We use Renode, an open source simulation framework by Antmicro, which lets you run the software without having the hardware available.

Renode simulates the whole SoC and can co-simulate with Verilated CFU models (CFU designs compiled with Verilator) to test the end-to-end flow of your application.

Consult the Developing CFU-Playground with Renode documentation for additional details.