Guide – How to: design an accelerator in RTL

Latest update: 2021-07-14

This guide illustrates how to create and integrate an accelerator designed in Verilog, SystemVerilog or VHDL with ESP.

Note: This RTL accelerator design flow is a preliminary version. Like the other accelerator design flows in ESP, it automates the integration of the accelerator and generates the Linux device driver and skeletons of the bare-metal and Linux test applications. However, it generates an empty top module for the accelerator, and implementing the accelerator is left to the designer.

Note: Make sure to complete the prerequisite tutorials before getting started with this one. This tutorial assumes that accelerator designers are familiar with the ESP infrastructure and know how to run basic Make targets to create a simple instance of ESP, integrating just a single core.

1. Accelerator design
2. Accelerator integration

1. Accelerator design

Accelerator skeleton

ESP provides an interactive script that generates a skeleton of the accelerator and its software test applications. It also generates the accelerator device driver. In the preliminary version of this flow, the accelerator skeleton consists simply of an empty Verilog top level of the accelerator, which has the correct interface to allow for automated integration in ESP. You can change the RTL language used for the skeleton as long as the interface and functionality remain the same. At the moment ESP supports Verilog, SystemVerilog and VHDL.

Even if the accelerator skeleton is empty, this flow leverages the same interactive script used by other accelerator design flows in ESP.

# Move to the ESP root folder
cd <esp>
# Run the accelerator initialization script and respond as follows
./tools/accgen/accgen.sh
=== Initializing ESP accelerator template ===

  * Enter accelerator name [dummy]: example
  * Select design flow (Stratus HLS, Vivado HLS, hls4ml, RTL) [S]: R
  * Enter ESP path [/home/davide/Repos/esp/esp-rtlflow]:
  * Enter unique accelerator id as three hex digits [04A]: 075
  * Enter accelerator registers
    - register 0 name [size]: reg1
    - register 0 default value [1]: 8
    - register 0 max value [8]: 8
    - register 1 name []: reg2
    - register 1 default value [1]: 8
    - register 1 max value [8]: 8
    - register 2 name []: reg3
    - register 2 default value [1]: 8
    - register 2 max value [8]: 8
    - register 3 name []:
  * Configure PLM size and create skeleton for load and store:
    - Enter data bit-width (8, 16, 32, 64) [32]:
    - Enter input data size in terms of configuration registers (e.g. 2 * reg2) [reg2]:
      data_in_size_max = 8
    - Enter output data size in terms of configuration registers (e.g. 2 * reg2) [reg2]:
      data_out_size_max = 8
    - Enter an integer chunking factor (use 1 if you want PLM size equal to data size) [1]:
      Input PLM has 8 32-bits words
      Output PLM has 8 32-bits words
    - Enter number of input data to be processed in batch (can be function of configuration registers) [1]:
      batching_factor_max = 1
    - Is output stored in place? [N]:

=== Generated accelerator skeleton for example ===

The detailed description of the entries of this configuration script is in the guide for the SystemC accelerator flow with Stratus HLS. In this case, however, the generated accelerator is empty. Thus, the default and max values of the configuration registers, as well as the questions that follow, are only used for creating the skeleton of the test applications. The names of the registers, instead, are used in various places, including the interface of the empty accelerator and the accelerator XML file used by ESP to generate the accelerator tile socket.
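
The sizes echoed by the script in the transcript above (data_in_size_max, the number of PLM words) follow from simple arithmetic on the answers. The sketch below reproduces that arithmetic in C. The helper names plm_words and plm_bytes are hypothetical, and the relation "PLM words = maximum data size divided by the chunking factor" is an assumption inferred from the prompt text, not taken from the script source.

```c
#include <assert.h>

/* Hypothetical helper (not part of ESP): number of words held by the PLM.
 * Assumption: the PLM stores one chunk, i.e. data_size_max / chunking_factor
 * words, so a chunking factor of 1 makes the PLM as large as the data. */
unsigned plm_words(unsigned data_size_max, unsigned chunking_factor)
{
    assert(chunking_factor > 0);
    return data_size_max / chunking_factor;
}

/* Hypothetical helper (not part of ESP): PLM footprint in bytes for one of
 * the data bit-widths offered by the script (8, 16, 32, 64). */
unsigned plm_bytes(unsigned words, unsigned data_bits)
{
    assert(data_bits == 8 || data_bits == 16 ||
           data_bits == 32 || data_bits == 64);
    return words * (data_bits / 8);
}
```

With the answers given above (reg2 max value 8, chunking factor 1, 32-bit data), plm_words(8, 1) evaluates to 8, which matches the "Input PLM has 8 32-bits words" line, and plm_bytes(8, 32) evaluates to 32.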

Executing the initialization script with the above parameters generates the empty accelerator skeleton, located at the path accelerators/rtl/example_rtl/hw.

In addition, the accelerator’s device driver, bare-metal application and user-space Linux application are generated at the path accelerators/rtl/example_rtl/sw.

# Complete list of generated files
<esp>/accelerators/rtl/example_rtl/
├── hw
│   ├── example.xml              # Accelerator description and register list
│   ├── hls
│   │   └── Makefile -> ../../../common/hls/Makefile
│   └── src
│       ├── example_rtl_basic_dma32
│       │   └── example_rtl_basic_dma32.v  # Empty top level of the accelerator (32bit SoC)
│       └── example_rtl_basic_dma64
│           └── example_rtl_basic_dma64.v  # Empty top level of the accelerator (64bit SoC)
└── sw
    ├── baremetal                # Bare metal test application
    │   ├── example.c
    │   └── Makefile
    └── linux
        ├── app                  # Linux test application
        │   ├── cfg.h
        │   ├── example.c
        │   └── Makefile
        ├── driver               # Linux device driver
        │   ├── example_rtl.c
        │   ├── Kbuild
        │   └── Makefile
        └── include
            └── example_rtl.h

Accelerator behavior implementation

For this preliminary RTL accelerator flow, this step consists of fully implementing the accelerator, starting from the empty skeleton generated at the path accelerators/rtl/example_rtl/hw/src/.

The interface of the empty accelerator must not be modified, because it is the interface that ESP expects and it allows for automated integration. In addition to the interface definition, the empty accelerator has a few assign statements. These can be removed, as they are only there to assign a constant value to some of the outputs and to raise the acc_done signal as soon as the accelerator receives the configuration (conf_done). These assign statements allow you to run the bare-metal and Linux test applications to completion even with the empty skeleton. Without them, the applications would get stuck because of undefined outputs and because the acc_done signal would never be raised.

To design the accelerator body, you should comply with the ESP accelerator interface specifications. The following guide describes them in detail:

ESP Accelerator Specifications

Testbench implementation

The current implementation of the flow does not generate a testbench to test the accelerator in isolation. The remainder of this guide shows how to test the accelerator as part of a full SoC.

Installation

Choose the FPGA board or ASIC technology that you want to target for your SoC. The design paths in this tutorial refer to the Xilinx VC707 evaluation FPGA board, but all instructions are valid for any of the supported boards or ASIC technologies.

After creating the example_rtl accelerator, ESP automatically discovers it in the library of components and generates a set of make targets for it. Here are the instructions to install the accelerator.

# Move to the Xilinx VC707 working folder
cd <esp>/socs/xilinx-vc707-xc7vx485t

# Install (i.e. copy) the accelerator RTL to the `tech/virtex7/acc` folder
make example_rtl-hls

Only after this step will you be able to instantiate the accelerator in an SoC with the ESP SoC configuration GUI.

2. Accelerator integration

We recommend trying the following steps before editing the accelerator and the software generated automatically in the previous steps. Since this flow generates an empty accelerator, the full-system RTL simulation and the FPGA prototyping tests will not perform meaningful computation. However, the accelerator will still receive the configuration and raise the acc_done signal to notify the CPU. Testing this confirms the correct integration of the accelerator and provides a good baseline from which to start the accelerator implementation.

Bare-metal and user applications implementation

In this tutorial we select the RISC-V Ariane core and use the corresponding paths to the software source code. Please note, however, that all instructions are valid for the other CPUs available in ESP (e.g. Leon3, Ibex).

Both the bare-metal and Linux test applications for the EXAMPLE_RTL accelerator are generated at the path <esp>/accelerators/rtl/example_rtl/sw. To complete them, you need to apply the same edits to both applications. The changes consist of initializing the inputs and the golden outputs. More details on this step are described in the guide for the SystemC accelerator flow with Stratus HLS.
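
As an illustration of what "initializing inputs and golden outputs" can look like, here is a minimal C sketch. It is not the generated example.c: the names token_t, golden_fn, init_buf and validate_buf, the explicit length parameter, and the placeholder computation (input plus one) are all assumptions made for this example. Replace golden_fn with the function your accelerator actually implements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-bit data words, as selected in the transcript above. */
typedef int32_t token_t;

/* Hypothetical reference model of the accelerator (placeholder: x + 1). */
token_t golden_fn(token_t x)
{
    return x + 1;
}

/* Fill the input buffer and compute the matching golden outputs in software. */
void init_buf(token_t *in, token_t *gold, size_t n)
{
    assert(in != NULL && gold != NULL);
    for (size_t i = 0; i < n; i++) {
        in[i] = (token_t)i;
        gold[i] = golden_fn(in[i]);
    }
}

/* Compare the accelerator output against the golden output.
 * Returns the number of mismatching words (0 means the test passed). */
unsigned validate_buf(const token_t *out, const token_t *gold, size_t n)
{
    assert(out != NULL && gold != NULL);
    unsigned errors = 0;
    for (size_t i = 0; i < n; i++) {
        if (out[i] != gold[i])
            errors++;
    }
    return errors;
}
```

Until the accelerator body is implemented, a comparison of this kind is not meaningful, because the empty skeleton only drives constant outputs.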

SoC configuration

The final steps of the tutorial coincide with those presented in the tutorial about the SystemC accelerator flow with Stratus HLS. We recommend you review those steps if you are not familiar with ESP. More generally, the SoC design flow is the same regardless of which design flow was used for generating or integrating an accelerator.

# Move to the Xilinx VC707 working folder
cd <esp>/socs/xilinx-vc707-xc7vx485t

Follow the “Debug link configuration” instructions from the “How to: design a single-core SoC” guide. Then configure the SoC using the ESP configuration GUI.

# Run the ESP configuration GUI
make esp-xconfig

Select the processor that you prefer in the “CPU Architecture” frame and enable/disable the caches from the “Cache configuration” frame as you please. In the case of this guide we use Ariane and no caches. Select a 2x2 layout and set 1 memory tile, 1 processor tile, 1 I/O tile and 1 EXAMPLE_RTL tile. The implementation for EXAMPLE_RTL will default to basic_dma64, because Ariane-based ESP SoCs are 64-bit systems.
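
The choice between basic_dma32 and basic_dma64 matters when the data bit-width differs from the DMA width. The C sketch below shows the arithmetic under one assumption that you should check against the ESP accelerator specification: data words are packed contiguously into DMA words, so a 64-bit DMA beat carries two 32-bit data words. The helper names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: how many data words fit in one DMA beat.
 * Assumption: words are packed contiguously and the DMA width is a
 * multiple of the data width. */
unsigned words_per_beat(unsigned dma_bits, unsigned data_bits)
{
    assert(data_bits > 0 && dma_bits >= data_bits);
    assert(dma_bits % data_bits == 0);
    return dma_bits / data_bits;
}

/* Round x up to the next multiple of y. */
unsigned round_up(unsigned x, unsigned y)
{
    assert(y > 0);
    return ((x + y - 1) / y) * y;
}

/* Hypothetical helper: DMA beats needed to move `words` data words. */
unsigned dma_beats(unsigned words, unsigned dma_bits, unsigned data_bits)
{
    unsigned wpb = words_per_beat(dma_bits, data_bits);
    return round_up(words, wpb) / wpb;
}
```

Under this assumption, the 8 words of 32 bits configured earlier take dma_beats(8, 64, 32) = 4 beats on an Ariane-based (64-bit) SoC and dma_beats(8, 32, 32) = 8 beats on a 32-bit SoC. A transfer of 7 words on the 64-bit SoC still takes 4 beats, because the size is rounded up to a whole number of beats.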

RTL simulation

You can run a full-system RTL simulation of the EXAMPLE_RTL accelerator, driven by the bare-metal application that runs on the processor tile and invokes the accelerator.

# Compile baremetal application
make example_rtl-baremetal

# Modelsim
TEST_PROGRAM=./soft-build/<cpu>/baremetal/example_rtl.exe make sim[-gui]

# Incisive
TEST_PROGRAM=./soft-build/<cpu>/baremetal/example_rtl.exe make ncsim[-gui]

<cpu> corresponds to ariane because we selected the Ariane core in the “SoC Configuration” step.

FPGA prototyping

Follow the “FPGA prototyping” instructions from the “How to: design a single-core SoC” guide.

The only difference is that, just like for the RTL simulation, you need to specify the TEST_PROGRAM variable when launching the bare-metal test on FPGA:

TEST_PROGRAM=./soft-build/<cpu>/baremetal/example_rtl.exe make fpga-run

To run the Linux application, log into Linux from the ESP Linux terminal and run the example_rtl test application:

cd /applications/test/
./example_rtl.exe