Guide – How to: design an accelerator in RTL
Latest update: 2021-07-14
This guide illustrates how to create an accelerator in Verilog, SystemVerilog, or VHDL and integrate it with ESP.
Note: This RTL accelerator design flow is a preliminary version. Like the other accelerator design flows in ESP, it automates the integration of the accelerator and generates the Linux device driver and skeletons of the bare-metal and Linux test applications. However, it generates only an empty top module for the accelerator; implementing the accelerator is left to the designer.
Note: Make sure to complete the prerequisite tutorials before getting started with this one. This tutorial assumes that accelerator designers are familiar with the ESP infrastructure and know how to run basic Make targets to create a simple instance of ESP that integrates a single core.
1. Accelerator design
Accelerator skeleton
ESP provides an interactive script that generates a skeleton of the accelerator, its software test applications, and its device driver. In the preliminary version of this flow, the accelerator skeleton simply consists of an empty Verilog top level with the correct interface for automated integration in ESP. You can change the RTL language of the skeleton as long as the interface and functionality remain the same; at the moment, ESP supports Verilog, SystemVerilog, and VHDL.
Even if the accelerator skeleton is empty, this flow leverages the same interactive script used by other accelerator design flows in ESP.
# Move to the ESP root folder
cd <esp>
# Run the accelerator initialization script and respond as follows
./tools/accgen/accgen.sh
=== Initializing ESP accelerator template ===
* Enter accelerator name [dummy]: example
* Select design flow (Stratus HLS, Vivado HLS, hls4ml, RTL) [S]: R
* Enter ESP path [/home/davide/Repos/esp/esp-rtlflow]:
* Enter unique accelerator id as three hex digits [04A]: 075
* Enter accelerator registers
- register 0 name [size]: reg1
- register 0 default value [1]: 8
- register 0 max value [8]: 8
- register 1 name []: reg2
- register 1 default value [1]: 8
- register 1 max value [8]: 8
- register 2 name []: reg3
- register 2 default value [1]: 8
- register 2 max value [8]: 8
- register 3 name []:
* Configure PLM size and create skeleton for load and store:
- Enter data bit-width (8, 16, 32, 64) [32]:
 - Enter input data size in terms of configuration registers (e.g. 2 * reg2) [reg2]:
data_in_size_max = 8
- Enter output data size in terms of configuration registers (e.g. 2 * reg2) [reg2]:
data_out_size_max = 8
- Enter an integer chunking factor (use 1 if you want PLM size equal to data size) [1]:
Input PLM has 8 32-bits words
Output PLM has 8 32-bits words
- Enter number of input data to be processed in batch (can be function of configuration registers) [1]:
batching_factor_max = 1
- Is output stored in place? [N]:
=== Generated accelerator skeleton for example ===
The detailed description of the entries of this configuration script is in the guide for the SystemC accelerator flow with Stratus HLS. In this case, however, the generated accelerator is empty. Thus, the default and max values of the configuration registers, as well as the answers to the questions that follow, are only used to create the skeleton of the test applications. The register names, instead, are used in various places, including the interface of the empty accelerator and the accelerator XML file used by ESP to generate the accelerator tile socket.
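The PLM sizes printed at the end of the transcript follow directly from the answers. The C sketch below recomputes them; it is illustrative only. The helper names are hypothetical and not part of ESP. It also assumes, consistent with the transcript, that size expressions are evaluated with the registers' max values and that each PLM holds ceil(data size / chunking factor) words.

```c
#include <assert.h>

/* Hypothetical helpers (not part of ESP) that recompute the sizes printed by
 * accgen.sh. Assumption: size expressions are evaluated with the registers'
 * max values, and PLM size = ceil(data size / chunking factor). */

/* Number of words in a PLM that holds one chunk of data_size words. */
static unsigned plm_words(unsigned data_size, unsigned chunk_factor)
{
    assert(chunk_factor > 0);
    /* Round up so that the last, possibly partial, chunk still fits. */
    return (data_size + chunk_factor - 1) / chunk_factor;
}

/* PLM capacity in bytes for words of data_bits bits (8, 16, 32 or 64). */
static unsigned plm_bytes(unsigned words, unsigned data_bits)
{
    assert(data_bits % 8 == 0);
    return words * (data_bits / 8);
}

/* Words read from memory in one invocation when batching several inputs. */
static unsigned total_in_words(unsigned data_in_size, unsigned batching_factor)
{
    return data_in_size * batching_factor;
}
```

With the answers above (reg2 max value 8, 32-bit data, chunking factor 1, batching factor 1), plm_words(8, 1) is 8 and plm_bytes(8, 32) is 32. This matches the two 8-word PLMs reported by the script.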
Executing the initialization script with the above parameters generates the empty accelerator skeleton at the path accelerators/rtl/example_rtl/hw.
In addition, the accelerator’s device driver, bare-metal application, and user-space Linux application are generated at the path accelerators/rtl/example_rtl/sw.
# Complete list of generated files
<esp>/accelerators/rtl/example_rtl/
├── hw
│ ├── example.xml # Accelerator description and register list
│ ├── hls
│ │ └── Makefile -> ../../../common/hls/Makefile
│ └── src
│ ├── example_rtl_basic_dma32
│ │ └── example_rtl_basic_dma32.v # Empty top level of the accelerator (32bit SoC)
│ └── example_rtl_basic_dma64
│ └── example_rtl_basic_dma64.v # Empty top level of the accelerator (64bit SoC)
└── sw
├── baremetal # Bare metal test application
│ ├── example.c
│ └── Makefile
└── linux
├── app # Linux test application
│ ├── cfg.h
│ ├── example.c
│ └── Makefile
├── driver # Linux device driver
│ ├── example_rtl.c
│ ├── Kbuild
│ └── Makefile
└── include
└── example_rtl.h
Accelerator behavior implementation
For this preliminary RTL accelerator flow, this step consists of fully implementing the accelerator, starting from the empty skeleton generated at the path accelerators/rtl/example_rtl/hw/src/.
The interface of the empty accelerator must not be modified, because it is the interface that ESP expects and it enables automated integration. In addition to the interface definition, the empty accelerator contains a few assign statements. These can be removed, as they only assign a constant value to some of the outputs and raise the acc_done signal as soon as the accelerator receives the configuration (conf_done). These assign statements allow you to run the bare-metal and Linux test applications to completion even with the empty skeleton; otherwise, the applications would get stuck because of undefined outputs and because acc_done would never be raised.
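To make the role of these assign statements concrete, here is a minimal cycle-level C model of the empty skeleton and of a test application polling for completion. It is a hypothetical sketch, not ESP code. The field dma_req_valid is an assumed stand-in for the constant-driven outputs. The model also assumes acc_done directly follows conf_done, as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cycle-level model (not ESP code) of the empty RTL skeleton. */

typedef struct {
    bool conf_done;     /* input: configuration has been written */
} skel_inputs;

typedef struct {
    bool dma_req_valid; /* hypothetical stand-in for the constant outputs */
    bool acc_done;      /* output: accelerator finished */
} skel_outputs;

/* Outputs of the skeleton. with_assigns = false models the skeleton after
 * deleting the assign statements without adding any logic: undriven outputs
 * are modeled as false here, while an RTL simulator would show X. */
static skel_outputs skel_eval(skel_inputs in, bool with_assigns)
{
    skel_outputs out = { false, false };
    if (with_assigns) {
        out.dma_req_valid = false;   /* constant: never requests a DMA */
        out.acc_done = in.conf_done; /* done as soon as it is configured */
    }
    return out;
}

/* Model of a test application that configures the accelerator and polls
 * acc_done. Returns the cycle at which acc_done is seen, or -1 on timeout. */
static int run_test(bool with_assigns, int max_cycles)
{
    skel_inputs in = { false };
    for (int cycle = 0; cycle < max_cycles; cycle++) {
        if (cycle == 2)
            in.conf_done = true;     /* CPU finishes writing the registers */
        if (skel_eval(in, with_assigns).acc_done)
            return cycle;
    }
    return -1;                       /* the real application would hang */
}
```

With the assign statements, run_test(true, 100) returns 2, the cycle at which the configuration completes. Without them, it returns -1. This corresponds to the application waiting forever for acc_done.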
To design the accelerator body, you should comply with the ESP accelerator interface specifications, which are described in detail in the following guide:
ESP Accelerator Specifications
Testbench implementation
The current implementation of the flow does not generate a testbench to test the accelerator in isolation. The remainder of this guide shows how to test the accelerator as part of a full SoC.
Installation
Choose the FPGA board or ASIC technology that you want to target for your SoC. The design paths in this tutorial refer to the Xilinx VC707 evaluation FPGA board, but all instructions are valid for any of the supported boards or ASIC technologies.
After you create the example_rtl accelerator, ESP automatically discovers it in the library of components and generates a set of make targets for it. Here are the instructions to install the accelerator.
# Move to the Xilinx VC707 working folder
cd <esp>/socs/xilinx-vc707-xc7vx485t
# Install (i.e. copy) the accelerator RTL to the `tech/virtex7/acc` folder
make example_rtl-hls
Only after this step will you be able to instantiate the accelerator in an SoC with the ESP SoC configuration GUI.
2. Accelerator integration
We recommend trying the following steps before editing the accelerator and the software generated automatically in the previous steps. Since this flow generates an empty accelerator, the full-system RTL simulation and the FPGA prototyping tests will not perform any meaningful computation. However, the accelerator will still receive the configuration and raise the acc_done signal to notify the CPU. Testing this confirms that the accelerator is integrated correctly, and it provides a good baseline on which to start the accelerator implementation.
Bare-metal and user applications implementation
In this tutorial we select the RISC-V Ariane core and use the corresponding paths to the software source code. Please note, however, that all instructions are valid for the other CPUs available in ESP (e.g. Leon3, Ibex).
Both the bare-metal and Linux test applications for the EXAMPLE_RTL accelerator are generated at the path <esp>/accelerators/rtl/example_rtl/sw. To complete them, apply the same edits to both applications: initialize the inputs and the golden outputs. More details on this step are in the guide for the SystemC accelerator flow with Stratus HLS.
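As a sketch of what these edits may look like, the C code below fills the input buffer, computes golden outputs with a software reference model, and counts mismatches. The functions are modeled on, not copied from, the generated applications. The names init_buf and validate_buf, the token_t typedef, and the "add 1 to each word" behavior are all assumptions for illustration. Replace reference() with the real computation of your RTL design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-bit data words, matching the data bit-width chosen above. */
typedef int32_t token_t;

/* Hypothetical reference model: suppose the accelerator adds 1 to each
 * input word. Replace it with the real computation of your RTL design. */
static token_t reference(token_t x) { return x + 1; }

/* Fill the input buffer and the golden (expected) output buffer. */
static void init_buf(token_t *in, token_t *gold, size_t len)
{
    assert(in != NULL && gold != NULL);
    for (size_t i = 0; i < len; i++) {
        in[i] = (token_t)i;          /* simple, easy-to-debug input pattern */
        gold[i] = reference(in[i]);  /* expected accelerator output */
    }
}

/* Compare the accelerator output with the golden output.
 * Returns the number of mismatching words (0 means the test passed). */
static unsigned validate_buf(const token_t *out, const token_t *gold, size_t len)
{
    unsigned errors = 0;
    for (size_t i = 0; i < len; i++)
        if (out[i] != gold[i])
            errors++;
    return errors;
}
```

Using the same init_buf and validate_buf logic in both the bare-metal and the Linux application ensures that both tests check identical results.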
SoC configuration
The final steps of the tutorial coincide with those presented in the tutorial about the SystemC accelerator flow with Stratus HLS. We recommend reviewing those steps if you are not familiar with ESP. More generally, the SoC design flow is the same regardless of the flow used to generate or integrate an accelerator.
# Move to the Xilinx VC707 working folder
cd <esp>/socs/xilinx-vc707-xc7vx485t
Follow the “Debug link configuration” instructions from the “How to: design a single-core SoC” guide.
Then configure the SoC using the ESP configuration GUI.
# Run the ESP configuration GUI
make esp-xconfig
Select the processor that you prefer in the “CPU Architecture” frame and enable or disable the caches in the “Cache configuration” frame as you please. In this guide we use Ariane with no caches. Select a 2x2 layout and set 1 memory tile, 1 processor tile, 1 I/O tile, and 1 EXAMPLE_RTL tile. The implementation of EXAMPLE_RTL defaults to basic_dma64, because Ariane-based ESP SoCs are 64-bit systems.
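One practical consequence of the 64-bit DMA interface is that each transfer beat can carry more than one 32-bit data word. The hypothetical helper below is not ESP code. It assumes words are packed contiguously into beats without padding and that a word is never wider than the DMA interface, and it counts the beats needed for a transfer.

```c
#include <assert.h>

/* Hypothetical helper (not ESP code): number of DMA beats needed to move
 * n_words words of word_bits bits over a dma_bits-wide interface.
 * Assumes word_bits <= dma_bits and contiguous packing without padding. */
static unsigned dma_beats(unsigned n_words, unsigned word_bits, unsigned dma_bits)
{
    assert(word_bits > 0 && word_bits <= dma_bits && dma_bits % word_bits == 0);
    unsigned words_per_beat = dma_bits / word_bits;
    return (n_words + words_per_beat - 1) / words_per_beat; /* round up */
}
```

Under these assumptions, loading the 8 input words of the example accelerator takes dma_beats(8, 32, 64) = 4 beats on the basic_dma64 implementation and dma_beats(8, 32, 32) = 8 beats on basic_dma32.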
RTL simulation
Users can run a full-system RTL simulation of the EXAMPLE_RTL accelerator, driven by the bare-metal application that runs on the processor tile and invokes the accelerator.
# Compile baremetal application
make example_rtl-baremetal
# Modelsim
TEST_PROGRAM=./soft-build/<cpu>/baremetal/example_rtl.exe make sim[-gui]
# Incisive
TEST_PROGRAM=./soft-build/<cpu>/baremetal/example_rtl.exe make ncsim[-gui]
<cpu> corresponds to ariane because we selected the Ariane core in the “SoC configuration” step.
FPGA prototyping
Follow the “FPGA prototyping” instructions from the “How to: design a single-core SoC” guide.
The only difference is that, just as for the RTL simulation, you need to specify the TEST_PROGRAM variable when launching the bare-metal test on FPGA:
TEST_PROGRAM=./soft-build/<cpu>/baremetal/example_rtl.exe make fpga-run
To execute the Linux application, log into Linux from the ESP Linux terminal and run the example_rtl test application:
cd /applications/test/
./example_rtl.exe