Vitis AI Integration

Vitis AI is Xilinx’s development stack for hardware-accelerated AI inference on Xilinx platforms, including both edge devices and Alveo cards. It consists of optimized IP, tools, libraries, models, and example designs. It is designed with high efficiency and ease of use in mind, unleashing the full potential of AI acceleration on Xilinx FPGA and ACAP.

The current Vitis AI flow inside TVM enables acceleration of neural network model inference on edge and cloud with the Zynq UltraScale+ MPSoC, Alveo and Versal platforms. The identifiers for the supported edge and cloud Deep Learning Processor Units (DPUs) are:

Target Board	DPU ID	TVM Target ID
ZCU104	DPUCZDX8G	DPUCZDX8G-zcu104
ZCU102	DPUCZDX8G	DPUCZDX8G-zcu102
Kria KV260	DPUCZDX8G	DPUCZDX8G-kv260
VCK190	DPUCVDX8G	DPUCVDX8G
VCK5000	DPUCVDX8H	DPUCVDX8H
U200	DPUCADF8H	DPUCADF8H
U250	DPUCADF8H	DPUCADF8H
U50	DPUCAHX8H / DPUCAHX8L	DPUCAHX8H-u50 / DPUCAHX8L
U280	DPUCAHX8H / DPUCAHX8L	DPUCAHX8H-u280 / DPUCAHX8L
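
As a convenience, the table above can be captured in a small lookup. This is a minimal sketch: the dictionary and the helper name `tvm_targets_for_board` are illustrative assumptions and are not part of TVM or PyXIR.

```python
# Board name -> TVM target IDs, copied from the table above.
# Names in this snippet are hypothetical helpers, not a TVM or PyXIR API.
BOARD_TO_TVM_TARGETS = {
    "ZCU104": ["DPUCZDX8G-zcu104"],
    "ZCU102": ["DPUCZDX8G-zcu102"],
    "Kria KV260": ["DPUCZDX8G-kv260"],
    "VCK190": ["DPUCVDX8G"],
    "VCK5000": ["DPUCVDX8H"],
    "U200": ["DPUCADF8H"],
    "U250": ["DPUCADF8H"],
    "U50": ["DPUCAHX8H-u50", "DPUCAHX8L"],
    "U280": ["DPUCAHX8H-u280", "DPUCAHX8L"],
}


def tvm_targets_for_board(board):
    """Return the TVM target IDs listed for a board (case-insensitive match)."""
    wanted = board.strip().lower()
    for name, targets in BOARD_TO_TVM_TARGETS.items():
        if name.lower() == wanted:
            return list(targets)
    raise KeyError(f"No DPU target listed for board {board!r}")
```

Boards with two DPU options (U50, U280) return both target IDs, so the caller still has to pick one.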

For more information about the DPU identifiers, see the following table:

DPU	Application	HW Platform	Quantization Method	Quantization Bitwidth	Design Target
Deep Learning Processing Unit	C: CNN; R: RNN	AD: Alveo DDR; AH: Alveo HBM; VD: Versal DDR with AIE & PL; ZD: Zynq DDR	X: DECENT; I: Integer threshold; F: Float threshold; R: RNN	4: 4-bit; 8: 8-bit; 16: 16-bit; M: Mixed Precision	G: General purpose; H: High throughput; L: Low latency; C: Cost optimized
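
To make the naming scheme concrete, the sketch below decodes a DPU identifier into the fields of the table above. It assumes the fields always appear in the table's column order and that a TVM target ID may carry a board suffix after a hyphen (as in `DPUCZDX8G-zcu104`); the function name `decode_dpu_id` is hypothetical and not part of any Xilinx or TVM API.

```python
import re

# Field codes copied from the naming table above.
APPLICATION = {"C": "CNN", "R": "RNN"}
HW_PLATFORM = {
    "AD": "Alveo DDR",
    "AH": "Alveo HBM",
    "VD": "Versal DDR with AIE & PL",
    "ZD": "Zynq DDR",
}
QUANT_METHOD = {
    "X": "DECENT",
    "I": "Integer threshold",
    "F": "Float threshold",
    "R": "RNN",
}
QUANT_BITWIDTH = {"4": "4-bit", "8": "8-bit", "16": "16-bit", "M": "Mixed Precision"}
DESIGN_TARGET = {
    "G": "General purpose",
    "H": "High throughput",
    "L": "Low latency",
    "C": "Cost optimized",
}

# Assumed layout: DPU + application + platform + method + bitwidth + design target,
# optionally followed by "-<board>" as used in the TVM target IDs.
_DPU_ID = re.compile(r"^DPU([CR])(AD|AH|VD|ZD)([XIFR])(16|4|8|M)([GHLC])(?:-(.+))?$")


def decode_dpu_id(identifier):
    """Split a DPU identifier into human-readable fields."""
    match = _DPU_ID.match(identifier)
    if match is None:
        raise ValueError(f"Unrecognized DPU identifier: {identifier!r}")
    app, platform, method, bits, design, board = match.groups()
    return {
        "application": APPLICATION[app],
        "hw_platform": HW_PLATFORM[platform],
        "quantization_method": QUANT_METHOD[method],
        "quantization_bitwidth": QUANT_BITWIDTH[bits],
        "design_target": DESIGN_TARGET[design],
        "board_suffix": board,
    }
```

For example, `decode_dpu_id("DPUCADF8H")` reports a CNN DPU on Alveo DDR with float-threshold quantization, 8-bit width and a high-throughput design target.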

On this page you will find information on how to set up TVM with Vitis AI on different platforms (Zynq, Alveo, Versal) and on how to get started with Compiling a Model and executing it on different platforms: Inference.

System Requirements

The Vitis AI System Requirements page lists the system requirements for running docker containers as well as for executing on Alveo cards. For edge devices (e.g. Zynq), deploying models requires a host machine for compiling models using the TVM with Vitis AI flow, and an edge device for running the compiled models. The host system requirements are the same as specified in the link above.

Setup instructions

This section provides the instructions for setting up the TVM with Vitis AI flow for both cloud and edge. TVM with Vitis AI support is provided through a docker container. The provided scripts and Dockerfile compile TVM and Vitis AI into a single image.

Clone TVM repo

git clone --recursive https://github.com/apache/tvm.git
cd tvm

Build and start the TVM - Vitis AI docker container.

./docker/build.sh demo_vitis_ai bash
./docker/bash.sh tvm.demo_vitis_ai

# Setup inside container
conda activate vitis-ai-tensorflow

Build TVM inside the container with Vitis AI (inside tvm directory)

mkdir build
cp cmake/config.cmake build
cd build
echo set\(USE_LLVM ON\) >> config.cmake
echo set\(USE_VITIS_AI ON\) >> config.cmake
cmake ..
make -j$(nproc)

Install TVM
```
cd ../python
pip3 install -e . --user
```

Inside this docker container you can now compile models for both cloud and edge targets. To run on cloud Alveo or Versal VCK5000 cards inside the docker container, follow the Alveo Setup or Versal VCK5000 Setup instructions, respectively. To set up your Zynq or Versal VCK190 evaluation board for inference, follow the Zynq Setup or Versal VCK190 Setup instructions, respectively.

Alveo Setup

See the following page for setup information: Alveo Setup.

After setup, you can select the right DPU inside the docker container in the following way:

cd /workspace
git clone --branch v1.4 --single-branch --recursive https://github.com/Xilinx/Vitis-AI.git
cd Vitis-AI/setup/alveo
source setup.sh [DPU-IDENTIFIER]

The DPU identifier for this can be found in the second column of the DPU Targets table at the top of this page.

Versal VCK5000 Setup

See the following page for setup information: VCK5000 Setup.

After setup, you can select the right DPU inside the docker container in the following way:

cd /workspace
git clone --branch v1.4 --single-branch --recursive https://github.com/Xilinx/Vitis-AI.git
cd Vitis-AI/setup/vck5000
source setup.sh

Zynq Setup

For the Zynq target (DPUCZDX8G) the compilation stage will run inside the docker on a host machine. This doesn’t require any specific setup except for building the TVM - Vitis AI docker. For executing the model, the Zynq board will first have to be set up and more information on that can be found here.

1. Download the Petalinux image for your target:
   - ZCU104
   - ZCU102
   - Kria KV260
2. Use Etcher software to burn the image file onto the SD card.
3. Insert the SD card with the image into the destination board.
4. Plug in the power and boot the board using the serial port to operate on the system.
5. Set up the IP information of the board using the serial port. For more details on steps 1 to 5, please refer to Setting Up The Evaluation Board.
Create 4GB of swap space on the board

fallocate -l 4G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo "/swapfile swap swap defaults 0 0" >> /etc/fstab

Install hdf5 dependency (will take between 30 min and 1 hour to finish)

cd /tmp && \
  wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.7/src/hdf5-1.10.7.tar.gz && \
  tar -zxvf hdf5-1.10.7.tar.gz && \
  cd hdf5-1.10.7 && \
  ./configure --prefix=/usr && \
  make -j$(nproc) && \
  make install && \
  cd /tmp && rm -rf hdf5-1.10.7*

Install Python dependencies

pip3 install Cython==0.29.23 h5py==2.10.0 pillow

Install PyXIR

git clone --recursive --branch rel-v0.3.1 --single-branch https://github.com/Xilinx/pyxir.git
cd pyxir
sudo python3 setup.py install --use_vart_edge_dpu

Build and install TVM with Vitis AI

git clone --recursive https://github.com/apache/tvm
cd tvm
mkdir build
cp cmake/config.cmake build
cd build
echo set\(USE_LLVM OFF\) >> config.cmake
echo set\(USE_VITIS_AI ON\) >> config.cmake
cmake ..
make tvm_runtime -j$(nproc)
cd ../python
pip3 install --no-deps  -e .

Check whether the setup was successful in the Python shell:

python3 -c 'import pyxir; import tvm'

Note

You might see a warning about the ‘cpu-tf’ runtime not being found. This warning is expected on the board and can be ignored.
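
If you prefer a check that reports which package is missing instead of stopping at the first failed import, a minimal sketch using only the Python standard library could look as follows. The helper name `missing_modules` is an assumption for illustration. Note that it only verifies that the packages can be located, not that they import without warnings or errors.

```python
import importlib.util


def missing_modules(names=("pyxir", "tvm")):
    """Return the top-level module names that cannot be located on this system."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_modules()` means both `pyxir` and `tvm` were found on the Python path.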

Versal VCK190 Setup

For the Versal VCK190 setup, please follow the instructions for Zynq Setup, but now use the VCK190 image in step 1. The other steps are the same.

Compiling a Model

The TVM with Vitis AI flow contains two stages: Compilation and Inference. During compilation, a user can choose a model to compile for the cloud or edge target devices that are currently supported. Once a model is compiled, the generated files can be used to run the model on the specified target device during the Inference stage. Currently, the TVM with Vitis AI flow supports a selected number of Xilinx data center and edge devices.

In this section we walk through the typical flow for compiling models with Vitis AI inside TVM.

Imports

Make sure to import PyXIR and the DPU target (import pyxir.contrib.target.DPUCADF8H for DPUCADF8H):

import os  # used below to build the runtime module export path

import pyxir
import pyxir.contrib.target.DPUCADF8H

import tvm
import tvm.relay as relay
from tvm.contrib.target import vitis_ai
from tvm.contrib import utils, graph_executor
from tvm.relay.op.contrib.vitis_ai import partition_for_vitis_ai

Declare the Target

tvm_target = 'llvm'
dpu_target = 'DPUCADF8H' # options: 'DPUCADF8H', 'DPUCAHX8H-u50', 'DPUCAHX8H-u280', 'DPUCAHX8L', 'DPUCVDX8H', 'DPUCZDX8G-zcu104', 'DPUCZDX8G-zcu102', 'DPUCZDX8G-kv260'

The TVM with Vitis AI flow currently supports the DPU targets listed in the table at the top of this page. Once the appropriate targets are defined, we invoke the TVM compiler to build the graph for the specified target.

Import the Model

Example code to import an MXNet model:

mod, params = relay.frontend.from_mxnet(block, input_shape)

Partition the Model

After importing the model, we utilize the Relay API to annotate the Relay expression for the provided DPU target and partition the graph.

mod = partition_for_vitis_ai(mod, params, dpu=dpu_target)

Build the Model

The partitioned model is passed to the TVM compiler to generate the runtime libraries for the TVM Runtime.

export_rt_mod_file = os.path.join(os.getcwd(), 'vitis_ai.rtmod')
build_options = {
    'dpu': dpu_target,
    'export_runtime_module': export_rt_mod_file
}
with tvm.transform.PassContext(opt_level=3, config={'relay.ext.vitis_ai.options': build_options}):
    lib = relay.build(mod, tvm_target, params=params)

Quantize the Model

Usually, to accelerate inference of neural network models with Vitis AI DPU accelerators, those models need to be quantized upfront. In the TVM - Vitis AI flow, we make use of on-the-fly quantization to remove this additional preprocessing step. In this flow, you don’t need to quantize your model upfront; instead, the typical inference execution calls (module.run) quantize the model on-the-fly using the first N inputs that are provided (see more information below). This sets up and calibrates the Vitis AI DPU, and from that point onwards inference is accelerated for all subsequent inputs. Note that the edge flow deviates slightly from this: inference won’t be accelerated after the first N inputs, but the model will have been quantized and compiled and can be moved to the edge device for deployment. Please check out the Running on Zynq and VCK190 section below for more information.

module = graph_executor.GraphModule(lib["default"](tvm.cpu()))

# The first N (default = 128) inputs are used for quantization calibration
# and are executed on the CPU.
# N can be changed through the 'PX_QUANT_SIZE' environment variable
# (e.g. export PX_QUANT_SIZE=64).
for i in range(128):
    module.set_input(input_name, inputs[i])
    module.run()

By default, the number of images used for quantization is set to 128. You can change the number of images used for on-the-fly quantization with the PX_QUANT_SIZE environment variable. For example, execute the following line in the terminal before calling the compilation script to reduce the quantization calibration dataset to eight images. This can be used for quick testing.

export PX_QUANT_SIZE=8
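
When the calibration loop lives in your own script, it helps to keep its length in sync with PX_QUANT_SIZE rather than hard-coding 128. The sketch below reads the variable with the documented default. The helper name `calibration_size` is an assumption, and the snippet only mirrors the default stated above; it does not show how PyXIR itself parses the variable.

```python
import os

DEFAULT_QUANT_SIZE = 128  # default number of calibration inputs documented above


def calibration_size(environ=None):
    """Return the number of calibration inputs implied by PX_QUANT_SIZE."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PX_QUANT_SIZE")
    if raw is None:
        return DEFAULT_QUANT_SIZE
    size = int(raw)
    if size <= 0:
        raise ValueError(f"PX_QUANT_SIZE must be positive, got {raw!r}")
    return size
```

The calibration loop can then iterate over `range(calibration_size())` so that it feeds exactly as many inputs as the quantizer expects.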

Lastly, we store the compiled output from the TVM compiler on disk for running the model on the target device. This happens as follows for cloud DPUs (Alveo, VCK5000):

lib_path = "deploy_lib.so"
lib.export_library(lib_path)

For edge targets (Zynq, VCK190) we have to rebuild for aarch64. To do this, we first export the module as usual, which also serializes the Vitis AI runtime module (vitis_ai.rtmod). We then load this runtime module again to rebuild and export for aarch64.

from tvm.contrib import cc

# Export once for the host so that the Vitis AI runtime module
# (vitis_ai.rtmod) is serialized to export_rt_mod_file
temp = utils.tempdir()
lib.export_library(temp.relpath("tvm_lib.so"))

# Build and export lib for aarch64 target
tvm_target = tvm.target.arm_cpu('ultra96')
lib_kwargs = {
    'fcompile': cc.create_shared,
    'cc': "/usr/aarch64-linux-gnu/bin/ld"
}

# Reuse the serialized runtime module instead of quantizing again
build_options = {
    'load_runtime_module': export_rt_mod_file
}
with tvm.transform.PassContext(opt_level=3, config={'relay.ext.vitis_ai.options': build_options}):
    lib_edge = relay.build(mod, tvm_target, params=params)

lib_edge.export_library('deploy_lib_edge.so', **lib_kwargs)
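
Because the export path differs between cloud and edge targets, a build script may want to branch on the DPU target string. The following sketch assumes that edge targets are exactly those whose identifier starts with DPUCZDX8G (Zynq) or DPUCVDX8G (VCK190), as listed in the table at the top of this page. Both helper names are hypothetical.

```python
# Prefixes of the edge DPU targets from the table at the top of this page
# (Zynq boards and VCK190). All other listed targets are cloud cards.
EDGE_DPU_PREFIXES = ("DPUCZDX8G", "DPUCVDX8G")


def is_edge_target(dpu_target):
    """Return True if the DPU target needs the aarch64 rebuild step."""
    return dpu_target.startswith(EDGE_DPU_PREFIXES)


def deploy_library_name(dpu_target):
    """Return the library file name used in this tutorial for the given target."""
    return "deploy_lib_edge.so" if is_edge_target(dpu_target) else "deploy_lib.so"
```

With this, `is_edge_target(dpu_target)` can guard the aarch64 rebuild shown above, while cloud targets keep the single `export_library` call.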

This concludes the tutorial to compile a model using TVM with Vitis AI. For instructions on how to run a compiled model please refer to the next section.

Inference

The TVM with Vitis AI flow contains two stages: Compilation and Inference. During the compilation a user can choose to compile a model for any of the target devices that are currently supported. Once a model is compiled, the generated files can be used to run the model on a target device during the Inference stage.

Check out the Running on Alveo and VCK5000 and Running on Zynq and VCK190 sections for doing inference on cloud accelerator cards and edge boards, respectively.

Running on Alveo and VCK5000

After having followed the steps in the Compiling a Model section, you can continue running on new inputs inside the docker for accelerated inference:

module.set_input(input_name, inputs[i])
module.run()

Alternatively, you can load the exported runtime module (the deploy_lib.so exported in Compiling a Model):

import pyxir
import tvm
from tvm.contrib import graph_executor

dev = tvm.cpu()

# input_name = ...
# input_data = ...

# load the module into memory
lib = tvm.runtime.load_module("deploy_lib.so")

module = graph_executor.GraphModule(lib["default"](dev))
module.set_input(input_name, input_data)
module.run()

Running on Zynq and VCK190

Before proceeding, please follow the Zynq or Versal VCK190 setup instructions.

Prior to running a model on the board, you need to compile the model for your target evaluation board and transfer the compiled model onto the board. Please refer to the Compiling a Model section for information on how to compile a model.

Afterwards, you will have to transfer the compiled model (deploy_lib_edge.so) to the evaluation board. Then, on the board you can use the typical “load_module” and “module.run” APIs to execute. For this, please make sure to run the script as root (execute su in terminal to log into root).

Note

Do not import the PyXIR DPU targets in the run script (for example, import pyxir.contrib.target.DPUCZDX8G).

import pyxir
import tvm
from tvm.contrib import graph_executor

dev = tvm.cpu()

# input_name = ...
# input_data = ...

# load the module into memory
lib = tvm.runtime.load_module("deploy_lib_edge.so")

module = graph_executor.GraphModule(lib["default"](dev))
module.set_input(input_name, input_data)
module.run()