Vitis-AI Integration

Vitis-AI is Xilinx’s development stack for hardware-accelerated AI inference on Xilinx platforms, including both edge devices and Alveo cards. It consists of optimized IP, tools, libraries, models, and example designs. It is designed with high efficiency and ease of use in mind, unleashing the full potential of AI acceleration on Xilinx FPGA and ACAP.

The current Vitis-AI BYOC flow inside TVM enables acceleration of neural network model inference on edge and cloud. The identifiers for the supported edge and cloud Deep Learning Processor Units (DPUs) are DPUCZDX8G and DPUCADX8G, respectively. DPUCZDX8G and DPUCADX8G are hardware accelerators for convolutional neural networks (CNNs) on the Xilinx Zynq UltraScale+ MPSoC and Alveo (U200/U250) platforms, respectively. For more information about the DPU identifiers see the section on DPU naming information.

On this page you will find information on how to build TVM with Vitis-AI and on how to get started with an example.

DPU naming information

A DPU identifier is composed of the following fields, in order:

DPU: Deep Learning Processing Unit
Application: C - CNN, R - RNN
HW Platform: AD - Alveo DDR, AH - Alveo HBM, VD - Versal DDR with AIE & PL, ZD - Zynq DDR
Quantization Method: X - DECENT, I - Integer threshold, F - Float threshold, R - RNN
Quantization Bitwidth: 4 - 4-bit, 8 - 8-bit, 16 - 16-bit, M - Mixed Precision
Design Target: G - General purpose, H - High throughput, L - Low latency, C - Cost optimized

For example, DPUCADX8G decodes as a Deep Learning Processing Unit for CNNs (C) on Alveo DDR (AD), quantized with DECENT (X) at 8-bit precision (8), with a general purpose design target (G).

Build instructions

This section lists the instructions for building TVM with Vitis-AI for both cloud and edge.

Cloud (DPUCADX8G)

For Vitis-AI acceleration in the cloud, TVM has to be built on top of the Xilinx Alveo platform.

System requirements

The following table lists system requirements for running Docker containers as well as for operating Alveo cards.

Motherboard: PCI Express 3.0-compliant with one dual-width x16 slot
System Power Supply: 225W
Operating System: Ubuntu 16.04 or 18.04; CentOS 7.4 or 7.5; RHEL 7.4 or 7.5
CPU: Intel i3/i5/i7/i9/Xeon 64-bit CPU
GPU (optional, to accelerate quantization): NVIDIA GPU with a compute capability > 3.0
CUDA Driver (optional, to accelerate quantization): nvidia-410
FPGA: Xilinx Alveo U200 or U250
Docker Version: 19.03.1

Hardware setup and docker build

  1. Clone the Vitis AI repository:

    git clone --recurse-submodules https://github.com/Xilinx/Vitis-AI
    
  2. Install Docker and add your user to the docker group. Refer to the installation instructions on Docker's website.
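    As a rough post-install sketch (assuming a Linux host and that Docker itself is already installed per the official instructions), the usual way to run Docker without sudo is:

    sudo groupadd docker            # the group may already exist
    sudo usermod -aG docker $USER
    # log out and back in for the group change to take effect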

  3. Download the latest Vitis AI Docker with the following command. This container runs on CPU.

    docker pull xilinx/vitis-ai:latest
    

    To accelerate the quantization, you can optionally use the Vitis-AI GPU docker image. Use the below commands to build the Vitis-AI GPU docker container:

    cd Vitis-AI/docker
    ./docker_build_gpu.sh
    
  4. Set up Vitis AI to target Alveo cards. To target Alveo cards with Vitis AI for machine learning workloads, you must install the following software components:

    • Xilinx Runtime (XRT)

    • Alveo Deployment Shells (DSAs)

    • Xilinx Resource Manager (XRM) (xbutler)

    • Xilinx Overlaybins (Accelerators to Dynamically Load - binary programming files)

    While it is possible to install all of these software components individually, a script has been provided to automatically install them at once. To do so:

    • Run the following commands:

      cd Vitis-AI/alveo/packages
      sudo su
      ./install.sh
      
    • Power cycle the system.
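    • Optionally, after the reboot, verify that the Alveo card is visible to XRT. One possible check (the xbutil location and sub-commands can differ between XRT versions) is:

      sudo /opt/xilinx/xrt/bin/xbutil scan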

  5. Clone the TVM and PyXIR repositories:

    git clone --recursive https://github.com/apache/tvm.git
    git clone --recursive https://github.com/Xilinx/pyxir.git
    
  6. Build and start the TVM runtime Vitis-AI Docker container.

    ./tvm/docker/build.sh demo_vitis_ai bash
    ./tvm/docker/bash.sh tvm.demo_vitis_ai
    
    #Setup inside container
    source /opt/xilinx/xrt/setup.sh
    . $VAI_ROOT/conda/etc/profile.d/conda.sh
    conda activate vitis-ai-tensorflow
    
  7. Install PyXIR

    cd pyxir
    python3 setup.py install --use_vai_rt_dpucadx8g --user
    
  8. Build TVM inside the container with Vitis-AI

    cd tvm
    mkdir build
    cp cmake/config.cmake build
    cd build
    echo set\(USE_LLVM ON\) >> config.cmake
    echo set\(USE_VITIS_AI ON\) >> config.cmake
    cmake ..
    make -j$(nproc)
    
  9. Install TVM

    cd tvm/python
    pip3 install -e . --user
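    # Optional sanity check; this mirrors the import check used later in the edge flow
    python3 -c 'import pyxir; import tvm'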
    

Edge (DPUCZDX8G)

For edge deployment we make use of two systems referred to as host and edge. The host system is responsible for quantization and compilation of the neural network model in a first offline step. Afterwards, the model will be deployed on the edge system.

Host requirements

The following table lists system requirements for running the TVM - Vitis-AI docker container.

Operating System: Ubuntu 16.04 or 18.04; CentOS 7.4 or 7.5; RHEL 7.4 or 7.5
CPU: Intel i3/i5/i7/i9/Xeon 64-bit CPU
GPU (optional, to accelerate quantization): NVIDIA GPU with a compute capability > 3.0
CUDA Driver (optional, to accelerate quantization): nvidia-410
FPGA: Not necessary on host
Docker Version: 19.03.1

Host setup and docker build

  1. Clone the TVM repository:

    git clone --recursive https://github.com/apache/tvm.git
    
  2. Build and start the TVM runtime Vitis-AI Docker container.

    ./tvm/docker/build.sh demo_vitis_ai bash
    ./tvm/docker/bash.sh tvm.demo_vitis_ai
    
    #Setup inside container
    . $VAI_ROOT/conda/etc/profile.d/conda.sh
    conda activate vitis-ai-tensorflow
    
  3. Install PyXIR

    git clone --recursive https://github.com/Xilinx/pyxir.git
    cd pyxir
    python3 setup.py install --user
    
  4. Build TVM inside the container with Vitis-AI.

    cd tvm
    mkdir build
    cp cmake/config.cmake build
    cd build
    echo set\(USE_LLVM ON\) >> config.cmake
    echo set\(USE_VITIS_AI ON\) >> config.cmake
    cmake ..
    make -j$(nproc)
    
  5. Install TVM

    cd tvm/python
    pip3 install -e . --user
    

Edge requirements

The DPUCZDX8G can be deployed on the Zynq UltraScale+ MPSoC platform. The following development boards can be used out-of-the-box:

Target board   TVM identifier       Info
Ultra96        DPUCZDX8G-ultra96    https://www.xilinx.com/products/boards-and-kits/1-vad4rl.html
ZCU104         DPUCZDX8G-zcu104     https://www.xilinx.com/products/boards-and-kits/zcu104.html
ZCU102         DPUCZDX8G-zcu102     https://www.xilinx.com/products/boards-and-kits/ek-u1-zcu102-g.html

Edge hardware setup

Note

This section provides instructions for setting up with the Pynq platform, but Petalinux-based flows are also supported.

  1. Download the Pynq v2.5 image for your target (use Z1 or Z2 for the Ultra96 target depending on the board version). Link to image: https://github.com/Xilinx/PYNQ/releases/tag/v2.5

  2. Follow Pynq instructions for setting up the board: pynq setup

  3. After connecting to the board, make sure to run as root by executing su.

  4. Set up DPU on Pynq by following the steps here: DPU Pynq setup

  5. Run the following command to download the DPU bitstream:

    python3 -c 'from pynq_dpu import DpuOverlay ; overlay = DpuOverlay("dpu.bit")'
    
  6. Check whether the DPU kernel is alive:

    dexplorer -w
    

Edge TVM setup

Note

When working on Petalinux instead of Pynq, the following steps might take more manual work (e.g. building hdf5 from source). Also, TVM has a scipy dependency which you might then have to build from source or circumvent. We don't depend on scipy in our flow.

Building TVM depends on the Xilinx PyXIR package. PyXIR acts as an interface between TVM and Vitis-AI tools.

  1. First install the PyXIR dependencies h5py and pydot:

    apt-get install libhdf5-dev
    pip3 install pydot h5py
    
  2. Install PyXIR

    git clone --recursive https://github.com/Xilinx/pyxir.git
    cd pyxir
    sudo python3 setup.py install --use_vai_rt_dpuczdx8g
    
  3. Build TVM with Vitis-AI

    git clone --recursive https://github.com/apache/tvm
    cd tvm
    mkdir build
    cp cmake/config.cmake build
    cd build
    echo set\(USE_VITIS_AI ON\) >> config.cmake
    cmake ..
    make
    
  4. Install TVM

    cd tvm/python
    pip3 install -e . --user
    
  5. Check whether the setup was successful in the Python shell:

    python3 -c 'import pyxir; import tvm'
    

Getting started

This section shows how to use TVM with Vitis-AI. For this it’s important to understand that neural network models are quantized for Vitis-AI execution in fixed point arithmetic. The approach we take here is to quantize on-the-fly using the first N inputs as explained in the next section.

On-the-fly quantization

Usually, to be able to accelerate inference of neural network models with Vitis-AI DPU accelerators, those models need to be quantized upfront. In the TVM - Vitis-AI flow, we make use of on-the-fly quantization to remove this additional preprocessing step. In this flow, one doesn't need to quantize the model upfront but can make use of the typical inference execution calls (module.run) to quantize the model on-the-fly using the first N inputs that are provided (see more information below). This will set up and calibrate the Vitis-AI DPU and from that point onwards inference will be accelerated for all subsequent inputs. Note that the edge flow deviates slightly from the flow explained here in that inference won't be accelerated after the first N inputs; instead, the model will have been quantized and compiled and can be moved to the edge device for deployment. Please check out the edge usage instructions below for more information.

Config/Settings

A couple of environment variables can be used to customize the Vitis-AI BYOC flow.

PX_QUANT_SIZE (default if unset: 128)
    The number of inputs that will be used for quantization (necessary for Vitis-AI acceleration).

PX_BUILD_DIR (default if unset: use the on-the-fly quantization flow)
    Loads the quantization and compilation information from the provided build directory and immediately starts Vitis-AI hardware acceleration. This configuration can be used if the model has been executed before using on-the-fly quantization, during which the quantization and compilation information was cached in a build directory.
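For example, to calibrate with fewer inputs, or to reuse a previously cached build directory, these variables can be exported before launching the TVM script (the directory path and script name below are purely illustrative):

# Calibrate with 64 inputs instead of the default 128
export PX_QUANT_SIZE=64
# Reuse quantization/compilation info cached by an earlier on-the-fly run (illustrative path)
export PX_BUILD_DIR=/path/to/cached_vitis_ai_build
python3 run_tvm_inference.py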

Cloud usage

This section shows how to accelerate a convolutional neural network model in TVM with Vitis-AI on the cloud.

To be able to target the Vitis-AI cloud DPUCADX8G target we first have to import the target in PyXIR. This PyXIR package is the interface being used by TVM to integrate with the Vitis-AI stack. Additionally, import the typical TVM and Relay modules and the Vitis-AI contrib module inside TVM.

import pyxir
import pyxir.contrib.target.DPUCADX8G

import tvm
import tvm.relay as relay
from tvm.contrib.target import vitis_ai
from tvm.contrib import util, graph_runtime
from tvm.relay.build_module import bind_params_by_name
from tvm.relay.op.contrib.vitis_ai import annotation

After importing a convolutional neural network model using the usual Relay APIs, annotate the Relay expression for the given Vitis-AI DPU target and partition the graph.
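For readers who want a concrete starting point for mod and params, a purely illustrative import sketch (the MXNet Gluon ResNet-18 model, input name and input shape are assumptions, not requirements of this flow; relay is imported in the snippet above) is:

import mxnet as mx
from mxnet.gluon.model_zoo import vision

# Assumed example model; any CNN supported by a Relay frontend works
block = vision.resnet18_v1(pretrained=True)
input_name = "data"
shape_dict = {input_name: (1, 3, 224, 224)}
mod, params = relay.frontend.from_mxnet(block, shape_dict)

The annotation and partitioning passes are then applied to the imported module: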

mod["main"] = bind_params_by_name(mod["main"], params)
mod = annotation(mod, params, target)
mod = relay.transform.MergeCompilerRegions()(mod)
mod = relay.transform.PartitionGraph()(mod)

Now, we can build the TVM runtime library for executing the model. The TVM target is ‘llvm’ as the operations that can’t be handled by the DPU are executed on the CPU. The Vitis-AI target is DPUCADX8G as we are targeting the cloud DPU and this target is passed as a config to the TVM build call.

tvm_target = 'llvm'
target='DPUCADX8G'

with tvm.transform.PassContext(opt_level=3, config= {'relay.ext.vitis_ai.options.target': target}):
   lib = relay.build(mod, tvm_target, params=params)

As one more step before we can accelerate a model with Vitis-AI in TVM we have to quantize and compile the model for execution on the DPU. We make use of on-the-fly quantization for this. Using this method one doesn’t need to quantize their model upfront and can make use of the typical inference execution calls (module.run) to calibrate the model on-the-fly using the first N inputs that are provided. After the first N iterations, computations will be accelerated on the DPU. So now we will feed N inputs to the TVM runtime module. Note that these first N inputs will take a substantial amount of time.

module = graph_runtime.GraphModule(lib["default"](tvm.cpu()))

# First N (default = 128) inputs are used for quantization calibration and will
# be executed on the CPU
# This config can be changed by setting the 'PX_QUANT_SIZE' (e.g. export PX_QUANT_SIZE=64)
for i in range(128):
   module.set_input(input_name, inputs[i])
   module.run()

Afterwards, inference will be accelerated on the DPU.

module.set_input(name, data)
module.run()
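# The result can then be read back from the runtime module, e.g. the first
# output tensor (index 0 assumes a single-output model):
output = module.get_output(0).asnumpy()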

To save and load the built module, one can use the typical TVM APIs:

lib_path = "deploy_lib.so"
lib.export_library(lib_path)

Load the module from the compiled files and run inference:

# load the module into memory
loaded_lib = tvm.runtime.load_module(lib_path)

module = graph_runtime.GraphModule(loaded_lib["default"](tvm.cpu()))
module.set_input(name, data)
module.run()

Edge usage

This section shows how to accelerate a convolutional neural network model in TVM with Vitis-AI at the edge. The first couple of steps will have to be run on the host machine and take care of quantization and compilation for deployment at the edge.

Host steps

To be able to target the Vitis-AI edge DPUCZDX8G target we first have to import the target in PyXIR. This PyXIR package is the interface being used by TVM to integrate with the Vitis-AI stack. Additionally, import the typical TVM and Relay modules and the Vitis-AI contrib module inside TVM.

import pyxir
import pyxir.contrib.target.DPUCZDX8G

import tvm
import tvm.relay as relay
from tvm.contrib.target import vitis_ai
from tvm.contrib import util, graph_runtime
from tvm.relay.build_module import bind_params_by_name
from tvm.relay.op.contrib.vitis_ai import annotation

After importing a convolutional neural network model using the usual Relay APIs, annotate the Relay expression for the given Vitis-AI DPU target and partition the graph.

mod["main"] = bind_params_by_name(mod["main"], params)
mod = annotation(mod, params, target)
mod = relay.transform.MergeCompilerRegions()(mod)
mod = relay.transform.PartitionGraph()(mod)

Now, we can build the TVM runtime library for executing the model. The TVM target is ‘llvm’ as the operations that can’t be handled by the DPU are executed on the CPU. At this point that means the CPU on the host machine. The Vitis-AI target is DPUCZDX8G-zcu104 as we are targeting the edge DPU on the ZCU104 board and this target is passed as a config to the TVM build call. Note that different identifiers can be passed for different targets, see edge targets info. Additionally, we provide the ‘export_runtime_module’ config that points to a file to which we can export the Vitis-AI runtime module. We have to do this because we will first be compiling and quantizing the model on the host machine before building the model for edge deployment. As you will see later on, the exported runtime module will be passed to the edge build so that the Vitis-AI runtime module can be included.

from tvm.contrib import util

temp = util.tempdir()

tvm_target = 'llvm'
target='DPUCZDX8G-zcu104'
export_rt_mod_file = temp.relpath("vitis_ai.rtmod")

with tvm.transform.PassContext(opt_level=3, config= {'relay.ext.vitis_ai.options.target': target,
                                                     'relay.ext.vitis_ai.options.export_runtime_module': export_rt_mod_file}):
   lib = relay.build(mod, tvm_target, params=params)

We will quantize and compile the model for execution on the DPU using on-the-fly quantization on the host machine. This makes use of TVM inference calls (module.run) to quantize the model on the host with the first N inputs.

module = graph_runtime.GraphModule(lib["default"](tvm.cpu()))

# First N (default = 128) inputs are used for quantization calibration and will
# be executed on the CPU
# This config can be changed by setting the 'PX_QUANT_SIZE' (e.g. export PX_QUANT_SIZE=64)
for i in range(128):
   module.set_input(input_name, inputs[i])
   module.run()

Save the TVM lib module so that the Vitis-AI runtime module will also be exported (to the ‘export_runtime_module’ path we previously passed as a config).

from tvm.contrib import util

temp = util.tempdir()
lib.export_library(temp.relpath("tvm_lib.so"))

After quantizing and compiling the model for Vitis-AI acceleration using the first N inputs we can build the model for execution on the ARM edge device. Here we pass the previously exported Vitis-AI runtime module so it can be included in the TVM build.

# Export lib for aarch64 target
from tvm.contrib import cc

tvm_target = tvm.target.arm_cpu('ultra96')
lib_kwargs = {
     'fcompile': cc.create_shared,
     'cc': "/usr/aarch64-linux-gnu/bin/ld"
}

with tvm.transform.PassContext(opt_level=3,
                               config={'relay.ext.vitis_ai.options.load_runtime_module': export_rt_mod_file}):
     lib_arm = relay.build(mod, tvm_target, params=params)

lib_arm.export_library('tvm_dpu_arm.so', **lib_kwargs)

Now, move the exported TVM build file (tvm_dpu_arm.so) to the edge device. For information on setting up the edge device check out the edge setup section.
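If the host and the board are on the same network, scp is a straightforward way to do this (the board address and destination directory below are placeholders):

# Copy the compiled library to the edge device; address and path are illustrative
scp tvm_dpu_arm.so root@192.168.1.100:/home/root/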

Edge steps

After setting up TVM with Vitis-AI on the edge device, you can now load the TVM runtime module into memory and feed inputs for inference.

import tvm
from tvm.contrib import graph_runtime

ctx = tvm.cpu()

# load the module into memory
lib = tvm.runtime.load_module("tvm_dpu_arm.so")

module = graph_runtime.GraphModule(lib["default"](ctx))
module.set_input(name, data)
module.run()