HLS Backend Example

TVM supports Xilinx FPGA board with SDAccel. Here is a tutorial for how to deploy TVM to AWS F1 FPGA instance.

Note

This feature is still experimental. We cannot use SDAccel to deploy an end to end neural networks for now.

We use two python scripts for this tutorial.

build.py - a script to synthesize FPGA bitstream.

import tvm
from tvm import te

tgt= tvm.target.Target("sdaccel", host="llvm")

n = te.var("n")
A = te.placeholder((n,), name='A')
B = te.placeholder((n,), name='B')
C = te.compute(A.shape, lambda i: A[i] + B[i], name="C")

s = te.create_schedule(C.op)
px, x = s[C].split(C.op.axis[0], nparts=1)

s[C].bind(px, tvm.te.thread_axis("pipeline"))

fadd = tvm.build(s, [A, B, C], tgt, name="myadd")

fadd.save("myadd.o")
fadd.imported_modules[0].save("myadd.xclbin")

tvm.contrib.cc.create_shared("myadd.so", ["myadd.o"])

run.py - a script to use FPGA as an accelerator.

import tvm
import numpy as np
import os

tgt="sdaccel"

fadd = tvm.runtime.load_module("myadd.so")
if os.environ.get("XCL_EMULATION_MODE"):
    fadd_dev = tvm.runtime.load_module("myadd.xclbin")
else:
    fadd_dev = tvm.runtime.load_module("myadd.awsxclbin")
fadd.import_module(fadd_dev)

dev = tvm.device(tgt, 0)

n = 1024
a = tvm.nd.array(np.random.uniform(size=n).astype("float32"), dev)
b = tvm.nd.array(np.random.uniform(size=n).astype("float32"), dev)
c = tvm.nd.array(np.zeros(n, dtype="float32"), dev)

fadd(a, b, c)
tvm.testing.assert_allclose(c.numpy(), a.numpy() + b.numpy())

Setup

Launch an instance using the FPGA Developer AMI. We don’t need an F1 instance for emulation and synthesis, so it is recommended to use a lower cost instance for them.

Setup AWS FPGA development kit.

git clone https://github.com/aws/aws-fpga.git
cd aws-fpga
source sdaccel_setup.sh
source ${XILINX_SDX}/settings64.sh

Setup TVM with OpenCL enabled.

Emulation

Create emconfig.json for emulation.

emconfigutil --platform ${AWS_PLATFORM} --nd 1

Copy emconfig.json to the python binary directory. It is because the current Xilinx toolkit assumes that both host binary and the emconfig.json file are in the same path.
```
cp emconfig.json $(dirname $(which python))
```

Run software emulation

export XCL_EMULATION_MODE=1
export XCL_TARGET=sw_emu

python build.py
python run.py

Run hardware emulation

export XCL_EMULATION_MODE=1
export XCL_TARGET=hw_emu

python build.py
python run.py

Synthesis

Run synthesis with the following script.

unset XCL_EMULATION_MODE
export XCL_TARGET=hw

python build.py

Create AWS FPGA image and upload it to AWS S3.

${SDACCEL_DIR}/tools/create_sdaccel_afi.sh \
    -xclbin=myadd.xclbin -o=myadd \
    -s3_bucket=<bucket-name> -s3_dcp_key=<dcp-folder-name> \
    -s3_logs_key=<logs-folder-name>

This also generates an awsxclbin file, which is necessary to use the AWS FPGA image on F1 instances.

Run

Launch Amazon EC2 F1 instance.
Copy myadd.so, myadd.awsxclbin, and run.py to the F1 instance.

Setup AWS FPGA development kit.

git clone https://github.com/aws/aws-fpga.git
cd aws-fpga
source sdaccel_setup.sh

Setup TVM with OpenCL enabled.

Become root and setup environment variables.

sudo sh
source ${INSTALL_ROOT}/setup.sh

Run
```
python run.py
```