Getting Started using TVMC Python: a high-level API for TVM

Author: Jocelyn Shiue

Hi! Here we explain the scripting tool designed for the complete TVM beginner. 🙂

Before we get started, let’s get an example model if you don’t already have one. Follow these steps to download a ResNet-50 model via the terminal:

mkdir myscripts
cd myscripts
wget https://github.com/onnx/models/raw/b9a54e89508f101a1611cd64f4ef56b9cb62c7cf/vision/classification/resnet/model/resnet50-v2-7.onnx
mv resnet50-v2-7.onnx my_model.onnx
touch tvmcpythonintro.py
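If wget is not available (for example on Windows), the download step can be sketched in Python with only the standard library. This is a convenience outside TVMC: fetch_model is a hypothetical helper, and the URL is the same pinned ONNX Model Zoo link used in the wget command above.

```python
import os
import urllib.request

# Same pinned ONNX Model Zoo URL as the wget command above.
MODEL_URL = (
    "https://github.com/onnx/models/raw/b9a54e89508f101a1611cd64f4ef56b9cb62c7cf/"
    "vision/classification/resnet/model/resnet50-v2-7.onnx"
)


def fetch_model(url=MODEL_URL, dest="my_model.onnx"):
    """Hypothetical helper: download `url` to `dest` unless `dest` already exists."""
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest
```

Calling fetch_model() from inside myscripts saves the file as my_model.onnx, matching the mv step, and skips the download on later runs.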

Let’s start editing tvmcpythonintro.py in your favorite text editor.


Step 0: Imports

from tvm.driver import tvmc

Step 1: Load a model

Let’s import our model into TVMC. This step converts a machine learning model from a supported framework into TVM’s high-level graph representation language, called Relay, so that all models in TVM share a unified starting point. The frameworks currently supported are Keras, ONNX, TensorFlow, TFLite, and PyTorch.

model = tvmc.load('my_model.onnx') #Step 1: Load

If you’d like to see the Relay representation of the model, run: model.summary()

All frameworks support overriding the input shapes with a shape_dict argument. For most frameworks this is optional, but for PyTorch it is required, because TVM cannot infer the input shapes automatically.

#model = tvmc.load('my_model.onnx', shape_dict={'input1' : [1, 2, 3, 4], 'input2' : [1, 2, 3, 4]}) #Step 1: Load + shape_dict

A convenient way to find the model’s input names and shapes for shape_dict is Netron. After opening the model, click the first node to see the name(s) and shape(s) in the inputs section.
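As an alternative to Netron, the input names and shapes can be read programmatically. The sketch below assumes the onnx Python package is installed (read_onnx_inputs imports it only when called). Both helpers are hypothetical, and make_shape_dict simply replaces every unknown or symbolic dimension (such as a batch dimension named N) with batch_size, which is only appropriate when the batch dimension is the only dynamic one.

```python
def read_onnx_inputs(path):
    """Hypothetical helper: return [(name, dims), ...] for the graph inputs of
    an ONNX file. dims holds ints for fixed sizes and strings for symbolic ones."""
    import onnx  # assumption: the `onnx` package is installed

    model = onnx.load(path)
    # Some older exporters also list weights as graph inputs; skip those.
    weights = {init.name for init in model.graph.initializer}
    inputs = []
    for inp in model.graph.input:
        if inp.name in weights:
            continue
        dims = []
        for d in inp.type.tensor_type.shape.dim:
            dims.append(d.dim_value if d.HasField("dim_value") else d.dim_param)
        inputs.append((inp.name, dims))
    return inputs


def make_shape_dict(inputs, batch_size=1):
    """Hypothetical helper: build a shape_dict, replacing unknown or symbolic
    dimensions with batch_size."""
    return {
        name: [d if isinstance(d, int) and d > 0 else batch_size for d in dims]
        for name, dims in inputs
    }
```

The result can then be passed to the loader, for example shape_dict = make_shape_dict(read_onnx_inputs("my_model.onnx")) followed by tvmc.load("my_model.onnx", shape_dict=shape_dict).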

Step 2: Compile

Now that our model is in Relay, the next step is to compile it for the hardware it will run on. We refer to this hardware as a target. Compilation translates the model from Relay into a lower-level language that the target machine can understand.

To compile a model, a tvm.target string is required. To learn more about tvm.target and its options, see the documentation. Some examples include:

  1. cuda (NVIDIA GPU)

  2. llvm (CPU)

  3. llvm -mcpu=cascadelake (Intel CPU)

package = tvmc.compile(model, target="llvm") #Step 2: Compile

The compilation step returns a package.

Step 3: Run

The compiled package can now be run on the hardware target. The device options are "cpu", "cuda", "cl", "metal", and "vulkan"; pick the one that matches the target the package was compiled for.
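Because the device passed to tvmc.run has to match the kind of hardware the package was compiled for, it can help to derive one from the other. The helper below is hypothetical: its mapping only covers the devices listed above (with the opencl target kind mapped to "cl"), and it reads just the first entry of a comma-separated target string.

```python
# Hypothetical mapping from a TVM target kind to the device name used by tvmc.run.
TARGET_KIND_TO_DEVICE = {
    "llvm": "cpu",
    "cuda": "cuda",
    "opencl": "cl",
    "metal": "metal",
    "vulkan": "vulkan",
}


def device_for_target(target):
    """Return the tvmc.run device for a target string such as
    "llvm -mcpu=cascadelake"; only the first target of a comma list is used."""
    first = target.split(",")[0].strip()
    kind = first.split()[0]  # the kind is the first token, options follow
    if kind not in TARGET_KIND_TO_DEVICE:
        raise ValueError(f"no device known for target kind {kind!r}")
    return TARGET_KIND_TO_DEVICE[kind]
```

With it, Steps 2 and 3 can share one setting: compile with target=target and run with device=device_for_target(target).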

result = tvmc.run(package, device="cpu") #Step 3: Run

And you can print the results: print(result)

Save and then start the process in the terminal:

python tvmcpythonintro.py

Note: Your fans may become very active

Example results:

Time elapsed for training: 18.99 s
Execution time summary:
mean (ms)   max (ms)   min (ms)   std (ms)
  25.24      26.12      24.89       0.38


Output Names:
['output_0']
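Beyond printing, the raw outputs can be inspected directly. The sketch below assumes the object returned by tvmc.run exposes an outputs dictionary mapping output names (such as 'output_0' above) to NumPy arrays; top_k is a hypothetical helper. Since no real image was passed to tvmc.run in this walkthrough, the ranking only illustrates how to read the output, not a meaningful prediction.

```python
def top_k(scores, k=5):
    """Hypothetical helper: return the k (class_index, score) pairs with the
    highest scores, best first."""
    if hasattr(scores, "ravel"):
        scores = scores.ravel()  # NumPy array: flatten e.g. shape (1, 1000) to (1000,)
    ranked = sorted(
        enumerate(float(s) for s in scores), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:k]
```

For the ResNet model this would be called as top_k(result.outputs["output_0"]), under the assumption about outputs stated above.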

Additional TVMC Functionalities

Saving the model

To save time later, save the Relay version of the model after loading it (Step 1). The model is then stored at the path you choose, already converted to Relay.

model = tvmc.load('my_model.onnx') #Step 1: Load
desired_model_path = "my_model.tar"  # example path; any writable location works
model.save(desired_model_path)
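Reusing the saved file on later runs can be sketched as below. This assumes, from our reading of TVMC, that tvmc.TVMCModel(model_path=...) reloads a file written by model.save(). load_with_cache is a hypothetical helper, and its tvmc parameter only exists so a stand-in object can be passed in tests. The cache does not notice if my_model.onnx changes afterwards; delete the saved file to force a fresh conversion.

```python
import os


def load_with_cache(source_path, cache_path, tvmc=None):
    """Hypothetical helper: convert `source_path` to Relay once, then reuse the
    copy saved at `cache_path` on later runs."""
    if tvmc is None:
        from tvm.driver import tvmc
    if os.path.exists(cache_path):
        # Assumption: TVMCModel(model_path=...) reloads a model written by save().
        return tvmc.TVMCModel(model_path=cache_path)
    model = tvmc.load(source_path)  # Step 1: Load (framework model -> Relay)
    model.save(cache_path)
    return model
```

In the script this would read model = load_with_cache("my_model.onnx", "my_model.tar").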

Saving the package

After the model has been compiled (Step 2), the package can also be saved by passing package_path, and reloaded later with tvmc.TVMCPackage:

package_path = "my_package.tar"  # example path; any writable location works
tvmc.compile(model, target="llvm", package_path=package_path) #Step 2: Compile

new_package = tvmc.TVMCPackage(package_path=package_path)
result = tvmc.run(new_package, device="cpu") #Step 3: Run

Using Autoscheduler

Use the next generation of TVM, the auto-scheduler (also known as Ansor), to potentially achieve faster run times. Unlike the previous approach, where schedule templates had to be written by hand, the auto-scheduler generates the schedule search space automatically. (Learn more: 1, 2)

tvmc.tune(model, target="llvm", enable_autoscheduler=True)

Saving the tuning results

The tuning results can be saved in a file for later reuse.

Method 1:
log_file = "hello.json"

# Run tuning
tvmc.tune(model, target="llvm", tuning_records=log_file)

...

# Later run tuning and reuse tuning results
tvmc.tune(model, target="llvm", prior_records=log_file)

Method 2:
# Run tuning
tuning_records = tvmc.tune(model, target="llvm")

...

# Later run tuning and reuse tuning results
tvmc.tune(model, target="llvm", prior_records=tuning_records)
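Saved records pay off when they are also passed to tvmc.compile through its tuning_records argument, so that the compiled package actually uses the tuned schedules. The sketch below, a hypothetical helper named tune_then_compile, tunes only when the record file does not exist yet and then compiles with it. As we understand it, tvmc.compile detects by itself whether the file holds AutoTVM or auto-scheduler records, but treat that as an assumption to check against your TVM version. The tvmc parameter only exists so a stand-in can be passed in tests.

```python
import os


def tune_then_compile(model, target, log_file, enable_autoscheduler=False, tvmc=None):
    """Hypothetical helper: tune only if `log_file` does not exist yet, then
    compile with those records so the tuned schedules are used."""
    if tvmc is None:
        from tvm.driver import tvmc
    if not os.path.exists(log_file):
        tvmc.tune(
            model,
            target=target,
            tuning_records=log_file,
            enable_autoscheduler=enable_autoscheduler,
        )
    # Assumption: compile picks up AutoTVM or auto-scheduler records from this file.
    return tvmc.compile(model, target=target, tuning_records=log_file)
```

For example, package = tune_then_compile(model, "llvm", "hello.json", enable_autoscheduler=True) tunes on the first run and only recompiles afterwards.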

Tuning a more complex model:

If the tuning output prints many T’s, for example .........T.T..T..T..T.T.T.T.T.T., the measurements are likely timing out; increase the search time frame:

tvmc.tune(model, target="llvm", trials=10000, timeout=10)  # raise trials and/or timeout (seconds per trial) as needed

Compiling a model for a remote device:

A remote procedure call (RPC) is useful when you want to compile for hardware that is not on your local machine, and the tvmc methods support it. To set up the RPC server, see the ‘Set up RPC Server on Device’ section in this document.

Within the TVMC Script include the following and adjust accordingly:

tvmc.tune(
     model,
     target=target,  # Compilation target string for the device
     target_host=target_host,  # Target string for the host processor
     hostname=host_ip_address,  # IP address of the RPC tracker, used when benchmarking remotely
     port=port_number,  # Port of the RPC tracker to connect to (defaults to 9090)
     rpc_key=your_key,  # RPC tracker key of the target device; required when hostname is provided
)
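The same tracker settings can be reused across tuning, compiling, and running. The sketch below is hypothetical (remote_tune_compile_run is not a TVMC function) and assumes that tvmc.run accepts the same hostname, port, and rpc_key arguments as tvmc.tune, and that tvmc.compile accepts target_host plus a cross argument naming a cross-compiler (for example aarch64-linux-gnu-gcc for a 64-bit ARM board). These parameters may differ between TVM versions, so check them against your installation.

```python
def remote_tune_compile_run(model, target, target_host, tracker_host, rpc_key,
                            port=9090, device="cpu", cross=None,
                            log_file="remote_records.json", tvmc=None):
    """Hypothetical helper: tune on a remote device via an RPC tracker, compile
    with the resulting records, then run the package on the same device."""
    if tvmc is None:
        from tvm.driver import tvmc
    # The same tracker settings are reused for tuning and for running.
    rpc = {"hostname": tracker_host, "port": port, "rpc_key": rpc_key}
    tvmc.tune(model, target=target, target_host=target_host,
              tuning_records=log_file, **rpc)
    # Assumption: `cross` names a cross-compiler so the exported library
    # can be loaded on the remote device.
    package = tvmc.compile(model, target=target, target_host=target_host,
                           tuning_records=log_file, cross=cross)
    return tvmc.run(package, device=device, **rpc)
```

Keeping the tracker settings in one dictionary avoids tuning against one device and benchmarking on another by accident.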
