Marvell Machine Learning Integration
1. Introduction
Marvell(R) offers a family of high-performance Data Processing Units (DPUs) with integrated compute, high-speed I/O and workload accelerators. These workload accelerators include Marvell’s Machine Learning Inference Processor (MLIP), a highly optimized, integrated inference engine.
TVM supports Marvell’s MLIP through the “mrvl” library. The library partitions a model, compiling the supported operations for accelerated execution on MLIP and the remaining operations with LLVM for general-purpose compute.
At runtime, the library supports native execution on MLIP hardware as well as on Marvell’s ML simulator (mrvl-mlsim).
The library supports Marvell’s Octeon family of processors with ML accelerators.
This guide shows how to build TVM with mrvl codegen and runtime support enabled. It also provides example code to compile and run models using the ‘mrvl’ runtime.
2. Building TVM with mrvl support
2.1 Clone TVM repo
Refer to the TVM documentation for instructions on cloning the TVM repository: https://tvm.apache.org/docs/install/from_source.html
2.2 Build and start the TVM mrvl Docker container
./docker/build.sh demo_mrvl bash # Build the Docker image
./docker/bash.sh tvm.demo_mrvl # Start a shell in the tvm.demo_mrvl container
3. Compiling a model using the TVMC command line
Models can be compiled and run for the mrvl target using TVMC, TVM's command-line driver.
Refer to the TVMC documentation for the generic tvmc options: https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html
Additional mrvl-specific options can be passed as target attributes when necessary; their usage is described later in this document.
3.1 TVMC Compilation Flow for a model
Refer to the TVM documentation for the compilation flow: https://tvm.apache.org/docs/arch/index.html#example-compilation-flow
3.2 TVMC - Command line option(s): Syntax for mrvl target
Compiling an ONNX model using tvmc for the mrvl target:
Syntax:
python3 -m tvm.driver.tvmc compile --target="mrvl, llvm" \
--target-llvm-<options> \
--target-mrvl-<options> \
--<tvm-generic-options> \
model_file.onnx
The following example TVMC compile command targets a cn10ka processor (ARMv9 cores with an integrated MLIP) and uses only 4 tiles of the MLIP block.
Example:
python3 -m tvm.driver.tvmc compile --target="mrvl, llvm" \
--target-llvm-mtriple=aarch64-linux-gnu --target-llvm-mcpu=neoverse-n2 \
--target-mrvl-num_tiles=4 \
--target-mrvl-mattr="hw -quantize=fp16 -wb_pin_ocm=1" \
--cross-compiler aarch64-linux-gnu-gcc \
--output model.tar \
mnist-12.onnx
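When the compile step is scripted, it can help to assemble the same command as an argument list. The sketch below rebuilds the example above using only the flags shown in this guide; the helper name build_tvmc_compile_args and its defaults are hypothetical. Note that the shell quotes in --target="mrvl, llvm" disappear once the value is a single list element.

```python
import shlex


def build_tvmc_compile_args(model_file, output="model.tar", num_tiles=4,
                            mattr="hw -quantize=fp16 -wb_pin_ocm=1",
                            cross_compile=True):
    """Hypothetical helper: assemble a `tvmc compile` argv for the mrvl + LLVM targets."""
    args = ["python3", "-m", "tvm.driver.tvmc", "compile",
            "--target=mrvl, llvm"]
    if cross_compile:
        # LLVM options for the ARMv9 (Neoverse N2) cores of cn10ka, as in the example.
        args += ["--target-llvm-mtriple=aarch64-linux-gnu",
                 "--target-llvm-mcpu=neoverse-n2"]
    args.append(f"--target-mrvl-num_tiles={num_tiles}")
    if mattr:
        # One argv element, so the spaces inside mattr need no extra quoting.
        args.append(f"--target-mrvl-mattr={mattr}")
    if cross_compile:
        args += ["--cross-compiler", "aarch64-linux-gnu-gcc"]
    args += ["--output", output, model_file]
    return args


args = build_tvmc_compile_args("mnist-12.onnx")
print(shlex.join(args))  # shell-quoted form, for logging
# To run it: subprocess.run(args, check=True)
```

Calling build_tvmc_compile_args("model.onnx", mattr=None, cross_compile=False) yields the simpler simulator command used in section 4.1.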
3.3 TVMC Compiler: mrvl-specific Command Line Options
--target-mrvl-mcpu
--target-mrvl-num_tiles
--target-mrvl-mattr
Description of mrvl options
- mcpu:
The CPU class of Marvell(R) ML Inference Processor; possible values = {cn10ka, cnf10kb}; defaults to cn10ka
- num_tiles:
Maximum number of tiles that may be used; possible values = {1, 2, 4, 8}; defaults to 8
- mattr:
Attributes for mrvl; possible values = {quantize, wb_pin_ocm, run_mode}
mattr specifies the data type, code generation options and optimizations.
The supported attributes are:
1. quantize
Specify the data type. Possible values = {fp16, int8}. The default is fp16; int8 support is a work in progress, and full support will be added in a future PR.
2. wb_pin_ocm
Optimize runtime by preloading a model’s weights and biases into the on-chip memory. Possible values = {0, 1}. Default is 0 (no preload).
3. run_mode
Specify whether to compile for the simulator or for the target hardware (Octeon). Possible values = {sim, hw}. Default is sim (software simulator).
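The documented option values and defaults above can be checked before invoking tvmc. The sketch below validates them and composes an mattr string. The helper names are hypothetical, and the mattr layout is an assumption that mirrors the section 3.2 example: run_mode appears as a bare leading token and the other attributes as "-name=value" flags.

```python
# Allowed values and defaults as listed in section 3.3.
ALLOWED = {
    "mcpu": {"cn10ka", "cnf10kb"},
    "num_tiles": {1, 2, 4, 8},
    "quantize": {"fp16", "int8"},
    "wb_pin_ocm": {0, 1},
    "run_mode": {"sim", "hw"},
}
DEFAULTS = {"mcpu": "cn10ka", "num_tiles": 8, "quantize": "fp16",
            "wb_pin_ocm": 0, "run_mode": "sim"}


def resolve_mrvl_options(**overrides):
    """Hypothetical helper: merge overrides into the defaults, rejecting unknown values."""
    opts = dict(DEFAULTS)
    for key, value in overrides.items():
        if key not in ALLOWED:
            raise KeyError(f"unknown mrvl option: {key}")
        if value not in ALLOWED[key]:
            raise ValueError(f"{key}={value!r} not in {sorted(ALLOWED[key], key=str)}")
        opts[key] = value
    return opts


def format_mattr(opts):
    # Assumed layout, copied from the section 3.2 example ("hw -quantize=fp16 -wb_pin_ocm=1").
    return f"{opts['run_mode']} -quantize={opts['quantize']} -wb_pin_ocm={opts['wb_pin_ocm']}"


opts = resolve_mrvl_options(num_tiles=4, run_mode="hw", wb_pin_ocm=1)
print(format_mattr(opts))
```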
4. Compile ONNX model using the TVMC flow
In the TVMC mrvl flow, the model is partitioned into Marvell and LLVM regions. Building each partitioned Marvell subgraph generates a serialized nodes.json and const.json. The partitioned nodes.json represents the model graph in a form suitable for the Marvell compiler (mrvl-tmlc), which compiles it into a model binary containing MLIP instructions.
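To illustrate what "Marvell and LLVM regions" means, here is a deliberately simplified sketch. It assumes a linear sequence of operators and a hypothetical set of MLIP-supported operators, then groups consecutive operators into regions. TVM's real partitioning works on the Relay graph and uses its own mrvl support rules, so this is only a conceptual model.

```python
from itertools import groupby

# Hypothetical set of MLIP-offloadable operators, for illustration only.
MRVL_SUPPORTED = {"nn.conv2d", "add", "nn.relu", "nn.max_pool2d", "nn.dense"}


def partition_ops(ops, supported=MRVL_SUPPORTED):
    """Split a linear op sequence into contiguous ("mrvl" or "llvm", [ops]) regions."""
    return [
        (target, list(group))
        for target, group in groupby(
            ops, key=lambda op: "mrvl" if op in supported else "llvm")
    ]


ops = ["nn.conv2d", "add", "nn.relu", "reshape", "nn.dense", "add"]
for target, region in partition_ops(ops):
    print(target, region)
```

In the real flow, each "mrvl" region corresponds to a subgraph that is serialized and handed to mrvl-tmlc, and each "llvm" region is compiled by LLVM.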
4.1 Compile and Run ONNX model for Simulator + LLVM / x86_64 target
Model Compilation for Simulator + LLVM / x86_64 target
python3 -m tvm.driver.tvmc compile --target="mrvl, llvm" \
--target-mrvl-num_tiles=4 --output model.tar model.onnx
Run TVM models on x86_64 host using MLIP Simulator
The generated model binary is executed on Marvell’s MLIP simulator (mrvl-mlsim).
python3 -m tvm.driver.tvmc run --inputs infer.npz --outputs predict.npz model.tar --number=0
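tvmc run reads its inputs from an .npz archive whose keys are the model's input names. A minimal sketch for producing infer.npz with NumPy follows. It assumes the mnist-12 model from section 5, whose input is named "Input3" with shape (1, 1, 28, 28), and uses an all-zeros placeholder image instead of real data. The helper name write_tvmc_inputs is hypothetical.

```python
import numpy as np


def write_tvmc_inputs(path, inputs):
    """Hypothetical helper: save named input arrays to an .npz file for `tvmc run --inputs`."""
    # Each key must match a model input name.
    np.savez(path, **inputs)


# Placeholder data: one all-zeros 28x28 grayscale image in NCHW layout.
image = np.zeros((1, 1, 28, 28), dtype="float32")
write_tvmc_inputs("infer.npz", {"Input3": image})
```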
4.2 Compile and Run ONNX model for Octeon target
Model Compilation for Octeon target
Please refer to section 3.2 for the example command line.
Run TVM models on the Octeon Target
The cross-compiled binary can be run on the target hardware using the tvmc run command. Alternatively, the RPC flow enables remote execution on the target device from your local machine; see https://tvm.apache.org/docs/how_to/tutorials/cross_compilation_and_rpc.html
python3 -m tvm.driver.tvmc run --inputs infer.npz --outputs predict.npz model.tar
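The --outputs option writes the results to an .npz archive (predict.npz here). The sketch below summarizes each stored array by its shape and the index of its largest element. It does not assume any particular output key names and simply iterates over whatever tvmc wrote. The function name summarize_outputs is hypothetical.

```python
import numpy as np


def summarize_outputs(path):
    """Hypothetical helper: return {output_name: (shape, argmax_index)} for an .npz outputs file."""
    with np.load(path) as outputs:
        return {name: (outputs[name].shape, int(np.argmax(outputs[name])))
                for name in outputs.files}


# Usage after `tvmc run ... --outputs predict.npz`:
# for name, (shape, top) in summarize_outputs("predict.npz").items():
#     print(f"{name}: shape={shape}, argmax={top}")
```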
5. Compiling a model using Python APIs
In addition to TVMC, models can also be compiled and run using the TVM Python API. Below is an example that compiles and runs the MNIST model.
Download the MNIST model
cd $HOME
wget https://github.com/onnx/models/raw/main/validated/vision/classification/mnist/model/mnist-12.onnx
Import TVM and the other required modules
import tvm, onnx
import numpy as np
import tvm.relay as relay
from tvm.contrib import graph_executor
from tvm.relay.op.contrib.mrvl import partition_for_mrvl
from tvm.relay.build_module import build
from keras.datasets import mnist
Load the ONNX model file
onnx_model = onnx.load("mnist-12.onnx")
Create a Relay graph from the MNIST model
shape_dict = {'Input3' : (1,1,28,28)}
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
Define the option dictionary and partition the model
Annotate and partition the graph for mrvl. All operations supported by mrvl are marked and offloaded to the MLIP hardware accelerator. The remaining operations go through the regular LLVM compilation and code generation for the CPU target given by tvm_target.
tvm_target = "llvm"
option_dict = {'num_tiles': 4}
mod = partition_for_mrvl(mod, params, **option_dict)
Build the Relay Graph
Build the Relay graph, using the new module returned by partition_for_mrvl.
with tvm.transform.PassContext(opt_level=3, config={"relay.ext.mrvl.options": option_dict}):
    model_lib = relay.build(mod, tvm_target, params=params)
Create a graph executor runtime module from the model library
dev = tvm.cpu()
model_rt_graph = graph_executor.GraphModule(model_lib["default"](dev))
Get the test data and initialize the model input
(train_X, train_y), (test_X, test_y) = mnist.load_data()
image = tvm.nd.array(test_X[0].reshape(1, 1, 28, 28).astype("float32") / 255)
inputs_dict = {}
inputs_dict["Input3"] = image
model_rt_graph.set_input(**inputs_dict)
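The input preparation above converts one 28x28 uint8 MNIST image into a float32 NCHW tensor scaled to [0, 1]. The same transform is factored out here as a reusable function so it can be checked without Keras or TVM. The name preprocess_mnist is hypothetical, and the synthetic all-white image stands in for test_X[0]. In the guide, the result is then wrapped with tvm.nd.array.

```python
import numpy as np


def preprocess_mnist(image_u8):
    """Hypothetical helper: 28x28 uint8 image -> (1, 1, 28, 28) float32 tensor in [0, 1]."""
    if image_u8.shape != (28, 28):
        raise ValueError(f"expected a 28x28 image, got {image_u8.shape}")
    return (image_u8.astype("float32") / 255.0).reshape(1, 1, 28, 28)


# Synthetic stand-in for test_X[0] so the sketch runs on its own.
fake = np.full((28, 28), 255, dtype=np.uint8)
batch = preprocess_mnist(fake)
print(batch.shape, batch.dtype, batch.max())
```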
Run Inference and print the output
model_rt_graph.run()
output_tensor = model_rt_graph.get_output(0).numpy()
print(output_tensor)
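The printed tensor contains raw scores, one per digit. The following sketch assumes the mnist-12 output holds 10 unnormalized class scores. It converts them to probabilities with a numerically stable softmax and returns the most likely digits. The function name top_k_digits is hypothetical, and the example scores are made up. In the guide, you would pass output_tensor instead.

```python
import numpy as np


def top_k_digits(logits, k=3):
    """Hypothetical helper: return the k most likely digits with their softmax probabilities."""
    scores = np.asarray(logits, dtype=np.float64).reshape(-1)
    exp = np.exp(scores - scores.max())   # subtract the max for numerical stability
    probs = exp / exp.sum()
    order = np.argsort(probs)[::-1][:k]   # indices by descending probability
    return [(int(i), float(probs[i])) for i in order]


# Made-up scores for illustration; use output_tensor in practice.
example = np.array([[0.0, 1.0, 5.0, 0.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]])
print(top_k_digits(example))
```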