7. Running TVM on bare metal Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN

Author: Grant Watson

This section contains an example of how to use TVM to run a model on an Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN, using bare metal. The Cortex(R)-M55 is a small, low-power CPU designed for use in embedded devices. CMSIS-NN is a collection of kernels optimized for Arm(R) Cortex(R)-M CPUs. The Ethos(TM)-U55 is a microNPU, specifically designed to accelerate ML inference in resource-constrained embedded devices.

In order to run the demo application without having access to a Cortex(R)-M55 and Ethos(TM)-U55 development board, we will be running our sample application on a Fixed Virtual Platform (FVP). The FVP based on Arm(R) Corstone(TM)-300 software, models a hardware system containing a Cortex(R)-M55 and Ethos(TM)-U55. It provides a programmer’s view that is suitable for software development.

In this tutorial, we will be compiling a MobileNet v1 model and instructing TVM to offload operators to the Ethos(TM)-U55 where possible.

Obtaining TVM

To obtain TVM for you platform, please visit https://tlcpack.ai/ and follow the instructions. Once TVM has been installed correctly, you should have access to tvmc from the command line.

Typing tvmc on the command line should display the following:

usage: tvmc [-h] [-v] [--version] {tune,compile,run} ...

TVM compiler driver

optional arguments:
  -h, --help          show this help message and exit
  -v, --verbose       increase verbosity
  --version           print the version and exit

    tune              auto-tune a model
    compile           compile a model.
    run               run a compiled module

TVMC - TVM driver command-line interface

Installing additional python dependencies

In order to run the demo, you will need some additional python packages. These can be installed by using the requirements.txt file below:


These packages can be installed by running the following from the command line:

pip install -r requirements.txt

Obtaining the Model

For this tutorial, we will be working with MobileNet v1. MobileNet v1 is a convolutional neural network designed to classify images, that has been optimized for edge devices. The model we will be using has been pre-trained to classify images into one of 1001 different categories. The network has an input image size of 224x224 so any input images will need to be resized to those dimensions before being used.

For this tutorial we will be using the model in Tflite format.

mkdir -p ./build
cd build
wget https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_224_quant.tgz
gunzip mobilenet_v1_1.0_224_quant.tgz
tar xvf mobilenet_v1_1.0_224_quant.tar

Compiling the model for Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN

Once we’ve downloaded the MobileNet v1 model, the next step is to compile it. To accomplish that, we are going to use tvmc compile. The output we get from the compilation process is a TAR package of the model compiled to the Model Library Format (MLF) for our target platform. We will be able to run that model on our target device using the TVM runtime.

tvmc compile --target=ethos-u,cmsis-nn,c \
             --target-ethos-u-accelerator_config=ethos-u55-256 \
             --target-cmsis-nn-mcpu=cortex-m55 \
             --target-c-mcpu=cortex-m55 \
             --runtime=crt \
             --executor=aot \
             --executor-aot-interface-api=c \
             --executor-aot-unpacked-api=1 \
             --pass-config tir.usmp.enable=1 \
             --pass-config tir.usmp.algorithm=hill_climb \
             --pass-config tir.disable_storage_rewrite=1 \
             --pass-config tir.disable_vectorize=1 \
             ./mobilenet_v1_1.0_224_quant.tflite \


Explanation of tvmc compile arguments:

  • --target=ethos-u,cmsis-nn,c : offload operators to the microNPU where possible, falling back to CMSIS-NN and finally generated C code where an operator is not supported on the microNPU..

  • --target-ethos-u-accelerator_config=ethos-u55-256 : specifies the microNPU configuration

  • --target-c-mcpu=cortex-m55 : Cross-compile for the Cortex(R)-M55.

  • --runtime=crt : Generate glue code to allow operators to work with C runtime.

  • --executor=aot : Use Ahead Of Time compiltaion instead of the Graph Executor.

  • --executor-aot-interface-api=c : Generate a C-style interface with structures designed for integrating into C apps at the boundary.

  • --executor-aot-unpacked-api=1 : Use the unpacked API internally.

  • --pass-config tir.usmp.enable=1 : Enable Unified Static Memory Planning

  • --pass-config tir.usmp.algorithm=hill_climb : Use the hill-climb algorithm for USMP

  • --pass-config tir.disable_storage_rewrite=1 : Disable storage rewrite

  • --pass-config tir.disable_vectorize=1 : Disable vectorize since there are no standard vectorized types in C.

  • ./mobilenet_v1_1.0_224_quant.tflite : The TFLite model that is being compiled.

  • --output-format=mlf : Output should be generated in the Model Library Format.


If you don’t want to make use of the microNPU and want to offload

operators to CMSIS-NN only:

  • Use --target=cmsis-nn,c in place of --target=ethos-u,cmsis-nn,c

  • Remove the microNPU config parameter --target-ethos-u-accelerator_config=ethos-u55-256

Extracting the generated code into the current directory

tar xvf module.tar

Getting ImageNet labels

When running MobileNet v1 on an image, the result is an index in the range 0 to 1000. In order to make our application a little more user friendly, instead of just displaying the category index, we will display the associated label. We will download these image labels into a text file now and use a python script to include them in our C application later.

curl -sS  https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/lite/java/demo/app/src/main/assets/labels_mobilenet_quant_v1_224.txt \
-o ./labels_mobilenet_quant_v1_224.txt

Getting the input image

As input for this tutorial, we will use the image of a cat, but you can substitute an image of your choosing.


We download the image into the build directory and we will use a python script in the next step to convert the image into an array of bytes in a C header file.

curl -sS https://s3.amazonaws.com/model-server/inputs/kitten.jpg -o ./kitten.jpg

Pre-processing the image

The following script will create 2 C header files in the src directory:

  • inputs.h - The image supplied as an argument to the script will be converted to an array of integers for input to our MobileNet v1 model.

  • outputs.h - An integer array of zeroes will reserve 1001 integer values for the output of inference.

 #!python ./convert_image.py
 import os
 import pathlib
 import re
 import sys
 from PIL import Image
 import numpy as np

 def create_header_file(name, section, tensor_name, tensor_data, output_path):
     This function generates a header file containing the data from the numpy array provided.
     file_path = pathlib.Path(f"{output_path}/" + name).resolve()
     # Create header file with npy_data as a C array
     raw_path = file_path.with_suffix(".h").resolve()
     with open(raw_path, "w") as header_file:
             "#include <tvmgen_default.h>\n"
             + f"const size_t {tensor_name}_len = {tensor_data.size};\n"
             + f'uint8_t {tensor_name}[] __attribute__((section("{section}"), aligned(16))) = "'
         data_hexstr = tensor_data.tobytes().hex()
         for i in range(0, len(data_hexstr), 2):

 def create_headers(image_name):
     This function generates C header files for the input and output arrays required to run inferences
     img_path = os.path.join("./", f"{image_name}")

     # Resize image to 224x224
     resized_image = Image.open(img_path).resize((224, 224))
     img_data = np.asarray(resized_image).astype("float32")

     # Convert input to NCHW
     img_data = np.transpose(img_data, (2, 0, 1))

     # Create input header file
     input_data = img_data.astype(np.uint8)
     create_header_file("inputs", "ethosu_scratch", "input", input_data, "./include")
     # Create output header file
     output_data = np.zeros([1001], np.uint8)

 if __name__ == "__main__":

Run the script from the command line:

python convert_image.py ./kitten.jpg

Pre-processing the labels

The following script will create a labels.h header file in the src directory. The labels.txt file that we downloaded previously will be turned into an array of strings. This array will be used to display the label that our image has been classified as.

 #!python ./convert_labels.py
 import os
 import pathlib
 import sys

 def create_labels_header(labels_file, section, output_path):
     This function generates a header file containing the ImageNet labels as an array of strings
     labels_path = pathlib.Path(labels_file).resolve()
     file_path = pathlib.Path(f"{output_path}/labels.h").resolve()

     with open(labels_path) as f:
         labels = f.readlines()

     with open(file_path, "w") as header_file:
         header_file.write(f'char* labels[] __attribute__((section("{section}"), aligned(16))) = {{')

         for _, label in enumerate(labels):


 if __name__ == "__main__":
     create_labels_header(sys.argv[1], "ethosu_scratch", "./include")

Run the script from the command line:

python convert_labels.py

Writing the demo application

The following C application will run a single inference of the MobileNet v1 model on the image that we downloaded and converted to an array of integers previously. Since the model was compiled with a target of “ethos-u …”, operators supported by the Ethos(TM)-U55 NPU will be offloaded for acceleration. Once the application is built and run, our test image should be correctly classied as a “tabby” and the result should be displayed on the console. This file should be placed in ./src

 #include <stdio.h>
 #include <tvm_runtime.h>

 #include "ethosu_mod.h"
 #include "uart_stdout.h"

 // Header files generated by convert_image.py and convert_labels.py
 #include "inputs.h"
 #include "labels.h"
 #include "outputs.h"

 int abs(int v) { return v * ((v > 0) - (v < 0)); }

 int main(int argc, char** argv) {
   printf("Starting Demo\n");

   printf("Allocating memory\n");
   StackMemoryManager_Init(&app_workspace, g_aot_memory, WORKSPACE_SIZE);

   printf("Running inference\n");
   struct tvmgen_default_outputs outputs = {
       .output = output,
   struct tvmgen_default_inputs inputs = {
       .input = input,
   struct ethosu_driver* driver = ethosu_reserve_driver();
   struct tvmgen_default_devices devices = {
       .ethos_u = driver,
   tvmgen_default_run(&inputs, &outputs, &devices);

   // Calculate index of max value
   uint8_t max_value = 0;
   int32_t max_index = -1;
   for (unsigned int i = 0; i < output_len; ++i) {
     if (output[i] > max_value) {
       max_value = output[i];
       max_index = i;
   printf("The image has been classified as '%s'\n", labels[max_index]);

   // The FVP will shut down when it receives "EXITTHESIM" on the UART
   while (1 == 1)
   return 0;

In addition, you will need these header files from github in your ./include directory:

include files


If you’d like to use FreeRTOS for task scheduling and queues, a sample application can be found here demo_freertos.c <https://github.com/apache/tvm/blob/main/apps/microtvm/ethosu/src/demo_freertos.c>

Creating the linker script

We need to create a linker script that will be used when we build our application in the following section. The linker script tells the linker where everything should be placed in memory. The corstone300.ld linker script below should be placed in your working directory.

An example linker script for the FVP can be found here corstone300.ld


The code generated by TVM will place the model weights and the Arm(R) Ethos(TM)-U55 command stream in a section named ethosu_scratch. For a model the size of MobileNet v1, the weights and command stream will not fit into the limited SRAM available. For this reason it’s important that the linker script places the ethosu_scratch section into DRAM (DDR).


Before building and running the application, you will need to update your PATH environment variable to include the path to cmake 3.19.5 and the FVP. For example if you’ve installed these in /opt/arm , then you would do the following:

export PATH=/opt/arm/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4:/opt/arm/cmake/bin:$PATH

Building the demo application using make

We can now build the demo application using make. The Makefile should be placed in your working directory before running make on the command line:

An example Makefile can be found here: Makefile


If you’re using FreeRTOS, the Makefile builds it from the specified FREERTOS_PATH:

make FREERTOS_PATH=<FreeRTOS directory>

Running the demo application

Finally, we can run our demo appliction on the Fixed Virtual Platform (FVP), by using the following command:

FVP_Corstone_SSE-300_Ethos-U55 -C cpu0.CFGDTCMSZ=15 \
-C cpu0.CFGITCMSZ=15 -C mps3_board.uart0.out_file=\"-\" -C mps3_board.uart0.shutdown_tag=\"EXITTHESIM\" \
-C mps3_board.visualisation.disable-visualisation=1 -C mps3_board.telnetterminal0.start_telnet=0 \
-C mps3_board.telnetterminal1.start_telnet=0 -C mps3_board.telnetterminal2.start_telnet=0 -C mps3_board.telnetterminal5.start_telnet=0 \
-C ethosu.extra_args="--fast" \
-C ethosu.num_macs=256 ./build/demo

You should see the following output displayed in your console window:

telnetterminal0: Listening for serial connection on port 5000
telnetterminal1: Listening for serial connection on port 5001
telnetterminal2: Listening for serial connection on port 5002
telnetterminal5: Listening for serial connection on port 5003

    Ethos-U rev dedfa618 --- Jan 12 2021 23:03:55
    (C) COPYRIGHT 2019-2021 Arm Limited

Starting Demo
ethosu_init. base_address=0x48102000, fast_memory=0x0, fast_memory_size=0, secure=1, privileged=1
ethosu_register_driver: New NPU driver at address 0x20000de8 is registered.
Soft reset NPU
Allocating memory
Running inference
ethosu_find_and_reserve_driver - Driver 0x20000de8 reserved.
ethosu_invoke OPTIMIZER_CONFIG
Optimizer release nbr: 0 patch: 1
Optimizer config cmd_stream_version: 0 macs_per_cc: 8 shram_size: 48 custom_dma: 0
Optimizer config Ethos-U version: 1.0.6
Ethos-U config cmd_stream_version: 0 macs_per_cc: 8 shram_size: 48 custom_dma: 0
Ethos-U version: 1.0.6
ethosu_invoke NOP
ethosu_invoke NOP
ethosu_invoke NOP
ethosu_invoke COMMAND_STREAM
handle_command_stream: cmd_stream=0x61025be0, cms_length 1181
QBASE=0x0000000061025be0, QSIZE=4724, base_pointer_offset=0x00000000
CMD=0x000Interrupt. status=0xffff0022, qread=4724
ethosu_release_driver - Driver 0x20000de8 released
The image has been classified as 'tabby'
Info: /OSCI/SystemC: Simulation stopped by user.

You should see near the end of the output that the image has been correctly classified as ‘tabby’.

Gallery generated by Sphinx-Gallery