```bash
# Installs TVM version 0.12.0 from PyPI. If you wish to build
# from source, see https://tvm.apache.org/docs/install/from_source.html
pip install apache-tvm==0.12.0
```



# 7. Running TVM on bare metal Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN
**Author**:
[Grant Watson](https://github.com/grant-arm)

This section contains an example of how to use TVM to run a model
on an Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN, on bare metal (without an operating system).
The Cortex(R)-M55 is a small, low-power CPU designed for use in embedded
devices. CMSIS-NN is a collection of kernels optimized for Arm(R) Cortex(R)-M CPUs.
The Ethos(TM)-U55 is a microNPU, specifically designed to accelerate
ML inference in resource-constrained embedded devices.

To run the demo application without access to a Cortex(R)-M55
and Ethos(TM)-U55 development board, we will run our sample application
on a Fixed Virtual Platform (FVP). The FVP, based on Arm(R) Corstone(TM)-300
software, models a hardware system containing a Cortex(R)-M55 and Ethos(TM)-U55.
It provides a programmer's view that is suitable for software development.

In this tutorial, we will be compiling a MobileNet v1 model and instructing
TVM to offload operators to the Ethos(TM)-U55 where possible.


## Obtaining TVM

To obtain TVM for your platform, please visit https://tlcpack.ai/ and follow the
instructions. Once TVM has been installed correctly, you should have access to
``tvmc`` from the command line.

Typing ``tvmc`` on the command line should display the following:

```text
usage: tvmc [-h] [-v] [--version] {tune,compile,run} ...

TVM compiler driver

optional arguments:
  -h, --help          show this help message and exit
  -v, --verbose       increase verbosity
  --version           print the version and exit

commands:
  {tune,compile,run}
    tune              auto-tune a model
    compile           compile a model.
    run               run a compiled module

TVMC - TVM driver command-line interface
```


## Installing additional Python dependencies

To run the demo, you will need some additional Python packages.
These are listed in a ``requirements.txt`` file and can be installed by running
the following from the command line:

```bash
pip install -r requirements.txt
```
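You may want to confirm the environment before continuing. The sketch below reads simple ``name==version`` pins and reports which packages are missing or installed at a different version. The pin format is an assumption; this does not reproduce the contents of the tutorial's ``requirements.txt``. It uses only the standard library's ``importlib.metadata``, and the helper name ``check_requirements`` is hypothetical.

```python
import re
from importlib import metadata
from typing import Iterable, List, Tuple

def check_requirements(lines: Iterable[str]) -> Tuple[List[str], List[Tuple[str, str, str]]]:
    """Return (missing, mismatched) for bare names and exact 'name==version' pins.

    Comments and blank lines are ignored; lines using other specifiers
    (>=, ~=, extras, environment markers) are skipped, not interpreted.
    Versions are compared as exact strings.
    """
    missing: List[str] = []
    mismatched: List[Tuple[str, str, str]] = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        m = re.fullmatch(r"([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:==\s*(\S+))?", line)
        if m is None:
            continue  # unsupported specifier: skip rather than guess
        name, wanted = m.group(1), m.group(2)
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
            continue
        if wanted is not None and installed != wanted:
            mismatched.append((name, wanted, installed))
    return missing, mismatched
```

For example, `check_requirements(open("requirements.txt").read().splitlines())` returns two lists. If both are empty, every pinned package was found at its pinned version.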


## Obtaining the Model

For this tutorial, we will be working with MobileNet v1.
MobileNet v1 is a convolutional neural network, optimized for edge devices,
that is designed to classify images. The model we will be using has been
pre-trained to classify images into one of 1001 different categories.
The network has an input image size of 224x224, so any input images will need
to be resized to those dimensions before being used.

For this tutorial, we will be using the model in TFLite format.

```bash
mkdir -p ./build
cd build
wget https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_224_quant.tgz
gunzip mobilenet_v1_1.0_224_quant.tgz
tar xvf mobilenet_v1_1.0_224_quant.tar
```


## Compiling the model for Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN

Once we've downloaded the MobileNet v1 model, the next step is to compile it.
To accomplish that, we are going to use ``tvmc compile``. The output we get from
the compilation process is a TAR package of the model compiled to the Model
Library Format (MLF) for our target platform. We will be able to run that model
on our target device using the TVM runtime.

```bash
tvmc compile --target=ethos-u,cmsis-nn,c \
             --target-ethos-u-accelerator_config=ethos-u55-256 \
             --target-cmsis-nn-mcpu=cortex-m55 \
             --target-c-mcpu=cortex-m55 \
             --runtime=crt \
             --executor=aot \
             --executor-aot-interface-api=c \
             --executor-aot-unpacked-api=1 \
             --pass-config tir.usmp.enable=1 \
             --pass-config tir.usmp.algorithm=hill_climb \
             --pass-config tir.disable_storage_rewrite=1 \
             --pass-config tir.disable_vectorize=1 \
             ./mobilenet_v1_1.0_224_quant.tflite \
             --output-format=mlf
```


<div class="alert alert-info"><h4>Note</h4><p>Explanation of tvmc compile arguments:

  * ``--target=ethos-u,cmsis-nn,c`` : Offload operators to the microNPU where possible, falling back to CMSIS-NN and finally to generated C code where an operator is not supported on the microNPU.

  * ``--target-ethos-u-accelerator_config=ethos-u55-256`` : Specifies the microNPU configuration, here an Ethos(TM)-U55 with 256 MACs per cycle.

  * ``--target-cmsis-nn-mcpu=cortex-m55`` : Tells the CMSIS-NN target which CPU the kernels will run on, the Cortex(R)-M55.

  * ``--target-c-mcpu=cortex-m55`` : Cross-compile for the Cortex(R)-M55.

  * ``--runtime=crt`` : Generate glue code to allow operators to work with the C runtime.

  * ``--executor=aot`` : Use Ahead Of Time compilation instead of the Graph Executor.

  * ``--executor-aot-interface-api=c`` : Generate a C-style interface with structures designed for integrating into C apps at the boundary.

  * ``--executor-aot-unpacked-api=1`` : Use the unpacked API internally.

  * ``--pass-config tir.usmp.enable=1`` : Enable Unified Static Memory Planning (USMP).

  * ``--pass-config tir.usmp.algorithm=hill_climb`` : Use the hill-climb algorithm for USMP.

  * ``--pass-config tir.disable_storage_rewrite=1`` : Disable storage rewrite.

  * ``--pass-config tir.disable_vectorize=1`` : Disable vectorization, since there are no standard vectorized types in C.

  * ``./mobilenet_v1_1.0_224_quant.tflite`` : The TFLite model that is being compiled.

  * ``--output-format=mlf`` : Output should be generated in the Model Library Format.</p></div>
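The fallback order in ``--target=ethos-u,cmsis-nn,c`` can be pictured with the toy model below. It is only an illustration. TVM's real implementation partitions the Relay graph into regions using per-backend pattern matching. The operator names and support sets here are hypothetical placeholders, not the actual Ethos(TM)-U55 or CMSIS-NN support lists. Each operator goes to the first target in the list that claims it, and ``c`` accepts whatever is left.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical support sets, for illustration only. The real decisions are
# made by TVM's per-backend partitioning passes, not by a table like this.
SUPPORTED: Dict[str, Set[str]] = {
    "ethos-u": {"conv2d", "depthwise_conv2d", "avg_pool2d"},
    "cmsis-nn": {"conv2d", "depthwise_conv2d", "softmax"},
}

def assign_targets(ops: Sequence[str], order: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each operator with the first target in `order` that accepts it.

    The generic "c" target accepts every operator, so it should come last.
    """
    assignment: List[Tuple[str, str]] = []
    for op in ops:
        for target in order:
            if target == "c" or op in SUPPORTED.get(target, set()):
                assignment.append((op, target))
                break
        else:
            raise ValueError(f"no target in {list(order)} accepts {op!r}")
    return assignment
```

Dropping ``ethos-u`` from the order, as in the CMSIS-NN-only configuration, shifts the operators it would have claimed to the next target that supports them.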




<div class="alert alert-info"><h4>Note</h4><p>If you don't want to make use of the microNPU and want to offload
   operators to CMSIS-NN only:

  * Use ``--target=cmsis-nn,c`` in place of ``--target=ethos-u,cmsis-nn,c``

  * Remove the microNPU config parameter ``--target-ethos-u-accelerator_config=ethos-u55-256``</p></div>




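## Extracting the generated code into the current directory

By default, ``tvmc compile`` writes its output to ``module.tar`` in the current directory. Extract it with:
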
```bash
tar xvf module.tar
```
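The same extraction can be done from Python with the standard ``tarfile`` module. This is handy for checking what the Model Library Format archive contains before unpacking it. This is a minimal sketch. The exact top-level entries vary between TVM versions, so no particular layout is assumed, and the helper names are hypothetical.

```python
import tarfile
from pathlib import PurePosixPath
from typing import List

def top_level_entries(archive: str) -> List[str]:
    """List the distinct top-level files and directories in a TAR archive."""
    tops = set()
    with tarfile.open(archive) as tar:
        for member in tar.getmembers():
            parts = PurePosixPath(member.name).parts
            if parts:  # skip "." entries, which have no path components
                tops.add(parts[0])
    return sorted(tops)

def extract_archive(archive: str, dest: str = ".") -> None:
    """Extract the archive into `dest`, using the safe 'data' filter when available."""
    with tarfile.open(archive) as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:  # older Python without extraction filters
            tar.extractall(dest)
```

For example, `top_level_entries("module.tar")` shows what was generated, and `extract_archive("module.tar")` is equivalent to the `tar xvf` command above.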


## Getting ImageNet labels

When running MobileNet v1 on an image, the result is an index in the range 0 to
1000. To make our application a little more user-friendly, we will display the
associated label instead of just the category index. We will download these
image labels into a text file now, and later use a Python script to include
them in our C application.

```bash
curl -sS  https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/lite/java/demo/app/src/main/assets/labels_mobilenet_quant_v1_224.txt \
-o ./labels_mobilenet_quant_v1_224.txt
```


## Getting the input image

As input for this tutorial, we will use the image of a cat, but you can
substitute an image of your choosing.

<img src="https://s3.amazonaws.com/model-server/inputs/kitten.jpg" height="224px" width="224px" align="center">

We download the image into the build directory. In the next step, a Python script
will convert the image into an array of bytes in a C header file.

```bash
curl -sS https://s3.amazonaws.com/model-server/inputs/kitten.jpg -o ./kitten.jpg
```


## Pre-processing the image

The ``convert_image.py`` script will create two C header files in the src directory:

* ``inputs.h`` - The image supplied as an argument to the script will be converted
  to an array of integers for input to our MobileNet v1 model.
* ``outputs.h`` - An integer array of zeroes will reserve 1001 integer values
  for the output of inference.

Run the script from the command line:

```bash
python convert_image.py ./kitten.jpg
```
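The ``convert_image.py`` script itself is not reproduced here. The helpers below are a rough sketch of the kind of conversion it performs: they resize an image to 224x224 RGB with Pillow and render a list of integers as a C header. Everything in the sketch is an assumption. The function names, the ``uint8_t`` element type, the ``<name>_len`` variable and the optional GCC ``section``/``aligned`` attribute are illustrative only. The real script may also apply a different offset or type, depending on the model's quantization parameters.

```python
from typing import List, Sequence, Tuple

def load_rgb_pixels(path: str, size: Tuple[int, int] = (224, 224)) -> List[int]:
    """Resize an image and return its pixels as interleaved R, G, B values (0-255)."""
    from PIL import Image  # Pillow, imported lazily so the header helper works without it
    with Image.open(path) as img:
        return list(img.convert("RGB").resize(size).tobytes())

def c_array_header(name: str, values: Sequence[int], ctype: str = "uint8_t",
                   section: str = "") -> str:
    """Render integers as a C header declaring `<name>_len` and `<name>[]`."""
    if not values:
        raise ValueError("C does not allow an empty array initializer")
    # Optional GCC attribute to place the array in a named linker section.
    attr = f' __attribute__((section("{section}"), aligned(16)))' if section else ""
    # 16 values per line keeps the generated header readable.
    rows = [", ".join(str(v) for v in values[i:i + 16])
            for i in range(0, len(values), 16)]
    return (
        "#include <stddef.h>\n"
        "#include <stdint.h>\n\n"
        f"const size_t {name}_len = {len(values)};\n"
        f"{ctype} {name}[{len(values)}]{attr} = {{\n  "
        + ",\n  ".join(rows)
        + "\n};\n"
    )
```

For example, `c_array_header("input", load_rgb_pixels("./kitten.jpg"))` would produce a 150,528-element array (224 x 224 x 3). `c_array_header("output", [0] * 1001)` would produce a zero-filled buffer for the 1001 class scores.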


## Pre-processing the labels

The ``convert_labels.py`` script will create a ``labels.h`` header file in the src directory.
The ``labels_mobilenet_quant_v1_224.txt`` file that we downloaded previously will be turned
into an array of strings. This array will be used to display the label that
our image has been classified as.

Run the script from the command line:

```bash
python convert_labels.py
```
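Similarly, the sketch below shows one way to turn the downloaded label file into a C array of string literals. The variable names ``labels`` and ``labels_count`` are assumptions for illustration. They need not match what ``convert_labels.py`` emits or what the demo expects. The key detail is escaping backslashes and double quotes so that each label is a valid C string literal.

```python
from typing import Iterable

def c_string_literal(text: str) -> str:
    """Quote text as a C string literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def labels_header(labels: Iterable[str], name: str = "labels") -> str:
    """Render one label per entry as a C array of string literals."""
    items = [c_string_literal(label.rstrip("\r\n")) for label in labels]
    return (
        f"static const char *{name}[] = {{\n  "
        + ",\n  ".join(items)
        + f"\n}};\nstatic const int {name}_count = {len(items)};\n"
    )
```

For example, `labels_header(open("labels_mobilenet_quant_v1_224.txt").read().splitlines())` should yield one entry per line of the label file, so the count can be checked against the 1001 model outputs.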


## Writing the demo application

The demo C application runs a single inference of the MobileNet v1
model on the image that we downloaded and converted to an array of integers
previously. Since the model was compiled with a target of "ethos-u ...",
operators supported by the Ethos(TM)-U55 NPU will be offloaded for acceleration.
Once the application is built and run, our test image should be correctly
classified as a "tabby" and the result should be displayed on the console.
The application's C source file should be placed in ``./src``.

In addition, you will need these header files from GitHub in your ``./include`` directory:

[include files](https://github.com/apache/tvm/tree/main/apps/microtvm/ethosu/include)
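The classification step that ends in the console message amounts to two things: picking the highest-scoring of the 1001 outputs, and looking up its label. The sketch below shows that logic in Python. It assumes the scores are integers as read back from the output buffer. The C demo's actual loop may differ in detail, for example in how ties are broken.

```python
from typing import Sequence, Tuple

def top_prediction(scores: Sequence[int], labels: Sequence[str]) -> Tuple[int, str]:
    """Return (index, label) of the highest score; ties go to the lowest index."""
    if not scores:
        raise ValueError("no scores")
    if len(scores) != len(labels):
        raise ValueError(f"{len(scores)} scores but {len(labels)} labels")
    # max() returns the first of several equal maxima, hence the tie rule above.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, labels[best]
```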



<div class="alert alert-info"><h4>Note</h4><p>If you'd like to use FreeRTOS for task scheduling and queues, a sample application can be found here:
  <a href="https://github.com/apache/tvm/blob/main/apps/microtvm/ethosu/src/demo_freertos.c">demo_freertos.c</a></p></div>



## Creating the linker script

We need to create a linker script that will be used when we build our application
in the following section. The linker script tells the linker where everything
should be placed in memory. The ``corstone300.ld`` linker script should be
placed in your working directory.

An example linker script for the FVP can be found here:
[corstone300.ld](https://github.com/apache/tvm/blob/main/apps/microtvm/ethosu/corstone300.ld)



<div class="alert alert-info"><h4>Note</h4><p>The code generated by TVM will place the model weights and the Arm(R)
  Ethos(TM)-U55 command stream in a section named ``ethosu_scratch``.
  For a model the size of MobileNet v1, the weights and command stream will not
  fit into the limited SRAM available. For this reason it's important that the
  linker script places the ``ethosu_scratch`` section into DRAM (DDR).</p></div>



<div class="alert alert-info"><h4>Note</h4><p>Before building and running the application, you will need to update your
  PATH environment variable to include the paths to CMake 3.19.5 and the FVP.
  For example, if you've installed these in ``/opt/arm``, then you would do
  the following:

  ``export PATH=/opt/arm/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4:/opt/arm/cmake/bin:$PATH``</p></div>
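Before building, a quick check that the tools are reachable can save a confusing failure later. This sketch uses ``shutil.which`` to report commands missing from ``PATH``, and parses the first line of ``cmake --version``. The tool names in the usage line come from the commands in this tutorial; the helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Iterable, List, Optional, Tuple

def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the commands that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

def parse_cmake_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None

def installed_cmake_version(cmake: str = "cmake") -> Optional[Tuple[int, ...]]:
    """Run `cmake --version` and parse it; raises if cmake cannot be executed."""
    out = subprocess.run([cmake, "--version"], capture_output=True, text=True, check=True)
    return parse_cmake_version(out.stdout)
```

Once PATH has been updated as described in the note, `missing_tools(["cmake", "make", "FVP_Corstone_SSE-300_Ethos-U55"])` should return an empty list.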




## Building the demo application using make

We can now build the demo application using make. Place the Makefile in your
working directory, then run ``make`` on the command line.

An example Makefile can be found here:
[Makefile](https://github.com/apache/tvm/blob/main/apps/microtvm/ethosu/Makefile)



<div class="alert alert-info"><h4>Note</h4><p>If you're using FreeRTOS, the Makefile builds it from the specified FREERTOS_PATH:
    ``make FREERTOS_PATH=<FreeRTOS directory>``</p></div>




## Running the demo application

Finally, we can run our demo application on the Fixed Virtual Platform (FVP)
by using the following command:

```bash
FVP_Corstone_SSE-300_Ethos-U55 -C cpu0.CFGDTCMSZ=15 \
-C cpu0.CFGITCMSZ=15 -C mps3_board.uart0.out_file=\"-\" -C mps3_board.uart0.shutdown_tag=\"EXITTHESIM\" \
-C mps3_board.visualisation.disable-visualisation=1 -C mps3_board.telnetterminal0.start_telnet=0 \
-C mps3_board.telnetterminal1.start_telnet=0 -C mps3_board.telnetterminal2.start_telnet=0 -C mps3_board.telnetterminal5.start_telnet=0 \
-C ethosu.extra_args="--fast" \
-C ethosu.num_macs=256 ./build/demo
```
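If you drive the simulator from a script, the same command can be built as an argument list and run with ``subprocess``. This sketch assumes the FVP binary is on ``PATH`` and the application is at ``./build/demo``. The parameter values are copied from the command above. The double quotes around ``"-"`` and ``"EXITTHESIM"`` are kept, because the shell's ``\"`` escapes pass them through literally, whereas ``"--fast"`` loses its quotes in the shell. The timeout value is an arbitrary assumption.

```python
import subprocess
from typing import List, Sequence, Tuple

FVP = "FVP_Corstone_SSE-300_Ethos-U55"

# The -C parameters from the shell command, with values exactly as the shell passes them.
FVP_PARAMS: Sequence[Tuple[str, str]] = (
    ("cpu0.CFGDTCMSZ", "15"),
    ("cpu0.CFGITCMSZ", "15"),
    ("mps3_board.uart0.out_file", '"-"'),
    ("mps3_board.uart0.shutdown_tag", '"EXITTHESIM"'),
    ("mps3_board.visualisation.disable-visualisation", "1"),
    ("mps3_board.telnetterminal0.start_telnet", "0"),
    ("mps3_board.telnetterminal1.start_telnet", "0"),
    ("mps3_board.telnetterminal2.start_telnet", "0"),
    ("mps3_board.telnetterminal5.start_telnet", "0"),
    ("ethosu.extra_args", "--fast"),
    ("ethosu.num_macs", "256"),
)

def fvp_argv(app: str = "./build/demo",
             params: Sequence[Tuple[str, str]] = FVP_PARAMS) -> List[str]:
    """Build the FVP argument list: the binary, one '-C key=value' per parameter, then the app."""
    argv = [FVP]
    for key, value in params:
        argv += ["-C", f"{key}={value}"]
    argv.append(app)
    return argv

def run_fvp(app: str = "./build/demo", timeout: float = 600.0) -> str:
    """Run the FVP and return its combined stdout and stderr as text."""
    result = subprocess.run(fvp_argv(app), stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True, timeout=timeout)
    return result.stdout
```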
You should see the following output displayed in your console window:

```text
telnetterminal0: Listening for serial connection on port 5000
telnetterminal1: Listening for serial connection on port 5001
telnetterminal2: Listening for serial connection on port 5002
telnetterminal5: Listening for serial connection on port 5003

    Ethos-U rev dedfa618 --- Jan 12 2021 23:03:55
    (C) COPYRIGHT 2019-2021 Arm Limited
    ALL RIGHTS RESERVED

Starting Demo
ethosu_init. base_address=0x48102000, fast_memory=0x0, fast_memory_size=0, secure=1, privileged=1
ethosu_register_driver: New NPU driver at address 0x20000de8 is registered.
CMD=0x00000000
Soft reset NPU
Allocating memory
Running inference
ethosu_find_and_reserve_driver - Driver 0x20000de8 reserved.
ethosu_invoke
CMD=0x00000004
QCONFIG=0x00000002
REGIONCFG0=0x00000003
REGIONCFG1=0x00000003
REGIONCFG2=0x00000013
REGIONCFG3=0x00000053
REGIONCFG4=0x00000153
REGIONCFG5=0x00000553
REGIONCFG6=0x00001553
REGIONCFG7=0x00005553
AXI_LIMIT0=0x0f1f0000
AXI_LIMIT1=0x0f1f0000
AXI_LIMIT2=0x0f1f0000
AXI_LIMIT3=0x0f1f0000
ethosu_invoke OPTIMIZER_CONFIG
handle_optimizer_config:
Optimizer release nbr: 0 patch: 1
Optimizer config cmd_stream_version: 0 macs_per_cc: 8 shram_size: 48 custom_dma: 0
Optimizer config Ethos-U version: 1.0.6
Ethos-U config cmd_stream_version: 0 macs_per_cc: 8 shram_size: 48 custom_dma: 0
Ethos-U version: 1.0.6
ethosu_invoke NOP
ethosu_invoke NOP
ethosu_invoke NOP
ethosu_invoke COMMAND_STREAM
handle_command_stream: cmd_stream=0x61025be0, cms_length 1181
QBASE=0x0000000061025be0, QSIZE=4724, base_pointer_offset=0x00000000
BASEP0=0x0000000061026e60
BASEP1=0x0000000060002f10
BASEP2=0x0000000060002f10
BASEP3=0x0000000061000fb0
BASEP4=0x0000000060000fb0
CMD=0x000Interrupt. status=0xffff0022, qread=4724
CMD=0x00000006
00006
CMD=0x0000000c
ethosu_release_driver - Driver 0x20000de8 released
The image has been classified as 'tabby'
EXITTHESIM
Info: /OSCI/SystemC: Simulation stopped by user.
```
You should see near the end of the output that the image has been correctly
classified as 'tabby'.
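You may want to check the result automatically, for example in CI. The log can be scanned for the classification line and for the ``EXITTHESIM`` shutdown tag configured through ``mps3_board.uart0.shutdown_tag``. This is a minimal sketch that assumes the message format shown in the sample output; the helper names are hypothetical.

```python
import re
from typing import Optional

# Matches the demo's result line as it appears in the sample output.
_RESULT = re.compile(r"The image has been classified as '([^']*)'")

def classification(log: str) -> Optional[str]:
    """Return the label from the demo's result line, or None if it is absent."""
    match = _RESULT.search(log)
    return match.group(1) if match else None

def simulation_finished(log: str, tag: str = "EXITTHESIM") -> bool:
    """True if the shutdown tag appears on a line of its own."""
    return any(line.strip() == tag for line in log.splitlines())
```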

