.. DO NOT EDIT. THIS FILE WAS AUTOMATICALLY GENERATED BY
.. TVM'S MONKEY-PATCHED VERSION OF SPHINX-GALLERY. TO MAKE
.. CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "how_to/work_with_microtvm/micro_ethosu.py"

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        This tutorial can be used interactively with Google Colab! You can also click
        :ref:`here ` to run the Jupyter notebook locally.

        .. image:: https://raw.githubusercontent.com/tlc-pack/web-data/main/images/utilities/colab_button.svg
            :align: center
            :target: https://colab.research.google.com/github/apache/tvm-site/blob/asf-site/docs/_downloads/55a9eff88b1303e525d53269eeb16897/micro_ethosu.ipynb
            :width: 300px

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_how_to_work_with_microtvm_micro_ethosu.py:

.. _tutorial-micro-ethosu:

7. Running TVM on bare metal Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN
=========================================================================================
**Author**: `Grant Watson `_

This section contains an example of how to use TVM to run a model
on an Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN, using bare metal.
The Cortex(R)-M55 is a small, low-power CPU designed for use in embedded
devices. CMSIS-NN is a collection of kernels optimized for Arm(R) Cortex(R)-M CPUs.
The Ethos(TM)-U55 is a microNPU, specifically designed to accelerate
ML inference in resource-constrained embedded devices.

In order to run the demo application without having access to a Cortex(R)-M55
and Ethos(TM)-U55 development board, we will be running our sample application
on a Fixed Virtual Platform (FVP). The FVP, based on Arm(R) Corstone(TM)-300
software, models a hardware system containing a Cortex(R)-M55 and Ethos(TM)-U55.
It provides a programmer's view that is suitable for software development.

In this tutorial, we will be compiling a MobileNet v1 model and instructing
TVM to offload operators to the Ethos(TM)-U55 where possible.

.. GENERATED FROM PYTHON SOURCE LINES 44-72

Obtaining TVM
-------------

To obtain TVM for your platform, please visit https://tlcpack.ai/ and follow the
instructions. Once TVM has been installed correctly, you should have access to
``tvmc`` from the command line.

Typing ``tvmc`` on the command line should display the following:

.. code-block:: text

    usage: tvmc [-h] [-v] [--version] {tune,compile,run} ...

    TVM compiler driver

    optional arguments:
      -h, --help          show this help message and exit
      -v, --verbose       increase verbosity
      --version           print the version and exit

    commands:
      {tune,compile,run}
        tune              auto-tune a model
        compile           compile a model
        run               run a compiled module

    TVMC - TVM driver command-line interface

.. GENERATED FROM PYTHON SOURCE LINES 74-104

Installing additional python dependencies
-----------------------------------------

In order to run the demo, you will need some additional python packages.
These can be installed by using the requirements.txt file below:

.. code-block:: text
    :caption: requirements.txt
    :name: requirements.txt

    attrs==21.2.0
    cloudpickle==2.0.0
    decorator==5.1.0
    ethos-u-vela==3.8.0
    flatbuffers==2.0.7
    lxml==4.6.3
    nose==1.3.7
    numpy==1.19.5
    Pillow==8.3.2
    psutil==5.8.0
    scipy==1.5.4
    tflite==2.4.0
    tornado==6.1

These packages can be installed by running the following from the command line:

.. code-block:: bash

    pip install -r requirements.txt
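As a quick sanity check that the environment is ready, you can confirm that
both TVM and the Vela compiler (used later by the Ethos(TM)-U55 codegen)
import cleanly. This is a small convenience sketch, not part of the original
tutorial flow:

.. code-block:: python

    # Both imports should succeed in an environment built from
    # requirements.txt together with a tlcpack TVM installation.
    import tvm
    from ethosu import vela  # provided by the ethos-u-vela package

    print("TVM version:", tvm.__version__)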
.. GENERATED FROM PYTHON SOURCE LINES 106-126

Obtaining the Model
-------------------

For this tutorial, we will be working with MobileNet v1.
MobileNet v1 is a convolutional neural network designed to classify images,
optimized for edge devices. The model we will be using has been pre-trained
to classify images into one of 1001 different categories. The network has an
input image size of 224x224, so any input images will need to be resized to
those dimensions before being used. For this tutorial we will be using the
model in TFLite format.

.. code-block:: bash

    mkdir -p ./build
    cd build
    wget https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_224_quant.tgz
    gunzip mobilenet_v1_1.0_224_quant.tgz
    tar xvf mobilenet_v1_1.0_224_quant.tar

.. GENERATED FROM PYTHON SOURCE LINES 128-154

Compiling the model for Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN
------------------------------------------------------------------------------------

Once we've downloaded the MobileNet v1 model, the next step is to compile it.
To accomplish that, we are going to use ``tvmc compile``. The output we get from
the compilation process is a TAR package of the model compiled to the Model
Library Format (MLF) for our target platform. We will be able to run that model
on our target device using the TVM runtime.

.. code-block:: bash

    tvmc compile --target=ethos-u,cmsis-nn,c \
                 --target-ethos-u-accelerator_config=ethos-u55-256 \
                 --target-cmsis-nn-mcpu=cortex-m55 \
                 --target-c-mcpu=cortex-m55 \
                 --runtime=crt \
                 --executor=aot \
                 --executor-aot-interface-api=c \
                 --executor-aot-unpacked-api=1 \
                 --pass-config tir.usmp.enable=1 \
                 --pass-config tir.usmp.algorithm=hill_climb \
                 --pass-config tir.disable_storage_rewrite=1 \
                 --pass-config tir.disable_vectorize=1 \
                 ./mobilenet_v1_1.0_224_quant.tflite \
                 --output-format=mlf

.. GENERATED FROM PYTHON SOURCE LINES 156-184

.. note::  Explanation of tvmc compile arguments:

    * ``--target=ethos-u,cmsis-nn,c`` : offload operators to the microNPU where possible, falling back to CMSIS-NN and finally generated C code where an operator is not supported on the microNPU.

    * ``--target-ethos-u-accelerator_config=ethos-u55-256`` : specifies the microNPU configuration.

    * ``--target-cmsis-nn-mcpu=cortex-m55`` : cross-compile the CMSIS-NN kernels for the Cortex(R)-M55.

    * ``--target-c-mcpu=cortex-m55`` : cross-compile for the Cortex(R)-M55.

    * ``--runtime=crt`` : generate glue code to allow operators to work with the C runtime.

    * ``--executor=aot`` : use Ahead Of Time compilation instead of the Graph Executor.

    * ``--executor-aot-interface-api=c`` : generate a C-style interface with structures designed for integrating into C apps at the boundary.

    * ``--executor-aot-unpacked-api=1`` : use the unpacked API internally.

    * ``--pass-config tir.usmp.enable=1`` : enable Unified Static Memory Planning (USMP).

    * ``--pass-config tir.usmp.algorithm=hill_climb`` : use the hill-climb algorithm for USMP.

    * ``--pass-config tir.disable_storage_rewrite=1`` : disable storage rewrite.

    * ``--pass-config tir.disable_vectorize=1`` : disable vectorization, since there are no standard vectorized types in C.

    * ``./mobilenet_v1_1.0_224_quant.tflite`` : the TFLite model that is being compiled.

    * ``--output-format=mlf`` : output should be generated in the Model Library Format.

.. GENERATED FROM PYTHON SOURCE LINES 186-193

.. note::  If you don't want to make use of the microNPU and want to offload
    operators to CMSIS-NN only:

    * Use ``--target=cmsis-nn,c`` in place of ``--target=ethos-u,cmsis-nn,c``
    * Remove the microNPU config parameter ``--target-ethos-u-accelerator_config=ethos-u55-256``
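Before extracting the archive in the next step, it can be useful to peek at
what the compilation produced. The sketch below lists the contents of the MLF
package; the exact file layout may vary between TVM versions, so treat the
names in the comment as indicative rather than guaranteed:

.. code-block:: python

    # List the contents of the Model Library Format package produced by
    # `tvmc compile`. Expect to see metadata.json and the generated C
    # sources under codegen/host/src (layout may vary by TVM version).
    import tarfile

    with tarfile.open("./module.tar") as tar:
        for name in sorted(tar.getnames()):
            print(name)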
.. GENERATED FROM PYTHON SOURCE LINES 195-202

Extracting the generated code into the current directory
---------------------------------------------------------

.. code-block:: bash

    tar xvf module.tar

.. GENERATED FROM PYTHON SOURCE LINES 204-218

Getting ImageNet labels
-----------------------

When running MobileNet v1 on an image, the result is an index in the range 0 to
1000. In order to make our application a little more user-friendly, instead of
just displaying the category index, we will display the associated label. We
will download these image labels into a text file now and use a python script
to include them in our C application later.

.. code-block:: bash

    curl -sS https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/lite/java/demo/app/src/main/assets/labels_mobilenet_quant_v1_224.txt \
        -o ./labels_mobilenet_quant_v1_224.txt

.. GENERATED FROM PYTHON SOURCE LINES 220-238

Getting the input image
-----------------------

As input for this tutorial, we will use the image of a cat, but you can
substitute an image of your choosing.

.. image:: https://s3.amazonaws.com/model-server/inputs/kitten.jpg
    :height: 224px
    :width: 224px
    :align: center

We download the image into the build directory and we will use a python script
in the next step to convert the image into an array of bytes in a C header file.

.. code-block:: bash

    curl -sS https://s3.amazonaws.com/model-server/inputs/kitten.jpg -o ./kitten.jpg

.. GENERATED FROM PYTHON SOURCE LINES 240-317

Pre-processing the image
------------------------

The following script will create 2 C header files in the ``./include`` directory:

* ``inputs.h`` - The image supplied as an argument to the script will be converted
  to an array of integers for input to our MobileNet v1 model.
* ``outputs.h`` - An integer array of zeroes will reserve 1001 integer values
  for the output of inference.

.. code-block:: python
    :caption: convert_image.py
    :name: convert_image.py

    #!python ./convert_image.py
    import os
    import pathlib
    import re
    import sys
    from PIL import Image
    import numpy as np


    def create_header_file(name, section, tensor_name, tensor_data, output_path):
        """
        This function generates a header file containing the data from the numpy array provided.
        """
        file_path = pathlib.Path(f"{output_path}/" + name).resolve()
        # Create header file with npy_data as a C array
        raw_path = file_path.with_suffix(".h").resolve()
        with open(raw_path, "w") as header_file:
            header_file.write(
                "#include <stddef.h>\n"
                + "#include <stdint.h>\n"
                + f"const size_t {tensor_name}_len = {tensor_data.size};\n"
                + f'uint8_t {tensor_name}[] __attribute__((section("{section}"), aligned(16))) = "'
            )
            data_hexstr = tensor_data.tobytes().hex()
            for i in range(0, len(data_hexstr), 2):
                header_file.write(f"\\x{data_hexstr[i:i+2]}")
            header_file.write('";\n\n')


    def create_headers(image_name):
        """
        This function generates C header files for the input and output arrays required to run inferences
        """
        img_path = os.path.join("./", f"{image_name}")

        # Resize image to 224x224
        resized_image = Image.open(img_path).resize((224, 224))
        img_data = np.asarray(resized_image).astype("float32")

        # Convert input to NCHW
        img_data = np.transpose(img_data, (2, 0, 1))

        # Create input header file
        input_data = img_data.astype(np.uint8)
        create_header_file("inputs", "ethosu_scratch", "input", input_data, "./include")
        # Create output header file
        output_data = np.zeros([1001], np.uint8)
        create_header_file(
            "outputs",
            "output_data_sec",
            "output",
            output_data,
            "./include",
        )


    if __name__ == "__main__":
        create_headers(sys.argv[1])

Run the script from the command line:

.. code-block:: bash

    python convert_image.py ./kitten.jpg
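If you want to confirm the conversion worked, the short sketch below (a
hypothetical helper, not one of the tutorial scripts) parses the generated
header and checks that it holds one byte per channel per pixel of the
224x224 RGB input:

.. code-block:: python

    # Check the generated input header: a 224x224 RGB image should yield
    # 3 * 224 * 224 = 150528 bytes.
    import pathlib
    import re

    header = pathlib.Path("./include/inputs.h").read_text()
    length = int(re.search(r"input_len = (\d+);", header).group(1))
    assert length == 3 * 224 * 224, f"unexpected input length: {length}"
    print(f"inputs.h declares {length} bytes of input data")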
.. GENERATED FROM PYTHON SOURCE LINES 319-364

Pre-processing the labels
-------------------------

The following script will create a ``labels.h`` header file in the ``./include``
directory. The labels.txt file that we downloaded previously will be turned
into an array of strings. This array will be used to display the label that
our image has been classified as.

.. code-block:: python
    :caption: convert_labels.py
    :name: convert_labels.py

    #!python ./convert_labels.py
    import os
    import pathlib
    import sys


    def create_labels_header(labels_file, section, output_path):
        """
        This function generates a header file containing the ImageNet labels as an array of strings
        """
        labels_path = pathlib.Path(labels_file).resolve()
        file_path = pathlib.Path(f"{output_path}/labels.h").resolve()

        with open(labels_path) as f:
            labels = f.readlines()

        with open(file_path, "w") as header_file:
            header_file.write(f'char* labels[] __attribute__((section("{section}"), aligned(16))) = {{')

            for _, label in enumerate(labels):
                header_file.write(f'"{label.rstrip()}",')

            header_file.write("};\n")


    if __name__ == "__main__":
        create_labels_header(sys.argv[1], "ethosu_scratch", "./include")

Run the script from the command line, passing the labels file downloaded earlier:

.. code-block:: bash

    python convert_labels.py ./labels_mobilenet_quant_v1_224.txt

.. GENERATED FROM PYTHON SOURCE LINES 366-438

Writing the demo application
----------------------------

The following C application will run a single inference of the MobileNet v1
model on the image that we downloaded and converted to an array of integers
previously. Since the model was compiled with a target of "ethos-u ...",
operators supported by the Ethos(TM)-U55 NPU will be offloaded for acceleration.
Once the application is built and run, our test image should be correctly
classified as a "tabby" and the result should be displayed on the console.
This file should be placed in ``./src``.

.. code-block:: c
    :caption: demo.c
    :name: demo.c

    #include <stdio.h>
    #include <stdint.h>

    #include "ethosu_mod.h"
    #include "uart_stdout.h"

    // Header files generated by convert_image.py and convert_labels.py
    #include "inputs.h"
    #include "labels.h"
    #include "outputs.h"

    int abs(int v) { return v * ((v > 0) - (v < 0)); }

    int main(int argc, char** argv) {
      UartStdOutInit();
      printf("Starting Demo\n");
      EthosuInit();

      printf("Allocating memory\n");
      StackMemoryManager_Init(&app_workspace, g_aot_memory, WORKSPACE_SIZE);

      printf("Running inference\n");
      struct tvmgen_default_outputs outputs = {
          .output = output,
      };
      struct tvmgen_default_inputs inputs = {
          .input = input,
      };
      struct ethosu_driver* driver = ethosu_reserve_driver();
      struct tvmgen_default_devices devices = {
          .ethos_u = driver,
      };
      tvmgen_default_run(&inputs, &outputs, &devices);
      ethosu_release_driver(driver);

      // Calculate index of max value
      uint8_t max_value = 0;
      int32_t max_index = -1;
      for (unsigned int i = 0; i < output_len; ++i) {
        if (output[i] > max_value) {
          max_value = output[i];
          max_index = i;
        }
      }
      printf("The image has been classified as '%s'\n", labels[max_index]);

      // The FVP will shut down when it receives "EXITTHESIM" on the UART
      printf("EXITTHESIM\n");
      while (1 == 1)
        ;
      return 0;
    }

In addition, you will need these header files from github in your ``./include``
directory: `include files `_

.. GENERATED FROM PYTHON SOURCE LINES 440-444

.. note::  If you'd like to use FreeRTOS for task scheduling and queues, a sample
    application can be found here `demo_freertos.c `
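Since ``outputs.h`` reserves 1001 values for the model's output, each output
index must have a matching label. The following sketch (again, an optional
check rather than part of the tutorial) verifies that the downloaded label
file supplies one label per output:

.. code-block:: python

    # The MobileNet v1 model classifies into 1001 categories, so the
    # label file should contain exactly 1001 lines.
    num_labels = sum(1 for _ in open("./labels_mobilenet_quant_v1_224.txt"))
    assert num_labels == 1001, f"unexpected label count: {num_labels}"
    print(f"{num_labels} labels available for the 1001 model outputs")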
.. GENERATED FROM PYTHON SOURCE LINES 446-456

Creating the linker script
--------------------------

We need to create a linker script that will be used when we build our
application in the following section. The linker script tells the linker where
everything should be placed in memory. The corstone300.ld linker script should
be placed in your working directory; an example linker script for the FVP can
be found here `corstone300.ld `_

.. GENERATED FROM PYTHON SOURCE LINES 458-465

.. note::  The code generated by TVM will place the model weights and the Arm(R)
    Ethos(TM)-U55 command stream in a section named ``ethosu_scratch``. For a
    model the size of MobileNet v1, the weights and command stream will not fit
    into the limited SRAM available. For this reason it's important that the
    linker script places the ``ethosu_scratch`` section into DRAM (DDR).

.. GENERATED FROM PYTHON SOURCE LINES 467-476

.. note::  Before building and running the application, you will need to update
    your PATH environment variable to include the path to cmake 3.19.5 and the
    FVP. For example if you've installed these in ``/opt/arm``, then you would
    do the following:

    ``export PATH=/opt/arm/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4:/opt/arm/cmake/bin:$PATH``

.. GENERATED FROM PYTHON SOURCE LINES 478-486

Building the demo application using make
----------------------------------------

We can now build the demo application using make. The Makefile should be placed
in your working directory before running ``make`` on the command line. An
example Makefile can be found here: `Makefile `_

.. GENERATED FROM PYTHON SOURCE LINES 488-493

.. note::  If you're using FreeRTOS, the Makefile builds it from the specified
    FREERTOS_PATH: ``make FREERTOS_PATH=``

.. GENERATED FROM PYTHON SOURCE LINES 495-575

Running the demo application
----------------------------

Finally, we can run our demo application on the Fixed Virtual Platform (FVP)
by using the following command:

.. code-block:: bash

    FVP_Corstone_SSE-300_Ethos-U55 -C cpu0.CFGDTCMSZ=15 \
    -C cpu0.CFGITCMSZ=15 -C mps3_board.uart0.out_file=\"-\" -C mps3_board.uart0.shutdown_tag=\"EXITTHESIM\" \
    -C mps3_board.visualisation.disable-visualisation=1 -C mps3_board.telnetterminal0.start_telnet=0 \
    -C mps3_board.telnetterminal1.start_telnet=0 -C mps3_board.telnetterminal2.start_telnet=0 -C mps3_board.telnetterminal5.start_telnet=0 \
    -C ethosu.extra_args="--fast" \
    -C ethosu.num_macs=256 ./build/demo
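If you prefer to drive the FVP from Python, for example inside a test harness,
the same invocation can be wrapped with ``subprocess``. This is a sketch of one
possible wrapper using the same flags as above; it is not part of the tutorial
and assumes the FVP binary is on your PATH:

.. code-block:: python

    # Run the FVP with the same configuration as the shell command above
    # and capture the UART output. The simulator exits once the
    # application prints "EXITTHESIM".
    import subprocess

    fvp_args = [
        "FVP_Corstone_SSE-300_Ethos-U55",
        "-C", "cpu0.CFGDTCMSZ=15",
        "-C", "cpu0.CFGITCMSZ=15",
        "-C", "mps3_board.uart0.out_file=-",
        "-C", "mps3_board.uart0.shutdown_tag=EXITTHESIM",
        "-C", "mps3_board.visualisation.disable-visualisation=1",
        "-C", "mps3_board.telnetterminal0.start_telnet=0",
        "-C", "mps3_board.telnetterminal1.start_telnet=0",
        "-C", "mps3_board.telnetterminal2.start_telnet=0",
        "-C", "mps3_board.telnetterminal5.start_telnet=0",
        "-C", "ethosu.extra_args=--fast",
        "-C", "ethosu.num_macs=256",
        "./build/demo",
    ]
    result = subprocess.run(fvp_args, capture_output=True, text=True)
    print(result.stdout)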
You should see the following output displayed in your console window:

.. code-block:: text

    telnetterminal0: Listening for serial connection on port 5000
    telnetterminal1: Listening for serial connection on port 5001
    telnetterminal2: Listening for serial connection on port 5002
    telnetterminal5: Listening for serial connection on port 5003

        Ethos-U rev dedfa618 --- Jan 12 2021 23:03:55
        (C) COPYRIGHT 2019-2021 Arm Limited
        ALL RIGHTS RESERVED

    Starting Demo
    ethosu_init. base_address=0x48102000, fast_memory=0x0, fast_memory_size=0, secure=1, privileged=1
    ethosu_register_driver: New NPU driver at address 0x20000de8 is registered.
    CMD=0x00000000
    Soft reset NPU
    Allocating memory
    Running inference
    ethosu_find_and_reserve_driver - Driver 0x20000de8 reserved.
    ethosu_invoke
    CMD=0x00000004
    QCONFIG=0x00000002
    REGIONCFG0=0x00000003
    REGIONCFG1=0x00000003
    REGIONCFG2=0x00000013
    REGIONCFG3=0x00000053
    REGIONCFG4=0x00000153
    REGIONCFG5=0x00000553
    REGIONCFG6=0x00001553
    REGIONCFG7=0x00005553
    AXI_LIMIT0=0x0f1f0000
    AXI_LIMIT1=0x0f1f0000
    AXI_LIMIT2=0x0f1f0000
    AXI_LIMIT3=0x0f1f0000
    ethosu_invoke OPTIMIZER_CONFIG
    handle_optimizer_config:
    Optimizer release nbr: 0 patch: 1
    Optimizer config cmd_stream_version: 0 macs_per_cc: 8 shram_size: 48 custom_dma: 0
    Optimizer config Ethos-U version: 1.0.6
    Ethos-U config cmd_stream_version: 0 macs_per_cc: 8 shram_size: 48 custom_dma: 0
    Ethos-U version: 1.0.6
    ethosu_invoke NOP
    ethosu_invoke NOP
    ethosu_invoke NOP
    ethosu_invoke COMMAND_STREAM
    handle_command_stream: cmd_stream=0x61025be0, cms_length 1181
    QBASE=0x0000000061025be0, QSIZE=4724, base_pointer_offset=0x00000000
    BASEP0=0x0000000061026e60
    BASEP1=0x0000000060002f10
    BASEP2=0x0000000060002f10
    BASEP3=0x0000000061000fb0
    BASEP4=0x0000000060000fb0
    CMD=0x000Interrupt. status=0xffff0022, qread=4724
    CMD=0x00000006
    00006
    CMD=0x0000000c
    ethosu_release_driver - Driver 0x20000de8 released
    The image has been classified as 'tabby'
    EXITTHESIM
    Info: /OSCI/SystemC: Simulation stopped by user.

You should see near the end of the output that the image has been correctly
classified as 'tabby'.

.. _sphx_glr_download_how_to_work_with_microtvm_micro_ethosu.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: micro_ethosu.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: micro_ethosu.ipynb `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_