.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "how_to/work_with_microtvm/micro_train.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_how_to_work_with_microtvm_micro_train.py:

.. _microtvm-train-arduino:

Training Vision Models for microTVM on Arduino
==============================================
**Author**: `Gavin Uberti `_

This tutorial shows how MobileNetV1 models can be trained to fit on embedded devices, and how those models can be deployed to Arduino using TVM.

.. GENERATED FROM PYTHON SOURCE LINES 30-93

.. note::

  This tutorial is best viewed as a Jupyter Notebook. You can download and run it locally using the link at the bottom of this page, or open it online for free using Google Colab. Click the icon below to open in Google Colab.

.. image:: https://raw.githubusercontent.com/tlc-pack/web-data/main/images/utilities/colab_button.png
     :align: center
     :target: https://colab.research.google.com/github/apache/tvm-site/blob/asf-site/docs/_downloads/a7c7ea4b5017ae70db1f51dd8e6dcd82/micro_train.ipynb
     :width: 300px

Motivation
----------
When building IoT devices, we often want them to **see and understand** the world around them. This can take many forms, but often a device will want to know if a certain **kind of object** is in its field of vision.

For example, a security camera might look for **people**, so it can decide whether to save a video to memory. A traffic light might look for **cars**, so it can judge which lights should change first. Or a forest camera might look for a **kind of animal**, so it can estimate how large the animal population is.

To make these devices affordable, we would like them to need only a low-cost processor like the `nRF52840 `_ (costing five dollars each on Mouser) or the `RP2040 `_ (just $1.45 each!).

These devices have very little memory (~250 KB RAM), meaning that no conventional edge AI vision model (like MobileNet or EfficientNet) will be able to run. In this tutorial, we will show how these models can be modified to work around this constraint. Then, we will use TVM to compile and deploy the resulting model for an Arduino that uses one of these processors.

Installing the Prerequisites
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This tutorial will train the model with TensorFlow, a widely used machine learning library created by Google. TensorFlow is a very low-level library, however, so we will use the Keras interface to talk to TensorFlow. We will also use TensorFlow Lite to perform quantization on our model, as TensorFlow by itself does not support this.

Once we have our generated model, we will use TVM to compile and test it. To avoid having to build from source, we'll install ``tlcpack`` - a community build of TVM. Lastly, we'll also install ``imagemagick`` and ``curl`` to preprocess data:

.. code-block:: bash

    %%bash
    pip install -q tensorflow tflite
    pip install -q tlcpack-nightly -f https://tlcpack.ai/wheels
    apt-get -qq install imagemagick curl

    # Install Arduino CLI and library for Nano 33 BLE
    curl -fsSL https://raw.githubusercontent.com/arduino/arduino-cli/master/install.sh | sh
    /content/bin/arduino-cli core update-index
    /content/bin/arduino-cli core install arduino:mbed_nano

Using the GPU
^^^^^^^^^^^^^

This tutorial demonstrates training a neural network, which requires a lot of computing power and will go much faster if you have a GPU. If you are viewing this tutorial on Google Colab, you can enable a GPU by going to **Runtime->Change runtime type** and selecting "GPU" as the hardware accelerator. If you are running locally, you can `follow TensorFlow's guide `_ instead.
We can test our GPU installation with the following code:

.. GENERATED FROM PYTHON SOURCE LINES 93-102

.. code-block:: default

    import tensorflow as tf

    if not tf.test.gpu_device_name():
        print("No GPU was detected!")
        print("Model training will take much longer (~30 minutes instead of ~5)")
    else:
        print("GPU detected - you're good to go.")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    No GPU was detected!
    Model training will take much longer (~30 minutes instead of ~5)

.. GENERATED FROM PYTHON SOURCE LINES 103-109

Choosing Our Work Dir
^^^^^^^^^^^^^^^^^^^^^
We need to pick a directory where our image datasets, trained model, and eventual Arduino sketch will all live. If running on Google Colab, we'll save everything in ``/root`` (aka ``~``), but you'll probably want to store it elsewhere if running locally. Note that this variable only affects Python scripts - you'll have to adjust the Bash commands too.

.. GENERATED FROM PYTHON SOURCE LINES 109-114

.. code-block:: default

    import os

    FOLDER = "/root"

.. GENERATED FROM PYTHON SOURCE LINES 120-162

Downloading the Data
--------------------
Convolutional neural networks usually learn by looking at many images, along with labels telling the network what those images are. To get these images, we'll need a publicly available dataset with thousands of images of all sorts of objects and labels of what's in each image. We'll also need a bunch of images that **aren't** of cars, as we're trying to distinguish these two classes.

In this tutorial, we'll create a model to detect if an image contains a **car**, but you can use whatever category you like! Just change the source URL below to one containing images of another type of object.

To get our car images, we'll be downloading the `Stanford Cars dataset `_, which contains 16,185 full color images of cars (the code below uses only its 8,144-image training split).
We'll also need images of random things that aren't cars, so we'll use the `COCO 2017 `_ validation set (it's smaller, and thus faster to download, than the full training set, although training on the full data set would yield better results). Note that there are some cars in the COCO 2017 data set, but it's a small enough fraction not to matter - just keep in mind that this will drive down our perceived accuracy slightly.

We could use the TensorFlow dataloader utilities, but we'll instead do it manually to make sure it's easy to change the datasets being used. We'll end up with the following file hierarchy:

.. code-block::

    /root
    ├── downloads
    │   ├── target.zip
    │   └── random.zip
    └── images
        ├── target
        │   ├── 000001.jpg
        │   │ ...
        │   └── 016185.jpg
        └── random
            ├── 000000000139.jpg
            │ ...
            └── 000000581781.jpg

We should also note that the Stanford Cars training split has about 8k images, while the COCO 2017 validation set has 5k images - it is not a 50/50 split! If we wanted to, we could weight these classes differently during training to correct for this, but training will still work if we ignore it. It should take about **2 minutes** to download Stanford Cars, while the COCO 2017 validation set will take **1 minute**.

.. GENERATED FROM PYTHON SOURCE LINES 162-183

.. code-block:: default

    import os
    import shutil
    import urllib.request

    # Download datasets (exist_ok lets this cell be re-run safely)
    os.makedirs(f"{FOLDER}/downloads", exist_ok=True)
    os.makedirs(f"{FOLDER}/images", exist_ok=True)
    urllib.request.urlretrieve(
        "https://data.deepai.org/stanfordcars.zip", f"{FOLDER}/downloads/target.zip"
    )
    urllib.request.urlretrieve(
        "http://images.cocodataset.org/zips/val2017.zip", f"{FOLDER}/downloads/random.zip"
    )

    # Extract them and rename their folders
    shutil.unpack_archive(f"{FOLDER}/downloads/target.zip", f"{FOLDER}/downloads")
    shutil.unpack_archive(f"{FOLDER}/downloads/random.zip", f"{FOLDER}/downloads")
    shutil.move(f"{FOLDER}/downloads/cars_train/cars_train", f"{FOLDER}/images/target")
    shutil.move(f"{FOLDER}/downloads/val2017", f"{FOLDER}/images/random")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    '/tmp/tmpnts471in/images/random'

.. GENERATED FROM PYTHON SOURCE LINES 184-204

Loading the Data
----------------
Currently, our data is stored on-disk as JPG files of various sizes. To train with it, we'll have to load the images into memory, resize them to be 64x64, and convert them to raw, uncompressed data. Keras's ``image_dataset_from_directory`` will take care of most of this, though it loads images such that each pixel value is a float from 0 to 255.

We'll also need to load labels, though Keras will help with this. From our subdirectory structure, it knows the images in ``/target`` are one class, and those in ``/random`` another. Setting ``label_mode='categorical'`` tells Keras to convert these into **categorical labels** - a 2x1 vector that's either ``[1, 0]`` for an object of our target class, or ``[0, 1]`` for anything else. Keras orders classes alphanumerically by default, so we pass ``class_names`` explicitly to make ``target`` the first class. We'll also set ``shuffle=True`` to randomize the order of our examples.

We will also **batch** the data - grouping samples into clumps to make our training go faster. A ``batch_size`` of 32 is a reasonable choice.

Lastly, in machine learning we generally want our inputs to be small numbers. We'll thus use a ``Rescaling`` layer to change our images such that each pixel is a float between ``0.0`` and ``1.0``, instead of ``0`` to ``255``. We need to be careful not to rescale our categorical labels though, so we'll use a ``lambda`` function.

.. GENERATED FROM PYTHON SOURCE LINES 204-216

.. code-block:: default

    IMAGE_SIZE = (64, 64, 3)
    unscaled_dataset = tf.keras.utils.image_dataset_from_directory(
        f"{FOLDER}/images",
        batch_size=32,
        shuffle=True,
        label_mode="categorical",
        # List the classes explicitly so that "target" maps to [1, 0];
        # otherwise Keras would sort them alphanumerically ("random" first).
        class_names=["target", "random"],
        image_size=IMAGE_SIZE[0:2],
    )
    rescale = tf.keras.layers.Rescaling(scale=1.0 / 255)
    full_dataset = unscaled_dataset.map(lambda im, lbl: (rescale(im), lbl))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Found 13144 files belonging to 2 classes.

.. GENERATED FROM PYTHON SOURCE LINES 217-222

What's Inside Our Dataset?
^^^^^^^^^^^^^^^^^^^^^^^^^^
Before giving this data set to our neural network, we ought to give it a quick visual inspection. Does the data look properly transformed? Do the labels seem appropriate? And what's our ratio of objects to other stuff? We can display some examples from our datasets using ``matplotlib``:

.. GENERATED FROM PYTHON SOURCE LINES 222-241

.. code-block:: default

    import matplotlib.pyplot as plt

    num_target_class = len(os.listdir(f"{FOLDER}/images/target/"))
    num_random_class = len(os.listdir(f"{FOLDER}/images/random/"))
    print(f"{FOLDER}/images/target contains {num_target_class} images")
    print(f"{FOLDER}/images/random contains {num_random_class} images")

    # Show some samples and their labels
    SAMPLES_TO_SHOW = 10
    plt.figure(figsize=(20, 10))
    for i, (image, label) in enumerate(unscaled_dataset.unbatch()):
        if i >= SAMPLES_TO_SHOW:
            break
        ax = plt.subplot(1, SAMPLES_TO_SHOW, i + 1)
        plt.imshow(image.numpy().astype("uint8"))
        plt.title(list(label.numpy()))
        plt.axis("off")

.. image-sg:: /how_to/work_with_microtvm/images/sphx_glr_micro_train_001.png
   :alt: [1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]
   :srcset: /how_to/work_with_microtvm/images/sphx_glr_micro_train_001.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    /tmp/tmpnts471in/images/target contains 8144 images
    /tmp/tmpnts471in/images/random contains 5000 images

.. GENERATED FROM PYTHON SOURCE LINES 242-252

Validating our Accuracy
^^^^^^^^^^^^^^^^^^^^^^^
While developing our model, we'll often want to check how accurate it is (e.g. to see if it improves during training). How do we do this? We could just train it on *all* of the data, and then ask it to classify that same data.
However, our model could cheat by just memorizing all of the samples, which would make it *appear* to have very high accuracy, but perform very badly in reality. In practice, this "memorizing" is called **overfitting**.

To prevent this, we will set aside some of the data (we'll use 20%) as a **validation set**. Our model will never be trained on validation data - we'll only use it to check our model's accuracy.

.. GENERATED FROM PYTHON SOURCE LINES 252-257

.. code-block:: default

    num_batches = len(full_dataset)
    train_dataset = full_dataset.take(int(num_batches * 0.8))
    validation_dataset = full_dataset.skip(len(train_dataset))

.. GENERATED FROM PYTHON SOURCE LINES 258-305

Choosing the Model
------------------
In the past decade, `convolutional neural networks `_ have been widely adopted for image classification tasks. State-of-the-art models like `EfficientNet V2 `_ are able to perform image classification better than even humans! Unfortunately, these models have tens of millions of parameters, and thus won't fit on cheap security camera computers.

Our applications generally don't need perfect accuracy - 90% is good enough. We can thus use the older and smaller MobileNet V1 architecture. But this *still* won't be small enough - by default, MobileNet V1 with 224x224 inputs and alpha 1.0 takes ~50 MB to just **store**. To reduce the size of the model, there are three knobs we can turn. First, we can reduce the size of the input images from 224x224 to 96x96 or 64x64, and Keras makes it easy to do this. We can also reduce the **alpha** of the model, from 1.0 to 0.25, which downscales the width of the network (and the number of filters) by a factor of four. And if we were really strapped for space, we could reduce the number of **channels** by making our model take grayscale images instead of RGB ones.

In this tutorial, we will use an RGB 64x64 input image and alpha 0.25.
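
To get a feel for how much each knob saves, the arithmetic can be sketched as follows. This is only a rough illustration and not part of the original tutorial: the helper names are hypothetical, one byte per value is assumed (as in a quantized model), and the filter counts (32 in the first convolution and 1024 in the last at alpha 1.0) are the standard MobileNet V1 widths rather than figures taken from the text above.

```python
def input_bytes(height, width, channels, bytes_per_value=1):
    """Size of one input image buffer (hypothetical helper)."""
    return height * width * channels * bytes_per_value


def scaled_filters(base_filters, alpha):
    """Number of filters in a layer after applying the width multiplier alpha."""
    return int(base_filters * alpha)


# Knob 1: input resolution (RGB, one byte per value)
full_res = input_bytes(224, 224, 3)
small_res = input_bytes(64, 64, 3)

# Knob 2: alpha shrinks every layer's filter count
first_conv = scaled_filters(32, 0.25)
last_conv = scaled_filters(1024, 0.25)

# Knob 3: grayscale instead of RGB
small_gray = input_bytes(64, 64, 1)

print(f"224x224 RGB input: {full_res} bytes")
print(f"64x64 RGB input:   {small_res} bytes")
print(f"64x64 gray input:  {small_gray} bytes")
print(f"filters at alpha 0.25: {first_conv} (first conv), {last_conv} (last conv)")
```

Shrinking the input reduces the activation buffers the device must hold in RAM, while alpha mainly reduces the number of weights that must be stored.
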
This configuration is not quite ideal, but it allows the finished model to fit in 192 KB of RAM, while still letting us perform transfer learning using the official TensorFlow source models (if we used alpha <0.25 or a grayscale input, we wouldn't be able to do this).

What is Transfer Learning?
^^^^^^^^^^^^^^^^^^^^^^^^^^
Deep learning has `dominated image classification `_ for a long time, but training neural networks takes a lot of time. When a neural network is trained "from scratch", its parameters start out randomly initialized, forcing it to learn very slowly how to tell images apart.

With transfer learning, we instead start with a neural network that's **already** good at a specific task. In this example, that task is classifying images from `the ImageNet database `_. This means the network already has some object recognition capabilities, and is likely closer to what you want than a random model would be.

This works especially well with image processing neural networks like MobileNet. In practice, it turns out the convolutional layers of the model (i.e. the first 90% of the layers) are used for identifying low-level features like lines and shapes - only the last few fully connected layers are used to determine how those shapes make up the objects the network is trying to detect.

We can take advantage of this by starting training with a MobileNet model that was trained on ImageNet, and already knows how to identify those lines and shapes. We can then just remove the last few layers from this pretrained model, and add our own final layers. We'll then train this conglomerate model for a few epochs on our cars vs non-cars dataset, to adjust the early layers and train the final layers from scratch. This process of training an already-partially-trained model is called *fine-tuning*.

Source MobileNets for transfer learning have been `pretrained by the TensorFlow folks `_, so we can just download the one closest to what we want (the 128x128 input model with 0.25 depth scale).

.. GENERATED FROM PYTHON SOURCE LINES 305-317

.. code-block:: default

    os.makedirs(f"{FOLDER}/models", exist_ok=True)
    WEIGHTS_PATH = f"{FOLDER}/models/mobilenet_2_5_128_tf.h5"
    urllib.request.urlretrieve(
        "https://storage.googleapis.com/tensorflow/keras-applications/mobilenet/mobilenet_2_5_128_tf.h5",
        WEIGHTS_PATH,
    )

    pretrained = tf.keras.applications.MobileNet(
        input_shape=IMAGE_SIZE, weights=WEIGHTS_PATH, alpha=0.25
    )

.. GENERATED FROM PYTHON SOURCE LINES 318-324

Modifying Our Network
^^^^^^^^^^^^^^^^^^^^^
As mentioned above, our pretrained model is designed to classify the 1,000 ImageNet categories, but we want to convert it to classify cars. Since only the last few layers are task-specific, we'll **cut off the last five layers** of our original model. In their place we'll build our own "tail" to the model by performing reshape, dropout, flatten, and softmax operations.

.. GENERATED FROM PYTHON SOURCE LINES 324-335

.. code-block:: default

    model = tf.keras.models.Sequential()

    model.add(tf.keras.layers.InputLayer(input_shape=IMAGE_SIZE))
    model.add(tf.keras.Model(inputs=pretrained.inputs, outputs=pretrained.layers[-5].output))

    model.add(tf.keras.layers.Reshape((-1,)))
    model.add(tf.keras.layers.Dropout(0.1))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(2, activation="softmax"))

.. GENERATED FROM PYTHON SOURCE LINES 336-349

Fine Tuning Our Network
^^^^^^^^^^^^^^^^^^^^^^^
When training neural networks, we must set a parameter called the **learning rate** that controls how fast our network learns. It must be set carefully - too low, and our network will take forever to train; too high, and our network won't be able to learn some fine details. Generally for Adam (the optimizer we're using), ``0.001`` is a pretty good learning rate (and is what's recommended in the `original paper `_). However, in this case ``0.0005`` seems to work a little better.

We'll also pass the validation set from earlier to ``model.fit``.
This will evaluate how good our model is after each epoch, and let us track how our model is improving. Once training is finished, the model should have a validation accuracy of roughly ``0.96`` (meaning it was right about 96% of the time on our validation set), as in the log below.

.. GENERATED FROM PYTHON SOURCE LINES 349-357

.. code-block:: default

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    model.fit(train_dataset, validation_data=validation_dataset, epochs=3, verbose=2)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Epoch 1/3
    328/328 - 46s - loss: 0.2201 - accuracy: 0.9246 - val_loss: 0.1470 - val_accuracy: 0.9592 - 46s/epoch - 141ms/step
    Epoch 2/3
    328/328 - 43s - loss: 0.1036 - accuracy: 0.9611 - val_loss: 0.1175 - val_accuracy: 0.9615 - 43s/epoch - 132ms/step
    Epoch 3/3
    328/328 - 43s - loss: 0.0696 - accuracy: 0.9748 - val_loss: 0.1467 - val_accuracy: 0.9551 - 43s/epoch - 131ms/step

.. GENERATED FROM PYTHON SOURCE LINES 358-379

Quantization
------------
We've done a decent job of reducing our model's size so far - changing the input dimension, along with removing the final layers, reduced the model to just 219k parameters. However, each of these parameters is a ``float32`` that takes four bytes, so our model will take up almost one MB!

Additionally, it might be the case that our hardware doesn't have built-in support for floating point numbers. While most high-memory Arduinos (like the Nano 33 BLE) do have hardware support, some others (like the Arduino Due) do not. On any boards *without* dedicated hardware support, floating point multiplication will be extremely slow.

To address both issues we will **quantize** the model - representing the weights as eight-bit integers.
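
As a rough illustration of the idea, the sketch below maps a handful of floating point values onto eight-bit integers with a simple affine (scale and zero point) transform and then maps them back. It is a simplified, pure-Python example with hypothetical helper names and made-up weights; it is not the calibration procedure TensorFlow Lite actually uses.

```python
def quantize(values, num_bits=8):
    """Map floats onto unsigned integers in [0, 2**num_bits - 1] (illustrative only)."""
    qmin, qmax = 0, 2**num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zero_point = round(qmin - lo / scale)
    quantized = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate float values from the integer representation."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-0.5, 0.0, 0.3, 1.0]  # made-up example values
q_weights, scale, zero_point = quantize(weights)
restored = dequantize(q_weights, scale, zero_point)

for original, q, approx in zip(weights, q_weights, restored):
    print(f"{original:+.3f} -> {q:3d} -> {approx:+.4f}")
```

Each value now fits in a single byte instead of four, at the cost of a rounding error of at most half of ``scale`` per value.
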
Quantization is more complex than just rounding, though - to get the best performance, TensorFlow tracks how each neuron in our model activates, so we can figure out how to most accurately simulate the neuron's original activations with integer operations.

We will help TensorFlow do this by creating a representative dataset - a subset of the original that is used for tracking how those neurons activate. We'll then pass this into a ``TFLiteConverter`` (Keras itself does not have quantization support) with an ``Optimize`` flag to tell TFLite to perform the conversion. By default, TFLite keeps the inputs and outputs of our model as floats, so we must explicitly tell it to avoid this behavior.

.. GENERATED FROM PYTHON SOURCE LINES 379-395

.. code-block:: default

    def representative_dataset():
        for image_batch, label_batch in full_dataset.take(10):
            yield [image_batch]


    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8

    quantized_model = converter.convert()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    /usr/local/lib/python3.7/dist-packages/tensorflow/lite/python/convert.py:766: UserWarning: Statistics for quantized inputs were expected, but not specified; continuing anyway.
      warnings.warn("Statistics for quantized inputs were expected, but not "

.. GENERATED FROM PYTHON SOURCE LINES 396-403

Download the Model if Desired
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We've now got a finished model that you can use locally or in other tutorials (try autotuning this model or viewing it on `https://netron.app/ `_). But before we do those things, we'll have to write it to a file (``quantized.tflite``).
If you're running this tutorial on Google Colab, you'll have to uncomment the last two lines to download the file after writing it.

.. GENERATED FROM PYTHON SOURCE LINES 403-410

.. code-block:: default

    QUANTIZED_MODEL_PATH = f"{FOLDER}/models/quantized.tflite"
    with open(QUANTIZED_MODEL_PATH, "wb") as f:
        f.write(quantized_model)
    # from google.colab import files
    # files.download(QUANTIZED_MODEL_PATH)

.. GENERATED FROM PYTHON SOURCE LINES 411-451

Compiling With TVM For Arduino
------------------------------
TensorFlow has a built-in framework for deploying to microcontrollers - `TFLite Micro `_. However, it's poorly supported by development boards and does not support autotuning. We will use Apache TVM instead.

TVM can be used either with its command line interface (``tvmc``) or with its Python interface. The Python interface is fully-featured and more stable, so we'll use it here.

TVM is an optimizing compiler, and optimizations to our model are performed in stages via **intermediate representations**. The first of these is `Relay `_, a high-level intermediate representation emphasizing portability. The conversion from ``.tflite`` to Relay is done without any knowledge of our "end goal" - the fact we intend to run this model on an Arduino.

Choosing an Arduino Board
^^^^^^^^^^^^^^^^^^^^^^^^^
Next, we'll have to decide exactly which Arduino board to use. The Arduino sketch that we ultimately generate should be compatible with any board, but knowing which board we are using in advance allows TVM to adjust its compilation strategy to get better performance.

There is one catch - we need enough **memory** (flash and RAM) to be able to run our model. We won't ever be able to run a complex vision model like a MobileNet on an Arduino Uno - that board only has 2 kB of RAM and 32 kB of flash! Our model has ~200,000 parameters, so there is just no way it could fit.

For this tutorial, we will use the Nano 33 BLE, which has 1 MB of flash memory and 256 KB of RAM.
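
A quick back-of-the-envelope check makes the flash requirement concrete. The sketch below only compares the storage needed for the weights against each board's flash size, using the 219k parameter count and the board figures quoted in this tutorial; the helper names are hypothetical, and the check ignores the space taken by the compiled code and the RAM needed for activations.

```python
KB = 1024

# Flash and RAM sizes as quoted in this tutorial
BOARDS = {
    "Arduino Uno": {"flash": 32 * KB, "ram": 2 * KB},
    "Nano 33 BLE": {"flash": 1024 * KB, "ram": 256 * KB},
}

NUM_PARAMETERS = 219_000  # parameter count quoted in the Quantization section


def weight_bytes(num_parameters, bytes_per_parameter):
    """Storage needed for the weights alone (hypothetical helper)."""
    return num_parameters * bytes_per_parameter


def weights_fit_in_flash(board_name, bytes_per_parameter):
    """True if the weights alone are smaller than the board's flash."""
    needed = weight_bytes(NUM_PARAMETERS, bytes_per_parameter)
    return needed <= BOARDS[board_name]["flash"]


for name in BOARDS:
    for label, size in (("float32", 4), ("int8", 1)):
        verdict = "fit" if weights_fit_in_flash(name, size) else "do not fit"
        needed_kb = weight_bytes(NUM_PARAMETERS, size) / KB
        print(f"{name}: {label} weights ({needed_kb:.0f} KB) {verdict} in flash")
```

Even where the ``float32`` weights nominally fit, they would leave very little flash for the program itself, which is one more reason to deploy the quantized model.
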
However, any other Arduino with those specs or better should also work.

Generating our project
^^^^^^^^^^^^^^^^^^^^^^
Next, we'll compile the model to TVM's MLF (model library format) intermediate representation, which consists of C/C++ code and is designed for autotuning. To improve performance, we'll tell TVM that we're compiling for the ``nrf52840`` microprocessor (the one the Nano 33 BLE uses). We'll also tell it to use the C runtime (abbreviated ``crt``) and to use ahead-of-time memory allocation (abbreviated ``aot``, which helps reduce the model's memory footprint). Lastly, we will disable vectorization with ``"tir.disable_vectorize": True``, as C has no native vectorized types.

Once we have set these configuration parameters, we will call ``tvm.relay.build`` to compile our Relay model into the MLF intermediate representation. From here, we just need to call ``tvm.micro.generate_project`` and pass in the Arduino template project to finish compilation.

.. GENERATED FROM PYTHON SOURCE LINES 451-487

.. code-block:: default

    import shutil
    import tflite
    import tvm
    import tvm.micro
    import tvm.relay

    # Method to load model is different in TFLite 1 vs 2
    try:  # TFLite 2.1 and above
        tflite_model = tflite.Model.GetRootAsModel(quantized_model, 0)
    except AttributeError:  # Fall back to TFLite 1.14 method
        tflite_model = tflite.Model.Model.GetRootAsModel(quantized_model, 0)

    # Convert to the Relay intermediate representation
    mod, params = tvm.relay.frontend.from_tflite(tflite_model)

    # Set configuration flags to improve performance
    target = tvm.target.target.micro("nrf52840")
    runtime = tvm.relay.backend.Runtime("crt")
    executor = tvm.relay.backend.Executor("aot", {"unpacked-api": True})

    # Convert to the MLF intermediate representation
    with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
        mod = tvm.relay.build(mod, target, runtime=runtime, executor=executor, params=params)

    # Generate an Arduino project from the MLF intermediate representation
    shutil.rmtree(f"{FOLDER}/models/project", ignore_errors=True)
    arduino_project = tvm.micro.generate_project(
        tvm.micro.get_microtvm_template_projects("arduino"),
        mod,
        f"{FOLDER}/models/project",
        {
            "arduino_board": "nano33ble",
            "arduino_cli_cmd": "/content/bin/arduino-cli",
            "project_type": "example_project",
        },
    )

.. GENERATED FROM PYTHON SOURCE LINES 488-529

Testing our Arduino Project
---------------------------
Consider the following two 224x224 images from the author's camera roll - one of a car, one not. We will test our Arduino project by loading both of these images and executing the compiled model on them.

.. image:: https://raw.githubusercontent.com/tlc-pack/web-data/main/testdata/microTVM/data/model_train_images_combined.png
     :align: center
     :height: 200px
     :width: 600px

Currently, these are 224x224 PNG images we can download from Imgur. Before we can feed in these images, we'll need to resize and convert them to raw data, which can be done with ``imagemagick``.
It's also challenging to load raw data onto an Arduino, as only C/C++ files (and similar) are compiled. We can work around this by embedding our raw data in a hard-coded C array with the built-in utility ``bin2c``, which will output a file like the one below:

.. code-block:: c

    static const unsigned char CAR_IMAGE[] = {
      0x22,0x23,0x14,0x22,
      ...
      0x07,0x0e,0x08,0x08
    };

We can do both of these things with a few lines of Bash code:

.. code-block:: bash

    %%bash
    mkdir -p ~/tests
    curl "https://i.imgur.com/JBbEhxN.png" -o ~/tests/car_224.png
    convert ~/tests/car_224.png -resize 64 ~/tests/car_64.png
    stream ~/tests/car_64.png ~/tests/car.raw
    bin2c -c -st ~/tests/car.raw --name CAR_IMAGE > ~/models/project/car.c

    curl "https://i.imgur.com/wkh7Dx2.png" -o ~/tests/catan_224.png
    convert ~/tests/catan_224.png -resize 64 ~/tests/catan_64.png
    stream ~/tests/catan_64.png ~/tests/catan.raw
    bin2c -c -st ~/tests/catan.raw --name CATAN_IMAGE > ~/models/project/catan.c

.. GENERATED FROM PYTHON SOURCE LINES 531-571

Writing our Arduino Script
--------------------------
We now need a little bit of Arduino code to read the two binary arrays we just generated, run the model on them, and log the output to the serial monitor. This file will replace ``arduino_sketch.ino`` as the main file of our sketch. You'll have to copy this code in manually.

.. code-block:: c

    %%writefile /root/models/project.ino
    #include "src/model.h"
    #include "car.c"
    #include "catan.c"

    void setup() {
      Serial.begin(9600);
      TVMInitialize();
    }

    void loop() {
      uint8_t result_data[2];
      Serial.println("Car results:");
      TVMExecute(const_cast<uint8_t*>(CAR_IMAGE), result_data);
      Serial.print(result_data[0]); Serial.print(", ");
      Serial.print(result_data[1]); Serial.println();

      Serial.println("Other object results:");
      TVMExecute(const_cast<uint8_t*>(CATAN_IMAGE), result_data);
      Serial.print(result_data[0]); Serial.print(", ");
      Serial.print(result_data[1]); Serial.println();

      delay(1000);
    }

Compiling Our Code
^^^^^^^^^^^^^^^^^^
Now that our project has been generated, TVM's job is mostly done! We can still call ``arduino_project.build()`` and ``arduino_project.upload()``, but these just use ``arduino-cli``'s compile and flash commands underneath. We could also begin autotuning our model, but that's a subject for a different tutorial. To finish up, we'll verify no compiler errors are thrown by our project:

.. GENERATED FROM PYTHON SOURCE LINES 571-576

.. code-block:: default

    shutil.rmtree(f"{FOLDER}/models/project/build", ignore_errors=True)
    arduino_project.build()
    print("Compilation succeeded!")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Compilation succeeded!

.. GENERATED FROM PYTHON SOURCE LINES 582-589

Uploading to Our Device
-----------------------
The very last step is uploading our sketch to an Arduino to make sure our code works properly. Unfortunately, we can't do that from Google Colab, so we'll have to download our sketch. This is simple enough to do - we'll just turn our project into a ``.zip`` archive, and call ``files.download``. If you're running on Google Colab, you'll have to uncomment the last two lines to download the file after writing it.

.. GENERATED FROM PYTHON SOURCE LINES 589-596

.. code-block:: default

    ZIP_FOLDER = f"{FOLDER}/models/project"
    shutil.make_archive(ZIP_FOLDER, "zip", ZIP_FOLDER)
    # from google.colab import files
    # files.download(f"{FOLDER}/models/project.zip")

.. GENERATED FROM PYTHON SOURCE LINES 617-651

From here, we'll need to open it in the Arduino IDE. You'll have to download the IDE as well as the SDK for whichever board you are using. For certain boards like the Sony SPRESENSE, you may have to change settings to control how much memory you want the board to use.

Expected Results
^^^^^^^^^^^^^^^^
If all works as expected, you should see the following output on a Serial monitor:

.. code-block::

    Car results:
    255, 0
    Other object results:
    0, 255

The first number represents the model's confidence that the object **is** a car and ranges from 0-255. The second number represents the model's confidence that the object **is not** a car and is also 0-255. These results mean the model is very sure that the first image is a car, and the second image is not (which is correct). Hence, our model is working!

Summary
-------
In this tutorial, we used transfer learning to quickly train an image recognition model to identify cars. We modified its input dimensions and last few layers to make it better at this, and to make it faster and smaller. We then quantized the model and compiled it using TVM to create an Arduino sketch. Lastly, we tested the model using two static images to prove it works as intended.

Next Steps
^^^^^^^^^^
From here, we could modify the model to read live images from the camera - we have another Arduino tutorial for how to do that `on GitHub `_. Alternatively, we could also `use TVM's autotuning capabilities `_ to dramatically improve the model's performance.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 4 minutes 13.387 seconds)

.. _sphx_glr_download_how_to_work_with_microtvm_micro_train.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: micro_train.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: micro_train.ipynb `

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_