.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "how_to/work_with_microtvm/micro_autotune.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_how_to_work_with_microtvm_micro_autotune.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_how_to_work_with_microtvm_micro_autotune.py:


.. _tutorial-micro-autotune:

Autotuning with microTVM
=========================
**Authors**:
`Andrew Reusch <https://github.com/areusch>`_,
`Mehrdad Hessar <https://github.com/mehrdadh>`_

This tutorial explains how to autotune a model using the C runtime.

.. GENERATED FROM PYTHON SOURCE LINES 29-41

.. code-block:: default


    import os
    import json
    import numpy as np
    import pathlib

    import tvm
    from tvm.relay.backend import Runtime

    use_physical_hw = bool(os.getenv("TVM_MICRO_USE_HW"))


.. GENERATED FROM PYTHON SOURCE LINES 47-53

Defining the model
###################

 To begin with, define a model in Relay to be executed on-device. Then create an IRModule from the
 Relay model and fill its parameters with random numbers.
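
 As a quick sanity check on the shapes used in this section, the sketch below computes the output
 shape of a 2-D convolution. The helper `conv2d_output_shape` is a hypothetical name introduced
 for illustration and is not part of TVM; it assumes NCHW data, OIHW weights and no dilation.

 .. code-block:: python

    def conv2d_output_shape(data_shape, weight_shape, padding, stride=1):
        """Return the NCHW output shape of a conv2d without dilation."""
        batch, _, in_h, in_w = data_shape
        out_channels, _, k_h, k_w = weight_shape
        pad_h, pad_w = padding
        # Standard formula: (in + 2 * pad - kernel) // stride + 1
        out_h = (in_h + 2 * pad_h - k_h) // stride + 1
        out_w = (in_w + 2 * pad_w - k_w) // stride + 1
        return (batch, out_channels, out_h, out_w)

    out_shape = conv2d_output_shape((1, 3, 10, 10), (6, 3, 5, 5), padding=(2, 2))
    print(out_shape)  # (1, 6, 10, 10)

 This agrees with the `(1, 6, 10, 10)` shape reported for the final layout transform in the
 profiling output later in this tutorial.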


.. GENERATED FROM PYTHON SOURCE LINES 53-78

.. code-block:: default


    data_shape = (1, 3, 10, 10)
    weight_shape = (6, 3, 5, 5)

    data = tvm.relay.var("data", tvm.relay.TensorType(data_shape, "float32"))
    weight = tvm.relay.var("weight", tvm.relay.TensorType(weight_shape, "float32"))

    y = tvm.relay.nn.conv2d(
        data,
        weight,
        padding=(2, 2),
        kernel_size=(5, 5),
        kernel_layout="OIHW",
        out_dtype="float32",
    )
    f = tvm.relay.Function([data, weight], y)

    relay_mod = tvm.IRModule.from_expr(f)
    relay_mod = tvm.relay.transform.InferType()(relay_mod)

    weight_sample = np.random.rand(*weight_shape).astype("float32")
    params = {"weight": weight_sample}


.. GENERATED FROM PYTHON SOURCE LINES 79-90

Defining the target
######################
 Now we define the TVM target that describes the execution environment. This looks very similar
 to target definitions from other microTVM tutorials. Alongside the target, we pick the C runtime
 to generate code for our model against.

 When running on physical hardware, choose a target and a board that describe the hardware.
 In this tutorial, setting the `TVM_MICRO_USE_HW` environment variable selects physical hardware,
 and `TVM_MICRO_BOARD` picks a board from the `boards.json` file of the Zephyr template project.
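
 One detail worth keeping in mind: this tutorial decides whether to use hardware with
 `bool(os.getenv("TVM_MICRO_USE_HW"))`, so any non-empty value enables it. The sketch below
 mirrors that check against a plain dictionary; `wants_hardware` is a hypothetical helper used
 only for illustration.

 .. code-block:: python

    def wants_hardware(env):
        """Mirror the tutorial's check, using a dict in place of os.environ."""
        return bool(env.get("TVM_MICRO_USE_HW"))

    print(wants_hardware({}))                         # False: variable is unset
    print(wants_hardware({"TVM_MICRO_USE_HW": ""}))   # False: empty string
    print(wants_hardware({"TVM_MICRO_USE_HW": "1"}))  # True
    print(wants_hardware({"TVM_MICRO_USE_HW": "0"}))  # True: "0" is a non-empty string

 To run on the host, leave the variable unset rather than setting it to `0`.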


.. GENERATED FROM PYTHON SOURCE LINES 90-107

.. code-block:: default


    RUNTIME = Runtime("crt", {"system-lib": True})
    TARGET = tvm.target.target.micro("host")

    # Compiling for physical hardware
    # --------------------------------------------------------------------------
    #  When running on physical hardware, choose a TARGET and a BOARD that describe the hardware. The
    #  STM32L4R5ZI Nucleo target and board are used by default in the example below.
    if use_physical_hw:
        boards_file = pathlib.Path(tvm.micro.get_microtvm_template_projects("zephyr")) / "boards.json"
        with open(boards_file) as f:
            boards = json.load(f)

        BOARD = os.getenv("TVM_MICRO_BOARD", default="nucleo_l4r5zi")
        TARGET = tvm.target.target.micro(boards[BOARD]["model"])


.. GENERATED FROM PYTHON SOURCE LINES 108-117

Extracting tuning tasks
########################
 Not all operators in the Relay program defined above can be tuned. Some are so trivial that only
 a single implementation is defined; others don't make sense as tuning tasks. Using
 `extract_from_program`, you can produce a list of tunable tasks.

 Because task extraction involves running the compiler, we first configure the compiler's
 transformation passes; we'll apply the same configuration later on during autotuning.
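
 To give a feel for what a tunable task exposes, the sketch below enumerates the ways a loop of a
 given length can be split into an outer and an inner loop with exact factors. This is a
 simplified illustration of one kind of schedule knob, not AutoTVM's implementation, and
 `split_candidates` is a hypothetical name.

 .. code-block:: python

    def split_candidates(extent):
        """List (outer, inner) pairs such that outer * inner == extent."""
        return [
            (outer, extent // outer)
            for outer in range(1, extent + 1)
            if extent % outer == 0
        ]

    print(split_candidates(6))        # [(1, 6), (2, 3), (3, 2), (6, 1)]
    print(len(split_candidates(10)))  # 4

 A real task combines several such knobs, and the tuner measures a subset of the resulting
 configurations on the device to find a fast one.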


.. GENERATED FROM PYTHON SOURCE LINES 117-123

.. code-block:: default


    pass_context = tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True})
    with pass_context:
        tasks = tvm.autotvm.task.extract_from_program(relay_mod["main"], {}, TARGET)
    assert len(tasks) > 0


.. GENERATED FROM PYTHON SOURCE LINES 124-134

Configuring microTVM
#####################
 Before autotuning, we need to create a `tvm.autotvm.LocalBuilder` and define a module loader,
 which is passed to a `tvm.autotvm.LocalRunner`. The builder and runner together generate the
 measurements that the autotuner uses.

 In this tutorial, we can use the x86 host as an example or a board supported by Zephyr RTOS.
 By default the tutorial runs on the x86 host. Setting `TVM_MICRO_USE_HW` switches to the Zephyr
 board named by `TVM_MICRO_BOARD`.
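
 The `number` and `repeat` arguments given to `tvm.autotvm.LocalRunner` control how each candidate
 is timed. Assuming they follow the usual convention (each of `repeat` measurements is the average
 over `number` runs), the idea can be sketched with the standard library's `timeit` module. The
 names `measure` and `workload` are hypothetical, and the workload is only a stand-in for running
 a compiled operator on the device.

 .. code-block:: python

    import timeit

    def measure(func, number, repeat):
        """Return `repeat` measurements, each the mean time of `number` calls."""
        totals = timeit.repeat(func, number=number, repeat=repeat)
        return [total / number for total in totals]

    def workload():
        # Hypothetical stand-in for one run of a compiled operator.
        sum(range(1000))

    costs = measure(workload, number=1, repeat=1)
    print(len(costs))  # 1

 With `number=1` and `repeat=1`, as in this tutorial, each candidate is timed from a single run,
 which keeps tuning short at the cost of noisier measurements.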


.. GENERATED FROM PYTHON SOURCE LINES 134-172

.. code-block:: default


    module_loader = tvm.micro.AutoTvmModuleLoader(
        template_project_dir=pathlib.Path(tvm.micro.get_microtvm_template_projects("crt")),
        project_options={"verbose": False},
    )
    builder = tvm.autotvm.LocalBuilder(
        n_parallel=1,
        build_kwargs={"build_option": {"tir.disable_vectorize": True}},
        do_fork=True,
        build_func=tvm.micro.autotvm_build_func,
        runtime=RUNTIME,
    )
    runner = tvm.autotvm.LocalRunner(number=1, repeat=1, timeout=100, module_loader=module_loader)

    measure_option = tvm.autotvm.measure_option(builder=builder, runner=runner)

    # Compiling for physical hardware
    if use_physical_hw:
        module_loader = tvm.micro.AutoTvmModuleLoader(
            template_project_dir=pathlib.Path(tvm.micro.get_microtvm_template_projects("zephyr")),
            project_options={
                "zephyr_board": BOARD,
                "west_cmd": "west",
                "verbose": False,
                "project_type": "host_driven",
            },
        )
        builder = tvm.autotvm.LocalBuilder(
            n_parallel=1,
            build_kwargs={"build_option": {"tir.disable_vectorize": True}},
            do_fork=False,
            build_func=tvm.micro.autotvm_build_func,
            runtime=RUNTIME,
        )
        runner = tvm.autotvm.LocalRunner(number=1, repeat=1, timeout=100, module_loader=module_loader)

        measure_option = tvm.autotvm.measure_option(builder=builder, runner=runner)


.. GENERATED FROM PYTHON SOURCE LINES 173-177

Run Autotuning
#########################
 Now we can run autotuning separately on each extracted task on the microTVM device.
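
 The role of the trial budget can be illustrated with a plain random search: measure a limited
 number of candidate configurations and keep the cheapest. This is only a sketch of the
 budgeted measure-and-compare loop, not the genetic algorithm used by `GATuner`; the function
 `tune`, the search space and the cost function are all made up for illustration.

 .. code-block:: python

    import random

    def tune(candidates, measure, n_trial, seed=0):
        """Measure up to n_trial distinct candidates; return (best, history)."""
        rng = random.Random(seed)
        trials = rng.sample(candidates, min(n_trial, len(candidates)))
        history = [(config, measure(config)) for config in trials]
        best = min(history, key=lambda item: item[1])
        return best, history

    # Hypothetical search space: exact (outer, inner) splits of a loop of length 60.
    candidates = [(outer, 60 // outer) for outer in range(1, 61) if 60 % outer == 0]

    def toy_cost(config):
        # Made-up cost that is lowest at (6, 10); a real tuner measures on the device.
        return abs(config[0] - 6) + abs(config[1] - 10)

    best, history = tune(candidates, toy_cost, n_trial=10)
    print(len(history))  # 10

 A small budget such as 10 trials may miss the best configuration, so larger budgets are
 generally worth trying when tuning time allows.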


.. GENERATED FROM PYTHON SOURCE LINES 177-195

.. code-block:: default


    autotune_log_file = pathlib.Path("microtvm_autotune.log.txt")
    if os.path.exists(autotune_log_file):
        os.remove(autotune_log_file)

    num_trials = 10
    for task in tasks:
        tuner = tvm.autotvm.tuner.GATuner(task)
        tuner.tune(
            n_trial=num_trials,
            measure_option=measure_option,
            callbacks=[
                tvm.autotvm.callback.log_to_file(str(autotune_log_file)),
                tvm.autotvm.callback.progress_bar(num_trials, si_prefix="M"),
            ],
            si_prefix="M",
        )


.. GENERATED FROM PYTHON SOURCE LINES 196-202

Timing the untuned program
###########################
 For comparison, let's compile and run the graph without applying any autotuned schedules. TVM
 will select a fallback (untuned) configuration for each operator, which is generally expected to
 perform worse than the tuned operator.


.. GENERATED FROM PYTHON SOURCE LINES 202-240

.. code-block:: default


    with pass_context:
        lowered = tvm.relay.build(relay_mod, target=TARGET, runtime=RUNTIME, params=params)

    temp_dir = tvm.contrib.utils.tempdir()
    project = tvm.micro.generate_project(
        str(tvm.micro.get_microtvm_template_projects("crt")),
        lowered,
        temp_dir / "project",
        {"verbose": False},
    )

    # Compiling for physical hardware
    if use_physical_hw:
        temp_dir = tvm.contrib.utils.tempdir()
        project = tvm.micro.generate_project(
            str(tvm.micro.get_microtvm_template_projects("zephyr")),
            lowered,
            temp_dir / "project",
            {
                "zephyr_board": BOARD,
                "west_cmd": "west",
                "verbose": False,
                "project_type": "host_driven",
            },
        )

    project.build()
    project.flash()
    with tvm.micro.Session(project.transport()) as session:
        debug_module = tvm.micro.create_local_debug_executor(
            lowered.get_graph_json(), session.get_system_lib(), session.device
        )
        debug_module.set_input(**lowered.get_params())
        print("########## Build without Autotuning ##########")
        debug_module.run()
        del debug_module


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ########## Build without Autotuning ##########
    Node Name                                     Ops                                           Time(us)  Time(%)  Shape              Inputs  Outputs  Measurements(us)  
    ---------                                     ---                                           --------  -------  -----              ------  -------  ----------------  
    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  311.0     98.72    (1, 2, 10, 10, 3)  2       1        [311.0]           
    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       3.066     0.973    (1, 6, 10, 10)     1       1        [3.066]           
    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.967     0.307    (1, 1, 10, 10, 3)  1       1        [0.967]           
    Total_time                                    -                                             315.033   -        -                  -       -        -                 


.. GENERATED FROM PYTHON SOURCE LINES 241-244

Timing the tuned program
#########################
 Once autotuning completes, you can rebuild the program with the best schedules from the tuning log
 and time execution of the entire program using the debug executor:

.. GENERATED FROM PYTHON SOURCE LINES 244-282

.. code-block:: default


    with tvm.autotvm.apply_history_best(str(autotune_log_file)):
        with pass_context:
            lowered_tuned = tvm.relay.build(relay_mod, target=TARGET, runtime=RUNTIME, params=params)

    temp_dir = tvm.contrib.utils.tempdir()
    project = tvm.micro.generate_project(
        str(tvm.micro.get_microtvm_template_projects("crt")),
        lowered_tuned,
        temp_dir / "project",
        {"verbose": False},
    )

    # Compiling for physical hardware
    if use_physical_hw:
        temp_dir = tvm.contrib.utils.tempdir()
        project = tvm.micro.generate_project(
            str(tvm.micro.get_microtvm_template_projects("zephyr")),
            lowered_tuned,
            temp_dir / "project",
            {
                "zephyr_board": BOARD,
                "west_cmd": "west",
                "verbose": False,
                "project_type": "host_driven",
            },
        )

    project.build()
    project.flash()
    with tvm.micro.Session(project.transport()) as session:
        debug_module = tvm.micro.create_local_debug_executor(
            lowered_tuned.get_graph_json(), session.get_system_lib(), session.device
        )
        debug_module.set_input(**lowered_tuned.get_params())
        print("########## Build with Autotuning ##########")
        debug_module.run()
        del debug_module


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ########## Build with Autotuning ##########
    Node Name                                     Ops                                           Time(us)  Time(%)  Shape              Inputs  Outputs  Measurements(us)  
    ---------                                     ---                                           --------  -------  -----              ------  -------  ----------------  
    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  223.6     98.728   (1, 1, 10, 10, 6)  2       1        [223.6]           
    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       1.926     0.85     (1, 6, 10, 10)     1       1        [1.926]           
    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.955     0.422    (1, 1, 10, 10, 3)  1       1        [0.955]           
    Total_time                                    -                                             226.481   -        -                  -       -        -                 
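
The two `Total_time` rows can be compared directly. The short calculation below uses only the
totals printed in the two tables of this tutorial; the variable names are chosen here for
illustration.

.. code-block:: python

    untuned_us = 315.033  # Total_time from the build without autotuning
    tuned_us = 226.481    # Total_time from the build with autotuning

    speedup = untuned_us / tuned_us
    reduction_pct = 100.0 * (untuned_us - tuned_us) / untuned_us
    print(f"speedup: {speedup:.2f}x, time reduction: {reduction_pct:.1f}%")
    # speedup: 1.39x, time reduction: 28.1%

These figures come from a single run with `num_trials = 10` and one measurement per candidate,
so they should be treated as indicative rather than as a stable benchmark.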


.. _sphx_glr_download_how_to_work_with_microtvm_micro_autotune.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: micro_autotune.py <micro_autotune.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: micro_autotune.ipynb <micro_autotune.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_