.. DO NOT EDIT. THIS FILE WAS AUTOMATICALLY GENERATED BY
.. TVM'S MONKEY-PATCHED VERSION OF SPHINX-GALLERY. TO MAKE
.. CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "how_to/tutorials/e2e_opt_model.py"

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        You can click :ref:`here <sphx_glr_download_how_to_tutorials_e2e_opt_model.py>` to run the Jupyter notebook locally.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_how_to_tutorials_e2e_opt_model.py:


.. _optimize_model:

End-to-End Optimize Model
=========================
This tutorial demonstrates how to optimize a machine learning model with Apache TVM. We take a
pre-trained ResNet-18 model from PyTorch and optimize it end to end using TVM's Relax API.
Note that the default end-to-end optimization pipeline may not suit complex models.

.. GENERATED FROM PYTHON SOURCE LINES 30-34

Preparation
-----------
First, we prepare the model and input information. We use a pre-trained ResNet-18 model from
PyTorch.

.. GENERATED FROM PYTHON SOURCE LINES 34-44

.. code-block:: Python


    import os

    import numpy as np
    import torch
    from torch.export import export
    from torchvision.models.resnet import ResNet18_Weights, resnet18

    torch_model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /workspace/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
    100%|██████████| 44.7M/44.7M [00:00<00:00, 330MB/s]


.. GENERATED FROM PYTHON SOURCE LINES 45-62

Review Overall Flow
-------------------
.. figure:: https://raw.githubusercontent.com/tlc-pack/web-data/main/images/design/tvm_overall_flow.svg
   :align: center
   :width: 80%

The overall flow consists of the following steps:

- **Construct or Import a Model**: Construct a neural network model or import a pre-trained
  model from another framework (e.g. PyTorch, ONNX), and create the TVM IRModule, which contains
  all the information needed for compilation, including high-level Relax functions for the
  computational graph and low-level TensorIR functions for tensor programs.
- **Perform Composable Optimizations**: Perform a series of optimization transformations,
  such as graph optimizations, tensor program optimizations, and library dispatching.
- **Build and Universal Deployment**: Build the optimized model into a module that can be
  deployed on the universal runtime, and execute it on different devices, such as CPUs, GPUs,
  or other accelerators.


.. GENERATED FROM PYTHON SOURCE LINES 65-69

Convert the model to IRModule
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Next, we convert the model to an IRModule using the Relax frontend for PyTorch so that it can
be optimized further.

.. GENERATED FROM PYTHON SOURCE LINES 69-89

.. code-block:: Python


    import tvm
    from tvm import relax
    from tvm.relax.frontend.torch import from_exported_program

    # Give an example argument to torch.export
    example_args = (torch.randn(1, 3, 224, 224, dtype=torch.float32),)

    # Skip running in CI environment
    IS_IN_CI = os.getenv("CI", "") == "true"

    if not IS_IN_CI:
        # Convert the model to IRModule
        with torch.no_grad():
            exported_program = export(torch_model, example_args)
            mod = from_exported_program(exported_program, keep_params_as_input=True)

        mod, params = relax.frontend.detach_params(mod)
        mod.show()


.. GENERATED FROM PYTHON SOURCE LINES 90-118

IRModule Optimization
---------------------
Apache TVM provides a flexible way to optimize the IRModule. Optimizations are expressed as
IRModule transformations, which can be composed with existing pipelines. Individual
transformations can also be chained into an optimization pipeline via
``tvm.ir.transform.Sequential``.
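
A minimal sketch of composing passes with ``tvm.ir.transform.Sequential`` might look like the
following. The tiny module built with ``relax.BlockBuilder`` and the particular pass list are
illustrative assumptions, not the pipeline this tutorial uses for ResNet-18.

.. code-block:: Python

    import tvm
    from tvm import relax

    # Build a tiny stand-in module (hypothetical, not the ResNet-18 IRModule).
    bb = relax.BlockBuilder()
    x = relax.Var("x", relax.TensorStructInfo((1, 8), "float32"))
    with bb.function("main", [x]):
        with bb.dataflow():
            y = bb.emit(relax.op.nn.relu(x))
            z = bb.emit_output(relax.op.add(y, y))
        bb.emit_func_output(z)
    tiny_mod = bb.get()

    # Compose individual passes into one pipeline; they run in list order.
    pipeline = tvm.ir.transform.Sequential(
        [
            relax.transform.LegalizeOps(),  # lower Relax ops to TensorIR functions
            relax.transform.AnnotateTIROpPattern(),  # tag TensorIR functions for fusion
            relax.transform.FuseOps(),  # group fusible operators
            relax.transform.FuseTIR(),  # merge each group into one TensorIR function
        ]
    )

    # Applying the pipeline returns a new, transformed IRModule.
    optimized = pipeline(tiny_mod)
    print(optimized.script())

Because a ``Sequential`` is itself a pass, it can be nested inside another pipeline or applied
on its own, as shown here.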

In this tutorial, we focus on end-to-end optimization of the model via auto-tuning. We
use MetaSchedule to tune the model and store the tuning logs in a database. We then
apply the database to the model to get the best performance found during tuning.

The ResNet-18 model is divided into 20 independent tuning tasks during compilation.
The settings below give each task adequate tuning resources in one iteration while still
providing early feedback:

- To quickly observe tuning progress, each task is allocated at most 16 trials per
  iteration (controlled by ``MAX_TRIALS_PER_TASK=16``). Setting ``TOTAL_TRIALS`` to at
  least 320 (20 tasks * 16 trials) ensures that every task receives one full iteration
  of tuning. We set it to 512 in our configuration to allow for several more iterations,
  aiming to explore a wider parameter space and potentially achieve better performance.
- If ``MAX_TRIALS_PER_TASK`` is ``None``, the system defaults to ``TOTAL_TRIALS`` trials per
  task per iteration. An insufficient ``TOTAL_TRIALS`` setting may then lead to under-tuning,
  potentially skipping some tasks entirely. Explicitly setting both parameters
  avoids this issue and provides deterministic resource allocation across all tasks.

Note: These parameter settings are chosen for a quick tutorial demonstration. For production
deployments requiring higher performance, we recommend increasing both ``MAX_TRIALS_PER_TASK``
and ``TOTAL_TRIALS``. This allows more extensive exploration of the search space and
typically yields better performance.
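
The budget arithmetic above can be checked with a few lines of plain Python. The helper
``trial_budget`` below is hypothetical and only mirrors the accounting described in this
section. It is not part of TVM, and MetaSchedule's task scheduler may distribute trials
differently in practice.

.. code-block:: Python

    from typing import Optional


    def trial_budget(num_tasks: int, total_trials: int, max_trials_per_task: Optional[int]) -> dict:
        """Back-of-the-envelope budget check (hypothetical helper, not a TVM API)."""
        # As described above, a missing per-task cap falls back to the total budget.
        per_task = total_trials if max_trials_per_task is None else max_trials_per_task
        # Trials needed so that every task receives one full iteration.
        min_total = num_tasks * per_task
        # Number of full per-task iterations the total budget can pay for.
        full_iterations = total_trials // per_task
        return {
            "min_total_trials": min_total,
            "full_iterations": full_iterations,
            "covers_all_tasks": full_iterations >= num_tasks,
            "extra_iterations": max(full_iterations - num_tasks, 0),
        }


    # Settings used in this tutorial: 20 tasks, 512 total trials, 16 trials per task.
    tutorial = trial_budget(num_tasks=20, total_trials=512, max_trials_per_task=16)
    print(tutorial)

    # Same total budget without an explicit per-task cap.
    uncapped = trial_budget(num_tasks=20, total_trials=512, max_trials_per_task=None)
    print(uncapped)

With the tutorial's settings, this reports a minimum of 320 trials and 12 task iterations
beyond the first full pass over all tasks. Without a per-task cap, the same 512-trial budget
pays for only one task's full iteration, which illustrates why some tasks could be skipped.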

.. GENERATED FROM PYTHON SOURCE LINES 118-136

.. code-block:: Python


    TOTAL_TRIALS = 512  # Change to 20000 for better performance if needed
    MAX_TRIALS_PER_TASK = 16  # Change to more trials per task for better performance if needed
    target = tvm.target.Target("nvidia/geforce-rtx-3090-ti")  # Change to your target device
    work_dir = "tuning_logs"

    if not IS_IN_CI:
        mod = relax.get_pipeline(
            "static_shape_tuning",
            target=target,
            work_dir=work_dir,
            total_trials=TOTAL_TRIALS,
            max_trials_per_task=MAX_TRIALS_PER_TASK,
        )(mod)

        # Only show the main function
        mod["main"].show()


.. GENERATED FROM PYTHON SOURCE LINES 137-141

Build and Deploy
----------------
Finally, we build the optimized model and deploy it to the target device.
We skip this step in the CI environment.

.. GENERATED FROM PYTHON SOURCE LINES 141-154

.. code-block:: Python


    if not IS_IN_CI:
        with target:
            mod = tvm.s_tir.transform.DefaultGPUSchedule()(mod)
        ex = tvm.compile(mod, target=target)
        dev = tvm.device("cuda", 0)
        vm = relax.VirtualMachine(ex, dev)
        # Need to allocate data and params on GPU device
        gpu_data = tvm.runtime.tensor(np.random.rand(1, 3, 224, 224).astype("float32"), dev)
        gpu_params = [tvm.runtime.tensor(p, dev) for p in params["main"]]
        gpu_out = vm["main"](gpu_data, *gpu_params)[0].numpy()

        print(gpu_out.shape)


.. _sphx_glr_download_how_to_tutorials_e2e_opt_model.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: e2e_opt_model.ipynb <e2e_opt_model.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: e2e_opt_model.py <e2e_opt_model.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: e2e_opt_model.zip <e2e_opt_model.zip>`