{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%shell\n# Installs the latest dev build of TVM from PyPI. If you wish to build\n# from source, see https://tvm.apache.org/docs/install/from_source.html\npip install apache-tvm --pre" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Compiling and Optimizing a Model with the Python Interface (AutoTVM)\n**Author**:\n[Chris Hoge](https://github.com/hogepodge)\n\nIn the [TVMC Tutorial](tvmc_command_line_driver), we covered how to compile, run, and tune a\npre-trained vision model, ResNet-50 v2, using the command line interface for\nTVM, TVMC. TVM is more than just a command-line tool, though; it is an\noptimizing framework with APIs available for a number of different languages\nthat give you tremendous flexibility in working with machine learning models.\n\nIn this tutorial we will cover the same ground we did with TVMC, but show how\nit is done with the Python API. Upon completion of this section, we will have\nused the Python API for TVM to accomplish the following tasks:\n\n* Compile a pre-trained ResNet-50 v2 model for the TVM runtime.\n* Run a real image through the compiled model, and interpret the output and model\n performance.\n* Tune the model on a CPU using TVM.\n* Re-compile an optimized model using the tuning data collected by TVM.\n* Run the image through the optimized model, and compare the output and model\n performance.\n\nThe goal of this section is to give you an overview of TVM's capabilities and\nhow to use them through the Python API.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "TVM is a deep learning compiler framework, with a number of different modules\navailable for working with deep learning models and operators. 
In this\ntutorial we will work through how to load, compile, and optimize a model\nusing the Python API.\n\nWe begin by importing a number of dependencies, including ``onnx`` for\nloading and converting the model, helper utilities for downloading test data,\nthe Python Imaging Library (PIL) for working with the image data, ``numpy`` for pre-\nand post-processing of the image data, the TVM Relay framework, and the TVM\nGraph Executor.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import onnx\nfrom tvm.contrib.download import download_testdata\nfrom PIL import Image\nimport numpy as np\nimport tvm.relay as relay\nimport tvm\nfrom tvm.contrib import graph_executor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Downloading and Loading the ONNX Model\n\nFor this tutorial, we will be working with ResNet-50 v2. ResNet-50 is a\nconvolutional neural network that is 50 layers deep and designed to classify\nimages. The model we will be using has been pre-trained on more than a\nmillion images with 1000 different classifications. The network has an input\nimage size of 224x224. If you are interested in exploring more of how the\nResNet-50 model is structured, we recommend downloading\n[Netron](https://netron.app), a freely available ML model viewer.\n\nTVM provides a helper library to download pre-trained models. By providing a\nmodel URL, file name, and model type through the module, TVM will download\nthe model and save it to disk. In the case of an ONNX model, you can\nthen load it into memory using the ``onnx`` library.\n\n
TVM supports many popular model formats. A list can be found in the\n`Compile Deep Learning Models
Specifying the correct target can have a huge impact on the performance of\nthe compiled module, as it can take advantage of hardware features\navailable on the target. For more information, please refer to\n`Auto-tuning a convolutional network for x86 CPU
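As an illustration, a target is specified as a string naming the code-generation backend plus optional hardware attributes (it can also be wrapped in a ``tvm.target.Target`` object). The specific ``-mcpu`` value below is only an example of enabling CPU-specific features, not a recommendation for your machine:

```python
# Illustrative TVM target strings (attribute values are examples only):
target = "llvm"                  # generic CPU code generated via LLVM
target = "llvm -mcpu=core-avx2"  # x86 CPU, allowing AVX2 vector instructions
target = "cuda"                  # NVIDIA GPU

# The chosen string is then passed to relay.build(), e.g.:
#   with tvm.transform.PassContext(opt_level=3):
#       lib = relay.build(mod, target=target, params=params)
```

Passing the most specific target string you can (for example, one that names your exact CPU microarchitecture) is what lets TVM exploit the hardware features mentioned above.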
By default this search is guided using an XGBoost-based cost model (TVM's\n`XGBTuner`). Depending on your model complexity and the amount of time\navailable, you might want to choose a different search algorithm.
In this example, in the interest of time, we set the number of trials to 20 and\nearly stopping to 100. You will likely see more performance improvement if\nyou set these values higher, but that comes at the expense of time spent\ntuning. The number of trials required for convergence will vary depending on\nthe specifics of the model and the target platform.
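The interaction between the two knobs can be sketched in plain Python (this is an illustration of the stopping logic only, not TVM's actual implementation): tuning ends after ``n_trial`` configurations have been measured, or sooner if ``early_stopping`` consecutive trials pass without improving on the best result seen so far.

```python
# Pure-Python sketch of how n_trial and early_stopping interact.
def tune(costs, n_trial=20, early_stopping=100):
    """Walk `costs` (one measured cost per candidate config) and return
    (trials_run, best_cost) under the two stopping criteria."""
    best = float("inf")
    since_best = 0  # trials since the last improvement
    tried = 0
    for cost in costs[:n_trial]:  # hard cap: n_trial measurements
        tried += 1
        if cost < best:
            best, since_best = cost, 0
        else:
            since_best += 1
            if since_best >= early_stopping:
                break  # plateau detected: stop early
    return tried, best

# With only 10 candidates, 20 trials, and early_stopping=100,
# the early-stopping criterion never fires:
print(tune([5, 4, 4, 3, 3, 3, 2, 2, 2, 2]))  # → (10, 2)
```

With a small trial budget like 20, early stopping at 100 effectively never triggers; it only matters once the trial count is raised well above the early-stopping window.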