.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "how_to/deploy_models/deploy_prequantized.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_how_to_deploy_models_deploy_prequantized.py:


Deploy a Framework-prequantized Model with TVM
==============================================
**Author**: `Masahiro Masuda `_

This is a tutorial on loading models quantized by deep learning frameworks into TVM.
Pre-quantized model import is one of the quantization capabilities we support in TVM.
More details on the quantization story in TVM can be found `here `_.

Here, we demonstrate how to load and run models quantized by PyTorch, MXNet,
and TFLite. Once loaded, we can run compiled, quantized models on any hardware
TVM supports.

.. GENERATED FROM PYTHON SOURCE LINES 30-32

.. code-block:: default


.. GENERATED FROM PYTHON SOURCE LINES 38-39

First, necessary imports

.. GENERATED FROM PYTHON SOURCE LINES 39-51

.. code-block:: default


    from PIL import Image

    import numpy as np

    import torch
    from torchvision.models.quantization import mobilenet as qmobilenet

    import tvm
    from tvm import relay
    from tvm.contrib.download import download_testdata

.. GENERATED FROM PYTHON SOURCE LINES 52-53

Helper functions to run the demo

.. GENERATED FROM PYTHON SOURCE LINES 53-106
.. code-block:: default


    def get_transform():
        import torchvision.transforms as transforms

        normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        return transforms.Compose(
            [
                transforms.Resize(256),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                normalize,
            ]
        )


    def get_real_image(im_height, im_width):
        img_url = "https://github.com/dmlc/mxnet.js/blob/main/data/cat.png?raw=true"
        img_path = download_testdata(img_url, "cat.png", module="data")
        return Image.open(img_path).resize((im_height, im_width))


    def get_imagenet_input():
        im = get_real_image(224, 224)
        preprocess = get_transform()
        pt_tensor = preprocess(im)
        return np.expand_dims(pt_tensor.numpy(), 0)


    def get_synset():
        synset_url = "".join(
            [
                "https://gist.githubusercontent.com/zhreshold/",
                "4d0b62f3d01426887599d4f7ede23ee5/raw/",
                "596b27d23537e5a1b5751d2b0481ef172f58b539/",
                "imagenet1000_clsid_to_human.txt",
            ]
        )
        synset_name = "imagenet1000_clsid_to_human.txt"
        synset_path = download_testdata(synset_url, synset_name, module="data")
        with open(synset_path) as f:
            return eval(f.read())


    def run_tvm_model(mod, params, input_name, inp, target="llvm"):
        with tvm.transform.PassContext(opt_level=3):
            lib = relay.build(mod, target=target, params=params)

        runtime = tvm.contrib.graph_executor.GraphModule(lib["default"](tvm.device(target, 0)))

        runtime.set_input(input_name, inp)
        runtime.run()
        return runtime.get_output(0).numpy(), runtime

.. GENERATED FROM PYTHON SOURCE LINES 107-109

A mapping from label to class name, to verify that the outputs from models below
are reasonable

.. GENERATED FROM PYTHON SOURCE LINES 109-111

.. code-block:: default


    synset = get_synset()

.. GENERATED FROM PYTHON SOURCE LINES 112-113

Everyone's favorite cat image for demonstration

.. GENERATED FROM PYTHON SOURCE LINES 113-115

.. code-block:: default


    inp = get_imagenet_input()
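Before turning to framework-quantized models, it may help to recall the affine
quantization scheme such frameworks use to map float32 tensors to uint8:
``real_value = scale * (quantized_value - zero_point)``. The sketch below is a
minimal NumPy illustration of that mapping; the ``quantize``/``dequantize``
helpers are hypothetical names for this demo, not TVM or PyTorch APIs, and a
single per-tensor scale and zero point are assumed (per-channel quantization
applies the same formula with one scale/zero-point pair per output channel).

```python
import numpy as np


def quantize(x, scale, zero_point):
    # Map float32 values to uint8: q = clip(round(x / scale) + zero_point, 0, 255).
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype("uint8")


def dequantize(q, scale, zero_point):
    # Recover an approximation of the original float values.
    return scale * (q.astype("float32") - zero_point)


x = np.array([-0.5, 0.0, 0.3, 0.75], dtype="float32")
scale, zero_point = 1.0 / 128.0, 128

q = quantize(x, scale, zero_point)          # uint8 values: [64, 128, 166, 224]
x_hat = dequantize(q, scale, zero_point)

# Within the representable range, the round trip loses at most ~scale/2 per element.
assert np.max(np.abs(x - x_hat)) <= scale / 2 + 1e-6
```

A prequantized model ships these scales and zero points alongside its uint8
weights; TVM's frontends read them and express the model in the QNN dialect
rather than re-deriving them.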
.. GENERATED FROM PYTHON SOURCE LINES 116-128

Deploy a quantized PyTorch Model
--------------------------------
First, we demonstrate how to load deep learning models quantized by PyTorch,
using our PyTorch frontend.

Please refer to the PyTorch static quantization tutorial below to learn about
their quantization workflow.
https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html

We use this function to quantize PyTorch models. In short, this function takes
a floating point model and converts it to uint8. The model is per-channel
quantized.

.. GENERATED FROM PYTHON SOURCE LINES 128-139

.. code-block:: default


    def quantize_model(model, inp):
        model.fuse_model()
        model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
        torch.quantization.prepare(model, inplace=True)
        # Dummy calibration
        model(inp)
        torch.quantization.convert(model, inplace=True)

.. GENERATED FROM PYTHON SOURCE LINES 140-144

Load quantization-ready, pretrained Mobilenet v2 model from torchvision
-----------------------------------------------------------------------
We choose mobilenet v2 because this model was trained with quantization-aware
training. Other models require a full post-training calibration.

.. GENERATED FROM PYTHON SOURCE LINES 144-146

.. code-block:: default


    qmodel = qmobilenet.mobilenet_v2(pretrained=True).eval()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /workspace/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
    0%|          | 0.00/13.6M [00:00

.. container:: sphx-glr-download sphx-glr-download-jupyter

    :download:`Download Jupyter notebook: deploy_prequantized.ipynb `

.. only:: html

    .. rst-class:: sphx-glr-signature

        `Gallery generated by Sphinx-Gallery `_