{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%shell\n# Installs the latest dev build of TVM from PyPI. If you wish to build\n# from source, see https://tvm.apache.org/docs/install/from_source.html\npip install apache-tvm --pre" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n# 2D Convolution Optimization\n**Author**: [Thierry Moreau](https://homes.cs.washington.edu/~moreau/)\n\nThis tutorial provides an overview of how to use TVM to map a 2D convolution\nworkload efficiently on the VTA design.\nWe recommend covering the `vta-mat-mult-opt` tutorial first.\n\n2D convolution is dominant in most computer vision deep neural networks.\nIn this tutorial, we will demonstrate TVM schedule optimizations to map\n2D convolution operators in NCHW layout onto VTA.\nWe also introduce the notion of latency hiding, which allows us to\nmaximize VTA's compute and memory resource utilization.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## RPC Setup\nWe start by programming the Pynq's FPGA and building its RPC runtime.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from __future__ import absolute_import, print_function\n\nimport os\nimport tvm\nimport tvm.testing\nfrom tvm import te\nimport vta\nimport numpy as np\n\nfrom tvm import rpc\nfrom tvm.contrib import utils\nfrom vta.testing import simulator\n\n# Load VTA parameters from the 3rdparty/vta-hw/config/vta_config.json file\nenv = vta.get_env()\n\n# We read the Pynq RPC host IP address and port number from the OS environment\nhost = os.environ.get(\"VTA_RPC_HOST\", \"192.168.2.99\")\nport = int(os.environ.get(\"VTA_RPC_PORT\", \"9091\"))\n\n# We configure both the bitstream and the runtime system on the Pynq\n# to match the VTA configuration specified by the vta_config.json file.\nif env.TARGET == \"pynq\":\n\n # Make sure that TVM was compiled 
with RPC=1\n assert tvm.runtime.enabled(\"rpc\")\n remote = rpc.connect(host, port)\n\n # Reconfigure the JIT runtime\n vta.reconfig_runtime(remote)\n\n # Program the FPGA with a pre-compiled VTA bitstream.\n # You can program the FPGA with your own custom bitstream\n # by passing the path to the bitstream file instead of None.\n vta.program_fpga(remote, bitstream=None)\n\n# In simulation mode, host the RPC server locally.\nelif env.TARGET in [\"sim\", \"tsim\"]:\n remote = rpc.LocalSession()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computation Declaration\nAs a first step, we need to describe our 2D convolution computation\nin NCHW format.\n\nWe define the 2D convolution shape by the batch size,\nspatial dimensions, input channels, output channels, kernel dimensions,\npadding dimensions, and stride dimensions.\n\nWe pick the shape of the 9th convolutional layer of the ResNet-18\narchitecture as our convolution workload parameters.\n\nWe've added extra operators to the 2D convolution that apply\nshifting and clipping to the output in order to mimic a fixed-point\nconvolution followed by a rectified linear activation.\nWe describe the TVM dataflow graph of the 2D convolution layer below:\n\n\n\nThis computation is intentionally too large to fit onto VTA's on-chip\nbuffers all at once. Therefore, in the scheduling phase, we'll\nrely on computation blocking strategies to break the computation down into\nmanageable chunks.\n\n
*Spatial padding*\n\n Note that we'll need to import the TOPI library to apply spatial padding\n on the input feature map tensor.\n Spatial padding facilitates blocking in the context of 2D convolutions\n because the same (x, y) spatial location of the input\n feature map of any given layer is read more than once if the convolution\n kernel window size is greater than one.\n On CPUs and GPUs, one way to increase the efficiency of memory accesses\n when parallelizing work is spatial packing, which requires data re-layout.\n VTA's load DMA engine can insert padding automatically so that the original\n input feature map does not have to be re-packed in memory.\n\n We show the effect of VTA's on-the-fly spatial padding as data is\n loaded from DRAM into VTA's SRAM, following a 2D strided and padded memory\n read.\n\n .. image:: https://raw.githubusercontent.com/uwsampl/web-data/main/vta/tutorial/padding.png\n :align: center\n :width: 480px