{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%%shell\n# Installs the latest dev build of TVM from PyPI, with CUDA enabled. To use this,\n# you must request a Google Colab instance with a GPU by going to Runtime ->\n# Change runtime type -> Hardware accelerator -> GPU. If you wish to build from\n# source, see https://tvm.apache.org/docs/install/from_source.html\npip install tlcpack-nightly-cu113 --pre -f https://tlcpack.ai/wheels"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n\n# Introduction to TOPI\n**Author**: [Ehsan M. Kermani](https://github.com/ehsanmok)\n\nThis is an introductory tutorial to TVM Operator Inventory (TOPI).\nTOPI provides numpy-style generic operations and schedules with higher abstractions than TVM.\nIn this tutorial, we will see how TOPI can save us from writing boilerplate code in TVM.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import tvm\nimport tvm.testing\nfrom tvm import te\nfrom tvm import topi\nimport numpy as np"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Basic example\nLet's revisit the sum of rows operation (equivalent to :code:`B = numpy.sum(A, axis=1)`') \\\nTo compute the sum of rows of a two dimensional TVM tensor A, we should\nspecify the symbolic operation as well as schedule as follows\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "n = te.var(\"n\")\nm = te.var(\"m\")\nA = te.placeholder((n, m), name=\"A\")\nk = te.reduce_axis((0, m), \"k\")\nB = te.compute((n,), lambda i: te.sum(A[i, k], axis=k), name=\"B\")\ns = te.create_schedule(B.op)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "and to examine the IR code in human readable format, we can do\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(tvm.lower(s, [A], simple_mode=True))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "However, for such a common operation we had to define the reduce axis ourselves as well as explicit computation with\n:code:`te.compute`. Imagine for more complicated operations how much details we need to provide.\nFortunately, we can replace those two lines with simple :code:`topi.sum` much like :code:`numpy.sum`\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "C = topi.sum(A, axis=1)\nts = te.create_schedule(C.op)\nprint(tvm.lower(ts, [A], simple_mode=True))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Numpy-style operator overloading\nWe can add two tensors using :code:`topi.broadcast_add` that have correct (broadcastable with specific) shapes.\nEven shorter, TOPI provides operator overloading for such common operations. For example,\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "x, y = 100, 10\na = te.placeholder((x, y, y), name=\"a\")\nb = te.placeholder((y, y), name=\"b\")\nc = a + b  # same as topi.broadcast_add\nd = a * b  # same as topi.broadcast_mul"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Overloaded with the same syntax, TOPI handles broadcasting a primitive (`int`, `float`) to a tensor :code:`d - 3.14`.\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Generic schedules and fusing operations\nUp to now, we have seen an example of how TOPI can save us from writing explicit computations in lower level API.\nBut it doesn't stop here. Still we did the scheduling as before. TOPI also provides higher level\nscheduling recipes depending on a given context. For example, for CUDA,\nwe can schedule the following series of operations ending with :code:`topi.sum` using only\n:code:`topi.generic.schedule_reduce`\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "e = topi.elemwise_sum([c, d])\nf = e / 2.0\ng = topi.sum(f)\nwith tvm.target.cuda():\n    sg = topi.cuda.schedule_reduce(g)\n    print(tvm.lower(sg, [a, b], simple_mode=True))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As you can see, scheduled stages of computation have been accumulated and we can examine them by\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(sg.stages)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can test the correctness by comparing with :code:`numpy` result as follows\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "func = tvm.build(sg, [a, b, g], \"cuda\")\ndev = tvm.cuda(0)\na_np = np.random.uniform(size=(x, y, y)).astype(a.dtype)\nb_np = np.random.uniform(size=(y, y)).astype(b.dtype)\ng_np = np.sum(np.add(a_np + b_np, a_np * b_np) / 2.0)\na_nd = tvm.nd.array(a_np, dev)\nb_nd = tvm.nd.array(b_np, dev)\ng_nd = tvm.nd.array(np.zeros(g_np.shape, dtype=g_np.dtype), dev)\nfunc(a_nd, b_nd, g_nd)\ntvm.testing.assert_allclose(g_nd.numpy(), g_np, rtol=1e-5)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "TOPI also provides common neural nets operations such as _softmax_ with optimized schedule\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "tarray = te.placeholder((512, 512), name=\"tarray\")\nsoftmax_topi = topi.nn.softmax(tarray)\nwith tvm.target.Target(\"cuda\"):\n    sst = topi.cuda.schedule_softmax(softmax_topi)\n    print(tvm.lower(sst, [tarray], simple_mode=True))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Fusing convolutions\nWe can fuse :code:`topi.nn.conv2d` and :code:`topi.nn.relu` together.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>TOPI functions are all generic functions. They have different implementations\n   for different backends to optimize for performance.\n   For each backend, it is necessary to call them under a target scope for both\n   compute declaration and schedule. TVM will choose the right function to call with\n   the target information.</p></div>\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "data = te.placeholder((1, 3, 224, 224))\nkernel = te.placeholder((10, 3, 5, 5))\n\nwith tvm.target.Target(\"cuda\"):\n    conv = topi.cuda.conv2d_nchw(data, kernel, 1, 2, 1)\n    out = topi.nn.relu(conv)\n    sconv = topi.cuda.schedule_conv2d_nchw([out])\n    print(tvm.lower(sconv, [data, kernel], simple_mode=True))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Summary\nIn this tutorial, we have seen\n\n- How to use TOPI API for common operations with numpy-style operators.\n- How TOPI facilitates generic schedules and operator fusion for a context, to generate optimized kernel codes.\n\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.19"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}