.. DO NOT EDIT. THIS FILE WAS AUTOMATICALLY GENERATED BY
.. TVM'S MONKEY-PATCHED VERSION OF SPHINX-GALLERY. TO MAKE
.. CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "deep_dive/tensor_ir/tutorials/tir_creation.py"

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        You can click :ref:`here <sphx_glr_download_deep_dive_tensor_ir_tutorials_tir_creation.py>` to run the Jupyter notebook locally.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_deep_dive_tensor_ir_tutorials_tir_creation.py:


.. _tir-creation:

TensorIR Creation
-----------------
In this section, we introduce the methods for writing a TensorIR function
in Apache TVM. This tutorial assumes familiarity with the fundamental concepts of TensorIR.
If you are not yet familiar with them, please read :ref:`tirx-learning` first.

.. note::

    This tutorial concentrates on the construction of **standalone** TensorIR functions. The
    techniques presented here are not required for end users who only want to compile Relax models.

.. GENERATED FROM PYTHON SOURCE LINES 36-54

Create TensorIR using TVMScript
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The most straightforward way to create a TensorIR function is via TVMScript,
a Python dialect that TVM uses to represent TensorIR.

.. important::

    Although TVMScript uses Python syntax and the Python AST, which keeps it compatible
    with Python tools such as auto-completion and linting, it is not native Python
    and cannot be executed by the Python interpreter.

    More precisely, the ``tvm.script`` decorators (``@I.ir_module`` and ``@T.prim_func``
    in the examples below) extract the Python AST from the decorated class or function
    and then parse it into TensorIR.

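The idea of reading Python syntax without executing it can be illustrated with the
standard ``ast`` module. The sketch below is only an analogy, not TVM's actual parser:
it parses a TVMScript-like snippet held in a string, so names such as ``T`` never need
to exist, and lists the ``T.*`` calls it finds. ``collect_t_calls`` is a hypothetical
helper written for this illustration.

.. code-block:: python

    import ast

    # A TVMScript-like snippet kept as a string. It is parsed, never executed,
    # so the undefined name ``T`` causes no error.
    SOURCE = '''
    def mm_relu(A, B, C):
        for i, j, k in T.grid(128, 128, 128):
            with T.sblock("Y"):
                vi, vj, vk = T.axis.remap("SSR", [i, j, k])
    '''


    def collect_t_calls(source: str) -> list:
        """Return dotted names of calls rooted at ``T``, in ast.walk (breadth-first) order."""
        tree = ast.parse(source)
        names = []
        for node in ast.walk(tree):
            if isinstance(node, ast.Call):
                parts = []
                func = node.func
                # Unwind attribute chains such as T.axis.remap.
                while isinstance(func, ast.Attribute):
                    parts.append(func.attr)
                    func = func.value
                if isinstance(func, ast.Name) and func.id == "T":
                    parts.append("T")
                    names.append(".".join(reversed(parts)))
        return names


    print(collect_t_calls(SOURCE))
    # ['T.grid', 'T.sblock', 'T.axis.remap']

A real TVMScript parser goes much further and turns each recognized construct into TensorIR nodes; this toy only shows that the structure is available without running the code.
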
Standard Format
***************
Let's take the ``mm_relu`` example from :ref:`tirx-learning`. Here is the complete
form of the IRModule written in TVMScript:

.. GENERATED FROM PYTHON SOURCE LINES 54-94

.. code-block:: Python


    import numpy as np
    import tvm_ffi

    import tvm
    from tvm.script import ir as I
    from tvm.script import tirx as T


    @I.ir_module
    class MyModule:
        @T.prim_func(s_tir=True)
        def mm_relu(
            A: T.Buffer((128, 128), "float32"),
            B: T.Buffer((128, 128), "float32"),
            C: T.Buffer((128, 128), "float32"),
        ):
            Y = T.alloc_buffer((128, 128), dtype="float32")
            for i in range(128):
                for j in range(128):
                    for k in range(128):
                        with T.sblock("Y"):
                            vi = T.axis.spatial(128, i)
                            vj = T.axis.spatial(128, j)
                            vk = T.axis.reduce(128, k)
                            T.reads(A[vi, vk], B[vk, vj])
                            T.writes(Y[vi, vj])
                            with T.init():
                                Y[vi, vj] = T.float32(0)
                            Y[vi, vj] = Y[vi, vj] + A[vi, vk] * B[vk, vj]
            for i in range(128):
                for j in range(128):
                    with T.sblock("C"):
                        vi = T.axis.spatial(128, i)
                        vj = T.axis.spatial(128, j)
                        T.reads(Y[vi, vj])
                        T.writes(C[vi, vj])
                        C[vi, vj] = T.max(Y[vi, vj], T.float32(0))

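To make the block semantics concrete, the following sketch is ordinary Python (not
TVMScript) that mirrors the loop nest of ``mm_relu`` on a small ``n x n`` input instead of
``128 x 128``. The ``T.init()`` body runs when the reduction iterator is at its first value
(``vk == 0``), and block ``C`` applies ReLU element-wise. ``mm_relu_reference`` is a
hypothetical name used only for this illustration.

.. code-block:: python

    def mm_relu_reference(A, B, n):
        """Plain-Python mirror of the mm_relu loop nest for n x n matrices."""
        # Intermediate buffer, playing the role of T.alloc_buffer((n, n)).
        Y = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    vi, vj, vk = i, j, k  # spatial, spatial, reduce
                    if vk == 0:
                        # Body of T.init(): runs once per (vi, vj) before the first update.
                        Y[vi][vj] = 0.0
                    Y[vi][vj] = Y[vi][vj] + A[vi][vk] * B[vk][vj]
        C = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                # Block "C": element-wise ReLU.
                C[i][j] = max(Y[i][j], 0.0)
        return C


    A = [[1.0, -2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, -8.0]]
    print(mm_relu_reference(A, B, 2))
    # [[0.0, 22.0], [43.0, 0.0]]

A reference like this is also handy for checking the numerical output of a compiled TensorIR function on small inputs.
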

.. GENERATED FROM PYTHON SOURCE LINES 95-104

Concise with Syntactic Sugar
****************************
For ease of writing, the following syntactic sugar streamlines the code:

- Use ``T.grid`` to condense nested loops;
- Use ``T.axis.remap`` to abbreviate block iterator annotations; each character of
  the kind string (for example ``"SSR"``) gives the type of one iterator, ``S`` for
  spatial and ``R`` for reduce;
- Omit ``T.reads`` and ``T.writes`` for blocks whose read and write regions can
  be inferred from the block body.

.. GENERATED FROM PYTHON SOURCE LINES 104-127

.. code-block:: Python


    @I.ir_module
    class ConciseModule:
        @T.prim_func(s_tir=True)
        def mm_relu(
            A: T.Buffer((128, 128), "float32"),
            B: T.Buffer((128, 128), "float32"),
            C: T.Buffer((128, 128), "float32"),
        ):
            Y = T.alloc_buffer((128, 128), dtype="float32")
            for i, j, k in T.grid(128, 128, 128):
                with T.sblock("Y"):
                    vi, vj, vk = T.axis.remap("SSR", [i, j, k])
                    with T.init():
                        Y[vi, vj] = T.float32(0)
                    Y[vi, vj] = Y[vi, vj] + A[vi, vk] * B[vk, vj]
            for i, j in T.grid(128, 128):
                with T.sblock("C"):
                    vi, vj = T.axis.remap("SS", [i, j])
                    C[vi, vj] = T.max(Y[vi, vj], T.float32(0))

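As a rough plain-Python analogy (not TVM code), ``T.grid(M, N, K)`` enumerates the same
iteration space, in the same order, as three nested ``range`` loops, much like
``itertools.product``. Each letter of the ``T.axis.remap`` kind string selects the iterator
type that the standard format spells out with ``T.axis.spatial`` or ``T.axis.reduce``.
The helper ``decode_remap`` below is hypothetical and only decodes the string.

.. code-block:: python

    import itertools

    # Nested range loops and a product over ranges visit the same (i, j, k) tuples in the same order.
    nested = [(i, j, k) for i in range(2) for j in range(3) for k in range(4)]
    grid_like = list(itertools.product(range(2), range(3), range(4)))
    assert nested == grid_like


    def decode_remap(kinds, loop_vars):
        """Hypothetical helper: pair each loop variable with the axis kind its letter names."""
        names = {"S": "spatial", "R": "reduce"}
        if len(kinds) != len(loop_vars):
            raise ValueError("one kind letter is needed per loop variable")
        return [(names[c], v) for c, v in zip(kinds, loop_vars)]


    print(decode_remap("SSR", ["i", "j", "k"]))
    # [('spatial', 'i'), ('spatial', 'j'), ('reduce', 'k')]
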

.. GENERATED FROM PYTHON SOURCE LINES 128-129

We can use the following code to verify that the two modules are equivalent:

.. GENERATED FROM PYTHON SOURCE LINES 129-132

.. code-block:: Python


    print(tvm_ffi.structural_equal(MyModule, ConciseModule))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    True

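``tvm_ffi.structural_equal`` compares the two modules as IR trees, node by node, rather
than by object identity. Variables defined inside the IR, such as loop variables, are
matched by where they are defined rather than by their names, so a consistent renaming does
not break equality. The toy checker below illustrates only that idea on a tiny tuple-based
loop language; ``alpha_equal`` and the tuple encoding are hypothetical, and this is not how
TVM implements structural equality.

.. code-block:: python

    def alpha_equal(lhs, rhs, lmap=None, rmap=None, depth=0):
        """Toy structural equality where bound variables may have different names.

        Nodes: ("const", v), ("var", name), ("add", a, b), ("for", var, extent, body).
        Only closed expressions are supported: a free variable never compares equal.
        """
        lmap = {} if lmap is None else lmap
        rmap = {} if rmap is None else rmap
        if lhs[0] != rhs[0]:
            return False
        tag = lhs[0]
        if tag == "const":
            return lhs[1] == rhs[1]
        if tag == "var":
            # Equal only if both names refer to binders introduced at the same depth.
            return lhs[1] in lmap and rhs[1] in rmap and lmap[lhs[1]] == rmap[rhs[1]]
        if tag == "for":
            _, lvar, lext, lbody = lhs
            _, rvar, rext, rbody = rhs
            if lext != rext:
                return False
            # Record where each loop variable is defined, then compare the bodies.
            return alpha_equal(lbody, rbody, {**lmap, lvar: depth}, {**rmap, rvar: depth}, depth + 1)
        if tag == "add":
            return (alpha_equal(lhs[1], rhs[1], lmap, rmap, depth)
                    and alpha_equal(lhs[2], rhs[2], lmap, rmap, depth))
        raise ValueError(f"unknown node tag {tag!r}")


    # Same loop nest with loop variables renamed consistently: equal.
    f1 = ("for", "i", 128, ("for", "j", 128, ("add", ("var", "i"), ("var", "j"))))
    f2 = ("for", "x", 128, ("for", "y", 128, ("add", ("var", "x"), ("var", "y"))))
    # Operands swapped, so the uses no longer line up with the binders: not equal.
    f3 = ("for", "x", 128, ("for", "y", 128, ("add", ("var", "y"), ("var", "x"))))
    print(alpha_equal(f1, f2))  # True
    print(alpha_equal(f1, f3))  # False
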

.. GENERATED FROM PYTHON SOURCE LINES 133-138

Interacting with Python Variables
*********************************
Although TVMScript is not executed by the Python interpreter, limited
interaction with Python is possible. For instance, Python variables can
be used to specify the shapes and data types of a TensorIR function; their
values are read when the decorator parses the function.

.. GENERATED FROM PYTHON SOURCE LINES 138-166

.. code-block:: Python


    # Python variables
    M = N = K = 128
    dtype = "float32"


    # IRModule in TVMScript
    @I.ir_module
    class ConciseModuleFromPython:
        @T.prim_func(s_tir=True)
        def mm_relu(
            A: T.Buffer((M, K), dtype),
            B: T.Buffer((K, N), dtype),
            C: T.Buffer((M, N), dtype),
        ):
            Y = T.alloc_buffer((M, N), dtype)
            for i, j, k in T.grid(M, N, K):
                with T.sblock("Y"):
                    vi, vj, vk = T.axis.remap("SSR", [i, j, k])
                    with T.init():
                        Y[vi, vj] = T.cast(T.float32(0), dtype)
                    Y[vi, vj] = Y[vi, vj] + A[vi, vk] * B[vk, vj]
            for i, j in T.grid(M, N):
                with T.sblock("C"):
                    vi, vj = T.axis.remap("SS", [i, j])
                    C[vi, vj] = T.max(Y[vi, vj], T.cast(T.float32(0), dtype))


.. GENERATED FROM PYTHON SOURCE LINES 167-168

Check the equivalence:

.. GENERATED FROM PYTHON SOURCE LINES 168-172

.. code-block:: Python


    print(tvm_ffi.structural_equal(ConciseModule, ConciseModuleFromPython))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    True


.. GENERATED FROM PYTHON SOURCE LINES 173-178

TensorIR Function with Dynamic Shapes
*************************************
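Shapes do not have to be fixed when the function is written. By declaring the
dimensions as symbolic variables with ``T.int32()`` and binding the buffer parameters
to them with ``T.match_buffer``, a single TensorIR function can accept inputs of
different shapes. The element type still comes from the Python variable ``dtype``
defined above.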

.. GENERATED FROM PYTHON SOURCE LINES 178-206

.. code-block:: Python


    @I.ir_module
    class DynamicShapeModule:
        @T.prim_func(s_tir=True)
        def mm_relu(a: T.handle, b: T.handle, c: T.handle):
            # Dynamic shape definition
            M = T.int32()
            N = T.int32()
            K = T.int32()

            # Bind the input buffers with the dynamic shapes
            A = T.match_buffer(a, [M, K], dtype)
            B = T.match_buffer(b, [K, N], dtype)
            C = T.match_buffer(c, [M, N], dtype)
            Y = T.alloc_buffer((M, N), dtype)
            for i, j, k in T.grid(M, N, K):
                with T.sblock("Y"):
                    vi, vj, vk = T.axis.remap("SSR", [i, j, k])
                    with T.init():
                        Y[vi, vj] = T.cast(T.float32(0), dtype)
                    Y[vi, vj] = Y[vi, vj] + A[vi, vk] * B[vk, vj]
            for i, j in T.grid(M, N):
                with T.sblock("C"):
                    vi, vj = T.axis.remap("SS", [i, j])
                    C[vi, vj] = T.max(Y[vi, vj], T.cast(T.float32(0), dtype))


.. GENERATED FROM PYTHON SOURCE LINES 207-208

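Now let's compile the module once and call it with inputs of different shapes; the values
of ``M``, ``N`` and ``K`` are inferred from the arguments at runtime: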

.. GENERATED FROM PYTHON SOURCE LINES 208-224

.. code-block:: Python


    def evaluate_dynamic_shape(lib: tvm.runtime.Module, m: int, n: int, k: int):
        A = tvm.runtime.tensor(np.random.uniform(size=(m, k)).astype("float32"))
        B = tvm.runtime.tensor(np.random.uniform(size=(k, n)).astype("float32"))
        C = tvm.runtime.tensor(np.zeros((m, n), dtype="float32"))
        lib(A, B, C)
        return C.numpy()


    # Compile the library only once
    dyn_shape_lib = tvm.compile(DynamicShapeModule, target="llvm")
    # The same library handles inputs of different shapes
    print(evaluate_dynamic_shape(dyn_shape_lib, m=4, n=4, k=4))
    print(evaluate_dynamic_shape(dyn_shape_lib, m=64, n=64, k=128))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[0.8231925  0.74687773 0.61615294 1.0123911 ]
     [0.7160674  0.45636386 0.6850383  0.6155127 ]
     [1.005914   0.96243465 0.6228967  1.653728  ]
     [1.002558   1.1868575  0.73863345 0.7221608 ]]
    [[31.91493  36.009766 31.308624 ... 33.880024 35.974846 34.691463]
     [29.066854 34.266792 31.542082 ... 31.765827 35.56741  33.41558 ]
     [30.660873 35.36475  34.314415 ... 32.7977   36.34936  35.827236]
     ...
     [26.84151  30.554972 29.886185 ... 30.959145 31.836    31.593372]
     [26.493942 32.096836 28.361483 ... 30.639572 32.666187 32.601116]
     [28.802551 35.955505 32.694454 ... 32.20954  36.73205  34.373337]]

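The single compiled ``dyn_shape_lib`` works for both calls because ``M``, ``N`` and ``K``
are bound from the shapes of the actual arguments on each call, and buffers that share a
symbolic dimension must agree on its value. The plain-Python sketch below imitates that
binding step under stated assumptions: ``bind_symbolic_shapes`` is a hypothetical helper,
and raising an error on a mismatch is an approximation of TVM's argument checks, not a
description of their exact behavior or messages.

.. code-block:: python

    def bind_symbolic_shapes(declared, actual):
        """Hypothetical helper: bind symbolic dims (strings) to concrete sizes.

        ``declared`` maps buffer names to shapes written with symbolic names;
        ``actual`` maps the same buffer names to the concrete shapes passed in.
        """
        bindings = {}
        for name, sym_shape in declared.items():
            shape = actual[name]
            if len(shape) != len(sym_shape):
                raise ValueError(f"{name}: expected rank {len(sym_shape)}, got {len(shape)}")
            for sym, size in zip(sym_shape, shape):
                # The first occurrence binds the symbol; later occurrences must agree.
                if bindings.setdefault(sym, size) != size:
                    raise ValueError(f"{name}: {sym} is {bindings[sym]}, got {size}")
        return bindings


    declared = {"A": ("M", "K"), "B": ("K", "N"), "C": ("M", "N")}
    print(bind_symbolic_shapes(declared, {"A": (64, 128), "B": (128, 64), "C": (64, 64)}))
    # {'M': 64, 'K': 128, 'N': 64}

Passing ``A`` with shape ``(4, 8)`` and ``B`` with shape ``(4, 4)`` would bind ``K`` to 8 and then fail on ``B``, which is the kind of inconsistency a dynamic-shape function has to reject.
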

.. GENERATED FROM PYTHON SOURCE LINES 225-245

Create TensorIR using Tensor Expression
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Often, we do not need to spell out every detail of TensorIR; it is more practical to
express the computation succinctly and generate the TensorIR from that description.
This is where Tensor Expression (TE) comes in.

Tensor Expression (TE) is a domain-specific language that describes a sequence of
computations through an expression-like API.

.. note::

  Tensor Expression comprises two components within the TVM stack: the expression and the
  schedule. The expression is the domain-specific language describing the computation pattern,
  which is what this section covers. The TE schedule, by contrast, is the legacy scheduling
  method and has been superseded by the TensorIR schedule in the current TVM stack.

Create Static-Shape Functions
*****************************
We use the same example of ``mm_relu`` from the last subsection to demonstrate the
TE creation method.

.. GENERATED FROM PYTHON SOURCE LINES 245-254

.. code-block:: Python


    from tvm import te

    A = te.placeholder((128, 128), "float32", name="A")
    B = te.placeholder((128, 128), "float32", name="B")
    k = te.reduce_axis((0, 128), "k")
    Y = te.compute((128, 128), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="Y")
    C = te.compute((128, 128), lambda i, j: te.max(Y[i, j], 0), name="C")


.. GENERATED FROM PYTHON SOURCE LINES 255-268

Here ``te.compute`` takes the signature ``te.compute(output_shape, fcompute)``, where
``fcompute`` describes how to compute the value of each element ``Y[i, j]`` from its
index:

.. code:: python

  lambda i, j: te.sum(A[i, k] * B[k, j], axis=k)

This lambda expression encapsulates the computation
:math:`Y_{i, j} = \sum_k A_{i, k} \times B_{k, j}`. Once the computation is defined,
we create a TensorIR function by listing the tensors that become its parameters.
Here we construct a function with two input parameters **A** and **B** and one output
parameter **C**; ``Y`` is not listed, so it becomes an intermediate buffer allocated
inside the function.

.. GENERATED FROM PYTHON SOURCE LINES 268-273

.. code-block:: Python


    te_func = te.create_prim_func([A, B, C]).with_attr({"global_symbol": "mm_relu"})
    TEModule = tvm.IRModule({"mm_relu": te_func})
    TEModule.show()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    # from tvm.script import ir as I
    # from tvm.script import tirx as T
    # from tvm.tirx.layout import Axis

    @I.ir_module
    class Module:
        @T.prim_func(s_tir=True)
        def mm_relu(A: T.Buffer((128, 128), "float32"), B: T.Buffer((128, 128), "float32"), C: T.Buffer((128, 128), "float32")):
            T.func_attr({"tirx.noalias": True})
            # with T.sblock("root"):
            Y = T.sblock_alloc_buffer((128, 128))
            for i, j, k in T.grid(128, 128, 128):
                with T.sblock("Y"):
                    v_i, v_j, v_k = T.axis.remap("SSR", [i, j, k])
                    T.reads(A[v_i, v_k], B[v_k, v_j])
                    T.writes(Y[v_i, v_j])
                    with T.init():
                        Y[v_i, v_j] = T.float32(0.0)
                    Y[v_i, v_j] = Y[v_i, v_j] + A[v_i, v_k] * B[v_k, v_j]
            for i, j in T.grid(128, 128):
                with T.sblock("C"):
                    v_i, v_j = T.axis.remap("SS", [i, j])
                    T.reads(Y[v_i, v_j])
                    T.writes(C[v_i, v_j])
                    C[v_i, v_j] = T.max(Y[v_i, v_j], T.float32(0.0))

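The contract of ``te.compute(output_shape, fcompute)`` can be mimicked eagerly in plain
Python: the output has ``output_shape``, and each element is ``fcompute`` evaluated at that
element's index. This is only an analogy: ``eager_compute`` is a hypothetical helper, and
unlike it, TE builds a symbolic expression that ``te.create_prim_func`` turns into a loop
nest like the one printed above, rather than computing numbers. The reduction over ``k`` is
written here as an ordinary Python ``sum``.

.. code-block:: python

    import itertools


    def eager_compute(shape, fcompute):
        """Hypothetical eager analogue of te.compute for 2-D shapes."""
        rows, cols = shape
        out = [[None] * cols for _ in range(rows)]
        for i, j in itertools.product(range(rows), range(cols)):
            out[i][j] = fcompute(i, j)
        return out


    n = 2
    A = [[1.0, -2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, -8.0]]
    # Mirrors Y = te.compute(..., lambda i, j: te.sum(A[i, k] * B[k, j], axis=k))
    Y = eager_compute((n, n), lambda i, j: sum(A[i][k] * B[k][j] for k in range(n)))
    # Mirrors C = te.compute(..., lambda i, j: te.max(Y[i, j], 0))
    C = eager_compute((n, n), lambda i, j: max(Y[i][j], 0.0))
    print(C)  # [[0.0, 22.0], [43.0, 0.0]]
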

.. GENERATED FROM PYTHON SOURCE LINES 274-278

Create Dynamic-Shape Functions
******************************
We can also create a dynamic-shape function using Tensor Expression. The only difference
is that the tensor shapes are given as symbolic variables created with ``te.var``.

.. GENERATED FROM PYTHON SOURCE LINES 278-290

.. code-block:: Python


    # Declare symbolic variables
    M, N, K = te.var("m"), te.var("n"), te.var("k")
    A = te.placeholder((M, K), "float32", name="A")
    B = te.placeholder((K, N), "float32", name="B")
    k = te.reduce_axis((0, K), "k")
    Y = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="Y")
    C = te.compute((M, N), lambda i, j: te.max(Y[i, j], 0), name="C")

    dyn_te_func = te.create_prim_func([A, B, C]).with_attr({"global_symbol": "mm_relu"})
    DynamicTEModule = tvm.IRModule({"mm_relu": dyn_te_func})
    DynamicTEModule.show()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    # from tvm.script import ir as I
    # from tvm.script import tirx as T
    # from tvm.tirx.layout import Axis

    @I.ir_module
    class Module:
        @T.prim_func(s_tir=True)
        def mm_relu(var_A: T.handle, var_B: T.handle, var_C: T.handle):
            T.func_attr({"tirx.noalias": True})
            m, k = T.int32(), T.int32()
            A = T.match_buffer(var_A, (m, k))
            n = T.int32()
            B = T.match_buffer(var_B, (k, n))
            C = T.match_buffer(var_C, (m, n))
            # with T.sblock("root"):
            Y = T.sblock_alloc_buffer((m, n))
            for i, j, k_1 in T.grid(m, n, k):
                with T.sblock("Y"):
                    v_i, v_j, v_k = T.axis.remap("SSR", [i, j, k_1])
                    T.reads(A[v_i, v_k], B[v_k, v_j])
                    T.writes(Y[v_i, v_j])
                    with T.init():
                        Y[v_i, v_j] = T.float32(0.0)
                    Y[v_i, v_j] = Y[v_i, v_j] + A[v_i, v_k] * B[v_k, v_j]
            for i, j in T.grid(m, n):
                with T.sblock("C"):
                    v_i, v_j = T.axis.remap("SS", [i, j])
                    T.reads(Y[v_i, v_j])
                    T.writes(C[v_i, v_j])
                    C[v_i, v_j] = T.max(Y[v_i, v_j], T.float32(0.0))


.. _sphx_glr_download_deep_dive_tensor_ir_tutorials_tir_creation.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: tir_creation.ipynb <tir_creation.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: tir_creation.py <tir_creation.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: tir_creation.zip <tir_creation.zip>`