# Tensor Program Abstraction

Before we dive into the details of TensorIR, let’s first introduce primitive tensor functions. A primitive tensor function corresponds to a single “unit” of computation. For example, a convolution can be a primitive tensor function, and so can a fused convolution + relu. A typical implementation of a primitive tensor function contains three elements: multi-dimensional buffers, loop nests that drive the tensor computation, and the compute statements themselves.

```
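from tvm.script import tir as T


@T.prim_func
def main(
    A: T.Buffer((128,), "float32"),
    B: T.Buffer((128,), "float32"),
    C: T.Buffer((128,), "float32"),
) -> None:
    # Loop nest: a single loop over the 128 elements
    for i in range(128):
        with T.block("C"):
            # Bind the block iterator vi to loop variable i (spatial axis)
            vi = T.axis.spatial(128, i)
            # Compute statement: element-wise sum
            C[vi] = A[vi] + B[vi]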
```

## Key Elements of Tensor Programs

The primitive tensor function above computes the element-wise sum of two vectors. The function:

- Accepts three **multi-dimensional buffers** as parameters: `A` and `B` hold the inputs, and `C` is the **multi-dimensional buffer** that receives the output.
- Contains a single **loop nest** over `i` that drives the computation.
- Contains a single **compute statement** that calculates the element-wise sum of the two vectors.
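To make the three elements concrete, here is a minimal plain-Python analogue of the function above. It is only a sketch and not TensorIR: buffers are modelled as fixed-length lists, and the name `vector_add` is a hypothetical helper introduced for illustration.

```python
# Plain-Python analogue of the TensorIR function above (illustrative only).
# `vector_add` is a hypothetical name; buffers are modelled as Python lists.

N = 128  # buffer length, matching the (128,) shape used above


def vector_add(a, b, c):
    """Write the element-wise sum of `a` and `b` into the output buffer `c`."""
    assert len(a) == len(b) == len(c) == N
    for i in range(N):      # loop nest: a single loop over i
        c[i] = a[i] + b[i]  # compute statement


a = [float(i) for i in range(N)]
b = [2.0 * i for i in range(N)]
c = [0.0] * N  # output buffer, allocated by the caller
vector_add(a, b, c)
print(c[:4])  # [0.0, 3.0, 6.0, 9.0]
```

As in the TensorIR version, the output buffer is passed in as a parameter rather than returned, so the caller decides where the result is stored.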

## Extra Structure in TensorIR

We cannot apply arbitrary transformations to a program, because some computations depend on the order in which the loop iterations run. Fortunately, most of the primitive tensor functions we care about have favorable properties, such as independence among loop iterations. TensorIR records these properties explicitly. For instance, the program above includes block and iteration annotations:

- The **block annotation** `with T.block("C")` marks the block as the basic unit of computation for scheduling. A block may contain a single computation statement, multiple computation statements with loops, or opaque intrinsics such as Tensor Core instructions.
- The **iteration annotation** `T.axis.spatial` indicates that the variable `vi` is mapped to `i` and that all iterations are independent.

While this information isn’t needed to *execute* this particular program, it is useful when transforming it. Because the iterations are independent, we can safely parallelize or reorder the loops bound to `vi`, as long as every index from 0 to 127 is visited.
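The following plain-Python sketch illustrates why iteration independence matters. It is not TensorIR, and the helper names `add_in_order` and `prefix_sum_in_order` are hypothetical. It runs the element-wise sum with several iteration orders and compares the results, then does the same for a running sum, whose iterations depend on one another.

```python
import random

N = 128


def add_in_order(a, b, order):
    """Element-wise add: each iteration reads and writes only index vi."""
    c = [0.0] * N
    for vi in order:
        c[vi] = a[vi] + b[vi]
    return c


def prefix_sum_in_order(a, order):
    """Running sum: iteration vi reads the value written by iteration vi - 1."""
    c = [0.0] * N
    for vi in order:
        c[vi] = a[vi] + (c[vi - 1] if vi > 0 else 0.0)
    return c


a = [float(i + 1) for i in range(N)]
b = [1.0] * N

forward = list(range(N))
backward = forward[::-1]
shuffled = forward[:]
random.Random(0).shuffle(shuffled)  # fixed seed keeps the example repeatable

# Independent iterations: any order that visits every index gives the same result.
same_add = (
    add_in_order(a, b, forward)
    == add_in_order(a, b, backward)
    == add_in_order(a, b, shuffled)
)

# Dependent iterations: reversing the order changes the result.
same_scan = prefix_sum_in_order(a, forward) == prefix_sum_in_order(a, backward)

print(same_add)   # True
print(same_scan)  # False
```

The element-wise sum behaves like a spatial axis: reordering is safe. The running sum is the kind of computation that relies on loop order, so the same reordering would be incorrect for it.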