Overview
Apache TVM is a machine learning compilation framework that follows the principle of Python-first development and universal deployment. It takes in pre-trained machine learning models, compiles them, and generates deployable modules that can be embedded and run everywhere. Apache TVM also enables customizing the optimization process to introduce new optimizations, libraries, codegen, and more.
Key Principles
Python-first: the optimization process is fully customizable in Python. It is easy to customize the optimization pipeline without recompiling the TVM stack.
Composable: the optimization process is composable. It is easy to compose new optimization passes, libraries, and codegen into the existing pipeline.
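To make the composability principle concrete, here is a minimal pure-Python sketch, assuming a toy module represented as a list of `(op_name, *args)` tuples and passes that are plain functions from module to module. The names `sequential`, `remove_identity`, and `rename_relu` are hypothetical and are not TVM APIs; TVM itself composes passes with constructs such as `tvm.transform.Sequential`, which this sketch does not use.

```python
from typing import Callable, List

# Hypothetical toy IR: a module is a list of (op_name, *args) tuples.
# TVM's real IRModule is far richer; this only illustrates composition.
Module = List[tuple]
Pass = Callable[[Module], Module]


def sequential(passes: List[Pass]) -> Pass:
    """Compose passes left to right into a single pass."""
    def run(mod: Module) -> Module:
        for p in passes:
            mod = p(mod)
        return mod
    return run


def remove_identity(mod: Module) -> Module:
    """Drop no-op 'identity' operators."""
    return [op for op in mod if op[0] != "identity"]


def rename_relu(mod: Module) -> Module:
    """A user-defined pass in plain Python: retarget relu to a custom kernel."""
    return [("my_relu", *op[1:]) if op[0] == "relu" else op for op in mod]


# An "existing" pipeline, then a customized one that extends it.
base_pipeline = sequential([remove_identity])
custom_pipeline = sequential([base_pipeline, rename_relu])

mod = [("matmul", "x", "w"), ("identity", "y"), ("relu", "y")]
print(custom_pipeline(mod))  # [('matmul', 'x', 'w'), ('my_relu', 'y')]
```

Because a composed pipeline is itself a pass, it can be nested inside another pipeline, which is what lets a new pass be appended without modifying the existing ones.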
Key Goals
Optimize the performance of ML workloads by composing libraries and codegen.
Deploy ML workloads to a diverse set of new environments, including new runtimes and new hardware.
Continuously improve and customize the ML deployment pipeline in Python by quickly customizing library dispatching and bringing in customized operators and code generation.
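Library dispatching can be pictured as a lookup that routes each operator either to an external library kernel or to generated code. The sketch below is a hypothetical, TVM-independent illustration: the `LIBRARY` and `CODEGEN` tables, `register_library`, and `dispatch` are invented names, and the "library" kernel is ordinary Python standing in for a call into a vendor library.

```python
from typing import Callable, Dict, List, Tuple

Vec = List[float]
Kernel = Callable[[Vec, Vec], Vec]

# Default implementations, standing in for compiler-generated code.
CODEGEN: Dict[str, Kernel] = {
    "add": lambda a, b: [x + y for x, y in zip(a, b)],
    "mul": lambda a, b: [x * y for x, y in zip(a, b)],
}

# User-customizable table of operators routed to a (hypothetical) library.
LIBRARY: Dict[str, Kernel] = {}


def register_library(op: str) -> Callable[[Kernel], Kernel]:
    """Route `op` to a library kernel instead of generated code."""
    def wrap(fn: Kernel) -> Kernel:
        LIBRARY[op] = fn
        return fn
    return wrap


def dispatch(op: str) -> Tuple[str, Kernel]:
    """Prefer a registered library kernel, else fall back to codegen."""
    if op in LIBRARY:
        return "library", LIBRARY[op]
    return "codegen", CODEGEN[op]


@register_library("mul")
def fast_mul(a: Vec, b: Vec) -> Vec:
    # Placeholder for an optimized vendor-library call.
    return [x * y for x, y in zip(a, b)]


# Prints "add codegen [4.0, 6.0]" then "mul library [3.0, 8.0]".
for op in ("add", "mul"):
    source, kernel = dispatch(op)
    print(op, source, kernel([1.0, 2.0], [3.0, 4.0]))
```

Customizing dispatch here means editing a table in Python, so the rest of the pipeline is untouched when an operator moves between library and codegen.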
Key Flow
Here is a typical flow of using TVM to deploy a machine learning model. For a runnable example, please refer to Quick Start.
Import/construct an ML model
TVM supports importing models from various frameworks, such as PyTorch and TensorFlow, for generic ML models. Meanwhile, we can create models directly using the Relax frontend for scenarios such as large language models.
Perform composable optimization transformations via pipelines
The pipeline encapsulates a collection of transformations to achieve two goals:
Graph Optimizations: such as operator fusion and layout rewrites.
Tensor Program Optimization: map the operators to low-level implementations (either libraries or codegen).
Note
The two are goals, not stages of the pipeline. The two optimizations can be performed at the same level, or separately in two stages.
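Operator fusion, one of the graph optimizations listed above, merges adjacent operators so the data is traversed once instead of once per operator. The following is a simplified pure-Python model, not TVM's implementation: it assumes a linear chain of operators in which `elementwise` ops (scalar functions) can be fused and `reduce` ops act as fusion boundaries. All names are hypothetical.

```python
from typing import Callable, List, Tuple

# Toy op: (name, kind, fn). For "elementwise" ops fn maps float -> float;
# for "reduce" ops fn maps List[float] -> List[float] and blocks fusion.
Op = Tuple[str, str, Callable]


def compose(fns: List[Callable[[float], float]]) -> Callable[[float], float]:
    """Chain scalar functions so each element goes through all of them at once."""
    def fused(x: float) -> float:
        for f in fns:
            x = f(x)
        return x
    return fused


def fuse_elementwise(ops: List[Op]) -> List[Op]:
    """Merge each run of adjacent elementwise ops into one fused op."""
    result: List[Op] = []
    run: List[Op] = []

    def flush() -> None:
        if len(run) == 1:
            result.append(run[0])
        elif run:
            name = "fused_" + "_".join(n for n, _, _ in run)
            result.append((name, "elementwise", compose([f for _, _, f in run])))
        run.clear()

    for op in ops:
        if op[1] == "elementwise":
            run.append(op)
        else:
            flush()
            result.append(op)
    flush()
    return result


def execute(ops: List[Op], xs: List[float]) -> List[float]:
    """Run ops in order; every elementwise op is one full loop over the data."""
    for _, kind, fn in ops:
        xs = fn(xs) if kind == "reduce" else [fn(x) for x in xs]
    return xs


graph: List[Op] = [
    ("add1", "elementwise", lambda x: x + 1),
    ("mul2", "elementwise", lambda x: x * 2),
    ("relu", "elementwise", lambda x: max(x, 0.0)),
    ("sum", "reduce", lambda xs: [sum(xs)]),
]

optimized = fuse_elementwise(graph)
print([name for name, _, _ in optimized])  # ['fused_add1_mul2_relu', 'sum']
print(execute(graph, [-3.0, 1.0]), execute(optimized, [-3.0, 1.0]))  # [4.0] [4.0]
```

Both graphs produce the same result; the optimized one makes a single pass over the input for the three fused operators instead of three separate passes.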
Build and universal deploy
Apache TVM aims to provide a universal deployment solution that brings machine learning everywhere, with every language, and with minimal runtime support. The TVM runtime can work in non-Python environments, so it runs on mobile and edge devices, or even bare-metal devices. Additionally, the TVM runtime comes with native data structures and supports zero-copy exchange with the existing ecosystem (PyTorch, TensorFlow, TensorRT, etc.) through DLPack.
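Zero-copy exchange means two libraries share one underlying buffer rather than copying it. DLPack is not part of the Python standard library, so, as an analogy only, this sketch uses Python's buffer protocol (a `memoryview` over an `array.array`) to show the same idea; it does not use DLPack or TVM.

```python
import array

# A buffer owned by one "framework".
owner = array.array("f", [1.0, 2.0, 3.0])

# A second consumer wraps the same memory without copying it.
view = memoryview(owner)

# Writes through the view are visible to the owner: one shared buffer.
view[0] = 10.0
print(owner[0])  # 10.0

# By contrast, a copy gets its own storage, so edits do not propagate.
copied = array.array("f", owner)
copied[1] = 99.0
print(owner[1])  # 2.0
```

With DLPack the shared object is a tensor descriptor (data pointer, shape, strides, dtype, device) rather than a flat buffer, but the property is the same: no bytes are copied at the hand-off.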