.. note::
    :class: sphx-glr-download-link-note

    Click :ref:`here <sphx_glr_download_how_to_tune_with_autotvm_tune_conv2d_cuda.py>` to download the full example code
.. rst-class:: sphx-glr-example-title

.. _sphx_glr_how_to_tune_with_autotvm_tune_conv2d_cuda.py:


Tuning High Performance Convolution on NVIDIA GPUs
=========================================================================
**Author**: `Lianmin Zheng <https://github.com/merrymercy>`_

This is an advanced tutorial for writing high performance tunable template for
NVIDIA GPU. By running auto-tuner on this template, we can outperform the
vendor provided library CuDNN in many cases.

Note that this tutorial will not run on Windows or recent versions of macOS. To
get it to run, you will need to wrap the body of this tutorial in a :code:`if
__name__ == "__main__":` block.

Install dependencies
--------------------
To use autotvm package in tvm, we need to install some extra dependencies.
(change "3" to "2" if you use python2):

.. code-block:: bash

  pip3 install --user psutil xgboost tornado cloudpickle

To make TVM run faster in tuning, it is recommended to use cython
as FFI of tvm. In the root directory of tvm, execute

.. code-block:: bash

  pip3 install --user cython
  sudo make cython3

Now return to python code. Import packages.


.. code-block:: default


    import logging
    import sys
    import numpy as np

    import tvm
    from tvm import te, topi, testing
    from tvm.topi.testing import conv2d_nchw_python
    import tvm.testing

    from tvm import autotvm


Step 1:  Define the search space
--------------------------------
There are plenty of useful schedule primitives in tvm. You can also find
some tutorials that describe them in more details, such as
(1). :ref:`opt-conv-gpu`
(2). `Optimizing DepthwiseConv on NVIDIA GPU <https://tvm.apache.org/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example>`_

However, their implementations are manually tuned for some special input
shapes. In this section, we build a large enough space to cover
the techniques used in these tutorials. Then we rely on the efficient auto-tuner
to search through this space and pick some good configurations.

If you are familiar with writing cuda schedule, you can find the following
template is very general. Actually this template can be easily modified
to tune other operators such as depthwise convolution and gemm.
In order to fully understand this template, you should be familiar with
the schedule primitives and auto tuning API. You can refer to the above
tutorials and :ref:`autotvm tutorial <tutorial-autotvm-matmul-x86>`

It is worth noting that the search space for a conv2d operator
can be very large (at the level of 10^9 for some input shapes)


.. code-block:: default


    @autotvm.template("tutorial/conv2d_no_batching")
    def conv2d_no_batching(N, H, W, CO, CI, KH, KW, stride, padding):
        assert N == 1, "Only consider batch_size = 1 in this template"

        data = te.placeholder((N, CI, H, W), name="data")
        kernel = te.placeholder((CO, CI, KH, KW), name="kernel")
        conv = topi.nn.conv2d_nchw(data, kernel, stride, padding, dilation=1, out_dtype="float32")
        s = te.create_schedule([conv.op])

        ##### space definition begin #####
        n, f, y, x = s[conv].op.axis
        rc, ry, rx = s[conv].op.reduce_axis

        cfg = autotvm.get_config()
        cfg.define_split("tile_f", f, num_outputs=4)
        cfg.define_split("tile_y", y, num_outputs=4)
        cfg.define_split("tile_x", x, num_outputs=4)
        cfg.define_split("tile_rc", rc, num_outputs=3)
        cfg.define_split("tile_ry", ry, num_outputs=3)
        cfg.define_split("tile_rx", rx, num_outputs=3)
        cfg.define_knob("auto_unroll_max_step", [0, 512, 1500])
        cfg.define_knob("unroll_explicit", [0, 1])
        ##### space definition end #####

        # inline padding
        pad_data = s[conv].op.input_tensors[0]
        s[pad_data].compute_inline()
        data, raw_data = pad_data, data

        output = conv
        OL = s.cache_write(conv, "local")

        # create cache stage
        AA = s.cache_read(data, "shared", [OL])
        WW = s.cache_read(kernel, "shared", [OL])
        AL = s.cache_read(AA, "local", [OL])
        WL = s.cache_read(WW, "local", [OL])

        # tile and bind spatial axes
        n, f, y, x = s[output].op.axis
        bf, vf, tf, fi = cfg["tile_f"].apply(s, output, f)
        by, vy, ty, yi = cfg["tile_y"].apply(s, output, y)
        bx, vx, tx, xi = cfg["tile_x"].apply(s, output, x)
        kernel_scope = n  # this is the scope to attach global config inside this kernel

        s[output].bind(bf, te.thread_axis("blockIdx.z"))
        s[output].bind(by, te.thread_axis("blockIdx.y"))
        s[output].bind(bx, te.thread_axis("blockIdx.x"))
        s[output].bind(vf, te.thread_axis("vthread"))
        s[output].bind(vy, te.thread_axis("vthread"))
        s[output].bind(vx, te.thread_axis("vthread"))
        s[output].bind(tf, te.thread_axis("threadIdx.z"))
        s[output].bind(ty, te.thread_axis("threadIdx.y"))
        s[output].bind(tx, te.thread_axis("threadIdx.x"))
        s[output].reorder(n, bf, by, bx, vf, vy, vx, tf, ty, tx, fi, yi, xi)
        s[OL].compute_at(s[output], tx)

        # tile reduction axes
        n, f, y, x = s[OL].op.axis
        rc, ry, rx = s[OL].op.reduce_axis
        rco, rcm, rci = cfg["tile_rc"].apply(s, OL, rc)
        ryo, rym, ryi = cfg["tile_rx"].apply(s, OL, ry)
        rxo, rxm, rxi = cfg["tile_ry"].apply(s, OL, rx)
        s[OL].reorder(rco, ryo, rxo, rcm, rym, rxm, rci, ryi, rxi, n, f, y, x)

        s[AA].compute_at(s[OL], rxo)
        s[WW].compute_at(s[OL], rxo)
        s[AL].compute_at(s[OL], rxm)
        s[WL].compute_at(s[OL], rxm)

        # cooperative fetching
        for load in [AA, WW]:
            n, f, y, x = s[load].op.axis
            fused = s[load].fuse(n, f, y, x)
            tz, fused = s[load].split(fused, nparts=cfg["tile_f"].size[2])
            ty, fused = s[load].split(fused, nparts=cfg["tile_y"].size[2])
            tx, fused = s[load].split(fused, nparts=cfg["tile_x"].size[2])
            s[load].bind(tz, te.thread_axis("threadIdx.z"))
            s[load].bind(ty, te.thread_axis("threadIdx.y"))
            s[load].bind(tx, te.thread_axis("threadIdx.x"))

        # tune unroll
        s[output].pragma(kernel_scope, "auto_unroll_max_step", cfg["auto_unroll_max_step"].val)
        s[output].pragma(kernel_scope, "unroll_explicit", cfg["unroll_explicit"].val)

        return s, [raw_data, kernel, conv]


Step 2:  Search through the space
---------------------------------
We pick the last layer on resnet as test case.
Since our space is very large, :code:`XGBoostTuner` is most suitable
for our case. Here we only do 20 trials for demonstration.
In practice, making 1000 trials usually can find some good kernels
for this template


.. code-block:: default


    # logging config (for printing tuning log to screen)
    logging.getLogger("autotvm").setLevel(logging.DEBUG)
    logging.getLogger("autotvm").addHandler(logging.StreamHandler(sys.stdout))

    # the last layer in resnet
    N, H, W, CO, CI, KH, KW, strides, padding = 1, 7, 7, 512, 512, 3, 3, (1, 1), (1, 1)
    task = autotvm.task.create(
        "tutorial/conv2d_no_batching", args=(N, H, W, CO, CI, KH, KW, strides, padding), target="cuda"
    )
    print(task.config_space)

    # Use local gpu, measure 10 times for every config to reduce variance
    # The timeout of compiling a program is 10 seconds, the timeout for running is 4 seconds
    measure_option = autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.LocalRunner(repeat=3, min_repeat_ms=100, timeout=4),
    )

    # Begin tuning, log records to file `conv2d.log`
    # During tuning we will also try many invalid configs, so you are expected to
    # see many error reports. As long as you can see non-zero GFLOPS, it is okay.
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(
        n_trial=20,
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file("conv2d.log")],
    )


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    ConfigSpace (len=10454400, space_map=
       0 tile_f: Split(policy=factors, product=512, num_outputs=4) len=220
       1 tile_y: Split(policy=factors, product=7, num_outputs=4) len=4
       2 tile_x: Split(policy=factors, product=7, num_outputs=4) len=4
       3 tile_rc: Split(policy=factors, product=512, num_outputs=3) len=55
       4 tile_ry: Split(policy=factors, product=3, num_outputs=3) len=3
       5 tile_rx: Split(policy=factors, product=3, num_outputs=3) len=3
       6 auto_unroll_max_step: OtherOption([0, 512, 1500]) len=3
       7 unroll_explicit: OtherOption([0, 1]) len=2
    )
    Get devices for measurement successfully!
    No: 1   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  24: TVMFuncCall\n        at /workspace/src/runtime/c_runtime_api.cc:474\n  23: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /workspace/include/tvm/runtime/packed_func.h:1151\n  22: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  21: operator()\n        at /workspace/include/tvm/runtime/packed_func.h:1480\n  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >\n        at /workspace/include/tvm/runtime/packed_func.h:1421\n  19: run<>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  18: run<tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1397\n  13: operator()\n        at /workspace/src/driver/driver_api.cc:395\n  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)\n        at /workspace/src/driver/driver_api.cc:382\n  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)\n        at /workspace/src/driver/driver_api.cc:282\n  10: tvm::transform::Pass::operator()(tvm::IRModule) const\n        at /workspace/src/ir/transform.cc:255\n  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:487\n  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/tir/ir/transform.cc:100\n  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const\n        at /workspace/include/tvm/runtime/packed_func.h:1498\n  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)\n        at /workspace/include/tvm/runtime/packed_func.h:1444\n  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const\n        at /workspace/include/tvm/runtime/packed_func.h:1369\n  2: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  1: _M_invoke\n        at /usr/include/c++/7/bits/std_function.h:316\n  0: operator()\n        at /workspace/src/runtime/c_runtime_api.cc:525\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback\n  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=MeasureErrorNo.INSTANTIATION_ERROR, all_cost=2.9277219772338867, timestamp=1636191411.7611656)    [('tile_f', [-1, 4, 4, 2]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 128, 2]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)],None,7999494
    No: 2   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  24: TVMFuncCall\n        at /workspace/src/runtime/c_runtime_api.cc:474\n  23: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /workspace/include/tvm/runtime/packed_func.h:1151\n  22: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  21: operator()\n        at /workspace/include/tvm/runtime/packed_func.h:1480\n  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >\n        at /workspace/include/tvm/runtime/packed_func.h:1421\n  19: run<>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  18: run<tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1397\n  13: operator()\n        at /workspace/src/driver/driver_api.cc:395\n  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)\n        at /workspace/src/driver/driver_api.cc:382\n  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)\n        at /workspace/src/driver/driver_api.cc:282\n  10: tvm::transform::Pass::operator()(tvm::IRModule) const\n        at /workspace/src/ir/transform.cc:255\n  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:487\n  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/tir/ir/transform.cc:100\n  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const\n        at /workspace/include/tvm/runtime/packed_func.h:1498\n  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)\n        at /workspace/include/tvm/runtime/packed_func.h:1444\n  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const\n        at /workspace/include/tvm/runtime/packed_func.h:1369\n  2: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  1: _M_invoke\n        at /usr/include/c++/7/bits/std_function.h:316\n  0: operator()\n        at /workspace/src/runtime/c_runtime_api.cc:525\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback\n  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=MeasureErrorNo.INSTANTIATION_ERROR, all_cost=2.882194757461548, timestamp=1636191411.7611866)     [('tile_f', [-1, 1, 8, 2]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 1, 64]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,5194279
    No: 3   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  24: TVMFuncCall\n        at /workspace/src/runtime/c_runtime_api.cc:474\n  23: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /workspace/include/tvm/runtime/packed_func.h:1151\n  22: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  21: operator()\n        at /workspace/include/tvm/runtime/packed_func.h:1480\n  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >\n        at /workspace/include/tvm/runtime/packed_func.h:1421\n  19: run<>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  18: run<tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1397\n  13: operator()\n        at /workspace/src/driver/driver_api.cc:395\n  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)\n        at /workspace/src/driver/driver_api.cc:382\n  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)\n        at /workspace/src/driver/driver_api.cc:282\n  10: tvm::transform::Pass::operator()(tvm::IRModule) const\n        at /workspace/src/ir/transform.cc:255\n  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:487\n  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/tir/ir/transform.cc:100\n  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const\n        at /workspace/include/tvm/runtime/packed_func.h:1498\n  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)\n        at /workspace/include/tvm/runtime/packed_func.h:1444\n  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const\n        at /workspace/include/tvm/runtime/packed_func.h:1369\n  2: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  1: _M_invoke\n        at /usr/include/c++/7/bits/std_function.h:316\n  0: operator()\n        at /workspace/src/runtime/c_runtime_api.cc:525\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback\n  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=MeasureErrorNo.INSTANTIATION_ERROR, all_cost=2.8102121353149414, timestamp=1636191411.7611966)    [('tile_f', [-1, 8, 32, 1]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 2, 64]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9069983
    No: 4   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  24: TVMFuncCall\n        at /workspace/src/runtime/c_runtime_api.cc:474\n  23: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /workspace/include/tvm/runtime/packed_func.h:1151\n  22: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  21: operator()\n        at /workspace/include/tvm/runtime/packed_func.h:1480\n  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >\n        at /workspace/include/tvm/runtime/packed_func.h:1421\n  19: run<>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  18: run<tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1397\n  13: operator()\n        at /workspace/src/driver/driver_api.cc:395\n  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)\n        at /workspace/src/driver/driver_api.cc:382\n  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)\n        at /workspace/src/driver/driver_api.cc:282\n  10: tvm::transform::Pass::operator()(tvm::IRModule) const\n        at /workspace/src/ir/transform.cc:255\n  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:487\n  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/tir/ir/transform.cc:100\n  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const\n        at /workspace/include/tvm/runtime/packed_func.h:1498\n  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)\n        at /workspace/include/tvm/runtime/packed_func.h:1444\n  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const\n        at /workspace/include/tvm/runtime/packed_func.h:1369\n  2: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  1: _M_invoke\n        at /usr/include/c++/7/bits/std_function.h:316\n  0: operator()\n        at /workspace/src/runtime/c_runtime_api.cc:525\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback\n  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=MeasureErrorNo.INSTANTIATION_ERROR, all_cost=2.673253297805786, timestamp=1636191411.761205)      [('tile_f', [-1, 16, 16, 1]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 16, 32]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,736818
    No: 5   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  24: TVMFuncCall\n        at /workspace/src/runtime/c_runtime_api.cc:474\n  23: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /workspace/include/tvm/runtime/packed_func.h:1151\n  22: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  21: operator()\n        at /workspace/include/tvm/runtime/packed_func.h:1480\n  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >\n        at /workspace/include/tvm/runtime/packed_func.h:1421\n  19: run<>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  18: run<tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1397\n  13: operator()\n        at /workspace/src/driver/driver_api.cc:395\n  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)\n        at /workspace/src/driver/driver_api.cc:382\n  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)\n        at /workspace/src/driver/driver_api.cc:282\n  10: tvm::transform::Pass::operator()(tvm::IRModule) const\n        at /workspace/src/ir/transform.cc:255\n  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:487\n  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/tir/ir/transform.cc:100\n  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const\n        at /workspace/include/tvm/runtime/packed_func.h:1498\n  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)\n        at /workspace/include/tvm/runtime/packed_func.h:1444\n  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const\n        at /workspace/include/tvm/runtime/packed_func.h:1369\n  2: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  1: _M_invoke\n        at /usr/include/c++/7/bits/std_function.h:316\n  0: operator()\n        at /workspace/src/runtime/c_runtime_api.cc:525\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback\n  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=MeasureErrorNo.INSTANTIATION_ERROR, all_cost=0.014344453811645508, timestamp=1636191413.4318082)  [('tile_f', [-1, 4, 4, 32]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 1, 128]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2885496
    No: 6   GFLOPS: 60.26/60.26     result: MeasureResult(costs=(0.003841781068965517,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.627082109451294, timestamp=1636191415.8784292)        [('tile_f', [-1, 1, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 4]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,3754080
    No: 7   GFLOPS: 0.00/60.26      result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  24: TVMFuncCall\n        at /workspace/src/runtime/c_runtime_api.cc:474\n  23: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /workspace/include/tvm/runtime/packed_func.h:1151\n  22: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  21: operator()\n        at /workspace/include/tvm/runtime/packed_func.h:1480\n  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >\n        at /workspace/include/tvm/runtime/packed_func.h:1421\n  19: run<>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  18: run<tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1397\n  13: operator()\n        at /workspace/src/driver/driver_api.cc:395\n  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)\n        at /workspace/src/driver/driver_api.cc:382\n  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)\n        at /workspace/src/driver/driver_api.cc:282\n  10: tvm::transform::Pass::operator()(tvm::IRModule) const\n        at /workspace/src/ir/transform.cc:255\n  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:487\n  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/tir/ir/transform.cc:100\n  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const\n        at /workspace/include/tvm/runtime/packed_func.h:1498\n  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)\n        at /workspace/include/tvm/runtime/packed_func.h:1444\n  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const\n        at /workspace/include/tvm/runtime/packed_func.h:1369\n  2: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  1: _M_invoke\n        at /usr/include/c++/7/bits/std_function.h:316\n  0: operator()\n        at /workspace/src/runtime/c_runtime_api.cc:525\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback\n  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=MeasureErrorNo.INSTANTIATION_ERROR, all_cost=0.012397050857543945, timestamp=1636191414.2555656)  [('tile_f', [-1, 1, 16, 32]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 256, 1]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6225319
    No: 8   GFLOPS: 0.00/60.26      result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  24: TVMFuncCall\n        at /workspace/src/runtime/c_runtime_api.cc:474\n  23: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /workspace/include/tvm/runtime/packed_func.h:1151\n  22: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  21: operator()\n        at /workspace/include/tvm/runtime/packed_func.h:1480\n  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >\n        at /workspace/include/tvm/runtime/packed_func.h:1421\n  19: run<>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  18: run<tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1397\n  13: operator()\n        at /workspace/src/driver/driver_api.cc:395\n  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)\n        at /workspace/src/driver/driver_api.cc:382\n  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)\n        at /workspace/src/driver/driver_api.cc:282\n  10: tvm::transform::Pass::operator()(tvm::IRModule) const\n        at /workspace/src/ir/transform.cc:255\n  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:487\n  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/tir/ir/transform.cc:100\n  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const\n        at /workspace/include/tvm/runtime/packed_func.h:1498\n  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)\n        at /workspace/include/tvm/runtime/packed_func.h:1444\n  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const\n        at /workspace/include/tvm/runtime/packed_func.h:1369\n  2: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  1: _M_invoke\n        at /usr/include/c++/7/bits/std_function.h:316\n  0: operator()\n        at /workspace/src/runtime/c_runtime_api.cc:525\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback\n  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=MeasureErrorNo.INSTANTIATION_ERROR, all_cost=0.013075590133666992, timestamp=1636191414.2555819)  [('tile_f', [-1, 2, 1, 32]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 8, 64]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,943546
    No: 9   GFLOPS: 0.00/60.26      result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  24: TVMFuncCall\n        at /workspace/src/runtime/c_runtime_api.cc:474\n  23: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /workspace/include/tvm/runtime/packed_func.h:1151\n  22: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  21: operator()\n        at /workspace/include/tvm/runtime/packed_func.h:1480\n  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >\n        at /workspace/include/tvm/runtime/packed_func.h:1421\n  19: run<>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  18: run<tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1397\n  13: operator()\n        at /workspace/src/driver/driver_api.cc:395\n  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)\n        at /workspace/src/driver/driver_api.cc:382\n  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)\n        at /workspace/src/driver/driver_api.cc:282\n  10: tvm::transform::Pass::operator()(tvm::IRModule) const\n        at /workspace/src/ir/transform.cc:255\n  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:487\n  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/tir/ir/transform.cc:100\n  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const\n        at /workspace/include/tvm/runtime/packed_func.h:1498\n  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)\n        at /workspace/include/tvm/runtime/packed_func.h:1444\n  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const\n        at /workspace/include/tvm/runtime/packed_func.h:1369\n  2: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  1: _M_invoke\n        at /usr/include/c++/7/bits/std_function.h:316\n  0: operator()\n        at /workspace/src/runtime/c_runtime_api.cc:525\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback\n  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=MeasureErrorNo.INSTANTIATION_ERROR, all_cost=0.010488510131835938, timestamp=1636191416.844652)   [('tile_f', [-1, 4, 16, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 16, 32]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2868708
    No: 10  GFLOPS: 0.00/60.26      result: MeasureResult(costs=(TimeoutError(),), error_no=MeasureErrorNo.BUILD_TIMEOUT, all_cost=10, timestamp=1636191426.8456671)        [('tile_f', [-1, 32, 2, 4]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 4, 2]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4691833
    No: 11  GFLOPS: 0.00/60.26      result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  24: TVMFuncCall\n        at /workspace/src/runtime/c_runtime_api.cc:474\n  23: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /workspace/include/tvm/runtime/packed_func.h:1151\n  22: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  21: operator()\n        at /workspace/include/tvm/runtime/packed_func.h:1480\n  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >\n        at /workspace/include/tvm/runtime/packed_func.h:1421\n  19: run<>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  18: run<tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1397\n  13: operator()\n        at /workspace/src/driver/driver_api.cc:395\n  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)\n        at /workspace/src/driver/driver_api.cc:382\n  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)\n        at /workspace/src/driver/driver_api.cc:282\n  10: tvm::transform::Pass::operator()(tvm::IRModule) const\n        at /workspace/src/ir/transform.cc:255\n  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:487\n  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/tir/ir/transform.cc:100\n  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const\n        at /workspace/include/tvm/runtime/packed_func.h:1498\n  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)\n        at /workspace/include/tvm/runtime/packed_func.h:1444\n  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const\n        at /workspace/include/tvm/runtime/packed_func.h:1369\n  2: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  1: _M_invoke\n        at /usr/include/c++/7/bits/std_function.h:316\n  0: operator()\n        at /workspace/src/runtime/c_runtime_api.cc:525\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback\n  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=MeasureErrorNo.INSTANTIATION_ERROR, all_cost=0.01113438606262207, timestamp=1636191426.8457015)   [('tile_f', [-1, 1, 2, 64]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 4, 4]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,1042124
    No: 12  GFLOPS: 0.00/60.26      result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  24: TVMFuncCall\n        at /workspace/src/runtime/c_runtime_api.cc:474\n  23: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /workspace/include/tvm/runtime/packed_func.h:1151\n  22: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  21: operator()\n        at /workspace/include/tvm/runtime/packed_func.h:1480\n  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >\n        at /workspace/include/tvm/runtime/packed_func.h:1421\n  19: run<>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  18: run<tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1397\n  13: operator()\n        at /workspace/src/driver/driver_api.cc:395\n  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)\n        at /workspace/src/driver/driver_api.cc:382\n  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)\n        at /workspace/src/driver/driver_api.cc:282\n  10: tvm::transform::Pass::operator()(tvm::IRModule) const\n        at /workspace/src/ir/transform.cc:255\n  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:487\n  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/tir/ir/transform.cc:100\n  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const\n        at /workspace/include/tvm/runtime/packed_func.h:1498\n  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)\n        at /workspace/include/tvm/runtime/packed_func.h:1444\n  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const\n        at /workspace/include/tvm/runtime/packed_func.h:1369\n  2: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  1: _M_invoke\n        at /usr/include/c++/7/bits/std_function.h:316\n  0: operator()\n        at /workspace/src/runtime/c_runtime_api.cc:525\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback\n  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=MeasureErrorNo.INSTANTIATION_ERROR, all_cost=0.03705739974975586, timestamp=1636191426.845718)    [('tile_f', [-1, 32, 1, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 32, 16]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,10013405
    No: 13  GFLOPS: 0.00/60.26      result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  24: TVMFuncCall\n        at /workspace/src/runtime/c_runtime_api.cc:474\n  23: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /workspace/include/tvm/runtime/packed_func.h:1151\n  22: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  21: operator()\n        at /workspace/include/tvm/runtime/packed_func.h:1480\n  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >\n        at /workspace/include/tvm/runtime/packed_func.h:1421\n  19: run<>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  18: run<tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1397\n  13: operator()\n        at /workspace/src/driver/driver_api.cc:395\n  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)\n        at /workspace/src/driver/driver_api.cc:382\n  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)\n        at /workspace/src/driver/driver_api.cc:282\n  10: tvm::transform::Pass::operator()(tvm::IRModule) const\n        at /workspace/src/ir/transform.cc:255\n  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:487\n  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/tir/ir/transform.cc:100\n  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const\n        at /workspace/include/tvm/runtime/packed_func.h:1498\n  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)\n        at /workspace/include/tvm/runtime/packed_func.h:1444\n  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const\n        at /workspace/include/tvm/runtime/packed_func.h:1369\n  2: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  1: _M_invoke\n        at /usr/include/c++/7/bits/std_function.h:316\n  0: operator()\n        at /workspace/src/runtime/c_runtime_api.cc:525\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback\n  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=MeasureErrorNo.INSTANTIATION_ERROR, all_cost=0.018039941787719727, timestamp=1636191427.0230396)  [('tile_f', [-1, 8, 8, 2]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 32]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6732082
    No: 14  GFLOPS: 0.00/60.26      result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  24: TVMFuncCall\n        at /workspace/src/runtime/c_runtime_api.cc:474\n  23: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /workspace/include/tvm/runtime/packed_func.h:1151\n  22: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  21: operator()\n        at /workspace/include/tvm/runtime/packed_func.h:1480\n  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >\n        at /workspace/include/tvm/runtime/packed_func.h:1421\n  19: run<>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  18: run<tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1397\n  13: operator()\n        at /workspace/src/driver/driver_api.cc:395\n  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)\n        at /workspace/src/driver/driver_api.cc:382\n  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)\n        at /workspace/src/driver/driver_api.cc:282\n  10: tvm::transform::Pass::operator()(tvm::IRModule) const\n        at /workspace/src/ir/transform.cc:255\n  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:487\n  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/tir/ir/transform.cc:100\n  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const\n        at /workspace/include/tvm/runtime/packed_func.h:1498\n  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)\n        at /workspace/include/tvm/runtime/packed_func.h:1444\n  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const\n        at /workspace/include/tvm/runtime/packed_func.h:1369\n  2: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  1: _M_invoke\n        at /usr/include/c++/7/bits/std_function.h:316\n  0: operator()\n        at /workspace/src/runtime/c_runtime_api.cc:525\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback\n  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=MeasureErrorNo.INSTANTIATION_ERROR, all_cost=0.02526068687438965, timestamp=1636191427.023084)    [('tile_f', [-1, 2, 4, 32]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 4, 128]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)],None,7536735
    No: 15  GFLOPS: 0.00/60.26      result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  24: TVMFuncCall\n        at /workspace/src/runtime/c_runtime_api.cc:474\n  23: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /workspace/include/tvm/runtime/packed_func.h:1151\n  22: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  21: operator()\n        at /workspace/include/tvm/runtime/packed_func.h:1480\n  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >\n        at /workspace/include/tvm/runtime/packed_func.h:1421\n  19: run<>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  18: run<tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1397\n  13: operator()\n        at /workspace/src/driver/driver_api.cc:395\n  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)\n        at /workspace/src/driver/driver_api.cc:382\n  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)\n        at /workspace/src/driver/driver_api.cc:282\n  10: tvm::transform::Pass::operator()(tvm::IRModule) const\n        at /workspace/src/ir/transform.cc:255\n  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:487\n  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/tir/ir/transform.cc:100\n  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const\n        at /workspace/include/tvm/runtime/packed_func.h:1498\n  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)\n        at /workspace/include/tvm/runtime/packed_func.h:1444\n  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const\n        at /workspace/include/tvm/runtime/packed_func.h:1369\n  2: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  1: _M_invoke\n        at /usr/include/c++/7/bits/std_function.h:316\n  0: operator()\n        at /workspace/src/runtime/c_runtime_api.cc:525\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback\n  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=MeasureErrorNo.INSTANTIATION_ERROR, all_cost=0.016422510147094727, timestamp=1636191427.0230944)  [('tile_f', [-1, 2, 1, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 128, 4]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,482121
    No: 16  GFLOPS: 0.00/60.26      result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  24: TVMFuncCall\n        at /workspace/src/runtime/c_runtime_api.cc:474\n  23: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /workspace/include/tvm/runtime/packed_func.h:1151\n  22: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  21: operator()\n        at /workspace/include/tvm/runtime/packed_func.h:1480\n  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >\n        at /workspace/include/tvm/runtime/packed_func.h:1421\n  19: run<>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  18: run<tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1397\n  13: operator()\n        at /workspace/src/driver/driver_api.cc:395\n  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)\n        at /workspace/src/driver/driver_api.cc:382\n  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)\n        at /workspace/src/driver/driver_api.cc:282\n  10: tvm::transform::Pass::operator()(tvm::IRModule) const\n        at /workspace/src/ir/transform.cc:255\n  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:487\n  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/tir/ir/transform.cc:100\n  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const\n        at /workspace/include/tvm/runtime/packed_func.h:1498\n  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)\n        at /workspace/include/tvm/runtime/packed_func.h:1444\n  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const\n        at /workspace/include/tvm/runtime/packed_func.h:1369\n  2: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  1: _M_invoke\n        at /usr/include/c++/7/bits/std_function.h:316\n  0: operator()\n        at /workspace/src/runtime/c_runtime_api.cc:525\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback\n  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=MeasureErrorNo.INSTANTIATION_ERROR, all_cost=0.2211320400238037, timestamp=1636191427.9682615)    [('tile_f', [-1, 2, 1, 16]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 32, 8]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2824525
    No: 17  GFLOPS: 0.00/60.26      result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  24: TVMFuncCall\n        at /workspace/src/runtime/c_runtime_api.cc:474\n  23: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /workspace/include/tvm/runtime/packed_func.h:1151\n  22: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  21: operator()\n        at /workspace/include/tvm/runtime/packed_func.h:1480\n  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >\n        at /workspace/include/tvm/runtime/packed_func.h:1421\n  19: run<>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  18: run<tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1397\n  13: operator()\n        at /workspace/src/driver/driver_api.cc:395\n  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)\n        at /workspace/src/driver/driver_api.cc:382\n  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)\n        at /workspace/src/driver/driver_api.cc:282\n  10: tvm::transform::Pass::operator()(tvm::IRModule) const\n        at /workspace/src/ir/transform.cc:255\n  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:487\n  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/tir/ir/transform.cc:100\n  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const\n        at /workspace/include/tvm/runtime/packed_func.h:1498\n  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)\n        at /workspace/include/tvm/runtime/packed_func.h:1444\n  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const\n        at /workspace/include/tvm/runtime/packed_func.h:1369\n  2: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  1: _M_invoke\n        at /usr/include/c++/7/bits/std_function.h:316\n  0: operator()\n        at /workspace/src/runtime/c_runtime_api.cc:525\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback\n  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=MeasureErrorNo.INSTANTIATION_ERROR, all_cost=0.010805130004882812, timestamp=1636191428.0713048)  [('tile_f', [-1, 64, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 8, 8]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4559286
    No: 18  GFLOPS: 0.00/60.26      result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  24: TVMFuncCall\n        at /workspace/src/runtime/c_runtime_api.cc:474\n  23: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /workspace/include/tvm/runtime/packed_func.h:1151\n  22: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  21: operator()\n        at /workspace/include/tvm/runtime/packed_func.h:1480\n  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >\n        at /workspace/include/tvm/runtime/packed_func.h:1421\n  19: run<>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  18: run<tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1382\n  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>\n        at /workspace/include/tvm/runtime/packed_func.h:1397\n  13: operator()\n        at /workspace/src/driver/driver_api.cc:395\n  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)\n        at /workspace/src/driver/driver_api.cc:382\n  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)\n        at /workspace/src/driver/driver_api.cc:282\n  10: tvm::transform::Pass::operator()(tvm::IRModule) const\n        at /workspace/src/ir/transform.cc:255\n  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:487\n  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/ir/transform.cc:267\n  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const\n        at /workspace/src/tir/ir/transform.cc:100\n  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const\n        at /workspace/include/tvm/runtime/packed_func.h:1498\n  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)\n        at /workspace/include/tvm/runtime/packed_func.h:1444\n  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const\n        at /workspace/include/tvm/runtime/packed_func.h:1369\n  2: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const\n        at /usr/include/c++/7/bits/std_function.h:706\n  1: _M_invoke\n        at /usr/include/c++/7/bits/std_function.h:316\n  0: operator()\n        at /workspace/src/runtime/c_runtime_api.cc:525\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback\n  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=MeasureErrorNo.INSTANTIATION_ERROR, all_cost=0.031464576721191406, timestamp=1636191428.104853)   [('tile_f', [-1, 1, 32, 16]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 1, 512]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9677544
    No: 19  GFLOPS: 0.00/60.26      result: MeasureResult(costs=(RuntimeError('Traceback (most recent call last):\n  60: 0xffffffffffffffff\n  59: _start\n  58: __libc_start_main\n  57: main\n  56: Py_Main\n  55: 0x000000000063886a\n  54: PyObject_Call\n  53: 0x000000000058945c\n  52: 0x0000000000507cd3\n  51: _PyEval_EvalFrameDefault\n  50: 0x000000000050a3fc\n  49: 0x00000000005099ff\n  48: 0x0000000000507cd3\n  47: _PyEval_EvalFrameDefault\n  46: 0x000000000050a22e\n  45: 0x0000000000516284\n  44: 0x0000000000507cd3\n  43: _PyEval_EvalFrameDefault\n  42: 0x000000000050a3fc\n  41: 0x00000000005099ff\n  40: 0x0000000000507cd3\n  39: _PyEval_EvalFrameDefault\n  38: PyObject_Call\n  37: 0x00000000005893d9\n  36: 0x0000000000507cd3\n  35: _PyEval_EvalFrameDefault\n  34: PyObject_CallFunctionObjArgs\n  33: _PyObject_FastCallDict\n  32: 0x0000000000595220\n  31: _PyFunction_FastCallDict\n  30: _PyEval_EvalFrameDefault\n  29: 0x000000000050a22e\n  28: 0x00000000006272e8\n  27: 0x00000000006270cf\n  26: 0x000000000058ea77\n  25: _PyEval_EvalFrameDefault\n  24: 0x000000000050a3fc\n  23: 0x00000000005096c7\n  22: _PyEval_EvalFrameDe',),), error_no=MeasureErrorNo.RUNTIME_DEVICE, all_cost=7.474576711654663, timestamp=1636191435.588759)       [('tile_f', [-1, 8, 2, 16]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 1, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6390073
    No: 20  GFLOPS: 144.48/144.48   result: MeasureResult(costs=(0.001602333111111111,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.0883064270019531, timestamp=1636191436.3377938)       [('tile_f', [-1, 1, 4, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9881539


Finally we can inspect the best config from log file, check correctness,
and measure running time.


.. code-block:: default


    # inspect the best config
    dispatch_context = autotvm.apply_history_best("conv2d.log")
    best_config = dispatch_context.query(task.target, task.workload)
    print("\nBest config:")
    print(best_config)

    # apply history best from log file
    with autotvm.apply_history_best("conv2d.log"):
        with tvm.target.Target("cuda"):
            s, arg_bufs = conv2d_no_batching(N, H, W, CO, CI, KH, KW, strides, padding)
            func = tvm.build(s, arg_bufs)

    # check correctness
    a_np = np.random.uniform(size=(N, CI, H, W)).astype(np.float32)
    w_np = np.random.uniform(size=(CO, CI, KH, KW)).astype(np.float32)
    c_np = conv2d_nchw_python(a_np, w_np, strides, padding)

    dev = tvm.cuda()
    a_tvm = tvm.nd.array(a_np, device=dev)
    w_tvm = tvm.nd.array(w_np, device=dev)
    c_tvm = tvm.nd.empty(c_np.shape, device=dev)
    func(a_tvm, w_tvm, c_tvm)

    tvm.testing.assert_allclose(c_np, c_tvm.numpy(), rtol=1e-2)

    # Evaluate running time. Here we choose a large repeat number (400) to reduce the noise
    # and the overhead of kernel launch. You can also use nvprof to validate the result.
    evaluator = func.time_evaluator(func.entry_name, dev, number=400)
    print("Time cost of this operator: %f" % evaluator(a_tvm, w_tvm, c_tvm).mean)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none


    Best config:
    [('tile_f', [-1, 1, 4, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9881539
    Time cost of this operator: 0.001954


.. _sphx_glr_download_how_to_tune_with_autotvm_tune_conv2d_cuda.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download

     :download:`Download Python source code: tune_conv2d_cuda.py <tune_conv2d_cuda.py>`


  .. container:: sphx-glr-download

     :download:`Download Jupyter notebook: tune_conv2d_cuda.ipynb <tune_conv2d_cuda.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_