Tuning High Performance Convolution on NVIDIA GPUs

Author: Lianmin Zheng

This is an advanced tutorial on writing a high-performance tunable template for NVIDIA GPUs. By running the auto-tuner on this template, we can outperform the vendor-provided library cuDNN in many cases.

Note that this tutorial will not run on Windows or recent versions of macOS. To get it to run, you will need to wrap the body of this tutorial in an if __name__ == "__main__": block.
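Here is a minimal sketch of that wrapping. It assumes you move the tutorial body into a function; the name main is our choice and is not part of TVM. Such a guard is usually needed because of Python's "spawn" start method, which is the default on Windows and on macOS since Python 3.8. Under spawn, worker processes re-import the main module, and without the guard they would re-run the whole script. We assume this is why the requirement applies to the tuner's worker processes here.

```python
import multiprocessing


def main():
    # Hypothetical placeholder: put the whole tutorial body here (template
    # definition, task creation, tuner.tune(...), and so on).
    return multiprocessing.get_start_method()


if __name__ == "__main__":
    # Under the "spawn" start method, child processes re-import this module;
    # the guard keeps them from re-running the tutorial body.
    print("multiprocessing start method:", main())
```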

Install dependencies

To use the autotvm package in TVM, we need to install some extra dependencies (change “3” to “2” if you use Python 2):

pip3 install --user psutil xgboost tornado cloudpickle

To make TVM run faster during tuning, it is recommended to use Cython as the FFI of TVM. In the root directory of TVM, execute:

pip3 install --user cython
sudo make cython3
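The small sketch below checks two things: whether the extra packages above are importable, and whether TVM was built with the Cython FFI. The helper names are hypothetical. The module name tvm._ffi._cy3.core is an assumption taken from the tracebacks in the tuning log below, and it is not a documented API.

```python
import importlib.util

# Packages from the pip command above.
AUTOTVM_EXTRA_DEPS = ("psutil", "xgboost", "tornado", "cloudpickle")


def check_autotvm_deps(packages=AUTOTVM_EXTRA_DEPS):
    """Return {package: True/False} depending on whether Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


def has_cython_ffi():
    """Best-effort check for TVM's Cython FFI module.

    Assumption: the Cython build is importable as tvm._ffi._cy3.core, the
    module name that appears in the tracebacks of the tuning log.
    """
    try:
        return importlib.util.find_spec("tvm._ffi._cy3.core") is not None
    except Exception:
        # Importing the parent package failed (tvm not installed, or its
        # shared library could not be loaded), so report "not found".
        return False


if __name__ == "__main__":
    for name, ok in check_autotvm_deps().items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
    print("cython FFI  ", "ok" if has_cython_ffi() else "not found")
```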

Now return to Python code and import the packages.

import logging
import sys
import numpy as np

import tvm
from tvm import te, topi, testing
from tvm.topi.testing import conv2d_nchw_python
import tvm.testing

from tvm import autotvm

Step 1: Define the search space

There are plenty of useful schedule primitives in TVM. Some tutorials describe them in more detail, such as (1) How to optimize convolution on GPU and (2) Optimizing DepthwiseConv on NVIDIA GPU.

However, their implementations are manually tuned for specific input shapes. In this section, we build a space large enough to cover the techniques used in these tutorials. We then rely on the efficient auto-tuner to search through this space and pick good configurations.

If you are familiar with writing CUDA schedules, you will find the following template very general. In fact, it can easily be modified to tune other operators such as depthwise convolution and GEMM. To fully understand this template, you should be familiar with the schedule primitives and the auto-tuning API. You can refer to the tutorials above and to the autotvm tutorial.

It is worth noting that the search space for a conv2d operator can be very large (on the order of 10^9 for some input shapes).
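The sketch below shows where numbers like this come from by counting the configurations for the ResNet layer tuned in Step 2. It rests on two assumptions. First, each define_split uses the "factors" policy shown in the ConfigSpace printout, so every ordered factorization of the axis length is a candidate. Second, the total size is the product of all knob sizes. Under these assumptions the count reproduces the printed len=10454400, which is about 10^7 for this particular shape. The helper name count_splits is hypothetical.

```python
from functools import lru_cache
from math import prod


@lru_cache(maxsize=None)
def count_splits(extent, num_outputs):
    """Number of ordered ways to write `extent` as a product of `num_outputs`
    positive integer factors (e.g. extent 4 into 2 factors: 1*4, 2*2, 4*1 -> 3)."""
    if num_outputs == 1:
        return 1
    return sum(
        count_splits(extent // d, num_outputs - 1)
        for d in range(1, extent + 1)
        if extent % d == 0
    )


# Last ResNet layer used in Step 2: CO = CI = 512, H = W = 7, KH = KW = 3.
split_lens = {
    "tile_f": count_splits(512, 4),
    "tile_y": count_splits(7, 4),
    "tile_x": count_splits(7, 4),
    "tile_rc": count_splits(512, 3),
    "tile_ry": count_splits(3, 3),
    "tile_rx": count_splits(3, 3),
}
# The two define_knob calls offer 3 and 2 candidate values.
knob_lens = {"auto_unroll_max_step": 3, "unroll_explicit": 2}

space_size = prod(split_lens.values()) * prod(knob_lens.values())
print(split_lens)
print("total configurations:", space_size)
```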

@autotvm.template("tutorial/conv2d_no_batching")
def conv2d_no_batching(N, H, W, CO, CI, KH, KW, stride, padding):
    assert N == 1, "Only consider batch_size = 1 in this template"

    data = te.placeholder((N, CI, H, W), name="data")
    kernel = te.placeholder((CO, CI, KH, KW), name="kernel")
    conv = topi.nn.conv2d_nchw(data, kernel, stride, padding, dilation=1, out_dtype="float32")
    s = te.create_schedule([conv.op])

    ##### space definition begin #####
    n, f, y, x = s[conv].op.axis
    rc, ry, rx = s[conv].op.reduce_axis

    cfg = autotvm.get_config()
    cfg.define_split("tile_f", f, num_outputs=4)
    cfg.define_split("tile_y", y, num_outputs=4)
    cfg.define_split("tile_x", x, num_outputs=4)
    cfg.define_split("tile_rc", rc, num_outputs=3)
    cfg.define_split("tile_ry", ry, num_outputs=3)
    cfg.define_split("tile_rx", rx, num_outputs=3)
    cfg.define_knob("auto_unroll_max_step", [0, 512, 1500])
    cfg.define_knob("unroll_explicit", [0, 1])
    ##### space definition end #####

    # inline padding
    pad_data = s[conv].op.input_tensors[0]
    s[pad_data].compute_inline()
    data, raw_data = pad_data, data

    output = conv
    OL = s.cache_write(conv, "local")

    # create cache stage
    AA = s.cache_read(data, "shared", [OL])
    WW = s.cache_read(kernel, "shared", [OL])
    AL = s.cache_read(AA, "local", [OL])
    WL = s.cache_read(WW, "local", [OL])

    # tile and bind spatial axes
    n, f, y, x = s[output].op.axis
    bf, vf, tf, fi = cfg["tile_f"].apply(s, output, f)
    by, vy, ty, yi = cfg["tile_y"].apply(s, output, y)
    bx, vx, tx, xi = cfg["tile_x"].apply(s, output, x)
    kernel_scope = n  # this is the scope to attach global config inside this kernel

    s[output].bind(bf, te.thread_axis("blockIdx.z"))
    s[output].bind(by, te.thread_axis("blockIdx.y"))
    s[output].bind(bx, te.thread_axis("blockIdx.x"))
    s[output].bind(vf, te.thread_axis("vthread"))
    s[output].bind(vy, te.thread_axis("vthread"))
    s[output].bind(vx, te.thread_axis("vthread"))
    s[output].bind(tf, te.thread_axis("threadIdx.z"))
    s[output].bind(ty, te.thread_axis("threadIdx.y"))
    s[output].bind(tx, te.thread_axis("threadIdx.x"))
    s[output].reorder(n, bf, by, bx, vf, vy, vx, tf, ty, tx, fi, yi, xi)
    s[OL].compute_at(s[output], tx)

    # tile reduction axes
    n, f, y, x = s[OL].op.axis
    rc, ry, rx = s[OL].op.reduce_axis
    rco, rcm, rci = cfg["tile_rc"].apply(s, OL, rc)
    # each reduction axis is split by its own knob (tile_ry for ry, tile_rx for rx)
    ryo, rym, ryi = cfg["tile_ry"].apply(s, OL, ry)
    rxo, rxm, rxi = cfg["tile_rx"].apply(s, OL, rx)
    s[OL].reorder(rco, ryo, rxo, rcm, rym, rxm, rci, ryi, rxi, n, f, y, x)

    s[AA].compute_at(s[OL], rxo)
    s[WW].compute_at(s[OL], rxo)
    s[AL].compute_at(s[OL], rxm)
    s[WL].compute_at(s[OL], rxm)

    # cooperative fetching
    for load in [AA, WW]:
        n, f, y, x = s[load].op.axis
        fused = s[load].fuse(n, f, y, x)
        tz, fused = s[load].split(fused, nparts=cfg["tile_f"].size[2])
        ty, fused = s[load].split(fused, nparts=cfg["tile_y"].size[2])
        tx, fused = s[load].split(fused, nparts=cfg["tile_x"].size[2])
        s[load].bind(tz, te.thread_axis("threadIdx.z"))
        s[load].bind(ty, te.thread_axis("threadIdx.y"))
        s[load].bind(tx, te.thread_axis("threadIdx.x"))

    # tune unroll
    s[output].pragma(kernel_scope, "auto_unroll_max_step", cfg["auto_unroll_max_step"].val)
    s[output].pragma(kernel_scope, "unroll_explicit", cfg["unroll_explicit"].val)

    return s, [raw_data, kernel, conv]
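The tile configs in the tuning log (for example [-1, 4, 4, 2]) correspond to the four factors that cfg["tile_f"].apply produces. The template binds them in order to blockIdx, vthread, threadIdx, and the per-thread inner loop. We read the -1 as the outermost factor, inferred from the axis length. The sketch below turns such a config into launch dimensions for the ResNet layer of Step 2; the helper names are hypothetical. It also shows why the cooperative-fetching code splits by size[2]: those factors are exactly the threadIdx extents.

```python
from math import prod


def resolve_split(extent, sizes):
    """Fill in a leading -1 (as printed in the tuning log) so the factors multiply to extent."""
    sizes = list(sizes)
    if sizes[0] == -1:
        inner = prod(sizes[1:])
        if extent % inner != 0:
            raise ValueError(f"factors {sizes[1:]} do not divide extent {extent}")
        sizes[0] = extent // inner
    if prod(sizes) != extent:
        raise ValueError(f"{sizes} does not multiply to {extent}")
    return sizes


def launch_dims(f_ext, y_ext, x_ext, tile_f, tile_y, tile_x):
    """Map 4-way splits [block, vthread, thread, inner] to launch dimensions,
    following the bindings in the template (f -> z, y -> y, x -> x)."""
    bf, vf, tf, fi = resolve_split(f_ext, tile_f)
    by, vy, ty, yi = resolve_split(y_ext, tile_y)
    bx, vx, tx, xi = resolve_split(x_ext, tile_x)
    vthreads = vf * vy * vx
    return {
        "grid_xyz": (bx, by, bf),    # blockIdx.x, blockIdx.y, blockIdx.z
        "block_xyz": (tx, ty, tf),   # threadIdx.x, threadIdx.y, threadIdx.z
        "threads_per_block": tx * ty * tf,
        "vthreads": vthreads,
        # outputs each thread computes: all its virtual threads times the inner tile
        "outputs_per_thread": vthreads * fi * yi * xi,
    }


# Trial No: 1 from the tuning log, on the 512 x 7 x 7 output of the ResNet layer
dims = launch_dims(512, 7, 7, [-1, 4, 4, 2], [-1, 7, 1, 1], [-1, 1, 7, 1])
print(dims)
```

As a sanity check, 16 blocks of 28 threads, each computing 56 outputs, cover all 512 * 7 * 7 = 25088 output elements.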

Step 2: Search through the space

We pick the last layer of ResNet as the test case. Since our space is very large, the XGBoost-based tuner (XGBTuner) is the most suitable choice here. We only run 20 trials for demonstration. In practice, about 1000 trials can usually find some good kernels for this template.
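The sketch below gives a rough guide to the GFLOPS figures printed during tuning. It computes the conventional operation count for this direct convolution, counting one multiply-add as 2 operations, and converts a runtime into GFLOPS. autotvm derives its own count from the compute definition, so treat this only as an approximation. The function names are hypothetical, and the shape is the ResNet layer defined below (N=1, 7x7 input, 512 to 512 channels, 3x3 kernel, stride 1, padding 1).

```python
def conv2d_output_size(size, kernel, stride, pad):
    # Standard output-size formula for a convolution without dilation.
    return (size + 2 * pad - kernel) // stride + 1


def conv2d_flops(N, H, W, CO, CI, KH, KW, stride, padding):
    """Direct-convolution FLOPs, counting each multiply-add as 2 operations."""
    OH = conv2d_output_size(H, KH, stride[0], padding[0])
    OW = conv2d_output_size(W, KW, stride[1], padding[1])
    return 2 * N * CO * OH * OW * CI * KH * KW


def gflops(flops, seconds):
    return flops / seconds / 1e9


flops = conv2d_flops(1, 7, 7, 512, 512, 3, 3, (1, 1), (1, 1))
print("FLOPs:", flops)
print("GFLOPS at 1 ms:", gflops(flops, 1e-3))
```

This layer needs roughly 0.23 GFLOP, so a kernel running in 1 ms would show up as about 231 GFLOPS.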

# logging config (for printing tuning log to screen)
logging.getLogger("autotvm").setLevel(logging.DEBUG)
logging.getLogger("autotvm").addHandler(logging.StreamHandler(sys.stdout))

# the last layer in resnet
N, H, W, CO, CI, KH, KW, strides, padding = 1, 7, 7, 512, 512, 3, 3, (1, 1), (1, 1)
task = autotvm.task.create(
    "tutorial/conv2d_no_batching", args=(N, H, W, CO, CI, KH, KW, strides, padding), target="cuda"
)
print(task.config_space)

# Use the local GPU and measure 3 times for every config to reduce variance.
# The timeout for compiling a program is 10 seconds; the timeout for running is 4 seconds.
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(repeat=3, min_repeat_ms=100, timeout=4),
)

# Begin tuning and log records to the file `conv2d.log`.
# During tuning we will also try many invalid configs, so you should expect to
# see many error reports. As long as you can see non-zero GFLOPS, it is okay.
tuner = autotvm.tuner.XGBTuner(task)
tuner.tune(
    n_trial=20,
    measure_option=measure_option,
    callbacks=[autotvm.callback.log_to_file("conv2d.log")],
)
ConfigSpace (len=10454400, space_map=
   0 tile_f: Split(policy=factors, product=512, num_outputs=4) len=220
   1 tile_y: Split(policy=factors, product=7, num_outputs=4) len=4
   2 tile_x: Split(policy=factors, product=7, num_outputs=4) len=4
   3 tile_rc: Split(policy=factors, product=512, num_outputs=3) len=55
   4 tile_ry: Split(policy=factors, product=3, num_outputs=3) len=3
   5 tile_rx: Split(policy=factors, product=3, num_outputs=3) len=3
   6 auto_unroll_max_step: OtherOption([0, 512, 1500]) len=3
   7 unroll_explicit: OtherOption([0, 1]) len=2
)
waiting for device...
device available
Get devices for measurement successfully!
No: 1   GFLOPS: 0.00/0.00       result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
    func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
    func = build(s, args, target_host=task.target_host, runtime=runtime)
  File "/workspace/python/tvm/driver/build_module.py", line 228, in build
    input_mod = lower(inputs, args, name=name, binds=binds)
  File "/workspace/python/tvm/driver/build_module.py", line 134, in lower
    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  24: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  22: Call
        at ../include/tvm/runtime/packed_func.h:1213
  21: operator()
        at ../include/tvm/runtime/packed_func.h:1731
  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
        at ../include/tvm/runtime/packed_func.h:1671
  19: run<>
        at ../include/tvm/runtime/packed_func.h:1631
  18: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1646
  13: operator()
        at ../src/driver/driver_api.cc:391
  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
        at ../src/driver/driver_api.cc:378
  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
        at ../src/driver/driver_api.cc:278
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
        at ../src/ir/transform.cc:258
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:453
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/tir/ir/transform.cc:100
  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
        at ../include/tvm/runtime/packed_func.h:1750
  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at ../include/tvm/runtime/packed_func.h:1694
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/c_runtime_api.cc:534
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 4, 4, 2]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 128, 2]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)],None,7999494
No: 2   GFLOPS: 0.00/0.00       result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
    func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
    func = build(s, args, target_host=task.target_host, runtime=runtime)
  File "/workspace/python/tvm/driver/build_module.py", line 228, in build
    input_mod = lower(inputs, args, name=name, binds=binds)
  File "/workspace/python/tvm/driver/build_module.py", line 134, in lower
    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  24: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  22: Call
        at ../include/tvm/runtime/packed_func.h:1213
  21: operator()
        at ../include/tvm/runtime/packed_func.h:1731
  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
        at ../include/tvm/runtime/packed_func.h:1671
  19: run<>
        at ../include/tvm/runtime/packed_func.h:1631
  18: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1646
  13: operator()
        at ../src/driver/driver_api.cc:391
  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
        at ../src/driver/driver_api.cc:378
  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
        at ../src/driver/driver_api.cc:278
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
        at ../src/ir/transform.cc:258
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:453
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/tir/ir/transform.cc:100
  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
        at ../include/tvm/runtime/packed_func.h:1750
  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at ../include/tvm/runtime/packed_func.h:1694
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/c_runtime_api.cc:534
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 1, 8, 2]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 1, 64]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,5194279
No: 3   GFLOPS: 0.00/0.00       result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
    func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
    func = build(s, args, target_host=task.target_host, runtime=runtime)
  File "/workspace/python/tvm/driver/build_module.py", line 228, in build
    input_mod = lower(inputs, args, name=name, binds=binds)
  File "/workspace/python/tvm/driver/build_module.py", line 134, in lower
    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  24: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  22: Call
        at ../include/tvm/runtime/packed_func.h:1213
  21: operator()
        at ../include/tvm/runtime/packed_func.h:1731
  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
        at ../include/tvm/runtime/packed_func.h:1671
  19: run<>
        at ../include/tvm/runtime/packed_func.h:1631
  18: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1646
  13: operator()
        at ../src/driver/driver_api.cc:391
  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
        at ../src/driver/driver_api.cc:378
  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
        at ../src/driver/driver_api.cc:278
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
        at ../src/ir/transform.cc:258
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:453
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/tir/ir/transform.cc:100
  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
        at ../include/tvm/runtime/packed_func.h:1750
  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at ../include/tvm/runtime/packed_func.h:1694
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/c_runtime_api.cc:534
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 8, 32, 1]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 2, 64]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9069983
No: 4   GFLOPS: 0.00/0.00       result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
    func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
    func = build(s, args, target_host=task.target_host, runtime=runtime)
  File "/workspace/python/tvm/driver/build_module.py", line 228, in build
    input_mod = lower(inputs, args, name=name, binds=binds)
  File "/workspace/python/tvm/driver/build_module.py", line 134, in lower
    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  24: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  22: Call
        at ../include/tvm/runtime/packed_func.h:1213
  21: operator()
        at ../include/tvm/runtime/packed_func.h:1731
  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
        at ../include/tvm/runtime/packed_func.h:1671
  19: run<>
        at ../include/tvm/runtime/packed_func.h:1631
  18: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1646
  13: operator()
        at ../src/driver/driver_api.cc:391
  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
        at ../src/driver/driver_api.cc:378
  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
        at ../src/driver/driver_api.cc:278
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
        at ../src/ir/transform.cc:258
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:453
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/tir/ir/transform.cc:100
  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
        at ../include/tvm/runtime/packed_func.h:1750
  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at ../include/tvm/runtime/packed_func.h:1694
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/c_runtime_api.cc:534
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 16, 16, 1]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 16, 32]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,736818
No: 5   GFLOPS: 0.00/0.00       result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
    func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
    func = build(s, args, target_host=task.target_host, runtime=runtime)
  File "/workspace/python/tvm/driver/build_module.py", line 228, in build
    input_mod = lower(inputs, args, name=name, binds=binds)
  File "/workspace/python/tvm/driver/build_module.py", line 134, in lower
    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  24: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  22: Call
        at ../include/tvm/runtime/packed_func.h:1213
  21: operator()
        at ../include/tvm/runtime/packed_func.h:1731
  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
        at ../include/tvm/runtime/packed_func.h:1671
  19: run<>
        at ../include/tvm/runtime/packed_func.h:1631
  18: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1646
  13: operator()
        at ../src/driver/driver_api.cc:391
  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
        at ../src/driver/driver_api.cc:378
  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
        at ../src/driver/driver_api.cc:278
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
        at ../src/ir/transform.cc:258
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:453
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/tir/ir/transform.cc:100
  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
        at ../include/tvm/runtime/packed_func.h:1750
  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at ../include/tvm/runtime/packed_func.h:1694
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/c_runtime_api.cc:534
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 4, 4, 32]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 1, 128]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2885496
No: 6   GFLOPS: 42.35/42.35     result: MeasureResult(costs=(0.005466644736842105,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.599562644958496, timestamp=1658865455.9685578)        [('tile_f', [-1, 1, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 4]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,3754080
No: 7   GFLOPS: 0.00/42.35      result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
    func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
    func = build(s, args, target_host=task.target_host, runtime=runtime)
  File "/workspace/python/tvm/driver/build_module.py", line 228, in build
    input_mod = lower(inputs, args, name=name, binds=binds)
  File "/workspace/python/tvm/driver/build_module.py", line 134, in lower
    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  24: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  22: Call
        at ../include/tvm/runtime/packed_func.h:1213
  21: operator()
        at ../include/tvm/runtime/packed_func.h:1731
  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
        at ../include/tvm/runtime/packed_func.h:1671
  19: run<>
        at ../include/tvm/runtime/packed_func.h:1631
  18: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1646
  13: operator()
        at ../src/driver/driver_api.cc:391
  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
        at ../src/driver/driver_api.cc:378
  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
        at ../src/driver/driver_api.cc:278
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
        at ../src/ir/transform.cc:258
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:453
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/tir/ir/transform.cc:100
  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
        at ../include/tvm/runtime/packed_func.h:1750
  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at ../include/tvm/runtime/packed_func.h:1694
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/c_runtime_api.cc:534
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 1, 16, 32]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 256, 1]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6225319
No: 8   GFLOPS: 0.00/42.35      result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
    func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
    func = build(s, args, target_host=task.target_host, runtime=runtime)
  File "/workspace/python/tvm/driver/build_module.py", line 228, in build
    input_mod = lower(inputs, args, name=name, binds=binds)
  File "/workspace/python/tvm/driver/build_module.py", line 134, in lower
    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  24: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  22: Call
        at ../include/tvm/runtime/packed_func.h:1213
  21: operator()
        at ../include/tvm/runtime/packed_func.h:1731
  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
        at ../include/tvm/runtime/packed_func.h:1671
  19: run<>
        at ../include/tvm/runtime/packed_func.h:1631
  18: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1646
  13: operator()
        at ../src/driver/driver_api.cc:391
  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
        at ../src/driver/driver_api.cc:378
  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
        at ../src/driver/driver_api.cc:278
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
        at ../src/ir/transform.cc:258
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:453
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/tir/ir/transform.cc:100
  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
        at ../include/tvm/runtime/packed_func.h:1750
  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at ../include/tvm/runtime/packed_func.h:1694
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/c_runtime_api.cc:534
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 2, 1, 32]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 8, 64]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,943546
No: 9   GFLOPS: 0.00/42.35      result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
    func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
    func = build(s, args, target_host=task.target_host, runtime=runtime)
  File "/workspace/python/tvm/driver/build_module.py", line 228, in build
    input_mod = lower(inputs, args, name=name, binds=binds)
  File "/workspace/python/tvm/driver/build_module.py", line 134, in lower
    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  24: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  22: Call
        at ../include/tvm/runtime/packed_func.h:1213
  21: operator()
        at ../include/tvm/runtime/packed_func.h:1731
  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
        at ../include/tvm/runtime/packed_func.h:1671
  19: run<>
        at ../include/tvm/runtime/packed_func.h:1631
  18: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1646
  13: operator()
        at ../src/driver/driver_api.cc:391
  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
        at ../src/driver/driver_api.cc:378
  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
        at ../src/driver/driver_api.cc:278
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
        at ../src/ir/transform.cc:258
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:453
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/tir/ir/transform.cc:100
  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
        at ../include/tvm/runtime/packed_func.h:1750
  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at ../include/tvm/runtime/packed_func.h:1694
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/c_runtime_api.cc:534
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 4, 16, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 16, 32]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2868708
No: 10  GFLOPS: 0.00/42.35      result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 142, in build
    res = future.result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/workspace/python/tvm/contrib/popen_pool.py", line 404, in <lambda>
    worker = lambda *args: self._worker_run(*args)
  File "/workspace/python/tvm/contrib/popen_pool.py", line 373, in _worker_run
    return proc.recv()
  File "/workspace/python/tvm/contrib/popen_pool.py", line 297, in recv
    raise TimeoutError()
TimeoutError

        [('tile_f', [-1, 32, 2, 4]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 4, 2]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4691833
No: 11  GFLOPS: 0.00/42.35      result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
    func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
    func = build(s, args, target_host=task.target_host, runtime=runtime)
  File "/workspace/python/tvm/driver/build_module.py", line 228, in build
    input_mod = lower(inputs, args, name=name, binds=binds)
  File "/workspace/python/tvm/driver/build_module.py", line 134, in lower
    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  24: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  22: Call
        at ../include/tvm/runtime/packed_func.h:1213
  21: operator()
        at ../include/tvm/runtime/packed_func.h:1731
  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
        at ../include/tvm/runtime/packed_func.h:1671
  19: run<>
        at ../include/tvm/runtime/packed_func.h:1631
  18: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1646
  13: operator()
        at ../src/driver/driver_api.cc:391
  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
        at ../src/driver/driver_api.cc:378
  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
        at ../src/driver/driver_api.cc:278
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
        at ../src/ir/transform.cc:258
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:453
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/tir/ir/transform.cc:100
  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
        at ../include/tvm/runtime/packed_func.h:1750
  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at ../include/tvm/runtime/packed_func.h:1694
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/c_runtime_api.cc:534
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 1, 2, 64]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 4, 4]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,1042124
No: 12  GFLOPS: 0.00/42.35      result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
    func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
    func = build(s, args, target_host=task.target_host, runtime=runtime)
  File "/workspace/python/tvm/driver/build_module.py", line 228, in build
    input_mod = lower(inputs, args, name=name, binds=binds)
  File "/workspace/python/tvm/driver/build_module.py", line 134, in lower
    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  24: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  22: Call
        at ../include/tvm/runtime/packed_func.h:1213
  21: operator()
        at ../include/tvm/runtime/packed_func.h:1731
  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
        at ../include/tvm/runtime/packed_func.h:1671
  19: run<>
        at ../include/tvm/runtime/packed_func.h:1631
  18: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1646
  13: operator()
        at ../src/driver/driver_api.cc:391
  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
        at ../src/driver/driver_api.cc:378
  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
        at ../src/driver/driver_api.cc:278
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
        at ../src/ir/transform.cc:258
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:453
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/tir/ir/transform.cc:100
  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
        at ../include/tvm/runtime/packed_func.h:1750
  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at ../include/tvm/runtime/packed_func.h:1694
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/c_runtime_api.cc:534
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 32, 1, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 32, 16]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,10013405
No: 13  GFLOPS: 0.00/42.35      result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
    func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
    func = build(s, args, target_host=task.target_host, runtime=runtime)
  File "/workspace/python/tvm/driver/build_module.py", line 228, in build
    input_mod = lower(inputs, args, name=name, binds=binds)
  File "/workspace/python/tvm/driver/build_module.py", line 134, in lower
    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  24: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  22: Call
        at ../include/tvm/runtime/packed_func.h:1213
  21: operator()
        at ../include/tvm/runtime/packed_func.h:1731
  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
        at ../include/tvm/runtime/packed_func.h:1671
  19: run<>
        at ../include/tvm/runtime/packed_func.h:1631
  18: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1646
  13: operator()
        at ../src/driver/driver_api.cc:391
  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
        at ../src/driver/driver_api.cc:378
  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
        at ../src/driver/driver_api.cc:278
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
        at ../src/ir/transform.cc:258
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:453
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/tir/ir/transform.cc:100
  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
        at ../include/tvm/runtime/packed_func.h:1750
  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at ../include/tvm/runtime/packed_func.h:1694
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/c_runtime_api.cc:534
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 8, 8, 2]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 32]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6732082
No: 14  GFLOPS: 0.00/42.35      result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
    func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
    func = build(s, args, target_host=task.target_host, runtime=runtime)
  File "/workspace/python/tvm/driver/build_module.py", line 228, in build
    input_mod = lower(inputs, args, name=name, binds=binds)
  File "/workspace/python/tvm/driver/build_module.py", line 134, in lower
    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  24: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  22: Call
        at ../include/tvm/runtime/packed_func.h:1213
  21: operator()
        at ../include/tvm/runtime/packed_func.h:1731
  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
        at ../include/tvm/runtime/packed_func.h:1671
  19: run<>
        at ../include/tvm/runtime/packed_func.h:1631
  18: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1646
  13: operator()
        at ../src/driver/driver_api.cc:391
  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
        at ../src/driver/driver_api.cc:378
  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
        at ../src/driver/driver_api.cc:278
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
        at ../src/ir/transform.cc:258
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:453
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/tir/ir/transform.cc:100
  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
        at ../include/tvm/runtime/packed_func.h:1750
  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at ../include/tvm/runtime/packed_func.h:1694
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/c_runtime_api.cc:534
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel

Traceback (most recent call last):
  24: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  22: Call
        at ../include/tvm/runtime/packed_func.h:1213
  21: operator()
        at ../include/tvm/runtime/packed_func.h:1731
  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
        at ../include/tvm/runtime/packed_func.h:1671
  19: run<>
        at ../include/tvm/runtime/packed_func.h:1631
  18: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1646
  13: operator()
        at ../src/driver/driver_api.cc:391
  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
        at ../src/driver/driver_api.cc:378
  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
        at ../src/driver/driver_api.cc:278
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
        at ../src/ir/transform.cc:258
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:453
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/tir/ir/transform.cc:100
  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
        at ../include/tvm/runtime/packed_func.h:1750
  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at ../include/tvm/runtime/packed_func.h:1694
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/c_runtime_api.cc:534
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 2, 4, 32]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 4, 128]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)],None,7536735
No: 15  GFLOPS: 0.00/42.35      result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
    func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
    func = build(s, args, target_host=task.target_host, runtime=runtime)
  File "/workspace/python/tvm/driver/build_module.py", line 228, in build
    input_mod = lower(inputs, args, name=name, binds=binds)
  File "/workspace/python/tvm/driver/build_module.py", line 134, in lower
    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  24: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  22: Call
        at ../include/tvm/runtime/packed_func.h:1213
  21: operator()
        at ../include/tvm/runtime/packed_func.h:1731
  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
        at ../include/tvm/runtime/packed_func.h:1671
  19: run<>
        at ../include/tvm/runtime/packed_func.h:1631
  18: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1646
  13: operator()
        at ../src/driver/driver_api.cc:391
  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
        at ../src/driver/driver_api.cc:378
  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
        at ../src/driver/driver_api.cc:278
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
        at ../src/ir/transform.cc:258
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:453
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/tir/ir/transform.cc:100
  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
        at ../include/tvm/runtime/packed_func.h:1750
  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at ../include/tvm/runtime/packed_func.h:1694
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/c_runtime_api.cc:534
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 2, 1, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 128, 4]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,482121
No: 16  GFLOPS: 0.00/42.35      result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
    func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
    func = build(s, args, target_host=task.target_host, runtime=runtime)
  File "/workspace/python/tvm/driver/build_module.py", line 228, in build
    input_mod = lower(inputs, args, name=name, binds=binds)
  File "/workspace/python/tvm/driver/build_module.py", line 134, in lower
    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  24: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  22: Call
        at ../include/tvm/runtime/packed_func.h:1213
  21: operator()
        at ../include/tvm/runtime/packed_func.h:1731
  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
        at ../include/tvm/runtime/packed_func.h:1671
  19: run<>
        at ../include/tvm/runtime/packed_func.h:1631
  18: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1646
  13: operator()
        at ../src/driver/driver_api.cc:391
  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
        at ../src/driver/driver_api.cc:378
  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
        at ../src/driver/driver_api.cc:278
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
        at ../src/ir/transform.cc:258
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:453
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/tir/ir/transform.cc:100
  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
        at ../include/tvm/runtime/packed_func.h:1750
  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at ../include/tvm/runtime/packed_func.h:1694
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/c_runtime_api.cc:534
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 2, 1, 16]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 32, 8]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2824525
No: 17  GFLOPS: 0.00/42.35      result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
    func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
    func = build(s, args, target_host=task.target_host, runtime=runtime)
  File "/workspace/python/tvm/driver/build_module.py", line 228, in build
    input_mod = lower(inputs, args, name=name, binds=binds)
  File "/workspace/python/tvm/driver/build_module.py", line 134, in lower
    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  24: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  22: Call
        at ../include/tvm/runtime/packed_func.h:1213
  21: operator()
        at ../include/tvm/runtime/packed_func.h:1731
  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
        at ../include/tvm/runtime/packed_func.h:1671
  19: run<>
        at ../include/tvm/runtime/packed_func.h:1631
  18: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1646
  13: operator()
        at ../src/driver/driver_api.cc:391
  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
        at ../src/driver/driver_api.cc:378
  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
        at ../src/driver/driver_api.cc:278
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
        at ../src/ir/transform.cc:258
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:453
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/tir/ir/transform.cc:100
  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
        at ../include/tvm/runtime/packed_func.h:1750
  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at ../include/tvm/runtime/packed_func.h:1694
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/c_runtime_api.cc:534
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 64, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 8, 8]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4559286
No: 18  GFLOPS: 0.00/42.35      result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
    func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
    func = build(s, args, target_host=task.target_host, runtime=runtime)
  File "/workspace/python/tvm/driver/build_module.py", line 228, in build
    input_mod = lower(inputs, args, name=name, binds=binds)
  File "/workspace/python/tvm/driver/build_module.py", line 134, in lower
    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  24: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  22: Call
        at ../include/tvm/runtime/packed_func.h:1213
  21: operator()
        at ../include/tvm/runtime/packed_func.h:1731
  20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
        at ../include/tvm/runtime/packed_func.h:1671
  19: run<>
        at ../include/tvm/runtime/packed_func.h:1631
  18: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1631
  14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at ../include/tvm/runtime/packed_func.h:1646
  13: operator()
        at ../src/driver/driver_api.cc:391
  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
        at ../src/driver/driver_api.cc:378
  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
        at ../src/driver/driver_api.cc:278
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
        at ../src/ir/transform.cc:258
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:453
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/ir/transform.cc:274
  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at ../src/tir/ir/transform.cc:100
  5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
        at ../include/tvm/runtime/packed_func.h:1750
  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at ../include/tvm/runtime/packed_func.h:1694
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/c_runtime_api.cc:534
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 1, 32, 16]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 1, 512]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9677544
No: 19  GFLOPS: 0.00/42.35      result: Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 738, in __call__
    yield remote, remote.load_module(os.path.split(build_result.filename)[1])
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 702, in run_through_rpc
    costs = time_f(*args).results
  File "/workspace/python/tvm/runtime/module.py", line 351, in evaluator
    blob = feval(*args)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 262, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 251, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  4: TVMFuncCall
        at ../src/runtime/c_runtime_api.cc:477
  3: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  2: tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../src/runtime/rpc/rpc_module.cc:129
  1: tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
        at ../src/runtime/rpc/rpc_endpoint.cc:1009
  0: tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)
        at ../src/runtime/rpc/rpc_endpoint.cc:801
  File "../src/runtime/rpc/rpc_endpoint.cc", line 801
TVMError:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (code == RPCCode::kReturn) is false: code=kShutdown

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 702, in run_through_rpc
    costs = time_f(*args).results
  File "/usr/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 742, in __call__
    remote.remove(build_result.filename)
  File "/workspace/python/tvm/rpc/client.py", line 143, in remove
    self._remote_funcs["remove"] = self.get_function("tvm.rpc.server.remove")
  File "/workspace/python/tvm/rpc/client.py", line 71, in get_function
    return self._sess.get_function(name)
  File "/workspace/python/tvm/runtime/module.py", line 171, in get_function
    self.handle, c_str(name), ctypes.c_int(query_imports), ctypes.byref(ret_handle)
  File "/workspace/python/tvm/_ffi/base.py", line 348, in check_call
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  52: 0xffffffffffffffff
  51: _start
  50: __libc_start_main
  49: _Py_UnixMain
  48: 0x0000000000650da0
  47: 0x0000000000650afa
  46: _PyFunction_FastCallDict
  45: _PyEval_EvalCodeWithName
  44: _PyEval_EvalFrameDefault
  43: _PyFunction_FastCallKeywords
  42: _PyEval_EvalCodeWithName
  41: _PyEval_EvalFrameDefault
  40: _PyMethodDef_RawFastCallKeywords
  39: 0x0000000000546369
  38: _PyEval_EvalCodeWithName
  37: _PyEval_EvalFrameDefault
  36: _PyFunction_FastCallKeywords
  35: _PyEval_EvalCodeWithName
  34: _PyEval_EvalFrameDefault
  33: _PyFunction_FastCallDict
  32: _PyEval_EvalCodeWithName
  31: _PyEval_EvalFrameDefault
  30: _PyObject_FastCallDict
  29: 0x00000000004c06e1
  28: _PyFunction_FastCallDict
  27: _PyEval_EvalFrameDefault
  26: _PyMethodDescr_FastCallKeywords
  25: 0x00000000005dcb58
  24: 0x00000000005dc83f
  23: 0x00000000004ba127
  22: _PyEval_EvalFrameDefault
  21: _PyFunction_FastCallKeywords
  20: _PyEval_EvalFrameDefault
  19: _PyFunction_FastCallKeywords
  18: _PyEval_EvalFrameDefault
  17: _PyFunction_FastCallKeywords
  16: _PyEval_EvalCodeWithName
  15: _PyEval_EvalFrameDefault
  14: 0x0000000000537c30
  13: _PyObject_FastCallKeywords
  12: 0x00007f517fe40fa2
  11: _ctypes_callproc
  10: ffi_call
  9: ffi_call_unix64
  8: TVMModGetFunction
        at ../src/runtime/c_runtime_api.cc:408
  7: tvm::runtime::ModuleNode::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
        at ../src/runtime/module.cc:66
  6: tvm::runtime::RPCModuleNode::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)
        at ../src/runtime/rpc/rpc_module.cc:181
  5: tvm::runtime::RPCClientSession::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
        at ../src/runtime/rpc/rpc_endpoint.cc:1004
  4: tvm::runtime::TVMRetValue tvm::runtime::RPCEndpoint::SysCallRemote<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(tvm::runtime::RPCCode, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
        at ../src/runtime/rpc/rpc_endpoint.h:211
  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(int&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const
        at ../include/tvm/runtime/packed_func.h:1618
  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at ../include/tvm/runtime/packed_func.h:1217
  1: Call
        at ../include/tvm/runtime/packed_func.h:1213
  0: operator()
        at ../src/runtime/rpc/rpc_endpoint.cc:681
  File "../src/runtime/rpc/rpc_endpoint.cc", line 681
TVMError:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (code == RPCCode::kReturn) is false: code=1      [('tile_f', [-1, 8, 2, 16]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 1, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6390073
No: 20  GFLOPS: 144.22/144.22   result: MeasureResult(costs=(0.00160524616,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.4239659309387207, timestamp=1658865481.9063) [('tile_f', [-1, 1, 4, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9881539
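Each progress line printed by the tuner has the form `No: <trial>  GFLOPS: <this trial>/<best so far>  result: ...  <config>`. Trials that fail, such as those rejected above with `InstantiationError` or lost to an RPC shutdown, report 0.00 for the current trial. The following minimal sketch, which assumes exactly this line format, extracts the trial number and both GFLOPS values from captured output, for example to plot tuning progress. `parse_progress` and `Progress` are hypothetical helpers, not part of autotvm.

```python
import re
from typing import List, NamedTuple

# Matches the "No: <trial>  GFLOPS: <current>/<best>" prefix of a progress
# line; the fields may be separated by spaces or tabs.
PROGRESS_RE = re.compile(r"^No:\s*(\d+)\s+GFLOPS:\s*([\d.]+)/([\d.]+)")


class Progress(NamedTuple):
    trial: int
    gflops: float  # GFLOPS of this trial (0.00 when the trial failed)
    best_gflops: float  # best GFLOPS seen so far, including this trial


def parse_progress(lines: List[str]) -> List[Progress]:
    """Collect progress records, skipping tracebacks and all other lines."""
    records = []
    for line in lines:
        m = PROGRESS_RE.match(line)
        if m:
            records.append(
                Progress(int(m.group(1)), float(m.group(2)), float(m.group(3)))
            )
    return records


sample = [
    "No: 18  GFLOPS: 0.00/42.35      result: Traceback (most recent call last):",
    '  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__',
    "No: 20\tGFLOPS: 144.22/144.22\tresult: MeasureResult(costs=(0.00160524616,), ...)",
]
records = parse_progress(sample)
print([r.trial for r in records if r.gflops == 0.0])  # [18]
print(max(r.best_gflops for r in records))  # 144.22
```

In practice you would pass in the lines of the captured tuning output rather than this hand-copied sample.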

Finally, we can inspect the best config from the log file, check correctness, and measure the running time.

# inspect the best config
dispatch_context = autotvm.apply_history_best("conv2d.log")
best_config = dispatch_context.query(task.target, task.workload)
print("\nBest config:")
print(best_config)

# apply history best from log file
with autotvm.apply_history_best("conv2d.log"):
    with tvm.target.Target("cuda"):
        s, arg_bufs = conv2d_no_batching(N, H, W, CO, CI, KH, KW, strides, padding)
        func = tvm.build(s, arg_bufs)

# check correctness
a_np = np.random.uniform(size=(N, CI, H, W)).astype(np.float32)
w_np = np.random.uniform(size=(CO, CI, KH, KW)).astype(np.float32)
c_np = conv2d_nchw_python(a_np, w_np, strides, padding)

dev = tvm.cuda()
a_tvm = tvm.nd.array(a_np, device=dev)
w_tvm = tvm.nd.array(w_np, device=dev)
c_tvm = tvm.nd.empty(c_np.shape, device=dev)
func(a_tvm, w_tvm, c_tvm)

tvm.testing.assert_allclose(c_np, c_tvm.numpy(), rtol=1e-2)

# Evaluate the running time. We use a large number of runs per measurement
# (number=400) to reduce noise and amortize kernel-launch overhead. You can also
# use nvprof to validate the result. The reported mean is in seconds.
evaluator = func.time_evaluator(func.entry_name, dev, number=400)
print("Time cost of this operator: %f" % evaluator(a_tvm, w_tvm, c_tvm).mean)
Finish loading 20 records

Best config:
[('tile_f', [-1, 1, 4, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9881539
Finish loading 20 records
Time cost of this operator: 0.002014
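In the best config, each `tile_*` entry lists the factors of one loop split, and the leading `-1` means that factor is inferred from the loop extent. The sketch below recovers the inferred factor and the resulting CUDA launch shape. It assumes that the conv2d template from Step 1 splits each output axis four ways as (blockIdx, vthread, threadIdx, inner). It also assumes the workload has CO = 512 output channels and a 7x7 output. `infer_split` and `launch_shape` are hypothetical helpers. Inside TVM you should be able to read the same lists from `best_config["tile_f"].size`.

```python
from math import prod
from typing import Dict, List, Tuple


def infer_split(extent: int, factors: List[int]) -> List[int]:
    """Replace the single -1 in `factors` so that their product equals `extent`."""
    if factors.count(-1) != 1:
        raise ValueError("expected exactly one -1 placeholder")
    known = prod(f for f in factors if f != -1)
    if extent % known != 0:
        raise ValueError(f"{factors} does not divide extent {extent}")
    return [extent // known if f == -1 else f for f in factors]


def launch_shape(config: Dict[str, List[int]], extents: Dict[str, int]) -> Tuple[int, int]:
    """Return (thread blocks, threads per block).

    Assumes every split is (blockIdx, vthread, threadIdx, inner), as in the
    conv2d template of Step 1.
    """
    blocks, threads = 1, 1
    for name, extent in extents.items():
        block, _vthread, thread, _inner = infer_split(extent, config[name])
        blocks *= block
        threads *= thread
    return blocks, threads


# Output-axis splits of the best config printed above.
best = {"tile_f": [-1, 1, 4, 1], "tile_y": [-1, 1, 1, 1], "tile_x": [-1, 7, 1, 1]}
# Assumed workload: CO = 512 output channels and a 7x7 output feature map.
extents = {"tile_f": 512, "tile_y": 7, "tile_x": 7}
print(launch_shape(best, extents))  # (896, 4)
```

Under these assumptions, `tile_f = [-1, 1, 4, 1]` expands to `[128, 1, 4, 1]`. The kernel therefore launches 128 * 7 * 1 = 896 blocks of 4 * 1 * 1 = 4 threads.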

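To compare the measured time with the GFLOPS figures printed during tuning, divide the operator's floating-point operation count by the time. In a direct convolution each output element needs CI * KH * KW multiply-adds, which gives 2 * N * CO * OH * OW * CI * KH * KW FLOPs. The sketch assumes the workload used earlier in this tutorial (N = 1, 7x7 input, CI = CO = 512, 3x3 kernel, stride 1, padding 1), so substitute your own values. autotvm's internal FLOP counter may differ slightly from this closed-form count, so expect the results to be close but not identical.

```python
def conv2d_flops(N, H, W, CO, CI, KH, KW, stride=(1, 1), padding=(1, 1)):
    """FLOPs of a direct NCHW conv2d, counting one multiply-add as 2 FLOPs."""
    OH = (H + 2 * padding[0] - KH) // stride[0] + 1
    OW = (W + 2 * padding[1] - KW) // stride[1] + 1
    return 2 * N * CO * OH * OW * CI * KH * KW


def gflops(flops, seconds):
    """Throughput in GFLOPS for `flops` operations taking `seconds`."""
    return flops / seconds / 1e9


flops = conv2d_flops(1, 7, 7, 512, 512, 3, 3)
print(flops)  # 231211008
# 0.002014 s is the mean time reported above by time_evaluator.
print("%.2f GFLOPS" % gflops(flops, 0.002014))  # 114.80 GFLOPS
```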