tvm.relay.vision

Vision network related operators.

Functions

get_valid_counts(data, score_threshold[, …]) – Get valid count of bounding boxes given a score threshold.

multibox_prior(data[, sizes, ratios, steps, …]) – Generate prior (anchor) boxes from data, sizes and ratios.

multibox_transform_loc(cls_prob, loc_pred, …) – Location transformation for multibox detection.

non_max_suppression(data, valid_count[, …]) – Non-maximum suppression operator for object detection.

proposal(cls_prob, bbox_pred, im_info, …) – Proposal operator.

roi_align(data, rois, pooled_size, spatial_scale) – ROI align operator.

roi_pool(data, rois, pooled_size, spatial_scale) – ROI pool operator.

yolo_reorg(data, stride) – YOLO reorg operation used in Darknet models.

tvm.relay.vision.get_valid_counts(data, score_threshold, id_index=0, score_index=1)

Get valid count of bounding boxes given a score threshold. Also moves valid boxes to the top of input data.

Parameters
  • data (relay.Expr) – Input data. 3-D tensor with shape [batch_size, num_anchors, 6].

  • score_threshold (float, optional) – Lower limit of score for valid bounding boxes.

  • id_index (int, optional) – Index of the class categories; -1 to disable.

  • score_index (int, optional) – Index of the scores/confidence of boxes.

Returns

  • valid_count (relay.Expr) – 1-D tensor for valid number of boxes.

  • out_tensor (relay.Expr) – Rearranged data tensor.
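
A minimal usage sketch, assuming the two-element return documented above; the input shape and threshold below are illustrative:

import tvm
from tvm import relay

# Hypothetical detection output: [batch_size, num_anchors, 6].
data = relay.var("data", shape=(1, 2000, 6), dtype="float32")
result = relay.vision.get_valid_counts(data, score_threshold=0.0)
valid_count = result[0]  # 1-D tensor of valid box counts per batch
out_tensor = result[1]   # boxes rearranged so valid ones come first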

tvm.relay.vision.multibox_prior(data, sizes=(1.0,), ratios=(1.0,), steps=(-1.0, -1.0), offsets=(0.5, 0.5), clip=False)

Generate prior(anchor) boxes from data, sizes and ratios.

Parameters
  • data (relay.Expr) – The input data tensor.

  • sizes (tuple of float, optional) – Tuple of sizes for anchor boxes.

  • ratios (tuple of float, optional) – Tuple of ratios for anchor boxes.

  • steps (tuple of float, optional) – Priorbox step across y and x, -1 for auto calculation.

  • offsets (tuple of float, optional) – Priorbox center offsets, y and x respectively.

  • clip (boolean, optional) – Whether to clip out-of-boundary boxes.

Returns

out – 3-D tensor with shape [1, h_in * w_in * (num_sizes + num_ratios - 1), 4]

Return type

relay.Expr
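
A minimal usage sketch; the feature-map shape and anchor settings below are illustrative assumptions:

import tvm
from tvm import relay

# Hypothetical NCHW feature map with h_in = w_in = 16.
data = relay.var("data", shape=(1, 32, 16, 16), dtype="float32")
anchors = relay.vision.multibox_prior(
    data,
    sizes=(0.5, 0.75),       # num_sizes = 2
    ratios=(1.0, 2.0, 0.5),  # num_ratios = 3
    clip=True,
)
# anchors: [1, 16 * 16 * (2 + 3 - 1), 4]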

tvm.relay.vision.multibox_transform_loc(cls_prob, loc_pred, anchor, clip=True, threshold=0.01, variances=(0.1, 0.1, 0.2, 0.2))

Location transformation for multibox detection

Parameters
  • cls_prob (tvm.relay.Expr) – Class probabilities.

  • loc_pred (tvm.relay.Expr) – Location regression predictions.

  • anchor (tvm.relay.Expr) – Prior anchor boxes.

  • clip (boolean, optional) – Whether to clip out-of-boundary boxes.

  • threshold (double, optional) – Threshold to be a positive prediction.

  • variances (tuple of float, optional) – Variances to be decoded from box regression output.

Returns

ret – Tuple of the transformed detection tensor and the valid count tensor.

Return type

tuple of tvm.relay.Expr
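
A minimal SSD-style sketch; the class count, anchor count, and input shapes below are assumptions for illustration:

import tvm
from tvm import relay

num_classes, num_anchors = 21, 2000
cls_prob = relay.var("cls_prob", shape=(1, num_classes, num_anchors), dtype="float32")
loc_pred = relay.var("loc_pred", shape=(1, num_anchors * 4), dtype="float32")
anchor = relay.var("anchor", shape=(1, num_anchors, 4), dtype="float32")

ret = relay.vision.multibox_transform_loc(cls_prob, loc_pred, anchor,
                                          clip=True, threshold=0.01)
# ret[0]: transformed detections, ret[1]: valid count per batch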

tvm.relay.vision.non_max_suppression(data, valid_count, max_output_size=-1, iou_threshold=0.5, force_suppress=False, top_k=-1, coord_start=2, score_index=1, id_index=0, return_indices=True, invalid_to_bottom=False)

Non-maximum suppression operator for object detection.

Parameters
  • data (relay.Expr) – 3-D tensor with shape [batch_size, num_anchors, 6]. The last dimension should be in format of [class_id, score, box_left, box_top, box_right, box_bottom].

  • valid_count (relay.Expr) – 1-D tensor for valid number of boxes.

  • max_output_size (int, optional) – Max number of output valid boxes for each instance. By default all valid boxes are returned.

  • iou_threshold (float, optional) – Non-maximum suppression threshold.

  • force_suppress (bool, optional) – Suppress all detections regardless of class_id.

  • top_k (int, optional) – Keep maximum top k detections before nms, -1 for no limit.

  • coord_start (int, optional) – The starting index of the consecutive 4 coordinates.

  • score_index (int, optional) – Index of the scores/confidence of boxes.

  • id_index (int, optional) – Index of the class categories; -1 to disable.

  • return_indices (bool, optional) – Whether to return box indices in input data.

  • invalid_to_bottom (bool, optional) – Whether to move all invalid bounding boxes to the bottom of the output.

Returns

out – 3-D tensor with shape [batch_size, num_anchors, 6].

Return type

relay.Expr
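
A minimal sketch that chains get_valid_counts with NMS; the input shape and thresholds are illustrative:

import tvm
from tvm import relay

data = relay.var("data", shape=(1, 2000, 6), dtype="float32")
cv = relay.vision.get_valid_counts(data, score_threshold=0.0)
out = relay.vision.non_max_suppression(
    cv[1],                 # boxes rearranged by get_valid_counts
    cv[0],                 # valid counts
    iou_threshold=0.5,
    force_suppress=False,
    top_k=-1,
    return_indices=False,  # return filtered boxes rather than indices
)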

tvm.relay.vision.proposal(cls_prob, bbox_pred, im_info, scales, ratios, feature_stride, threshold, rpn_pre_nms_top_n, rpn_post_nms_top_n, rpn_min_size, iou_loss)

Proposal operator.

Parameters
  • cls_prob (relay.Expr) – 4-D tensor with shape [batch, 2 * num_anchors, height, width].

  • bbox_pred (relay.Expr) – 4-D tensor with shape [batch, 4 * num_anchors, height, width].

  • im_info (relay.Expr) – 2-D tensor with shape [batch, 3]. The last dimension should be in format of [im_height, im_width, im_scale]

  • scales (list/tuple of float) – Scales of anchor windows.

  • ratios (list/tuple of float) – Ratios of anchor windows.

  • feature_stride (int) – The size of the receptive field of each unit in the convolutional layer of the RPN, e.g. the product of all strides prior to this layer.

  • threshold (float) – Non-maximum suppression threshold.

  • rpn_pre_nms_top_n (int) – Number of top-scoring boxes to keep before applying NMS; -1 to use all boxes.

  • rpn_post_nms_top_n (int) – Number of top scoring boxes to keep after applying NMS to RPN proposals.

  • rpn_min_size (int) – Minimum height or width in proposal.

  • iou_loss (bool) – Usage of IoU loss.

Returns

output – 2-D tensor with shape [batch * rpn_post_nms_top_n, 5]. The last dimension is in format of [batch_index, w_start, h_start, w_end, h_end].

Return type

relay.Expr
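
A minimal Faster R-CNN-style sketch; the anchor configuration and feature-map size below are assumptions for illustration:

import tvm
from tvm import relay

num_anchors = 9  # 3 scales x 3 ratios
cls_prob = relay.var("cls_prob", shape=(1, 2 * num_anchors, 14, 14), dtype="float32")
bbox_pred = relay.var("bbox_pred", shape=(1, 4 * num_anchors, 14, 14), dtype="float32")
im_info = relay.var("im_info", shape=(1, 3), dtype="float32")

rois = relay.vision.proposal(
    cls_prob, bbox_pred, im_info,
    scales=(8.0, 16.0, 32.0),
    ratios=(0.5, 1.0, 2.0),
    feature_stride=16,
    threshold=0.7,
    rpn_pre_nms_top_n=6000,
    rpn_post_nms_top_n=300,
    rpn_min_size=16,
    iou_loss=False,
)
# rois: [1 * 300, 5] in [batch_index, w_start, h_start, w_end, h_end] format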

tvm.relay.vision.roi_align(data, rois, pooled_size, spatial_scale, sample_ratio=-1, layout='NCHW')

ROI align operator.

Parameters
  • data (relay.Expr) – 4-D tensor with shape [batch, channel, height, width]

  • rois (relay.Expr) – 2-D tensor with shape [num_roi, 5]. The last dimension should be in format of [batch_index, w_start, h_start, w_end, h_end]

  • pooled_size (list/tuple of two ints) – output size

  • spatial_scale (float) – Ratio of input feature map height (or width) to raw image height (or width). Equals the reciprocal of total stride in convolutional layers, which should be in range (0.0, 1.0].

  • sample_ratio (int) – Optional sampling ratio of ROI align, using adaptive size by default.

Returns

output – 4-D tensor with shape [num_roi, channel, pooled_size, pooled_size]

Return type

relay.Expr
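
A minimal usage sketch; the channel count, ROI count, and stride below are illustrative assumptions:

import tvm
from tvm import relay

data = relay.var("data", shape=(1, 256, 14, 14), dtype="float32")
rois = relay.var("rois", shape=(100, 5), dtype="float32")
out = relay.vision.roi_align(data, rois, pooled_size=(7, 7),
                             spatial_scale=1.0 / 16.0,  # feature map is 1/16 of the raw image
                             sample_ratio=2)
# out: [100, 256, 7, 7]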

tvm.relay.vision.roi_pool(data, rois, pooled_size, spatial_scale, layout='NCHW')

ROI pool operator.

Parameters
  • data (relay.Expr) – 4-D tensor with shape [batch, channel, height, width]

  • rois (relay.Expr) – 2-D tensor with shape [num_roi, 5]. The last dimension should be in format of [batch_index, w_start, h_start, w_end, h_end]

  • pooled_size (list/tuple of two ints) – output size

  • spatial_scale (float) – Ratio of input feature map height (or width) to raw image height (or width). Equals the reciprocal of total stride in convolutional layers, which should be in range (0.0, 1.0].

Returns

output – 4-D tensor with shape [num_roi, channel, pooled_size, pooled_size]

Return type

relay.Expr
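
A minimal usage sketch, analogous to roi_align above; shapes are illustrative assumptions:

import tvm
from tvm import relay

data = relay.var("data", shape=(1, 256, 14, 14), dtype="float32")
rois = relay.var("rois", shape=(100, 5), dtype="float32")
out = relay.vision.roi_pool(data, rois, pooled_size=(7, 7),
                            spatial_scale=1.0 / 16.0)
# out: [100, 256, 7, 7]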

tvm.relay.vision.yolo_reorg(data, stride)

YOLO reorg operation used in Darknet models. This layer shuffles the input tensor values based on the stride value and also transforms the shape: if (n, c, h, w) is the data shape and s is the stride, the output shape is (n, c*s*s, h/s, w/s).

Example:

data(1, 4, 2, 2) = [[[[ 0  1] [ 2  3]]
                    [[ 4  5] [ 6  7]]
                    [[ 8  9] [10 11]]
                    [[12 13] [14 15]]]]
stride = 2
ret(1, 16, 1, 1) = [[[[ 0]]  [[ 2]]  [[ 8]]  [[10]]
                    [[ 1]]  [[ 3]]  [[ 9]]  [[11]]
                    [[ 4]]  [[ 6]]  [[12]]  [[14]]
                    [[ 5]]  [[ 7]]  [[13]]  [[15]]]]

Note

stride=1 has no effect on the reorg operation.

Parameters
  • data (relay.Expr) – The input data tensor.

  • stride (int) – The stride value for reorganisation.

Returns

ret – The computed result.

Return type

relay.Expr
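
A minimal sketch matching the worked example above:

import tvm
from tvm import relay

data = relay.var("data", shape=(1, 4, 2, 2), dtype="float32")
out = relay.vision.yolo_reorg(data, stride=2)
# out shape: (1, 4 * 2 * 2, 2 / 2, 2 / 2) = (1, 16, 1, 1)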