tvm.relay.vision
Vision network related operators.
Functions:
all_class_non_max_suppression | Non-maximum suppression operator for object detection, corresponding to ONNX NonMaxSuppression and TensorFlow combined_non_max_suppression. |
get_valid_counts | Get valid count of bounding boxes given a score threshold. |
multibox_prior | Generate prior (anchor) boxes from data, sizes and ratios. |
multibox_transform_loc | Location transformation for multibox detection. |
non_max_suppression | Non-maximum suppression operator for object detection. |
proposal | Proposal operator. |
roi_align | ROI align operator. |
roi_pool | ROI pool operator. |
yolo_reorg | Yolo reorg operation used in darknet models. |
- tvm.relay.vision.all_class_non_max_suppression(boxes, scores, max_output_boxes_per_class=-1, iou_threshold=-1.0, score_threshold=-1.0, output_format='onnx')
Non-maximum suppression operator for object detection, corresponding to ONNX NonMaxSuppression and TensorFlow combined_non_max_suppression. NMS is performed for each class separately.
- Parameters
boxes (relay.Expr) – 3-D tensor with shape (batch_size, num_boxes, 4)
scores (relay.Expr) – 3-D tensor with shape (batch_size, num_classes, num_boxes)
max_output_boxes_per_class (int or relay.Expr, optional) – The maximum number of selected output boxes per class.
iou_threshold (float or relay.Expr, optional) – IoU test threshold.
score_threshold (float or relay.Expr, optional) – Score threshold used to filter out low-scoring boxes early.
output_format (string, optional) – “onnx” or “tensorflow”. Specifies which frontend the outputs are intended to be consumed by.
- Returns
out – If output_format is “onnx”, the output is a relay.Tuple of two tensors: indices of size (batch_size * num_classes * num_boxes, 3), and num_total_detection, a tensor of shape (1,) holding the total number of selected boxes. The three values in each row of indices encode the batch, class, and box indices. Rows are ordered so that selected boxes from batch 0, class 0 come first, in descending order of score, followed by boxes from batch 0, class 1, and so on. Of the batch_size * num_classes * num_boxes rows of indices, only the first num_total_detection rows are valid.
If output_format is “tensorflow”, the output is a relay.Tuple of three tensors: indices of size (batch_size, num_classes * num_boxes, 2), scores of size (batch_size, num_classes * num_boxes), and num_total_detection of size (batch_size,) holding the total number of selected boxes per batch. The two values in each entry of indices encode the class and box indices. Of the num_classes * num_boxes entries in indices at batch b, only the first num_total_detection[b] are valid. The second axis of indices and scores is sorted by box score within each class, but not across classes, so the box indices and scores for class 0 come first in sorted order, followed by those for class 1, and so on.
- Return type
relay.Tuple
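The “onnx” output layout described above can be illustrated with a small pure-Python sketch that keeps only the valid rows and groups the selected box indices by (batch, class). The helper name `group_onnx_indices` is hypothetical, the inputs are plain lists standing in for the tensors after execution, and the zero-filled padding rows are an assumption (the contents of invalid rows are not specified above).

```python
def group_onnx_indices(indices, num_total_detection):
    """Group selected box indices by (batch, class).

    `indices` is a list of [batch, class, box] rows, padded up to
    batch_size * num_classes * num_boxes rows. Only the first
    `num_total_detection` rows are meaningful.
    """
    selected = {}
    for batch, cls, box in indices[:num_total_detection]:
        selected.setdefault((batch, cls), []).append(box)
    return selected


# Hypothetical result for batch_size=1, num_classes=2, num_boxes=3.
# The last three rows are padding (assumed to be zeros here).
indices = [[0, 0, 2], [0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
grouped = group_onnx_indices(indices, 3)
print(grouped)
```

Because rows are already ordered by batch, then class, then descending score, each list in the result keeps the score order of the operator output.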
- tvm.relay.vision.get_valid_counts(data, score_threshold, id_index=0, score_index=1)
Get valid count of bounding boxes given a score threshold. Also moves valid boxes to the top of input data.
- Parameters
data (relay.Expr) – Input data. 3-D tensor with shape [batch_size, num_anchors, 6].
score_threshold (float) – Lower limit of score for valid bounding boxes.
id_index (int, optional) – Index of the class categories, -1 to disable.
score_index (int, optional) – Index of the scores/confidence of boxes.
- Returns
valid_count (relay.Expr) – 1-D tensor for valid number of boxes.
out_tensor (relay.Expr) – Rearranged data tensor.
out_indices (relay.Expr) – Indices of the valid boxes in the input data.
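A minimal pure-Python sketch of the behaviour described above, written as a reference for the semantics rather than as the operator's implementation. The function name `get_valid_counts_ref` is hypothetical. Two details are assumptions not stated above: a box counts as valid when its score is strictly greater than the threshold (and its class id is non-negative when id_index is enabled), and the trailing invalid slots are filled with -1.

```python
def get_valid_counts_ref(data, score_threshold, id_index=0, score_index=1):
    """Reference sketch: data is a nested list [batch][num_anchors][elem]."""
    valid_count, out_tensor, out_indices = [], [], []
    for boxes in data:
        width = len(boxes[0]) if boxes else 0
        # Keep (original index, row) pairs for boxes that pass the filter.
        kept = [
            (i, list(row))
            for i, row in enumerate(boxes)
            if row[score_index] > score_threshold
            and (id_index < 0 or row[id_index] >= 0)
        ]
        pad = len(boxes) - len(kept)
        valid_count.append(len(kept))
        # Valid boxes move to the top; the rest is padding (assumed -1).
        out_tensor.append([row for _, row in kept] + [[-1.0] * width] * pad)
        out_indices.append([i for i, _ in kept] + [-1] * pad)
    return valid_count, out_tensor, out_indices


data = [[
    [0, 0.9, 0.0, 0.0, 1.0, 1.0],
    [1, 0.1, 0.0, 0.0, 1.0, 1.0],
    [0, 0.5, 2.0, 2.0, 3.0, 3.0],
]]
counts, rearranged, idx = get_valid_counts_ref(data, score_threshold=0.3)
print(counts, idx)
```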
- tvm.relay.vision.multibox_prior(data, sizes=(1.0,), ratios=(1.0,), steps=(-1.0, -1.0), offsets=(0.5, 0.5), clip=False)
Generate prior (anchor) boxes from data, sizes and ratios.
- Parameters
data (relay.Expr) – The input data tensor.
sizes (tuple of float, optional) – Tuple of sizes for anchor boxes.
ratios (tuple of float, optional) – Tuple of ratios for anchor boxes.
steps (tuple of float, optional) – Priorbox step across y and x, -1 for auto calculation.
offsets (tuple of float, optional) – Priorbox center offsets, y and x respectively.
clip (boolean, optional) – Whether to clip out-of-boundary boxes.
- Returns
out – 3-D tensor with shape [1, h_in * w_in * (num_sizes + num_ratios - 1), 4]
- Return type
relay.Expr
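The output shape above implies num_sizes + num_ratios - 1 anchors per feature-map cell. The sketch below shows one way to arrive at that count; the pairing rule (every size with the first ratio, then the first size with each remaining ratio) follows the common SSD/MXNet convention and is an assumption here, as are the helper names.

```python
def anchor_pairs_per_cell(sizes=(1.0,), ratios=(1.0,)):
    """(size, ratio) pairs per cell under the assumed SSD-style pairing."""
    pairs = [(s, ratios[0]) for s in sizes]
    pairs += [(sizes[0], r) for r in ratios[1:]]
    return pairs


def multibox_prior_shape(h_in, w_in, sizes=(1.0,), ratios=(1.0,)):
    """Output shape [1, h_in * w_in * (num_sizes + num_ratios - 1), 4]."""
    per_cell = len(sizes) + len(ratios) - 1
    return (1, h_in * w_in * per_cell, 4)


pairs = anchor_pairs_per_cell(sizes=(0.2, 0.3), ratios=(1.0, 2.0, 0.5))
shape = multibox_prior_shape(38, 38, sizes=(0.2, 0.3), ratios=(1.0, 2.0, 0.5))
print(pairs, shape)
```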
- tvm.relay.vision.multibox_transform_loc(cls_prob, loc_pred, anchor, clip=True, threshold=0.01, variances=(0.1, 0.1, 0.2, 0.2))
Location transformation for multibox detection.
- Parameters
cls_prob (tvm.relay.Expr) – Class probabilities.
loc_pred (tvm.relay.Expr) – Location regression predictions.
anchor (tvm.relay.Expr) – Prior anchor boxes.
clip (boolean, optional) – Whether to clip out-of-boundary boxes.
threshold (double, optional) – Threshold to be a positive prediction.
variances (tuple of float, optional) – Variances used to decode the box regression output.
- Returns
ret
- Return type
tuple of tvm.relay.Expr
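To show how variances and clip typically enter the location transform, here is a sketch of the standard SSD box decoding for a single anchor. It assumes anchors in [left, top, right, bottom] form with normalized coordinates and predictions in (dx, dy, dw, dh) order; the function name `decode_box` and these conventions are assumptions, not taken from the text above.

```python
import math


def decode_box(anchor, pred, variances=(0.1, 0.1, 0.2, 0.2), clip=True):
    """Decode one regression prediction against one anchor (SSD-style)."""
    al, at, ar, ab = anchor
    aw, ah = ar - al, ab - at              # anchor width and height
    ax, ay = (al + ar) / 2, (at + ab) / 2  # anchor center
    px, py, pw, ph = pred
    vx, vy, vw, vh = variances
    ox = px * vx * aw + ax                 # decoded center x
    oy = py * vy * ah + ay                 # decoded center y
    ow = math.exp(pw * vw) * aw / 2        # decoded half width
    oh = math.exp(ph * vh) * ah / 2        # decoded half height
    box = [ox - ow, oy - oh, ox + ow, oy + oh]
    if clip:
        # Clip out-of-boundary coordinates to the unit square.
        box = [min(max(v, 0.0), 1.0) for v in box]
    return box


same = decode_box((0.2, 0.2, 0.6, 0.6), (0.0, 0.0, 0.0, 0.0))
shifted = decode_box((0.2, 0.2, 0.6, 0.6), (1.0, 0.0, 0.0, 0.0))
print(same, shifted)
```

With an all-zero prediction the decoded box equals the anchor, which is a quick sanity check for any decoding convention.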
- tvm.relay.vision.non_max_suppression(data, valid_count, indices, max_output_size=-1, iou_threshold=0.5, force_suppress=False, top_k=-1, coord_start=2, score_index=1, id_index=0, return_indices=True, invalid_to_bottom=False)
Non-maximum suppression operator for object detection.
- Parameters
data (relay.Expr) – 3-D tensor with shape [batch_size, num_anchors, 6] or [batch_size, num_anchors, 5]. The last dimension should be in format of [class_id, score, box_left, box_top, box_right, box_bottom] or [score, box_left, box_top, box_right, box_bottom]. It could be the second output out_tensor of get_valid_counts.
valid_count (relay.Expr) – 1-D tensor for valid number of boxes. It could be the output valid_count of get_valid_counts.
indices (relay.Expr) – 2-D tensor with shape [batch_size, num_anchors], represents the index of box in original data. It could be the third output out_indices of get_valid_counts. The values in the second dimension are like the output of arange(num_anchors) if get_valid_counts is not used before non_max_suppression.
max_output_size (int or relay.Expr, optional) – Max number of output valid boxes for each instance. Return all valid boxes if the value of max_output_size is less than 0.
iou_threshold (float or relay.Expr, optional) – Non-maximum suppression threshold.
force_suppress (bool, optional) – Suppress all detections regardless of class_id.
top_k (int, optional) – Keep maximum top k detections before nms, -1 for no limit.
coord_start (int, optional) – The starting index of the consecutive 4 coordinates.
score_index (int, optional) – Index of the scores/confidence of boxes.
id_index (int, optional) – Index of the class categories, -1 to disable.
return_indices (bool, optional) – Whether to return box indices in input data.
invalid_to_bottom (bool, optional) – Whether to move all valid bounding boxes to the top.
- Returns
out – If return_indices is False, a relay.Expr: a 3-D tensor with shape [batch_size, num_anchors, 6] or [batch_size, num_anchors, 5]. If return_indices is True, a relay.Tuple of two 2-D tensors, with shapes [batch_size, num_anchors] and [batch_size, num_valid_anchors] respectively.
- Return type
relay.Expr or relay.Tuple
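The interaction of iou_threshold, force_suppress, top_k, max_output_size and id_index can be sketched with a greedy pure-Python NMS over the rows of one batch item. This is an illustrative reference, not the operator's implementation: the name `nms_ref` is hypothetical, and treating an IoU equal to the threshold as suppressing is an assumption.

```python
def iou(a, b):
    """Intersection over union of two [left, top, right, bottom] boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_ref(rows, iou_threshold=0.5, force_suppress=False, top_k=-1,
            max_output_size=-1, coord_start=2, score_index=1, id_index=0):
    """Return indices of kept rows, highest score first."""
    order = sorted(range(len(rows)), key=lambda i: rows[i][score_index],
                   reverse=True)
    if top_k > 0:
        order = order[:top_k]  # keep only the top_k candidates before NMS
    keep = []
    for i in order:
        if 0 <= max_output_size <= len(keep):
            break
        box_i = rows[i][coord_start:coord_start + 4]
        suppressed = False
        for k in keep:
            # Boxes compete only within a class unless force_suppress is set
            # or class ids are disabled (id_index < 0).
            compete = (force_suppress or id_index < 0
                       or rows[i][id_index] == rows[k][id_index])
            box_k = rows[k][coord_start:coord_start + 4]
            if compete and iou(box_i, box_k) >= iou_threshold:
                suppressed = True
                break
        if not suppressed:
            keep.append(i)
    return keep


rows = [
    [0, 0.9, 0.0, 0.0, 10.0, 10.0],
    [0, 0.8, 1.0, 1.0, 11.0, 11.0],    # overlaps row 0, same class
    [1, 0.7, 1.0, 1.0, 11.0, 11.0],    # overlaps row 0, different class
    [0, 0.6, 20.0, 20.0, 30.0, 30.0],  # no overlap
]
print(nms_ref(rows), nms_ref(rows, force_suppress=True))
```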
- tvm.relay.vision.proposal(cls_prob, bbox_pred, im_info, scales, ratios, feature_stride, threshold, rpn_pre_nms_top_n, rpn_post_nms_top_n, rpn_min_size, iou_loss)
Proposal operator.
- Parameters
cls_prob (relay.Expr) – 4-D tensor with shape [batch, 2 * num_anchors, height, width].
bbox_pred (relay.Expr) – 4-D tensor with shape [batch, 4 * num_anchors, height, width].
im_info (relay.Expr) – 2-D tensor with shape [batch, 3]. The last dimension should be in format of [im_height, im_width, im_scale]
scales (list/tuple of float) – Scales of anchor windows.
ratios (list/tuple of float) – Ratios of anchor windows.
feature_stride (int) – The size of the receptive field of each unit in the convolution layer of the RPN, for example the product of all strides prior to this layer.
threshold (float) – Non-maximum suppression threshold.
rpn_pre_nms_top_n (int) – Number of top scoring boxes to apply NMS. -1 to use all boxes.
rpn_post_nms_top_n (int) – Number of top scoring boxes to keep after applying NMS to RPN proposals.
rpn_min_size (int) – Minimum height or width in proposal.
iou_loss (bool) – Usage of IoU loss.
- Returns
output – 2-D tensor with shape [batch * rpn_post_nms_top_n, 5]. The last dimension is in format of [batch_index, w_start, h_start, w_end, h_end].
- Return type
relay.Expr
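As a quick consistency check on the shapes listed above, the following sketch computes the expected input and output shapes for a given configuration. It assumes num_anchors equals len(scales) * len(ratios), which is not stated above; the helper name `proposal_shapes` is hypothetical.

```python
def proposal_shapes(batch, height, width, scales, ratios, rpn_post_nms_top_n):
    """Expected tensor shapes, assuming one anchor per (scale, ratio) pair."""
    num_anchors = len(scales) * len(ratios)  # assumption, see lead-in
    return {
        "cls_prob": (batch, 2 * num_anchors, height, width),
        "bbox_pred": (batch, 4 * num_anchors, height, width),
        "im_info": (batch, 3),
        "output": (batch * rpn_post_nms_top_n, 5),
    }


shapes = proposal_shapes(
    batch=2, height=14, width=14,
    scales=(8.0, 16.0, 32.0), ratios=(0.5, 1.0, 2.0),
    rpn_post_nms_top_n=300,
)
print(shapes)
```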
- tvm.relay.vision.roi_align(data, rois, pooled_size, spatial_scale, sample_ratio=-1, layout='NCHW', mode='avg')
ROI align operator.
- Parameters
data (relay.Expr) – 4-D tensor with shape [batch, channel, height, width]
rois (relay.Expr) – 2-D tensor with shape [num_roi, 5]. The last dimension should be in format of [batch_index, w_start, h_start, w_end, h_end]
pooled_size (list/tuple of two ints) – output size
spatial_scale (float) – Ratio of input feature map height (or width) to raw image height (or width). Equals the reciprocal of the total stride in the convolutional layers, and should be in the range (0.0, 1.0].
sample_ratio (int) – Optional sampling ratio of ROI align, using adaptive size by default.
mode (str, optional) – The pooling method. Relay supports two methods, ‘avg’ and ‘max’. Default is ‘avg’.
- Returns
output – 4-D tensor with shape [num_roi, channel, pooled_size, pooled_size]
- Return type
relay.Expr
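The idea behind ROI align in ‘avg’ mode can be sketched for a single-channel feature map and one ROI: scale the ROI by spatial_scale, split it into pooled_size bins, take sample_ratio × sample_ratio bilinearly interpolated samples per bin, and average them. This is a simplified illustration under assumed conventions (no coordinate offset, a minimum ROI extent of 1, a fixed positive sample_ratio); the function names are hypothetical.

```python
import math


def bilinear(feat, y, x):
    """Bilinearly interpolate a 2-D list at a fractional (y, x)."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    top = feat[y0][x0] * (1 - lx) + feat[y0][x1] * lx
    bot = feat[y1][x0] * (1 - lx) + feat[y1][x1] * lx
    return top * (1 - ly) + bot * ly


def roi_align_avg(feat, roi, pooled_size, spatial_scale, sample_ratio=2):
    """ROI align for one [batch_index, w_start, h_start, w_end, h_end] ROI."""
    _, w_start, h_start, w_end, h_end = roi
    x0, y0 = w_start * spatial_scale, h_start * spatial_scale
    roi_w = max(w_end * spatial_scale - x0, 1.0)
    roi_h = max(h_end * spatial_scale - y0, 1.0)
    ph, pw = pooled_size
    bin_h, bin_w = roi_h / ph, roi_w / pw
    out = []
    for i in range(ph):
        row = []
        for j in range(pw):
            total = 0.0
            # Regularly spaced sample points inside bin (i, j).
            for sy in range(sample_ratio):
                for sx in range(sample_ratio):
                    y = y0 + (i + (sy + 0.5) / sample_ratio) * bin_h
                    x = x0 + (j + (sx + 0.5) / sample_ratio) * bin_w
                    total += bilinear(feat, y, x)
            row.append(total / (sample_ratio * sample_ratio))
        out.append(row)
    return out


# Feature map whose value equals its x coordinate.
ramp = [[float(x) for x in range(4)] for _ in range(4)]
pooled = roi_align_avg(ramp, [0, 0, 0, 2, 2], (2, 2), spatial_scale=1.0)
print(pooled)
```

On a linear ramp, bilinear interpolation is exact, so each output equals the x coordinate of its bin center. That makes the ramp a convenient sanity check.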
- tvm.relay.vision.roi_pool(data, rois, pooled_size, spatial_scale, layout='NCHW')
ROI pool operator.
- Parameters
data (relay.Expr) – 4-D tensor with shape [batch, channel, height, width]
rois (relay.Expr) – 2-D tensor with shape [num_roi, 5]. The last dimension should be in format of [batch_index, w_start, h_start, w_end, h_end]
pooled_size (list/tuple of two ints) – output size
spatial_scale (float) – Ratio of input feature map height (or width) to raw image height (or width). Equals the reciprocal of the total stride in the convolutional layers, and should be in the range (0.0, 1.0].
- Returns
output – 4-D tensor with shape [num_roi, channel, pooled_size, pooled_size]
- Return type
relay.Expr
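For comparison with ROI align, ROI pooling quantizes the scaled ROI to integer cells and takes the maximum in each bin. The sketch below follows the usual Fast R-CNN convention (rounded ROI corners, inclusive end coordinate, floor/ceil bin boundaries), which is an assumption here; it handles a single-channel feature map and one ROI, and the function names are hypothetical.

```python
import math


def _round_half_up(v):
    # Python's round() rounds halves to even, so round explicitly.
    return int(math.floor(v + 0.5))


def roi_pool_max(feat, roi, pooled_size, spatial_scale):
    """ROI max pooling for one [batch_index, w_start, h_start, w_end, h_end] ROI."""
    height, width = len(feat), len(feat[0])
    _, w_start, h_start, w_end, h_end = roi
    x0 = _round_half_up(w_start * spatial_scale)
    y0 = _round_half_up(h_start * spatial_scale)
    x1 = _round_half_up(w_end * spatial_scale)
    y1 = _round_half_up(h_end * spatial_scale)
    roi_w = max(x1 - x0 + 1, 1)
    roi_h = max(y1 - y0 + 1, 1)
    ph, pw = pooled_size
    bin_h, bin_w = roi_h / ph, roi_w / pw
    out = []
    for i in range(ph):
        hs = min(max(y0 + math.floor(i * bin_h), 0), height)
        he = min(max(y0 + math.ceil((i + 1) * bin_h), 0), height)
        row = []
        for j in range(pw):
            ws = min(max(x0 + math.floor(j * bin_w), 0), width)
            we = min(max(x0 + math.ceil((j + 1) * bin_w), 0), width)
            cells = [feat[y][x] for y in range(hs, he) for x in range(ws, we)]
            # Empty bins (ROI outside the map) yield 0 in this sketch.
            row.append(max(cells) if cells else 0.0)
        out.append(row)
    return out


feat = [[float(4 * y + x) for x in range(4)] for y in range(4)]
pooled_max = roi_pool_max(feat, [0, 0, 0, 3, 3], (2, 2), spatial_scale=1.0)
print(pooled_max)
```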
- tvm.relay.vision.yolo_reorg(data, stride)
Yolo reorg operation used in darknet models. This layer shuffles the input tensor values based on the stride value. Along with the shuffling, it does the shape transform. If ‘(n, c, h, w)’ is the data shape and ‘s’ is stride, output shape is ‘(n, c*s*s, h/s, w/s)’.
Example:
data(1, 4, 2, 2) = [[[[ 0  1] [ 2  3]]
                     [[ 4  5] [ 6  7]]
                     [[ 8  9] [10 11]]
                     [[12 13] [14 15]]]]
stride = 2
ret(1, 16, 1, 1) = [[[[ 0]] [[ 2]] [[ 8]] [[10]]
                     [[ 1]] [[ 3]] [[ 9]] [[11]]
                     [[ 4]] [[ 6]] [[12]] [[14]]
                     [[ 5]] [[ 7]] [[13]] [[15]]]]
Note
stride=1 has no significance for reorg operation.
- Parameters
data (relay.Expr) – The input data tensor.
stride (int) – The stride value for reorganisation.
- Returns
ret – The computed result.
- Return type
relay.Expr
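The shuffle in the example above can be reproduced with a short pure-Python sketch that operates on flattened (row-major) buffers. The index arithmetic mirrors Darknet's reorg routine, which is assumed here to be the reference behaviour; the function name `yolo_reorg_flat` is hypothetical, and the sketch requires c to be divisible by stride * stride.

```python
def yolo_reorg_flat(x, n, c, h, w, stride):
    """Reorg a flat NCHW buffer; the result is read as (n, c*s*s, h/s, w/s)."""
    out_c = c // (stride * stride)
    out = [0] * len(x)
    for b in range(n):
        for k in range(c):
            for j in range(h):
                for i in range(w):
                    in_index = i + w * (j + h * (k + c * b))
                    c2 = k % out_c
                    offset = k // out_c
                    w2 = i * stride + offset % stride
                    h2 = j * stride + offset // stride
                    out_index = w2 + w * stride * (h2 + h * stride * (c2 + out_c * b))
                    out[in_index] = x[out_index]
    return out


# data(1, 4, 2, 2) from the example, flattened to 0..15.
ret = yolo_reorg_flat(list(range(16)), n=1, c=4, h=2, w=2, stride=2)
print(ret)
```

Read as shape (1, 16, 1, 1), the returned list matches the ret values in the example.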