
torchvision.ops

torchvision.ops implements operators that are specific to computer vision.

Note

Those operators currently do not support TorchScript.

torchvision.ops.nms(boxes, scores, iou_threshold)[source]

Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU).

NMS iteratively removes lower-scoring boxes that have an IoU greater than iou_threshold with another (higher-scoring) box.

Parameters
  • boxes (Tensor[N, 4]) – boxes to perform NMS on. They are expected to be in (x1, y1, x2, y2) format

  • scores (Tensor[N]) – scores for each one of the boxes

  • iou_threshold (float) – discards all overlapping boxes with IoU > iou_threshold

Returns

keep – int64 tensor with the indices of the elements that have been kept by NMS, sorted in decreasing order of scores

Return type

Tensor
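
Example

A minimal usage sketch of nms; the box coordinates, scores, and threshold below are illustrative values, not taken from the documentation.

import torch
from torchvision import ops

# Three boxes in (x1, y1, x2, y2) format; the first two overlap heavily.
boxes = torch.tensor([
    [ 0.0,  0.0, 10.0, 10.0],
    [ 1.0,  1.0, 11.0, 11.0],
    [20.0, 20.0, 30.0, 30.0],
])
scores = torch.tensor([0.9, 0.8, 0.7])

# Drop any box whose IoU with a higher-scoring box exceeds 0.5.
keep = ops.nms(boxes, scores, iou_threshold=0.5)
print(keep)  # tensor([0, 2]) -- box 1 is suppressed by box 0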

torchvision.ops.roi_align(input, boxes, output_size, spatial_scale=1.0, sampling_ratio=-1)[source]

Performs the Region of Interest (RoI) Align operator described in Mask R-CNN.

Parameters
  • input (Tensor[N, C, H, W]) – input tensor

  • boxes (Tensor[K, 5] or List[Tensor[L, 4]]) – the box coordinates in (x1, y1, x2, y2) format where the regions will be taken from. If a single Tensor is passed, then the first column should contain the batch index. If a list of Tensors is passed, then each Tensor will correspond to the boxes for an element i in a batch

  • output_size (int or Tuple[int, int]) – the size of the output after the cropping is performed, as (height, width)

  • spatial_scale (float) – a scaling factor that maps the input coordinates to the box coordinates. Default: 1.0

  • sampling_ratio (int) – number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio grid points are used. If <= 0, then an adaptive number of grid points are used (computed as ceil(roi_width / pooled_w), and likewise for height). Default: -1

Returns

output (Tensor[K, C, output_size[0], output_size[1]])
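
Example

A minimal sketch using the Tensor[K, 5] box format, where the first column carries the batch index; the feature-map size, box coordinates, and pooling settings are illustrative.

import torch
from torchvision import ops

# Batch of 2 feature maps: 3 channels, 16x16 spatial size.
features = torch.rand(2, 3, 16, 16)

# Each row is (batch_index, x1, y1, x2, y2).
rois = torch.tensor([
    [0.0, 1.0, 1.0,  9.0,  9.0],
    [1.0, 4.0, 4.0, 12.0, 12.0],
])

# Pool every region into a 4x4 output, sampling 2x2 grid points per bin.
pooled = ops.roi_align(features, rois, output_size=(4, 4),
                       spatial_scale=1.0, sampling_ratio=2)
print(pooled.shape)  # torch.Size([2, 3, 4, 4])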

torchvision.ops.roi_pool(input, boxes, output_size, spatial_scale=1.0)[source]

Performs the Region of Interest (RoI) Pool operator described in Fast R-CNN.

Parameters
  • input (Tensor[N, C, H, W]) – input tensor

  • boxes (Tensor[K, 5] or List[Tensor[L, 4]]) – the box coordinates in (x1, y1, x2, y2) format where the regions will be taken from. If a single Tensor is passed, then the first column should contain the batch index. If a list of Tensors is passed, then each Tensor will correspond to the boxes for an element i in a batch

  • output_size (int or Tuple[int, int]) – the size of the output after the cropping is performed, as (height, width)

  • spatial_scale (float) – a scaling factor that maps the input coordinates to the box coordinates. Default: 1.0

Returns

output (Tensor[K, C, output_size[0], output_size[1]])
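
Example

A minimal sketch using the List[Tensor[L, 4]] box format, where each list entry holds the boxes for one batch element; the sizes and coordinates are illustrative.

import torch
from torchvision import ops

features = torch.rand(2, 3, 16, 16)

# One tensor of (x1, y1, x2, y2) boxes per image in the batch.
rois = [
    torch.tensor([[1.0, 1.0, 9.0, 9.0]]),                # image 0: 1 box
    torch.tensor([[4.0, 4.0, 12.0, 12.0],
                  [0.0, 0.0,  8.0,  8.0]]),              # image 1: 2 boxes
]

pooled = ops.roi_pool(features, rois, output_size=(4, 4), spatial_scale=1.0)
print(pooled.shape)  # torch.Size([3, 3, 4, 4]) -- K = 3 boxes in total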

class torchvision.ops.RoIAlign(output_size, spatial_scale, sampling_ratio)[source]

See roi_align

class torchvision.ops.RoIPool(output_size, spatial_scale)[source]

See roi_pool
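
Example

The module classes wrap the functional ops so they can be used as layers inside an nn.Module; this sketch uses illustrative shapes and box coordinates.

import torch
from torchvision import ops

align = ops.RoIAlign(output_size=(7, 7), spatial_scale=1.0, sampling_ratio=2)
pool = ops.RoIPool(output_size=(7, 7), spatial_scale=1.0)

features = torch.rand(1, 256, 32, 32)
rois = torch.tensor([[0.0, 2.0, 2.0, 20.0, 20.0]])  # (batch_index, x1, y1, x2, y2)

print(align(features, rois).shape)  # torch.Size([1, 256, 7, 7])
print(pool(features, rois).shape)   # torch.Size([1, 256, 7, 7])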
