Operators
torchvision.ops implements operators, losses and layers that are specific to computer vision.
Note
All operators have native support for TorchScript.
Detection and Segmentation Operators
The operators below perform the pre-processing and post-processing required by object detection and segmentation models.
  | 
Performs non-maximum suppression in a batched fashion.  | 
  | 
Compute the bounding boxes around the provided masks.  | 
  | 
Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU).  | 
  | 
Performs Region of Interest (RoI) Align operator with average pooling, as described in Mask R-CNN.  | 
  | 
Performs Region of Interest (RoI) Pool operator described in Fast R-CNN.  | 
  | 
Performs Position-Sensitive Region of Interest (RoI) Align operator mentioned in Light-Head R-CNN.  | 
  | 
Performs Position-Sensitive Region of Interest (RoI) Pool operator described in R-FCN.  | 
  | 
Module that adds an FPN on top of a set of feature maps.  | 
  | 
Multi-scale RoIAlign pooling, which is useful for detection with or without FPN.  | 
  | 
See   | 
  | 
See   | 
  | 
See   | 
  | 
See   | 
Box Operators
These utility functions perform various operations on bounding boxes.
  | 
Computes the area of a set of bounding boxes, which are specified by their (x1, y1, x2, y2) coordinates.  | 
  | 
Converts boxes from the given in_fmt to out_fmt.  | 
  | 
Return intersection-over-union (Jaccard index) between two sets of boxes.  | 
  | 
Clip boxes so that they lie inside an image of size size.  | 
  | 
Return complete intersection-over-union (Jaccard index) between two sets of boxes.  | 
  | 
Return distance intersection-over-union (Jaccard index) between two sets of boxes.  | 
  | 
Return generalized intersection-over-union (Jaccard index) between two sets of boxes.  | 
  | 
Remove boxes that have at least one side smaller than min_size.  | 
Losses
The following vision-specific loss functions are implemented:
  | 
Gradient-friendly IoU loss with an additional penalty that is non-zero when the boxes do not overlap.  | 
  | 
Gradient-friendly IoU loss with an additional penalty that is non-zero when the distance between boxes’ centers isn’t zero.  | 
  | 
Gradient-friendly IoU loss with an additional penalty that is non-zero when the boxes do not overlap and scales with the size of their smallest enclosing box.  | 
  | 
Loss used in RetinaNet for dense detection: https://arxiv.org/abs/1708.02002.  | 
Layers
TorchVision provides commonly used building blocks as layers:
  | 
Configurable block used for Convolution2d-Normalization-Activation blocks.  | 
  | 
Configurable block used for Convolution3d-Normalization-Activation blocks.  | 
  | 
See   | 
  | 
See   | 
  | 
See   | 
  | 
BatchNorm2d where the batch statistics and the affine parameters are fixed.  | 
  | 
This block implements the multi-layer perceptron (MLP) module.  | 
  | 
This module returns a view of the tensor input with its dimensions permuted.  | 
  | 
This block implements the Squeeze-and-Excitation block from https://arxiv.org/abs/1709.01507 (see Fig.  | 
  | 
See   | 
  | 
Performs Deformable Convolution v2, described in Deformable ConvNets v2: More Deformable, Better Results if   | 
  | 
Implements DropBlock2d from “DropBlock: A regularization method for convolutional networks” (https://arxiv.org/abs/1810.12890).  | 
  | 
Implements DropBlock3d from “DropBlock: A regularization method for convolutional networks” (https://arxiv.org/abs/1810.12890).  | 
  | 
Implements Stochastic Depth from “Deep Networks with Stochastic Depth”, used for randomly dropping residual branches of residual architectures.  |