CocoCaptions
- class torchvision.datasets.CocoCaptions(root: str, annFile: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, transforms: Optional[Callable] = None)
- MS Coco Captions Dataset.
- It requires the COCO API to be installed.
- Parameters:
- root (string) – Root directory where images are downloaded to. 
- annFile (string) – Path to json annotation file. 
- transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.PILToTensor
- target_transform (callable, optional) – A function/transform that takes in the target and transforms it. 
- transforms (callable, optional) – A function/transform that takes an input sample and its target and returns transformed versions of both.
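
The difference between the three transform parameters can be sketched in plain Python. This is an illustrative sketch only, not torchvision's implementation: `apply_sample_transforms`, `lowercase_captions`, and `keep_first_caption` are hypothetical names, and the image is a placeholder string so the example runs without any image files. It assumes that `transform` sees only the image, `target_transform` sees only the caption list, and `transforms` sees both, and that the joint form is not combined with the separate ones (torchvision's dataset base class is understood to raise a `ValueError` in that case).

```python
from typing import Any, Callable, List, Optional, Tuple


def lowercase_captions(captions: List[str]) -> List[str]:
    """Example target_transform: receives only the target (caption list)."""
    return [c.strip().lower() for c in captions]


def keep_first_caption(image: Any, captions: List[str]) -> Tuple[Any, List[str]]:
    """Example joint `transforms` callable: receives image and target together."""
    return image, captions[:1]


def apply_sample_transforms(
    image: Any,
    target: List[str],
    transform: Optional[Callable] = None,
    target_transform: Optional[Callable] = None,
    transforms: Optional[Callable] = None,
) -> Tuple[Any, List[str]]:
    """Hypothetical helper that mimics the calling convention described above."""
    # Assumption: the joint and the separate forms are mutually exclusive.
    if transforms is not None and (transform is not None or target_transform is not None):
        raise ValueError("pass either `transforms` or `transform`/`target_transform`")
    if transforms is not None:
        return transforms(image, target)
    if transform is not None:
        image = transform(image)
    if target_transform is not None:
        target = target_transform(target)
    return image, target


image = "placeholder-image"  # stands in for a PIL image
captions = [
    "A plane leaves a contrail above the snowy mountain top. ",
    "A mountain view with a plume of smoke in the background",
]

separate = apply_sample_transforms(image, captions, target_transform=lowercase_captions)
joint = apply_sample_transforms(image, captions, transforms=keep_first_caption)
print(separate[1])
print(joint[1])
```

With the real dataset, a callable such as `lowercase_captions` would be passed as `target_transform=lowercase_captions` when constructing `CocoCaptions`.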
 
 - Example:

    import torchvision.datasets as dset
    import torchvision.transforms as transforms
    cap = dset.CocoCaptions(root='dir where images are',
                            annFile='json annotation file',
                            transform=transforms.PILToTensor())

    print('Number of samples: ', len(cap))
    img, target = cap[3]  # load 4th sample

    print("Image Size: ", img.size())
    print(target)

 - Output:

    Number of samples: 82783
    Image Size: (3L, 427L, 640L)
    [u'A plane emitting smoke stream flying over a mountain.',
    u'A plane darts across a bright blue sky behind a mountain covered in snow',
    u'A plane leaves a contrail above the snowy mountain top.',
    u'A mountain that has a plane flying overheard in the distance.',
    u'A mountain view with a plume of smoke in the background']
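
The target printed in the example is a list of caption strings for a single image. The following sketch shows, in plain Python, how such a list could be assembled from a COCO-style caption annotation file. It assumes the usual COCO layout with an `images` list and an `annotations` list whose entries carry `image_id` and `caption`. It also assumes that samples are indexed by sorted image id. The file names, ids, and the `captions_at` helper are hypothetical, and the real class delegates this work to the COCO API rather than to code like this.

```python
import json
from collections import defaultdict
from typing import Dict, List

# A tiny stand-in for the contents of `annFile` (hypothetical ids and file names).
annotation_json = json.dumps({
    "images": [
        {"id": 25, "file_name": "example_b.jpg"},
        {"id": 9, "file_name": "example_a.jpg"},
    ],
    "annotations": [
        {"id": 1, "image_id": 25, "caption": "A plane leaves a contrail above the snowy mountain top."},
        {"id": 2, "image_id": 9, "caption": "A mountain view with a plume of smoke in the background"},
        {"id": 3, "image_id": 25, "caption": "A plane emitting smoke stream flying over a mountain."},
    ],
})

data = json.loads(annotation_json)

# Group every caption under the image it describes.
captions_by_image: Dict[int, List[str]] = defaultdict(list)
for ann in data["annotations"]:
    captions_by_image[ann["image_id"]].append(ann["caption"])

# Assumption: dataset index i refers to the i-th image id in sorted order.
image_ids = sorted(img["id"] for img in data["images"])


def captions_at(index: int) -> List[str]:
    """Return the caption list for the image at a given dataset index."""
    return captions_by_image[image_ids[index]]


print("Number of samples:", len(image_ids))
print(captions_at(1))
```

One image usually has several captions, which is why the target is a list rather than a single string.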