torchaudio.datasets
All datasets are subclasses of torch.utils.data.Dataset
and implement the __getitem__ and __len__ methods.
They can therefore be passed to a torch.utils.data.DataLoader,
which can load multiple samples in parallel using torch.multiprocessing workers.
For example:
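import torch
import torchaudio

# Download the YesNo dataset into the current directory if it is not already there.
yesno_data = torchaudio.datasets.YESNO('.', download=True)
data_loader = torch.utils.data.DataLoader(
    yesno_data,
    batch_size=1,
    shuffle=True,
    num_workers=4)  # number of worker processes; adjust for your machine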
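The example above uses batch_size=1, which sidesteps a practical issue: clips in audio datasets usually differ in length, and the default collate function cannot stack tensors of unequal size. The following is a minimal sketch of one way to batch such clips by zero-padding. SyntheticAudio and pad_collate are hypothetical names introduced here for illustration, and the random clips stand in for real data so that nothing needs to be downloaded; the (waveform, sample_rate) item layout is an assumption, since real datasets return additional fields.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class SyntheticAudio(Dataset):
    """Hypothetical stand-in for an audio dataset: random mono clips of varying length."""

    def __init__(self, lengths, sample_rate=8000):
        self.lengths = list(lengths)
        self.sample_rate = sample_rate

    def __len__(self):
        return len(self.lengths)

    def __getitem__(self, index):
        # Waveform has shape (channels, time); here a single channel.
        waveform = torch.randn(1, self.lengths[index])
        return waveform, self.sample_rate


def pad_collate(batch):
    """Zero-pad every waveform to the longest clip in the batch."""
    waveforms = [waveform.squeeze(0) for waveform, _ in batch]  # each (time,)
    lengths = torch.tensor([w.shape[0] for w in waveforms])
    padded = pad_sequence(waveforms, batch_first=True)          # (batch, max_time)
    return padded.unsqueeze(1), lengths                         # (batch, 1, max_time)


dataset = SyntheticAudio(lengths=[4000, 6500, 8000, 5200])
loader = DataLoader(dataset, batch_size=2, shuffle=False, collate_fn=pad_collate)

# Collect the padded batch shape and the original clip lengths for each batch.
batches = [(tuple(padded.shape), lengths.tolist()) for padded, lengths in loader]
```

Returning the original lengths alongside the padded tensor lets downstream code mask out the padding. The same collate_fn argument can be passed when wrapping a real torchaudio dataset, with the tuple unpacking in pad_collate adapted to that dataset's item layout.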
| CMUARCTIC | CMU ARCTIC [Kominek et al., 2003] dataset. |
| CMUDict | CMU Pronouncing Dictionary [Weide, 1998] (CMUDict) dataset. |
| COMMONVOICE | CommonVoice [Ardila et al., 2020] dataset. |
| DR_VCTK | Device Recorded VCTK (Small subset version) [Sarfjoo and Yamagishi, 2018] dataset. |
| FluentSpeechCommands | Fluent Speech Commands [Lugosch et al., 2019] dataset. |
| GTZAN | GTZAN [Tzanetakis et al., 2001] dataset. |
| IEMOCAP | IEMOCAP [Busso et al., 2008] dataset. |
| LibriMix | LibriMix [Cosentino et al., 2020] dataset. |
| LIBRISPEECH | LibriSpeech [Panayotov et al., 2015] dataset. |
| LibriLightLimited | Subset of Libri-light [Kahn et al., 2020] dataset, which was used in HuBERT [Hsu et al., 2021] for supervised fine-tuning. |
| LIBRITTS | LibriTTS [Zen et al., 2019] dataset. |
| LJSPEECH | LJSpeech-1.1 [Ito and Johnson, 2017] dataset. |
| MUSDB_HQ | MUSDB_HQ [Rafii et al., 2019] dataset. |
| QUESST14 | QUESST14 [Miro et al., 2015] dataset. |
| Snips | Snips [Coucke et al., 2018] dataset. |
| SPEECHCOMMANDS | Speech Commands [Warden, 2018] dataset. |
| TEDLIUM | Tedlium [Rousseau et al., 2012] dataset (releases 1, 2, and 3). |
| VCTK_092 | VCTK 0.92 [Yamagishi et al., 2019] dataset. |
| VoxCeleb1Identification | VoxCeleb1 [Nagrani et al., 2017] dataset for the speaker identification task. |
| VoxCeleb1Verification | VoxCeleb1 [Nagrani et al., 2017] dataset for the speaker verification task. |
| YESNO | YesNo [YesNo, n.d.] dataset. |