crumpets.torch.dataloader module

class crumpets.torch.dataloader.TorchTurboDataLoader(iterable, batch_size, worker_template, nworkers, length=None, num_mini_batches=1, start_iteration=0, device='cuda:0', gpu_augmentation=False, shared_memory=True)[source]

Bases: crumpets.dataloader.TurboDataLoader

TorchTurboDataLoader is a subclass of TurboDataLoader intended for use with the PyTorch framework. It produces torch tensors instead of numpy arrays.

See TurboDataLoader for more details on its operation.

Parameters
  • iterable – An iterable providing a sample per iteration.

  • batch_size – The amount of samples per batch.

  • worker_template – An actual worker instance, determines the kind of processing. Has to inherit crumpets.broker.Worker.

  • nworkers – Number of workers processing the samples simultaneously. worker_template is copied to create them.

  • length – Specifies the length of the dataset. Defaults to the actual length of iterable (if available). If the given value differs from the default, the number of iterations per epoch is adjusted accordingly.

  • num_mini_batches – Number of mini_batches per batch.

  • start_iteration – Start the iteration counter from this number. Useful when resuming training.

  • shared_memory – Whether to use shared memory to transfer data from workers. If 0 or False, shared memory is disabled. If True, 2*nworkers shared buffers are used. If any number > 0, that number of buffers is used. A value of 1 is strongly discouraged, as it can lead to deadlocks. Permanently storing values returned by a loader may also cause deadlocks. See the example below.

  • device – Torch device to use. Defaults to ‘cuda:0’.

  • gpu_augmentation – Use a Randomizer to perform certain data augmentation operations on the GPU. This disables said operations on the CPU side.
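
Example

A minimal construction and iteration sketch. The names samples and MyWorker are illustrative placeholders (any iterable yielding one sample per iteration and any instance of a crumpets.broker.Worker subclass), and the loop structure at the end is an assumption; the exact iteration protocol is defined by TurboDataLoader.

    import torch
    from crumpets.torch.dataloader import TorchTurboDataLoader

    samples = ...                # placeholder: iterable yielding one sample per iteration
    worker = MyWorker()          # placeholder: instance of a crumpets.broker.Worker subclass

    loader = TorchTurboDataLoader(
        samples,
        batch_size=64,
        worker_template=worker,
        nworkers=4,
        num_mini_batches=1,
        device='cuda:0' if torch.cuda.is_available() else 'cpu',
        shared_memory=True,      # True -> 2*nworkers shared buffers
    )

    # Iteration sketch only; see TurboDataLoader for the exact protocol.
    # Avoid keeping permanent references to returned batches, as that can
    # exhaust the shared-memory buffers and deadlock the loader.
    for iteration, mini_batches in loader:
        for mini_batch in mini_batches:
            ...                  # torch tensors located on `device`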