Quick Start Guide

This tutorial introduces the basic concepts of how to use crumpets for efficient data processing in Deep Learning.

Crumpets has two main functionalities. The first is a data processing pipeline built around crumpets' TurboDataLoader. The second is a Trainer class that can be used to train a given network. These are described separately below. But first of all, how to install it?

1. Installation

Crumpets is pip-installable! Go to the root directory and execute:

>>> pip install .
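
If you intend to modify crumpets itself, an editable install keeps the installed package in sync with your working copy:

>>> pip install -e .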

It might also be useful to install crumpets.torch, which provides torch-dependent packages such as the fast TorchTurboDataLoader and other deep learning utilities.

>>> python setup-torch.py install

2. Data Processing

This part of the tutorial uses the two examples dataloader_simple.py and dataloader_datadings.py, which can be found in crumpets.examples. A crumpets TurboDataLoader (TDL) loads a given dataset and processes it efficiently. A set of parameters allows modifying its behavior according to one's individual requirements. Setting up a TDL requires at least an iterable, a batch size, a worker template, and the number of workers. Batch size and number of workers are self-explanatory, so let's first focus on the iterable and the worker template.

The iterable is what actually defines the dataset. As the name states, it can in general be of any type, but must be something the loader is able to iterate over, i.e. a set, list, tuple, or similar. More specifically, the type of worker that is used determines what type the elements of the iterable have to be, since the workers implement the handling of those. Crumpets' predefined workers all expect msgpack-packed dictionaries. So, for instance, having the dataset stored in an ImageNet-style folder structure (see below), we first need to preprocess it.

Folder Structure:
tinyset/
|-- cat/
|   |-- 1.jpg
|   |-- 2.jpg
|   |-- 3.jpg
|   `-- 4.jpg
`-- elephant/
    |-- 1.jpg
    |-- 2.jpg
    |-- 3.jpg
    `-- 4.jpg

Preprocessing Code:

import io
import os
import os.path as pt

import msgpack
import msgpack_numpy


def prepare_dataset():
    dsdir = 'tinyset'
    iterable = []
    # walk over all subdirectories containing the classes
    for cls_id, (cls_dir, _, imgs) in enumerate(list(os.walk(dsdir))[1:]):
        # inside a subdirectory specifying a class, walk over all images
        for img_path in imgs:
            # read the image
            with io.FileIO(pt.join(cls_dir, img_path)) as f:
                img = f.read()
            # put it inside a dictionary together with some class id
            dic = {'image': img, 'label': cls_id}
            # pack the dict using msgpack and append it to the result
            iterable.append(msgpack.packb(
                dic, use_bin_type=True,
                default=msgpack_numpy.encode
            ))
    return iterable
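
To verify the result, a sample can be unpacked again using the decode counterpart of the encoder used above (a quick sanity check, not part of the pipeline itself):

iterable = prepare_dataset()
# unpack the first sample again to check its contents
sample = msgpack.unpackb(
    iterable[0], raw=False,
    object_hook=msgpack_numpy.decode
)
print(sorted(sample.keys()))  # ['image', 'label']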

For most common datasets this code is unnecessary, however, since we also offer a Python project named datadings, with which the usual datasets can be comfortably downloaded and preprocessed using just one command.

Now, having the dataset in the correct format, we have to define a worker_template from which the loader can generate worker instances. The loader later reads from the iterable and sends the elements to its workers, which process them. In parallel, it receives the processed results in a consumer thread and returns them on request. The worker template can be your own custom implementation, but it must at least inherit from Worker. Usually it is a good idea to inherit from BufferWorker directly, as it already implements the basics. The most common predefined worker is probably the ClassificationWorker, which, as the name states, can be used for the standard deep learning task of classification. This worker requires at least two parameters, image and label. Both are 3-tuples defining the shape, dtype, and fill_value of the corresponding input. Shape and dtype are self-explanatory; fill_value is optional. For ImageNet, for instance, we might define our template like this:

>>> w = ClassificationWorker(
    ((3, 224, 224), np.uint8),
    ((1,), np.int64)
)
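
The optional third tuple entry is the fill_value. As an illustration (assuming here that it simply sets the value used to fill empty buffer regions), padding image buffers with zeros would look like this:

>>> w = ClassificationWorker(
    ((3, 224, 224), np.uint8, 0),  # fill_value 0: pad image buffers with zeros
    ((1,), np.int64)
)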

Having the worker defined and the data preprocessed, we can finally set up our TDL:

from itertools import cycle

import numpy as np
# plus the crumpets imports used below
# (TurboDataLoader, ClassificationWorker, AUGMENTATION_TRAIN)

batch_size = 2
epochs = 3
nworkers = 2
sample_size = (3, 224, 224)

# prepare iterable
iterable = prepare_dataset()
num_samples = len(iterable)
cycler = cycle(iterable)  # necessary for multiple epochs

# create loader
loader = TurboDataLoader(
    cycler, batch_size,
    ClassificationWorker(  # other workers are available, such as SaliencyWorkers
        (sample_size, np.uint8), ((1,), np.int64),
        # use the default training augmentations
        # (see crumpets.randomization.randomize_args)
        image_rng=AUGMENTATION_TRAIN
    ),
    nworkers,
    length=num_samples,
)

This loader can now easily be used within a with-statement. Note that iterating over the loader yields pairs of iteration number and mini-batch, where each mini-batch is a list of sub-batches. The default number of sub-batches is 1, so the loader returns a list of size 1, but the TDL's parameters can be modified to increase that number in order to overcome RAM limitations (a sketch follows after the loop below).

with loader:
    for epoch in range(epochs):
        for iteration, mini_batch in loader:
            for sample in mini_batch:
                image = sample['image']
                label = sample['label']
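
To split each batch into several smaller sub-batches, as mentioned above, the loader accepts a corresponding parameter; assuming it is called num_mini_batches (check the TurboDataLoader signature of your crumpets version), the loop stays the same and simply sees longer mini-batch lists:

loader = TurboDataLoader(
    cycler, batch_size,
    ClassificationWorker((sample_size, np.uint8), ((1,), np.int64)),
    nworkers,
    length=num_samples,
    num_mini_batches=2,  # assumed parameter name; halves the memory per sub-batch
)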

3. Training

The training tutorial refers to the two examples pytorch_cifar10.py and pytorch_resnet.py found in crumpets.examples. At the current state of crumpets, PyTorch is the only supported deep learning framework. This part of the tutorial therefore uses PyTorch, so it is necessary to install crumpets-torch in addition to the standard crumpets version. Have a look at the installation section to see how that is accomplished.

As a first step towards training, we need to actually define the network. The major part of its implementation is skipped here, as this guide is not intended to explain PyTorch mechanics. Crumpets requires just a bit of attention when using networks, though, because of its multi-GPU support and the design of the TDL. The loader returns mini-batches of dictionaries, so the network must be able to process dictionaries and also return them:

class Net(torch.nn.Module):
    def forward(self, sample):
        x = sample['image'].float()
        x = foo(x)  # foo stands for the actual network layers
        sample['output'] = x
        return sample

net = Net()

Crumpets also offers an Unpacker module for this purpose; the following is equivalent:

class Net(torch.nn.Module):
    def forward(self, sample):
        return foo(sample)

net = Unpacker(Net(), output_key, input_key)
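
For illustration, a minimal self-contained network in this style might look as follows (the layers are an arbitrary example, not taken from crumpets):

import torch

class TinyNet(torch.nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 16, 3, padding=1)
        self.pool = torch.nn.AdaptiveAvgPool2d(1)
        self.fc = torch.nn.Linear(16, num_classes)

    def forward(self, sample):
        # networks in crumpets consume and return dictionaries
        x = sample['image'].float()
        x = self.pool(torch.relu(self.conv(x))).flatten(1)
        sample['output'] = self.fc(x)
        return sample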

Also, when using PyTorch, a slightly modified data loader is required, the TorchTurboDataLoader. It returns torch tensors instead of numpy arrays and, as mentioned, enables cuda and thus GPU support. The loader can be used in either single- or multi-GPU mode, which is controlled via the devices parameter. If this parameter is a single string/int/torch.device like 'cuda:0', single mode is used and the loader behaves exactly like its simpler ancestor discussed previously. But if the parameter is an iterable potentially containing several cuda devices, it is crucial to wrap a ParallelApply module around the net, since the return type of the loader changes. The ParallelApply module takes care of that and runs the net in parallel on multiple GPUs if such are available. By default it also merges the results of the forward passes onto the main device. Note that, if the loader shall use the CPU exclusively, e.g. if no GPUs are available, the devices parameter can be set to 'cpu:0'. There are helper functions, namely is_single_torch_device() and is_cpu_only(), that can be used to check the devices parameter. Setting up a network and loader might look like this:

if not is_cpu_only(torch_devices):
    if is_single_torch_device(torch_devices):
        network = Unpacker(Net().cuda())
    else:
        network = ParallelApply(Unpacker(Net()))
else:
    network = Unpacker(Net())

# make_loader is a placeholder; a possible implementation based on
# the previous section is sketched below
train = make_loader(
    train_set, batch_size, devices=devices
)
val = make_loader(
    val_set, batch_size, devices=devices
)
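
A possible implementation of make_loader, based on the loader setup from the data processing section (a sketch only; the exact TorchTurboDataLoader signature may differ, and the devices argument is assumed to be passed through as described above):

def make_loader(dataset, batch_size, devices):
    # dataset: a list of msgpack-packed samples, as built by prepare_dataset()
    return TorchTurboDataLoader(
        cycle(dataset), batch_size,
        ClassificationWorker(
            ((3, 224, 224), np.uint8),
            ((1,), np.int64),
        ),
        nworkers,
        length=len(dataset),
        devices=devices,
    )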

Note that some well-known network architectures are reimplemented in crumpets and can be imported and used without the need for unpackers or additional care. Have a look at crumpets.torch.models.

As usual, training a network requires an optimization method and perhaps a scheduler for varying learning rates and parameters. Again, this is not explained further, as it does not differ from standard PyTorch in crumpets. Have a look at the PyTorch tutorials.

optimizer = SGD([
    {'params': network.parameters(), 'lr': lr},
], momentum=momentum, weight_decay=1e-4)
policy = PolyPolicy(optimizer, epochs, 1)

Special handling is again required, however, when it comes to losses. As stated before, crumpets loaders all return dictionaries. Such a dictionary may contain different variables depending on the worker's design; in general, and for classification in particular, it consists of an image input and a target label. The default workers use the most common keys, i.e. 'image' and 'label'. If the sample is forwarded through a network, a third value is added to the dictionary: the output of the network. Usually its key is 'output', but that depends on the implementation of the network itself. Crumpets offers its own loss functions in crumpets.torch.loss, which are minor modifications of the standard torch ones. They are able to handle dictionaries, but need to be told the keys:

loss = CrossEntropyLoss(target_key='label', output_key='output')
if cuda:
    loss = loss.cuda()
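
Outside of the Trainer, such a dictionary-aware loss would be used roughly like this (a sketch assuming the loss is called with the full sample dict and reads both keys from it; the Trainer below does all of this for you):

sample = network(sample)   # forward pass adds sample['output']
l = loss(sample)           # assumed calling convention: the whole dict is passed
l.backward()
optimizer.step()
optimizer.zero_grad()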

It is helpful to define further metrics measuring the network's quality. Implementations can be found in crumpets.torch.metrics. Similar to losses, they need to be passed the keys:

metric = AccuracyMetric(target_key='label', output_key='output')

Finally, all that is left to do is construct and run a crumpets Trainer instance, which will take care of the complete training:

trainer = Trainer(
    network=network,
    optimizer=optimizer,
    loss=loss,
    metric=metric,
    train_policy=policy,
    val_policy=None,
    train_iter=train,
    val_iter=val,
    outdir=outdir
)
with train:
    with val:
        trainer.train(epochs)

Snapshots, outputs and further logging information can be found in outdir.