202112251111 PyTorch Tricks

#pytorch #programming #tricks #deep-learning

Scheduler

Both of the following schedulers converge faster, at the cost of a few extra hyper-parameters: a minimum and a maximum learning rate. 1
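
The note does not name the two schedulers; assuming they are CyclicLR and OneCycleLR (PyTorch's built-in schedulers parameterized by minimum and maximum learning rates), a minimal sketch:

```python
import torch

# Toy model and optimizer purely for illustration.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

# CyclicLR oscillates between a minimum (base_lr) and maximum (max_lr) learning rate.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2, step_size_up=500)

# Alternative: OneCycleLR warms up to max_lr once, then anneals back down.
# scheduler = torch.optim.lr_scheduler.OneCycleLR(
#     optimizer, max_lr=1e-2, total_steps=1000)

for step in range(1000):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()  # step the scheduler every batch, not every epoch
```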

Dataloader

- Use multiple data workers to speed up loading the dataset, but be aware of duplicated data: 1
    - For a map-style dataset, each item is fetched via an index produced by the sampler, so no duplication occurs;
    - For an iterable-style dataset, every worker process receives a replica of the dataset object, so the dataset (or its worker_init_fn) must partition the data across workers to avoid duplicates (see the sketch after this list);
- pin_memory=True places batches in page-locked (pinned) host memory, which speeds up transfers from CPU memory to GPU memory. 1
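
A minimal sketch of both points, using a toy iterable-style dataset (RangeStream is a made-up name) that calls get_worker_info() to shard its range across workers:

```python
import torch
from torch.utils.data import DataLoader, IterableDataset, get_worker_info

class RangeStream(IterableDataset):
    """Toy iterable dataset; the name and sharding scheme are illustrative."""
    def __init__(self, start, end):
        self.start, self.end = start, end

    def __iter__(self):
        info = get_worker_info()
        if info is None:  # single-process data loading
            lo, hi = self.start, self.end
        else:             # shard the range so workers don't emit duplicates
            per_worker = (self.end - self.start) // info.num_workers
            lo = self.start + info.id * per_worker
            hi = self.end if info.id == info.num_workers - 1 else lo + per_worker
        return iter(range(lo, hi))

loader = DataLoader(
    RangeStream(0, 1000),
    batch_size=32,
    num_workers=4,     # parallel loading in worker processes
    pin_memory=True,   # page-locked host memory for faster host-to-GPU copies
)

for batch in loader:
    if torch.cuda.is_available():
        batch = batch.cuda(non_blocking=True)  # async copy thanks to pinned memory
```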

Automatic Mixed Precision

import torch

# Assumes model, optimizer, and data_iter are defined elsewhere, e.g.:
# model = torch.nn.Linear(10, 2).cuda()
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = torch.nn.CrossEntropyLoss()  # assumed loss function

# Create the gradient scaler once at the beginning of training
scaler = torch.cuda.amp.GradScaler()

for data, label in data_iter:
    optimizer.zero_grad()

    # Run the forward pass under autocast so eligible ops use mixed precision
    with torch.cuda.amp.autocast():
        output = model(data)
        loss = criterion(output, label)

    # Scale the loss and call backward()
    # to create scaled gradients
    scaler.scale(loss).backward()

    # Unscale the gradients and call
    # (or skip) optimizer.step()
    scaler.step(optimizer)

    # Update the scale factor for the next iteration
    scaler.update()

[[202109261531 Machine Learning Compilers and Optimizers|Optimizers]]

TorchScript