View the runnable example on GitHub

Convert PyTorch Training Loop to Use TorchNano

📚 Related Reading

If you have already defined a PyTorch training loop function that takes a model, optimizers, and dataloaders as parameters, you can refer to this guide to use the @nano decorator, which is a simpler way to gain acceleration from BigDL-Nano.

The TorchNano API integrates multiple optimizations to accelerate custom PyTorch training loops. As a pure PyTorch user, you only need to make a few changes to your existing code to use TorchNano.

📝 Note

Before starting your PyTorch application, it is highly recommended to run source bigdl-nano-init to set several environment variables based on your current hardware. Empirically, these variables bring a significant performance increase for most PyTorch training workloads.
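Variables exported with source only affect the current shell and the processes started from it, so the Python process has to be launched from that same shell. A minimal sketch for checking what the training process actually sees is shown below. The variable names in TUNING_VARS are an assumption (settings that CPU tuning scripts commonly adjust), not a confirmed list of what bigdl-nano-init exports, and report_tuning_env is a hypothetical helper.

```python
import os

# Assumed list of variables that CPU tuning scripts commonly set;
# the exact set exported by bigdl-nano-init may differ.
TUNING_VARS = ("OMP_NUM_THREADS", "KMP_AFFINITY", "KMP_BLOCKTIME", "LD_PRELOAD")


def report_tuning_env(names=TUNING_VARS, environ=None):
    """Return {name: value or None} for each variable in names."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in names}


if __name__ == "__main__":
    # Print the values visible to this Python process.
    for name, value in report_tuning_env().items():
        print(f"{name} = {value if value is not None else '<not set>'}")
```

If every entry prints as not set, the script was probably started from a shell in which bigdl-nano-init had not been sourced.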

PyTorch Training Loops Example

Suppose you would like to fine-tune a ResNet-18 model (pretrained on the ImageNet dataset) on the OxfordIIITPet dataset. You may create the datasets and the model, and define your training loop, as follows:

[ ]:
import torch
from tqdm import tqdm

def train_loops():
    model = MyPytorchModule()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
    loss_func = torch.nn.CrossEntropyLoss()
    train_loader = create_train_dataloader()

    num_epochs = 5

    for epoch in range(num_epochs):

        model.train()
        train_loss, num = 0, 0
        with tqdm(train_loader, unit="batch") as tepoch:
            for data, target in tepoch:
                tepoch.set_description(f"Epoch {epoch}")
                optimizer.zero_grad()
                output = model(data)
                loss = loss_func(output, target)
                loss.backward()
                optimizer.step()
                # Convert the scalar loss to a Python float for logging
                loss_value = loss.item()
                train_loss += loss_value
                num += 1
                tepoch.set_postfix(loss=loss_value)
        print(f'Train Epoch: {epoch}, avg_loss: {train_loss / num}')

The definitions of MyPytorchModule and create_train_dataloader can be found in the runnable example.
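If the runnable example is not at hand, stand-ins like the following may be enough to try the loop locally. This is only a sketch under stated assumptions: the tiny CNN and the random tensors are hypothetical replacements for the pretrained ResNet-18 and the OxfordIIITPet data, and only the two names and the 37-class output match the setup described above.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

NUM_CLASSES = 37  # OxfordIIITPet has 37 pet categories


class MyPytorchModule(nn.Module):
    """Hypothetical stand-in: a tiny CNN instead of the pretrained ResNet-18."""

    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))


def create_train_dataloader(num_samples=64, batch_size=16):
    """Hypothetical stand-in: random images and labels instead of OxfordIIITPet."""
    images = torch.randn(num_samples, 3, 32, 32)
    labels = torch.randint(0, NUM_CLASSES, (num_samples,))
    return DataLoader(TensorDataset(images, labels), batch_size=batch_size, shuffle=True)
```

With these definitions in scope, the training loop runs end to end on CPU, although the reported loss is meaningless because the data is random.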

Convert to TorchNano

There are 5 simple steps to convert your PyTorch code to use TorchNano:

  1. Import TorchNano

  2. Subclass TorchNano and override its train method

  3. Move the code for your custom training loops inside TorchNano's train method

  4. Call TorchNano's setup method to set up model, optimizer(s), and dataloader(s) for accelerated training

  5. Replace loss.backward() with self.backward(loss)

[ ]:
import torch
from tqdm import tqdm

# Step 1. import TorchNano
from bigdl.nano.pytorch import TorchNano

# Step 2. subclass TorchNano and override its train method
class MyNano(TorchNano):
    def train(self):
        # Step 3. Move the code for your custom training loops inside the train method
        model = MyPytorchModule()
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
        loss_func = torch.nn.CrossEntropyLoss()
        train_loader = create_train_dataloader()

        # Step 4. call setup method to set up model, optimizer(s),
        #         and dataloader(s) for accelerated training
        model, optimizer, train_loader = self.setup(model, optimizer, train_loader)
        num_epochs = 5

        for epoch in range(num_epochs):

            model.train()
            train_loss, num = 0, 0
            with tqdm(train_loader, unit="batch") as tepoch:
                for data, target in tepoch:
                    tepoch.set_description(f"Epoch {epoch}")
                    optimizer.zero_grad()
                    output = model(data)
                    loss = loss_func(output, target)
                    # Step 5. Replace loss.backward() with self.backward(loss)
                    self.backward(loss)
                    optimizer.step()
                    # Convert the scalar loss to a Python float for logging
                    loss_value = loss.item()
                    train_loss += loss_value
                    num += 1
                    tepoch.set_postfix(loss=loss_value)
            print(f'Train Epoch: {epoch}, avg_loss: {train_loss / num}')

📝 Note

For the converted TorchNano subclass to keep a functional training loop, the following requirements must be met:

  • there should be one and only one instance of torch.nn.Module as model in the training loop

  • there should be at least one instance of torch.optim.Optimizer as optimizer in the training loop

  • there should be at least one instance of torch.utils.data.DataLoader as dataloader in the training loop
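The requirements above can be sketched as a small pre-flight check run before calling setup. Note that check_training_objects is a hypothetical helper written in plain PyTorch for illustration; it is not part of the BigDL-Nano API, and the toy model and data exist only to exercise it.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def check_training_objects(model, optimizers, dataloaders):
    """Hypothetical pre-flight check; not part of the BigDL-Nano API."""
    # Exactly one model: a single torch.nn.Module instance.
    if not isinstance(model, torch.nn.Module):
        raise TypeError("model must be a single torch.nn.Module instance")
    # At least one optimizer, and every entry must be an Optimizer.
    if not optimizers or not all(isinstance(o, torch.optim.Optimizer) for o in optimizers):
        raise TypeError("expected at least one torch.optim.Optimizer")
    # At least one dataloader, and every entry must be a DataLoader.
    if not dataloaders or not all(isinstance(d, DataLoader) for d in dataloaders):
        raise TypeError("expected at least one torch.utils.data.DataLoader")


# Toy objects used only to exercise the check.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = DataLoader(TensorDataset(torch.randn(8, 4), torch.randint(0, 2, (8,))), batch_size=4)

check_training_objects(model, [optimizer], [loader])  # passes silently
```

Passing an empty optimizer list, or a plain list in place of a DataLoader, raises TypeError before any training starts.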

You can then run the training by instantiating MyNano and calling its train method:

[ ]:
MyNano().train()

📝 Note

Thanks to the optimized environment variables set by source bigdl-nano-init, you may already see some training acceleration after converting your PyTorch code to use TorchNano.

📚 Related Readings