PyTorch Lightning: Building Deep Learning Models Faster

July 31, 2023, by Alex
In this article, I'll cover how to use PyTorch Lightning in a project workflow, walk through an example of implementing a variational autoencoder (VAE) with the PyTorch Lightning library, and show how it makes our work easier compared with writing the same code in plain PyTorch.

Implementing a VAE in PyTorch

Let's look at how to implement a VAE using the PyTorch library.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, input_dim, hidden_dim, latent_dim):
        super(VAE, self).__init__()

        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.latent_dim = latent_dim

        # Encoder: maps the input to a hidden representation
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU()
        )

        # Heads that produce the mean and log-variance of the latent distribution
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)

        # Decoder: maps a latent vector back to the input space
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
            nn.Sigmoid()
        )

    def encode(self, x):
        h = self.encoder(x)
        mu = self.mu(h)
        logvar = self.logvar(h)
        return mu, logvar

    def reparameterize(self, mu, logvar):
        # Reparameterization trick: sample z = mu + eps * std with eps ~ N(0, I)
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        z = mu + eps * std
        return z

    def decode(self, z):
        x_hat = self.decoder(z)
        return x_hat

    def loss_function(self, x_hat, x, mu, logvar):
        # Reconstruction term plus KL divergence to the standard normal prior
        bce_loss = F.binary_cross_entropy(x_hat, x, reduction='sum')
        kld_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return bce_loss + kld_loss

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        x_hat = self.decode(z)
        return x_hat, mu, logvar

This is a simple VAE implementation. Notice how we use the encode, decode, reparameterize, loss_function, and forward (required by a PyTorch Module) methods inside the VAE class.
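For example, we can quickly sanity-check the model by pushing a batch of random 784-dimensional vectors through it (the batch size of 16 here is just an illustrative choice):

vae = VAE(input_dim=784, hidden_dim=256, latent_dim=20)
x = torch.rand(16, 784)                     # values in [0, 1], like normalized pixels
x_hat, mu, logvar = vae(x)
print(x_hat.shape, mu.shape, logvar.shape)  # torch.Size([16, 784]) torch.Size([16, 20]) torch.Size([16, 20])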

To train the model, you have to loop over the DataLoader for every epoch, like this:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor

vae = VAE(input_dim=784, hidden_dim=256, latent_dim=20)
optimizer = torch.optim.Adam(vae.parameters(), lr=1e-3)

train_dataset = MNIST(root='data/', train=True, transform=ToTensor(), download=True)
train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True)

num_epochs = 10
for epoch in range(num_epochs):
    for batch_idx, (x, _) in enumerate(train_dataloader):
        x = x.view(x.size(0), -1)  # flatten 28x28 images to 784-dim vectors
        x_hat, mu, logvar = vae(x)
        loss = vae.loss_function(x_hat, x, mu, logvar)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch_idx % 100 == 0:
            print(f"Epoch [{epoch}/{num_epochs}] Batch [{batch_idx}/{len(train_dataloader)}] Loss: {loss.item():.4f}")

Besides that, as you probably already know, you have to make sure you step the optimizer, compute the loss, update the model's parameters with the gradients computed by loss.backward(), disable gradient computation during validation (with torch.no_grad()), and so on.
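For instance, a typical plain-PyTorch validation loop looks roughly like this (a minimal sketch, assuming a val_dataloader built the same way as train_dataloader, e.g. from the MNIST test split):

vae.eval()                       # switch to evaluation mode
val_loss = 0.0
with torch.no_grad():            # disable gradient computation during validation
    for x, _ in val_dataloader:
        x = x.view(x.size(0), -1)
        x_hat, mu, logvar = vae(x)
        val_loss += vae.loss_function(x_hat, x, mu, logvar).item()
vae.train()                      # switch back to training mode
print(f"Validation loss: {val_loss / len(val_dataloader.dataset):.4f}")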

With PyTorch Lightning, all of these steps can be avoided, and we can write cleaner, better code.

Implementing the VAE with PyTorch Lightning
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl

class VAE(pl.LightningModule):
    def __init__(self, input_dim, hidden_dim, latent_dim):
        super(VAE, self).__init__()

        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.latent_dim = latent_dim

        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU()
        )

        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)

        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
            nn.Sigmoid()
        )

    def encode(self, x):
        h = self.encoder(x)
        mu = self.mu(h)
        logvar = self.logvar(h)
        return mu, logvar

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        z = mu + eps * std
        return z

    def decode(self, z):
        x_hat = self.decoder(z)
        return x_hat

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        x_hat = self.decode(z)
        return x_hat, mu, logvar

    def loss_function(self, x_hat, x, mu, logvar):
        bce_loss = F.binary_cross_entropy(x_hat, x, reduction='sum')
        kld_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return bce_loss + kld_loss

    def training_step(self, batch, batch_idx):
        # Called by the Trainer for every training batch
        x, _ = batch
        x = x.view(x.size(0), -1)  # flatten 28x28 images to 784-dim vectors
        x_hat, mu, logvar = self(x)
        loss = self.loss_function(x_hat, x, mu, logvar)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        # The Trainer asks the module for its optimizer(s)
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer

The VAE class now inherits from pl.LightningModule instead of nn.Module. Notice how we've defined two additional methods in the class: training_step and configure_optimizers. These are built-in hooks of pl.LightningModule, used to define the training step and to configure the optimizer.
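pl.LightningModule also offers an optional validation_step hook. If we add one that logs a 'val_loss' metric, Lightning runs it with gradients disabled in eval mode, and callbacks such as the checkpoint callback shown later can monitor that metric. A minimal sketch, assuming it is added inside the same VAE class and that a validation DataLoader is passed to fit:

def validation_step(self, batch, batch_idx):
    # Lightning runs this in eval mode with gradients disabled
    x, _ = batch
    x = x.view(x.size(0), -1)
    x_hat, mu, logvar = self(x)
    val_loss = self.loss_function(x_hat, x, mu, logvar)
    self.log('val_loss', val_loss)  # exposes 'val_loss' to loggers and callbacks
    return val_loss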

To train the model now, we can use the following code:
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
import pytorch_lightning as pl
train_dataset = MNIST(root='data/', train=True, transform=ToTensor(), download=True)
train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True)

vae = VAE(input_dim=784, hidden_dim=256, latent_dim=20)
trainer = pl.Trainer(max_epochs=10)
trainer.fit(vae, train_dataloader)

To train the model, we don't have to loop over the DataLoader for every epoch. Gradient updates and the rest are handled automatically by the Trainer object, so we don't have to worry about them. We just define the model object, create a pl.Trainer with the desired number of epochs, and call its fit method; our model then starts training.
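The Trainer also accepts a number of optional arguments. A small sketch of a few commonly used ones (the values here are only examples, not required for the code above):

trainer = pl.Trainer(
    max_epochs=10,
    accelerator='auto',    # automatically pick GPU or CPU
    devices=1,             # how many devices to use
    log_every_n_steps=50   # how often logged metrics are written out
)
trainer.fit(vae, train_dataloader)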

Similarly, to save the model, we can use the following code:
from pytorch_lightning.callbacks import ModelCheckpoint

# Define a callback that saves a checkpoint after every epoch
checkpoint_callback = ModelCheckpoint(
    monitor='val_loss',                # requires the model to log a 'val_loss' metric
    dirpath='checkpoints/',
    filename='vae-{epoch:02d}-{val_loss:.2f}',
    save_top_k=-1,                     # keep all checkpoints instead of only the best ones
    mode='min',
    save_last=True                     # also save the latest checkpoint as last.ckpt
)

# Create a trainer and fit the model (val_dataloader is a validation DataLoader)
trainer = pl.Trainer(callbacks=[checkpoint_callback])
trainer.fit(vae, train_dataloader, val_dataloader)

In this example, we define a ModelCheckpoint callback: save_top_k=-1 keeps a checkpoint file for every epoch instead of only the best ones, and save_last=True additionally saves the most recent checkpoint under the fixed name last.ckpt. Note that monitor='val_loss' only works if the model actually logs a metric called 'val_loss', for example from a validation_step like the one sketched above.

During training, PyTorch Lightning automatically saves the model to the specified directory after every epoch, using a filename that contains the epoch number and the validation loss. Thanks to save_last=True, the latest checkpoint is also written as last.ckpt, which is convenient for resuming training.

By using the ModelCheckpoint callback in PyTorch Lightning, you can easily save the model after every epoch during training and make sure all intermediate checkpoint files are available whenever you need to load an earlier state of the model.
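Restoring the model from one of these checkpoint files is straightforward. A minimal sketch (the path is just an example; since our __init__ arguments are not stored with self.save_hyperparameters(), we pass them again when loading):

vae = VAE.load_from_checkpoint(
    'checkpoints/last.ckpt',
    input_dim=784, hidden_dim=256, latent_dim=20
)
vae.eval()

Alternatively, calling self.save_hyperparameters() inside __init__ stores these arguments in the checkpoint, so they don't need to be passed again at load time.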

On top of that, we also get a progress bar for free, which we can't get with plain PyTorch unless we use an external library.
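The default progress bar can also be tuned through a callback; a small sketch (the refresh_rate value is just an example):

from pytorch_lightning.callbacks import TQDMProgressBar

trainer = pl.Trainer(max_epochs=10, callbacks=[TQDMProgressBar(refresh_rate=20)])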

 

Source: https://medium.com/@sheikh.sahil12299/building-and-training-deep-learning-models-faster-with-pytorch-lightning-147a69924da8