I'm trying to build a model for document classification, using BERT with PyTorch. I load the BERT model with the code below:
bert = AutoModel.from_pretrained('bert-base-uncased')
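For reference, the input tensors come from the matching tokenizer, roughly like this (a sketch; docs stands in for my list of document strings, and max_length=4000 is an assumption that mirrors the sizes printed later):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
# encode the documents into fixed-length id/mask tensors
# (4000 mirrors the torch.Size([32, 4000]) printed during training)
enc = tokenizer(docs, padding='max_length', truncation=True,
                max_length=4000, return_tensors='pt')
sent_id, mask = enc['input_ids'], enc['attention_mask']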
This is my training loop:

# track the best validation loss and the per-epoch losses
best_valid_loss = float('inf')
train_losses, valid_losses = [], []

for epoch in range(epochs):
    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))
    # train model
    train_loss, _ = modhelper.train(proc.train_dataloader)
    # evaluate model
    valid_loss, _ = modhelper.evaluate()
    # save the best model
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(modhelper.model.state_dict(), 'saved_weights.pt')
    # append training and validation loss
    train_losses.append(train_loss)
    valid_losses.append(valid_loss)
    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')
This is my train method, accessible through the modhelper object:
def train(self, train_dataloader):
    self.model.train()
    total_loss, total_accuracy = 0, 0
    # empty list to save model predictions
    total_preds = []
    # iterate over batches
    for step, batch in enumerate(train_dataloader):
        # progress update after every 50 batches
        if step % 50 == 0 and step != 0:
            print('  Batch {:>5,} of {:>5,}.'.format(step, len(train_dataloader)))
        # push the batch to gpu
        #batch = [r.to(device) for r in batch]
        sent_id, mask, labels = batch
        # clear previously calculated gradients
        self.model.zero_grad()
        print(sent_id.size(), mask.size())
        # get model predictions for the current batch
        preds = self.model(sent_id, mask)  # this line throws the error
        # compute the loss between actual and predicted values
        self.loss = self.cross_entropy(preds, labels)
        # add on to the total loss
        total_loss = total_loss + self.loss.item()
        # backward pass to calculate the gradients
        self.loss.backward()
        # clip the gradients to 1.0; helps prevent the exploding-gradient problem
        torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0)
        # update parameters
        self.optimizer.step()
        # model predictions are stored on the GPU, so push them to the CPU
        #preds = preds.detach().cpu().numpy()
        # append the model predictions
        total_preds.append(preds)
    # compute the average training loss of the epoch
    avg_loss = total_loss / len(train_dataloader)
    # predictions are in the form (no. of batches, batch size, no. of classes);
    # reshape them into (no. of samples, no. of classes)
    total_preds = np.concatenate(total_preds, axis=0)
    # return the loss and predictions
    return avg_loss, total_preds
preds = self.model(sent_id, mask)

This line throws the following error (full traceback included):
Epoch 1 / 1
torch.Size([32, 4000]) torch.Size([32, 4000])
Traceback (most recent call last):
File "<ipython-input-39-17211d5a107c>", line 8, in <module>
train_loss, _ = modhelper.train(proc.train_dataloader)
File "E:\BertTorch\model.py", line 71, in train
preds = self.model(sent_id, mask)
File "E:\BertTorch\venv\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "E:\BertTorch\model.py", line 181, in forward
#pass the inputs to the model
File "E:\BertTorch\venv\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "E:\BertTorch\venv\lib\site-packages\transformers\modeling_bert.py", line 837, in forward
embedding_output = self.embeddings(
File "E:\BertTorch\venv\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "E:\BertTorch\venv\lib\site-packages\transformers\modeling_bert.py", line 201, in forward
embeddings = inputs_embeds + position_embeddings + token_type_embeddings
RuntimeError: The size of tensor a (4000) must match the size of tensor b (512) at non-singleton dimension 1
Notice that I print the tensor sizes in the code:

print(sent_id.size(), mask.size())

The output of that line is torch.Size([32, 4000]) torch.Size([32, 4000]), so the two tensors are the same size, yet the error is still thrown. Please share your thoughts; I'd really appreciate it. If you need further information, please comment and I'll be quick to add whatever is required.
self.model() throws the error, but if you look carefully at the stack trace you can see exactly where during the forward pass it occurs:

embeddings = inputs_embeds + position_embeddings + token_type_embeddings

There is a shape mismatch between these three tensors. bert-base-uncased has a position-embedding table with only 512 entries (max_position_embeddings = 512), while your inputs are 4000 tokens long: inputs_embeds covers 4000 positions but position_embeddings only covers 512, which is exactly the "size of tensor a (4000) must match the size of tensor b (512)" in the error. BERT cannot take sequences longer than 512 tokens, so you need to truncate (or chunk) your documents to at most 512 tokens before passing them to the model.
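Here is a minimal sketch of the fix, assuming the standard transformers tokenizer API (docs is a hypothetical list of document strings):

from transformers import AutoConfig, AutoTokenizer

# the 512 in the error comes from BERT's position-embedding table
config = AutoConfig.from_pretrained('bert-base-uncased')
print(config.max_position_embeddings)  # 512

# truncate every document to the model's maximum length
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
enc = tokenizer(docs, padding='max_length', truncation=True,
                max_length=512, return_tensors='pt')
sent_id, mask = enc['input_ids'], enc['attention_mask']
print(sent_id.size(), mask.size())  # torch.Size([batch, 512]) for both

If truncating to 512 tokens discards too much of each document, a common alternative is to split each document into 512-token chunks and aggregate the per-chunk predictions.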