All Questions
Tagged with bert-language-model pytorch
424 questions

50 votes · 10 answers · 125k views
CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)
I got the following error when I ran my PyTorch deep learning model in Google Colab
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
1370 ret = ...
39 votes · 3 answers · 36k views
dropout(): argument 'input' (position 1) must be Tensor, not str when using Bert with Huggingface
My code was working fine and when I tried to run it today without changing anything I got the following error:
dropout(): argument 'input' (position 1) must be Tensor, not str
Would appreciate if ...
24 votes · 1 answer · 49k views
How do the max_length, padding and truncation arguments work in HuggingFace's BertTokenizerFast.from_pretrained('bert-base-uncased')?
I am working on a text classification problem where I want to use the BERT model as the base, followed by Dense layers. I want to know how the 3 arguments work. For example, if I have 3 sentences ...
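A minimal sketch of how those three arguments interact (assuming bert-base-uncased can be downloaded): padding='max_length' pads every sequence up to max_length, truncation=True cuts anything longer, and both counts include the [CLS] and [SEP] special tokens.
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
sentences = ["a short sentence", "a slightly longer example sentence", "hi"]
enc = tokenizer(sentences, padding='max_length', truncation=True, max_length=8, return_tensors='pt')
print(enc['input_ids'].shape)    # torch.Size([3, 8]) - every row padded/truncated to length 8
print(enc['attention_mask'])     # 1 for real tokens, 0 for padding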
24 votes · 1 answer · 62k views
PyTorch BERT TypeError: forward() got an unexpected keyword argument 'labels'
Training a BERT model using PyTorch transformers (following the tutorial here).
The following statement in the tutorial:
loss = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=...
22 votes · 6 answers · 27k views
AttributeError: module 'torch' has no attribute '_six'. Bert model in Pytorch
I tried to load a pre-trained model using the BertModel class in PyTorch.
I have _six.py under torch, but it still shows module 'torch' has no attribute '_six'
import torch
from pytorch_pretrained_bert ...
21 votes · 1 answer · 30k views
PyTorch: RuntimeError: Input, output and indices must be on the current device
I am running a BERT model on torch. It's a multi-class sentiment classification task with about 30,000 rows. I have already put everything on CUDA, but I am not sure why I'm getting the following runtime ...
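That error typically means the model and the batch tensors are not on the same device. A minimal sketch of the usual fix, assuming a recent transformers version (the model name and label count below are placeholders):
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=5).to(device)

batch = tokenizer(["an example review"], return_tensors='pt')
# every input tensor must live on the same device as the model
batch = {k: v.to(device) for k, v in batch.items()}
labels = torch.tensor([3], device=device)
outputs = model(**batch, labels=labels)
print(outputs.loss.item())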
19 votes · 5 answers · 68k views
Pytorch: IndexError: index out of range in self. How to solve?
This training code is based on the run_glue.py script found here:
# Set the seed value all over the place to make this reproducible.
seed_val = 42
random.seed(seed_val)
np.random.seed(seed_val)
torch....
18 votes · 1 answer · 12k views
BertForSequenceClassification vs. BertForMultipleChoice for sentence multi-class classification
I'm working on a text classification problem (e.g. sentiment analysis), where I need to classify a text string into one of five classes.
I just started using the Huggingface Transformer package and ...
17 votes · 2 answers · 33k views
The size of tensor a (707) must match the size of tensor b (512) at non-singleton dimension 1
I am trying to do text classification using a pretrained BERT model. I trained the model on my dataset, and in the testing phase, since I know that BERT can only take up to 512 tokens, I wrote an if condition ...
17 votes · 2 answers · 11k views
Difficulty in understanding the tokenizer used in Roberta model
from transformers import AutoModel, AutoTokenizer
tokenizer1 = AutoTokenizer.from_pretrained("roberta-base")
tokenizer2 = AutoTokenizer.from_pretrained("bert-base-cased")
sequence = "A Titan RTX has ...
16 votes · 3 answers · 23k views
How to understand hidden_states of the returns in BertModel?(huggingface-transformers)
Returns last_hidden_state (torch.FloatTensor of shape (batch_size,
sequence_length, hidden_size)): Sequence of hidden-states at the
output of the last layer of the model.
pooler_output (torch....
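A minimal sketch of what those returns look like for bert-base-uncased, assuming a recent transformers version where outputs come back as a dataclass:
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True)

inputs = tokenizer("hello world", return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size), e.g. [1, 4, 768]
print(len(outputs.hidden_states))       # 13 for bert-base: embedding output + one tensor per layer
print(outputs.pooler_output.shape)      # (batch_size, hidden_size): the [CLS] state through a Linear + tanh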
15 votes · 3 answers · 35k views
Python: BERT Error - Some weights of the model checkpoint at were not used when initializing BertModel
I am creating an entity extraction model in PyTorch using bert-base-uncased but when I try to run the model I get this error:
Error:
Some weights of the model checkpoint at D:\Transformers\bert-entity-...
14 votes · 1 answer · 14k views
PyTorch torch.no_grad() versus requires_grad=False
I'm following a PyTorch tutorial which uses the BERT NLP model (feature extractor) from the Huggingface Transformers library. There are two pieces of interrelated code for gradient updates that I don'...
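A minimal sketch of the difference: torch.no_grad() is a context manager that skips graph construction for everything computed inside it, while requires_grad=False freezes specific parameters so they receive no gradient even though the rest of the graph is still built.
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)

# 1) torch.no_grad(): no autograd graph at all inside the block
with torch.no_grad():
    y = layer(torch.randn(1, 4))
print(y.requires_grad)        # False

# 2) requires_grad=False: the frozen weights get no gradient, but gradients
#    still flow to anything else that requires them
for p in layer.parameters():
    p.requires_grad = False
x = torch.randn(1, 4, requires_grad=True)
layer(x).sum().backward()
print(layer.weight.grad)      # None - the weight was frozen
print(x.grad is not None)     # True - the gradient still reached the input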
13 votes · 4 answers · 8k views
How to fine tune BERT on unlabeled data?
I want to fine tune BERT on a specific domain. I have texts of that domain in text files. How can I use these to fine tune BERT?
I am looking here currently.
My main objective is to get sentence ...
12 votes · 4 answers · 11k views
Training TFBertForSequenceClassification with custom X and Y data
I am working on a text classification problem, for which I am trying to train my model on TFBertForSequenceClassification provided in the huggingface-transformers library.
I followed the example given on ...
12 votes · 3 answers · 37k views
OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index']
When I load the BERT pretrained model online I get this error OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index'] found in directory uncased_L-12_H-768_A-12 or '...
12 votes · 2 answers · 5k views
Get probability of multi-token word in MASK position
It is relatively easy to get a token's probability according to a language model, as the snippet below shows. You can get the output of a model, restrict yourself to the output of the masked token, ...
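For reference, a minimal sketch of the single-token case the excerpt describes (bert-base-uncased and the word 'paris' are only illustrative); the multi-token case then has to combine the scores of several sub-word pieces, which is what the question is about:
import torch
from transformers import BertTokenizerFast, BertForMaskedLM

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors='pt')
mask_pos = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    logits = model(**inputs).logits
probs = logits[0, mask_pos].softmax(dim=-1)
print(probs[0, tokenizer.convert_tokens_to_ids('paris')].item())  # P("paris" | context)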
11 votes · 1 answer · 2k views
What's the difference between the "self-attention mechanism" and the "fully-connected" layer?
I am confused by these two structures. In theory, the outputs of both are connected to all of their inputs. What magic makes the 'self-attention mechanism' more powerful than a fully-connected layer?
10 votes · 3 answers · 9k views
Using trained BERT Model and Data Preprocessing
When using pre-trained BERT embeddings from PyTorch (which are then fine-tuned), should the text data fed into the model be pre-processed as in any standard NLP task?
For instance, should ...
10 votes · 3 answers · 12k views
BertTokenizer - when encoding and decoding sequences extra spaces appear
When using Transformers from HuggingFace I am facing a problem with the encoding and decoding methods.
I have the following string:
test_string = 'text with percentage%'
Then I am running the ...
9 votes · 1 answer · 23k views
RuntimeError: The size of tensor a (4000) must match the size of tensor b (512) at non-singleton dimension 1
I'm trying to build a model for document classification. I'm using BERT with PyTorch.
I got the BERT model with the code below.
bert = AutoModel.from_pretrained('bert-base-uncased')
This is the code for ...
9 votes · 1 answer · 24k views
BERT tokenizer & model download
I'm a beginner and I'm working with BERT. However, due to the security of the company network, the following code does not download the BERT model directly.
tokenizer = BertTokenizer.from_pretrained('bert-...
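A common workaround, sketched under the assumption that some machine with network access can fetch the files: download the model artifacts there (config.json, vocab.txt, pytorch_model.bin), copy the folder inside the company network, and point from_pretrained at the local directory (the path below is a placeholder):
from transformers import BertTokenizer, BertModel

local_dir = './bert-base-uncased'              # folder copied from a machine with internet access
tokenizer = BertTokenizer.from_pretrained(local_dir)
model = BertModel.from_pretrained(local_dir)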
9 votes · 1 answer · 8k views
How do I use BertForMaskedLM or BertModel to calculate perplexity of a sentence?
I want to use BertForMaskedLM or BertModel to calculate the perplexity of a sentence, so I wrote code like this:
import numpy as np
import torch
import torch.nn as nn
from transformers import ...
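A minimal sketch of one common approach, pseudo-perplexity with BertForMaskedLM: mask each token in turn, score it, and exponentiate the average negative log-likelihood. This is one reasonable definition for a masked LM, not the only one.
import math
import torch
from transformers import BertTokenizerFast, BertForMaskedLM

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

def pseudo_perplexity(sentence):
    ids = tokenizer(sentence, return_tensors='pt')['input_ids'][0]
    nll, n = 0.0, 0
    for i in range(1, len(ids) - 1):            # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id     # mask one position at a time
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        nll -= logits[0, i].log_softmax(dim=-1)[ids[i]].item()
        n += 1
    return math.exp(nll / n)

print(pseudo_perplexity("The cat sat on the mat."))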
8 votes · 1 answer · 9k views
How to calculate perplexity of a sentence using huggingface masked language models?
I have several masked language models (mainly Bert, Roberta, Albert, Electra). I also have a dataset of sentences. How can I get the perplexity of each sentence?
From the huggingface documentation ...
8 votes · 3 answers · 5k views
How to compute mean/max of HuggingFace Transformers BERT token embeddings with attention mask?
I'm using the HuggingFace Transformers BERT model, and I want to compute a summary vector (a.k.a. embedding) over the tokens in a sentence, using either the mean or max function. The complication is ...
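A minimal sketch of masked pooling (the model name is illustrative): zero out padding positions before averaging, and push them to -inf before taking the max, so padding never contributes.
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

enc = tokenizer(["a short one", "a somewhat longer sentence here"], padding=True, return_tensors='pt')
with torch.no_grad():
    hidden = model(**enc).last_hidden_state            # (batch, seq_len, 768)

mask = enc['attention_mask'].unsqueeze(-1).float()     # (batch, seq_len, 1)
mean_pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
max_pooled = hidden.masked_fill(mask == 0, float('-inf')).max(dim=1).values
print(mean_pooled.shape, max_pooled.shape)             # both (2, 768)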
7 votes · 1 answer · 5k views
How exactly should the input file be formatted for the language model finetuning (BERT through Huggingface Transformers)?
I wanted to employ the examples/run_lm_finetuning.py from the Huggingface Transformers repository on a pretrained Bert model. However, from following the documentation it is not evident how a corpus ...
7 votes · 2 answers · 14k views
The model did not return a loss from the inputs - LabSE error
I want to fine-tune LabSE for question answering using the SQuAD dataset, and I got this error:
ValueError: The model did not return a loss from the inputs, only the following keys: last_hidden_state,...
7 votes · 2 answers · 6k views
The essence of learnable positional embeddings: does a learned embedding improve outcomes?
I was recently reading the BERT source code from the Hugging Face project. I noticed that the so-called "learnable position encoding" seems to refer to a specific nn.Parameter layer when it ...
7 votes · 1 answer · 8k views
Mismatched size on BertForSequenceClassification from Transformers and multiclass problem
I just trained a BERT model on a Dataset composed by products and labels (departments) for an e-commerce website. It's a multiclass problem. I used BertForSequenceClassification to predict the ...
7 votes · 1 answer · 4k views
ModuleNotFoundError: No module named 'torch.utils._pytree'
I have installed PyTorch 1.7.1, and it works very well. However, when I try to run this code:
import transformers
from transformers import BertTokenizer
from transformers.models.bert.modeling_bert ...
6 votes · 1 answer · 8k views
huggingface bert showing poor accuracy / f1 score [pytorch]
I am trying BertForSequenceClassification for a simple article classification task.
No matter how I train it (freeze all layers but the classification layer, all layers trainable, last k layers ...
6 votes · 1 answer · 34k views
Pytorch expects each tensor to be equal size
When running this code: embedding_matrix = torch.stack(embeddings)
I got this error:
RuntimeError: stack expects each tensor to be equal size, but got [7, 768] at entry 0 and [8, 768] at entry 1
I'm ...
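One common fix, sketched with random tensors of the shapes from the error: pad the shorter sequences to a common length before stacking.
import torch
from torch.nn.utils.rnn import pad_sequence

embeddings = [torch.randn(7, 768), torch.randn(8, 768)]        # different token counts
embedding_matrix = pad_sequence(embeddings, batch_first=True)  # zero-pads to (2, 8, 768)
print(embedding_matrix.shape)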
5 votes · 1 answer · 13k views
TypeError: linear(): argument 'input' (position 1) must be Tensor, not str
So I've been trying to work through an example of BERT that I found on GitHub, as it's the first time I'm trying to use BERT and see how it works. The repository I'm working with is the following: https://...
5 votes · 1 answer · 5k views
How to get the probability of a particular token(word) in a sentence given the context
I'm trying to calculate the probability or any type of score for words in a sentence using NLP. I've tried this approach with GPT2 model using Huggingface Transformers library, but, I couldn't get ...
5 votes · 1 answer · 818 views
How to save only the classifier-layer parameters of a pretrained BERT model, due to memory concerns?
I fine-tuned the pretrained model here by freezing all layers except the classifier layers, and I saved the weight file with PyTorch in .bin format.
Now instead of loading the 400 MB pre-trained ...
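A minimal sketch of the idea (model name, label count and file name are placeholders): keep only the classifier head's entries from the state_dict, save those, and later rebuild the backbone from the hub and load the head on top with strict=False.
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=5)
classifier_state = {k: v for k, v in model.state_dict().items() if k.startswith('classifier.')}
torch.save(classifier_state, 'classifier_only.bin')        # a few kB instead of ~400 MB

model2 = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=5)
model2.load_state_dict(torch.load('classifier_only.bin'), strict=False)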
5 votes · 2 answers · 2k views
BERT-based NER model giving inconsistent prediction when deserialized
I am trying to train an NER model using the HuggingFace transformers library on Colab cloud GPUs, pickle it and load the model on my own CPU to make predictions.
Code
The model is the following:
from ...
5 votes · 2 answers · 3k views
Can I use BERT as a feature extractor without any finetuning on my specific data set?
I'm trying to solve a multilabel classification task of 10 classes with a relatively balanced training set consisting of ~25K samples and an evaluation set consisting of ~5K samples.
I'm using the ...
5 votes · 1 answer · 2k views
Does BertForSequenceClassification classify on the CLS vector?
I'm using the Huggingface Transformer package and BERT with PyTorch. I'm trying to do 4-way sentiment classification and am using BertForSequenceClassification to build a model that eventually leads ...
5 votes · 1 answer · 9k views
Get the value of '[UNK]' in BERT
I have designed a model based on BERT to solve an NER task. I am using the transformers library with the "dccuchile/bert-base-spanish-wwm-cased" pre-trained model. The problem comes when my model detects an ...
4 votes · 3 answers · 5k views
How to apply max_length to truncate the token sequence from the left in a HuggingFace tokenizer?
In the HuggingFace tokenizer, applying the max_length argument specifies the length of the tokenized text. I believe it truncates the sequence to max_length-2 (if truncation=True) by cutting the ...
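A minimal sketch using the truncation_side attribute, which is available on reasonably recent transformers releases (older versions need manual slicing of the token ids instead):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
tokenizer.truncation_side = 'left'
ids = tokenizer("one two three four five six seven", truncation=True, max_length=6).input_ids
print(tokenizer.convert_ids_to_tokens(ids))
# excess tokens are now dropped from the left, e.g. ['[CLS]', 'four', 'five', 'six', 'seven', '[SEP]']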
4 votes · 1 answer · 788 views
pytorch model evaluation slow when deployed on kubernetes
I would like to make the result of a text classification model (finBERT pytorch model) available through an endpoint that is deployed on Kubernetes.
The whole pipeline is working but it's super slow ...
4 votes · 1 answer · 2k views
Correct Way to Fine-Tune/Train HuggingFace's Model from scratch (PyTorch)
For example, I want to train a BERT model from scratch but using the existing configuration. Is the following code the correct way to do so?
model = BertModel.from_pretrained('bert-base-cased')
model....
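For comparison, a minimal sketch of the difference: from_pretrained() loads the trained weights, whereas building the model from a configuration alone gives the same architecture with randomly initialised weights, which is what "from scratch" usually means.
from transformers import BertConfig, BertModel

config = BertConfig.from_pretrained('bert-base-cased')   # architecture/configuration only
model = BertModel(config)                                 # random weights, bert-base-cased layout
# model = BertModel.from_pretrained('bert-base-cased')    # this would start from the trained weights instead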
4 votes · 1 answer · 5k views
Why do we need state_dict = state_dict.copy()
I want to load the weights of a pre-trained model onto my local model. I don't understand why state_dict = state_dict.copy() is necessary if the two networks have the same state_dict names.
# copy ...
4 votes · 2 answers · 4k views
How to convert model.safetensor to pytorch_model.bin?
I'm fine-tuning a pre-trained BERT model and I have a weird problem:
When I'm fine-tuning using the CPU, the code saves the model like this:
With the "pytorch_model.bin". But when I use ...
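A minimal sketch of the conversion (paths are placeholders): read the tensors out of the safetensors file and re-save them in the classic pickle format; alternatively, save_pretrained(..., safe_serialization=False) writes pytorch_model.bin directly.
import torch
from safetensors.torch import load_file

state_dict = load_file('model_dir/model.safetensors')     # dict of tensors
torch.save(state_dict, 'model_dir/pytorch_model.bin')     # classic PyTorch format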
4 votes · 1 answer · 2k views
PyTorch tokenizers: how to truncate tokens from left?
As we can see in the below code snippet, specifying max_length and truncation for a tokenizer cuts excess tokens from the left:
tokenizer("hello, my name", truncation=True, max_length=6).input_ids
...
4 votes · 1 answer · 5k views
PyTorch GPU memory leak during inference
I am trying to encode documents sentence-wise with a huggingface transformer module. I'm using the very small google/bert_uncased_L-2_H-128_A-2 pretrained model with the following code:
def ...
4 votes · 1 answer · 3k views
How to process TransformerEncoderLayer output in pytorch
I am trying to use bio-bert sentence embeddings for text classification of longer pieces of text.
As it currently stands I standardize the number of sentences in each piece of text (some sentences are ...
4 votes · 0 answers · 469 views
How to train a Masked Language Model with a big text corpus(200GB) using PyTorch?
Recently I have been training a masked language model with a big text corpus (200 GB) using transformers. The training data is too big to fit into a computer equipped with 512 GB of memory and 8 V100 (32 GB) GPUs. Is it ...
4 votes · 0 answers · 1k views
Word embeddings with BERT and map tensors to words
I am trying to aggregate BERT embeddings at the token level. For each token in the corpus vocabulary, I would like to create a list of all their contextual embeddings and average them to get one ...
3 votes · 4 answers · 18k views
Cannot import BertModel from transformers
I am trying to import BertModel from transformers, but it fails. This is the code I am using:
from transformers import BertModel, BertForMaskedLM
This is the error I get
ImportError: cannot import name '...