Questions tagged [bert-language-model]
BERT, or Bidirectional Encoder Representations from Transformers, is a method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. BERT uses Transformers (an attention mechanism that learns contextual relations between words or subwords in a text) to generate a language model.
1,802 questions
94 votes · 10 answers · 95k views
How to use BERT for long text classification?
We know that BERT has a maximum input length of 512 tokens. So if an article is much longer than 512 tokens, say 10,000 tokens,
how can BERT be used?
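One common workaround, sketched below under assumptions not stated in the question (model name bert-base-uncased, a binary classifier, an overlap of 128 tokens): split the long text into overlapping 512-token chunks with the tokenizer's overflow feature, run the classifier on each chunk, and average the logits.

import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

def classify_long_text(text):
    # Chunk the text into overlapping 512-token windows instead of truncating it.
    enc = tokenizer(text, max_length=512, truncation=True, stride=128,
                    return_overflowing_tokens=True, padding=True,
                    return_tensors="pt")
    enc.pop("overflow_to_sample_mapping")   # bookkeeping key, not a model input
    with torch.no_grad():
        logits = model(**enc).logits        # one row of logits per chunk
    return logits.mean(dim=0)               # naive aggregation: average over chunks

Other aggregation strategies (max pooling over chunks, a small recurrent head, or long-context models such as Longformer) are also common answers to this question.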
50 votes · 10 answers · 125k views
CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)
I got the following error when I ran my PyTorch deep learning model in Google Colab
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
1370 ret = ...
46 votes · 5 answers · 58k views
ValueError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]] - Tokenizing BERT / Distilbert Error
def split_data(path):
    df = pd.read_csv(path)
    return train_test_split(df, test_size=0.1, random_state=100)

train, test = split_data(DATA_DIR)
train_texts, train_labels = train['text'].to_list(), ...
43 votes · 2 answers · 27k views
Why does the BERT transformer use the [CLS] token for classification instead of averaging over all tokens?
I am experimenting with the BERT architecture and found that most fine-tuning tasks take the final hidden layer as the text representation, and later pass it to other models for further ...
39 votes · 3 answers · 36k views
dropout(): argument 'input' (position 1) must be Tensor, not str when using Bert with Huggingface
My code was working fine and when I tried to run it today without changing anything I got the following error:
dropout(): argument 'input' (position 1) must be Tensor, not str
Would appreciate if ...
31 votes · 6 answers · 40k views
How to cluster similar sentences using BERT
For ELMo, FastText and Word2Vec, I'm averaging the word embeddings within a sentence and using HDBSCAN/KMeans clustering to group similar sentences.
A good example of the implementation can be seen ...
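A hedged sketch of the sentence-level analogue: instead of averaging word vectors, encode whole sentences with a Sentence-BERT model from the sentence-transformers library and cluster the resulting vectors. The model name all-MiniLM-L6-v2, the example sentences and the cluster count are illustrative assumptions, not from the question.

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

sentences = ["How do I reset my password?",
             "I forgot my login credentials.",
             "What is the shipping cost to Canada?"]

model = SentenceTransformer("all-MiniLM-L6-v2")   # any sentence-level BERT model works here
embeddings = model.encode(sentences)              # shape: (n_sentences, dim)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for sentence, label in zip(sentences, labels):
    print(label, sentence)

HDBSCAN can be swapped in for KMeans exactly as with the averaged Word2Vec vectors.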
24 votes · 1 answer · 49k views
How do the max_length, padding and truncation arguments work in HuggingFace's BertTokenizerFast.from_pretrained('bert-base-uncased')?
I am working on a text classification problem where I want to use the BERT model as the base, followed by dense layers. I want to know how the three arguments work. For example, if I have 3 sentences ...
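A small sketch (with made-up sentences, not the asker's data) showing how the three arguments interact: truncation cuts off anything longer than max_length, and padding="max_length" pads everything shorter up to it, so every row ends up exactly max_length tokens long.

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
sentences = ["short", "a slightly longer sentence", "an even longer example sentence right here"]

enc = tokenizer(sentences,
                max_length=8,          # upper bound on tokens, including [CLS] and [SEP]
                truncation=True,       # cut sequences longer than max_length
                padding="max_length",  # pad shorter sequences up to max_length
                return_tensors="pt")

print(enc["input_ids"].shape)      # (3, 8): every row is exactly max_length long
print(enc["attention_mask"][0])    # 1s for real tokens, 0s for [PAD] positions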
24 votes · 1 answer · 62k views
PyTorch BERT TypeError: forward() got an unexpected keyword argument 'labels'
Training a BERT model using PyTorch transformers (following the tutorial here).
The following statement in the tutorial:
loss = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=...
23 votes · 3 answers · 21k views
Cased vs. uncased BERT models in spaCy and training data
I want to use spaCy's pretrained BERT model for text classification, but I'm a little confused about cased/uncased models. I read somewhere that cased models should only be used when there is a chance ...
22 votes · 6 answers · 27k views
AttributeError: module 'torch' has no attribute '_six'. Bert model in Pytorch
I tried to load a pre-trained model using the BertModel class in PyTorch.
I have _six.py under torch, but it still shows module 'torch' has no attribute '_six'
import torch
from pytorch_pretrained_bert ...
21 votes · 1 answer · 30k views
PyTorch: RuntimeError: Input, output and indices must be on the current device
I am running a BERT model on torch. It's a multi-class sentiment classification task with about 30,000 rows. I have already put everything on CUDA, but I'm not sure why I'm getting the following runtime ...
19 votes · 5 answers · 68k views
Pytorch: IndexError: index out of range in self. How to solve?
This training code is based on the run_glue.py script found here:
# Set the seed value all over the place to make this reproducible.
seed_val = 42
random.seed(seed_val)
np.random.seed(seed_val)
torch....
18 votes · 1 answer · 12k views
BertForSequenceClassification vs. BertForMultipleChoice for sentence multi-class classification
I'm working on a text classification problem (e.g. sentiment analysis), where I need to classify a text string into one of five classes.
I just started using the Huggingface Transformer package and ...
17 votes · 2 answers · 33k views
The size of tensor a (707) must match the size of tensor b (512) at non-singleton dimension 1
I am trying to do text classification using a pretrained BERT model. I trained the model on my dataset, and I am now in the testing phase. I know that BERT can only take up to 512 tokens, so I wrote an if condition ...
17 votes · 5 answers · 67k views
Transformer: Error importing packages. "ImportError: cannot import name 'SAVE_STATE_WARNING' from 'torch.optim.lr_scheduler'"
I am working on a machine learning project on Google Colab; it seems there has recently been an issue when trying to import packages from transformers. The error message says:
ImportError: cannot import ...
17 votes · 2 answers · 11k views
Difficulty in understanding the tokenizer used in the RoBERTa model
from transformers import AutoModel, AutoTokenizer
tokenizer1 = AutoTokenizer.from_pretrained("roberta-base")
tokenizer2 = AutoTokenizer.from_pretrained("bert-base-cased")
sequence = "A Titan RTX has ...
16 votes · 2 answers · 31k views
Download pre-trained sentence-transformers model locally
I am using the SentenceTransformers library (here: https://pypi.org/project/sentence-transformers/#pretrained-models) for creating embeddings of sentences using the pre-trained model bert-base-nli-...
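A hedged sketch of one way to keep the model locally: load it once (which downloads it into the Hugging Face cache), save it to a directory of your choice, and load from that path afterwards. The target directory name is a placeholder.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bert-base-nli-mean-tokens")   # downloads on first call
model.save("./local-bert-base-nli-mean-tokens")            # writes weights + config locally

# later, offline:
model = SentenceTransformer("./local-bert-base-nli-mean-tokens")
embeddings = model.encode(["a test sentence"])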
16 votes · 3 answers · 23k views
How to understand hidden_states of the returns in BertModel? (huggingface-transformers)
Returns last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)): Sequence of hidden-states at the output of the last layer of the model.
pooler_output (torch....
15 votes · 3 answers · 21k views
BERT sentence embeddings from transformers
I'm trying to get sentence vectors from hidden states in a BERT model. Looking at the huggingface BertModel instructions here, which say:
from transformers import BertTokenizer, BertModel
tokenizer = ...
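A minimal sketch of the usual continuation of that snippet (not the asker's exact code): run the model, take last_hidden_state, and mean-pool over the real (non-padding) tokens to get one vector per sentence. The example sentences are placeholders.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = ["The cat sat on the mat.", "Dogs are great companions."]
enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    last_hidden = model(**enc).last_hidden_state       # (batch, seq_len, 768)

mask = enc["attention_mask"].unsqueeze(-1)             # zero out [PAD] positions
sentence_vectors = (last_hidden * mask).sum(1) / mask.sum(1)
print(sentence_vectors.shape)                          # (2, 768)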
15 votes · 3 answers · 35k views
Python: BERT Error - Some weights of the model checkpoint at were not used when initializing BertModel
I am creating an entity extraction model in PyTorch using bert-base-uncased but when I try to run the model I get this error:
Error:
Some weights of the model checkpoint at D:\Transformers\bert-entity-...
15 votes · 6 answers · 40k views
With BERT Text Classification, ValueError: too many dimensions 'str' error occurring
Trying to make a sentiment classifier for texts with a BERT model but getting ValueError: too many dimensions 'str'.
This is the DataFrame of the train data values, so they are the train_labels:
0 notr
...
15 votes · 2 answers · 10k views
BertModel transformers outputs string instead of tensor
I'm following this tutorial that codes a sentiment analysis classifier using BERT with the huggingface library, and I'm seeing very odd behavior. When trying the BERT model with a sample text I get a ...
14 votes · 1 answer · 14k views
PyTorch torch.no_grad() versus requires_grad=False
I'm following a PyTorch tutorial which uses the BERT NLP model (feature extractor) from the Huggingface Transformers library. There are two pieces of interrelated code for gradient updates that I don'...
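A small illustrative sketch (not from the tutorial) of the difference: requires_grad=False marks specific tensors or parameters as frozen, so they never receive gradients, while torch.no_grad() is a context that disables graph construction for whatever runs inside it.

import torch

w = torch.randn(3, requires_grad=True)
frozen = torch.randn(3, requires_grad=False)   # e.g. a frozen parameter

y = (w * frozen).sum()
y.backward()
print(w.grad)        # populated
print(frozen.grad)   # None: this leaf never takes part in gradient updates

with torch.no_grad():
    z = (w * 2).sum()        # no graph is recorded inside this block
print(z.requires_grad)       # False, so calling z.backward() would raise an error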
13 votes · 4 answers · 8k views
How to fine-tune BERT on unlabeled data?
I want to fine-tune BERT on a specific domain. I have texts of that domain in text files. How can I use these to fine-tune BERT?
I am looking here currently.
My main objective is to get sentence ...
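A hedged sketch of the usual answer, domain-adaptive pre-training with masked language modelling, assuming the domain text sits in a plain-text file with one example per line. The file path, hyperparameters and output directory are placeholders; newer code would typically build the dataset with the datasets library instead of the legacy LineByLineTextDataset helper.

from transformers import (BertTokenizerFast, BertForMaskedLM,
                          DataCollatorForLanguageModeling, LineByLineTextDataset,
                          Trainer, TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

dataset = LineByLineTextDataset(tokenizer=tokenizer,
                                file_path="domain_corpus.txt", block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm=True, mlm_probability=0.15)

args = TrainingArguments(output_dir="bert-domain-mlm",
                         num_train_epochs=1, per_device_train_batch_size=16)
Trainer(model=model, args=args, data_collator=collator,
        train_dataset=dataset).train()

The adapted checkpoint can then be loaded into a sentence-embedding or classification head as usual.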
12 votes · 2 answers · 12k views
How to train BERT from scratch on a new domain for both MLM and NSP?
I’m trying to train BERT model from scratch using my own dataset using HuggingFace library. I would like to train the model in a way that it has the exact architecture of the original BERT model.
In ...
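A minimal sketch (not the asker's code) of the architectural side of the question: a BertConfig with default values reproduces the original bert-base architecture, and BertForPreTraining carries both pre-training heads, MLM and NSP. The sentence pair and labels below are dummies just to show the two losses; a real run would mask 15% of tokens and stream sentence pairs from the corpus.

import torch
from transformers import BertConfig, BertForPreTraining, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")  # reuse the original vocab
config = BertConfig()                 # defaults match the original bert-base architecture
model = BertForPreTraining(config)    # randomly initialised, i.e. trained from scratch

# Encode a sentence pair the way BERT pre-training expects: [CLS] A [SEP] B [SEP]
enc = tokenizer("The cat sat on the mat.", "It was very fluffy.", return_tensors="pt")

# labels: MLM targets (here just the unmasked inputs); next_sentence_label: 0 = B follows A
outputs = model(**enc, labels=enc["input_ids"], next_sentence_label=torch.tensor([0]))

print(outputs.loss)                           # combined MLM + NSP loss
print(outputs.prediction_logits.shape)        # (1, seq_len, vocab_size): MLM head
print(outputs.seq_relationship_logits.shape)  # (1, 2): NSP head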
12 votes · 8 answers · 37k views
SSLError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /dslim/bert-base-NER/resolve/main/tokenizer_config.json
I am facing the issue below while loading the pretrained BERT model from HuggingFace, due to an SSL certificate error.
Error:
SSLError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries ...
12 votes · 1 answer · 8k views
What is the difference between Sentence Encodings and Contextualized Word Embeddings?
I have seen both terms used while reading papers about BERT and ELMo so I wonder if there is a difference between them.
12 votes · 4 answers · 11k views
Training TFBertForSequenceClassification with custom X and Y data
I am working on a text classification problem, for which I am trying to train my model on TFBertForSequenceClassification, given in the huggingface-transformers library.
I followed the example given on ...
12 votes · 3 answers · 37k views
OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index']
When I load the BERT pretrained model online I get this error OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index'] found in directory uncased_L-12_H-768_A-12 or '...
12 votes · 2 answers · 5k views
Get probability of multi-token word in MASK position
It is relatively easy to get a token's probability according to a language model, as the snippet below shows. You can get the output of a model, restrict yourself to the output of the masked token, ...
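For reference, a minimal single-token version of that idea (the multi-token case repeats it over several [MASK] positions). The sentence and target word are illustrative.

import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = f"The capital of France is {tokenizer.mask_token}."
enc = tokenizer(text, return_tensors="pt")
mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = model(**enc).logits                 # (1, seq_len, vocab_size)

probs = logits[0, mask_pos].softmax(dim=-1)
target_id = tokenizer.convert_tokens_to_ids("paris")
print(probs[target_id].item())                   # P("paris" | context) at the masked position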
11 votes · 2 answers · 14k views
Continual pre-training vs. Fine-tuning a language model with MLM
I have some custom data I want to use to further pre-train the BERT model. I’ve tried the two following approaches so far:
Starting with a pre-trained BERT checkpoint and continuing the pre-training ...
11 votes · 1 answer · 2k views
What's the difference between the "self-attention mechanism" and a "fully-connected" layer?
I am confused by these two structures. In theory, the outputs of both are connected to their inputs. What magic makes the "self-attention mechanism" more powerful than a fully-connected layer?
11 votes · 2 answers · 13k views
How to use Transformers for text classification?
I have two questions about how to use the TensorFlow implementation of the Transformer for text classification.
First, it seems people mostly use only the encoder layer to do the text classification ...
11 votes · 3 answers · 15k views
Transformers pretrained model with dropout setting
I'm trying to use the huggingface pretrained model bert-base-uncased from the transformers library, but I want to increase the dropout. There isn't any mention of this in the from_pretrained method, but Colab ran the object ...
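A hedged sketch of one way to do this: override the dropout fields on the config and hand the config to from_pretrained. The 0.3 values are arbitrary examples.

from transformers import BertConfig, BertModel

config = BertConfig.from_pretrained("bert-base-uncased",
                                    hidden_dropout_prob=0.3,
                                    attention_probs_dropout_prob=0.3)
model = BertModel.from_pretrained("bert-base-uncased", config=config)
print(model.config.hidden_dropout_prob)   # 0.3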
11 votes · 1 answer · 6k views
What is so special about special tokens?
What exactly is the difference between a "token" and a "special token"?
I understand the following:
what a typical token is
what a typical special token is: MASK, UNK, SEP, etc.
when ...
11 votes · 2 answers · 3k views
Removing SEP token in Bert for text classification
Given a sentiment classification dataset, I want to fine-tune Bert.
As you know, BERT was created to predict the next sentence given the current sentence. Thus, to make the network aware of this, ...
10 votes · 4 answers · 14k views
Is it necessary to do stopword removal, stemming/lemmatization for text classification while using spaCy or BERT?
Are stopword removal, stemming and lemmatization necessary for text classification when using spaCy, BERT or other advanced NLP models to get the vector embedding of the text?
text="The ...
10 votes · 2 answers · 21k views
How to add a new special token to the tokenizer?
I want to build a multi-class classification model for which I have conversational data as input for the BERT model (using bert-base-uncased).
QUERY: I want to ask a question.
ANSWER: Sure, ask away.
...
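A hedged sketch of the usual approach: register markers such as [QUERY] and [ANSWER] as additional special tokens and resize the embedding matrix so the model accepts the new ids. The marker names follow the conversational format in the question; the classifier setup is an assumption.

from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

tokenizer.add_special_tokens({"additional_special_tokens": ["[QUERY]", "[ANSWER]"]})
model.resize_token_embeddings(len(tokenizer))   # grow the embeddings for the new ids

print(tokenizer.tokenize("[QUERY] I want to ask a question. [ANSWER] Sure, ask away."))
# the new markers stay intact instead of being split into word pieces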
10 votes · 3 answers · 9k views
Using trained BERT Model and Data Preprocessing
When using pre-trained BERT embeddings from PyTorch (which are then fine-tuned), should the text data fed into the model be pre-processed as in any standard NLP task?
For instance, should ...
10 votes · 1 answer · 14k views
How to get intermediate layers' output of pre-trained BERT model in HuggingFace Transformers library?
(I'm following this pytorch tutorial about BERT word embeddings, and in the tutorial the author accesses the intermediate layers of the BERT model.)
What I want is to access the last, let's say, 4 ...
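A hedged sketch of one way to do that: ask the model to return all hidden states and slice off the last four. The model name and the choice to sum the layers are illustrative.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

enc = tokenizer("Here is some text to encode.", return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**enc).hidden_states   # tuple: embedding layer + 12 encoder layers

last_four = torch.stack(hidden_states[-4:])      # (4, 1, seq_len, 768)
token_vectors = last_four.sum(dim=0)             # e.g. sum the last four layers per token
print(len(hidden_states), token_vectors.shape)   # 13, then (1, seq_len, 768)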
10 votes · 1 answer · 3k views
How to use an existing huggingface-transformers model with spaCy?
I'm here to ask you guys if it is possible to use an existing trained huggingface-transformers model with spaCy.
My first naive attempt was to load it via spacy.load('bert-base-uncased'), but it didn't ...
10 votes · 3 answers · 12k views
BertTokenizer - when encoding and decoding sequences extra spaces appear
When using Transformers from HuggingFace I am facing a problem with the encoding and decoding method.
I have the following string:
test_string = 'text with percentage%'
Then I am running the ...
9 votes · 2 answers · 12k views
How to find the closest word to a vector using BERT
I am trying to get the textual representation (or the closest word) of a given word embedding using BERT. Basically I am trying to get functionality similar to gensim:
>>> your_word_vector = ...
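A hedged sketch of a gensim-style nearest-word lookup against BERT's static input embedding matrix. Note that contextual output vectors live in a related but different space, so this only approximates "closest word"; here the query vector is simply an existing row of the matrix.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

emb = model.get_input_embeddings().weight.detach()          # (vocab_size, 768)
your_word_vector = emb[tokenizer.convert_tokens_to_ids("king")]

sims = torch.nn.functional.cosine_similarity(your_word_vector.unsqueeze(0), emb)
best = sims.topk(5).indices
print(tokenizer.convert_ids_to_tokens(best.tolist()))        # nearest vocabulary tokens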
9 votes · 1 answer · 23k views
RuntimeError: The size of tensor a (4000) must match the size of tensor b (512) at non-singleton dimension 1
I'm trying to build a model for document classification. I'm using BERT with PyTorch.
I got the BERT model with the code below.
bert = AutoModel.from_pretrained('bert-base-uncased')
This is the code for ...
9 votes · 1 answer · 24k views
BERT tokenizer & model download
I'm a beginner. I'm working with BERT. However, due to the security of the company network, the following code cannot download the BERT model directly.
tokenizer = BertTokenizer.from_pretrained('bert-...
9 votes · 2 answers · 7k views
How to get all documents per topic in bertopic modeling
I have a dataset and am trying to convert it to topics using BERTopic modeling, but the problem is I can't get all the documents of a topic. BERTopic only returns 3 documents per topic.
topic_model = ...
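A hedged sketch of the usual workaround: fit_transform returns one topic id per document, so the full assignment can be grouped manually instead of relying on the few representative documents the model stores per topic. The 20-newsgroups corpus is only a stand-in for the asker's dataset.

import pandas as pd
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

docs = fetch_20newsgroups(subset="train",
                          remove=("headers", "footers", "quotes")).data[:500]

topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)

df = pd.DataFrame({"document": docs, "topic": topics})
docs_per_topic = df.groupby("topic")["document"].apply(list)
print(docs_per_topic)   # every document assigned to each topic, not just 3 exemplars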
9 votes · 2 answers · 3k views
BERT output not deterministic
BERT output is not deterministic.
I expect the output values to be deterministic when I feed in the same input, but with my BERT model the values keep changing. Awkwardly enough, the same value is returned twice, ...
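The usual culprit is dropout being active because the model is in training mode; a hedged sketch of the check:

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()   # make sure dropout is disabled (model.train() re-enables it)

enc = tokenizer("the same input twice", return_tensors="pt")
with torch.no_grad():
    out1 = model(**enc).last_hidden_state
    out2 = model(**enc).last_hidden_state

print(torch.allclose(out1, out2))   # True once dropout is off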
9 votes · 2 answers · 12k views
Outputting attention for bert-base-uncased with huggingface/transformers (torch)
I was following a paper on BERT-based lexical substitution (specifically trying to implement equation (2) - if someone has already implemented the whole paper that would also be great). Thus, I wanted ...
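A hedged sketch of retrieving the attention matrices: request output_attentions and read one tuple entry per layer, each of shape (batch, heads, seq_len, seq_len). The example sentence is a placeholder.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

enc = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")
with torch.no_grad():
    attentions = model(**enc).attentions       # tuple with one entry per layer

print(len(attentions), attentions[0].shape)    # 12, then (1, 12, seq_len, seq_len)
print(attentions[-1][0, 0])                    # head 0 of the last layer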
9 votes · 4 answers · 32k views
How to resolve ERROR: Could not build wheels for hdbscan, which is required to install pyproject.toml-based projects
I am trying to install bertopic and I got this error:
pip install bertopic
Collecting bertopic
> Using cached bertopic-0.11.0-py2.py3-none-any.whl (76 kB)
> Collecting ...
9 votes · 1 answer · 7k views
Clause extraction / long sentence segmentation in python
I'm currently working on a project involving sentence vectors (from a RoBERTa pretrained model). These vectors are lower quality when sentences are long, and my corpus contains many long sentences ...