All Questions
Tagged with bert-language-model nlp
687 questions
94 votes · 10 answers · 95k views
How to use BERT for long text classification?
We know that BERT has a maximum length limit of 512 tokens, so if an article is much longer than 512 tokens, say 10,000 tokens of text, how can BERT be used?
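A common workaround for this limit (a sketch, not necessarily what the accepted answer does) is to split the document into overlapping chunks of at most 512 tokens, run the classifier on each chunk, and aggregate the per-chunk logits; the model name, stride, and averaging step below are illustrative choices.

    # Sketch: classify a long document by windowing over its tokens
    # and averaging the per-chunk logits.
    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    model.eval()

    def classify_long_text(text, max_len=512, stride=256):
        # Tokenize without truncation, then slide a window over the token ids.
        ids = tokenizer.encode(text, add_special_tokens=False)
        chunk_logits = []
        for start in range(0, max(1, len(ids)), stride):
            chunk = ids[start:start + max_len - 2]          # leave room for [CLS]/[SEP]
            input_ids = tokenizer.build_inputs_with_special_tokens(chunk)
            input_ids = torch.tensor([input_ids])
            with torch.no_grad():
                logits = model(input_ids).logits
            chunk_logits.append(logits)
            if start + max_len - 2 >= len(ids):
                break
        return torch.cat(chunk_logits).mean(dim=0)          # aggregate chunk predictions

    print(classify_long_text("some very long article ..."))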
50 votes · 10 answers · 125k views
CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)
I got the following error when I ran my PyTorch deep learning model in Google Colab:
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
1370 ret = ...
31 votes · 6 answers · 40k views
How to cluster similar sentences using BERT
For ELMo, FastText and Word2Vec, I'm averaging the word embeddings within a sentence and using HDBSCAN/KMeans clustering to group similar sentences.
A good example of the implementation can be seen ...
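A minimal sketch of the analogous BERT-based pipeline, assuming the sentence-transformers and scikit-learn packages; the model name all-MiniLM-L6-v2, the example sentences, and the cluster count are illustrative:

    # Sketch: embed sentences with sentence-transformers and cluster with KMeans.
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    sentences = [
        "The cat sits on the mat.",
        "A dog is playing in the garden.",
        "Stocks fell sharply on Monday.",
        "Markets dropped after the announcement.",
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(sentences)          # shape: (n_sentences, dim)

    kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(embeddings)
    for sentence, label in zip(sentences, kmeans.labels_):
        print(label, sentence)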
22 votes · 6 answers · 27k views
AttributeError: module 'torch' has no attribute '_six'. BERT model in PyTorch
I tried to load a pre-trained model using the BertModel class in PyTorch.
I have _six.py under torch, but it still shows module 'torch' has no attribute '_six'.
import torch
from pytorch_pretrained_bert ...
21 votes · 1 answer · 30k views
PyTorch: RuntimeError: Input, output and indices must be on the current device
I am running a BERT model on torch. It's a multi-class sentiment classification task with about 30,000 rows. I have already put everything on CUDA, but I'm not sure why I'm getting the following runtime ...
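This error typically means at least one tensor (often the input ids or labels) is still on the CPU while the model is on the GPU; a minimal sketch of moving both the model and every batch tensor to the same device, assuming a model and dataloader already exist:

    # Sketch: put the model and every input tensor on the same device.
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)                       # 'model' is your BERT classifier

    for batch in dataloader:                       # 'dataloader' yields dicts of tensors
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(input_ids=batch["input_ids"],
                        attention_mask=batch["attention_mask"],
                        labels=batch["labels"])
        loss = outputs.loss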
17 votes · 5 answers · 67k views
Transformer: Error importing packages. "ImportError: cannot import name 'SAVE_STATE_WARNING' from 'torch.optim.lr_scheduler'"
I am working on a machine learning project on Google Colab; it seems that recently there has been an issue when trying to import packages from transformers. The error message says:
ImportError: cannot import ...
17 votes · 2 answers · 11k views
Difficulty in understanding the tokenizer used in the RoBERTa model
from transformers import AutoModel, AutoTokenizer
tokenizer1 = AutoTokenizer.from_pretrained("roberta-base")
tokenizer2 = AutoTokenizer.from_pretrained("bert-base-cased")
sequence = "A Titan RTX has ...
16 votes · 3 answers · 23k views
How to understand hidden_states of the returns in BertModel? (huggingface-transformers)
Returns last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)): Sequence of hidden-states at the output of the last layer of the model.
pooler_output (torch....
15 votes · 3 answers · 35k views
Python: BERT Error - Some weights of the model checkpoint at were not used when initializing BertModel
I am creating an entity extraction model in PyTorch using bert-base-uncased but when I try to run the model I get this error:
Error:
Some weights of the model checkpoint at D:\Transformers\bert-entity-...
13 votes · 4 answers · 8k views
How to fine-tune BERT on unlabeled data?
I want to fine-tune BERT on a specific domain. I have texts of that domain in text files. How can I use these to fine-tune BERT?
I am looking here currently.
My main objective is to get sentence ...
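Fine-tuning on unlabeled domain text usually means continuing the masked-language-model objective; a hedged sketch using the HuggingFace Trainer, where the file path, block size, and hyperparameters are placeholders:

    # Sketch: continue masked-language-model training on raw domain text.
    from transformers import (BertTokenizerFast, BertForMaskedLM,
                              DataCollatorForLanguageModeling, LineByLineTextDataset,
                              Trainer, TrainingArguments)

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")

    # One sentence per line in a plain text file (path is a placeholder).
    dataset = LineByLineTextDataset(tokenizer=tokenizer,
                                    file_path="domain_corpus.txt",
                                    block_size=128)
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                               mlm=True, mlm_probability=0.15)

    args = TrainingArguments(output_dir="bert-domain-mlm",
                             num_train_epochs=1,
                             per_device_train_batch_size=16)

    Trainer(model=model, args=args,
            data_collator=collator, train_dataset=dataset).train()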
12 votes · 2 answers · 12k views
How to train BERT from scratch on a new domain for both MLM and NSP?
I’m trying to train a BERT model from scratch on my own dataset using the HuggingFace library. I would like to train the model in a way that it has the exact architecture of the original BERT model.
In ...
12 votes · 1 answer · 8k views
What is the difference between Sentence Encodings and Contextualized Word Embeddings?
I have seen both terms used while reading papers about BERT and ELMo so I wonder if there is a difference between them.
12 votes · 4 answers · 11k views
Training TFBertForSequenceClassification with custom X and Y data
I am working on a text classification problem, for which I am trying to train my model with TFBertForSequenceClassification from the huggingface-transformers library.
I followed the example given on ...
11 votes · 2 answers · 14k views
Continual pre-training vs. Fine-tuning a language model with MLM
I have some custom data I want to use to further pre-train the BERT model. I’ve tried the two following approaches so far:
Starting with a pre-trained BERT checkpoint and continuing the pre-training ...
11 votes · 2 answers · 13k views
How to use Transformers for text classification?
I have two questions about how to use the TensorFlow implementation of the Transformer for text classification.
First, it seems people mostly used only the encoder layer to do the text classification ...
11 votes · 1 answer · 6k views
What is so special about special tokens?
What exactly is the difference between a "token" and a "special token"?
I understand the following:
what is a typical token
what is a typical special token: MASK, UNK, SEP, etc
when ...
10 votes · 4 answers · 14k views
Is it necessary to do stopword removal and stemming/lemmatization for text classification when using spaCy or BERT?
Are stopword removal, stemming, and lemmatization necessary for text classification when using spaCy, BERT, or other advanced NLP models to get the vector embedding of the text?
text="The ...
10 votes · 3 answers · 9k views
Using trained BERT Model and Data Preprocessing
When using pre-trained BERT embeddings from PyTorch (which are then fine-tuned), should the text data fed into the model be pre-processed like in any standard NLP task?
For instance, should ...
9 votes · 2 answers · 12k views
How to find the closest word to a vector using BERT
I am trying to get the textual representation (or the closest word) of a given word embedding using BERT. Basically I am trying to get similar functionality as in gensim:
>>> your_word_vector = ...
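One hedged way to mimic gensim's most_similar is to compare the query vector against BERT's static input embedding matrix with cosine similarity; this only finds the nearest vocabulary token, not a context-dependent neighbour:

    # Sketch: find the vocabulary token whose input embedding is closest
    # (by cosine similarity) to a given vector.
    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    emb = model.get_input_embeddings().weight.detach()   # (vocab_size, hidden_size)

    def closest_tokens(vector, top_k=5):
        vector = vector / vector.norm()
        matrix = emb / emb.norm(dim=1, keepdim=True)
        scores = matrix @ vector                          # cosine similarity per token
        best = torch.topk(scores, top_k).indices
        return tokenizer.convert_ids_to_tokens(best.tolist())

    # Example: the embedding of "king" should be its own nearest neighbour.
    king_id = tokenizer.convert_tokens_to_ids("king")
    print(closest_tokens(emb[king_id]))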
9 votes · 2 answers · 7k views
How to get all documents per topic in BERTopic modeling
I have a dataset and am trying to convert it to topics using BERTopic modeling, but the problem is I can't get all the documents of a topic. BERTopic only returns 3 documents per topic.
topic_model = ...
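get_representative_docs() only stores a few exemplar documents per topic; a hedged workaround is to keep the per-document topic assignments returned by fit_transform and group the documents yourself (docs below stands for your own list of texts):

    # Sketch: group every document by its assigned topic using the
    # assignments returned by fit_transform.
    from collections import defaultdict
    from bertopic import BERTopic

    topic_model = BERTopic()
    topics, probs = topic_model.fit_transform(docs)   # docs: your list of texts

    docs_per_topic = defaultdict(list)
    for doc, topic in zip(docs, topics):
        docs_per_topic[topic].append(doc)

    print(len(docs_per_topic[0]))     # all documents assigned to topic 0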
9 votes · 2 answers · 3k views
BERT output not deterministic
BERT output is not deterministic.
I expect the output values to be deterministic when I put in the same input, but with my BERT model the values keep changing. Awkwardly, the same value is returned twice, ...
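The usual cause is that dropout is still active because the model is in training mode; calling model.eval() and disabling gradients normally makes repeated forward passes identical. A minimal sketch:

    # Sketch: dropout is active in training mode, so identical inputs give
    # different outputs; model.eval() disables it.
    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")
    model.eval()                                    # turn off dropout

    inputs = tokenizer("the same sentence", return_tensors="pt")
    with torch.no_grad():
        out1 = model(**inputs).last_hidden_state
        out2 = model(**inputs).last_hidden_state

    print(torch.allclose(out1, out2))               # True once dropout is off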
9 votes · 1 answer · 7k views
Clause extraction / long sentence segmentation in python
I'm currently working on a project involving sentence vectors (from a RoBERTa pretrained model). These vectors are lower quality when sentences are long, and my corpus contains many long sentences ...
9 votes · 1 answer · 4k views
Why does the BERT model have to keep 10% of masked tokens unchanged?
I am reading the BERT model paper. In the masked language model task during pre-training, the paper says the model will choose 15% of tokens randomly. Of the chosen tokens (Ti), 80% will be replaced ...
9 votes · 1 answer · 8k views
How do I use BertForMaskedLM or BertModel to calculate perplexity of a sentence?
I want to use BertForMaskedLM or BertModel to calculate perplexity of a sentence, so I write code like this:
import numpy as np
import torch
import torch.nn as nn
from transformers import ...
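Since BERT is not a left-to-right language model, "perplexity" is usually approximated by masking one position at a time and averaging the negative log-likelihoods (a pseudo-perplexity); a hedged sketch:

    # Sketch: pseudo-perplexity for a masked LM, masking one position at a time.
    import math
    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    def pseudo_perplexity(sentence):
        ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
        nlls = []
        for i in range(1, len(ids) - 1):            # skip [CLS] and [SEP]
            masked = ids.clone()
            masked[i] = tokenizer.mask_token_id
            with torch.no_grad():
                logits = model(masked.unsqueeze(0)).logits[0, i]
            log_probs = torch.log_softmax(logits, dim=-1)
            nlls.append(-log_probs[ids[i]].item())
        return math.exp(sum(nlls) / len(nlls))

    print(pseudo_perplexity("The quick brown fox jumps over the lazy dog."))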
8 votes · 1 answer · 18k views
How is the number of parameters calculated in the BERT model?
The paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Devlin & Co. calculated for the base model size 110M parameters (i.e. L=12, H=768, A=12) ...
8 votes · 1 answer · 9k views
How to calculate perplexity of a sentence using huggingface masked language models?
I have several masked language models (mainly Bert, Roberta, Albert, Electra). I also have a dataset of sentences. How can I get the perplexity of each sentence?
From the huggingface documentation ...
8 votes · 1 answer · 4k views
Uni-directional Transformer VS Bi-directional BERT
I just finished reading the Transformer paper and the BERT paper, but I couldn't figure out why the Transformer is uni-directional and BERT is bi-directional, as mentioned in the BERT paper. As they don't use ...
8 votes · 1 answer · 14k views
How to store Word vector Embeddings?
I am using BERT Word Embeddings for sentence classification task with 3 labels. I am using Google Colab for coding. My problem is, since I will have to execute the embedding part every time I restart ...
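One hedged option is to compute the embeddings once, save the tensor to Google Drive, and reload it in later sessions instead of re-running BERT; the path below is a placeholder and embeddings stands for whatever tensor your code produces:

    # Sketch: compute embeddings once, save the tensor to disk, reload later.
    import torch

    # embeddings: a (num_sentences, hidden_size) tensor produced by your BERT code
    torch.save(embeddings, "/content/drive/MyDrive/sentence_embeddings.pt")

    # In a later Colab session (after mounting Drive again):
    embeddings = torch.load("/content/drive/MyDrive/sentence_embeddings.pt")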
8 votes · 1 answer · 3k views
HuggingFace BERT `inputs_embeds` giving unexpected result
The HuggingFace BERT TensorFlow implementation allows us to feed in a precomputed embedding in place of the embedding lookup that is native to BERT. This is done using the model's call method's ...
7 votes · 3 answers · 19k views
Why can't I import functions in bert after pip install bert
I am a beginner with BERT, and I am trying to use the BERT files given on GitHub: https://github.com/google-research/bert
However, I cannot import files (such as run_classifier, optimization and so on)...
7 votes · 2 answers · 14k views
The model did not return a loss from the inputs - LaBSE error
I want to fine-tune LaBSE for question answering using the SQuAD dataset, and I got this error:
ValueError: The model did not return a loss from the inputs, only the following keys: last_hidden_state,...
7 votes · 1 answer · 8k views
max_seq_length for transformer (Sentence-BERT)
I'm using sentence-BERT from Huggingface in the following way:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
model.max_seq_length = 512
model....
7 votes · 1 answer · 6k views
Passing multiple sentences to BERT?
I have a dataset with paragraphs that I need to classify into two classes. These paragraphs are usually 3-5 sentences long. The overwhelming majority of them are less than 500 words long. I would like ...
7 votes · 1 answer · 14k views
How does padding in the huggingface tokenizer work?
I tried the following tokenization example:
tokenizer = BertTokenizer.from_pretrained(MODEL_TYPE, do_lower_case=True)
sent = "I hate this. Not that.",
_tokenized = tokenizer(sent, ...
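A hedged illustration of the two common padding modes: padding=True pads to the longest sequence in the batch, while padding="max_length" pads every sequence to the given max_length; the sentences and length below are illustrative.

    # Sketch: how the two common padding modes differ.
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    sentences = ["I hate this.", "Not that."]

    # Pad to the longest sentence in this batch.
    batch_longest = tokenizer(sentences, padding=True, return_tensors="pt")

    # Pad every sentence to a fixed length of 12 tokens.
    batch_fixed = tokenizer(sentences, padding="max_length", max_length=12,
                            truncation=True, return_tensors="pt")

    print(batch_longest["input_ids"].shape)   # (2, length of longest sentence)
    print(batch_fixed["input_ids"].shape)     # (2, 12)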
7 votes · 1 answer · 2k views
Fine-tune Bert for specific domain (unsupervised)
I want to fine-tune BERT on texts that are related to a specific domain (in my case related to engineering). The training should be unsupervised since I don't have any labels or anything. Is this ...
7 votes · 1 answer · 2k views
Use BERT under spaCy to get sentence embeddings
I am trying to use BERT to get sentence embeddings. Here is how I am doing it:
import spacy
nlp = spacy.load("en_core_web_trf")
nlp("The quick brown fox jumps over the lazy dog")....
7 votes · 1 answer · 2k views
What is the significance of the magnitude/norm of BERT word embeddings?
We generally compare similarity between word embeddings with cosine similarity, but this only takes into account the angle between the vectors, not the norm. With word2vec, the norm of the vector ...
7 votes · 1 answer · 5k views
Token indices sequence length error when using encode_plus method
I got a strange error when trying to encode question-answer pairs for BERT using the encode_plus method provided in the Transformers library.
I am using data from this Kaggle competition. Given a ...
6 votes · 2 answers · 11k views
BERT get sentence embedding
I am replicating code from this page. I have downloaded the BERT model to my local system and am getting sentence embeddings.
I have around 500,000 sentences for which I need sentence embedding and it is ...
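For hundreds of thousands of sentences, running BERT in batches (on GPU if available) with mean pooling over the last hidden state is a common speed-up; a hedged sketch where the batch size and pooling choice are illustrative:

    # Sketch: batched sentence embeddings via mean pooling over the last hidden state.
    import torch
    from transformers import BertTokenizer, BertModel

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased").to(device).eval()

    def embed(sentences, batch_size=64):
        all_vecs = []
        for i in range(0, len(sentences), batch_size):
            batch = tokenizer(sentences[i:i + batch_size], padding=True,
                              truncation=True, return_tensors="pt").to(device)
            with torch.no_grad():
                hidden = model(**batch).last_hidden_state          # (B, T, H)
            mask = batch["attention_mask"].unsqueeze(-1)           # ignore padding
            vecs = (hidden * mask).sum(1) / mask.sum(1)
            all_vecs.append(vecs.cpu())
        return torch.cat(all_vecs)

    print(embed(["a first sentence", "a second, longer sentence"]).shape)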
6 votes · 1 answer · 11k views
BertWordPieceTokenizer vs BertTokenizer from HuggingFace
I have the following pieces of code and trying to understand the difference between BertWordPieceTokenizer and BertTokenizer.
BertWordPieceTokenizer (Rust based)
from tokenizers import ...
6 votes · 2 answers · 6k views
Can you train a BERT model from scratch with task specific architecture?
BERT pre-training of the base model is done with a language modeling approach, where we mask a certain percentage of tokens in a sentence and make the model learn to predict those masked tokens. Then, I think in ...
6 votes · 3 answers · 5k views
How to stop BERT from breaking apart specific words into word-piece
I am using a pre-trained BERT model to tokenize text into meaningful tokens. However, the text has many specific words and I don't want the BERT model to break them into word-pieces. Is there any ...
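One hedged approach is to register the domain-specific words as whole tokens in the tokenizer and resize the model's embedding matrix to match; the new embedding rows start randomly initialised, so some further training is usually needed. The example words are illustrative:

    # Sketch: register domain words as whole tokens so they are not split.
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    new_words = ["electrocardiogram", "myocarditis"]     # illustrative domain terms
    tokenizer.add_tokens(new_words)
    model.resize_token_embeddings(len(tokenizer))        # new rows are randomly initialised

    print(tokenizer.tokenize("the electrocardiogram was normal"))
    # ['the', 'electrocardiogram', 'was', 'normal'] instead of word-pieces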
6 votes · 2 answers · 3k views
How to test masked language model after training it?
I have followed this tutorial for masked language modelling from Hugging Face using BERT, but I am unsure how to actually deploy the model.
Tutorial: https://github.com/huggingface/notebooks/blob/...
6 votes · 1 answer · 5k views
Using BERT to generate similar word or synonyms through word embeddings
As we all know, the BERT model has strong word embedding capabilities, probably better than word2vec or any other model.
I want to create a model on BERT word embedding to generate synonyms or ...
6 votes · 1 answer · 8k views
Sliding window for long text in BERT for Question Answering
I've read a post which explains how the sliding window works, but I cannot find any information on how it is actually implemented.
From what I understand, if the input is too long, a sliding window can be ...
6 votes · 1 answer · 9k views
Using BERT Embeddings in Keras Embedding layer
I want to use BERT word vector embeddings in the embedding layer of an LSTM instead of the usual default embedding layer. Is there any way I can do it?
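Rather than copying vectors into a keras.layers.Embedding, a common pattern (sketched here, with the sequence length, LSTM size, and label count as assumptions) is to use TFBertModel itself as the first layer and run the LSTM over its hidden states:

    # Sketch: use TFBertModel as the "embedding layer" of a Keras model
    # and run an LSTM over its hidden states.
    import tensorflow as tf
    from transformers import TFBertModel

    bert = TFBertModel.from_pretrained("bert-base-uncased")

    input_ids = tf.keras.Input(shape=(128,), dtype=tf.int32, name="input_ids")
    attention_mask = tf.keras.Input(shape=(128,), dtype=tf.int32, name="attention_mask")

    hidden = bert(input_ids, attention_mask=attention_mask)[0]   # last hidden state (B, 128, 768)
    x = tf.keras.layers.LSTM(64)(hidden)
    output = tf.keras.layers.Dense(2, activation="softmax")(x)

    model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=output)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.summary()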
6 votes · 2 answers · 7k views
Latest Pre-trained Multilingual Word Embedding
Are there any recent pre-trained multilingual word embeddings (where multiple languages are jointly mapped to the same vector space)?
I have looked at the following but they don't fit my needs:
FastText / ...
6 votes · 1 answer · 939 views
How to predict the probability of an empty string using BERT
Suppose we have a template sentence like this:
"The ____ house is our meeting place."
and we have a list of adjectives to fill in the blank, e.g.:
"yellow"
"large"
&...
6 votes · 0 answers · 6k views
How to add index to python FAISS incrementally
I am using Faiss to index my huge dataset of embeddings, generated from a BERT model. I want to add the embeddings incrementally; it works fine if I only add them with faiss.IndexFlatL2, but ...
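With an IVF index the extra step is training the index once on a representative sample before calling add(); after that, batches of embeddings can be added incrementally. A hedged sketch with synthetic data, where the dimension and nlist are illustrative:

    # Sketch: train an IVF index once, then add embedding batches incrementally.
    import numpy as np
    import faiss

    d = 768                                   # BERT embedding dimension
    nlist = 100                               # number of IVF cells
    quantizer = faiss.IndexFlatL2(d)
    index = faiss.IndexIVFFlat(quantizer, d, nlist)

    training_sample = np.random.random((10000, d)).astype("float32")
    index.train(training_sample)              # must be trained before add()

    for _ in range(5):                        # add batches as they are produced
        batch = np.random.random((2000, d)).astype("float32")
        index.add(batch)

    print(index.ntotal)                       # 10000 vectors indexed so far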
5 votes · 1 answer · 5k views
How to get the probability of a particular token(word) in a sentence given the context
I'm trying to calculate the probability or any type of score for words in a sentence using NLP. I've tried this approach with GPT2 model using Huggingface Transformers library, but, I couldn't get ...