Questions tagged [bert-language-model]

BERT, or Bidirectional Encoder Representations from Transformers, is a method of pre-training language representations that obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. BERT uses the Transformer (an attention mechanism that learns contextual relations between words or subwords in a text) to generate a language model.

9 votes
1 answer
4k views

Why does the BERT model keep 10% of the masked tokens unchanged?

I am reading the BERT paper. In the Masked Language Model task used during pre-training, the paper says the model chooses 15% of the tokens at random. Of the chosen tokens (Ti), 80% are replaced ...
Thanh Kiet
9 votes
1 answer
8k views

How do I use BertForMaskedLM or BertModel to calculate perplexity of a sentence?

I want to use BertForMaskedLM or BertModel to calculate perplexity of a sentence, so I write code like this: import numpy as np import torch import torch.nn as nn from transformers import ...
Kaim hong
  • 113
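A minimal sketch of the usual approach, assuming the Hugging Face transformers API (this is not the asker's code): mask each position in turn, read off the probability of the original token, and exponentiate the average negative log-probability to get a "pseudo-perplexity".

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_perplexity(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    log_probs = []
    for i in range(1, len(ids) - 1):              # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id       # mask one token at a time
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs.append(torch.log_softmax(logits[0, i], dim=-1)[ids[i]].item())
    return float(torch.exp(torch.tensor(-sum(log_probs) / len(log_probs))))

print(pseudo_perplexity("The quick brown fox jumps over the lazy dog."))
```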
9 votes
1 answer
9k views

BERT document embedding

I am trying to do document embedding using BERT. The code I use is a combination of two sources. I use BERT Document Classification Tutorial with Code, and BERT Word Embeddings Tutorial. Below is the ...
MRM
  • 1,159
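For reference, a hedged baseline sketch (plain transformers, not the two tutorials the asker combined): run the document through BertModel and use the last-layer [CLS] vector as the document embedding; mean pooling over the token vectors is a common alternative.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

doc = "BERT produces contextual token vectors that can be pooled into a single document vector."
inputs = tokenizer(doc, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

cls_embedding = outputs.last_hidden_state[:, 0, :]   # [CLS] token, shape (1, 768)
print(cls_embedding.shape)
```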
9 votes
1 answer
4k views

BERT embedding for semantic similarity

I posted this question earlier. I wanted to get embeddings similar to this YouTube video, from 33 minutes onward. 1) I don't think that the embeddings I am getting from the CLS token are similar to ...
user2543622
  • 6,258
8 votes
1 answer
18k views

How is the number of parameters calculated in the BERT model?

The paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Devlin et al. reports 110M parameters for the base model size (i.e. L=12, H=768, A=12) ...
EchoCache
  • 575
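The 110M figure is easy to verify empirically; a short sketch (using the transformers library, which the paper itself does not):

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")   # L=12, H=768, A=12
total = sum(p.numel() for p in model.parameters())
# Roughly 110M: about 24M in the token/position/segment embeddings plus
# about 7M per encoder layer x 12 layers, plus a small pooler.
print(f"{total:,} parameters")
```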
8 votes
1 answer
9k views

How to calculate perplexity of a sentence using huggingface masked language models?

I have several masked language models (mainly Bert, Roberta, Albert, Electra). I also have a dataset of sentences. How can I get the perplexity of each sentence? From the huggingface documentation ...
Penguin
  • 2,148
8 votes
4 answers
16k views

Error importing BERT: module 'tensorflow._api.v2.train' has no attribute 'Optimizer'

I tried to use bert-tensorflow in Google Colab, but I got the following error: --------------------------------------------------------------------------- AttributeError ...
Belkacem Thiziri
8 votes
1 answer
4k views

Uni-directional Transformer VS Bi-directional BERT

I just finished reading the Transformer paper and the BERT paper, but I couldn't figure out why the Transformer is uni-directional and BERT is bi-directional, as mentioned in the BERT paper. As they don't use ...
JShen
  • 409
8 votes
1 answer
14k views

How to store Word vector Embeddings?

I am using BERT Word Embeddings for sentence classification task with 3 labels. I am using Google Colab for coding. My problem is, since I will have to execute the embedding part every time I restart ...
PeakyBlinder
  • 1,107
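A common workaround (a sketch, not the asker's notebook): compute the embeddings once, save them to disk (e.g. a mounted Google Drive in Colab), and reload them after a restart instead of re-running BERT.

```python
import torch

# stands in for the real (num_sentences, hidden_size) tensor produced by BERT
embeddings = torch.randn(1000, 768)
torch.save(embeddings, "bert_embeddings.pt")     # persist once

# after the runtime restarts, load instead of recomputing
embeddings = torch.load("bert_embeddings.pt")
print(embeddings.shape)
```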
8 votes
3 answers
5k views

How to compute mean/max of HuggingFace Transformers BERT token embeddings with attention mask?

I'm using the HuggingFace Transformers BERT model, and I want to compute a summary vector (a.k.a. embedding) over the tokens in a sentence, using either the mean or max function. The complication is ...
stackoverflowuser2010
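A sketch of mask-aware pooling (assuming transformers plus plain PyTorch): zero out padding positions before taking the mean, and set them to -inf before taking the max, so [PAD] tokens cannot leak into the summary vector.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

batch = tokenizer(["a short sentence", "a somewhat longer example sentence"],
                  padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state              # (batch, seq_len, 768)

mask = batch["attention_mask"].unsqueeze(-1)               # (batch, seq_len, 1)
mean_pooled = (hidden * mask).sum(1) / mask.sum(1)         # padding excluded from the mean
max_pooled = hidden.masked_fill(mask == 0, float("-inf")).max(dim=1).values
print(mean_pooled.shape, max_pooled.shape)
```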
8 votes
6 answers
6k views

Problem with inputs when building a model with TFBertModel and AutoTokenizer from HuggingFace's transformers

I'm trying to build the model illustrated in this picture: I obtained a pre-trained BERT and respective tokenizer from HuggingFace's transformers in the following way: from transformers import ...
Gerardo Zinno
8 votes
1 answer
3k views

HuggingFace BERT `inputs_embeds` giving unexpected result

The HuggingFace BERT TensorFlow implementation allows us to feed in a precomputed embedding in place of the embedding lookup that is native to BERT. This is done using the model's call method's ...
Vivek Subramanian
7 votes
1 answer
5k views

How exactly should the input file be formatted for the language model finetuning (BERT through Huggingface Transformers)?

I wanted to employ the examples/run_lm_finetuning.py from the Huggingface Transformers repository on a pretrained Bert model. However, from following the documentation it is not evident how a corpus ...
nminds
  • 79
7 votes
3 answers
19k views

Why can't I import functions in bert after pip install bert?

I am a beginner with bert, and I am trying to use the files of bert given on GitHub: https://github.com/google-research/bert However I cannot import files (such as run_classifier, optimization and so on)...
Vicky Ding
7 votes
2 answers
14k views

The model did not return a loss from the inputs - LabSE error

I want to fine-tune LabSE for question answering using the SQuAD dataset, and I got this error: ValueError: The model did not return a loss from the inputs, only the following keys: last_hidden_state,...
Mateusz Pasierbek
7 votes
2 answers
6k views

What is the essence of learnable positional embeddings? Do they improve outcomes?

I was recently reading the BERT source code from the Hugging Face project. I noticed that the so-called "learnable position encoding" seems to refer to a specific nn.Parameter layer when it ...
AdamHommer
7 votes
1 answer
8k views

max_seq_length for transformer (Sentence-BERT)

I'm using sentence-BERT from Huggingface in the following way: from sentence_transformers import SentenceTransformer model = SentenceTransformer('all-MiniLM-L6-v2') model.max_seq_length = 512 model....
BlackHawk
  • 779
7 votes
1 answer
6k views

Passing multiple sentences to BERT?

I have a dataset with paragraphs that I need to classify into two classes. These paragraphs are usually 3-5 sentences long. The overwhelming majority of them are less than 500 words long. I would like ...
jhfodr76
  • 107
7 votes
1 answer
14k views

How does padding in the huggingface tokenizer work?

I tried the following tokenization example: tokenizer = BertTokenizer.from_pretrained(MODEL_TYPE, do_lower_case=True) sent = "I hate this. Not that.", _tokenized = tokenizer(sent, ...
MsA
  • 2,829
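A small sketch of what the padding flags do (transformers API): with padding=True every sequence in the batch is padded up to the longest one, and attention_mask marks real tokens versus [PAD].

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)
sents = ["I hate this.", "Not that."]

enc = tokenizer(sents, padding=True, truncation=True, max_length=512)
for ids, mask in zip(enc["input_ids"], enc["attention_mask"]):
    # the shorter sentence is filled with [PAD] ids and gets 0s in its attention_mask
    print(tokenizer.convert_ids_to_tokens(ids), mask)
```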
7 votes
1 answer
2k views

Fine-tune Bert for specific domain (unsupervised)

I want to fine-tune BERT on texts that are related to a specific domain (in my case related to engineering). The training should be unsupervised since I don't have any labels or anything. Is this ...
spadel
  • 1,036
7 votes
1 answer
8k views

Mismatched size on BertForSequenceClassification from Transformers and multiclass problem

I just trained a BERT model on a dataset composed of products and labels (departments) for an e-commerce website. It's a multiclass problem. I used BertForSequenceClassification to predict the ...
Guilherme Giuliano Nicolau
7 votes
1 answer
2k views

Use BERT under spaCy to get sentence embeddings

I am trying to use BERT to get sentence embeddings. Here is how I am doing it: import spacy nlp = spacy.load("en_core_web_trf") nlp("The quick brown fox jumps over the lazy dog")....
owise
  • 1,065
7 votes
2 answers
4k views

Pretraining a language model on a small custom corpus

I was curious if it is possible to use transfer learning in text generation, and re-train/pre-train it on a specific kind of text. For example, having a pre-trained BERT model and a small corpus ...
ysig
  • 477
7 votes
1 answer
8k views

How to specify a proxy in transformers pipeline

I am using sentiment-analysis pipeline as described here. from transformers import pipeline classifier = pipeline('sentiment-analysis') It's failing with a connection error message ValueError: ...
Kumar
  • 194
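One hedged workaround, assuming the documented `proxies` argument of from_pretrained (the proxy address is made up, and the model name is the sentiment pipeline's usual default): download the model and tokenizer through the proxy yourself, then hand them to the pipeline.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

proxies = {"http": "http://proxy.example.com:3128",     # hypothetical proxy URL
           "https": "http://proxy.example.com:3128"}

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name, proxies=proxies)
model = AutoModelForSequenceClassification.from_pretrained(model_name, proxies=proxies)

classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
print(classifier("I love this movie"))
```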
7 votes
1 answer
4k views

ModuleNotFoundError: No module named 'torch.utils._pytree'

I have installed PyTorch 1.7.1, and it works very well. However, when I try to run this code: import transformers from transformers import BertTokenizer from transformers.models.bert.modeling_bert ...
A_B_Y
  • 422
7 votes
1 answer
2k views

What is the significance of the magnitude/norm of BERT word embeddings?

We generally compare similarity between word embeddings with cosine similarity, but this only takes into account the angle between the vectors, not the norm. With word2vec, the norm of the vector ...
Keshinko
  • 328
7 votes
1 answer
5k views

Token indices sequence length error when using encode_plus method

I got a strange error when trying to encode question-answer pairs for BERT using the encode_plus method provided in the Transformers library. I am using data from this Kaggle competition. Given a ...
Niels
  • 1,191
6 votes
2 answers
9k views

How to untokenize BERT tokens?

I have a sentence and I need to return the text corresponding to N BERT tokens to the left and right of a specific word. from transformers import BertTokenizer tz = BertTokenizer.from_pretrained("...
JayJay
  • 183
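A sketch of going back from word pieces to text (transformers API): convert_tokens_to_string merges the "##" continuation pieces, and decode does the same starting from token ids.

```python
from transformers import BertTokenizer

tz = BertTokenizer.from_pretrained("bert-base-uncased")

sentence = "The patient received acetaminophen for the headache."
tokens = tz.tokenize(sentence)
print(tokens)                                      # rare words appear as word pieces
print(tz.convert_tokens_to_string(tokens[2:6]))    # text for a window of tokens

ids = tz.encode(sentence)
print(tz.decode(ids, skip_special_tokens=True))
```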
6 votes
2 answers
11k views

BERT get sentence embedding

I am replicating code from this page. I have downloaded the BERT model to my local system and getting sentence embedding. I have around 500,000 sentences for which I need sentence embedding and it is ...
user2543622
  • 6,258
6 votes
1 answer
11k views

BertWordPieceTokenizer vs BertTokenizer from HuggingFace

I have the following pieces of code and trying to understand the difference between BertWordPieceTokenizer and BertTokenizer. BertWordPieceTokenizer (Rust based) from tokenizers import ...
HopeKing
  • 3,413
6 votes
2 answers
6k views

Can you train a BERT model from scratch with a task-specific architecture?

BERT pre-training of the base model is done with a masked language modeling approach, where we mask a certain percentage of tokens in a sentence and make the model learn the missing tokens. Then, I think in ...
viopu
  • 71
6 votes
1 answer
7k views

"You have to specify either input_ids or inputs_embeds", but I did specify the input_ids

I trained a BERT based encoder decoder model (EncoderDecoderModel) named ed_model with HuggingFace's transformers module. I used the BertTokenizer named as input_tokenizer I tokenized the input with: ...
Uri Goren
  • 13.6k
6 votes
2 answers
3k views

ImportError when from transformers import BertTokenizer

My code is: import torch from transformers import BertTokenizer from IPython.display import clear_output I got error in line from transformers import BertTokenizer: ImportError: /lib/x86_64-linux-gnu/...
enhhh
  • 61
6 votes
3 answers
5k views

How to stop BERT from breaking apart specific words into word-piece

I am using a pre-trained BERT model to tokenize a text into meaningful tokens. However, the text has many specific words and I don't want BERT model to break them into word-pieces. Is there any ...
parvaneh shayegh
6 votes
1 answer
8k views

huggingface bert showing poor accuracy / f1 score [pytorch]

I am trying BertForSequenceClassification for a simple article classification task. No matter how I train it (freeze all layers but the classification layer, all layers trainable, last k layers ...
Zabir Al Nazi
6 votes
2 answers
3k views

How to test masked language model after training it?

I have followed this tutorial for masked language modelling from Hugging Face using BERT, but I am unsure how to actually deploy the model. Tutorial: https://github.com/huggingface/notebooks/blob/...
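One quick way to exercise a trained masked LM (a sketch using the transformers fill-mask pipeline; the model path below is hypothetical):

```python
from transformers import pipeline

# point the pipeline at the directory the training run saved the model to
fill_mask = pipeline("fill-mask", model="path/to/your-finetuned-model")

for pred in fill_mask("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```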
6 votes
2 answers
3k views

How to fix random seed for BERTopic?

I'd like to fix the random seed from BERTopic library to get reproducible results. Looking at the code of BERTopic I see it uses numpy. Will using np.random.seed(123) be enough? or do I also need to ...
RM-
  • 998
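A sketch of the usual fix, assuming the non-determinism comes mainly from the UMAP step rather than from numpy: pass BERTopic a UMAP instance with a fixed random_state (np.random.seed alone is not enough). The other UMAP settings below mirror BERTopic's commonly cited defaults.

```python
from umap import UMAP
from bertopic import BERTopic

umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0,
                  metric="cosine", random_state=123)   # the seed that matters
topic_model = BERTopic(umap_model=umap_model)
```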
6 votes
1 answer
34k views

Pytorch expects each tensor to be equal size

When running this code: embedding_matrix = torch.stack(embeddings) I got this error: RuntimeError: stack expects each tensor to be equal size, but got [7, 768] at entry 0 and [8, 768] at entry 1 I'm ...
sam
  • 79
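A sketch of one fix (plain PyTorch): torch.stack needs identical shapes, so pad the per-sentence token embeddings to a common length first, for example with pad_sequence.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# placeholders for the real per-sentence BERT outputs of shapes (7, 768) and (8, 768)
embeddings = [torch.randn(7, 768), torch.randn(8, 768)]

padded = pad_sequence(embeddings, batch_first=True)   # zero-pads to (2, 8, 768)
print(padded.shape)
```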
6 votes
3 answers
9k views

Huggingface BERT Tokenizer add new token

I am using Huggingface BERT for an NLP task. My texts contain names of companies which are split up into subwords. tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased') tokenizer....
Nui
  • 111
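A sketch of the usual recipe (transformers API; the company name is made up): register the string as a new token and resize the model's embedding matrix so the new id gets a vector (randomly initialised, so it still needs fine-tuning).

```python
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

num_added = tokenizer.add_tokens(["foocorp"])   # hypothetical company name
model.resize_token_embeddings(len(tokenizer))   # make room for the new id

print(num_added, tokenizer.tokenize("foocorp stock rose today"))
```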
6 votes
1 answer
3k views

Bert Embedding Layer raises `TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'` with BiLSTM

I have problems integrating a Bert embedding layer in a BiLSTM model for a word sense disambiguation task. Environment: Windows 10, Python 3.6.4, TensorFlow 1.12, Keras 2.2.4, no virtual environments used, PyCharm ...
ElSheikh
  • 319
6 votes
1 answer
5k views

Using BERT to generate similar word or synonyms through word embeddings

As we all know, the BERT model's capability for word embeddings is probably better than word2vec and any other model. I want to build a model on top of BERT word embeddings to generate synonyms or ...
DevPy
  • 467
6 votes
1 answer
8k views

Sliding window for long text in BERT for Question Answering

I've read a post which explains how the sliding window works, but I cannot find any information on how it is actually implemented. From what I understand, if the input is too long, a sliding window can be ...
Benj
  • 63
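A sketch of the tokenizer-level mechanics (transformers fast tokenizers): return_overflowing_tokens together with stride splits an over-long context into overlapping windows, each of which is paired with the question.

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

question = "What is the context about?"
long_context = "some very long passage of text " * 300   # placeholder context

enc = tokenizer(
    question, long_context,
    truncation="only_second",          # only the context gets windowed
    max_length=384, stride=128,        # 128-token overlap between windows
    return_overflowing_tokens=True,
)
print(len(enc["input_ids"]))           # number of overlapping windows
```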
6 votes
1 answer
9k views

Using BERT Embeddings in Keras Embedding layer

I want to use the BERT Word Vector Embeddings in the Embeddings layer of LSTM instead of the usual default embedding layer. Is there any way I can do it?
PeakyBlinder
  • 1,107
6 votes
2 answers
7k views

Latest Pre-trained Multilingual Word Embedding

Are there any latest pre-trained multilingual word embeddings (multiple languages are jointly mapped to a same vector space)? I have looked at the following but they don't fit my needs: FastText / ...
MachineLearner
6 votes
1 answer
1k views

BERT performing worse than word2vec

I am trying to use BERT for a document ranking problem. My task is pretty straightforward. I have to do a similarity ranking for an input document. The only issue here is that I don’t have labels - so ...
user3741951
6 votes
1 answer
939 views

How to predict the probability of an empty string using BERT

Suppose we have a template sentence like this: "The ____ house is our meeting place." and we have a list of adjectives to fill in the blank, e.g. "yellow", "large", ...
brienna
  • 1,474
6 votes
3 answers
5k views

Why can't transformers be imported in Python?

I want to import transformers in a Jupyter notebook but I get the following error. What is the reason for this error? My Python version is 3.8. ImportError: cannot import name 'TypeAlias' from '...
M_Eng
  • 101
6 votes
3 answers
4k views

TypeError: Layer input_spec must be an instance of InputSpec. Got: InputSpec(shape=(None, 128, 768), ndim=3)

I am trying to use a BERT pretrained model to do a multiclass classification (of 3 classes). Here's my function to use the model and also added some extra functionalities: def create_model(max_seq_len,...
Hrisav Bhowmick
6 votes
1 answer
9k views

How to feed Bert embeddings to LSTM

I am working on a Bert + MLP model for text classification problem. Essentially, I am trying to replace the MLP model with a basic LSTM model. Is it possible to create a LSTM with embedding? Or, is ...
Green
  • 685
6 votes
0 answers
6k views

How to add index to python FAISS incrementally

I am using Faiss to index my huge dataset of embeddings, generated from a BERT model. I want to add the embeddings incrementally; it is working fine if I only add them with faiss.IndexFlatL2, but ...
DevPy
  • 467
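A sketch of incremental insertion (FAISS API): a flat index accepts repeated add() calls, and wrapping it in IndexIDMap lets you keep your own ids while adding batch by batch.

```python
import numpy as np
import faiss

dim = 768
index = faiss.IndexIDMap(faiss.IndexFlatL2(dim))

for start in range(0, 10_000, 1_000):
    vecs = np.random.rand(1_000, dim).astype("float32")   # placeholder for BERT embeddings
    ids = np.arange(start, start + 1_000).astype("int64")
    index.add_with_ids(vecs, ids)                         # incremental batch insert

print(index.ntotal)   # 10000
```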
