Questions tagged [bert-language-model]
BERT, or Bidirectional Encoder Representations from Transformers, is a method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. BERT uses Transformers (an attention mechanism that learns contextual relations between words or subwords in a text) to generate a language model.
1,803
questions
4
votes
1
answer
5k
views
BertModel or BertForPreTraining
I want to use BERT only for embeddings and use the BERT output as input to a classification net that I will build from scratch.
I am not sure whether I want to fine-tune the model.
I think the ...
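A minimal sketch of that setup, assuming bert-base-uncased and PyTorch, with BERT frozen so only the custom head trains (skip the freezing if you later decide to fine-tune):

# Use a frozen BertModel as a feature extractor and feed its [CLS] vector
# into a small classifier built from scratch.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

# Freeze BERT so only the custom head is trained.
for param in bert.parameters():
    param.requires_grad = False

classifier = nn.Sequential(
    nn.Linear(bert.config.hidden_size, 256),
    nn.ReLU(),
    nn.Linear(256, 2),           # 2 target classes assumed
)

inputs = tokenizer("an example sentence", return_tensors="pt")
with torch.no_grad():
    outputs = bert(**inputs)
cls_embedding = outputs.last_hidden_state[:, 0, :]   # [CLS] token vector
logits = classifier(cls_embedding)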
4
votes
2
answers
2k
views
BERT get sentence level embedding after fine tuning
I came across this page
1) I would like to get sentence level embedding (embedding given by [CLS] token) after the fine tuning is done. How could I do it?
2) I also noticed that the code on that ...
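A minimal sketch for point 1, assuming the fine-tuned checkpoint was saved with save_pretrained to a local path (the path below is a placeholder):

# Pull the [CLS] sentence embedding back out of the fine-tuned encoder.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("path/to/finetuned-model")
model = BertModel.from_pretrained("path/to/finetuned-model")
model.eval()

inputs = tokenizer("sentence to embed", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

cls_embedding = outputs.last_hidden_state[:, 0, :]  # raw [CLS] hidden state
pooled = outputs.pooler_output                      # [CLS] after the tanh pooler layer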
4
votes
1
answer
2k
views
Using Arabert model with SpaCy
SpaCy doesn't support the Arabic language, but Can I use SpaCy with the pretrained Arabert model?
Is it possible to modify this code so it can accept bert-large-arabertv02 instead of en_core_web_lg?
!...
4
votes
3
answers
5k
views
How to apply max_length to truncate the token sequence from the left in a HuggingFace tokenizer?
In the HuggingFace tokenizer, applying the max_length argument specifies the length of the tokenized text. I believe it truncates the sequence to max_length-2 (if truncation=True) by cutting the ...
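A sketch of left-side truncation; recent versions of transformers expose truncation_side on the tokenizer (the model name and max_length are assumptions):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokenizer.truncation_side = "left"   # keep the *end* of long sequences

encoded = tokenizer(
    "a very long input that should lose tokens from its beginning",
    truncation=True,
    max_length=8,        # includes the [CLS] and [SEP] special tokens
)
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))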
4
votes
1
answer
788
views
pytorch model evaluation slow when deployed on kubernetes
I would like to make the result of a text classification model (finBERT pytorch model) available through an endpoint that is deployed on Kubernetes.
The whole pipeline is working but it's super slow ...
4
votes
1
answer
4k
views
Huggingface TFBertForSequenceClassification always predicts the same label
TL;DR:
My model always predicts the same labels and I don't know why. Below is my entire code for fine-tuning in the hopes that someone can point out to me where I am going wrong.
I am using ...
4
votes
1
answer
4k
views
Unsupervised finetuning of BERT for embeddings only?
I would like to fine-tune BERT for a specific domain on unlabeled data and use the output layer to check the similarity between texts. How can I do it? Do I need to first fine-tune a classifier ...
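One common recipe (a sketch, not the only answer): continue masked-language-model training on the unlabeled domain text, then use the resulting encoder's hidden states as embeddings. The file name and hyperparameters are assumptions:

from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="bert-domain-mlm", num_train_epochs=1,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=tokenized,
        data_collator=collator).train()
# Afterwards, load the saved checkpoint with AutoModel and use its hidden
# states (e.g. mean-pooled) as embeddings for similarity.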
4
votes
1
answer
6k
views
BERT outputs explained
The keys of the BERT encoder's output are default, encoder_outputs, pooled_output and sequence_output
As far as I know, encoder_outputs are the output of each encoder, pooled_output is the output ...
4
votes
1
answer
763
views
Restrict Vocab for BERT Encoder-Decoder Text Generation
Is there any way to restrict the vocabulary of the decoder in a Huggingface BERT encoder-decoder model? I'd like to force the decoder to choose from a small vocabulary when generating text rather than ...
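A hedged sketch of one way to do this: generate() accepts a prefix_allowed_tokens_fn hook that limits the candidate token ids at every decoding step. The model pairing and the allowed-word list are assumptions:

from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased")

allowed_words = ["yes", "no", "maybe"]
allowed_ids = set(tokenizer.convert_tokens_to_ids(allowed_words))
allowed_ids.update(tokenizer.all_special_ids)   # let generation start/stop normally

def restrict_vocab(batch_id, input_ids):
    # Same restricted candidate set at every step.
    return list(allowed_ids)

inputs = tokenizer("should we deploy on friday?", return_tensors="pt")
out = model.generate(inputs.input_ids,
                     decoder_start_token_id=tokenizer.cls_token_id,
                     prefix_allowed_tokens_fn=restrict_vocab,
                     max_length=8)
print(tokenizer.decode(out[0], skip_special_tokens=True))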
4
votes
1
answer
2k
views
Correct Way to Fine-Tune/Train HuggingFace's Model from scratch (PyTorch)
For example, I want to train a BERT model from scratch but using the existing configuration. Is the following code the correct way to do so?
model = BertModel.from_pretrained('bert-base-cased')
model....
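A short sketch of the distinction, assuming bert-base-cased: from_pretrained loads the trained weights, while instantiating from a config gives the same architecture with random weights.

from transformers import BertConfig, BertModel

config = BertConfig.from_pretrained("bert-base-cased")  # architecture only
model = BertModel(config)                               # random weights, no pre-training
# model = BertModel.from_pretrained("bert-base-cased")  # this reuses the trained weights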
4
votes
1
answer
5k
views
Why do we need state_dict = state_dict.copy()
I want to load the weights of a pre-trained model into my local model. I don’t understand why state_dict = state_dict.copy() is necessary if the two networks use the same name, state_dict.
# copy ...
4
votes
2
answers
4k
views
How to convert model.safetensor to pytorch_model.bin?
I'm fine-tuning a pre-trained BERT model and I have a weird problem:
When I'm fine-tuning using the CPU, the code saves the model like this:
With the "pytorch_model.bin". But when I use ...
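A sketch of one conversion route, assuming the file names below:

import torch
from safetensors.torch import load_file

# Load the safetensors state dict and re-save it in the legacy .bin format.
state_dict = load_file("model.safetensors")
torch.save(state_dict, "pytorch_model.bin")

# Alternatively, reload the checkpoint and save it again with
# model.save_pretrained("out_dir", safe_serialization=False)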
4
votes
1
answer
2k
views
How to stop data shuffling while training the HuggingFace BERT model?
I want to train a BERT transformer model using the HuggingFace implementation/library. During training, HuggingFace shuffles the training data for each epoch, but I don't want to shuffle the data. For ...
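A hedged workaround sketch: Trainer shuffles through its train sampler, so overriding the (private) sampler hook with a SequentialSampler keeps the original order. This relies on Trainer internals that may change between versions:

from torch.utils.data import SequentialSampler
from transformers import Trainer

class NoShuffleTrainer(Trainer):
    def _get_train_sampler(self):
        # Iterate the training set in its original order every epoch.
        return SequentialSampler(self.train_dataset)

# trainer = NoShuffleTrainer(model=model, args=training_args, train_dataset=train_ds)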
4
votes
2
answers
6k
views
Adding new tokens to BERT/RoBERTa while retaining tokenization of adjacent tokens
I'm trying to add some new tokens to BERT and RoBERTa tokenizers so that I can fine-tune the models on a new word. The idea is to fine-tune the models on a limited set of sentences with the new word, ...
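A sketch of the basic mechanics, assuming roberta-base and a made-up word; the adjacent-token spacing issue raised in the question may additionally need a tokenizers.AddedToken with lstrip/rstrip options:

from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

num_added = tokenizer.add_tokens(["mynewword"])          # register the new word
model.resize_token_embeddings(len(tokenizer))            # grow the embedding matrix

print(tokenizer.tokenize("I like mynewword a lot"))      # the new word stays a single token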
4
votes
2
answers
3k
views
Loading tf.keras model, ValueError: The two structures don't have the same nested structure
I created a tf.keras model that has BERT and I want to train and save it for further use.
Loading this model is a big issue because I keep getting the error: ValueError: The two structures don't have the ...
4
votes
1
answer
3k
views
How to train a BERT model from scratch with Hugging Face?
I found an answer about training a model from scratch in this question:
How to train BERT from scratch on a new domain for both MLM and NSP?
One answer uses Trainer and TrainingArguments like this:
from ...
4
votes
1
answer
1k
views
How can I apply pruning on a BERT model?
I have trained a BERT model using ktrain (TensorFlow wrapper) to recognize emotion on text. It works, but it suffers from really slow inference. That makes my model not suitable for a production ...
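A sketch using PyTorch's built-in pruning utilities (the 30% ratio is an assumption); note that unstructured pruning alone rarely speeds up dense inference, so quantization or distillation are often tried as well:

import torch.nn as nn
import torch.nn.utils.prune as prune
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

for module in model.modules():
    if isinstance(module, nn.Linear):
        # Zero out the 30% of weights with the smallest magnitude in each linear layer.
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")   # make the pruning permanent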
4
votes
1
answer
2k
views
PyTorch tokenizers: how to truncate tokens from left?
As we can see in the below code snippet, specifying max_length and truncation for a tokenizer cuts excess tokens from the left:
tokenizer("hello, my name", truncation=True, max_length=6).input_ids
...
4
votes
1
answer
602
views
Training SVM classifier (word embeddings vs. sentence embeddings)
I want to experiment with different embeddings such as Word2Vec, ELMo, and BERT, but I'm a little confused about whether to use word embeddings or sentence embeddings, and why. I'm using the ...
4
votes
1
answer
5k
views
PyTorch GPU memory leak during inference
I am trying to encode documents sentence-wise with a huggingface transformer module. I'm using the very small google/bert_uncased_L-2_H-128_A-2 pretrained model with the following code:
def ...
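A sketch of the usual mitigation, assuming the same checkpoint: run inference under torch.no_grad() and move each result to the CPU so no graph-attached GPU tensors accumulate:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("google/bert_uncased_L-2_H-128_A-2")
model = AutoModel.from_pretrained("google/bert_uncased_L-2_H-128_A-2").cuda().eval()

def encode(sentences):
    embeddings = []
    with torch.no_grad():                       # no autograd graph is kept
        for sent in sentences:
            inputs = tokenizer(sent, return_tensors="pt").to("cuda")
            out = model(**inputs)
            embeddings.append(out.last_hidden_state[:, 0, :].cpu())  # move off the GPU
    return torch.cat(embeddings)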
4
votes
1
answer
3k
views
How to process TransformerEncoderLayer output in pytorch
I am trying to use bio-bert sentence embeddings for text classification of longer pieces of text.
As it currently stands I standardize the number of sentences in each piece of text (some sentences are ...
4
votes
1
answer
3k
views
Tensorflow BERT for token-classification - exclude pad-tokens from accuracy while training and testing
I'm doing token-based classification using the pre-trained BERT model for TensorFlow to automatically label causes and effects in sentences.
To access BERT, I'm using the TFBertForTokenClassification-...
4
votes
1
answer
1k
views
Error: Inferring the task automatically requires to check the hub with a model_id defined as a `str`. AraBERT model
I'm training a transformer model by regular training as described in this notebook to classify the questions with their expected answer class.
After training the model, I want to see the predictions ...
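A sketch of the usual fix: pass the task name and the loaded model/tokenizer to pipeline explicitly instead of letting it infer the task (the checkpoint path is a placeholder):

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model = AutoModelForSequenceClassification.from_pretrained("path/to/finetuned-arabert")
tokenizer = AutoTokenizer.from_pretrained("path/to/finetuned-arabert")

clf = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(clf("some question text"))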
4
votes
1
answer
7k
views
There appear to be 1 leaked semaphore objects to clean up at shutdown
I am using macOS and a DistilBERT model with Sentence Transformers for a chatbot implementation, and generated the API in VS Code.
But after giving 3 inputs it pops up this error:
UserWarning: ...
4
votes
2
answers
820
views
Why are models such as BERT or GPT-3 considered unsupervised learning during pre-training when there is an output (label)
I am not very experienced with unsupervised learning, but my general understanding is that in unsupervised learning, the model learns without there being an output. However, during pre-training in ...
4
votes
1
answer
12k
views
HuggingFace Bert Sentiment analysis
I am getting the following error:
AssertionError: text input must of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples)., ...
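A sketch of the typical cause and fix: the tokenizer was given a pandas Series (possibly containing NaN) instead of a str or list of str. The column and model names are assumptions:

import pandas as pd
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
df = pd.DataFrame({"review": ["great product", "terrible service"]})

texts = df["review"].astype(str).tolist()      # plain Python list of strings
encodings = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")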
4
votes
1
answer
2k
views
How to access BERT intermediate layer outputs in TF Hub Module?
Does anybody know a way to access the outputs of the intermediate layers from BERT's hosted models on Tensorflow Hub?
The model is hosted here. I have explored the meta graph and found the only ...
4
votes
0
answers
2k
views
ValueError: Exception encountered when calling layer "tf_bert_for_sequence_classification" (type TFBertForSequenceClassification)
train = df2[:25]
test = df2[25:]
def convert_data_to_examples(train, test, text, Airline_Cat):
train_InputExamples = train.apply(lambda x: InputExample(guid=None,
...
4
votes
1
answer
1k
views
Dutch sentiment analysis RobBERT
I have a question about Dutch sentiment analysis in Python. For a project at school I want to analyse the sentiment of a Dutch interview. I have worked with Vader but that doesn't work in Dutch. So I ...
4
votes
1
answer
4k
views
cannot import name 'TrainingArguments' from 'transformers'
I am trying to fine-tune a pretrained huggingface BERT model. I am importing the following
from transformers import (AutoTokenizer, AutoConfig,
...
4
votes
0
answers
841
views
max_steps and generative dataset huggingface
I am fine-tuning a model on my domain using both MLM and NSP. I am using TextDatasetForNextSentencePrediction for NSP and DataCollatorForLanguageModeling for MLM.
The problem is with ...
4
votes
0
answers
748
views
HuggingFace BertForMaskedLM: Expected input batch_size (3200) to match target batch_size (16)
I'm working on multiclass classification (Bengali language sentiment analysis) with a pretrained Hugging Face (BertForMaskedLM) model.
When the error occurred I knew I had to change the label (output) ...
4
votes
1
answer
2k
views
How to build a dataset for language modeling with the datasets library as with the old TextDataset from the transformers library
I am trying to load a custom dataset that I will then use for language modeling. The dataset consists of a text file that has a whole document in each line, meaning that each line exceeds the ...
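A rough equivalent of the old TextDataset using the datasets library (a sketch; the file name, max_length, and column names are assumptions):

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
raw = load_dataset("text", data_files={"train": "corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512,
                     return_special_tokens_mask=True)

lm_dataset = raw["train"].map(tokenize, batched=True, remove_columns=["text"])
# lm_dataset can now be passed to Trainer together with
# DataCollatorForLanguageModeling for masked-LM training.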
4
votes
0
answers
469
views
How to train a Masked Language Model with a big text corpus(200GB) using PyTorch?
Recently I have been training a masked language model with a big text corpus (200GB) using transformers. The training data is too big to fit into a machine equipped with 512GB of memory and 8 V100 (32GB) GPUs. Is it ...
4
votes
0
answers
1k
views
Word embeddings with BERT and map tensors to words
I am trying to aggregate BERT embeddings at the token level. For each token in the corpus vocabulary, I would like to create a list of all its contextual embeddings and average them to get one ...
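A sketch of one way to do the aggregation, assuming bert-base-uncased and a toy corpus; subword pieces are treated as separate vocabulary entries here:

from collections import defaultdict
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

sums, counts = defaultdict(lambda: 0.0), defaultdict(int)
corpus = ["the bank raised rates", "she sat on the river bank"]

with torch.no_grad():
    for sentence in corpus:
        enc = tokenizer(sentence, return_tensors="pt")
        hidden = model(**enc).last_hidden_state[0]          # (seq_len, hidden)
        for tok_id, vec in zip(enc["input_ids"][0].tolist(), hidden):
            sums[tok_id] = sums[tok_id] + vec                # running sum per token id
            counts[tok_id] += 1

averaged = {tokenizer.convert_ids_to_tokens(i): sums[i] / counts[i] for i in sums}
print(averaged["bank"].shape)   # torch.Size([768])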
4
votes
0
answers
4k
views
PCA on BERT word embeddings
I am trying to take a set of sentences that use multiple meanings of the word "duck", and compute the word embedding of each "duck" using BERT. Each word embedding is a vector of 768 elements, ...
4
votes
0
answers
285
views
How to handle text classification model that gives few results with higher confidence to wrong category?
I had a dataset of 15k records. I trained the model using the ktrain package and a 'bert' model with 5k samples. The train-test split is 70-30% and the test results gave me accuracy and F1 scores of 93-94%. ...
4
votes
0
answers
199
views
How to create iob tags for a sentence?
I have a dataset for NER in which I have to do POS tagging and IOB tagging, but I don't understand the concept or method of how IOB tags are created. Even CoNLL is pre-tagged.
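A toy sketch of how IOB/BIO tags are usually derived from character-level entity-span annotations (whitespace tokenization assumed):

def iob_tags(sentence, spans):
    """spans: list of (start_char, end_char, label) tuples."""
    tags, position = [], 0
    for token in sentence.split():
        start = sentence.index(token, position)
        end = start + len(token)
        position = end
        tag = "O"                                   # default: outside any entity
        for span_start, span_end, label in spans:
            if start >= span_start and end <= span_end:
                tag = ("B-" if start == span_start else "I-") + label
                break
        tags.append(tag)
    return tags

print(iob_tags("Angela Merkel visited Paris",
               [(0, 13, "PER"), (22, 27, "LOC")]))
# ['B-PER', 'I-PER', 'O', 'B-LOC']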
3
votes
1
answer
10k
views
Huggingface's BERT tokenizer not adding pad token
It's not entirely clear from the documentation, but I can see that BertTokenizer is initialised with pad_token='[PAD]', so I assume when you encode with add_special_tokens=True then it would ...
3
votes
3
answers
7k
views
Removal of Stop Words and Stemming/Lemmatization for BERTopic
For topic modelling, I'm trying out BERTopic: Link
I'm a little confused here; I am trying out BERTopic on my custom dataset.
Since BERT was trained in such a way that it holds the semantic ...
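A hedged sketch: with BERTopic, stop words are usually handled at the topic-representation stage via the vectorizer, rather than by stripping them from the documents the embeddings are computed from. The parameters below are assumptions:

from bertopic import BERTopic
from sklearn.feature_extraction.text import CountVectorizer

# Stop words are removed only when building topic keywords, not from the embeddings.
vectorizer_model = CountVectorizer(stop_words="english", ngram_range=(1, 2))
topic_model = BERTopic(vectorizer_model=vectorizer_model)

# topics, probs = topic_model.fit_transform(docs)   # docs: list of raw strings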
3
votes
3
answers
6k
views
Tensorflow 2.X Error - Op type not registered 'CaseFoldUTF8' in binary running on Colab
I have been using the BERT encoder from TensorFlow Hub for quite some time now. Here are the syntaxes:
tfhub_handle_encoder = "https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/4" ...
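A sketch of the usual fix: CaseFoldUTF8 is registered by the tensorflow_text package, so install a version matching TensorFlow and import it before loading the hub models (the preprocessor handle is an assumption; the encoder handle is the one from the question):

# pip install tensorflow-text==<same minor version as tensorflow>
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401  (registers the custom ops, including CaseFoldUTF8)

preprocessor = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_multi_cased_preprocess/3")
encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/4")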
3
votes
3
answers
1k
views
String comparison with BERT seems to ignore "not" in sentence
I implemented a string comparison method using SentenceTransformers and BERT, as follows:
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
...
3
votes
1
answer
4k
views
Applying LIME interpretation on my fine-tuned BERT for sequence classification model?
I fine-tuned BertForSequenceClassification on a specific task, and I want to apply LIME interpretation to see how each token contributes to the predicted label, since LIME handles the classifier as ...
3
votes
4
answers
18k
views
Cannot import BertModel from transformers
I am trying to import BertModel from transformers, but it fails. This is the code I am using:
from transformers import BertModel, BertForMaskedLM
This is the error I get
ImportError: cannot import name '...
3
votes
2
answers
7k
views
Having 6 labels instead of 2 in Hugging Face BertForSequenceClassification
I was just wondering if it is possible to extend the HuggingFace BertForSequenceClassification model to more than 2 labels. The docs say we can pass positional arguments, but it seems like "labels" ...
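A short sketch: the number of labels is just a config argument, not limited to 2 (bert-base-uncased assumed):

from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=6)
print(model.classifier.out_features)   # 6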
3
votes
2
answers
2k
views
Are the pre-trained layers of the Huggingface BERT models frozen?
I use the following classification model from Huggingface:
model = AutoModelForSequenceClassification.from_pretrained("dbmdz/bert-base-german-cased", num_labels=2).to(device)
As I ...
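A short sketch of how to check and, if desired, freeze the encoder; by default nothing is frozen:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "dbmdz/bert-base-german-cased", num_labels=2)

print(all(p.requires_grad for p in model.bert.parameters()))   # True: nothing is frozen

for param in model.bert.parameters():       # optional: freeze the pre-trained encoder
    param.requires_grad = False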
3
votes
2
answers
3k
views
PipelineException: No mask_token ([MASK]) found on the input
I am getting this error "PipelineException: No mask_token ([MASK]) found on the input"
when I run this line.
fill_mask("Auto Car .")
I am running it on Colab.
My Code:
from ...
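A sketch of the usual fix: the input must literally contain the tokenizer's mask token, and using tokenizer.mask_token avoids hard-coding [MASK] vs <mask> (bert-base-uncased assumed):

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask(f"Auto Car {fill_mask.tokenizer.mask_token}."))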
3
votes
1
answer
6k
views
Tokens returned in transformers Bert model from encode()
I have a small dataset for sentiment analysis. The classifier will be a simple KNN but I wanted to get the word embedding with the Bert model from the transformers library. Note that I just found out ...
3
votes
3
answers
5k
views
what is the difference between pooled output and sequence output in bert layer?
Hi everyone! I was reading about BERT and wanted to do text classification with its word embeddings. I came across this line of code:
pooled_output, sequence_output = self.bert_layer([input_word_ids, ...
3
votes
2
answers
2k
views
Where can I get the pretrained word embeddinngs for BERT?
I know that BERT has a total vocabulary size of 30522, which contains words and subwords. I want to get the initial input embeddings of BERT. So, my requirement is to get the table of size [30522, ...
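A short sketch, assuming bert-base-uncased: the initial (non-contextual) input embeddings are the model's word-embedding matrix:

from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
embedding_table = model.get_input_embeddings().weight    # shape: (30522, 768)
print(embedding_table.shape)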