All Questions
Tagged with bert-language-model machine-learning
148
questions
43
votes
2
answers
27k
views
Why Bert transformer uses [CLS] token for classification instead of average over all tokens?
I am doing experiments on the BERT architecture and found that most fine-tuning tasks take the final hidden layer as the text representation and later pass it to other models for the further ...
18
votes
1
answer
12k
views
BertForSequenceClassification vs. BertForMultipleChoice for sentence multi-class classification
I'm working on a text classification problem (e.g. sentiment analysis), where I need to classify a text string into one of five classes.
I just started using the Huggingface Transformer package and ...
14
votes
1
answer
14k
views
PyTorch torch.no_grad() versus requires_grad=False
I'm following a PyTorch tutorial which uses the BERT NLP model (feature extractor) from the Huggingface Transformers library. There are two pieces of interrelated code for gradient updates that I don'...
8
votes
3
answers
5k
views
How to compute mean/max of HuggingFace Transformers BERT token embeddings with attention mask?
I'm using the HuggingFace Transformers BERT model, and I want to compute a summary vector (a.k.a. embedding) over the tokens in a sentence, using either the mean or max function. The complication is ...
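A minimal sketch of the masked mean/max this question asks about, written in plain Python so it runs without any library. It assumes the complication is that some positions (for example padding) must be left out of the summary, which the truncated excerpt does not confirm. The helper names `masked_mean` and `masked_max` are hypothetical and are not part of the Transformers API.

```python
from typing import List, Sequence


def _kept_vectors(token_vectors: Sequence[Sequence[float]],
                  attention_mask: Sequence[int]) -> List[Sequence[float]]:
    """Return only the token vectors whose mask entry is non-zero."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask must have equal length")
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return kept


def masked_mean(token_vectors: Sequence[Sequence[float]],
                attention_mask: Sequence[int]) -> List[float]:
    """Element-wise mean over the tokens selected by the mask (hypothetical helper)."""
    kept = _kept_vectors(token_vectors, attention_mask)
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def masked_max(token_vectors: Sequence[Sequence[float]],
               attention_mask: Sequence[int]) -> List[float]:
    """Element-wise max over the tokens selected by the mask (hypothetical helper)."""
    kept = _kept_vectors(token_vectors, attention_mask)
    dim = len(kept[0])
    return [max(vec[i] for vec in kept) for i in range(dim)]


# Toy input: three token vectors, the last one masked out.
vectors = [[1.0, 2.0], [3.0, 4.0], [100.0, -100.0]]
mask = [1, 1, 0]
mean_vector = masked_mean(vectors, mask)
max_vector = masked_max(vectors, mask)
```

With tensors, the same idea is usually expressed by multiplying by the mask and dividing by the mask sum for the mean, and by setting masked positions to a very negative value before taking the max.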
6
votes
2
answers
6k
views
Can you train a BERT model from scratch with task specific architecture?
BERT pre-training of the base model is done by a language modeling approach, where we mask a certain percentage of tokens in a sentence and make the model learn to predict those masked tokens. Then, I think in ...
6
votes
1
answer
1k
views
BERT performing worse than word2vec
I am trying to use BERT for a document ranking problem. My task is pretty straightforward. I have to do a similarity ranking for an input document. The only issue here is that I don’t have labels - so ...
5
votes
3
answers
6k
views
AttributeError: 'str' object has no attribute 'dim' in pytorch
I got the following error output in PyTorch when I sent model predictions into the model. Does anyone know what's going on?
Following is the model architecture that I created; in the error output, ...
5
votes
1
answer
2k
views
Does BertForSequenceClassification classify on the CLS vector?
I'm using the Huggingface Transformer package and BERT with PyTorch. I'm trying to do 4-way sentiment classification and am using BertForSequenceClassification to build a model that eventually leads ...
5
votes
2
answers
1k
views
Loss function for comparing two vectors for categorization
I am performing an NLP task where I analyze a document and classify it into one of six categories. However, I do this operation at three different time periods. So the final output is an array of three ...
4
votes
2
answers
6k
views
How to increase dimension-vector size of BERT sentence-transformers embedding
I am using sentence-transformers for semantic search, but sometimes it does not understand the contextual meaning and returns the wrong result,
e.g. BERT problem with context/semantic search in italian ...
4
votes
2
answers
4k
views
How to convert model.safetensor to pytorch_model.bin?
I'm fine-tuning a pre-trained BERT model and I have a weird problem:
When I'm fine-tuning using the CPU, the code saves the model like this:
With the "pytorch_model.bin". But when I use ...
4
votes
2
answers
820
views
Why are models such as BERT or GPT-3 considered unsupervised learning during pre-training when there is an output (label)
I am not very experienced with unsupervised learning, but my general understanding is that in unsupervised learning, the model learns without there being an output. However, during pre-training in ...
4
votes
0
answers
285
views
How to handle text classification model that gives few results with higher confidence to wrong category?
I had a dataset of 15k records. I trained the model using a k-train package and 'bert' model with 5k samples. The train-test split is 70-30% and test results gave me accuracy and f1 scores as 93-94%. ...
3
votes
1
answer
6k
views
Tokens returned in transformers Bert model from encode()
I have a small dataset for sentiment analysis. The classifier will be a simple KNN but I wanted to get the word embedding with the Bert model from the transformers library. Note that I just found out ...
3
votes
1
answer
2k
views
Using Sentence-Bert with other features in scikit-learn
I have a dataset, one feature is text and 4 more features. Sentence-Bert vectorizer transforms text data into tensors. I can use these sparse matrices directly with a machine learning classifier. Can ...
3
votes
1
answer
644
views
InternalError when using TPU for training Keras model
I am attempting to fine-tune a BERT model on Google Colab from the Tensorflow Hub using this link.
However, I run into the following error:
InternalError: RET_CHECK failure (third_party/tensorflow/...
3
votes
1
answer
4k
views
Running BERT on CPU instead of GPU
I am trying to execute BERT's run_classifier.py script using the terminal as below:
python run_classifier.py --task_name=cola --do_predict=true --data_dir=<data-dir> --vocab_file=$BERT_BASE_DIR/...
3
votes
0
answers
708
views
I'm trying to load BERT "tfbert-large-uncased" but I got an error "Can't load config.json file"
I'm trying to load the pre-trained BERT model, but I'm getting an error while loading the tokenizer: it says config.json is not found.
If anyone knows how to solve these issues please help me
Model and path ...
3
votes
0
answers
710
views
Google BERT and antonym detection
I recently learned about the following phenomenon: Google BERT word embeddings of well-known state-of-the-art models seem to ignore the measure of semantical contrast between antonyms in terms of the ...
3
votes
0
answers
3k
views
BERT model classification with many classes
I want to train a BERT model to perform a multiclass text classification. I use transformers and followed this tutorial (https://towardsdatascience.com/multi-class-text-classification-with-deep-...
3
votes
0
answers
1k
views
How to update vocabulary of pre-trained bert model while doing my own training task?
I am now working on a task of predicting masked word using BERT model. Unlike others, the answer needs to be chosen from specific options.
For instance:
sentence: "In my daily [MASKED], ..."
options:...
3
votes
3
answers
4k
views
How to save a tokenizer after training it?
I have just followed this tutorial on how to train my own tokenizer.
Now, from training my tokenizer, I have wrapped it inside a Transformers object, so that I can use it with the transformers library:...
2
votes
1
answer
3k
views
How to use BERT and Elmo embedding with sklearn
I created a text classifier that uses Tf-Idf using sklearn, and I want to use BERT and Elmo embedding instead of Tf-Idf.
How would one do that?
I'm getting Bert embedding using the code below:
from ...
2
votes
3
answers
2k
views
BERT Multi-class Sentiment Analysis got low accuracy?
I am working on a small data set which:
Contains 1500 pieces of news articles.
All of these articles were ranked by human beings with regard to their sentiment/degree of positive on a 5-point scale.
...
2
votes
1
answer
851
views
Summarization-Text rank algorithm
What are the advantages of using text rank algorithm for summarization over BERT summarization?
Even though both can be used as extractive summarization method, is there any particular advantage for ...
2
votes
1
answer
570
views
reporting other metrics during training evaluation simpletransformers
I am training a text classification model over a large set of data and I am using the bert classifier (bert-base-uncased) of the simpletransformers library. Simpletransformers reports by default mcc and ...
2
votes
1
answer
82
views
RuntimeError when trying to extract text features from a BERT model then using KNN for classification
I'm trying to use the camembert model just to extract text features. After that, I'm trying to use a KNN classifier to classify the feature vectors as inputs.
This is the code I wrote
import torch
from ...
2
votes
1
answer
306
views
Trying to train model for Intent Recognition but getting float error
I'm trying to train the model for intent recognition. I tried removing all special characters and stop words but unable to resolve this error. I tried removing integers also but it's throwing an error....
2
votes
1
answer
123
views
Why is a throw-away column required in Bert format?
I have recently come across BERT (Bidirectional Encoder Representations from Transformers). I saw that BERT requires a strict format for the train data. The third column needed is described as follows:
...
2
votes
0
answers
258
views
How to get the mask average for multi-token masking?
Following this paper, I'm trying to implement how they calculated the average of the log probabilities for each entity (Section 3.3). More specifically, the score for each entity is calculated as the ...
2
votes
0
answers
413
views
How do I retrain BERT model with new data
I have already trained a BERT model and saved it in the .pb format, and I want to retrain the model with new datasets that I custom made. So, in order not to lose the previous training and such, how ...
2
votes
0
answers
607
views
I am getting OOM while running PRE TRAINED Bert Model with new dataset with 20k
I have a pre-trained model with an accuracy of 96 after 2 epochs, and I am trying to use that model on a new dataset of 20k tweets for sentiment analysis. While doing that, I am getting the error below.
I haven't ...
2
votes
1
answer
526
views
max_length doesn't fix the question-answering model
My Question:
How to make my 'question-answering' model run, given a big (>512b) .txt file?
Context:
I am creating a question answering model with the word embedding model BERT from google. The ...
2
votes
1
answer
350
views
Bert model show up InvalidArgumentError Condition x <= y did not hold element wise
I am training a BERT model.
Can anyone shed light on the meaning of the following error message?
Condition x == y did not hold element wise
Here is Reference colab notebook
And my code:
!pip install bert-...
2
votes
0
answers
42
views
Trying to simplify BERT architecture
I have an interesting question about BERT.
Can I simplify the architecture of the model by saying that the similarity of two words in different context will depend on the similarity of input ...
1
vote
2
answers
203
views
extracting names and associated labels from text with language model
I am trying to extract information from scientific literature on microalgae and I need to be able to scan a text for various names and find their corresponding category.
As a simple example, say I ...
1
vote
1
answer
1k
views
BertModel and BertForMaskedLM weights count
I want to understand the BertForMaskedLM model. In the huggingface GitHub code, BertForMaskedLM is a BERT model with 2 additional linear layers with shape (input 768, output 768) and (input 768, output 30522). ...
1
vote
2
answers
7k
views
(with cpu)Pytorch: IndexError: index out of range in self. (with cuda)Assertion `srcIndex < srcSelectDimSize` failed. How to solve?
Today I get the following error when I use BERT with PyTorch and CUDA: /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [234,0,0], thread: [0,0,0] Assertion srcIndex <...
1
vote
1
answer
602
views
Fine-tuning distilbert takes hours
I am fine tuning the distilbert pretrained model for sentiment analysis (multilabel with 6 labels) using Huggingface emotion dataset. I am new to this, but 1 epoch, 250 steps takes around 2 hours to ...
1
vote
1
answer
478
views
BERT problem with context/semantic search in italian language
I am using a BERT model for context search in the Italian language, but it does not understand the contextual meaning of the sentence and returns the wrong result.
In the example code below, when I compare "milk ...
1
vote
1
answer
1k
views
Calculating Probability of a Classification Model Prediction
I have a classification task. The training data has 50 different labels. The customer wants to differentiate the low-probability predictions, meaning that I have to classify some test data as ...
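One common way to attach a probability to a classifier's prediction is to apply a softmax to its raw scores (logits) and treat the largest value as the confidence. The sketch below is one possible reading of the question, not the asker's method: the function names and the 0.5 threshold are hypothetical, and softmax scores from a neural classifier are not guaranteed to be well calibrated.

```python
import math
from typing import List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_threshold(logits: Sequence[float],
                           threshold: float = 0.5) -> Tuple[Optional[int], float]:
    """Return (label index, probability), or (None, probability) when the
    top probability falls below the (hypothetical) threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] >= threshold:
        return best, probs[best]
    return None, probs[best]


# A confident prediction keeps its label; a flat distribution is rejected.
confident_label, confident_prob = predict_with_threshold([5.0, 0.0, 0.0])
uncertain_label, uncertain_prob = predict_with_threshold([0.0, 0.0, 0.0])
```

With 50 labels, as in the question, the threshold would need to be tuned on held-out data rather than fixed in advance.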
1
vote
1
answer
2k
views
BERT tokenize URLs
I want to classify a bunch of tweets and therefore I'm using the huggingface implementation of BERT. However, I noticed that the default BertTokenizer does not use special tokens for URLs.
>>> ...
1
vote
1
answer
1k
views
HuggingFace transformer evaluation process is too slow
I used the HuggingFace transformers library to train a BERT model for sequence classification.
The training process is good on GPU, but the evaluation process (which is running on GPU) is too slow. For ...
1
vote
1
answer
1k
views
Is splitting a long document of a dataset for BERT considered bad practice?
I am fine-tuning a BERT model on a labeled dataset with many documents longer than the 512 token limit set by the tokenizer.
Since truncating would lose a lot of data I would rather use, I started ...
1
vote
1
answer
2k
views
TypeError: Expected `trainable` argument to be a boolean, but got: bert
I got this error when implementing my model. I think the error comes from the BERT model which I have imported.
def create_text_encoder(
num_projection_layers, projection_dims, dropout_rate, ...
1
vote
1
answer
356
views
what is the max limit of entities in a custom NER model
What is the maximum limit of entities we can have in a spaCy or BERT-based custom NER model? I have seen examples over the web which have been trained to a max of 10 custom entities per model and ...
1
vote
2
answers
9k
views
Tensorflow: Compute Precision, Recall, F1 Score
I built a BERT model (Bert-base-multilingual-cased) from Huggingface and want to evaluate the model with its precision, recall and F1-score next to accuracy, as accuracy isn't always the best metric ...
1
vote
1
answer
312
views
How to create a language model with 2 different heads in huggingface?
I know I can create a language model with 1 head:
from transformers import AutoModelForMultipleChoice
model = AutoModelForMultipleChoice.from_pretrained("distilbert-base-cased").to(device)
...
1
vote
1
answer
2k
views
How is get predict accuracy score in Bert Classification
I am using Bert Classifier for my Chatbot project. I perform the necessary tokenizer operations for the incoming text message. Then I insert it into the model and make a prediction. How can I get the ...
1
vote
1
answer
2k
views
How to store a .tar.gz formatted model to AWS SageMaker and use it as a deployed model?
I have a pre-trained BERT model which was trained on Google Cloud Platform, and the model is stored in a .tar.gz formatted file. I wanted to deploy this model to SageMaker and also be able to trigger ...