All Questions

43 votes
2 answers
27k views

Why Bert transformer uses [CLS] token for classification instead of average over all tokens?

I am experimenting with the BERT architecture and found that most fine-tuning tasks take the final hidden layer as the text representation and later pass it to other models for the further ...
Aaditya Ura's user avatar
  • 12.3k
18 votes
1 answer
12k views

BertForSequenceClassification vs. BertForMultipleChoice for sentence multi-class classification

I'm working on a text classification problem (e.g. sentiment analysis), where I need to classify a text string into one of five classes. I just started using the Huggingface Transformer package and ...
stackoverflowuser2010's user avatar
14 votes
1 answer
14k views

PyTorch torch.no_grad() versus requires_grad=False

I'm following a PyTorch tutorial which uses the BERT NLP model (feature extractor) from the Huggingface Transformers library. There are two pieces of interrelated code for gradient updates that I don'...
stackoverflowuser2010's user avatar
8 votes
3 answers
5k views

How to compute mean/max of HuggingFace Transformers BERT token embeddings with attention mask?

I'm using the HuggingFace Transformers BERT model, and I want to compute a summary vector (a.k.a. embedding) over the tokens in a sentence, using either the mean or max function. The complication is ...
stackoverflowuser2010's user avatar
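For readers skimming this entry, here is a minimal sketch of masked mean and max pooling. It assumes the complication mentioned in the question is padding, i.e. that positions where the attention mask is 0 should be ignored. Plain Python lists stand in for the model's token embeddings; the function names `masked_mean` and `masked_max` are hypothetical and not part of the Transformers API.

```python
def _kept_vectors(token_vectors, attention_mask):
    """Return only the token vectors whose attention-mask entry is 1."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return kept


def masked_mean(token_vectors, attention_mask):
    """Element-wise mean over unmasked tokens (hypothetical helper)."""
    kept = _kept_vectors(token_vectors, attention_mask)
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def masked_max(token_vectors, attention_mask):
    """Element-wise max over unmasked tokens (hypothetical helper)."""
    kept = _kept_vectors(token_vectors, attention_mask)
    dim = len(kept[0])
    return [max(vec[i] for vec in kept) for i in range(dim)]


# Toy sentence: two real tokens followed by one padding token.
tokens = [[1.0, 4.0], [3.0, 2.0], [100.0, 100.0]]
mask = [1, 1, 0]

mean_vec = masked_mean(tokens, mask)
max_vec = masked_max(tokens, mask)
print(mean_vec, max_vec)
```

The padding vector is deliberately large so that it would visibly distort both summaries if the mask were not applied.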
6 votes
2 answers
6k views

Can you train a BERT model from scratch with task specific architecture?

BERT pre-training of the base model is done by a language modeling approach, where we mask a certain percentage of tokens in a sentence and make the model learn to predict those masked tokens. Then, I think in ...
viopu's user avatar
  • 71
6 votes
1 answer
1k views

BERT performing worse than word2vec

I am trying to use BERT for a document ranking problem. My task is pretty straightforward. I have to do a similarity ranking for an input document. The only issue here is that I don’t have labels - so ...
user3741951's user avatar
5 votes
3 answers
6k views

AttributeError: 'str' object has no attribute 'dim' in pytorch

I got the following error output in PyTorch when I sent model predictions into the model. Does anyone know what's going on? Following is the model architecture that I created; in the error output, ...
Bei Zhao's user avatar
5 votes
1 answer
2k views

Does BertForSequenceClassification classify on the CLS vector?

I'm using the Huggingface Transformer package and BERT with PyTorch. I'm trying to do 4-way sentiment classification and am using BertForSequenceClassification to build a model that eventually leads ...
stackoverflowuser2010's user avatar
5 votes
2 answers
1k views

Loss function for comparing two vectors for categorization

I am performing a NLP task where I analyze a document and classify it into one of six categories. However, I do this operation at three different time periods. So the final output is an array of three ...
Jameson's user avatar
  • 4,248
4 votes
2 answers
6k views

How to increase dimension-vector size of BERT sentence-transformers embedding

I am using sentence-transformers for semantic search, but sometimes it does not understand the contextual meaning and returns the wrong result, e.g. BERT problem with context/semantic search in italian ...
Juned Ansari's user avatar
  • 5,195
4 votes
2 answers
4k views

How to convert model.safetensor to pytorch_model.bin?

I'm fine-tuning a pre-trained BERT model and I have a weird problem: when I'm fine-tuning using the CPU, the code saves the model like this: with the "pytorch_model.bin". But when I use ...
Gabriel Henrique's user avatar
4 votes
2 answers
820 views

Why are models such as BERT or GPT-3 considered unsupervised learning during pre-training when there is an output (label)

I am not very experienced with unsupervised learning, but my general understanding is that in unsupervised learning, the model learns without there being an output. However, during pre-training in ...
danielkim9's user avatar
4 votes
0 answers
285 views

How to handle text classification model that gives few results with higher confidence to wrong category?

I had a dataset of 15k records. I trained the model using a k-train package and 'bert' model with 5k samples. The train-test split is 70-30% and test results gave me accuracy and f1 scores as 93-94%. ...
Giri Sai Ram's user avatar
3 votes
1 answer
6k views

Tokens returned in transformers Bert model from encode()

I have a small dataset for sentiment analysis. The classifier will be a simple KNN but I wanted to get the word embedding with the Bert model from the transformers library. Note that I just found out ...
Edv Beq's user avatar
  • 936
3 votes
1 answer
2k views

Using Sentence-Bert with other features in scikit-learn

I have a dataset, one feature is text and 4 more features. Sentence-Bert vectorizer transforms text data into tensors. I can use these sparse matrices directly with a machine learning classifier. Can ...
Narges Se's user avatar
3 votes
1 answer
644 views

InternalError when using TPU for training Keras model

I am attempting to fine-tune a BERT model on Google Colab from the Tensorflow Hub using this link. However, I run into the following error: InternalError: RET_CHECK failure (third_party/tensorflow/...
a_002311's user avatar
3 votes
1 answer
4k views

Running BERT on CPU instead of GPU

I am trying to execute BERT's run_classifier.py script using the terminal as below: python run_classifier.py --task_name=cola --do_predict=true --data_dir=<data-dir> --vocab_file=$BERT_BASE_DIR/...
Ashwin Geet D'Sa's user avatar
3 votes
0 answers
708 views

I'm trying to load BERT "tfbert-large-uncased" but i got an error "Can't load config.json file"

I'm trying to load the pre-trained BERT model, but I'm getting an error while loading the tokenizer: it says config.json is not found. If anyone knows how to solve this issue, please help me. Model and path ...
iamhimanshu0's user avatar
3 votes
0 answers
710 views

Google BERT and antonym detection

I recently learned about the following phenomenon: Google BERT word embeddings of well-known state-of-the-art models seem to ignore the measure of semantical contrast between antonyms in terms of the ...
Moshe's user avatar
  • 555
3 votes
0 answers
3k views

BERT model classification with many classes

I want to train a BERT model to perform a multiclass text classification. I use transformers and followed this tutorial (https://towardsdatascience.com/multi-class-text-classification-with-deep-...
Zopui's user avatar
  • 41
3 votes
0 answers
1k views

How to update vocabulary of pre-trained bert model while doing my own training task?

I am now working on a task of predicting masked word using BERT model. Unlike others, the answer needs to be chosen from specific options. For instance: sentence: "In my daily [MASKED], ..." options:...
COrra's user avatar
  • 31
3 votes
3 answers
4k views

How to save a tokenizer after training it?

I have just followed this tutorial on how to train my own tokenizer. Now, from training my tokenizer, I have wrapped it inside a Transformers object, so that I can use it with the transformers library:...
2 votes
1 answer
3k views

How to use BERT and Elmo embedding with sklearn

I created a text classifier that uses Tf-Idf using sklearn, and I want to use BERT and Elmo embeddings instead of Tf-Idf. How would one do that? I'm getting the Bert embedding using the code below: from ...
Juned Ansari's user avatar
  • 5,195
2 votes
3 answers
2k views

BERT Multi-class Sentiment Analysis got low accuracy?

I am working on a small data set which: Contains 1500 pieces of news articles. All of these articles were ranked by human beings with regard to their sentiment/degree of positive on a 5-point scale. ...
Xu Wang's user avatar
  • 21
2 votes
1 answer
851 views

Summarization-Text rank algorithm

What are the advantages of using text rank algorithm for summarization over BERT summarization? Even though both can be used as extractive summarization method, is there any particular advantage for ...
Asha's user avatar
  • 77
2 votes
1 answer
570 views

reporting other metrics during training evaluation simpletransformers

I am training a text classification model over a large set of data and I am using the bert classifier (bert-base-uncased) of the simpletransformers library. Simpletransformers reports by default mcc and ...
Firouziam's user avatar
  • 787
2 votes
1 answer
82 views

RuntimeError when trying to extract text features from a BERT model then using KNN for classification

I'm trying to use the camembert model just to extract text features. After that, I'm trying to use a KNN classifier to classify the feature vectors as inputs. This is the code I wrote: import torch from ...
Wajih101's user avatar
2 votes
1 answer
306 views

Trying to train model for Intent Recognition but getting float error

I'm trying to train the model for intent recognition. I tried removing all special characters and stop words but am unable to resolve this error. I tried removing integers also, but it's throwing an error....
2 votes
1 answer
123 views

Why is a throw-away column required in Bert format?

I have recently come across Bert(Bidirectional Encoder Representations from Transformers). I saw that Bert requires a strict format for the train data. The third column needed is described as follows: ...
anegru's user avatar
  • 1,093
2 votes
0 answers
258 views

How to get the mask average for multi-token masking?

Following this paper, I'm trying to implement how they calculated the average of the log probabilities for each entity (Section 3.3). More specifically, the score for each entity is calculated as the ...
Penguin's user avatar
  • 2,148
2 votes
0 answers
413 views

How do I retrain BERT model with new data

I have already trained a BERT model and saved it in the .pb format, and I want to retrain the model with new custom-made datasets, so in order not to lose the previous training and such, how ...
Abdur Rahman's user avatar
2 votes
0 answers
607 views

I am getting OOM while running PRE TRAINED Bert Model with new dataset with 20k

I have a pre-trained model with an accuracy of 96 after 2 epochs, and I am trying to use that model on a new dataset of 20k tweets for sentiment analysis. While doing that I am getting the error below. I haven't ...
RAMA KRISHNA's user avatar
2 votes
1 answer
526 views

max_length doesn't fix the question-answering model

My Question: How to make my 'question-answering' model run, given a big (>512b) .txt file? Context: I am creating a question answering model with the word embedding model BERT from google. The ...
Liza Darwesh's user avatar
2 votes
1 answer
350 views

Bert model show up InvalidArgumentError Condition x <= y did not hold element wise

I am training a BERT model. Can anyone shed light on the meaning of the following error message? Condition x == y did not hold element wise. Here is the reference colab notebook and my code: !pip install bert-...
Mao's user avatar
  • 21
2 votes
0 answers
42 views

Trying to simplify BERT architecture

I have an interesting question about BERT. Can I simplify the architecture of the model by saying that the similarity of two words in different context will depend on the similarity of input ...
PaulMil's user avatar
  • 21
1 vote
2 answers
203 views

extracting names and associated labels from text with language model

I am trying to extract information from scientific literature on microalgae and I need to be able to scan a text for various names and find their corresponding category. As a simple example, say I ...
user2737728's user avatar
1 vote
1 answer
1k views

BertModel and BertForMaskedLM weights count

I want to understand the BertForMaskedLM model. In the huggingface github code, BertForMaskedLM is a bert model with 2 additional linear layers with shape (input 768, output 768) and (input 768, output 30522). ...
Manvel Hayrapetyan's user avatar
1 vote
2 answers
7k views

(with cpu)Pytorch: IndexError: index out of range in self. (with cuda)Assertion `srcIndex < srcSelectDimSize` failed. How to solve?

Today I get the following error when I use BERT with Pytorch and cuda: /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [234,0,0], thread: [0,0,0] Assertion srcIndex &...
Haorui He's user avatar
1 vote
1 answer
602 views

Fine-tuning distilbert takes hours

I am fine tuning the distilbert pretrained model for sentiment analysis (multilabel with 6 labels) using Huggingface emotion dataset. I am new to this, but 1 epoch, 250 steps takes around 2 hours to ...
dense8's user avatar
  • 628
1 vote
1 answer
478 views

BERT problem with context/semantic search in italian language

I am using a BERT model for context search in the Italian language, but it does not understand the contextual meaning of the sentence and returns the wrong result. In the example code below, when I compare "milk ...
Juned Ansari's user avatar
  • 5,195
1 vote
1 answer
1k views

Calculating Probability of a Classification Model Prediction

I have a classification task. The training data has 50 different labels. The customer wants to differentiate the low probability predictions, meaning that, I have to classify some test data as ...
iso_9001_'s user avatar
  • 2,749
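One common way to approach this kind of request is to turn the classifier's raw scores (logits) into a softmax distribution and reject predictions whose top score falls below a threshold. The sketch below assumes that is what "low probability" means here. The helper names and the 0.5 threshold are hypothetical, and softmax scores are not guaranteed to be well-calibrated probabilities.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_threshold(logits, threshold=0.5):
    """Return (label_index, score), or (None, score) when the top
    softmax score is below `threshold` (hypothetical rejection rule)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]
    return best, probs[best]


# One clear-cut example and one where no class stands out.
confident = predict_with_threshold([4.0, 0.0, 0.0])
uncertain = predict_with_threshold([0.1, 0.0, 0.0])
print(confident, uncertain)
```

In practice the threshold would be tuned on held-out data rather than fixed in advance.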
1 vote
1 answer
2k views

BERT tokenize URLs

I want to classify a bunch of tweets and therefore I'm using the huggingface implementation of BERT. However, I noticed that the default BertTokenizer does not use special tokens for urls. >>> ...
random314's user avatar
1 vote
1 answer
1k views

HuggingFace transformer evaluation process is too slow

I used the HuggingFace transformers library to train a BERT model for sequence classification. The training process is good on GPU, but the evaluation process (which is running on GPU) is too slow. For ...
Mohsen Mahmoodzadeh's user avatar
1 vote
1 answer
1k views

Is splitting a long document of a dataset for BERT considered bad practice?

I am fine-tuning a BERT model on a labeled dataset with many documents longer than the 512 token limit set by the tokenizer. Since truncating would lose a lot of data I would rather use, I started ...
marxlaml's user avatar
  • 341
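As an illustration of the splitting approach this question refers to, here is a minimal sketch of an overlapping sliding window over a list of token ids. The function name `split_into_windows` and the stride value are hypothetical. A real BERT input also has to leave room for the [CLS] and [SEP] special tokens, which this sketch does not add.

```python
def split_into_windows(token_ids, max_len=512, stride=256):
    """Split token_ids into windows of at most max_len tokens.

    Consecutive windows start `stride` tokens apart, so they overlap
    by max_len - stride tokens (hypothetical helper).
    """
    if max_len <= 0 or not 0 < stride <= max_len:
        raise ValueError("need max_len > 0 and 0 < stride <= max_len")
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        # Stop once the current window reaches the end of the document.
        if start + max_len >= len(token_ids):
            break
        start += stride
    return windows


# Toy document of 10 token ids, split with a small window for readability.
ids = list(range(10))
windows = split_into_windows(ids, max_len=4, stride=2)
print(windows)
```

Each window would then be classified separately, and the per-window predictions combined (for example by averaging) to label the full document; how to combine them is a design choice the question leaves open.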
1 vote
1 answer
2k views

TypeError: Expected `trainable` argument to be a boolean, but got: bert

I got this error when implementing my model. I think the error comes from the bert model which I have imported. def create_text_encoder( num_projection_layers, projection_dims, dropout_rate, ...
albert's user avatar
  • 168
1 vote
1 answer
356 views

what is the max limit of entities in a custom NER model

What is the maximum number of entities we can have in a spacy or bert based custom NER model? I have seen examples on the web which have been trained with a max of 10 custom entities per model and ...
GlobalLearner's user avatar
1 vote
2 answers
9k views

Tensorflow: Compute Precision, Recall, F1 Score

I built a BERT model (Bert-base-multilingual-cased) from Huggingface and want to evaluate the model with its precision, recall and F1-score in addition to accuracy, as accuracy isn't always the best metric ...
Maxl Gemeinderat's user avatar
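For reference, the three metrics named in this question can be computed directly from true and predicted labels. The sketch below is framework-independent plain Python rather than a TensorFlow solution. The helper name `precision_recall_f1` and the toy labels are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treated as one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy three-class example.
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0, 2]

p, r, f = precision_recall_f1(y_true, y_pred, label=1)
print(p, r, f)
```

Averaging the per-class values over all labels gives the macro-averaged scores.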
1 vote
1 answer
312 views

How to create a language model with 2 different heads in huggingface?

I know I can create a language model with 1 head: from transformers import AutoModelForMultipleChoice model = AutoModelForMultipleChoice.from_pretrained("distilbert-base-cased").to(device) ...
Penguin's user avatar
  • 2,148
1 vote
1 answer
2k views

How is get predict accuracy score in Bert Classification

I am using Bert Classifier for my Chatbot project. I perform the necessary tokenizer operations for the incoming text message. Then I insert it into the model and make a prediction. How can I get the ...
Erdem Eminağa's user avatar
1 vote
1 answer
2k views

How to store a .tar.gz formatted model to AWS SageMaker and use it as a deployed model?

I have a pre-trained BERT model which was trained on Google Cloud Platform, and the model is stored in a .tar.gz formatted file, I wanted to deploy this model to SageMaker and also be able to trigger ...
wawawa's user avatar
  • 3,115