Highest scored 'bert-language-model+machine-learning' questions

43 votes

2 answers

27k views

Why Bert transformer uses [CLS] token for classification instead of average over all tokens?

I am doing experiments on the BERT architecture and found that most fine-tuning tasks take the final hidden layer as the text representation and later pass it to other models for the further ...

Aaditya Ura

12.3k

asked Jul 2, 2020 at 21:25

18 votes

1 answer

12k views

BertForSequenceClassification vs. BertForMultipleChoice for sentence multi-class classification

I'm working on a text classification problem (e.g. sentiment analysis), where I need to classify a text string into one of five classes. I just started using the Huggingface Transformer package and ...

stackoverflowuser2010

39.8k

asked Mar 10, 2020 at 1:02

14 votes

1 answer

14k views

PyTorch torch.no_grad() versus requires_grad=False

I'm following a PyTorch tutorial which uses the BERT NLP model (feature extractor) from the Huggingface Transformers library. There are two pieces of interrelated code for gradient updates that I don'...

stackoverflowuser2010

39.8k

asked Sep 7, 2020 at 23:23

8 votes

3 answers

5k views

How to compute mean/max of HuggingFace Transformers BERT token embeddings with attention mask?

I'm using the HuggingFace Transformers BERT model, and I want to compute a summary vector (a.k.a. embedding) over the tokens in a sentence, using either the mean or max function. The complication is ...

stackoverflowuser2010

39.8k

asked Dec 1, 2020 at 1:38

6 votes

2 answers

6k views

Can you train a BERT model from scratch with task specific architecture?

BERT pre-training of the base model is done by a language modeling approach, where we mask a certain percentage of tokens in a sentence and make the model learn to predict the masked tokens. Then, I think in ...

viopu

71

asked May 15, 2020 at 19:21

6 votes

1 answer

1k views

BERT performing worse than word2vec

I am trying to use BERT for a document ranking problem. My task is pretty straightforward. I have to do a similarity ranking for an input document. The only issue here is that I don’t have labels - so ...

user3741951

199

asked Apr 21, 2019 at 21:30

5 votes

3 answers

6k views

AttributeError: 'str' object has no attribute 'dim' in pytorch

I got the following error output in PyTorch when I sent model predictions into the model. Does anyone know what's going on? The following is the model architecture that I created; in the error output, ...

Bei Zhao

71

asked Nov 30, 2020 at 18:41

5 votes

1 answer

2k views

Does BertForSequenceClassification classify on the CLS vector?

I'm using the Huggingface Transformer package and BERT with PyTorch. I'm trying to do 4-way sentiment classification and am using BertForSequenceClassification to build a model that eventually leads ...

stackoverflowuser2010

39.8k

asked Mar 26, 2020 at 21:27

5 votes

2 answers

1k views

Loss function for comparing two vectors for categorization

I am performing a NLP task where I analyze a document and classify it into one of six categories. However, I do this operation at three different time periods. So the final output is an array of three ...

Jameson

4,248

asked Apr 30, 2021 at 17:00

4 votes

2 answers

6k views

How to increase dimension-vector size of BERT sentence-transformers embedding

I am using sentence-transformers for semantic search, but sometimes it does not understand the contextual meaning and returns the wrong result, e.g. BERT problem with context/semantic search in italian ...

Juned Ansari

5,195

asked Aug 6, 2021 at 18:45

4 votes

2 answers

4k views

How to convert model.safetensor to pytorch_model.bin?

I'm fine-tuning a pre-trained BERT model and I have a weird problem: when I fine-tune using the CPU, the code saves the model like this: With the "pytorch_model.bin". But when I use ...

Gabriel Henrique

53

asked Dec 23, 2023 at 20:43

4 votes

2 answers

820 views

Why are models such as BERT or GPT-3 considered unsupervised learning during pre-training when there is an output (label)

I am not very experienced with unsupervised learning, but my general understanding is that in unsupervised learning, the model learns without there being an output. However, during pre-training in ...

danielkim9

41

asked Feb 17, 2022 at 9:04

4 votes

0 answers

285 views

How to handle text classification model that gives few results with higher confidence to wrong category?

I had a dataset of 15k records. I trained the model using a k-train package and 'bert' model with 5k samples. The train-test split is 70-30% and test results gave me accuracy and f1 scores as 93-94%. ...

Giri Sai Ram

41

asked May 12, 2020 at 13:44

3 votes

1 answer

6k views

Tokens returned in transformers Bert model from encode()

I have a small dataset for sentiment analysis. The classifier will be a simple KNN but I wanted to get the word embedding with the Bert model from the transformers library. Note that I just found out ...

Edv Beq

936

asked Aug 27, 2020 at 1:38

3 votes

1 answer

2k views

Using Sentence-Bert with other features in scikit-learn

I have a dataset, one feature is text and 4 more features. Sentence-Bert vectorizer transforms text data into tensors. I can use these sparse matrices directly with a machine learning classifier. Can ...

Narges Se

45

asked Oct 15, 2021 at 13:03

3 votes

1 answer

644 views

InternalError when using TPU for training Keras model

I am attempting to fine-tune a BERT model on Google Colab from the Tensorflow Hub using this link. However, I run into the following error: InternalError: RET_CHECK failure (third_party/tensorflow/...

a_002311

43

asked Dec 25, 2021 at 10:11

3 votes

1 answer

4k views

Running BERT on CPU instead of GPU

I am trying to execute BERT's run_classifier.py script from the terminal as below: python run_classifier.py --task_name=cola --do_predict=true --data_dir=<data-dir> --vocab_file=$BERT_BASE_DIR/...

Ashwin Geet D'Sa

6,934

asked Jun 17, 2019 at 8:42

3 votes

0 answers

708 views

I'm trying to load BERT "tfbert-large-uncased" but i got an error "Can't load config.json file"

I'm trying to load the pre-trained BERT model, but I'm getting an error while loading the tokenizer: it says config.json is not found. If anyone knows how to solve this issue, please help me. Model and path ...

iamhimanshu0

379

asked May 20, 2021 at 16:10

3 votes

0 answers

710 views

Google BERT and antonym detection

I recently learned about the following phenomenon: Google BERT word embeddings of well-known state-of-the-art models seem to ignore the measure of semantical contrast between antonyms in terms of the ...

Moshe

555

asked Nov 8, 2020 at 13:11

3 votes

0 answers

3k views

BERT model classification with many classes

I want to train a BERT model to perform a multiclass text classification. I use transformers and followed this tutorial (https://towardsdatascience.com/multi-class-text-classification-with-deep-...

Zopui

41

asked Oct 9, 2020 at 10:36

3 votes

0 answers

1k views

How to update vocabulary of pre-trained bert model while doing my own training task?

I am now working on a task of predicting masked word using BERT model. Unlike others, the answer needs to be chosen from specific options. For instance: sentence: "In my daily [MASKED], ..." options:...

COrra

31

asked Feb 17, 2020 at 13:42

3 votes

3 answers

4k views

How to save a tokenizer after training it?

I have just followed this tutorial on how to train my own tokenizer. Now, from training my tokenizer, I have wrapped it inside a Transformers object, so that I can use it with the transformers library:...

user16098918

asked Aug 12, 2021 at 16:33

2 votes

1 answer

3k views

How to use BERT and Elmo embedding with sklearn

I created a text classifier that uses Tf-Idf using sklearn, and I want to use BERT and Elmo embedding instead of Tf-Idf. How would one do that ? I'm getting Bert embedding using the code below: from ...

Juned Ansari

5,195

asked Apr 15, 2021 at 9:39

2 votes

3 answers

2k views

BERT Multi-class Sentiment Analysis got low accuracy?

I am working on a small data set which: Contains 1500 pieces of news articles. All of these articles were ranked by human beings with regard to their sentiment/degree of positive on a 5-point scale. ...

Xu Wang

21

asked Jul 21, 2020 at 0:02

2 votes

1 answer

851 views

Summarization-Text rank algorithm

What are the advantages of using text rank algorithm for summarization over BERT summarization? Even though both can be used as extractive summarization method, is there any particular advantage for ...

Asha

77

asked Jul 4, 2020 at 16:15

2 votes

1 answer

570 views

reporting other metrics during training evaluation simpletransformers

I am training a text classification model over a large set of data and I am using the BERT classifier (bert-base-uncased) of the simpletransformers library. Simpletransformers reports by default mcc and ...

Firouziam

787

asked Nov 16, 2021 at 21:57

2 votes

1 answer

82 views

RuntimeError when trying to extract text features from a BERT model then using KNN for classification

I'm trying to use the camembert model just to extract text features. After that, I'm trying to use a KNN classifier to classify the feature vectors as inputs. This is the code I wrote: import torch from ...

Wajih101

11

asked Jul 31, 2023 at 9:02

2 votes

1 answer

306 views

Trying to train model for Intent Recognition but getting float error

I'm trying to train the model for intent recognition. I tried removing all special characters and stop words but unable to resolve this error. I tried removing integers also but it's throwing an error....

user13510399

asked Dec 15, 2020 at 14:55

2 votes

1 answer

123 views

Why is a throw-away column required in Bert format?

I have recently come across BERT (Bidirectional Encoder Representations from Transformers). I saw that BERT requires a strict format for the training data. The third column needed is described as follows: ...

anegru

1,093

asked Apr 29, 2019 at 20:49

2 votes

0 answers

258 views

How to get the mask average for multi-token masking?

Following this paper, I'm trying to implement how they calculated the average of the log probabilities for each entity (Section 3.3). More specifically, the score for each entity is calculated as the ...

Penguin

2,148

asked Sep 10, 2023 at 20:44

2 votes

0 answers

413 views

How do I retrain BERT model with new data

I have already trained a BERT model and saved it in the .pb format, and I want to retrain the model with new datasets that I custom made. In order not to lose the previous training and such, how ...

Abdur Rahman

21

asked Apr 28, 2022 at 8:31

2 votes

0 answers

607 views

I am getting OOM while running PRE TRAINED Bert Model with new dataset with 20k

I have a pre-trained model with an accuracy of 96 after 2 epochs, and I am trying to use that model on a new dataset of 20k tweets for sentiment analysis. While doing that, I am getting the error below. I haven't ...

RAMA KRISHNA

51

asked Mar 12, 2021 at 17:24

2 votes

1 answer

526 views

max_length doesn't fix the question-answering model

My Question: How to make my 'question-answering' model run, given a big (>512b) .txt file? Context: I am creating a question answering model with the word embedding model BERT from google. The ...

Liza Darwesh

421

asked Dec 19, 2020 at 13:12

2 votes

1 answer

350 views

Bert model show up InvalidArgumentError Condition x <= y did not hold element wise

I am training a BERT model. Can anyone shed light on the meaning of the following error message? Condition x == y did not hold element wise. Here is the reference Colab notebook and my code: !pip install bert-...

Mao

21

asked Dec 1, 2020 at 6:58

2 votes

0 answers

42 views

Trying to simplify BERT architecture

I have an interesting question about BERT. Can I simplify the architecture of the model by saying that the similarity of two words in different context will depend on the similarity of input ...

PaulMil

21

asked Jan 19, 2020 at 18:17

1 vote

2 answers

203 views

extracting names and associated labels from text with language model

I am trying to extract information from scientific literature on microalgae and I need to be able to scan a text for various names and find their corresponding category. As a simple example, say I ...

user2737728

25

asked Nov 7, 2023 at 13:34

1 vote

1 answer

1k views

BertModel and BertForMaskedLM weights count

I want to understand the BertForMaskedLM model. In the Huggingface GitHub code, BertForMaskedLM is a BERT model with 2 additional linear layers with shapes (input 768, output 768) and (input 768, output 30522). ...

Manvel Hayrapetyan

33

asked Dec 8, 2021 at 14:02

1 vote

2 answers

7k views

(with cpu)Pytorch: IndexError: index out of range in self. (with cuda)Assertion `srcIndex < srcSelectDimSize` failed. How to solve?

Today I get the following error when I use BERT with Pytorch and cuda: /pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [234,0,0], thread: [0,0,0] Assertion srcIndex &...

Haorui He

93

asked Oct 16, 2021 at 14:28

1 vote

1 answer

602 views

Fine-tuning distilbert takes hours

I am fine tuning the distilbert pretrained model for sentiment analysis (multilabel with 6 labels) using Huggingface emotion dataset. I am new to this, but 1 epoch, 250 steps takes around 2 hours to ...

dense8

628

asked Dec 19, 2022 at 22:31

1 vote

1 answer

478 views

BERT problem with context/semantic search in italian language

I am using a BERT model for context search in the Italian language, but it does not understand the contextual meaning of the sentence and returns the wrong result. In the example code below, when I compare "milk ...

Juned Ansari

5,195

asked Aug 2, 2021 at 19:34

1 vote

1 answer

1k views

Calculating Probability of a Classification Model Prediction

I have a classification task. The training data has 50 different labels. The customer wants to differentiate the low probability predictions, meaning that, I have to classify some test data as ...

iso_9001_

2,749

asked Mar 2, 2021 at 11:30

1 vote

1 answer

2k views

BERT tokenize URLs

I want to classify a bunch of tweets and therefore I'm using the huggingface implementation of BERT. However, I noticed that the default BertTokenizer does not use special tokens for URLs. >>> ...

random314

63

asked Oct 27, 2020 at 23:51

1 vote

1 answer

1k views

HuggingFace transformer evaluation process is too slow

I used the HuggingFace transformers library to train a BERT model for sequence classification. The training process is good on GPU, but the evaluation process (which is also running on GPU) is too slow. For ...

Mohsen Mahmoodzadeh

141

asked Aug 26, 2023 at 9:06

1 vote

1 answer

1k views

Is splitting a long document of a dataset for BERT considered bad practice?

I am fine-tuning a BERT model on a labeled dataset with many documents longer than the 512 token limit set by the tokenizer. Since truncating would lose a lot of data I would rather use, I started ...

marxlaml

341

asked Jan 19, 2023 at 23:20

1 vote

1 answer

2k views

TypeError: Expected `trainable` argument to be a boolean, but got: bert

I got this error when implementing my model. I think the error comes from the BERT model which I have imported. def create_text_encoder( num_projection_layers, projection_dims, dropout_rate, ...

albert

168

asked Jun 29, 2022 at 12:37

1 vote

1 answer

356 views

what is the max limit of entities in a custom NER model

What is the maximum number of entities we can have in spaCy or BERT-based custom NER models? I have seen examples on the web which have been trained with a max of 10 custom entities per model and ...

GlobalLearner

57

asked May 8, 2022 at 14:56

1 vote

2 answers

9k views

Tensorflow: Compute Precision, Recall, F1 Score

I built a BERT model (Bert-base-multilingual-cased) from Huggingface and want to evaluate the model with its precision, recall and F1-score in addition to accuracy, as accuracy isn't always the best metric ...

Maxl Gemeinderat

453

asked Jan 5, 2022 at 8:20

1 vote

1 answer

312 views

How to create a language model with 2 different heads in huggingface?

I know I can create a language model with 1 head: from transformers import AutoModelForMultipleChoice model = AutoModelForMultipleChoice.from_pretrained("distilbert-base-cased").to(device) ...

Penguin

2,148

asked Nov 17, 2022 at 17:32

1 vote

1 answer

2k views

How is get predict accuracy score in Bert Classification

I am using Bert Classifier for my Chatbot project. I perform the necessary tokenizer operations for the incoming text message. Then I insert it into the model and make a prediction. How can I get the ...

Erdem Eminağa

43

asked Jul 9, 2021 at 6:53

1 vote

1 answer

2k views

How to store a .tar.gz formatted model to AWS SageMaker and use it as a deployed model?

I have a pre-trained BERT model which was trained on Google Cloud Platform, and the model is stored in a .tar.gz formatted file, I wanted to deploy this model to SageMaker and also be able to trigger ...

wawawa

3,115

asked Nov 21, 2020 at 16:37
