All Questions

43 votes
2 answers
27k views

Why does the BERT transformer use the [CLS] token for classification instead of an average over all tokens?

I am doing experiments on the BERT architecture and found that most fine-tuning tasks take the final hidden layer as the text representation, later passing it to other models for further ...
Aaditya Ura
  • 12.3k
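For context, a minimal sketch (using the HuggingFace transformers API, not the asker's exact code) contrasting the two pooling choices the question compares:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("An example sentence.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state        # (1, seq_len, 768)

cls_vector = hidden[:, 0, :]                          # the [CLS] token's vector
mask = inputs["attention_mask"].unsqueeze(-1)         # ignore padding positions
mean_vector = (hidden * mask).sum(1) / mask.sum(1)    # average over real tokens
```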
24 votes
1 answer
49k views

How do the max_length, padding and truncation arguments work in HuggingFace's BertTokenizerFast.from_pretrained('bert-base-uncased')?

I am working on a text classification problem where I want to use the BERT model as the base, followed by Dense layers. I want to know how the 3 arguments work. For example, if I have 3 sentences ...
Deshwal
  • 3,872
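A small sketch of how the three arguments interact (hypothetical sentences, fast tokenizer):

```python
from transformers import BertTokenizerFast

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
sentences = ["short", "a slightly longer sentence", "an even longer third sentence here"]

# truncation=True cuts every sequence down to max_length;
# padding="max_length" pads every sequence up to max_length;
# padding=True would instead pad only to the longest sequence in the batch.
batch = tok(sentences, max_length=8, truncation=True, padding="max_length")
print([len(ids) for ids in batch["input_ids"]])   # [8, 8, 8]
```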
22 votes
6 answers
27k views

AttributeError: module 'torch' has no attribute '_six'. BERT model in PyTorch

I tried to load a pre-trained model using the BertModel class in PyTorch. I have _six.py under torch, but it still shows module 'torch' has no attribute '_six'. import torch from pytorch_pretrained_bert ...
Ruitong LIU
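A hedged note: torch._six was removed in PyTorch 2.x, and pytorch_pretrained_bert predates that; the usual way out is the maintained transformers package. A minimal sketch:

```python
# pip install transformers
# Loads the same pretrained weights without touching torch._six.
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
```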
12 votes
2 answers
12k views

How to train BERT from scratch on a new domain for both MLM and NSP?

I'm trying to train a BERT model from scratch on my own dataset using the HuggingFace library. I would like to train the model so that it has the exact architecture of the original BERT model. In ...
tlqn
  • 379
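One possible starting point (a sketch, not the asker's code): BertForPreTraining carries both the MLM and NSP heads, and a fresh BertConfig reproduces the original bert-base architecture with random weights:

```python
from transformers import BertConfig, BertForPreTraining

config = BertConfig()               # defaults match bert-base: 12 layers, 768 hidden
model = BertForPreTraining(config)  # random init, with both MLM and NSP heads attached

# Training data then needs segment pairs with a next-sentence label plus masked
# tokens, e.g. via TextDatasetForNextSentencePrediction and
# DataCollatorForLanguageModeling (both ship with transformers).
```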
11 votes
2 answers
14k views

Continual pre-training vs. Fine-tuning a language model with MLM

I have some custom data I want to use to further pre-train the BERT model. I've tried the two following approaches so far: starting with a pre-trained BERT checkpoint and continuing the pre-training ...
Pedram
  • 2,531
9 votes
1 answer
23k views

RuntimeError: The size of tensor a (4000) must match the size of tensor b (512) at non-singleton dimension 1

I'm trying to build a model for document classification. I'm using BERT with PyTorch. I got the BERT model with the code below: bert = AutoModel.from_pretrained('bert-base-uncased'). This is the code for ...
Venkatesh Dharavath
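This error usually means the tokenized sequence (4000 tokens here) exceeds BERT's 512-position limit; a sketch of the usual fix (the document variable is a stand-in):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
long_document = " ".join(["word"] * 4000)   # stand-in for the asker's document

# Truncate to BERT's maximum of 512 positions; longer documents need chunking
# or a long-context model (e.g. Longformer) instead.
enc = tokenizer(long_document, max_length=512, truncation=True,
                padding="max_length", return_tensors="pt")
print(enc["input_ids"].shape)               # torch.Size([1, 512])
```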
9 votes
2 answers
3k views

BERT output not deterministic

BERT's output is not deterministic. I expect the output values to be deterministic when I give the same input, but with my BERT model the values change. Awkwardly, the same value is returned twice, ...
Keanu Paik
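The most common cause is dropout still being active in training mode; a sketch of the usual check:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()                    # disables dropout; without this, outputs vary per call

inputs = tokenizer("the same input", return_tensors="pt")
with torch.no_grad():
    out1 = model(**inputs).last_hidden_state
    out2 = model(**inputs).last_hidden_state
print(torch.equal(out1, out2))  # True once dropout is disabled
```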
9 votes
1 answer
4k views

Why does the BERT model keep 10% of masked tokens unchanged?

I am reading the BERT paper. For the masked language model task during pre-training, the paper says the model chooses 15% of tokens randomly. Of the chosen tokens (Ti), 80% will be replaced ...
Thanh Kiet
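A toy sketch of the 80/10/10 rule the paper describes (15% of positions selected; of those, 80% become [MASK], 10% become a random token, 10% stay unchanged):

```python
import torch

def mask_tokens(input_ids, mask_id, vocab_size, mlm_prob=0.15):
    labels = input_ids.clone()
    selected = torch.rand(input_ids.shape) < mlm_prob     # ~15% of positions
    labels[~selected] = -100                              # compute loss only on selected

    replaced = selected & (torch.rand(input_ids.shape) < 0.8)             # 80% -> [MASK]
    random_ = selected & ~replaced & (torch.rand(input_ids.shape) < 0.5)  # 10% -> random
    out = input_ids.clone()
    out[replaced] = mask_id
    out[random_] = torch.randint(vocab_size, (int(random_.sum()),))
    # the remaining ~10% of selected positions keep their original token
    return out, labels
```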
7 votes
2 answers
6k views

The essence of learnable positional embeddings? Does the embedding improve outcomes?

I was recently reading the BERT source code from the Hugging Face project. I noticed that the so-called "learnable position encoding" seems to refer to a specific nn.Parameter layer when it ...
AdamHommer
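In the HuggingFace BERT code the learned position encoding is an embedding table indexed by position, trained like any other weight; a stripped-down sketch:

```python
import torch
import torch.nn as nn

max_len, hidden = 512, 768
position_embeddings = nn.Embedding(max_len, hidden)  # learnable lookup table

seq_len = 24
position_ids = torch.arange(seq_len).unsqueeze(0)    # (1, seq_len): 0, 1, 2, ...
pos_vectors = position_embeddings(position_ids)      # (1, seq_len, 768),
                                                     # added to the token embeddings
```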
7 votes
1 answer
2k views

Fine-tune Bert for specific domain (unsupervised)

I want to fine-tune BERT on texts that are related to a specific domain (in my case, engineering). The training should be unsupervised, since I don't have any labels. Is this ...
spadel
  • 1,036
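This is usually done as continued masked-language-model pretraining on the domain text, which needs no labels; a minimal sketch with the transformers Trainer, assuming a hypothetical engineering.txt corpus:

```python
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, LineByLineTextDataset,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

dataset = LineByLineTextDataset(tokenizer=tokenizer,
                                file_path="engineering.txt",  # hypothetical corpus
                                block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(model=model,
                  args=TrainingArguments(output_dir="bert-engineering"),
                  data_collator=collator,
                  train_dataset=dataset)
trainer.train()   # masks tokens on the fly; no labels required
```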
7 votes
1 answer
8k views

Mismatched size on BertForSequenceClassification from Transformers and multiclass problem

I just trained a BERT model on a dataset composed of products and labels (departments) for an e-commerce website. It's a multiclass problem. I used BertForSequenceClassification to predict the ...
Guilherme Giuliano Nicolau
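Size mismatches here typically come from the classification head: num_labels at load time must match the checkpoint, or the old head must be discarded. A hedged sketch (the label count is hypothetical; ignore_mismatched_sizes exists in recent transformers versions):

```python
from transformers import BertForSequenceClassification

# num_labels sizes the classifier head; for a checkpoint saved with a different
# head size, ignore_mismatched_sizes re-initializes the mismatched layer.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",        # or the asker's fine-tuned checkpoint path
    num_labels=10,              # hypothetical number of departments
    ignore_mismatched_sizes=True,
)
```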
7 votes
2 answers
4k views

Pretraining a language model on a small custom corpus

I was curious whether it is possible to use transfer learning in text generation and re-train/pre-train a model on a specific kind of text. For example, having a pre-trained BERT model and a small corpus ...
ysig
  • 477
6 votes
1 answer
1k views

BERT performing worse than word2vec

I am trying to use BERT for a document ranking problem. My task is pretty straightforward: I have to produce a similarity ranking for an input document. The only issue is that I don't have labels, so ...
user3741951
6 votes
3 answers
4k views

TypeError: Layer input_spec must be an instance of InputSpec. Got: InputSpec(shape=(None, 128, 768), ndim=3)

I am trying to use a BERT pretrained model to do multiclass classification (of 3 classes). Here's my function to use the model, with some extra functionality added: def create_model(max_seq_len,...
Hrisav Bhowmick
5 votes
2 answers
3k views

Why are the matrices in BERT called Query, Key, and Value?

Within the transformer units of BERT, there are modules called Query, Key, and Value, or simply Q, K, V. Based on the BERT paper and code (particularly modeling.py), my pseudocode understanding of ...
solvingPuzzles
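A compact sketch of why the names fit: each token's query is matched against every token's key (a soft dictionary lookup), and the resulting weights mix the values:

```python
import torch
import torch.nn.functional as F

def attention(Q, K, V):
    # Q, K, V: (batch, seq_len, d_k)
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # query-key similarities (batch, seq, seq)
    weights = F.softmax(scores, dim=-1)            # attention weights per query token
    return weights @ V                             # weighted sum of values
```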
4 votes
1 answer
7k views

BERT for time series classification

I'd like to train a transformer encoder (e.g. BERT) on time-series data for a task that can be modeled as classification. Let me briefly describe the data I'm using before talking about the issue I'm ...
clems
  • 129
4 votes
3 answers
27k views

OSError for huggingface model

I am trying to use a huggingface model (CamelBERT), but I am getting an error when loading the tokenizer. Code: from transformers import AutoTokenizer, AutoModelForMaskedLM tokenizer = AutoTokenizer....
TMN
  • 103
4 votes
1 answer
2k views

Finetuning BERT on custom data

I want to train a 21-class text classification model using BERT. But I have very little training data, so I downloaded a similar dataset with 5 classes and 2 million samples, and finetuned ...
danishansari
4 votes
1 answer
5k views

BertModel or BertForPreTraining

I want to use BERT only for embeddings, using the BERT output as input to a classification net that I will build from scratch. I am not sure whether I want to finetune the model. I think the ...
Amit S
  • 243
4 votes
1 answer
6k views

BERT outputs explained

The keys of the BERT encoder's output are default, encoder_outputs, pooled_output and sequence_output. As far as I know, encoder_outputs are the outputs of each encoder, pooled_output is the output ...
OK 400
  • 1,159
4 votes
1 answer
7k views

There appear to be 1 leaked semaphore objects to clean up at shutdown

I am using macOS and used the DistilBert model via Sentence Transformers for a chatbot implementation, generating the API in VS Code. But after giving 3 inputs it pops up this error: UserWarning: ...
Tejas Sutar
4 votes
0 answers
748 views

HuggingFace BertForMaskedLM: Expected input batch_size (3200) to match target batch_size (16)

I'm working on multiclass classification (Bengali-language sentiment analysis) with a pretrained Huggingface (BertForMaskedLM) model. When the error occurred, I knew I had to change the label (output) ...
epitope21
3 votes
1 answer
3k views

ValueError: Unknown layer: TFBertModel. Please ensure this object is passed to the `custom_objects` argument

Here I am training the BERT model. I used the code below to train; when I load the saved model to predict, it shows this error. Can anyone please help me out? import tensorflow as tf import logging from ...
waji
  • 71
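The usual fix is to tell Keras how to deserialize the unknown layer via custom_objects; a sketch (the save path is hypothetical, and the exact call depends on how the model was saved):

```python
import tensorflow as tf
from transformers import TFBertModel

model = tf.keras.models.load_model(
    "saved_model_path",                           # hypothetical path to the saved model
    custom_objects={"TFBertModel": TFBertModel},  # resolves the unknown layer class
)
```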
3 votes
2 answers
1k views

Multilingual BERT sentence vector captures language used more than meaning - working as intended?

Playing around with BERT, I downloaded the Huggingface multilingual BERT and entered three sentences, saving their sentence vectors (the embedding of [CLS]), then translated them via Google Translate, ...
user2182857
3 votes
2 answers
2k views

Sentence-Transformer Training and Validation Loss

I am using a Sentence-Transformers model and fine-tuning it (using PyTorch) on a custom dataset, which is the same as the Semantic Text Similarity (STS) dataset. I am unable to get (or print) the training ...
Abhas kumar
3 votes
1 answer
939 views

How to find the most important words/tokens/embeddings responsible for the label predicted by a text classification model in PyTorch

Let us suppose I have a model like: class BERT_Subject_Classifier(nn.Module): def __init__(self,out_classes,hidden1=128,hidden2=32,dropout_val=0.2): super(BERT_Subject_Classifier, self)....
Deshwal
  • 3,872
3 votes
1 answer
807 views

How to set output_shape of BERT preprocessing layer from tensorflow hub?

I am building a simple BERT model for text classification using TensorFlow Hub. import tensorflow as tf import tensorflow_hub as tf_hub bert_preprocess = tf_hub.KerasLayer("https://tfhub....
lazarea
  • 1,219
3 votes
0 answers
855 views

Same input, same model, same weights but getting different results

I'm finetuning sentence-BERT in TensorFlow to do a task like sentence cosine-similarity calculation. I set up an encoder, let's say encoder1, using the code below: from sentence_transformers import ...
PlasticSaber
3 votes
2 answers
2k views

How to save and load a custom siamese BERT model

I am following this tutorial on how to train a siamese BERT network: https://keras.io/examples/nlp/semantic_similarity_with_bert/. All good, but I am not sure what the best way is to save the model ...
Carbo
  • 916
3 votes
0 answers
708 views

I'm trying to load BERT "tfbert-large-uncased" but I get the error "Can't load config.json file"

I'm trying to load the pre-trained BERT model, but I'm getting an error while loading the tokenizer; it says config.json is not found. If anyone knows how to solve this issue, please help me. Model and path ...
iamhimanshu0
3 votes
0 answers
710 views

Google BERT and antonym detection

I recently learned about the following phenomenon: the Google BERT word embeddings of well-known state-of-the-art models seem to ignore the measure of semantic contrast between antonyms in terms of the ...
Moshe
  • 555
2 votes
1 answer
1k views

Using BERT to detect the language of a given word

I have words in Hebrew. Some of them are originally English, and some are 'Hebrew English', meaning words that come from English but are written with ...
jonb
  • 865
2 votes
1 answer
2k views

How does the BERT tokenizer result in an input tensor shape of (b, 24, 768)?

I understand how the BERT tokenizer works thanks to this article: https://albertauyeung.github.io/2020/06/19/bert-tokenization.html. However, I am confused about how this ends up as the final input ...
Joshua Clancy
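Worth noting: the tokenizer alone only produces token ids of shape (b, 24); the 768 dimension appears once the model's embedding layers map each id to a vector. A sketch:

```python
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

enc = tokenizer(["a batch of one sentence"], padding="max_length",
                max_length=24, return_tensors="pt")
print(enc["input_ids"].shape)                  # (1, 24): integer ids, no 768 yet
print(model(**enc).last_hidden_state.shape)    # (1, 24, 768): after embeddings + encoder
```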
2 votes
1 answer
715 views

BERT Text Classification

I am new to BERT and am trying to learn BERT fine-tuning for text classification via a Coursera course: https://www.coursera.org/projects/fine-tune-bert-tensorflow/. Based on the course, I would like to ...
plm0998
  • 35
2 votes
2 answers
2k views

Fine-tune BERT for a specific domain on a different language?

I want to fine-tune a pre-trained BERT model. However, my task uses data within a specific domain (say, biomedical data). Additionally, my data is in a language other than English (say ...
Moonreaderx
2 votes
1 answer
306 views

Trying to train model for Intent Recognition but getting float error

I'm trying to train a model for intent recognition. I tried removing all special characters and stop words but was unable to resolve this error. I tried removing integers too, but it still throws an error....
2 votes
1 answer
106 views

TensorFlow model saving and calculating an average of models [closed]

I am trying to implement and reproduce the results of federated BERT pretraining from the paper "Federated pretraining and fine-tuning of BERT using clinical notes from multiple silos". I prefer to use ...
Faseela Thayattuchira
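For the model-averaging part, one common pattern (a FedAvg-style sketch, not the paper's exact setup) is to average identically-structured Keras models weight tensor by weight tensor:

```python
import numpy as np
import tensorflow as tf

def average_models(models):
    """Average the weights of identically-structured Keras models."""
    # zip groups the k-th weight tensor of every model together, then each
    # group is averaged elementwise across models.
    avg_weights = [np.mean(group, axis=0)
                   for group in zip(*[m.get_weights() for m in models])]
    result = tf.keras.models.clone_model(models[0])  # same architecture, fresh weights
    result.set_weights(avg_weights)
    return result
```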
2 votes
1 answer
498 views

Have you encountered a similar problem, like loss jitter during training?

Background: this is about loss jitter that appears at the beginning of every training epoch. When the dataloader loads the first batch of data to feed into the network, the loss value is always ...
Timetraveler
2 votes
1 answer
123 views

Why is a throw-away column required in the BERT format?

I have recently come across BERT (Bidirectional Encoder Representations from Transformers). I saw that BERT requires a strict format for the training data. The third column needed is described as follows: ...
anegru
  • 1,093
2 votes
0 answers
138 views

How can I implement this BERT model for sequential sentences classification using HuggingFace?

I want to classify the functions of sentences in the abstracts of scientific papers, and the function of a sentence is related to the functions of its surrounding sentences. I found the model proposed ...
Tom Leung
  • 364
2 votes
1 answer
2k views

How to compute the Hessian of a large neural network in PyTorch?

How can I compute the Hessian matrix of a large neural network or a transformer model like BERT in PyTorch? I know about torch.autograd.functional.hessian, but it seems like it only calculates the Hessian of a ...
Yan Pan
  • 21
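For a BERT-sized model the full Hessian is far too large to materialize; the standard workaround is Hessian-vector products via double backprop. A sketch:

```python
import torch

def hessian_vector_product(loss, params, vec):
    """Compute H @ vec without ever forming H, via two backward passes."""
    grads = torch.autograd.grad(loss, params, create_graph=True)  # first backward
    dot = sum((g * v).sum() for g, v in zip(grads, vec))          # scalar g^T v
    return torch.autograd.grad(dot, params)                       # second backward: H v
```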
2 votes
0 answers
1k views

Finetuning Transformers in PyTorch (BERT, RoBERTa, etc.)

Alright, so there are multiple methods to fine-tune a transformer: freeze the transformer's parameters and feed only its final outputs into another model (the user trains this "another" model), ...
brucewlee
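For the "freeze the transformer" variant, the standard PyTorch pattern is to turn off gradients on the encoder; a sketch:

```python
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

for param in model.bert.parameters():  # freeze the whole BERT encoder ...
    param.requires_grad = False
# ... so only the classification head (model.classifier) is updated during training.
```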
2 votes
0 answers
431 views

String cleaning/preprocessing for BERT

My goal is to train a BERT model on Wikipedia data that I derive straight from Wikipedia. The content that I scrape from the site looks like this (example): "(148975) 2001 XA255, provisional ...
Heidedo
  • 21
2 votes
0 answers
607 views

I am getting OOM while running a pretrained BERT model on a new dataset of 20k tweets

I have a pretrained model with an accuracy of 96 after 2 epochs, and I am trying to use that model on a new dataset of 20k tweets for sentiment analysis. While doing that I am getting the error below. I haven't ...
RAMA KRISHNA
2 votes
0 answers
156 views

How to predict if a phrase is related to a short text or an article using supervised learning?

I have a set of short phrases and a set of texts. I want to predict whether a phrase is related to an article. A phrase that doesn't appear in the article may still be related. Some examples of annotated ...
landings
  • 696
2 votes
0 answers
470 views

How does BERT change the max sequence length when we do a fine-tuning task?

Suppose we use a pretrained model with a max sequence length of 128. Now I change the config file and reduce the max sequence length from 128 to 64. Next, I do the fine-tuning task, such as a simple classification ...
McCree Feng
2 votes
1 answer
75 views

TensorFlow 1.15: the inner logic of Estimator's input_fn? Or the inner logic of MirroredStrategy?

I am pretraining BERT on 1 machine with 4 GPUs, not 1 GPU. For each training step, I am wondering whether input_fn gives 1 batch to each GPU or 1 batch across all 4 GPUs. The mirrored strategy code: ...
惊天补扣
  • 1,732
2 votes
0 answers
322 views

Is it possible to vectorize the documents using Google BERT?

I would like to convert my documents to vectors using BERT, one vector for each document. Is this possible? How could it be programmed using standard or popular libraries?
user_5
  • 548
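One common recipe (a sketch, not the only option) is the sentence-transformers library, which wraps a BERT-style encoder with pooling to yield one fixed-size vector per document:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any BERT-based checkpoint works
docs = ["first document ...", "second document ..."]
vectors = model.encode(docs)                     # one vector per document
print(vectors.shape)                             # (2, 384) for this checkpoint
```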
2 votes
0 answers
359 views

How to use run_classifier.py, an example PyTorch implementation of BERT for a classification task?

How can I use the fine-tuned BERT PyTorch model for the classification (CoLA) task? I do not see the argument --do_predict in /examples/run_classifier.py. However, --do_predict exists in the original ...
Ashwin Geet D'Sa
2 votes
1 answer
161 views

bert_vocab.bert_vocab_from_dataset returning wrong vocabulary [closed]

I'm trying to build a tokenizer following the TF tutorial https://www.tensorflow.org/text/guide/subwords_tokenizer. I'm basically doing the same thing, only with a different dataset. The dataset in ...
Niccolò Tiezzi