All Questions

11 votes
2 answers
14k views

Continual pre-training vs. Fine-tuning a language model with MLM

I have some custom data I want to use to further pre-train the BERT model. I’ve tried the two following approaches so far: Starting with a pre-trained BERT checkpoint and continuing the pre-training ...
Pedram
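
A minimal sketch of the "continue pre-training from a checkpoint" route with the Hugging Face Trainer, assuming a plain-text corpus with one sentence per line ("corpus.txt" is a placeholder file name, not from the question):

    # Hedged sketch: continue MLM pre-training of BERT on a custom text file.
    from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                              DataCollatorForLanguageModeling, LineByLineTextDataset,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

    # "corpus.txt" stands in for the custom data, one sentence per line.
    dataset = LineByLineTextDataset(tokenizer=tokenizer, file_path="corpus.txt",
                                    block_size=128)
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True,
                                               mlm_probability=0.15)

    args = TrainingArguments(output_dir="bert-continued", num_train_epochs=3,
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args, data_collator=collator,
            train_dataset=dataset).train()
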
8 votes
1 answer
4k views

Uni-directional Transformer VS Bi-directional BERT

I just finished reading the Transformer paper and the BERT paper, but I couldn't figure out why the Transformer is uni-directional and BERT is bi-directional, as stated in the BERT paper. As they don't use ...
JShen
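
The distinction is largely about the self-attention mask: in the BERT paper's framing, "uni-directional" refers to GPT-style decoder attention with a causal mask, where each position attends only to earlier tokens, while BERT's encoder applies no such mask, so every token attends to the whole sentence and is trained with masked-token prediction. A tiny illustrative sketch (not taken from either paper):

    import torch

    seq_len = 5
    # Uni-directional (causal) mask: position i may attend only to positions <= i.
    causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Bi-directional mask (BERT encoder): every position may attend everywhere.
    bidirectional_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)

    print(causal_mask.int())
    print(bidirectional_mask.int())
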
6 votes
2 answers
7k views

Latest Pre-trained Multilingual Word Embedding

Are there any latest pre-trained multilingual word embeddings (multiple languages are jointly mapped to a same vector space)? I have looked at the following but they don't fit my needs: FastText / ...
MachineLearner
3 votes
1 answer
1k views

How do you get single embedding vector for each word (token) from RoBERTa?

As you may know, RoBERTa (BERT, etc.) has its own tokenizer, and sometimes you get pieces of a given word as tokens, e.g. embeddings » embed, #dings. Given the nature of the task I am working on, I need a ...
Fatih Beyhan
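
One common workaround (a sketch, under the assumption that averaging sub-word vectors is acceptable for the task) is to use the fast tokenizer's word_ids() mapping to group the pieces of each word and mean-pool their hidden states:

    # Sketch: one averaged embedding vector per word from RoBERTa.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    model = AutoModel.from_pretrained("roberta-base")

    enc = tokenizer("embeddings are useful", return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]        # (num_tokens, 768)

    word_ids = enc.word_ids()                             # token index -> word index
    word_vectors = []
    for w in sorted({i for i in word_ids if i is not None}):
        pieces = [t for t, i in enumerate(word_ids) if i == w]
        word_vectors.append(hidden[pieces].mean(dim=0))   # one 768-d vector per word
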
3 votes
0 answers
1k views

How to post-train a BERT model on a custom dataset

I want to get the BERT word embeddings which will be used in another down-stream task later. I have a corpus for my custom dataset and want to further pre-train the pre-trained Huggingface BERT base ...
The Exile
3 votes
1 answer
1k views

Why does the Hugging Face BERT pooler hack make mixed precision training stable?

The Hugging Face BERT implementation has a hack to remove the pooler from the optimizer. https://github.com/huggingface/transformers/blob/b832d5bb8a6dfc5965015b828e577677eace601e/examples/run_squad.py#L927 # ...
Krishan Subudhi
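
The hack simply drops the pooler's parameters from the optimizer's parameter groups: the SQuAD head never uses the pooler, so those parameters get no gradient, and unused parameters can break the apex mixed-precision optimizer step. A condensed paraphrase of that pattern (not a verbatim copy of the linked script):

    # Sketch: optimizer parameter groups that exclude the unused BERT pooler.
    from torch.optim import AdamW
    from transformers import BertForQuestionAnswering

    model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")

    named_params = [(n, p) for n, p in model.named_parameters() if "pooler" not in n]
    no_decay = ["bias", "LayerNorm.weight"]
    groups = [
        {"params": [p for n, p in named_params if not any(nd in n for nd in no_decay)],
         "weight_decay": 0.01},
        {"params": [p for n, p in named_params if any(nd in n for nd in no_decay)],
         "weight_decay": 0.0},
    ]
    optimizer = AdamW(groups, lr=3e-5)
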
2 votes
1 answer
825 views

How to feed the output of a fine-tuned BERT model as input to another fine-tuned BERT model?

I fine-tuned two separate BERT models (bert-base-uncased) on sentiment analysis and POS tagging tasks. Now, I want to feed the output of the POS tagger (batch, seq_length, hidden_size) as input to the ...
Erfan
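
If the intent is to bypass the second model's embedding layer, the Hugging Face BertModel forward pass accepts an inputs_embeds argument of shape (batch, seq_length, hidden_size) in place of input_ids; a hedged sketch (whether hidden states from a POS tagger are a meaningful substitute for word embeddings is a separate modelling question):

    # Sketch: feed one BERT's hidden states into another BERT via inputs_embeds.
    import torch
    from transformers import BertModel

    pos_bert = BertModel.from_pretrained("bert-base-uncased")        # stand-ins for the
    sentiment_bert = BertModel.from_pretrained("bert-base-uncased")  # two fine-tuned models

    input_ids = torch.randint(0, 30522, (2, 16))                     # dummy (batch, seq_length)
    with torch.no_grad():
        pos_hidden = pos_bert(input_ids).last_hidden_state           # (2, 16, 768)
        out = sentiment_bert(inputs_embeds=pos_hidden).last_hidden_state
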
2 votes
0 answers
582 views

Why is my pretrained BERT model always predicting the most frequent tokens (including [PAD])?

I am trying to further pretrain a Dutch BERT model with MLM on an in-domain dataset (law-related). I have set up my entire preprocessing and training stages, but when I use the trained model to ...
Carina
1 vote
3 answers
3k views

BERT encoding layer produces same output for all inputs during evaluation (PyTorch)

I don't understand why my BERT model returns the same output during evaluation. The output of my model during training seems correct, as the values were different, but is totally the same during ...
user9114146
1 vote
2 answers
3k views

How to use a BERT pretrained model somewhere else?

I followed this course https://www.coursera.org/learn/sentiment-analysis-bert about building a pretrained model for sentiment analysis. During the training, at each epoch they saved the model using ...
Asma
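
With Hugging Face models the portable route is save_pretrained / from_pretrained, which writes the config, weights and tokenizer files into a folder that can be copied anywhere (if the course saved only a state_dict with torch.save, load it back into the same architecture first). A hedged sketch, with "./sentiment-bert" as a placeholder path:

    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Stand-ins for the model and tokenizer produced by the fine-tuning notebook.
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    model.save_pretrained("./sentiment-bert")        # config.json + model weights
    tokenizer.save_pretrained("./sentiment-bert")    # vocab + tokenizer config

    # Somewhere else (another script or machine), reload from the copied folder:
    model = AutoModelForSequenceClassification.from_pretrained("./sentiment-bert")
    tokenizer = AutoTokenizer.from_pretrained("./sentiment-bert")
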
1 vote
1 answer
701 views

Understanding the Hugging Face Transformers

I am new to the Transformers concept and I am going through some tutorials and writing my own code to understand question answering on the SQuAD 2.0 dataset using transformer models. In the Hugging ...
Vishnukk
1 vote
1 answer
3k views

How to further pretrain a BERT model using our custom data and increase the vocab size?

I am trying to further pretrain the bert-base model using custom data. The steps I'm following are as follows: generate a list of words from the custom data and add these words to the existing bert-...
sravani.s
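
The usual recipe (a sketch, not a guarantee that the new vectors end up well trained) is to add the new words to the tokenizer and then resize the model's embedding matrix before continuing the MLM pre-training:

    # Sketch: extend the BERT vocabulary, then resize the embedding matrix to match.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

    new_words = ["exampleterm", "domainword"]        # placeholders for domain vocabulary
    num_added = tokenizer.add_tokens(new_words)

    # The new embedding rows are randomly initialised; further pre-training trains them.
    model.resize_token_embeddings(len(tokenizer))
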
1 vote
1 answer
415 views

How many neurons (units) are there in the BERT model?

How to estimate the number of neurons (units) in the BERT model? Note this is different from the number of model parameters.
Celso França
1 vote
1 answer
247 views

How to access BERT's intermediate layers?

I want to feed a [batch_size, 768, text_length] tensor into the 6th layer of BERT. How can I give input to the 6th layer? Can I take just the 6th through last layers of BERT and use them? Thank you.
유은석
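
In the Hugging Face implementation the encoder blocks are exposed as model.encoder.layer (a ModuleList), so the tensor can be pushed through blocks 6 to 12 directly; a hedged sketch (note the layers expect the hidden dimension last, so a [batch_size, 768, text_length] tensor must be transposed first):

    # Sketch: run a hidden-state tensor through BERT encoder layers 6..12 only.
    import torch
    from transformers import BertModel

    model = BertModel.from_pretrained("bert-base-uncased")

    x = torch.randn(2, 768, 16)                  # [batch_size, 768, text_length] as in the question
    hidden = x.transpose(1, 2)                   # the layers expect (batch, seq_length, 768)

    with torch.no_grad():
        for layer in model.encoder.layer[5:]:    # index 5 is the 6th encoder block
            hidden = layer(hidden)[0]            # each block returns a tuple
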
1 vote
1 answer
112 views

Is it possible to get the embedding table in tf_hub models?

I'm having problems with fine-tuning a BERT model. I was previously using get_transformer_encoder() and the MLM task in official.nlp to train a BERT model, but this seemed too difficult, so I changed ...
PlasticSaber
1 vote
1 answer
975 views

"RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn" error with BertForSequenceClassification

I am trying to build a BERT model for an Arabic text classification task using the pretrained model from https://github.com/alisafaya/Arabic-BERT. I want to know the exact difference between the two statements: ...
Yaman Afadar
1 vote
1 answer
1k views

Using Pretrained BERT model to add additional words that are not recognized by the model

I want some help regarding adding additional words to the existing BERT model. I have two queries; kindly guide me. I am working on an NER task for a domain: there are a few words (not sure of the exact number)...
muzamil
1 vote
0 answers
67 views

Pretraining BERT Models from scratch vs Further Pretraining

I want to pretrain an Arabic BERT model on domain-specific data to make it suitable for a specific domain problem, which is the classification of citizen reviews about government services into ...
Ghada Mansour
1 vote
0 answers
25 views

Square brackets at the end of TFBertModel call method

I'm trying to understand how to use bert-base-cased pretrained model in my code, so I was reviewing this code: input_ids = Input(shape=(max_len,), dtype=tf.int32, name="input_ids") ...
Majd
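
The trailing [0] is simply indexing the model output: calling a TFBertModel returns an output object (a tuple in older versions) whose first element is the last hidden state, so bert(...)[0] is a (batch, max_len, 768) tensor that the rest of the Keras graph can consume. A hedged sketch of that pattern:

    # Sketch: what the [0] after a TFBertModel call selects.
    import tensorflow as tf
    from transformers import TFBertModel

    max_len = 32
    bert = TFBertModel.from_pretrained("bert-base-cased")

    input_ids = tf.keras.layers.Input(shape=(max_len,), dtype=tf.int32, name="input_ids")
    attention_mask = tf.keras.layers.Input(shape=(max_len,), dtype=tf.int32, name="attention_mask")

    outputs = bert(input_ids, attention_mask=attention_mask)
    sequence_output = outputs[0]    # same tensor as outputs.last_hidden_state
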
1 vote
0 answers
104 views

Transfer learning (or fine-tuning) pre-trained model on non-text data

I am currently fine-tuning a sentiment analysis bert-based model using PyTorch Trainer from hugging face. So far, so good. I have easily managed to fine-tune the model on my text data. However, I'd ...
corvusMidnight
1 vote
0 answers
214 views

Continue LM pretraining with Huggingface - loss function clarification

I'm trying to use Hugging Face's TensorFlow run_mlm.py script to continue pretraining a BERT model, and I didn't understand the following: in the above script, the model is loaded using from_pretrained ...
dalia
1 vote
0 answers
471 views

Why doesn't fine-tuning BERT MLM on a specific domain work? What am I doing wrong?

I'm new. I'm trying to fine-tune a BERT MLM (bert-base-uncased) on a target domain. Unfortunately, the results are not good. Before fine-tuning, the pre-trained model fills the mask of a sentence with ...
Ding Dong
1 vote
0 answers
385 views

BERT Pre-training accuracy not increasing

I am trying to pretrain BERT on a dataset (wiki103) which contains 150k sentences. After 12 epochs, the NSP (next sentence prediction) task gives accuracy around 0.76 (it overfits if I continue with more epochs)...
Abdul Wahab
0 votes
1 answer
2k views

How to use GPU for training instead of CPU?

I was replicating code that is fine-tuned for domain adaptation. This is the main link to the post for more details: (https://towardsdatascience.com/fine-tuning-for-domain-adaptation-in-nlp-...
Ashok Chhetri
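
In plain PyTorch the usual fix is to move both the model and every input batch onto the CUDA device; a minimal hedged sketch:

    # Sketch: run a Hugging Face model on the GPU when one is available.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").to(device)

    batch = tokenizer(["a domain adaptation example"], return_tensors="pt")
    batch = {k: v.to(device) for k, v in batch.items()}   # inputs must follow the model
    outputs = model(**batch)
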
0 votes
2 answers
325 views

Failed to connect to TensorFlow master: TPU worker may not be ready or TensorFlow master address is incorrect

I signed up for the TPU Research Cloud (TRC) program for the third time in two years. Now I could barely create a preemptible v3-8 TPU. Before that, I could easily allocate five non-preemptible v3-...
darklordofsoftware
0 votes
0 answers
13 views

Label not included inside my tensor dataset

I am new to machine learning and I want to use a pretrained BERT model. I am facing the following problem: the label output is not included in the tensor dataset. Does anyone have a ...
Hendra Putra
0 votes
0 answers
37 views

Domain specific pretraining using BERT models vs other smaller architecture models

I have around 4K target examples of Arabic citizen reviews of government services, and I want to apply transfer learning to enhance the performance of the target task, which is classifying the reviews ...
Ghada Mansour
0 votes
0 answers
20 views

Suitable data for Task Adaptive Pretraining

I want to pretrain an Arabic BERT model on domain-specific data to make it suitable for a specific domain problem, which is the classification of citizen reviews about government services into ...
Ghada Mansour
0 votes
3 answers
431 views

Modifying last layer of a pre-trained sentiment classification model to get a linear output

How can I modify a pre-trained sentiment classification model (e.g., 'bert-base-multilingual-uncased-sentiment') to output a value between 0 and 1 instead of a classification tensor? The output should ...
GlitzerImHirn
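
One option (a hedged sketch; the repository id "nlptown/bert-base-multilingual-uncased-sentiment" and the new head are assumptions, and the replacement head is untrained until fine-tuned on your own data) is to keep the encoder but swap the 5-class classifier for a single sigmoid unit:

    # Sketch: replace the 5-star classification head with one sigmoid output in [0, 1].
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    name = "nlptown/bert-base-multilingual-uncased-sentiment"   # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name)

    hidden = model.config.hidden_size
    model.classifier = torch.nn.Sequential(torch.nn.Linear(hidden, 1),
                                           torch.nn.Sigmoid())

    enc = tokenizer("Das Produkt ist großartig!", return_tensors="pt")
    with torch.no_grad():
        score = model(**enc).logits      # shape (1, 1), value in [0, 1]
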
0 votes
1 answer
369 views

BERT pre-training - masked_lm_accuracy is always zero

I am trying to train BERT from scratch on a domain-specific dataset using the official TensorFlow GitHub repository. I used this part of the documentation to adapt the scripts to my use case, but I have a ...
iulian
0 votes
1 answer
182 views

Fine-tune a pre-trained model

I am new to transformer-based models. I am trying to fine-tune the following model (https://huggingface.co/Chramer/remote-sensing-distilbert-cased) on my dataset. The code: ...
Fahad Alghamdi
0 votes
1 answer
1k views

Retrain a BERT Model

I have trained a BERT model using PyTorch on about a million text samples for a classification task. After testing this model with new data, I get false positives and false negatives. Now I want to retrain ...
Patricia
0 votes
1 answer
3k views

How to load a BERT pretrained model with SentenceTransformers from a local path?

I am using the SentenceTransformers library to use a BERT pre-trained model. I downloaded the files in Google Colab and saved them with these commands: from sentence_transformers import SentenceTransformer ...
Sahar Rezazadeh
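
SentenceTransformers can save to and load from an ordinary local directory, so the model downloaded in Colab can be written out, copied, and re-opened by path; a hedged sketch ("./local-sbert" is a placeholder folder and "bert-base-nli-mean-tokens" a stand-in model name):

    # Sketch: save a SentenceTransformer locally, then reload it from the path.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("bert-base-nli-mean-tokens")   # downloaded once, online
    model.save("./local-sbert")                                # writes weights + config

    # Later, offline or on another machine: pass the folder path instead of a name.
    model = SentenceTransformer("./local-sbert")
    embeddings = model.encode(["A sentence to embed."])
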
0 votes
1 answer
161 views

Why does new layer get ignored in modified pretrained pytorch model?

I am trying to add a classification layer to a pre-trained Bert model. I've tried a few things people have posted online like: mod = list(model.children()) mod.pop() mod.append(torch.nn.Linear(768, ...
avcg21
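
Building a Python list from model.children(), then popping and appending, never modifies the model itself, so the new Linear layer is silently ignored; the usual fix is to wrap the pre-trained encoder in a small nn.Module with its own forward, as in this hedged sketch:

    # Sketch: add a classification head by wrapping BERT in a new module
    # instead of editing the list returned by model.children().
    import torch
    from transformers import BertModel

    class BertClassifier(torch.nn.Module):
        def __init__(self, num_labels: int = 2):
            super().__init__()
            self.bert = BertModel.from_pretrained("bert-base-uncased")
            self.head = torch.nn.Linear(self.bert.config.hidden_size, num_labels)

        def forward(self, input_ids, attention_mask=None):
            pooled = self.bert(input_ids, attention_mask=attention_mask).pooler_output
            return self.head(pooled)          # (batch, num_labels) logits

    model = BertClassifier()
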
0 votes
0 answers
170 views

Is there `datacollator`-like code that can apply n-gram masking to a masked language model using PyTorch?

I want to apply n-gram masking to a masked language model when pre-training using PyTorch. Is there source code for this, or do I have to implement it myself? This is Hugging Face's code for the data collator....
wa007
0 votes
1 answer
2k views

AttributeError: 'Tensor' object has no attribute 'size' with pretrained BERT

This is the model that I have defined: def build_model(): input_layer = keras.layers.Input(name="Input", shape=(MAX_LEN), dtype='int64') bert = BertForPreTraining.from_pretrained('...
Piyush Mishra