All Questions

11 votes
2 answers
14k views

Continual pre-training vs. Fine-tuning a language model with MLM

I have some custom data I want to use to further pre-train the BERT model. I’ve tried the two following approaches so far: Starting with a pre-trained BERT checkpoint and continuing the pre-training ...
Pedram
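
A minimal sketch of the "continue pre-training from a checkpoint" route with the Hugging Face Trainer, assuming a plain-text corpus with one sentence per line ("corpus.txt" is a placeholder file name, not from the question):

    # Hedged sketch: continue MLM pre-training of BERT on a custom text file.
    from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                              DataCollatorForLanguageModeling, LineByLineTextDataset,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

    # "corpus.txt" stands in for the custom data, one sentence per line.
    dataset = LineByLineTextDataset(tokenizer=tokenizer, file_path="corpus.txt",
                                    block_size=128)
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True,
                                               mlm_probability=0.15)

    args = TrainingArguments(output_dir="bert-continued", num_train_epochs=3,
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args, data_collator=collator,
            train_dataset=dataset).train()
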
8 votes
1 answer
4k views

Uni-directional Transformer VS Bi-directional BERT

I just finished reading the Transformer paper and the BERT paper, but I couldn't figure out why the Transformer is uni-directional and BERT is bi-directional, as stated in the BERT paper. As they don't use ...
JShen
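
The distinction is largely about the self-attention mask: in the BERT paper's framing, "uni-directional" refers to GPT-style decoder attention with a causal mask, where each position attends only to earlier tokens, while BERT's encoder applies no such mask, so every token attends to the whole sentence and is trained with masked-token prediction. A tiny illustrative sketch (not taken from either paper):

    import torch

    seq_len = 5
    # Uni-directional (causal) mask: position i may attend only to positions <= i.
    causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Bi-directional mask (BERT encoder): every position may attend everywhere.
    bidirectional_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)

    print(causal_mask.int())
    print(bidirectional_mask.int())
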
6 votes
2 answers
7k views

Latest Pre-trained Multilingual Word Embedding

Are there any latest pre-trained multilingual word embeddings (multiple languages are jointly mapped to a same vector space)? I have looked at the following but they don't fit my needs: FastText / ...
MachineLearner
3 votes
1 answer
1k views

How do you get single embedding vector for each word (token) from RoBERTa?

As you may know, RoBERTa (BERT, etc.) has its own tokenizer, and sometimes you get pieces of a given word as tokens, e.g. embeddings » embed, #dings. Given the nature of the task I am working on, I need a ...
Fatih Beyhan
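
One common workaround (a sketch, under the assumption that averaging sub-word vectors is acceptable for the task) is to use the fast tokenizer's word_ids() mapping to group the pieces of each word and mean-pool their hidden states:

    # Sketch: one averaged embedding vector per word from RoBERTa.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    model = AutoModel.from_pretrained("roberta-base")

    enc = tokenizer("embeddings are useful", return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]        # (num_tokens, 768)

    word_ids = enc.word_ids()                             # token index -> word index
    word_vectors = []
    for w in sorted({i for i in word_ids if i is not None}):
        pieces = [t for t, i in enumerate(word_ids) if i == w]
        word_vectors.append(hidden[pieces].mean(dim=0))   # one 768-d vector per word
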
3 votes
0 answers
1k views

How to post-train a BERT model on a custom dataset

I want to get the BERT word embeddings which will be used in another down-stream task later. I have a corpus for my custom dataset and want to further pre-train the pre-trained Huggingface BERT base ...
The Exile
3 votes
1 answer
1k views

Why does the Hugging Face BERT pooler hack make mixed precision training stable?

The Hugging Face BERT implementation has a hack to remove the pooler from the optimizer. https://github.com/huggingface/transformers/blob/b832d5bb8a6dfc5965015b828e577677eace601e/examples/run_squad.py#L927 # ...
Krishan Subudhi
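
The hack simply drops the pooler's parameters from the optimizer's parameter groups: the SQuAD head never uses the pooler, so those parameters get no gradient, and unused parameters can break the apex mixed-precision optimizer step. A condensed paraphrase of that pattern (not a verbatim copy of the linked script):

    # Sketch: optimizer parameter groups that exclude the unused BERT pooler.
    from torch.optim import AdamW
    from transformers import BertForQuestionAnswering

    model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")

    named_params = [(n, p) for n, p in model.named_parameters() if "pooler" not in n]
    no_decay = ["bias", "LayerNorm.weight"]
    groups = [
        {"params": [p for n, p in named_params if not any(nd in n for nd in no_decay)],
         "weight_decay": 0.01},
        {"params": [p for n, p in named_params if any(nd in n for nd in no_decay)],
         "weight_decay": 0.0},
    ]
    optimizer = AdamW(groups, lr=3e-5)
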
2 votes
1 answer
825 views

How to feed the output of a fine-tuned BERT model as input to another fine-tuned BERT model?

I fine-tuned two separate BERT models (bert-base-uncased) on sentiment analysis and POS tagging tasks. Now, I want to feed the output of the POS tagger (batch, seq_length, hidden_size) as input to the ...
Erfan
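
If the intent is to bypass the second model's embedding layer, the Hugging Face BertModel forward pass accepts an inputs_embeds argument of shape (batch, seq_length, hidden_size) in place of input_ids; a hedged sketch (whether hidden states from a POS tagger are a meaningful substitute for word embeddings is a separate modelling question):

    # Sketch: feed one BERT's hidden states into another BERT via inputs_embeds.
    import torch
    from transformers import BertModel

    pos_bert = BertModel.from_pretrained("bert-base-uncased")        # stand-ins for the
    sentiment_bert = BertModel.from_pretrained("bert-base-uncased")  # two fine-tuned models

    input_ids = torch.randint(0, 30522, (2, 16))                     # dummy (batch, seq_length)
    with torch.no_grad():
        pos_hidden = pos_bert(input_ids).last_hidden_state           # (2, 16, 768)
        out = sentiment_bert(inputs_embeds=pos_hidden).last_hidden_state
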
2 votes
0 answers
582 views

Why is my pretrained BERT model always predicting the most frequent tokens (including [PAD])?

I am trying to further pretrain a Dutch BERT model with MLM on an in-domain dataset (law-related). I have set up my entire preprocessing and training stages, but when I use the trained model to ...
Carina
1 vote
3 answers
3k views

BERT encoding layer produces same output for all inputs during evaluation (PyTorch)

I don't understand why my BERT model returns the same output during evaluation. The output of my model during training seems correct, as the values were different, but is totally the same during ...
user9114146
1 vote
2 answers
3k views

How to use a BERT pretrained model somewhere else?

I followed this course https://www.coursera.org/learn/sentiment-analysis-bert about building a pretrained model for sentiment analysis. During the training, at each epoch they saved the model using ...
Asma
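
With Hugging Face models the portable route is save_pretrained / from_pretrained, which writes the config, weights and tokenizer files into a folder that can be copied anywhere (if the course saved only a state_dict with torch.save, load it back into the same architecture first). A hedged sketch, with "./sentiment-bert" as a placeholder path:

    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Stand-ins for the model and tokenizer produced by the fine-tuning notebook.
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    model.save_pretrained("./sentiment-bert")        # config.json + model weights
    tokenizer.save_pretrained("./sentiment-bert")    # vocab + tokenizer config

    # Somewhere else (another script or machine), reload from the copied folder:
    model = AutoModelForSequenceClassification.from_pretrained("./sentiment-bert")
    tokenizer = AutoTokenizer.from_pretrained("./sentiment-bert")
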
1 vote
1 answer
701 views

Understanding the Hugging Face Transformers

I am new to the Transformers concept and I am going through some tutorials and writing my own code to understand question answering on the SQuAD 2.0 dataset using transformer models. In the Hugging ...
Vishnukk
1 vote
1 answer
3k views

How to further pretrain a BERT model using our custom data and increase the vocab size?

I am trying to further pretrain the bert-base model using custom data. The steps I'm following are as follows: generate a list of words from the custom data and add these words to the existing bert-...
sravani.s
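
The usual recipe (a sketch, not a guarantee that the new vectors end up well trained) is to add the new words to the tokenizer and then resize the model's embedding matrix before continuing the MLM pre-training:

    # Sketch: extend the BERT vocabulary, then resize the embedding matrix to match.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

    new_words = ["exampleterm", "domainword"]        # placeholders for domain vocabulary
    num_added = tokenizer.add_tokens(new_words)

    # The new embedding rows are randomly initialised; further pre-training trains them.
    model.resize_token_embeddings(len(tokenizer))
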
1 vote
1 answer
415 views

How many neurons (units) are there in the BERT model?

How to estimate the number of neurons (units) in the BERT model? Note this is different from the number of model parameters.
Celso França
1 vote
1 answer
247 views

How to access BERT's intermediate layers?

I want to feed a [batch_size, 768, text_length] tensor into the 6th layer of BERT. How can I give input to the 6th layer? Can I take just the 6th through last layers of BERT and use them? Thank you.
유은석
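
In the Hugging Face implementation the encoder blocks are exposed as model.encoder.layer (a ModuleList), so the tensor can be pushed through blocks 6 to 12 directly; a hedged sketch (note the layers expect the hidden dimension last, so a [batch_size, 768, text_length] tensor must be transposed first):

    # Sketch: run a hidden-state tensor through BERT encoder layers 6..12 only.
    import torch
    from transformers import BertModel

    model = BertModel.from_pretrained("bert-base-uncased")

    x = torch.randn(2, 768, 16)                  # [batch_size, 768, text_length] as in the question
    hidden = x.transpose(1, 2)                   # the layers expect (batch, seq_length, 768)

    with torch.no_grad():
        for layer in model.encoder.layer[5:]:    # index 5 is the 6th encoder block
            hidden = layer(hidden)[0]            # each block returns a tuple
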
1 vote
1 answer
112 views

Is it possible to get the embedding table in tf_hub models?

I'm having problems with fine-tuning a BERT model. I was previously using get_transformer_encoder() and the MLM task in official.nlp to train a BERT model, but this seemed too difficult, so I changed ...
PlasticSaber
1 vote
1 answer
975 views

"RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn" error with BertForSequenceClassification

I am trying to build a BERT model for an Arabic text classification task using the pretrained model from https://github.com/alisafaya/Arabic-BERT. I want to know the exact difference between the two statements: ...
Yaman Afadar
1 vote
1 answer
1k views

Using Pretrained BERT model to add additional words that are not recognized by the model

I want some help regarding adding additional words to the existing BERT model. I have two queries; kindly guide me. I am working on an NER task for a domain: there are a few words (not sure of the exact number)...
muzamil
1 vote
0 answers
67 views

Pretraining BERT Models from scratch vs Further Pretraining

I want to pretrain an Arabic BERT model on domain-specific data to make it suitable for a specific domain problem, which is the classification of citizen reviews about government services into ...
Ghada Mansour
1 vote
0 answers
25 views

Square brackets at the end of TFBertModel call method

I'm trying to understand how to use bert-base-cased pretrained model in my code, so I was reviewing this code: input_ids = Input(shape=(max_len,), dtype=tf.int32, name="input_ids") ...
Majd
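
The trailing [0] is simply indexing the model output: calling a TFBertModel returns an output object (a tuple in older versions) whose first element is the last hidden state, so bert(...)[0] is a (batch, max_len, 768) tensor that the rest of the Keras graph can consume. A hedged sketch of that pattern:

    # Sketch: what the [0] after a TFBertModel call selects.
    import tensorflow as tf
    from transformers import TFBertModel

    max_len = 32
    bert = TFBertModel.from_pretrained("bert-base-cased")

    input_ids = tf.keras.layers.Input(shape=(max_len,), dtype=tf.int32, name="input_ids")
    attention_mask = tf.keras.layers.Input(shape=(max_len,), dtype=tf.int32, name="attention_mask")

    outputs = bert(input_ids, attention_mask=attention_mask)
    sequence_output = outputs[0]    # same tensor as outputs.last_hidden_state
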
1 vote
0 answers
104 views

Transfer learning (or fine-tuning) pre-trained model on non-text data

I am currently fine-tuning a sentiment analysis bert-based model using PyTorch Trainer from hugging face. So far, so good. I have easily managed to fine-tune the model on my text data. However, I'd ...
corvusMidnight
1 vote
0 answers
214 views

Continue LM pretraining with Huggingface - loss function clarification

I'm trying to use Hugging Face's TensorFlow run_mlm.py script to continue pretraining a BERT model, and I didn't understand the following: in the above script, the model is loaded using from_pretrained ...
dalia
1 vote
0 answers
471 views

Why doesn't fine-tuning BERT MLM on a specific domain work? What am I doing wrong?

I'm new. I'm trying to fine-tune a BERT MLM (bert-base-uncased) on a target domain. Unfortunately, the results are not good. Before fine-tuning, the pre-trained model fills the mask of a sentence with ...
Ding Dong
1 vote
0 answers
385 views

BERT Pre-training accuracy not increasing

I am trying to pretrain BERT on a dataset (wiki103) which contains 150k sentences. After 12 epochs, the NSP (next sentence prediction) task gives accuracy around 0.76 (it overfits if I continue with more epochs)...
Abdul Wahab
0 votes
1 answer
2k views

How to use GPU for training instead of CPU?

I was replicating code that is fine-tuned for domain adaptation. This is the main link to the post for more details: (https://towardsdatascience.com/fine-tuning-for-domain-adaptation-in-nlp-...
Ashok Chhetri
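
In plain PyTorch the usual fix is to move both the model and every input batch onto the CUDA device; a minimal hedged sketch:

    # Sketch: run a Hugging Face model on the GPU when one is available.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").to(device)

    batch = tokenizer(["a domain adaptation example"], return_tensors="pt")
    batch = {k: v.to(device) for k, v in batch.items()}   # inputs must follow the model
    outputs = model(**batch)
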
0 votes
2 answers
325 views

Failed to connect to TensorFlow master: TPU worker may not be ready or TensorFlow master address is incorrect

I signed up for the TPU Research Cloud (TRC) program for the third time in two years. Now I could barely create a preemptible v3-8 TPU. Before that, I could easily allocate five non-preemptible v3-...
darklordofsoftware
0 votes
0 answers
13 views

Label not included inside my tensor dataset

I am new to machine learning and I want to use a pretrained BERT model. I am facing the following problem: the label output is not included in the tensor dataset. Does anyone have a ...
Hendra Putra
0 votes
0 answers
37 views

Domain specific pretraining using BERT models vs other smaller architecture models

I have around 4K target examples of Arabic citizen reviews of government services, and I want to apply transfer learning to enhance the performance of the target task, which is classifying the reviews ...
Ghada Mansour
0 votes
0 answers
20 views

Suitable data for Task Adaptive Pretraining

I want to pretrain an Arabic BERT model on domain-specific data to make it suitable for a specific domain problem, which is the classification of citizen reviews about government services into ...
Ghada Mansour
0 votes
3 answers
431 views

Modifying last layer of a pre-trained sentiment classification model to get a linear output

How can I modify a pre-trained sentiment classification model (e.g., 'bert-base-multilingual-uncased-sentiment') to output a value between 0 and 1 instead of a classification tensor? The output should ...
GlitzerImHirn
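
One option (a hedged sketch; the repository id "nlptown/bert-base-multilingual-uncased-sentiment" and the new head are assumptions, and the replacement head is untrained until fine-tuned on your own data) is to keep the encoder but swap the 5-class classifier for a single sigmoid unit:

    # Sketch: replace the 5-star classification head with one sigmoid output in [0, 1].
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    name = "nlptown/bert-base-multilingual-uncased-sentiment"   # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name)

    hidden = model.config.hidden_size
    model.classifier = torch.nn.Sequential(torch.nn.Linear(hidden, 1),
                                           torch.nn.Sigmoid())

    enc = tokenizer("Das Produkt ist großartig!", return_tensors="pt")
    with torch.no_grad():
        score = model(**enc).logits      # shape (1, 1), value in [0, 1]
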
0 votes
1 answer
369 views

BERT pre-training - masked_lm_accuracy is always zero

I am trying to train BERT from scratch on a domain-specific dataset using the official TensorFlow GitHub repository. I used this part of the documentation to adapt the scripts to my use case, but I have a ...
iulian
0 votes
1 answer
182 views

Fine-tune a pre-trained model

I am new to transformer-based models. I am trying to fine-tune the following model (https://huggingface.co/Chramer/remote-sensing-distilbert-cased) on my dataset. The code: ...
Fahad Alghamdi
0 votes
1 answer
1k views

Retrain a BERT Model

I have trained a BERT model using PyTorch on about a million text samples for a classification task. After testing this model with new data, I get false positives and false negatives. Now I want to retrain ...
Patricia
0 votes
1 answer
3k views

How to load a BERT pretrained model with SentenceTransformers from a local path?

I am using the SentenceTransformers library to use a BERT pre-trained model. I downloaded the files in Google Colab and saved them with these commands: from sentence_transformers import SentenceTransformer ...
Sahar Rezazadeh
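
SentenceTransformers can save to and load from an ordinary local directory, so the model downloaded in Colab can be written out, copied, and re-opened by path; a hedged sketch ("./local-sbert" is a placeholder folder and "bert-base-nli-mean-tokens" a stand-in model name):

    # Sketch: save a SentenceTransformer locally, then reload it from the path.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("bert-base-nli-mean-tokens")   # downloaded once, online
    model.save("./local-sbert")                                # writes weights + config

    # Later, offline or on another machine: pass the folder path instead of a name.
    model = SentenceTransformer("./local-sbert")
    embeddings = model.encode(["A sentence to embed."])
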
0 votes
1 answer
161 views

Why does new layer get ignored in modified pretrained pytorch model?

I am trying to add a classification layer to a pre-trained Bert model. I've tried a few things people have posted online like: mod = list(model.children()) mod.pop() mod.append(torch.nn.Linear(768, ...
avcg21
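
Building a Python list from model.children(), then popping and appending, never modifies the model itself, so the new Linear layer is silently ignored; the usual fix is to wrap the pre-trained encoder in a small nn.Module with its own forward, as in this hedged sketch:

    # Sketch: add a classification head by wrapping BERT in a new module
    # instead of editing the list returned by model.children().
    import torch
    from transformers import BertModel

    class BertClassifier(torch.nn.Module):
        def __init__(self, num_labels: int = 2):
            super().__init__()
            self.bert = BertModel.from_pretrained("bert-base-uncased")
            self.head = torch.nn.Linear(self.bert.config.hidden_size, num_labels)

        def forward(self, input_ids, attention_mask=None):
            pooled = self.bert(input_ids, attention_mask=attention_mask).pooler_output
            return self.head(pooled)          # (batch, num_labels) logits

    model = BertClassifier()
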
0 votes
0 answers
170 views

Is there `datacollator`-like code that can apply n-gram masking to a masked language model using PyTorch?

I want to apply n-gram masking to a masked language model when pre-training using PyTorch. Is there source code for this, or do I have to implement it myself? This is Hugging Face's code for the data collator....
wa007
0 votes
1 answer
2k views

AttributeError: 'Tensor' object has no attribute 'size' with pretrained BERT

This is the model that I have defined: def build_model(): input_layer = keras.layers.Input(name="Input", shape=(MAX_LEN), dtype='int64') bert = BertForPreTraining.from_pretrained('...
Piyush Mishra