All Questions
Tagged with bert-language-model pre-trained-model
37
questions
11
votes
2
answers
14k
views
Continual pre-training vs. Fine-tuning a language model with MLM
I have some custom data I want to use to further pre-train the BERT model. I’ve tried the following two approaches so far:
Starting with a pre-trained BERT checkpoint and continuing the pre-training ...
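A minimal sketch of that continued-pre-training route, assuming the Hugging Face transformers Trainer API (the corpus file name is a placeholder):
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")   # start from the released checkpoint
# Tokenize the custom corpus (one example per line).
dataset = load_dataset("text", data_files={"train": "custom_corpus.txt"})
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
                      batched=True, remove_columns=["text"])
# The collator applies the usual 15% random masking used for MLM.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-continued", num_train_epochs=1),
    train_dataset=dataset["train"],
    data_collator=collator,
)
trainer.train()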
8
votes
1
answer
4k
views
Uni-directional Transformer VS Bi-directional BERT
I just finished reading the Transformer paper and the BERT paper, but I couldn't figure out why the Transformer is uni-directional and BERT is bi-directional, as stated in the BERT paper. Since they don't use ...
6
votes
2
answers
7k
views
Latest Pre-trained Multilingual Word Embedding
Are there any recent pre-trained multilingual word embeddings (where multiple languages are jointly mapped to the same vector space)?
I have looked at the following but they don't fit my needs:
FastText / ...
3
votes
1
answer
1k
views
How do you get single embedding vector for each word (token) from RoBERTa?
As you may know, RoBERTa (BERT, etc.) has its own tokenizer and sometimes you get pieces of a given word as tokens, e.g. embeddings » embed, #dings
Given the nature of the task I am working on, I need a ...
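One common workaround is to average the sub-word vectors back into a single vector per word. A sketch, assuming a Hugging Face fast tokenizer, whose word_ids() maps each token back to its source word:
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")
enc = tokenizer("embeddings are useful", return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc).last_hidden_state[0]        # (num_tokens, 768)
word_ids = enc.word_ids()                             # e.g. [None, 0, 0, 1, 2, None]
word_vectors = []
for w in sorted(set(i for i in word_ids if i is not None)):
    piece_idx = [t for t, i in enumerate(word_ids) if i == w]
    word_vectors.append(hidden[piece_idx].mean(dim=0))   # average the pieces of each word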
3
votes
0
answers
1k
views
How to post-train BERT model on custom dataset
I want to get the BERT word embeddings which will be used in another down-stream task later. I have a corpus for my custom dataset and want to further pre-train the pre-trained Huggingface BERT base ...
3
votes
1
answer
1k
views
Why does the Huggingface BERT pooler hack make mixed precision training stable?
The Huggingface BERT implementation has a hack to remove the pooler from the optimizer.
https://github.com/huggingface/transformers/blob/b832d5bb8a6dfc5965015b828e577677eace601e/examples/run_squad.py#L927
# ...
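For context, the hack filters the pooler weights out of the optimizer's parameter groups. An illustrative sketch that mirrors (but does not reproduce exactly) the linked script; model is assumed to be the fine-tuning model already in scope:
# The pooler gets no gradient in SQuAD-style fine-tuning, so excluding its
# untouched weights keeps mixed-precision loss scaling stable.
param_optimizer = [(n, p) for n, p in model.named_parameters() if "pooler" not in n]
no_decay = ["bias", "LayerNorm.weight"]
optimizer_grouped_parameters = [
    {"params": [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
     "weight_decay": 0.01},
    {"params": [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
     "weight_decay": 0.0},
]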
2
votes
1
answer
825
views
How to feed the output of a finetuned BERT model as input to another finetuned BERT model?
I finetuned two separate BERT models (bert-base-uncased) on sentiment analysis and POS tagging tasks. Now, I want to feed the output of the POS tagger (batch, seqlength, hiddensize) as input to the ...
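One possible route, sketched here under the assumption that both models are standard Hugging Face BERT models (pos_model, sentiment_model, input_ids and attention_mask are placeholders), is to bypass the second model's embedding layer via inputs_embeds, since the (batch, seq_length, hidden_size) tensor already has the embedding shape:
import torch
with torch.no_grad():
    pos_hidden = pos_model.bert(input_ids, attention_mask=attention_mask).last_hidden_state
# Feed the POS tagger's hidden states in place of the sentiment model's own embeddings.
sentiment_out = sentiment_model(inputs_embeds=pos_hidden, attention_mask=attention_mask)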
2
votes
0
answers
582
views
Why is my pretrained BERT model always predicting the most frequent tokens (including [PAD])?
I am trying to further pretrain a Dutch BERT model with MLM on an in-domain dataset (law-related). I have set up my entire preprocessing and training stages, but when I use the trained model to ...
1
vote
3
answers
3k
views
BERT encoding layer produces same output for all inputs during evaluation (PyTorch)
I don't understand why my BERT model returns the same output during evaluation. The output of my model during training seems correct, as the values were different, but it is exactly the same during ...
1
vote
2
answers
3k
views
How to use a pretrained BERT model somewhere else?
I followed this course https://www.coursera.org/learn/sentiment-analysis-bert about building a pretrained model for sentiment analysis. During the training, at each epoch they saved the model using ...
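If the checkpoints were written with torch.save(model.state_dict(), path), re-using the model elsewhere typically looks like the sketch below (the file name and label count are placeholders, not taken from the course):
import torch
from transformers import BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)
state_dict = torch.load("finetuned_BERT_epoch_5.model", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()   # switch to inference mode in the new environment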
1
vote
1
answer
701
views
Understanding the Hugging Face transformers
I am new to the Transformers concept and I am going through some tutorials and writing my own code to understand question answering on the SQuAD 2.0 dataset using transformer models. In the Hugging ...
1
vote
1
answer
3k
views
How to further pretrain a BERT model using our custom data and increase the vocab size?
I am trying to further pretrain the bert-base model using custom data. The steps I'm following are as follows:
Generate a list of words from the custom data and add these words to the existing bert-...
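The vocabulary-extension part is usually done with add_tokens plus resize_token_embeddings; a minimal sketch, where new_words stands in for the list built from the custom data:
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
new_words = ["examplewordone", "examplewordtwo"]   # placeholder domain terms
num_added = tokenizer.add_tokens(new_words)
# Grow the embedding matrix so the new ids get (randomly initialised) vectors,
# then continue MLM pre-training so those vectors are actually learned.
model.resize_token_embeddings(len(tokenizer))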
1
vote
1
answer
415
views
How many neurons (units) are there in the BERT model?
How to estimate the number of neurons (units) in the BERT model?
Note this is different from the number of model parameters.
1
vote
1
answer
247
views
How to access BERT's intermediate layers?
I want to feed a [batch_size, 768, text_length] tensor into
the 6th layer of BERT.
How can I give input to the 6th layer?
Can I take just the 6th through last layers of BERT and then use them?
Thank you.
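For reference, the encoder layers of a Hugging Face BertModel are exposed as a ModuleList, so slicing from the 6th layer onward can be sketched as below; note the layers expect (batch, seq_len, hidden) ordering, and the attention mask is omitted here for brevity:
import torch
from transformers import BertModel
bert = BertModel.from_pretrained("bert-base-uncased")
hidden = torch.randn(2, 128, 768)     # (batch, seq_len, hidden) -- transpose a (batch, 768, len) tensor first
# Run only layers 6..11; each BertLayer returns a tuple whose first element is the hidden states.
for layer in bert.encoder.layer[6:]:
    hidden = layer(hidden)[0]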
1
vote
1
answer
112
views
Is it possible to get the embedding table in tf_hub models?
I'm having problems with finetuning a BERT model. I was previously using get_transformer_encoder() in official.nlp and the MLM task in official.nlp to train a BERT. But this seemed tough, so I changed ...
1
vote
1
answer
975
views
"RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn " error BertFoeSequenceClassification
I am trying to build Bert model for Arabic Text classification task using pretrained model from https://github.com/alisafaya/Arabic-BERT
i want to know the exact difference between the two statement:
...
1
vote
1
answer
1k
views
Using a pretrained BERT model to add additional words that are not recognized by the model
I want some help regarding adding additional words to the existing BERT model. I have two queries; kindly guide me:
I am working on an NER task for a domain:
There are a few words (not sure of the exact numbers)...
1
vote
0
answers
67
views
Pretraining BERT Models from scratch vs Further Pretraining
I want to pretrain an Arabic BERT model on domain-specific data to make it suitable for a specific domain problem, which is the classification of citizen reviews about government services into ...
1
vote
0
answers
25
views
Square brackets at the end of TFBertModel call method
I'm trying to understand how to use the bert-base-cased pretrained model in my code, so I was reviewing this code:
input_ids = Input(shape=(max_len,), dtype=tf.int32, name="input_ids")
...
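For reference, the trailing [0] in such code indexes the model output: TF models return an output object that supports integer indexing, and element 0 is the last hidden state. A sketch of the usual Keras functional pattern (the head layers are placeholders, not taken from the question):
import tensorflow as tf
from transformers import TFBertModel
max_len = 128
bert = TFBertModel.from_pretrained("bert-base-cased")
input_ids = tf.keras.layers.Input(shape=(max_len,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.layers.Input(shape=(max_len,), dtype=tf.int32, name="attention_mask")
# [0] picks last_hidden_state, shape (batch, max_len, 768); [1] would be the pooled output.
sequence_output = bert(input_ids, attention_mask=attention_mask)[0]
cls_token = sequence_output[:, 0, :]                          # the [CLS] vector
output = tf.keras.layers.Dense(2, activation="softmax")(cls_token)
model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=output)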
1
vote
0
answers
104
views
Transfer learning (or fine-tuning) pre-trained model on non-text data
I am currently fine-tuning a sentiment analysis bert-based model using PyTorch Trainer from hugging face. So far, so good.
I have easily managed to fine-tune the model on my text data. However, I'd ...
1
vote
0
answers
214
views
Continue LM pretraining with Huggingface - loss function clarification
I'm trying to use Huggingface's TensorFlow run_mlm.py script to continue pretraining a BERT model, and I didn't understand the following:
in the above script, the model is loaded using from_pretrained ...
1
vote
0
answers
471
views
Why doesn't fine-tuning BERT MLM on a specific domain work? What am I doing wrong?
I'm new. I'm trying to fine-tune a BERT MLM (bert-base-uncased) on a target domain. Unfortunately, the results are not good.
Before fine-tuning, the pre-trained model fills the mask of a sentence with ...
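A quick way to compare behaviour before and after such fine-tuning is the fill-mask pipeline; a sketch, where the path to the fine-tuned checkpoint is a placeholder:
from transformers import pipeline
before = pipeline("fill-mask", model="bert-base-uncased")
after = pipeline("fill-mask", model="./bert-finetuned-domain")   # placeholder output dir
sentence = "The patient was treated with [MASK]."
print(before(sentence)[:3])   # top-3 predictions from the original checkpoint
print(after(sentence)[:3])    # top-3 predictions after domain MLM fine-tuning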
1
vote
0
answers
385
views
BERT Pre-training accuracy not increasing
I am trying to pretrain BERT on a dataset (wiki103) which contains 150k sentences. After 12 epochs, the NSP (next sentence prediction) task gives accuracy around 0.76 (it overfits if I continue with more epochs)...
0
votes
1
answer
2k
views
How to use GPU for training instead of CPU?
I was replicating code that is fine-tuned for domain adaptation. This is the main link to the post for more details:
(https://towardsdatascience.com/fine-tuning-for-domain-adaptation-in-nlp-...
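The usual PyTorch pattern is to move both the model and every batch onto the GPU; a minimal sketch, where model and train_dataloader are assumed to exist already:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
for batch in train_dataloader:
    # every tensor in the batch must live on the same device as the model
    batch = {k: v.to(device) for k, v in batch.items()}
    outputs = model(**batch)
    loss = outputs.loss
    loss.backward()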
0
votes
2
answers
325
views
Failed to connect to TensorFlow master: TPU worker may not be ready or TensorFlow master address is incorrect
I signed up for the TPU Research Cloud (TRC) program for the third time in two years. Now I could barely create a preemptible v3-8 TPU. Before that, I could easily allocate five non-preemptible v3-...
0
votes
0
answers
13
views
Label not included in my tensor dataset
I am new to machine learning and I want to use a pretrained BERT model.
I am facing the following problem: the label output is not included in the tensor-type dataset.
Does anyone have a ...
0
votes
0
answers
37
views
Domain specific pretraining using BERT models vs other smaller architecture models
I have around 4K target data about Arabic citizen reviews towards government services and I want to apply transfer learning to enhance the performance of target task, which is classifying the reviews ...
0
votes
0
answers
20
views
Suitable data for Task Adaptive Pretraining
I want to pretrain an Arabic BERT model on domain-specific data to make it suitable for a specific domain problem, which is the classification of citizen reviews about government services into ...
0
votes
3
answers
431
views
Modifying last layer of a pre-trained sentiment classification model to get a linear output
How can I modify a pre-trained sentiment classification model (e.g., 'bert-base-multilingual-uncased-sentiment') to output a value between 0 and 1 instead of a classification tensor? The output should ...
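One option is to reload the checkpoint with a single regression-style output and squash it with a sigmoid; a sketch assuming the nlptown hub checkpoint, where ignore_mismatched_sizes lets the new 1-unit head be randomly initialised (so it still needs fine-tuning afterwards):
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
name = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(
    name, num_labels=1, ignore_mismatched_sizes=True)   # replace the 5-class head with 1 unit
inputs = tokenizer("This product is great!", return_tensors="pt")
with torch.no_grad():
    logit = model(**inputs).logits            # shape (1, 1)
score = torch.sigmoid(logit).item()           # value in (0, 1)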
0
votes
1
answer
369
views
BERT pre-training - masked_lm_accuracy is always zero
I am trying to train BERT from scratch on a domain-specific dataset using the official TensorFlow GitHub repository.
I used this part of the documentation to adapt the scripts to my use case, but I have a ...
0
votes
1
answer
182
views
Fine-tune a pre-trained model
I am new to transformer-based models. I am trying to fine-tune the following model (https://huggingface.co/Chramer/remote-sensing-distilbert-cased) on my dataset. The code:
...
0
votes
1
answer
1k
views
Retrain a BERT Model
I have trained a BERT model using PyTorch on about a million text records for a classification task. After testing this model with new data, I get false positives and false negatives. Now I want to retrain ...
0
votes
1
answer
3k
views
How to load a pretrained BERT model with SentenceTransformers from a local path?
I am using the SentenceTransformer library to use a pre-trained BERT model.
I downloaded the files in Google Colab and saved them with these commands:
from sentence_transformers import SentenceTransformer
...
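For reference, SentenceTransformer accepts a local directory just like a hub model name, so the save/load pair typically looks like this (the directory path is a placeholder):
from sentence_transformers import SentenceTransformer
# In Colab: download once and write the full model to disk.
model = SentenceTransformer("bert-base-nli-mean-tokens")
model.save("/content/drive/MyDrive/sbert-local")
# Elsewhere: point the constructor at that directory instead of a model name.
model = SentenceTransformer("/content/drive/MyDrive/sbert-local")
embeddings = model.encode(["a test sentence"])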
0
votes
1
answer
161
views
Why does the new layer get ignored in a modified pretrained PyTorch model?
I am trying to add a classification layer to a pre-trained BERT model. I've tried a few things people have posted online, like:
mod = list(model.children())
mod.pop()
mod.append(torch.nn.Linear(768, ...
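The usual explanation is that building a Python list from model.children() and appending to it does not register the new layer with the model at all; wrapping everything in an nn.Module (or nn.Sequential) does. A sketch assuming the goal is a classification head on top of the pooled output:
import torch.nn as nn
from transformers import BertModel
class BertClassifier(nn.Module):
    def __init__(self, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.classifier = nn.Linear(768, num_labels)   # registered as a submodule, so it trains
    def forward(self, input_ids, attention_mask=None):
        pooled = self.bert(input_ids, attention_mask=attention_mask).pooler_output
        return self.classifier(pooled)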
0
votes
0
answers
170
views
Is there datacollator-like code that can apply n-gram masking for a masked language model using PyTorch?
I want to apply n-gram masking to a masked language model in a pre-trained model using PyTorch. Is there source code for this, or do I have to implement it myself?
This is Huggingface's code for the data collator ...
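As far as I know there is no built-in n-gram collator in transformers (only whole-word masking), so the span masking step is usually written by hand. An illustrative, framework-agnostic sketch over a batch of input ids, assuming a standard MLM loss that ignores label -100:
import random
import torch
def ngram_mask(input_ids, mask_token_id, mask_prob=0.15, max_n=3, special_ids=()):
    """Mask random contiguous spans of 1..max_n tokens; returns (masked_ids, labels)."""
    masked = input_ids.clone()
    labels = torch.full_like(input_ids, -100)          # -100 = ignored by the MLM loss
    for row in range(input_ids.size(0)):
        i = 0
        while i < input_ids.size(1):
            if input_ids[row, i].item() in special_ids:
                i += 1
                continue
            if random.random() < mask_prob:
                n = random.randint(1, max_n)
                for j in range(i, min(i + n, input_ids.size(1))):
                    labels[row, j] = input_ids[row, j]
                    masked[row, j] = mask_token_id
                i += n
            else:
                i += 1
    return masked, labels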
0
votes
1
answer
2k
views
AttributeError: 'Tensor' object has no attribute 'size' pretrained bert
This is the model that I have defined:
def build_model():
input_layer = keras.layers.Input(name="Input", shape=(MAX_LEN), dtype='int64')
bert = BertForPreTraining.from_pretrained('...