All Questions

8 votes
1 answer
18k views

How is the number of parameters calculated in the BERT model?

The paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Devlin & Co. calculated for the base model size 110M parameters (i.e. L=12, H=768, A=12) ...
EchoCache
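The arithmetic behind the 110M figure can be reproduced from the published BERT-base config alone. A minimal sketch, assuming the standard sizes (vocab 30522, 512 positions, 2 token types, FFN width 3072):

```python
# Rough parameter count for BERT-base (L=12, H=768, A=12); the sizes below are
# assumptions taken from the published bert-base-uncased config.
V, P, T = 30522, 512, 2      # vocab, max positions, token-type vocab
H, L, I = 768, 12, 3072      # hidden size, layers, FFN intermediate size

embeddings = (V + P + T) * H + 2 * H                  # word/pos/type tables + LayerNorm
attention  = 3 * (H * H + H) + (H * H + H) + 2 * H    # Q,K,V + output projection + LayerNorm
ffn        = (H * I + I) + (I * H + H) + 2 * H        # two dense layers + LayerNorm
encoder    = L * (attention + ffn)
pooler     = H * H + H

total = embeddings + encoder + pooler
print(f"{total:,}")   # 109,482,240, i.e. the paper's rounded 110M
```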
7 votes
1 answer
2k views

Fine-tune BERT for a specific domain (unsupervised)

I want to fine-tune BERT on texts that are related to a specific domain (in my case, engineering). The training should be unsupervised, since I don't have any labels. Is this ...
spadel
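The usual recipe for this is continued masked-language-model pre-training on the domain corpus. A minimal sketch with Hugging Face transformers and datasets; the corpus file name is a placeholder:

```python
# A sketch of unsupervised domain adaptation via masked-language modelling;
# "engineering_corpus.txt" is a hypothetical one-sentence-per-line text file.
from transformers import (BertTokenizerFast, BertForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

dataset = load_dataset("text", data_files={"train": "engineering_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-engineering", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator)
trainer.train()   # no labels needed; the collator masks tokens on the fly
```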
5 votes
2 answers
5k views

How to use trained BERT model checkpoints for prediction?

I trained BERT on SQuAD 2.0 and got model.ckpt.data, model.ckpt.meta, and model.ckpt.index (F1 score: 81) in the output directory, along with predictions.json, etc., using the BERT-master/...
Jeeva Bharathi
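One way to reuse those checkpoints is to re-run the original BERT-master run_squad.py in predict-only mode, pointing --init_checkpoint at the saved prefix. A hedged sketch; the flag names follow the BERT-master README, and every path below is a placeholder:

```python
# Predict-only inference with BERT-master's run_squad.py via subprocess;
# all paths are hypothetical placeholders.
import subprocess

subprocess.run([
    "python", "run_squad.py",
    "--vocab_file=uncased_L-12_H-768_A-12/vocab.txt",
    "--bert_config_file=uncased_L-12_H-768_A-12/bert_config.json",
    "--init_checkpoint=output/model.ckpt",  # prefix only; .data/.index/.meta are found automatically
    "--do_train=False",
    "--do_predict=True",
    "--predict_file=dev-v2.0.json",
    "--version_2_with_negative=True",
    "--max_seq_length=384",
    "--doc_stride=128",
    "--output_dir=squad_predictions/",
], check=True)
```

The answers land in predictions.json inside --output_dir, the same file the training run produced.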
4 votes
0 answers
1k views

Word embeddings with BERT and mapping tensors to words

I am trying to aggregate BERT embeddings at the token level. For each token in the corpus vocabulary, I would like to create a list of all its contextual embeddings and average them to get one ...
Andrej
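A minimal sketch of one way to do this with Hugging Face transformers, keeping a running sum and count per token; the two-sentence corpus is just an illustration:

```python
# Average each token's contextual embeddings over a (toy) corpus.
from collections import defaultdict
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

corpus = ["The bank raised rates.", "She sat by the river bank."]
sums, counts = defaultdict(lambda: 0.0), defaultdict(int)

with torch.no_grad():
    for sentence in corpus:
        enc = tokenizer(sentence, return_tensors="pt")
        hidden = model(**enc).last_hidden_state[0]       # (seq_len, 768)
        for token_id, vec in zip(enc["input_ids"][0], hidden):
            token = tokenizer.convert_ids_to_tokens(token_id.item())
            sums[token] = sums[token] + vec              # running sum per token string
            counts[token] += 1

averaged = {tok: sums[tok] / counts[tok] for tok in sums}
print(averaged["bank"].shape)   # torch.Size([768]), averaged over both contexts
```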
3 votes
3 answers
5k views

What is the difference between the pooled output and the sequence output in a BERT layer?

Hi everyone! I was reading about BERT and wanted to do text classification with its word embeddings. I came across this line of code: pooled_output, sequence_output = self.bert_layer([input_word_ids, ...
mitra mirshafiee
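In short: the sequence output has one 768-dim vector per token, while the pooled output is the [CLS] vector passed through an extra dense+tanh layer. A sketch showing the two shapes, using Hugging Face transformers rather than the asker's TF Hub layer:

```python
# Contrast the two BERT outputs; the sample sentence is an assumption.
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

with torch.no_grad():
    out = model(**tokenizer("BERT has two outputs.", return_tensors="pt"))

print(out.last_hidden_state.shape)  # (1, seq_len, 768): one vector per token
print(out.pooler_output.shape)      # (1, 768): [CLS] through a dense+tanh head
```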
3 votes
1 answer
3k views

ValueError: Unknown layer: TFBertModel. Please ensure this object is passed to the `custom_objects` argument

Here I am training a BERT model. I used the code below to train it; when I load the saved model for prediction, it shows this error. Can anyone please help me out? import tensorflow as tf import logging from ...
waji
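The usual fix is to tell load_model how to rebuild the custom layer. A minimal sketch, assuming the model was saved with tf.keras and wraps a TFBertModel; the file name is hypothetical:

```python
# Reload a saved Keras model that contains a TFBertModel layer.
import tensorflow as tf
from transformers import TFBertModel

model = tf.keras.models.load_model(
    "bert_classifier.h5",                             # hypothetical save path
    custom_objects={"TFBertModel": TFBertModel},      # map the layer name back to its class
)
```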
3 votes
2 answers
3k views

How can I get all outputs of the last transformer encoder in a pretrained BERT model, and not just the CLS token output?

I'm using PyTorch and this is the model from the Hugging Face transformers library: from transformers import BertTokenizerFast, BertForSequenceClassification bert = BertForSequenceClassification....
Alaa Grable
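Passing output_hidden_states=True makes the model return every layer's token-level outputs, of which the last entry is the final encoder layer. A sketch under that setup:

```python
# Get per-token outputs of the last encoder layer, not just [CLS].
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", output_hidden_states=True).eval()

with torch.no_grad():
    out = bert(**tokenizer("Give me every token.", return_tensors="pt"))

last_layer = out.hidden_states[-1]   # (1, seq_len, 768): all token vectors
print(last_layer.shape)
```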
3 votes
0 answers
1k views

How to update the vocabulary of a pre-trained BERT model while doing my own training task?

I am working on a task of predicting a masked word using a BERT model. Unlike the usual setup, the answer needs to be chosen from specific options. For instance: sentence: "In my daily [MASKED], ..." options:...
COrra
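One common approach is to add the new words to the tokenizer and resize the embedding matrix to match. A minimal sketch; the added words are hypothetical:

```python
# Extend a pre-trained BERT vocabulary with domain words.
from transformers import BertTokenizerFast, BertForMaskedLM

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

num_added = tokenizer.add_tokens(["torquemeter", "flowrate"])   # hypothetical words
model.resize_token_embeddings(len(tokenizer))  # new rows are randomly initialised
print(f"added {num_added} tokens, vocab is now {len(tokenizer)}")
```

The new embeddings are random until further training, so some continued MLM training on text containing those words is usually needed.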
2 votes
2 answers
7k views

Using nn.CrossEntropyLoss between outputs and target labels

I use this function to train the model: def train(): model.train() total_loss, total_accuracy = 0, 0 # empty list to save model predictions total_preds=[] # iterate over ...
Shorouk Adel
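For reference, nn.CrossEntropyLoss expects raw logits of shape (batch, num_classes) and integer class indices of shape (batch,), applying log-softmax internally. A minimal sketch:

```python
# The expected input/target shapes for nn.CrossEntropyLoss.
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(32, 3)           # raw model outputs, no softmax needed
labels = torch.randint(0, 3, (32,))   # class indices, not one-hot vectors
loss = criterion(logits, labels)
print(loss.item())
```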
2 votes
1 answer
537 views

Multi-Head Attention: Correct implementation of the linear transformations of Q, K, V

I am implementing Multi-Head Self-Attention in PyTorch. I looked at a couple of implementations and they seem a bit wrong, or at least I am not sure why they are done the way they are. They would ...
Germans Savcisens
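The usual scheme is one d_model×d_model Linear for each of Q, K, and V, reshaped into heads afterwards, rather than a separate small Linear per head. A minimal PyTorch sketch of that layout:

```python
# Standard multi-head self-attention: single full-width projections per Q/K/V,
# split into heads by reshaping.
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                            # x: (batch, seq, d_model)
        b, s, _ = x.shape
        def split(t):                                # -> (batch, heads, seq, d_head)
            return t.view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        ctx = scores.softmax(dim=-1) @ v             # (batch, heads, seq, d_head)
        ctx = ctx.transpose(1, 2).reshape(b, s, -1)  # merge heads back
        return self.out_proj(ctx)

attn = MultiHeadSelfAttention()
print(attn(torch.randn(2, 10, 768)).shape)  # torch.Size([2, 10, 768])
```

Mathematically, one big projection then a reshape is equivalent to n_heads separate small projections, just cheaper to compute as a single matmul.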
2 votes
2 answers
1k views

Translation between different tokenizers

Sorry if this question is too basic to be asked here. I tried but I couldn't find solutions. I'm now working on an NLP project that requires using two different models (BART for summarization and BERT ...
exitialium
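The simplest bridge is to go through plain text: decode the first tokenizer's ids and re-encode with the second. A sketch, assuming BART and BERT checkpoints as in the question:

```python
# Translate between two tokenizers via the raw text they both understand.
from transformers import BartTokenizer, BertTokenizerFast

bart_tok = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
bert_tok = BertTokenizerFast.from_pretrained("bert-base-uncased")

bart_ids = bart_tok("Summaries and labels use different vocabularies.")["input_ids"]
text = bart_tok.decode(bart_ids, skip_special_tokens=True)  # back to raw text
bert_ids = bert_tok(text)["input_ids"]                      # re-tokenise for BERT
print(bert_tok.convert_ids_to_tokens(bert_ids))
```

There is generally no direct id-to-id mapping, since the two vocabularies segment words differently.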
2 votes
1 answer
2k views

BigBird, or Sparse self-attention: How to implement a sparse matrix?

This question is related to the new paper: Big Bird: Transformers for Longer Sequences. Mainly, about the implementation of the Sparse Attention (that is specified in the Supplemental material, part D)...
Germans Savcisens
2 votes
1 answer
1k views

cannot import name 'DISTILBERT_PRETRAINED_MODEL_ARCHIVE_MAP' from 'transformers.modeling_distilbert'

I am trying to train the DistilBERT model for question answering. I have installed Simple Transformers and everything, but when I try to run the following command: model = ...
swapnil agashe
2 votes
0 answers
449 views

Updating model parameters of two models in one optimizer optimizes just one neural network

I'm trying to train two sequential neural networks with one optimizer. I read that this can be done by defining the optimizer as follows: optimizer_domain = torch.optim.SGD(list(sentences_model....
thomasb
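A single optimizer can hold both models' parameters; the catch is that a parameter only gets a gradient if the loss actually depends on it. A minimal sketch with two stand-in models:

```python
# One optimizer over two models; both are updated only if the loss flows
# through both of them.
import torch
import torch.nn as nn

model_a = nn.Linear(8, 4)
model_b = nn.Linear(4, 2)
optimizer = torch.optim.SGD(
    list(model_a.parameters()) + list(model_b.parameters()), lr=0.1)

x = torch.randn(16, 8)
loss = model_b(model_a(x)).pow(2).mean()  # depends on BOTH models
optimizer.zero_grad()
loss.backward()
optimizer.step()   # both models receive gradients and are updated
```

If only one network ends up trained, the usual culprit is a loss computed from just one model's output, so the other's gradients stay None.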
2 votes
1 answer
2k views

How to compute the Hessian of a large neural network in PyTorch?

How to compute the Hessian matrix of a large neural network or a transformer model like BERT in PyTorch? I know about torch.autograd.functional.hessian, but it seems like it only calculates the Hessian of a ...
Yan Pan
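For a model with ~110M parameters the full Hessian is far too large to materialise, but Hessian-vector products via double backward are cheap. A sketch on a toy model:

```python
# Hessian-vector product H @ v without ever forming H, via double backward.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.Tanh(), nn.Linear(10, 1))
params = list(model.parameters())
x, y = torch.randn(32, 10), torch.randn(32, 1)

loss = nn.functional.mse_loss(model(x), y)
grads = torch.autograd.grad(loss, params, create_graph=True)  # keep graph for 2nd backward

v = [torch.randn_like(p) for p in params]                 # the vector in H @ v
dot = sum((g * vi).sum() for g, vi in zip(grads, v))      # scalar grad·v
hvp = torch.autograd.grad(dot, params)                    # d(grad·v)/dθ = H @ v
print([h.shape for h in hvp])
```

Most large-scale Hessian analyses (eigenvalues, traces) build on such products rather than the full matrix.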
1 vote
2 answers
574 views

Fine-tuning BERT with my own entities/labels

I would like to fine-tune a BERT model with my own labels, like [COLOR, MATERIAL], and not the usual "NAME", "ORG". I'm following this Colab: https://colab.research.google.com/drive/...
mrqwerty91
1 vote
1 answer
680 views

Overfitting training data but still improving on test data

My machine learning model massively overfits the training data but still performs quite well on test data. When using a neural network approach every iteration increases the accuracy on the test set ...
Nick Sorros
1 vote
1 answer
2k views

ValueError: Target size (torch.Size([32])) must be the same as input size (torch.Size([32, 3]))

I've looked at some explanations here, and I think I understand what is going wrong, but my error does not occur at the loss. For example, the snippet where the error occurs is the line outputs = ...
Jeff Jefferson
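That error pattern usually means a BCE-style loss received (batch,) integer targets against (batch, 3) logits. A sketch of the two consistent pairings, under that assumption:

```python
# Two consistent loss/target pairings for 3-class logits of shape (32, 3).
import torch
import torch.nn as nn

logits = torch.randn(32, 3)
labels = torch.randint(0, 3, (32,))

# Option 1: multi-class with integer labels -> CrossEntropyLoss
loss_ce = nn.CrossEntropyLoss()(logits, labels)

# Option 2: keep BCEWithLogitsLoss -> targets must be one-hot floats of shape (32, 3)
one_hot = nn.functional.one_hot(labels, num_classes=3).float()
loss_bce = nn.BCEWithLogitsLoss()(logits, one_hot)
print(loss_ce.item(), loss_bce.item())
```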
1 vote
0 answers
184 views

Concatenating two pre-trained BERT models

max_length = 50 tokenizer = RobertaTokenizer.from_pretrained('roberta-large', do_lower_case=True) encodings = tokenizer.batch_encode_plus(comments,max_length=max_length,pad_to_max_length=True, ...
Mamad_Knight
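A hedged sketch of one way to combine two pre-trained encoders: concatenate their first-token vectors before a classifier head. The model choices and head size here are assumptions:

```python
# Concatenate the [CLS]/<s> vectors of two encoders before classification.
import torch
import torch.nn as nn
from transformers import AutoModel

class TwinEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc_a = AutoModel.from_pretrained("bert-base-uncased")
        self.enc_b = AutoModel.from_pretrained("roberta-base")
        hidden = self.enc_a.config.hidden_size + self.enc_b.config.hidden_size
        self.classifier = nn.Linear(hidden, 2)

    def forward(self, inputs_a, inputs_b):
        a = self.enc_a(**inputs_a).last_hidden_state[:, 0]  # BERT [CLS] vector
        b = self.enc_b(**inputs_b).last_hidden_state[:, 0]  # RoBERTa <s> vector
        return self.classifier(torch.cat([a, b], dim=-1))
```

Each input must be tokenised with its own model's tokenizer before being passed to forward.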
1 vote
0 answers
61 views

How to load a local BERT into hub.Module

I just want to point this at my local BERT: bert_module = hub.Module(BERT_MODEL_HUB, trainable=True). How can I add my local BERT? I have Tensorflow==1.15 and python==3.7. def create_model(is_predicting, ...
Bold Ganbaatar
1 vote
0 answers
2k views

Predicting NER with BertForTokenClassification model

I have built my model using this tutorial on NER with BERT: https://www.depends-on-the-definition.com/named-entity-recognition-with-bert/#resources However, I could not figure out how to pass in a ...
Hi_there
0 votes
1 answer
2k views

Using KerasClassifier for training neural network

I created a simple neural network for binary spam/ham text classification using pretrained BERT transformer. The current pure-keras implementation works fine. I wanted however to plot certain metrics ...
lazarea
0 votes
1 answer
1k views

BERT model not giving loss or logits when training in an epoch

I'm trying to train the model. This is the epoch loop seed_val = 17 random.seed(seed_val) np.random.seed(seed_val) torch.manual_seed(seed_val) torch.cuda.manual_seed_all(seed_val) device = torch....
Jeff Jefferson
0 votes
1 answer
1k views

PyTorch Siamese NN with BERT for sentence matching

I'm trying to build a Siamese neural network using PyTorch, in which I feed BERT word embeddings and try to find whether two sentences are similar or not (imagine duplicate post matching, product ...
Antonis Karvelas
0 votes
1 answer
566 views

Sequence labeling with BERT for words position

I have a set of sentences, and in these sentences there are some dependencies between words. I want to train BERT to predict which words have dependencies with others. For example, if I have this ...
Minions
0 votes
0 answers
68 views

Neural network classifier always outputs the same class

I'm coding a neural network for a recommendation system using PyTorch. The item's metadata is a textual description, and the user's metadata are age and gender (binary values). I used a BERT encoder (with ...
milad heidari
0 votes
1 answer
3k views

How does the BERT loss function work?

I'm confused about how cross-entropy works in the BERT LM. To calculate the loss function we need the truth labels of the masks, but we don't have vector representations of the truth labels and the predictions ...
kowser66
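Internally the masked-LM loss is ordinary cross-entropy between per-position vocabulary logits and the original token ids, with unmasked positions excluded via the label -100. A minimal sketch of that computation:

```python
# The masked-LM loss: cross-entropy over the vocabulary at each position,
# scoring only the masked positions (label -100 means "ignore").
import torch
import torch.nn as nn

vocab_size, seq_len = 30522, 8
logits = torch.randn(1, seq_len, vocab_size)   # one distribution per position

labels = torch.full((1, seq_len), -100, dtype=torch.long)
labels[0, 3] = 2158                            # true token id of the one [MASK]ed position

loss = nn.CrossEntropyLoss(ignore_index=-100)(
    logits.view(-1, vocab_size), labels.view(-1))
print(loss.item())
```

So no vector representation of the truth label is needed: the target is simply the integer id of the original token.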
0 votes
1 answer
3k views

RuntimeError: shape '[4, 512]' is invalid for input of size 1024 while while evaluating test data

I am trying XLNet on the Jigsaw toxic comment dataset. When I train my data with input_ids = d["input_ids"].reshape(4,512).to(device) # batch size x seq length it trains perfectly. But when I try to ...
RajB009
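Since 1024 = 2 × 512, the failing batch almost certainly holds 2 examples rather than 4, i.e. a short final batch. A sketch of two common fixes, assuming batch_size=4 and seq_len=512:

```python
# Handle a final batch smaller than the hard-coded batch size.
import torch

d = {"input_ids": torch.zeros(2 * 512, dtype=torch.long)}  # stand-in for a short last batch

# Fix 1: infer the batch dimension instead of hard-coding 4
input_ids = d["input_ids"].reshape(-1, 512)
print(input_ids.shape)  # torch.Size([2, 512])

# Fix 2: drop incomplete batches when building the DataLoader
# loader = torch.utils.data.DataLoader(dataset, batch_size=4, drop_last=True)
```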
0 votes
0 answers
162 views

How to use BigBirdModel to create a neural network in Python?

I am trying to create a network with TensorFlow and BigBird. from transformers import BigBirdModel import tensorflow as tf classic_model = BigBirdModel.from_pretrained('google/bigbird-roberta-base') ...
Josephine Fraser
0 votes
0 answers
186 views

Problem in training an RNN using BERT embeddings

I have been working with BERT embeddings in a neural network model for a sentiment classification task. During model fit it gives an indices error, and I am still new to this, so I was not able to ...
shankar
-1 votes
1 answer
793 views

Should feature embeddings be taken before or after dropout layer in neural network?

I am training a binary text classification model using BERT as follows: def create_model(): text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text') preprocessed_text = ...
Jane Sully