All Questions

8 votes
1 answer
18k views

How is the number of parameters calculated in the BERT model?

The paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Devlin & Co. calculated for the base model size 110M parameters (i.e. L=12, H=768, A=12) ...
EchoCache
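The arithmetic behind the 110M figure can be reproduced from the published BERT-base config alone. A minimal sketch, assuming the standard sizes (vocab 30522, 512 positions, 2 token types, FFN width 3072):

```python
# Rough parameter count for BERT-base (L=12, H=768, A=12); the sizes below are
# assumptions taken from the published bert-base-uncased config.
V, P, T = 30522, 512, 2      # vocab, max positions, token-type vocab
H, L, I = 768, 12, 3072      # hidden size, layers, FFN intermediate size

embeddings = (V + P + T) * H + 2 * H                  # word/pos/type tables + LayerNorm
attention  = 3 * (H * H + H) + (H * H + H) + 2 * H    # Q,K,V + output projection + LayerNorm
ffn        = (H * I + I) + (I * H + H) + 2 * H        # two dense layers + LayerNorm
encoder    = L * (attention + ffn)
pooler     = H * H + H

total = embeddings + encoder + pooler
print(f"{total:,}")   # 109,482,240, i.e. the paper's rounded 110M
```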
7 votes
1 answer
2k views

Fine-tune BERT for a specific domain (unsupervised)

I want to fine-tune BERT on texts that are related to a specific domain (in my case, engineering). The training should be unsupervised, since I don't have any labels. Is this ...
spadel
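The usual recipe for this is continued masked-language-model pre-training on the domain corpus. A minimal sketch with Hugging Face transformers and datasets; the corpus file name is a placeholder:

```python
# A sketch of unsupervised domain adaptation via masked-language modelling;
# "engineering_corpus.txt" is a hypothetical one-sentence-per-line text file.
from transformers import (BertTokenizerFast, BertForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

dataset = load_dataset("text", data_files={"train": "engineering_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-engineering", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator)
trainer.train()   # no labels needed; the collator masks tokens on the fly
```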
5 votes
2 answers
5k views

How to use trained BERT model checkpoints for prediction?

I trained BERT on SQuAD 2.0 and got model.ckpt.data, model.ckpt.meta, and model.ckpt.index (F1 score: 81) in the output directory, along with predictions.json, etc., using the BERT-master/...
Jeeva Bharathi
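One way to reuse those checkpoints is to re-run the original BERT-master run_squad.py in predict-only mode, pointing --init_checkpoint at the saved prefix. A hedged sketch; the flag names follow the BERT-master README, and every path below is a placeholder:

```python
# Predict-only inference with BERT-master's run_squad.py via subprocess;
# all paths are hypothetical placeholders.
import subprocess

subprocess.run([
    "python", "run_squad.py",
    "--vocab_file=uncased_L-12_H-768_A-12/vocab.txt",
    "--bert_config_file=uncased_L-12_H-768_A-12/bert_config.json",
    "--init_checkpoint=output/model.ckpt",  # prefix only; .data/.index/.meta are found automatically
    "--do_train=False",
    "--do_predict=True",
    "--predict_file=dev-v2.0.json",
    "--version_2_with_negative=True",
    "--max_seq_length=384",
    "--doc_stride=128",
    "--output_dir=squad_predictions/",
], check=True)
```

The answers land in predictions.json inside --output_dir, the same file the training run produced.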
4 votes
0 answers
1k views

Word embeddings with BERT and mapping tensors to words

I am trying to aggregate BERT embeddings at the token level. For each token in the corpus vocabulary, I would like to create a list of all its contextual embeddings and average them to get one ...
Andrej
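A minimal sketch of one way to do this with Hugging Face transformers, keeping a running sum and count per token; the two-sentence corpus is just an illustration:

```python
# Average each token's contextual embeddings over a (toy) corpus.
from collections import defaultdict
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

corpus = ["The bank raised rates.", "She sat by the river bank."]
sums, counts = defaultdict(lambda: 0.0), defaultdict(int)

with torch.no_grad():
    for sentence in corpus:
        enc = tokenizer(sentence, return_tensors="pt")
        hidden = model(**enc).last_hidden_state[0]       # (seq_len, 768)
        for token_id, vec in zip(enc["input_ids"][0], hidden):
            token = tokenizer.convert_ids_to_tokens(token_id.item())
            sums[token] = sums[token] + vec              # running sum per token string
            counts[token] += 1

averaged = {tok: sums[tok] / counts[tok] for tok in sums}
print(averaged["bank"].shape)   # torch.Size([768]), averaged over both contexts
```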
3 votes
3 answers
5k views

What is the difference between the pooled output and the sequence output in a BERT layer?

Hi everyone! I was reading about BERT and wanted to do text classification with its word embeddings. I came across this line of code: pooled_output, sequence_output = self.bert_layer([input_word_ids, ...
mitra mirshafiee
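In short: the sequence output has one 768-dim vector per token, while the pooled output is the [CLS] vector passed through an extra dense+tanh layer. A sketch showing the two shapes, using Hugging Face transformers rather than the asker's TF Hub layer:

```python
# Contrast the two BERT outputs; the sample sentence is an assumption.
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

with torch.no_grad():
    out = model(**tokenizer("BERT has two outputs.", return_tensors="pt"))

print(out.last_hidden_state.shape)  # (1, seq_len, 768): one vector per token
print(out.pooler_output.shape)      # (1, 768): [CLS] through a dense+tanh head
```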
3 votes
1 answer
3k views

ValueError: Unknown layer: TFBertModel. Please ensure this object is passed to the `custom_objects` argument

Here I am training a BERT model. I used the code below to train it; when I load the saved model for prediction, it shows this error. Can anyone please help me out? import tensorflow as tf import logging from ...
waji
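The usual fix is to tell load_model how to rebuild the custom layer. A minimal sketch, assuming the model was saved with tf.keras and wraps a TFBertModel; the file name is hypothetical:

```python
# Reload a saved Keras model that contains a TFBertModel layer.
import tensorflow as tf
from transformers import TFBertModel

model = tf.keras.models.load_model(
    "bert_classifier.h5",                             # hypothetical save path
    custom_objects={"TFBertModel": TFBertModel},      # map the layer name back to its class
)
```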
3 votes
2 answers
3k views

How can I get all outputs of the last transformer encoder in a pretrained BERT model, and not just the CLS token output?

I'm using PyTorch and this is the model from the Hugging Face transformers library: from transformers import BertTokenizerFast, BertForSequenceClassification bert = BertForSequenceClassification....
Alaa Grable
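Passing output_hidden_states=True makes the model return every layer's token-level outputs, of which the last entry is the final encoder layer. A sketch under that setup:

```python
# Get per-token outputs of the last encoder layer, not just [CLS].
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", output_hidden_states=True).eval()

with torch.no_grad():
    out = bert(**tokenizer("Give me every token.", return_tensors="pt"))

last_layer = out.hidden_states[-1]   # (1, seq_len, 768): all token vectors
print(last_layer.shape)
```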
3 votes
0 answers
1k views

How to update the vocabulary of a pre-trained BERT model while doing my own training task?

I am working on a task of predicting a masked word using a BERT model. Unlike the usual setup, the answer needs to be chosen from specific options. For instance: sentence: "In my daily [MASKED], ..." options:...
COrra
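One common approach is to add the new words to the tokenizer and resize the embedding matrix to match. A minimal sketch; the added words are hypothetical:

```python
# Extend a pre-trained BERT vocabulary with domain words.
from transformers import BertTokenizerFast, BertForMaskedLM

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

num_added = tokenizer.add_tokens(["torquemeter", "flowrate"])   # hypothetical words
model.resize_token_embeddings(len(tokenizer))  # new rows are randomly initialised
print(f"added {num_added} tokens, vocab is now {len(tokenizer)}")
```

The new embeddings are random until further training, so some continued MLM training on text containing those words is usually needed.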
2 votes
2 answers
7k views

Using nn.CrossEntropyLoss between outputs and target labels

I use this function to train the model: def train(): model.train() total_loss, total_accuracy = 0, 0 # empty list to save model predictions total_preds=[] # iterate over ...
Shorouk Adel
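For reference, nn.CrossEntropyLoss expects raw logits of shape (batch, num_classes) and integer class indices of shape (batch,), applying log-softmax internally. A minimal sketch:

```python
# The expected input/target shapes for nn.CrossEntropyLoss.
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(32, 3)           # raw model outputs, no softmax needed
labels = torch.randint(0, 3, (32,))   # class indices, not one-hot vectors
loss = criterion(logits, labels)
print(loss.item())
```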
2 votes
1 answer
537 views

Multi-Head Attention: Correct implementation of the linear transformations of Q, K, V

I am implementing Multi-Head Self-Attention in PyTorch. I looked at a couple of implementations and they seem a bit wrong, or at least I am not sure why they are done the way they are. They would ...
Germans Savcisens
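The usual scheme is one d_model×d_model Linear for each of Q, K, and V, reshaped into heads afterwards, rather than a separate small Linear per head. A minimal PyTorch sketch of that layout:

```python
# Standard multi-head self-attention: single full-width projections per Q/K/V,
# split into heads by reshaping.
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                            # x: (batch, seq, d_model)
        b, s, _ = x.shape
        def split(t):                                # -> (batch, heads, seq, d_head)
            return t.view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        ctx = scores.softmax(dim=-1) @ v             # (batch, heads, seq, d_head)
        ctx = ctx.transpose(1, 2).reshape(b, s, -1)  # merge heads back
        return self.out_proj(ctx)

attn = MultiHeadSelfAttention()
print(attn(torch.randn(2, 10, 768)).shape)  # torch.Size([2, 10, 768])
```

Mathematically, one big projection then a reshape is equivalent to n_heads separate small projections, just cheaper to compute as a single matmul.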
2 votes
2 answers
1k views

Translation between different tokenizers

Sorry if this question is too basic to be asked here. I tried but I couldn't find solutions. I'm now working on an NLP project that requires using two different models (BART for summarization and BERT ...
exitialium
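The simplest bridge is to go through plain text: decode the first tokenizer's ids and re-encode with the second. A sketch, assuming BART and BERT checkpoints as in the question:

```python
# Translate between two tokenizers via the raw text they both understand.
from transformers import BartTokenizer, BertTokenizerFast

bart_tok = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
bert_tok = BertTokenizerFast.from_pretrained("bert-base-uncased")

bart_ids = bart_tok("Summaries and labels use different vocabularies.")["input_ids"]
text = bart_tok.decode(bart_ids, skip_special_tokens=True)  # back to raw text
bert_ids = bert_tok(text)["input_ids"]                      # re-tokenise for BERT
print(bert_tok.convert_ids_to_tokens(bert_ids))
```

There is generally no direct id-to-id mapping, since the two vocabularies segment words differently.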
2 votes
1 answer
2k views

BigBird, or Sparse self-attention: How to implement a sparse matrix?

This question is related to the new paper: Big Bird: Transformers for Longer Sequences. Mainly, about the implementation of the Sparse Attention (that is specified in the Supplemental material, part D)...
Germans Savcisens
2 votes
1 answer
1k views

cannot import name 'DISTILBERT_PRETRAINED_MODEL_ARCHIVE_MAP' from 'transformers.modeling_distilbert'

I am trying to train the DistilBERT model for question answering. I have installed Simple Transformers and everything, but when I try to run the following command: model = ...
swapnil agashe
2 votes
0 answers
449 views

Updating model parameters of two models in one optimizer optimizes just one neural network

I'm trying to train two sequential neural networks with one optimizer. I read that this can be done by defining the optimizer as follows: optimizer_domain = torch.optim.SGD(list(sentences_model....
thomasb
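A single optimizer can hold both models' parameters; the catch is that a parameter only gets a gradient if the loss actually depends on it. A minimal sketch with two stand-in models:

```python
# One optimizer over two models; both are updated only if the loss flows
# through both of them.
import torch
import torch.nn as nn

model_a = nn.Linear(8, 4)
model_b = nn.Linear(4, 2)
optimizer = torch.optim.SGD(
    list(model_a.parameters()) + list(model_b.parameters()), lr=0.1)

x = torch.randn(16, 8)
loss = model_b(model_a(x)).pow(2).mean()  # depends on BOTH models
optimizer.zero_grad()
loss.backward()
optimizer.step()   # both models receive gradients and are updated
```

If only one network ends up trained, the usual culprit is a loss computed from just one model's output, so the other's gradients stay None.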
2 votes
1 answer
2k views

How to compute the Hessian of a large neural network in PyTorch?

How to compute the Hessian matrix of a large neural network or a transformer model like BERT in PyTorch? I know about torch.autograd.functional.hessian, but it seems like it only calculates the Hessian of a ...
Yan Pan
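For a model with ~110M parameters the full Hessian is far too large to materialise, but Hessian-vector products via double backward are cheap. A sketch on a toy model:

```python
# Hessian-vector product H @ v without ever forming H, via double backward.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.Tanh(), nn.Linear(10, 1))
params = list(model.parameters())
x, y = torch.randn(32, 10), torch.randn(32, 1)

loss = nn.functional.mse_loss(model(x), y)
grads = torch.autograd.grad(loss, params, create_graph=True)  # keep graph for 2nd backward

v = [torch.randn_like(p) for p in params]                 # the vector in H @ v
dot = sum((g * vi).sum() for g, vi in zip(grads, v))      # scalar grad·v
hvp = torch.autograd.grad(dot, params)                    # d(grad·v)/dθ = H @ v
print([h.shape for h in hvp])
```

Most large-scale Hessian analyses (eigenvalues, traces) build on such products rather than the full matrix.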
1 vote
2 answers
574 views

Fine-tuning BERT with my own entities/labels

I would like to fine-tune a BERT model with my own labels, like [COLOR, MATERIAL], and not the usual "NAME", "ORG". I'm following this Colab: https://colab.research.google.com/drive/...
mrqwerty91
1 vote
1 answer
680 views

Overfitting training data but still improving on test data

My machine learning model massively overfits the training data but still performs quite well on test data. When using a neural network approach every iteration increases the accuracy on the test set ...
Nick Sorros
1 vote
1 answer
2k views

ValueError: Target size (torch.Size([32])) must be the same as input size (torch.Size([32, 3]))

I've looked at some explanations here, and I think I understand what is going wrong, but my error does not occur at the loss. For example, the snippet where the error occurs is the line outputs = ...
Jeff Jefferson
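That error pattern usually means a BCE-style loss received (batch,) integer targets against (batch, 3) logits. A sketch of the two consistent pairings, under that assumption:

```python
# Two consistent loss/target pairings for 3-class logits of shape (32, 3).
import torch
import torch.nn as nn

logits = torch.randn(32, 3)
labels = torch.randint(0, 3, (32,))

# Option 1: multi-class with integer labels -> CrossEntropyLoss
loss_ce = nn.CrossEntropyLoss()(logits, labels)

# Option 2: keep BCEWithLogitsLoss -> targets must be one-hot floats of shape (32, 3)
one_hot = nn.functional.one_hot(labels, num_classes=3).float()
loss_bce = nn.BCEWithLogitsLoss()(logits, one_hot)
print(loss_ce.item(), loss_bce.item())
```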
1 vote
0 answers
184 views

Concatenating two pre-trained BERT models

max_length = 50 tokenizer = RobertaTokenizer.from_pretrained('roberta-large', do_lower_case=True) encodings = tokenizer.batch_encode_plus(comments,max_length=max_length,pad_to_max_length=True, ...
Mamad_Knight
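A hedged sketch of one way to combine two pre-trained encoders: concatenate their first-token vectors before a classifier head. The model choices and head size here are assumptions:

```python
# Concatenate the [CLS]/<s> vectors of two encoders before classification.
import torch
import torch.nn as nn
from transformers import AutoModel

class TwinEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc_a = AutoModel.from_pretrained("bert-base-uncased")
        self.enc_b = AutoModel.from_pretrained("roberta-base")
        hidden = self.enc_a.config.hidden_size + self.enc_b.config.hidden_size
        self.classifier = nn.Linear(hidden, 2)

    def forward(self, inputs_a, inputs_b):
        a = self.enc_a(**inputs_a).last_hidden_state[:, 0]  # BERT [CLS] vector
        b = self.enc_b(**inputs_b).last_hidden_state[:, 0]  # RoBERTa <s> vector
        return self.classifier(torch.cat([a, b], dim=-1))
```

Each input must be tokenised with its own model's tokenizer before being passed to forward.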
1 vote
0 answers
61 views

How to load a local BERT into hub.Module

I just want to point this at my local BERT: bert_module = hub.Module(BERT_MODEL_HUB, trainable=True). How can I add my local BERT? I have Tensorflow==1.15 and python==3.7. def create_model(is_predicting, ...
Bold Ganbaatar
1 vote
0 answers
2k views

Predicting NER with BertForTokenClassification model

I have built my model using this tutorial on NER with BERT: https://www.depends-on-the-definition.com/named-entity-recognition-with-bert/#resources However, I could not figure out how to pass in a ...
Hi_there
0 votes
1 answer
2k views

Using KerasClassifier for training neural network

I created a simple neural network for binary spam/ham text classification using pretrained BERT transformer. The current pure-keras implementation works fine. I wanted however to plot certain metrics ...
lazarea
0 votes
1 answer
1k views

BERT model not giving loss or logits when training in an epoch

I'm trying to train the model. This is the epoch loop seed_val = 17 random.seed(seed_val) np.random.seed(seed_val) torch.manual_seed(seed_val) torch.cuda.manual_seed_all(seed_val) device = torch....
Jeff Jefferson
0 votes
1 answer
1k views

PyTorch Siamese NN with BERT for sentence matching

I'm trying to build a Siamese neural network using PyTorch, in which I feed BERT word embeddings and try to find whether two sentences are similar or not (imagine duplicate post matching, product ...
Antonis Karvelas
0 votes
1 answer
566 views

Sequence labeling with BERT for words position

I have a set of sentences, and in these sentences there are some dependencies between words. I want to train BERT to predict which words have dependencies with others. For example, if I have this ...
Minions
0 votes
0 answers
68 views

Neural network classifier always outputs the same class

I'm coding a neural network for a recommendation system using PyTorch. The item's metadata is a textual description, and the user's metadata are age and gender (binary values). I used a BERT encoder (with ...
milad heidari
0 votes
1 answer
3k views

How does the BERT loss function work?

I'm confused about how cross-entropy works in the BERT LM. To calculate the loss function we need the truth labels of the masks, but we don't have vector representations of the truth labels and the predictions ...
kowser66
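Internally the masked-LM loss is ordinary cross-entropy between per-position vocabulary logits and the original token ids, with unmasked positions excluded via the label -100. A minimal sketch of that computation:

```python
# The masked-LM loss: cross-entropy over the vocabulary at each position,
# scoring only the masked positions (label -100 means "ignore").
import torch
import torch.nn as nn

vocab_size, seq_len = 30522, 8
logits = torch.randn(1, seq_len, vocab_size)   # one distribution per position

labels = torch.full((1, seq_len), -100, dtype=torch.long)
labels[0, 3] = 2158                            # true token id of the one [MASK]ed position

loss = nn.CrossEntropyLoss(ignore_index=-100)(
    logits.view(-1, vocab_size), labels.view(-1))
print(loss.item())
```

So no vector representation of the truth label is needed: the target is simply the integer id of the original token.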
0 votes
1 answer
3k views

RuntimeError: shape '[4, 512]' is invalid for input of size 1024 while while evaluating test data

I am trying XLNet on the Jigsaw toxic comment dataset. When I train my data with input_ids = d["input_ids"].reshape(4,512).to(device) # batch size x seq length it trains perfectly. But when I try to ...
RajB009
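Since 1024 = 2 × 512, the failing batch almost certainly holds 2 examples rather than 4, i.e. a short final batch. A sketch of two common fixes, assuming batch_size=4 and seq_len=512:

```python
# Handle a final batch smaller than the hard-coded batch size.
import torch

d = {"input_ids": torch.zeros(2 * 512, dtype=torch.long)}  # stand-in for a short last batch

# Fix 1: infer the batch dimension instead of hard-coding 4
input_ids = d["input_ids"].reshape(-1, 512)
print(input_ids.shape)  # torch.Size([2, 512])

# Fix 2: drop incomplete batches when building the DataLoader
# loader = torch.utils.data.DataLoader(dataset, batch_size=4, drop_last=True)
```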
0 votes
0 answers
162 views

How to use BigBirdModel to create a neural network in Python?

I am trying to create a network with TensorFlow and BigBird. from transformers import BigBirdModel import tensorflow as tf classic_model = BigBirdModel.from_pretrained('google/bigbird-roberta-base') ...
Josephine Fraser
0 votes
0 answers
186 views

Problem in training an RNN using BERT embeddings

I have been working with BERT embeddings in a neural network model for a sentiment classification task. During model fit it gives an indices error, and I am still new to this, so I was not able to ...
shankar
-1 votes
1 answer
793 views

Should feature embeddings be taken before or after dropout layer in neural network?

I am training a binary text classification model using BERT as follows: def create_model(): text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text') preprocessed_text = ...
Jane Sully