All Questions

12 votes
2 answers
12k views

How to train BERT from scratch on a new domain for both MLM and NSP?

I’m trying to train a BERT model from scratch on my own dataset using the HuggingFace library. I would like to train the model so that it has the exact architecture of the original BERT model. In ...
tlqn • 379
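A minimal sketch of one common approach (my own illustration, not the accepted answer; it assumes a plain-text corpus at a hypothetical corpus.txt with one sentence per line and documents separated by blank lines): build a randomly initialized BertForPreTraining, which carries both the MLM and NSP heads, and drive it with Trainer.

# Sketch only: corpus.txt is a placeholder; uses the TextDatasetForNextSentencePrediction
# helper that transformers still ships for building NSP pairs.
from transformers import (BertConfig, BertForPreTraining, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments,
                          TextDatasetForNextSentencePrediction)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")  # or your own vocab
config = BertConfig()                       # defaults reproduce the BERT-Base architecture
model = BertForPreTraining(config)          # random init: MLM head + NSP head

dataset = TextDatasetForNextSentencePrediction(
    tokenizer=tokenizer, file_path="corpus.txt", block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-from-scratch", num_train_epochs=1,
                           per_device_train_batch_size=16),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()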
12 votes
2 answers
5k views

Get probability of multi-token word in MASK position

It is relatively easy to get a token's probability according to a language model, as the snippet below shows. You can get the output of a model, restrict yourself to the output of the masked token, ...
Bram Vanroy • 27.7k
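For reference, a minimal sketch (my own, not the asker's snippet) of the single-token case the excerpt describes: take the logits at the [MASK] position and softmax them over the vocabulary. A multi-token word then needs several mask slots whose per-slot probabilities have to be combined (e.g. multiplied), which is the harder part of the question.

import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("Paris is the [MASK] of France.", return_tensors="pt")
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    logits = model(**inputs).logits                    # (1, seq_len, vocab_size)
probs = logits[0, mask_index].softmax(dim=-1)          # distribution over the vocabulary

token_id = tokenizer.convert_tokens_to_ids("capital")
print(probs[0, token_id].item())                       # probability of the token "capital"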
11 votes
1 answer
2k views

What's the difference between a "self-attention mechanism" and a "fully-connected" layer?

I am confused by these two structures. In theory, both of their outputs are connected to their inputs. What magic makes the self-attention mechanism more powerful than a fully-connected layer?
tom_cat • 325
11 votes
2 answers
13k views

How to use Transformers for text classification?

I have two questions about how to use the TensorFlow implementation of Transformers for text classification. First, it seems people mostly use only the encoder layer to do text classification ...
khemedi • 806
9 votes
2 answers
3k views

BERT output not deterministic

BERT's output is not deterministic. I expect the output values to be deterministic when I feed in the same input, but the values from my BERT model keep changing. Oddly, the same value is returned twice, ...
Keanu Paik
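The usual cause is dropout still being active. A hedged sketch of the standard check (my own example):

# Sketch: in train mode dropout is stochastic, so repeated forward passes differ;
# model.eval() disables dropout and the outputs become repeatable.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()                                   # crucial: switch off dropout

inputs = tokenizer("same input, same output", return_tensors="pt")
with torch.no_grad():
    out1 = model(**inputs).last_hidden_state
    out2 = model(**inputs).last_hidden_state
print(torch.allclose(out1, out2))              # True once dropout is disabled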
9 votes
1 answer
8k views

How do I use BertForMaskedLM or BertModel to calculate perplexity of a sentence?

I want to use BertForMaskedLM or BertModel to calculate the perplexity of a sentence, so I wrote code like this: import numpy as np import torch import torch.nn as nn from transformers import ...
Kaim hong • 113
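A masked LM has no true left-to-right perplexity; a common substitute is the pseudo-perplexity: mask each token in turn, score it, and exponentiate the average negative log-likelihood. A rough sketch under that assumption (my own illustration):

import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_perplexity(sentence):
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    nlls = []
    for i in range(1, len(ids) - 1):           # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = logits[0, i].log_softmax(dim=-1)
        nlls.append(-log_probs[ids[i]].item())
    return float(torch.exp(torch.tensor(nlls).mean()))

print(pseudo_perplexity("The cat sat on the mat."))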
8 votes
1 answer
9k views

How to calculate perplexity of a sentence using huggingface masked language models?

I have several masked language models (mainly BERT, RoBERTa, ALBERT, ELECTRA). I also have a dataset of sentences. How can I get the perplexity of each sentence? From the HuggingFace documentation ...
Penguin • 2,148
8 votes
1 answer
4k views

Uni-directional Transformer VS Bi-directional BERT

I just finished reading the Transformer paper and the BERT paper, but I couldn't figure out why the Transformer is uni-directional and BERT is bi-directional, as stated in the BERT paper. Since they don't use ...
JShen • 409
7 votes
2 answers
6k views

What is the essence of a learnable positional embedding? Does it improve outcomes?

I was recently reading the BERT source code from the Hugging Face project. I noticed that the so-called "learnable position encoding" seems to refer to a specific nn.Parameter layer when it ...
AdamHommer
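What the excerpt calls a "learnable position encoding" is, in essence, an ordinary embedding table indexed by position and trained by backpropagation (in HF BERT it lives in BertEmbeddings.position_embeddings). A toy sketch of the idea (my own illustration, not the BERT source):

import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    """One trainable vector per position, added to the token embeddings."""
    def __init__(self, max_positions=512, hidden_size=768):
        super().__init__()
        self.pos = nn.Embedding(max_positions, hidden_size)   # learned, not sinusoidal

    def forward(self, token_embeddings):                      # (batch, seq, hidden)
        seq_len = token_embeddings.size(1)
        positions = torch.arange(seq_len, device=token_embeddings.device)
        return token_embeddings + self.pos(positions)         # broadcast over batch

x = torch.randn(2, 10, 768)
print(LearnedPositionalEmbedding()(x).shape)                  # torch.Size([2, 10, 768])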
7 votes
1 answer
14k views

How does padding in the HuggingFace tokenizer work?

I tried the following tokenization example: tokenizer = BertTokenizer.from_pretrained(MODEL_TYPE, do_lower_case=True) sent = "I hate this. Not that.", _tokenized = tokenizer(sent, ...
MsA • 2,829
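A quick sketch of the padding options (my own example): padding=True pads to the longest sequence in the batch, padding="max_length" pads to a fixed length, and attention_mask marks which positions are real tokens.

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sents = ["I hate this.", "Not that, but something much longer than the first one."]

batch = tokenizer(sents, padding=True, truncation=True, return_tensors="pt")
print(batch["input_ids"].shape)        # both rows padded to the longest sentence
print(batch["attention_mask"][0])      # 1 for real tokens, 0 for [PAD] positions

fixed = tokenizer(sents, padding="max_length", max_length=16, truncation=True)
print(len(fixed["input_ids"][0]))      # always 16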
5 votes
1 answer
12k views

How to get cosine similarity of word embedding from BERT model

I was interested in how to get the similarity of a word's embedding across different sentences from a BERT model (that is, the same word has different meanings in different contexts). For example: sent1 =...
Mark J. • 143
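A rough sketch of the usual approach (my own illustration; it assumes the word of interest is a single WordPiece token, otherwise its sub-token vectors would need to be pooled): run each sentence through BertModel, pick the hidden state at the word's position, and compare with cosine similarity.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence, word):
    enc = tokenizer(sentence, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    idx = tokens.index(word)                      # assumes the word is one WordPiece
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state   # (1, seq_len, hidden)
    return hidden[0, idx]

v1 = word_vector("He sat by the river bank.", "bank")
v2 = word_vector("She works at the bank downtown.", "bank")
print(torch.cosine_similarity(v1, v2, dim=0).item())   # contextual vectors differ by sense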
5 votes
3 answers
32k views

Unable to pip install -U sentence-transformers

I am unable to do: pip install -U sentence-transformers. I get this message on Anaconda Prompt: ERROR: Could not find a version that satisfies the requirement torch>=1.0.1 (from sentence-transformers) ...
Kay • 667
5 votes
1 answer
3k views

How does BertForSequenceClassification classify on the CLS vector?

Background: Following along with this question: when using BERT to classify sequences, the model uses the "[CLS]" token to represent the classification task. According to the paper: The first ...
Kevin • 3,159
5 votes
3 answers
1k views

BERT token vs. embedding

I understand that WordPiece is used to break text into tokens. And I understand that, somewhere in BERT, the model maps tokens into token embeddings that represent the meaning of the tokens. But ...
i82much • 61
5 votes
2 answers
19k views

ERROR: file:///content does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found

https://colab.research.google.com/drive/11u6leEKvqE0CCbvDHHKmCxmW5GxyjlBm?usp=sharing The setup.py file is in the transformers folder (the root directory), but this error occurs when I run !git clone https://...
Amrutha k
4 votes
1 answer
5k views

BertModel or BertForPreTraining

I want to use BERT only for embeddings and use the BERT output as input to a classification net that I will build from scratch. I am not sure whether I want to fine-tune the model. I think the ...
Amit S • 243
4 votes
1 answer
2k views

How to stop data shuffling while training the HuggingFace BERT model?

I want to train a BERT transformer model using the HuggingFace implementation/library. During training, HuggingFace shuffles the training data for each epoch, but I don't want to shuffle the data. For ...
Nusrat Jahan
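One frequently suggested workaround (a hedged sketch, not an official switch): subclass Trainer and build the training DataLoader with a SequentialSampler so the original order is preserved.

from torch.utils.data import DataLoader, SequentialSampler
from transformers import Trainer

class NoShuffleTrainer(Trainer):
    def get_train_dataloader(self) -> DataLoader:
        # Same dataset and collator the parent would use, but no shuffling sampler.
        return DataLoader(
            self.train_dataset,
            batch_size=self.args.train_batch_size,
            sampler=SequentialSampler(self.train_dataset),
            collate_fn=self.data_collator,
        )

# Used exactly like Trainer: NoShuffleTrainer(model=..., args=..., train_dataset=...)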
4 votes
1 answer
3k views

How to train a BERT model from scratch with HuggingFace?

I found an answer about training a model from scratch in this question: How to train BERT from scratch on a new domain for both MLM and NSP? One answer uses Trainer and TrainingArguments like this: from ...
Jack.Sparrow
4 votes
1 answer
3k views

How to process TransformerEncoderLayer output in pytorch

I am trying to use BioBERT sentence embeddings for text classification of longer pieces of text. As it currently stands, I standardize the number of sentences in each piece of text (some sentences are ...
Wackaman • 161
3 votes
3 answers
1k views

String comparison with BERT seems to ignore "not" in sentence

I implemented a string comparison method using SentenceTransformers and BERT like the following: from sentence_transformers import SentenceTransformer from sklearn.metrics.pairwise import cosine_similarity ...
Tiago Bachiega de Almeida
3 votes
2 answers
7k views

Having 6 labels instead of 2 in Hugging Face BertForSequenceClassification

I was just wondering if it is possible to extend the HuggingFace BertForSequenceClassification model to more than 2 labels. The docs say we can pass positional arguments, but it seems like "labels" ...
Alex • 73
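For the record, a sketch of the usual way (my own example): pass num_labels when loading, and the classification head is built with that many outputs.

import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=6)

inputs = tokenizer("some text to classify", return_tensors="pt")
labels = torch.tensor([4])                       # any label id in 0..5
outputs = model(**inputs, labels=labels)
print(outputs.logits.shape)                      # torch.Size([1, 6])
print(outputs.loss)                              # cross-entropy over 6 classes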
3 votes
1 answer
1k views

BERT Heads Count

From the literature I have read, BERT Base has 12 encoder layers and 12 attention heads, and BERT Large has 24 encoder layers and 16 attention heads. Why does BERT Large have 16 attention heads?
koayst • 2,115
3 votes
1 answer
4k views

How to map token indices from the SQuAD data to tokens from BERT tokenizer?

I am using the SQuAD dataset for answer span selection. After using the BertTokenizer to tokenize the passages, for some samples the start and end indices of the answer don't match the real answer ...
KoalaJ • 145
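A hedged sketch of one standard fix (my own example): use a fast tokenizer with return_offsets_mapping=True, so every WordPiece carries its character span in the original passage, and map the character-level answer span onto token indices.

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

context = "BERT was released by Google in 2018."
answer_start, answer_end = 21, 27                     # character span of "Google"

enc = tokenizer(context, return_offsets_mapping=True)
start_tok = end_tok = None
for i, (s, e) in enumerate(enc["offset_mapping"]):
    if s <= answer_start < e:
        start_tok = i
    if s < answer_end <= e:
        end_tok = i
print(start_tok, end_tok)
print(tokenizer.convert_ids_to_tokens(enc["input_ids"][start_tok:end_tok + 1]))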
3 votes
1 answer
2k views

Must the vocab size match the vocab_size in bert_config.json exactly?

I am looking at someone else's BERT model, in which vocab.txt has 22110 entries, but the vocab_size parameter in bert_config.json is 21128. I understood that these two numbers must be exactly ...
marlon • 6,847
3 votes
1 answer
559 views

Why doesn't BertForMaskedLM generate the right masked tokens?

I am testing this piece of code: from transformers import BertTokenizer, BertModel, BertForMaskedLM tokenizer = BertTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext") model = ...
marlon • 6,847
3 votes
1 answer
1k views

TFBertMainLayer gets less accuracy compared to TFBertModel

I had a problem with saving the weights of a TFBertModel wrapped in Keras. The problem is described here in a GitHub issue and here on Stack Overflow. The solution proposed in both cases is to use config = ...
Marzi Heidari
3 votes
1 answer
545 views

How can we get the attention scores of multimodal models via the Hugging Face library?

I was wondering if we could get the attention scores of any multimodal model using the API provided by the Hugging Face library, as it's relatively easy to get such scores from a normal language BERT ...
lazytux • 167
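For the text-only case the excerpt compares against, the mechanism is output_attentions=True; many multimodal checkpoints on the Hub expose the same flag, though that has to be checked per model. A small sketch for plain BERT (my own example):

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

inputs = tokenizer("attention scores, please", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

attn = outputs.attentions                    # tuple: one tensor per layer
print(len(attn), attn[0].shape)              # 12 layers, (batch, heads, seq, seq)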
3 votes
1 answer
2k views

Using a Hugging Face transformers pipeline with arguments

I am working on using a transformers pipeline to get BERT embeddings for my input. Without a pipeline I am able to get constant outputs, but not with the pipeline, since I was not able to pass ...
Israel-abebe
3 votes
1 answer
2k views

BERT transformer KeyError: 3

I am quite new to the BERT language model. I am currently using the HuggingFace transformers library and I'm encountering an error when encoding the inputs. The goal of the model is to classify fake ...
Jasper van Gool
3 votes
1 answer
1k views

Transformer/BERT token prediction vocabulary (filtering the special tokens out of the set of possible tokens)

With the Transformer model, and especially with BERT, does it make sense (and would it be statistically correct) to programmatically forbid the model from outputting special tokens as predictions? ...
István Ketykó
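Mechanically it is straightforward; a sketch of masking the special-token logits out before the argmax (my own example; whether it is statistically sound is the actual question):

import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Set the logits of [CLS], [SEP], [PAD], [MASK], [UNK] to -inf so they can never win.
logits[..., tokenizer.all_special_ids] = float("-inf")

mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
pred_id = logits[0, mask_pos].argmax(dim=-1).tolist()
print(tokenizer.convert_ids_to_tokens(pred_id))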
3 votes
0 answers
3k views

How to map tokens back to words with the BERT tokenizer

I have a list; using the HuggingFace BERT tokenizer I can get its numerical representation. X = ['[CLS]', '[MASK]', 'love', 'this', '[SEP]'] tokens = tokenizer.convert_tokens_to_ids(X) tokens: [...
kowser66 • 155
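A small sketch of the two helpers that cover this (my own example): convert_ids_to_tokens() reverses convert_tokens_to_ids(), and for a fast tokenizer word_ids() says which original word each sub-token came from.

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

ids = tokenizer.convert_tokens_to_ids(['[CLS]', '[MASK]', 'love', 'this', '[SEP]'])
print(tokenizer.convert_ids_to_tokens(ids))      # back to the original token strings

enc = tokenizer("I love tokenization")
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
print(enc.word_ids())    # e.g. [None, 0, 1, 2, 2, None]: sub-tokens share a word index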
3 votes
1 answer
224 views

Training RoBERTa using transformers on a masked language task giving weird results?

I trained a RoBERTa model following this colab - https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb#scrollTo=XaFAsB_fnU3K Here is how my data looked:...
Shawn • 301
2 votes
1 answer
1k views

How does BERT utilize TPU memories?

The README in Google's BERT repo says that even a single sentence of length 512 cannot fit in a 12 GB Titan X for the BERT-Large model. But the BERT paper says 64 TPU chips were used to train BERT-...
soloice • 1,000
2 votes
1 answer
2k views

What does "fine-tuning of a BERT model" refer to?

I was not able to understand one thing: when people say "fine-tuning of BERT", what does it actually mean? Are we retraining the entire model again with new data, or are we just training the top ...
ashish chandan
2 votes
1 answer
1k views

Backpropagation in BERT

I would like to know: when people say "pretrained BERT model", is only the final classification neural network trained, or is the transformer itself also updated through backpropagation along with ...
prog • 1,073
2 votes
2 answers
2k views

Huggingface Transformers - AttributeError: 'MrpcProcessor' object has no attribute 'tfds_map'

When using Hugging Face Transformers on a GLUE task, I got the error AttributeError: 'MrpcProcessor' object has no attribute 'tfds_map'. I suspect a compatibility problem.
Claude COULOMBE
2 votes
1 answer
2k views

Are these normal speeds for BERT pretrained model inference in PyTorch?

I am testing the BERT base and distilled BERT models in HuggingFace with 4 speed scenarios, batch_size = 1: 1) bert-base-uncased: 154ms per request 2) bert-base-uncased with quantization: 94ms per ...
marlon • 6,847
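For reference, a hedged sketch of the dynamic quantization step the excerpt refers to (my own example; it converts the Linear layers to int8 for CPU inference):

import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

# Replace the Linear layers with int8 dynamically-quantized versions (CPU inference).
quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

# quantized has the same call signature as the original model, typically faster on CPU
# at batch_size=1 with a small accuracy trade-off.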
2 votes
1 answer
966 views

How do I know if a word belongs to a Transformer model's vocabulary?

I use the Python library sentence_transformers with the RoBERTa and FlauBERT models. I use cosine scores to compute similarity, but for some words it doesn't work well. Those words seem to be the ones ...
Nathan Redin
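One quick check, as a sketch (my own example, and only a rough proxy: out-of-vocabulary words are split into sub-word pieces rather than mapped to an unknown token in most BERT-style tokenizers):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

word = "transformers"
print(word in tokenizer.get_vocab())          # True if the word is a single vocab entry
print(tokenizer.tokenize("flaubertesque"))    # rare words get split into sub-word pieces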
2 votes
1 answer
954 views

Do I need to train on my own data to use a BERT model as an embedding vector?

When I try the HuggingFace models, I get the following error message: from transformers import AutoTokenizer, AutoModel tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") ...
marlon • 6,847
2 votes
0 answers
1k views

Unable to solve RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`

I am facing the error "RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)" when trying to fine-tune the model "deepset/bert-base-cased-squad2"...
tt40kiwi • 411
2 votes
0 answers
369 views

How do I implement a knowledge base in a Huggingface model?

I made a knowledge base using COMET on the Atomic knowledge graph, using this tutorial. I would like to include this knowledge in a regular pre-trained BERT model from HuggingFace to see how the model ...
IneG • 83
2 votes
0 answers
559 views

IndexError: too many indices for tensor of dimension 2: When adding custom layer on HuggingFace model

I've tried to add custom layers to a HuggingFace Transformer model on a binary classification task. As an absolute beginner, I tried to follow this tutorial. Here's the custom model: class CustomModel(nn....
ntdev • 61
2 votes
1 answer
2k views

How to compute the Hessian of a large neural network in PyTorch?

How to compute the Hessian matrix of a large neural network or transformer model like BERT in PyTorch? I know torch.autograd.functional.hessian, but it seems like it only calculates the Hessian of a ...
Yan Pan • 21
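For a model of BERT's size the dense Hessian is out of reach, so the usual compromise is a Hessian-vector product via double backpropagation; a small sketch of the idea (my own illustration, shown on a toy model):

import torch

def hessian_vector_product(loss, params, vec):
    """Compute H @ vec without materializing the Hessian (double backprop)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    grad_dot_vec = torch.dot(flat_grad, vec)
    hv = torch.autograd.grad(grad_dot_vec, params)
    return torch.cat([h.reshape(-1) for h in hv])

model = torch.nn.Linear(4, 1)
loss = model(torch.randn(8, 4)).pow(2).mean()
params = [p for p in model.parameters() if p.requires_grad]
n = sum(p.numel() for p in params)
print(hessian_vector_product(loss, params, torch.randn(n)).shape)   # torch.Size([5])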
2 votes
0 answers
1k views

Finetuning Transformers in PyTorch (BERT, RoBERTa, etc.)

Alright, so there are multiple methods to fine-tune a transformer: freeze the transformer's parameters and feed only its final outputs into another model (the user trains this "other" model), ...
brucewlee
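A sketch of the first method in the excerpt, freezing the transformer so only a new head is trained (my own illustration):

import torch.nn as nn
from transformers import BertModel

class FrozenBertClassifier(nn.Module):
    def __init__(self, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        for p in self.bert.parameters():
            p.requires_grad = False            # frozen: no gradient updates for BERT
        self.head = nn.Linear(self.bert.config.hidden_size, num_labels)  # trained

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.head(out.last_hidden_state[:, 0])   # classify on the [CLS] vector

# The optimizer then only needs the head's parameters:
# torch.optim.AdamW(model.head.parameters(), lr=1e-3)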
2 votes
0 answers
899 views

HuggingFace Transformers BertForSequenceClassification with Trainer: How to do multi-output regression?

I am trying to fine-tune a BERT model on a dataset of sentences that has two different real-valued attributes for each sentence. For each one there is a Valence score and an Arousal score, with real ...
Slna • 55
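A hedged sketch of one way to do it (assumes a reasonably recent transformers version, where problem_type="regression" switches the loss to MSE): set num_labels=2 and feed the two scores as float labels.

import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2, problem_type="regression")

inputs = tokenizer("a calm, pleasant sentence", return_tensors="pt")
labels = torch.tensor([[0.7, 0.2]])            # [valence, arousal] as floats
outputs = model(**inputs, labels=labels)
print(outputs.logits.shape)                    # torch.Size([1, 2]): two real-valued outputs
print(outputs.loss)                            # MSE loss against the two targets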
2 votes
0 answers
577 views

How to fine-tune DistilBART for abstractive summarization using Gigaword or CNN/DailyMail?

I would like to ask how to fine-tune DistilBART on Gigaword and CNN/DailyMail starting from the distilbart-cnn-12-6 checkpoint. I did use the Gigaword dataset provided by TensorFlow, but it ...
Moon Days
2 votes
0 answers
1k views

Python "Can't pickle local object" exception during BertModel training

I am using simpletransformers.classification to train a BERT model to classify some text inputs. Here is my code: from simpletransformers.classification import ClassificationModel import torch ...
OmerArslan
2 votes
1 answer
1k views

How to retrieve attention weight alignment for tokens using transformer (BERT) model?

I am working on text classification with transformer models (PyTorch, HuggingFace, running on GPU). I already have my model and my training loop, and it works fine, but to better understand wrong ...
Lorra • 21
1 vote
3 answers
784 views

huggingface transformer issue

I used HuggingFace transformers, but I ran into an issue like the one below. How can I handle this problem? training_args = TrainingArguments( output_dir='./.checkpoints', num_train_epochs=config....
UNGGI LEE
1 vote
2 answers
14k views

Downloading transformers and BERT to your local machine

I am trying to replicate the code from this page. At my workplace we have access to the transformers and PyTorch libraries but cannot connect to the internet from our Python environment. Could anyone help with ...
user2543622 • 6,258
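A sketch of the common workaround (my own example: download once on a machine with internet access, copy the folder over, then load from the local path):

from transformers import AutoModel, AutoTokenizer

# On a machine with internet access: download and save to a folder.
AutoTokenizer.from_pretrained("bert-base-uncased").save_pretrained("./bert-base-uncased")
AutoModel.from_pretrained("bert-base-uncased").save_pretrained("./bert-base-uncased")

# On the offline machine: copy the folder over and load from the local path only.
tokenizer = AutoTokenizer.from_pretrained("./bert-base-uncased", local_files_only=True)
model = AutoModel.from_pretrained("./bert-base-uncased", local_files_only=True)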