All Questions

12 votes
2 answers
12k views

How to train BERT from scratch on a new domain for both MLM and NSP?

I’m trying to train a BERT model from scratch on my own dataset using the HuggingFace library. I would like to train the model so that it has the exact architecture of the original BERT model. In ...
tlqn • 379
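A minimal sketch of one common approach (my own illustration, not the accepted answer; it assumes a plain-text corpus at a hypothetical corpus.txt with one sentence per line and documents separated by blank lines): build a randomly initialized BertForPreTraining, which carries both the MLM and NSP heads, and drive it with Trainer.

# Sketch only: corpus.txt is a placeholder; uses the TextDatasetForNextSentencePrediction
# helper that transformers still ships for building NSP pairs.
from transformers import (BertConfig, BertForPreTraining, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments,
                          TextDatasetForNextSentencePrediction)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")  # or your own vocab
config = BertConfig()                       # defaults reproduce the BERT-Base architecture
model = BertForPreTraining(config)          # random init: MLM head + NSP head

dataset = TextDatasetForNextSentencePrediction(
    tokenizer=tokenizer, file_path="corpus.txt", block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-from-scratch", num_train_epochs=1,
                           per_device_train_batch_size=16),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()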
12 votes
2 answers
5k views

Get probability of multi-token word in MASK position

It is relatively easy to get a token's probability according to a language model, as the snippet below shows. You can get the output of a model, restrict yourself to the output of the masked token, ...
Bram Vanroy • 27.7k
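For reference, a minimal sketch (my own, not the asker's snippet) of the single-token case the excerpt describes: take the logits at the [MASK] position and softmax them over the vocabulary. A multi-token word then needs several mask slots whose per-slot probabilities have to be combined (e.g. multiplied), which is the harder part of the question.

import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("Paris is the [MASK] of France.", return_tensors="pt")
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    logits = model(**inputs).logits                    # (1, seq_len, vocab_size)
probs = logits[0, mask_index].softmax(dim=-1)          # distribution over the vocabulary

token_id = tokenizer.convert_tokens_to_ids("capital")
print(probs[0, token_id].item())                       # probability of the token "capital"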
11 votes
1 answer
2k views

What's the difference between a "self-attention mechanism" and a "fully-connected" layer?

I am confused by these two structures. In theory, both of their outputs are connected to their inputs. What magic makes the self-attention mechanism more powerful than a fully-connected layer?
tom_cat • 325
11 votes
2 answers
13k views

How to use Transformers for text classification?

I have two questions about how to use the TensorFlow implementation of Transformers for text classification. First, it seems people mostly use only the encoder layer to do text classification ...
khemedi • 806
9 votes
2 answers
3k views

BERT output not deterministic

BERT's output is not deterministic. I expect the output values to be deterministic when I feed in the same input, but the values from my BERT model keep changing. Oddly, the same value is returned twice, ...
Keanu Paik
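The usual cause is dropout still being active. A hedged sketch of the standard check (my own example):

# Sketch: in train mode dropout is stochastic, so repeated forward passes differ;
# model.eval() disables dropout and the outputs become repeatable.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()                                   # crucial: switch off dropout

inputs = tokenizer("same input, same output", return_tensors="pt")
with torch.no_grad():
    out1 = model(**inputs).last_hidden_state
    out2 = model(**inputs).last_hidden_state
print(torch.allclose(out1, out2))              # True once dropout is disabled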
9 votes
1 answer
8k views

How do I use BertForMaskedLM or BertModel to calculate perplexity of a sentence?

I want to use BertForMaskedLM or BertModel to calculate the perplexity of a sentence, so I wrote code like this: import numpy as np import torch import torch.nn as nn from transformers import ...
Kaim hong • 113
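A masked LM has no true left-to-right perplexity; a common substitute is the pseudo-perplexity: mask each token in turn, score it, and exponentiate the average negative log-likelihood. A rough sketch under that assumption (my own illustration):

import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_perplexity(sentence):
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    nlls = []
    for i in range(1, len(ids) - 1):           # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = logits[0, i].log_softmax(dim=-1)
        nlls.append(-log_probs[ids[i]].item())
    return float(torch.exp(torch.tensor(nlls).mean()))

print(pseudo_perplexity("The cat sat on the mat."))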
8 votes
1 answer
9k views

How to calculate perplexity of a sentence using huggingface masked language models?

I have several masked language models (mainly BERT, RoBERTa, ALBERT, ELECTRA). I also have a dataset of sentences. How can I get the perplexity of each sentence? From the HuggingFace documentation ...
Penguin • 2,148
8 votes
1 answer
4k views

Uni-directional Transformer VS Bi-directional BERT

I just finished reading the Transformer paper and the BERT paper, but I couldn't figure out why the Transformer is uni-directional and BERT is bi-directional, as stated in the BERT paper. Since they don't use ...
JShen • 409
7 votes
2 answers
6k views

What is the essence of a learnable positional embedding? Does it improve outcomes?

I was recently reading the BERT source code from the Hugging Face project. I noticed that the so-called "learnable position encoding" seems to refer to a specific nn.Parameter layer when it ...
AdamHommer
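What the excerpt calls a "learnable position encoding" is, in essence, an ordinary embedding table indexed by position and trained by backpropagation (in HF BERT it lives in BertEmbeddings.position_embeddings). A toy sketch of the idea (my own illustration, not the BERT source):

import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    """One trainable vector per position, added to the token embeddings."""
    def __init__(self, max_positions=512, hidden_size=768):
        super().__init__()
        self.pos = nn.Embedding(max_positions, hidden_size)   # learned, not sinusoidal

    def forward(self, token_embeddings):                      # (batch, seq, hidden)
        seq_len = token_embeddings.size(1)
        positions = torch.arange(seq_len, device=token_embeddings.device)
        return token_embeddings + self.pos(positions)         # broadcast over batch

x = torch.randn(2, 10, 768)
print(LearnedPositionalEmbedding()(x).shape)                  # torch.Size([2, 10, 768])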
7 votes
1 answer
14k views

How does padding in the HuggingFace tokenizer work?

I tried the following tokenization example: tokenizer = BertTokenizer.from_pretrained(MODEL_TYPE, do_lower_case=True) sent = "I hate this. Not that.", _tokenized = tokenizer(sent, ...
MsA • 2,829
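A quick sketch of the padding options (my own example): padding=True pads to the longest sequence in the batch, padding="max_length" pads to a fixed length, and attention_mask marks which positions are real tokens.

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sents = ["I hate this.", "Not that, but something much longer than the first one."]

batch = tokenizer(sents, padding=True, truncation=True, return_tensors="pt")
print(batch["input_ids"].shape)        # both rows padded to the longest sentence
print(batch["attention_mask"][0])      # 1 for real tokens, 0 for [PAD] positions

fixed = tokenizer(sents, padding="max_length", max_length=16, truncation=True)
print(len(fixed["input_ids"][0]))      # always 16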
5 votes
1 answer
12k views

How to get cosine similarity of word embedding from BERT model

I was interested in how to get the similarity of a word's embedding across different sentences from a BERT model (that is, the same word has different meanings in different contexts). For example: sent1 =...
Mark J. • 143
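A rough sketch of the usual approach (my own illustration; it assumes the word of interest is a single WordPiece token, otherwise its sub-token vectors would need to be pooled): run each sentence through BertModel, pick the hidden state at the word's position, and compare with cosine similarity.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence, word):
    enc = tokenizer(sentence, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    idx = tokens.index(word)                      # assumes the word is one WordPiece
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state   # (1, seq_len, hidden)
    return hidden[0, idx]

v1 = word_vector("He sat by the river bank.", "bank")
v2 = word_vector("She works at the bank downtown.", "bank")
print(torch.cosine_similarity(v1, v2, dim=0).item())   # contextual vectors differ by sense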
5 votes
3 answers
32k views

Unable to pip install -U sentence-transformers

I am unable to do: pip install -U sentence-transformers. I get this message on Anaconda Prompt: ERROR: Could not find a version that satisfies the requirement torch>=1.0.1 (from sentence-transformers) ...
Kay • 667
5 votes
1 answer
3k views

How does BertForSequenceClassification classify on the CLS vector?

Background: Following along with this question: when using BERT to classify sequences, the model uses the "[CLS]" token to represent the classification task. According to the paper: The first ...
Kevin • 3,159
5 votes
3 answers
1k views

BERT token vs. embedding

I understand that WordPiece is used to break text into tokens. And I understand that, somewhere in BERT, the model maps tokens into token embeddings that represent the meaning of the tokens. But ...
i82much • 61
5 votes
2 answers
19k views

ERROR: file:///content does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found

https://colab.research.google.com/drive/11u6leEKvqE0CCbvDHHKmCxmW5GxyjlBm?usp=sharing The setup.py file is in the transformers folder (the root directory), but this error occurs when I run !git clone https://...
Amrutha k
4 votes
1 answer
5k views

BertModel or BertForPreTraining

I want to use BERT only for embeddings and use the BERT output as input to a classification net that I will build from scratch. I am not sure whether I want to fine-tune the model. I think the ...
Amit S • 243
4 votes
1 answer
2k views

How to stop data shuffling while training the HuggingFace BERT model?

I want to train a BERT transformer model using the HuggingFace implementation/library. During training, HuggingFace shuffles the training data for each epoch, but I don't want to shuffle the data. For ...
Nusrat Jahan
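One frequently suggested workaround (a hedged sketch, not an official switch): subclass Trainer and build the training DataLoader with a SequentialSampler so the original order is preserved.

from torch.utils.data import DataLoader, SequentialSampler
from transformers import Trainer

class NoShuffleTrainer(Trainer):
    def get_train_dataloader(self) -> DataLoader:
        # Same dataset and collator the parent would use, but no shuffling sampler.
        return DataLoader(
            self.train_dataset,
            batch_size=self.args.train_batch_size,
            sampler=SequentialSampler(self.train_dataset),
            collate_fn=self.data_collator,
        )

# Used exactly like Trainer: NoShuffleTrainer(model=..., args=..., train_dataset=...)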
4 votes
1 answer
3k views

How to train a BERT model from scratch with HuggingFace?

I found an answer about training a model from scratch in this question: How to train BERT from scratch on a new domain for both MLM and NSP? One answer uses Trainer and TrainingArguments like this: from ...
Jack.Sparrow
4 votes
1 answer
3k views

How to process TransformerEncoderLayer output in pytorch

I am trying to use BioBERT sentence embeddings for text classification of longer pieces of text. As it currently stands, I standardize the number of sentences in each piece of text (some sentences are ...
Wackaman • 161
3 votes
3 answers
1k views

String comparison with BERT seems to ignore "not" in sentence

I implemented a string comparison method using SentenceTransformers and BERT like the following: from sentence_transformers import SentenceTransformer from sklearn.metrics.pairwise import cosine_similarity ...
Tiago Bachiega de Almeida
3 votes
2 answers
7k views

Having 6 labels instead of 2 in Hugging Face BertForSequenceClassification

I was just wondering if it is possible to extend the HuggingFace BertForSequenceClassification model to more than 2 labels. The docs say we can pass positional arguments, but it seems like "labels" ...
Alex • 73
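For the record, a sketch of the usual way (my own example): pass num_labels when loading, and the classification head is built with that many outputs.

import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=6)

inputs = tokenizer("some text to classify", return_tensors="pt")
labels = torch.tensor([4])                       # any label id in 0..5
outputs = model(**inputs, labels=labels)
print(outputs.logits.shape)                      # torch.Size([1, 6])
print(outputs.loss)                              # cross-entropy over 6 classes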
3 votes
1 answer
1k views

BERT Heads Count

From the literature I have read, BERT Base has 12 encoder layers and 12 attention heads, and BERT Large has 24 encoder layers and 16 attention heads. Why does BERT Large have 16 attention heads?
koayst • 2,115
3 votes
1 answer
4k views

How to map token indices from the SQuAD data to tokens from BERT tokenizer?

I am using the SQuAD dataset for answer span selection. After using the BertTokenizer to tokenize the passages, for some samples the start and end indices of the answer don't match the real answer ...
KoalaJ • 145
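A hedged sketch of one standard fix (my own example): use a fast tokenizer with return_offsets_mapping=True, so every WordPiece carries its character span in the original passage, and map the character-level answer span onto token indices.

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

context = "BERT was released by Google in 2018."
answer_start, answer_end = 21, 27                     # character span of "Google"

enc = tokenizer(context, return_offsets_mapping=True)
start_tok = end_tok = None
for i, (s, e) in enumerate(enc["offset_mapping"]):
    if s <= answer_start < e:
        start_tok = i
    if s < answer_end <= e:
        end_tok = i
print(start_tok, end_tok)
print(tokenizer.convert_ids_to_tokens(enc["input_ids"][start_tok:end_tok + 1]))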
3 votes
1 answer
2k views

Must the vocab size match the vocab_size in bert_config.json exactly?

I am looking at someone else's BERT model, in which vocab.txt has 22110 entries, but the vocab_size parameter in bert_config.json is 21128. I understood that these two numbers must be exactly ...
marlon • 6,847
3 votes
1 answer
559 views

Why doesn't BertForMaskedLM generate the right masked tokens?

I am testing this piece of code: from transformers import BertTokenizer, BertModel, BertForMaskedLM tokenizer = BertTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext") model = ...
marlon • 6,847
3 votes
1 answer
1k views

TFBertMainLayer gets less accuracy compared to TFBertModel

I had a problem with saving the weights of a TFBertModel wrapped in Keras. The problem is described here in a GitHub issue and here on Stack Overflow. The solution proposed in both cases is to use config = ...
Marzi Heidari
3 votes
1 answer
545 views

How can we get the attention scores of multimodal models via the Hugging Face library?

I was wondering if we could get the attention scores of any multimodal model using the API provided by the Hugging Face library, as it's relatively easy to get such scores from a normal language BERT ...
lazytux • 167
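For the text-only case the excerpt compares against, the mechanism is output_attentions=True; many multimodal checkpoints on the Hub expose the same flag, though that has to be checked per model. A small sketch for plain BERT (my own example):

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

inputs = tokenizer("attention scores, please", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

attn = outputs.attentions                    # tuple: one tensor per layer
print(len(attn), attn[0].shape)              # 12 layers, (batch, heads, seq, seq)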
3 votes
1 answer
2k views

Using a Hugging Face transformers pipeline with arguments

I am working on using a transformers pipeline to get BERT embeddings for my input. Without a pipeline I am able to get constant outputs, but not with the pipeline, since I was not able to pass ...
Israel-abebe
3 votes
1 answer
2k views

BERT transformer KeyError: 3

I am quite new to the BERT language model. I am currently using the HuggingFace transformers library and I'm encountering an error when encoding the inputs. The goal of the model is to classify fake ...
Jasper van Gool
3 votes
1 answer
1k views

Transformer/BERT token prediction vocabulary (filtering the special tokens out of the set of possible tokens)

With the Transformer model, and especially with BERT, does it make sense (and would it be statistically correct) to programmatically forbid the model from outputting special tokens as predictions? ...
István Ketykó
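Mechanically it is straightforward; a sketch of masking the special-token logits out before the argmax (my own example; whether it is statistically sound is the actual question):

import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Set the logits of [CLS], [SEP], [PAD], [MASK], [UNK] to -inf so they can never win.
logits[..., tokenizer.all_special_ids] = float("-inf")

mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
pred_id = logits[0, mask_pos].argmax(dim=-1).tolist()
print(tokenizer.convert_ids_to_tokens(pred_id))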
3 votes
0 answers
3k views

How to map tokens back to words with the BERT tokenizer

I have a list; using the HuggingFace BERT tokenizer I can get its numerical representation. X = ['[CLS]', '[MASK]', 'love', 'this', '[SEP]'] tokens = tokenizer.convert_tokens_to_ids(X) tokens: [...
kowser66 • 155
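A small sketch of the two helpers that cover this (my own example): convert_ids_to_tokens() reverses convert_tokens_to_ids(), and for a fast tokenizer word_ids() says which original word each sub-token came from.

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

ids = tokenizer.convert_tokens_to_ids(['[CLS]', '[MASK]', 'love', 'this', '[SEP]'])
print(tokenizer.convert_ids_to_tokens(ids))      # back to the original token strings

enc = tokenizer("I love tokenization")
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
print(enc.word_ids())    # e.g. [None, 0, 1, 2, 2, None]: sub-tokens share a word index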
3 votes
1 answer
224 views

Training RoBERTa using transformers on a masked language task giving weird results?

I trained a RoBERTa model following this colab - https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb#scrollTo=XaFAsB_fnU3K Here is how my data looked:...
Shawn • 301
2 votes
1 answer
1k views

How does BERT utilize TPU memories?

The README in Google's BERT repo says that even a single sentence of length 512 cannot fit in a 12 GB Titan X for the BERT-Large model. But the BERT paper says 64 TPU chips were used to train BERT-...
soloice • 1,000
2 votes
1 answer
2k views

What does "fine-tuning of a BERT model" refer to?

I was not able to understand one thing: when people say "fine-tuning of BERT", what does it actually mean? Are we retraining the entire model again with new data, or are we just training the top ...
ashish chandan
2 votes
1 answer
1k views

Backpropagation in BERT

I would like to know: when people say "pretrained BERT model", is only the final classification neural network trained, or is the transformer itself also updated through backpropagation along with ...
prog • 1,073
2 votes
2 answers
2k views

Huggingface Transformers - AttributeError: 'MrpcProcessor' object has no attribute 'tfds_map'

When using Hugging Face Transformers on a GLUE task, I got the error AttributeError: 'MrpcProcessor' object has no attribute 'tfds_map'. I suspect a compatibility problem.
Claude COULOMBE
2 votes
1 answer
2k views

Are these normal speeds for BERT pretrained model inference in PyTorch?

I am testing the BERT base and distilled BERT models in HuggingFace with 4 speed scenarios, batch_size = 1: 1) bert-base-uncased: 154ms per request 2) bert-base-uncased with quantization: 94ms per ...
marlon • 6,847
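For reference, a hedged sketch of the dynamic quantization step the excerpt refers to (my own example; it converts the Linear layers to int8 for CPU inference):

import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

# Replace the Linear layers with int8 dynamically-quantized versions (CPU inference).
quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

# quantized has the same call signature as the original model, typically faster on CPU
# at batch_size=1 with a small accuracy trade-off.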
2 votes
1 answer
966 views

How do I know if a word belongs to a Transformer model's vocabulary?

I use the Python library sentence_transformers with the RoBERTa and FlauBERT models. I use cosine scores to compute similarity, but for some words it doesn't work well. Those words seem to be the ones ...
Nathan Redin
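One quick check, as a sketch (my own example, and only a rough proxy: out-of-vocabulary words are split into sub-word pieces rather than mapped to an unknown token in most BERT-style tokenizers):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

word = "transformers"
print(word in tokenizer.get_vocab())          # True if the word is a single vocab entry
print(tokenizer.tokenize("flaubertesque"))    # rare words get split into sub-word pieces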
2 votes
1 answer
954 views

Do I need to train on my own data to use a BERT model as an embedding vector?

When I try the HuggingFace models, I get the following error message: from transformers import AutoTokenizer, AutoModel tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") ...
marlon • 6,847
2 votes
0 answers
1k views

Unable to solve RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`

I am facing the error "RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)" when trying to fine-tune the model "deepset/bert-base-cased-squad2"...
tt40kiwi • 411
2 votes
0 answers
369 views

How do I implement a knowledge base in a Huggingface model?

I made a knowledge base using COMET on the Atomic knowledge graph, using this tutorial. I would like to include this knowledge in a regular pre-trained BERT model from HuggingFace to see how the model ...
IneG • 83
2 votes
0 answers
559 views

IndexError: too many indices for tensor of dimension 2: When adding custom layer on HuggingFace model

I've tried to add custom layers to a HuggingFace Transformer model on a binary classification task. As an absolute beginner, I tried to follow this tutorial. Here's the custom model: class CustomModel(nn....
ntdev • 61
2 votes
1 answer
2k views

How to compute the Hessian of a large neural network in PyTorch?

How to compute the Hessian matrix of a large neural network or transformer model like BERT in PyTorch? I know torch.autograd.functional.hessian, but it seems like it only calculates the Hessian of a ...
Yan Pan • 21
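For a model of BERT's size the dense Hessian is out of reach, so the usual compromise is a Hessian-vector product via double backpropagation; a small sketch of the idea (my own illustration, shown on a toy model):

import torch

def hessian_vector_product(loss, params, vec):
    """Compute H @ vec without materializing the Hessian (double backprop)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    grad_dot_vec = torch.dot(flat_grad, vec)
    hv = torch.autograd.grad(grad_dot_vec, params)
    return torch.cat([h.reshape(-1) for h in hv])

model = torch.nn.Linear(4, 1)
loss = model(torch.randn(8, 4)).pow(2).mean()
params = [p for p in model.parameters() if p.requires_grad]
n = sum(p.numel() for p in params)
print(hessian_vector_product(loss, params, torch.randn(n)).shape)   # torch.Size([5])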
2 votes
0 answers
1k views

Finetuning Transformers in PyTorch (BERT, RoBERTa, etc.)

Alright, so there are multiple methods to fine-tune a transformer: freeze the transformer's parameters and feed only its final outputs into another model (the user trains this "other" model), ...
brucewlee
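A sketch of the first method in the excerpt, freezing the transformer so only a new head is trained (my own illustration):

import torch.nn as nn
from transformers import BertModel

class FrozenBertClassifier(nn.Module):
    def __init__(self, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        for p in self.bert.parameters():
            p.requires_grad = False            # frozen: no gradient updates for BERT
        self.head = nn.Linear(self.bert.config.hidden_size, num_labels)  # trained

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.head(out.last_hidden_state[:, 0])   # classify on the [CLS] vector

# The optimizer then only needs the head's parameters:
# torch.optim.AdamW(model.head.parameters(), lr=1e-3)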
2 votes
0 answers
899 views

HuggingFace Transformers BertForSequenceClassification with Trainer: How to do multi-output regression?

I am trying to fine-tune a BERT model on a dataset of sentences that has two different real-valued attributes for each sentence. For each one there is a Valence score and an Arousal score, with real ...
Slna • 55
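A hedged sketch of one way to do it (assumes a reasonably recent transformers version, where problem_type="regression" switches the loss to MSE): set num_labels=2 and feed the two scores as float labels.

import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2, problem_type="regression")

inputs = tokenizer("a calm, pleasant sentence", return_tensors="pt")
labels = torch.tensor([[0.7, 0.2]])            # [valence, arousal] as floats
outputs = model(**inputs, labels=labels)
print(outputs.logits.shape)                    # torch.Size([1, 2]): two real-valued outputs
print(outputs.loss)                            # MSE loss against the two targets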
2 votes
0 answers
577 views

How to fine-tune DistilBART for abstractive summarization using Gigaword or CNN/DailyMail?

I would like to ask how to fine-tune DistilBART on Gigaword and CNN/DailyMail starting from the distilbart-cnn-12-6 checkpoint. I did use the Gigaword dataset provided by TensorFlow, but it ...
Moon Days
2 votes
0 answers
1k views

Python "Can't pickle local object" exception during BertModel training

I am using simpletransformers.classification to train a BERT model to classify some text inputs. Here is my code: from simpletransformers.classification import ClassificationModel import torch ...
OmerArslan
2 votes
1 answer
1k views

How to retrieve attention weight alignment for tokens using transformer (BERT) model?

I am working on text classification with transformer models (PyTorch, HuggingFace, running on GPU). I already have my model and my training loop, and it works fine, but to better understand wrong ...
Lorra • 21
1 vote
3 answers
784 views

huggingface transformer issue

I used HuggingFace transformers, but I ran into an issue like the one below. How can I handle this problem? training_args = TrainingArguments( output_dir='./.checkpoints', num_train_epochs=config....
UNGGI LEE
1 vote
2 answers
14k views

Downloading transformers and BERT to your local machine

I am trying to replicate the code from this page. At my workplace we have access to the transformers and PyTorch libraries but cannot connect to the internet from our Python environment. Could anyone help with ...
user2543622 • 6,258
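A sketch of the common workaround (my own example: download once on a machine with internet access, copy the folder over, then load from the local path):

from transformers import AutoModel, AutoTokenizer

# On a machine with internet access: download and save to a folder.
AutoTokenizer.from_pretrained("bert-base-uncased").save_pretrained("./bert-base-uncased")
AutoModel.from_pretrained("bert-base-uncased").save_pretrained("./bert-base-uncased")

# On the offline machine: copy the folder over and load from the local path only.
tokenizer = AutoTokenizer.from_pretrained("./bert-base-uncased", local_files_only=True)
model = AutoModel.from_pretrained("./bert-base-uncased", local_files_only=True)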