All Questions
Tagged with bert-language-model transformer-model
135
questions
12
votes
2
answers
12k
views
How to train BERT from scratch on a new domain for both MLM and NSP?
I'm trying to train a BERT model from scratch on my own dataset using the HuggingFace library. I would like to train the model so that it has the exact architecture of the original BERT model.
In ...
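One common starting point (a minimal sketch, not the asker's code): instantiate an untrained BERT-Base architecture with both the MLM and NSP heads from the default BertConfig, which matches the original hyperparameters; the data pipeline, masking and sentence-pair construction are left out here.
from transformers import BertConfig, BertForPreTraining, BertTokenizerFast

# Fresh (randomly initialised) BERT-Base architecture with both the MLM and NSP heads.
# BertConfig() defaults to the original BERT-Base hyperparameters
# (12 layers, 12 heads, hidden size 768, vocab size 30522).
config = BertConfig()
model = BertForPreTraining(config)

# Reusing the original WordPiece vocabulary; training a new tokenizer on the
# domain corpus is the alternative if the vocabulary should change too.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
print(model.num_parameters())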
12
votes
2
answers
5k
views
Get probability of multi-token word in MASK position
It is relatively easy to get a token's probability according to a language model, as the snippet below shows. You can get the output of a model, restrict yourself to the output of the masked token, ...
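For the single-token case the usual pattern looks roughly like the sketch below (bert-base-uncased and the example sentence are illustrative); a multi-token word is often handled by inserting one [MASK] per WordPiece and combining the per-position probabilities, which is only an approximation.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    logits = model(**inputs).logits

# Probability distribution over the vocabulary at the masked position.
probs = torch.softmax(logits[0, mask_index], dim=-1)
token_id = tokenizer.convert_tokens_to_ids("paris")
print(probs[0, token_id].item())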
11
votes
1
answer
2k
views
What's the difference between a "self-attention mechanism" and a "fully-connected" layer?
I am confused by these two structures. In theory, the outputs of both are connected to their inputs. What magic makes the "self-attention mechanism" more powerful than the fully-connected layer?
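A small illustrative sketch of the difference: a linear layer mixes features with fixed learned weights, while self-attention recomputes the token-to-token mixing weights from the input itself.
import torch
import torch.nn as nn

x = torch.randn(1, 5, 16)          # (batch, sequence length, feature dim)

# Fully-connected layer: the mixing weights are fixed parameters, identical for every input.
fc = nn.Linear(16, 16)
fc_out = fc(x)

# Self-attention: the mixing weights (attention matrix) are recomputed from the input,
# so how token i attends to token j depends on the content of both tokens.
attn = nn.MultiheadAttention(embed_dim=16, num_heads=1, batch_first=True)
attn_out, attn_weights = attn(x, x, x)   # query, key, value all come from x
print(attn_weights.shape)                # (1, 5, 5): input-dependent weights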
11
votes
2
answers
13k
views
How to use Transformers for text classification?
I have two questions about how to use the TensorFlow implementation of Transformers for text classification.
First, it seems people mostly use only the encoder layer to do the text classification ...
9
votes
2
answers
3k
views
BERT output not deterministic
BERT output is not deterministic.
I expect the output values to be deterministic when I feed in the same input, but with my BERT model the values keep changing. Awkwardly, the same value is returned twice, ...
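A frequent cause is dropout still being active because the model is in training mode; a minimal check sketch (model name and input are illustrative):
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("same input every time", return_tensors="pt")

# In training mode, dropout is active, so repeated forward passes differ.
model.train()
out1 = model(**inputs).last_hidden_state
out2 = model(**inputs).last_hidden_state
print(torch.allclose(out1, out2))   # usually False

# eval() disables dropout; the outputs become deterministic.
model.eval()
with torch.no_grad():
    out1 = model(**inputs).last_hidden_state
    out2 = model(**inputs).last_hidden_state
print(torch.allclose(out1, out2))   # True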
9
votes
1
answer
8k
views
How do I use BertForMaskedLM or BertModel to calculate perplexity of a sentence?
I want to use BertForMaskedLM or BertModel to calculate perplexity of a sentence, so I write code like this:
import numpy as np
import torch
import torch.nn as nn
from transformers import ...
8
votes
1
answer
9k
views
How to calculate perplexity of a sentence using huggingface masked language models?
I have several masked language models (mainly Bert, Roberta, Albert, Electra). I also have a dataset of sentences. How can I get the perplexity of each sentence?
From the huggingface documentation ...
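Masked language models do not define a left-to-right perplexity; a common workaround is a "pseudo-perplexity" where each token is masked in turn and scored. A rough sketch, assuming bert-base-uncased and a made-up sentence:
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_perplexity(sentence):
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    nll = 0.0
    # Mask one token at a time (skipping [CLS]/[SEP]) and score the true token.
    for i in range(1, ids.size(0) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        nll -= log_probs[ids[i]].item()
    return torch.exp(torch.tensor(nll / (ids.size(0) - 2))).item()

print(pseudo_perplexity("The cat sat on the mat."))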
8
votes
1
answer
4k
views
Uni-directional Transformer VS Bi-directional BERT
I just finished reading the Transformer paper and the BERT paper, but I couldn't figure out why the Transformer is uni-directional and BERT is bi-directional, as mentioned in the BERT paper. As they don't use ...
7
votes
2
answers
6k
views
The essence of learnable positional embeddings? Do they improve outcomes?
I was recently reading the BERT source code from the Hugging Face project. I noticed that the so-called "learnable position encoding" seems to refer to a specific nn.Parameter layer when it ...
7
votes
1
answer
14k
views
How does padding in the huggingface tokenizer work?
I tried the following tokenization example:
tokenizer = BertTokenizer.from_pretrained(MODEL_TYPE, do_lower_case=True)
sent = "I hate this. Not that.",
_tokenized = tokenizer(sent, ...
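A short illustration of the two padding modes (the sentences are made up): padding=True pads to the longest sequence in the batch, while padding="max_length" pads every sequence to a fixed length.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)
sents = ["I hate this.", "Not that."]

# padding=True pads to the longest sequence in the batch;
# padding="max_length" pads every sequence to max_length.
batch = tokenizer(sents, padding=True, return_tensors="pt")
fixed = tokenizer(sents, padding="max_length", max_length=10, return_tensors="pt")

print(batch["input_ids"].shape)        # (2, longest-in-batch)
print(fixed["input_ids"].shape)        # (2, 10)
print(batch["attention_mask"])         # 0s mark the [PAD] positions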
5
votes
1
answer
12k
views
How to get cosine similarity of word embedding from BERT model
I was interested in how to get the similarity of word embeddings in different sentences from a BERT model (that is, a word can have different meanings in different contexts).
For example:
sent1 =...
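One possible sketch, assuming bert-base-uncased and that the target word survives as a single WordPiece: take the contextual hidden state of the word in each sentence and compare the two vectors with cosine similarity.
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence, word):
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]      # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    idx = tokens.index(word)                            # assumes the word is one WordPiece
    return hidden[idx]

v1 = word_vector("He sat by the river bank.", "bank")
v2 = word_vector("She deposited cash at the bank.", "bank")
print(torch.cosine_similarity(v1, v2, dim=0).item())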
5
votes
3
answers
32k
views
Unable to pip install -U sentence-transformers
I am unable to do: pip install -U sentence-transformers. I get this message on Anaconda Prompt:
ERROR: Could not find a version that satisfies the requirement torch>=1.0.1 (from sentence-transformers) ...
5
votes
1
answer
3k
views
How does BertForSequenceClassification classify on the CLS vector?
Background:
Following along with this question: when using BERT to classify sequences, the model uses the "[CLS]" token to represent the classification task. According to the paper:
The first ...
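Roughly what the sequence-classification head does can be sketched by hand (the Linear layer below is a stand-in for the real head, not the library's code): the [CLS] hidden state goes through BERT's pooler and then a linear classifier.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(bert.config.hidden_size, 2)   # stand-in for the classification head

inputs = tokenizer("a sentence to classify", return_tensors="pt")
outputs = bert(**inputs)

# pooler_output is the [CLS] hidden state passed through a learned dense layer + tanh;
# the sequence-classification head applies dropout and a linear layer on top of it.
logits = classifier(outputs.pooler_output)
print(logits)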
5
votes
3
answers
1k
views
BERT token vs. embedding
I understand that WordPiece is used to break text into tokens. And I understand that, somewhere in BERT, the model maps tokens into token embeddings that represent the meaning of the tokens. But ...
5
votes
2
answers
19k
views
ERROR: file:///content does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found
https://colab.research.google.com/drive/11u6leEKvqE0CCbvDHHKmCxmW5GxyjlBm?usp=sharing
The setup.py file is in the transformers folder (root directory), but this error occurs when I run
!git clone https://...
4
votes
1
answer
5k
views
BertModel or BertForPreTraining
I want to use BERT only for embeddings and use the BERT output as input to a classification net that I will build from scratch.
I am not sure whether I want to fine-tune the model.
I think the ...
4
votes
1
answer
2k
views
How to stop data shuffling while training the HuggingFace BERT model?
I want to train a BERT transformer model using the HuggingFace implementation/library. During training, HuggingFace shuffles the training data for each epoch, but I don't want to shuffle the data. For ...
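One possible approach (a sketch; Trainer internals differ between versions) is to subclass Trainer and override get_train_dataloader so that it uses a SequentialSampler instead of a random one.
from torch.utils.data import DataLoader, SequentialSampler
from transformers import Trainer

class NoShuffleTrainer(Trainer):
    # Trainer normally builds the training DataLoader with a random sampler;
    # overriding get_train_dataloader lets us force sequential order instead.
    def get_train_dataloader(self):
        return DataLoader(
            self.train_dataset,
            batch_size=self.args.train_batch_size,
            sampler=SequentialSampler(self.train_dataset),
            collate_fn=self.data_collator,
        )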
4
votes
1
answer
3k
views
How to train a BERT model from scratch with huggingface?
I found an answer about training a model from scratch in this question:
How to train BERT from scratch on a new domain for both MLM and NSP?
One answer uses Trainer and TrainingArguments like this:
from ...
4
votes
1
answer
3k
views
How to process TransformerEncoderLayer output in pytorch
I am trying to use bio-bert sentence embeddings for text classification of longer pieces of text.
As it currently stands, I standardize the number of sentences in each piece of text (some sentences are ...
3
votes
3
answers
1k
views
String comparison with BERT seems to ignore "not" in sentence
I implemented a string comparison method using SentenceTransformers and BERT like the following
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
...
3
votes
2
answers
7k
views
Having 6 labels instead of 2 in Hugging Face BertForSequenceClassification
I was just wondering if it is possible to extend the HuggingFace BertForSequenceClassification model to more than 2 labels. The docs say we can pass positional arguments, but it seems like "labels" ...
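Passing num_labels when loading the model sets the size of the classification head; a minimal sketch with 6 labels (model name and input are illustrative):
from transformers import BertForSequenceClassification, BertTokenizer

# num_labels controls the size of the classification head; 6 labels here.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=6)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

inputs = tokenizer("some text to classify", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)   # (1, 6)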
3
votes
1
answer
1k
views
BERT Heads Count
From the literature I read,
BERT Base has 12 encoder layers and 12 attention heads; BERT Large has 24 encoder layers and 16 attention heads.
Why does BERT Large have 16 attention heads?
3
votes
1
answer
4k
views
How to map token indices from the SQuAD data to tokens from BERT tokenizer?
I am using the SQuAD dataset for answer span selection. After using the BertTokenizer to tokenize the passages, for some samples, the start and end indices of the answer don't match the real answer ...
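With a fast tokenizer, return_offsets_mapping gives character offsets per WordPiece, which is one way to map SQuAD's character-level answer spans to token indices. A sketch with a made-up context and span:
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
context = "The quick brown fox jumps over the lazy dog."
answer_start, answer_end = 16, 19          # character span of "fox" in the context

enc = tokenizer(context, return_offsets_mapping=True)
# offset_mapping gives (char_start, char_end) for every WordPiece,
# which lets character-level answer spans be mapped onto token indices.
token_span = [
    i for i, (s, e) in enumerate(enc["offset_mapping"])
    if s < answer_end and e > answer_start
]
print(token_span)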
3
votes
1
answer
2k
views
Must the vocab size match the vocab_size in bert_config.json exactly?
I am looking at someone else's BERT model, in which vocab.txt has 22110 entries, but the vocab_size parameter in bert_config.json is 21128.
I understand that these two numbers must be exactly ...
3
votes
1
answer
559
views
Why doesn't BertForMaskedLM generate right masked tokens?
I am testing this piece of code:
from transformers import BertTokenizer, BertModel, BertForMaskedLM
tokenizer = BertTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = ...
3
votes
1
answer
1k
views
TFBertMainLayer gets lower accuracy compared to TFBertModel
I had a problem with saving the weights of TFBertModel wrapped in Keras. The problem is described here in a GitHub issue and here on Stack Overflow. The solution proposed in both cases is to use
config = ...
3
votes
1
answer
545
views
How can we get the attention scores of multimodal models via the Hugging Face library?
I was wondering if we could get the attention scores of any multimodal model using the API provided by the Hugging Face library, as it's relatively easy to get such scores for normal language BERT ...
3
votes
1
answer
2k
views
Using a Hugging Face transformer with arguments in a pipeline
I am working on using a transformer pipeline to get BERT embeddings for my input. Using this without a pipeline I am able to get constant outputs, but not with the pipeline, since I was not able to pass ...
3
votes
1
answer
2k
views
BERT transformer KeyError: 3
I am quite new to the BERT language model. I am currently using the Huggingface transformer library and I'm encountering an error when encoding the inputs. The goal of the model is to classify fake ...
3
votes
1
answer
1k
views
Transformer/BERT token prediction vocabulary (filtering the special tokens out of the set of possible tokens)
With the Transformer model, especially with BERT, does it make sense (and would it be statistically correct) to programmatically forbid the model from producing the special tokens as predictions?
...
3
votes
0
answers
3k
views
How to get tokens to words in BERT tokenizer
I have a list; using the huggingface BERT tokenizer I can get the numerical representation mapping.
X = ['[CLS]', '[MASK]', 'love', 'this', '[SEP]']
tokens = tokenizer.convert_tokens_to_ids(X)
tokens: [...
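The reverse mapping is convert_ids_to_tokens; a minimal sketch (the printed ids are indicative for bert-base-uncased):
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
X = ['[CLS]', '[MASK]', 'love', 'this', '[SEP]']

ids = tokenizer.convert_tokens_to_ids(X)
print(ids)                                    # e.g. [101, 103, 2293, 2023, 102]

# The reverse direction: numerical ids back to WordPiece tokens.
print(tokenizer.convert_ids_to_tokens(ids))   # ['[CLS]', '[MASK]', 'love', 'this', '[SEP]']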
3
votes
1
answer
224
views
Training RoBERTa using transformers on a masked language task giving weird results?
I trained a RoBERTa model following this colab - https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb#scrollTo=XaFAsB_fnU3K
Here is how my data looked:...
2
votes
1
answer
1k
views
How does BERT utilize TPU memories?
The README in Google's BERT repo says that even a single sentence of length 512 cannot fit in a 12 GB Titan X for the BERT-Large model.
But in the BERT paper, it says 64 TPU chips are used to train BERT-...
2
votes
1
answer
2k
views
What does "fine-tuning of a BERT model" refer to?
I was not able to understand one thing: when it says "fine-tuning of BERT", what does it actually mean?
Are we retraining the entire model again with new data?
Or are we just training the top ...
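As an illustration of the two readings, a sketch contrasting full fine-tuning (every parameter updated) with freezing the encoder and training only the classification head:
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Full fine-tuning: every parameter (encoder + head) receives gradient updates.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]

# "Feature extraction" variant: freeze the encoder and train only the classification head.
for param in model.bert.parameters():
    param.requires_grad = False
head_only = [n for n, p in model.named_parameters() if p.requires_grad]

print(len(trainable), len(head_only))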
2
votes
1
answer
1k
views
Backpropagation in BERT
I would like to know: when people say "pretrained BERT model", is it only the final classification neural network that is trained, or is there any update inside the transformer through backpropagation along with ...
2
votes
2
answers
2k
views
Huggingface Transformers - AttributeError: 'MrpcProcessor' object has no attribute 'tfds_map'
When using Huggingface Transformers on a GLUE task, I got the error AttributeError: 'MrpcProcessor' object has no attribute 'tfds_map'
I suspect a compatibility problem.
2
votes
1
answer
2k
views
Are these normal speeds for BERT pretrained model inference in PyTorch?
I am testing BERT base and a distilled BERT model in Huggingface across 4 speed scenarios, with batch_size = 1:
1) bert-base-uncased: 154ms per request
2) bert-base-uncased with quantization: 94ms per ...
2
votes
1
answer
966
views
How to know if a word belongs to a Transformer model?
I use the python library sentence_transformers with the models RoBERTa and FlauBERT.
I use cosine scores to compute similarity but for some words it doesn't work well.
Those words seem to be the ones ...
2
votes
1
answer
954
views
Do I need to train on my own data when using a BERT model as an embedding vector?
When I try the huggingface models, I get the following error message:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
...
2
votes
0
answers
1k
views
Unable to solve RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
I am facing the error "RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)" when trying to fine-tune the model "deepset/bert-base-cased-squad2&...
2
votes
0
answers
369
views
How do I implement a knowledge base in a Huggingface model?
I made a knowledge base using COMET on the Atomic knowledge graph, using this tutorial.
I would like to include this knowledge in a regular pre-trained BERT model from HuggingFace to see how the model ...
2
votes
0
answers
559
views
IndexError: too many indices for tensor of dimension 2: When adding custom layer on HuggingFace model
I've tried to add custom layers to a HuggingFace Transformer model on a binary classification task. As an absolute beginner, I tried to follow this tutorial
Here's the custom model
class CustomModel(nn....
2
votes
1
answer
2k
views
How to compute the Hessian of a large neural network in PyTorch?
How to compute the Hessian matrix of a large neural network or transformer model like BERT in PyTorch? I know torch.autograd.functional.hessian, but it seems like it only calculates the Hessian of a ...
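The full Hessian is infeasible to materialise for BERT-sized models; a common substitute is the Hessian-vector product via double backward. A sketch on a tiny stand-in model:
import torch

model = torch.nn.Linear(10, 1)          # stand-in for a large network / BERT
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

params = [p for p in model.parameters() if p.requires_grad]
grads = torch.autograd.grad(loss, params, create_graph=True)

# Hessian-vector product: differentiate (grad . v) instead of building the full Hessian,
# which would need ~(n_params x n_params) memory for models with ~100M parameters.
v = [torch.randn_like(p) for p in params]
gv = sum((g * vi).sum() for g, vi in zip(grads, v))
hvp = torch.autograd.grad(gv, params)
print([h.shape for h in hvp])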
2
votes
0
answers
1k
views
Finetuning Transformers in PyTorch (BERT, RoBERTa, etc.)
Alright. So there are multiple methods to fine-tune a transformer:
freeze the transformer's parameters and feed only its final outputs into another model (the user trains this "other" model),
...
2
votes
0
answers
899
views
HuggingFace Transformers BertForSequenceClassification with Trainer: How to do multi-output regression?
I am trying to fine-tune a BERT model on a dataset of sentences that has two different real-valued attributes for each sentence. For each one, there is a Valence score and an Arousal score, with real ...
2
votes
0
answers
577
views
How to fine-tune DistilBART for abstractive summarization using Gigaword or CNN/DailyMail?
I would like to ask how to fine-tune DistilBART on Gigaword and CNN/DailyMail starting from the distilbart-cnn-12-6 checkpoint.
I did use the Gigaword dataset provided by tensorflow but it ...
2
votes
0
answers
1k
views
Python "Can't pickle local object" exception during BertModel training
I am using simpletransformers.classification to train a BERT model to classify some text inputs. Here is my code.
from simpletransformers.classification import ClassificationModel
import torch
...
2
votes
1
answer
1k
views
How to retrieve attention weight alignment for tokens using transformer (BERT) model?
I am working on text classification with transformer models (PyTorch, Huggingface, running on GPU).
I already have my model and my training loop, and they work fine, but to better understand wrong ...
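Passing output_attentions=True returns the per-layer attention matrices, which can then be aligned with the tokens. A sketch assuming bert-base-uncased and an illustrative input:
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

inputs = tokenizer("attention weights per token", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
last_layer = outputs.attentions[-1][0]       # (num_heads, seq_len, seq_len)
print(tokens)
print(last_layer.mean(dim=0))                # head-averaged token-to-token weights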
1
vote
3
answers
784
views
huggingface transformer issue
I used a huggingface transformer, but I got some issues like those below.
How can I handle this problem?
training_args = TrainingArguments(
output_dir='./.checkpoints',
num_train_epochs=config....
1
vote
2
answers
14k
views
Downloading transformers and BERT to your local machine
I am trying to replicate the code from this page.
At my workplace we have access to the transformers and pytorch libraries but cannot connect to the internet from our Python environment. Could anyone help with ...