All Questions

43 votes
2 answers
27k views

Why does the BERT transformer use the [CLS] token for classification instead of an average over all tokens?

I am doing experiments on the BERT architecture and found that most fine-tuning tasks take the final hidden layer as the text representation, later passing it to other models for further ...
Aaditya Ura
  • 12.3k
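For context, a minimal sketch (using the HuggingFace transformers API, not the asker's exact code) contrasting the two pooling choices the question compares:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("An example sentence.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state        # (1, seq_len, 768)

cls_vector = hidden[:, 0, :]                          # the [CLS] token's vector
mask = inputs["attention_mask"].unsqueeze(-1)         # ignore padding positions
mean_vector = (hidden * mask).sum(1) / mask.sum(1)    # average over real tokens
```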
24 votes
1 answer
49k views

How do the max_length, padding and truncation arguments work in HuggingFace's BertTokenizerFast.from_pretrained('bert-base-uncased')?

I am working on a text classification problem where I want to use the BERT model as the base, followed by Dense layers. I want to know how the 3 arguments work. For example, if I have 3 sentences ...
Deshwal
  • 3,872
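A small sketch of how the three arguments interact (hypothetical sentences, fast tokenizer):

```python
from transformers import BertTokenizerFast

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
sentences = ["short", "a slightly longer sentence", "an even longer third sentence here"]

# truncation=True cuts every sequence down to max_length;
# padding="max_length" pads every sequence up to max_length;
# padding=True would instead pad only to the longest sequence in the batch.
batch = tok(sentences, max_length=8, truncation=True, padding="max_length")
print([len(ids) for ids in batch["input_ids"]])   # [8, 8, 8]
```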
22 votes
6 answers
27k views

AttributeError: module 'torch' has no attribute '_six'. BERT model in PyTorch

I tried to load a pre-trained model using the BertModel class in PyTorch. I have _six.py under torch, but it still shows module 'torch' has no attribute '_six'. import torch from pytorch_pretrained_bert ...
Ruitong LIU
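A hedged note: torch._six was removed in PyTorch 2.x, and pytorch_pretrained_bert predates that; the usual way out is the maintained transformers package. A minimal sketch:

```python
# pip install transformers
# Loads the same pretrained weights without touching torch._six.
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
```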
12 votes
2 answers
12k views

How to train BERT from scratch on a new domain for both MLM and NSP?

I'm trying to train a BERT model from scratch on my own dataset using the HuggingFace library. I would like to train the model so that it has the exact architecture of the original BERT model. In ...
tlqn
  • 379
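One possible starting point (a sketch, not the asker's code): BertForPreTraining carries both the MLM and NSP heads, and a fresh BertConfig reproduces the original bert-base architecture with random weights:

```python
from transformers import BertConfig, BertForPreTraining

config = BertConfig()               # defaults match bert-base: 12 layers, 768 hidden
model = BertForPreTraining(config)  # random init, with both MLM and NSP heads attached

# Training data then needs segment pairs with a next-sentence label plus masked
# tokens, e.g. via TextDatasetForNextSentencePrediction and
# DataCollatorForLanguageModeling (both ship with transformers).
```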
11 votes
2 answers
14k views

Continual pre-training vs. Fine-tuning a language model with MLM

I have some custom data I want to use to further pre-train the BERT model. I've tried the two following approaches so far: starting with a pre-trained BERT checkpoint and continuing the pre-training ...
Pedram
  • 2,531
9 votes
1 answer
23k views

RuntimeError: The size of tensor a (4000) must match the size of tensor b (512) at non-singleton dimension 1

I'm trying to build a model for document classification. I'm using BERT with PyTorch. I got the BERT model with the code below: bert = AutoModel.from_pretrained('bert-base-uncased'). This is the code for ...
Venkatesh Dharavath
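This error usually means the tokenized sequence (4000 tokens here) exceeds BERT's 512-position limit; a sketch of the usual fix (the document variable is a stand-in):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
long_document = " ".join(["word"] * 4000)   # stand-in for the asker's document

# Truncate to BERT's maximum of 512 positions; longer documents need chunking
# or a long-context model (e.g. Longformer) instead.
enc = tokenizer(long_document, max_length=512, truncation=True,
                padding="max_length", return_tensors="pt")
print(enc["input_ids"].shape)               # torch.Size([1, 512])
```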
9 votes
2 answers
3k views

BERT output not deterministic

BERT's output is not deterministic. I expect the output values to be deterministic when I give the same input, but with my BERT model the values change. Awkwardly, the same value is returned twice, ...
Keanu Paik
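The most common cause is dropout still being active in training mode; a sketch of the usual check:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()                    # disables dropout; without this, outputs vary per call

inputs = tokenizer("the same input", return_tensors="pt")
with torch.no_grad():
    out1 = model(**inputs).last_hidden_state
    out2 = model(**inputs).last_hidden_state
print(torch.equal(out1, out2))  # True once dropout is disabled
```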
9 votes
1 answer
4k views

Why does the BERT model keep 10% of masked tokens unchanged?

I am reading the BERT paper. For the masked language model task during pre-training, the paper says the model chooses 15% of tokens randomly. Of the chosen tokens (Ti), 80% will be replaced ...
Thanh Kiet
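A toy sketch of the 80/10/10 rule the paper describes (15% of positions selected; of those, 80% become [MASK], 10% become a random token, 10% stay unchanged):

```python
import torch

def mask_tokens(input_ids, mask_id, vocab_size, mlm_prob=0.15):
    labels = input_ids.clone()
    selected = torch.rand(input_ids.shape) < mlm_prob     # ~15% of positions
    labels[~selected] = -100                              # compute loss only on selected

    replaced = selected & (torch.rand(input_ids.shape) < 0.8)             # 80% -> [MASK]
    random_ = selected & ~replaced & (torch.rand(input_ids.shape) < 0.5)  # 10% -> random
    out = input_ids.clone()
    out[replaced] = mask_id
    out[random_] = torch.randint(vocab_size, (int(random_.sum()),))
    # the remaining ~10% of selected positions keep their original token
    return out, labels
```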
7 votes
2 answers
6k views

The essence of learnable positional embeddings? Does the embedding improve outcomes?

I was recently reading the BERT source code from the Hugging Face project. I noticed that the so-called "learnable position encoding" seems to refer to a specific nn.Parameter layer when it ...
AdamHommer
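In the HuggingFace BERT code the learned position encoding is an embedding table indexed by position, trained like any other weight; a stripped-down sketch:

```python
import torch
import torch.nn as nn

max_len, hidden = 512, 768
position_embeddings = nn.Embedding(max_len, hidden)  # learnable lookup table

seq_len = 24
position_ids = torch.arange(seq_len).unsqueeze(0)    # (1, seq_len): 0, 1, 2, ...
pos_vectors = position_embeddings(position_ids)      # (1, seq_len, 768),
                                                     # added to the token embeddings
```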
7 votes
1 answer
2k views

Fine-tune Bert for specific domain (unsupervised)

I want to fine-tune BERT on texts that are related to a specific domain (in my case, engineering). The training should be unsupervised, since I don't have any labels. Is this ...
spadel
  • 1,036
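This is usually done as continued masked-language-model pretraining on the domain text, which needs no labels; a minimal sketch with the transformers Trainer, assuming a hypothetical engineering.txt corpus:

```python
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, LineByLineTextDataset,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

dataset = LineByLineTextDataset(tokenizer=tokenizer,
                                file_path="engineering.txt",  # hypothetical corpus
                                block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(model=model,
                  args=TrainingArguments(output_dir="bert-engineering"),
                  data_collator=collator,
                  train_dataset=dataset)
trainer.train()   # masks tokens on the fly; no labels required
```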
7 votes
1 answer
8k views

Mismatched size on BertForSequenceClassification from Transformers and multiclass problem

I just trained a BERT model on a dataset composed of products and labels (departments) for an e-commerce website. It's a multiclass problem. I used BertForSequenceClassification to predict the ...
Guilherme Giuliano Nicolau
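Size mismatches here typically come from the classification head: num_labels at load time must match the checkpoint, or the old head must be discarded. A hedged sketch (the label count is hypothetical; ignore_mismatched_sizes exists in recent transformers versions):

```python
from transformers import BertForSequenceClassification

# num_labels sizes the classifier head; for a checkpoint saved with a different
# head size, ignore_mismatched_sizes re-initializes the mismatched layer.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",        # or the asker's fine-tuned checkpoint path
    num_labels=10,              # hypothetical number of departments
    ignore_mismatched_sizes=True,
)
```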
7 votes
2 answers
4k views

Pretraining a language model on a small custom corpus

I was curious whether it is possible to use transfer learning in text generation and re-train/pre-train a model on a specific kind of text. For example, having a pre-trained BERT model and a small corpus ...
ysig
  • 477
6 votes
1 answer
1k views

BERT performing worse than word2vec

I am trying to use BERT for a document ranking problem. My task is pretty straightforward: I have to produce a similarity ranking for an input document. The only issue is that I don't have labels, so ...
user3741951
6 votes
3 answers
4k views

TypeError: Layer input_spec must be an instance of InputSpec. Got: InputSpec(shape=(None, 128, 768), ndim=3)

I am trying to use a BERT pretrained model to do multiclass classification (of 3 classes). Here's my function to use the model, with some extra functionality added: def create_model(max_seq_len,...
Hrisav Bhowmick
5 votes
2 answers
3k views

Why are the matrices in BERT called Query, Key, and Value?

Within the transformer units of BERT, there are modules called Query, Key, and Value, or simply Q, K, V. Based on the BERT paper and code (particularly modeling.py), my pseudocode understanding of ...
solvingPuzzles
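A compact sketch of why the names fit: each token's query is matched against every token's key (a soft dictionary lookup), and the resulting weights mix the values:

```python
import torch
import torch.nn.functional as F

def attention(Q, K, V):
    # Q, K, V: (batch, seq_len, d_k)
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # query-key similarities (batch, seq, seq)
    weights = F.softmax(scores, dim=-1)            # attention weights per query token
    return weights @ V                             # weighted sum of values
```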
4 votes
1 answer
7k views

BERT for time series classification

I'd like to train a transformer encoder (e.g. BERT) on time-series data for a task that can be modeled as classification. Let me briefly describe the data I'm using before talking about the issue I'm ...
clems
  • 129
4 votes
3 answers
27k views

OSError for huggingface model

I am trying to use a huggingface model (CamelBERT), but I am getting an error when loading the tokenizer. Code: from transformers import AutoTokenizer, AutoModelForMaskedLM tokenizer = AutoTokenizer....
TMN
  • 103
4 votes
1 answer
2k views

Finetuning BERT on custom data

I want to train a 21-class text classification model using BERT. But I have very little training data, so I downloaded a similar dataset with 5 classes and 2 million samples, and finetuned ...
danishansari
4 votes
1 answer
5k views

BertModel or BertForPreTraining

I want to use BERT only for embeddings, using the BERT output as input to a classification net that I will build from scratch. I am not sure whether I want to finetune the model. I think the ...
Amit S
  • 243
4 votes
1 answer
6k views

BERT outputs explained

The keys of the BERT encoder's output are default, encoder_outputs, pooled_output and sequence_output. As far as I know, encoder_outputs are the outputs of each encoder, pooled_output is the output ...
OK 400
  • 1,159
4 votes
1 answer
7k views

There appear to be 1 leaked semaphore objects to clean up at shutdown

I am using macOS and used the DistilBert model via Sentence Transformers for a chatbot implementation, generating the API in VS Code. But after giving 3 inputs it pops up this error: UserWarning: ...
Tejas Sutar
4 votes
0 answers
748 views

HuggingFace BertForMaskedLM: Expected input batch_size (3200) to match target batch_size (16)

I'm working on multiclass classification (Bengali-language sentiment analysis) with a pretrained Huggingface (BertForMaskedLM) model. When the error occurred, I knew I had to change the label (output) ...
epitope21
3 votes
1 answer
3k views

ValueError: Unknown layer: TFBertModel. Please ensure this object is passed to the `custom_objects` argument

Here I am training the BERT model. I used the code below to train; when I load the saved model to predict, it shows this error. Can anyone please help me out? import tensorflow as tf import logging from ...
waji
  • 71
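The usual fix is to tell Keras how to deserialize the unknown layer via custom_objects; a sketch (the save path is hypothetical, and the exact call depends on how the model was saved):

```python
import tensorflow as tf
from transformers import TFBertModel

model = tf.keras.models.load_model(
    "saved_model_path",                           # hypothetical path to the saved model
    custom_objects={"TFBertModel": TFBertModel},  # resolves the unknown layer class
)
```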
3 votes
2 answers
1k views

Multilingual BERT sentence vector captures language used more than meaning - working as intended?

Playing around with BERT, I downloaded the Huggingface multilingual BERT and entered three sentences, saving their sentence vectors (the embedding of [CLS]), then translated them via Google Translate, ...
user2182857
3 votes
2 answers
2k views

Sentence-Transformer Training and Validation Loss

I am using a Sentence-Transformers model and fine-tuning it (using PyTorch) on a custom dataset, which is the same as the Semantic Text Similarity (STS) dataset. I am unable to get (or print) the training ...
Abhas kumar
3 votes
1 answer
939 views

How to find the most important words/tokens/embeddings responsible for the label predicted by a text classification model in PyTorch

Let us suppose I have a model like: class BERT_Subject_Classifier(nn.Module): def __init__(self,out_classes,hidden1=128,hidden2=32,dropout_val=0.2): super(BERT_Subject_Classifier, self)....
Deshwal
  • 3,872
3 votes
1 answer
807 views

How to set output_shape of BERT preprocessing layer from tensorflow hub?

I am building a simple BERT model for text classification using TensorFlow Hub. import tensorflow as tf import tensorflow_hub as tf_hub bert_preprocess = tf_hub.KerasLayer("https://tfhub....
lazarea
  • 1,219
3 votes
0 answers
855 views

Same input, same model, same weights but getting different results

I'm finetuning sentence-BERT in TensorFlow to do a task like sentence cosine-similarity calculation. I set up an encoder, let's say encoder1, using the code below: from sentence_transformers import ...
PlasticSaber
3 votes
2 answers
2k views

How to save and load a custom siamese BERT model

I am following this tutorial on how to train a siamese BERT network: https://keras.io/examples/nlp/semantic_similarity_with_bert/. All good, but I am not sure what the best way is to save the model ...
Carbo
  • 916
3 votes
0 answers
708 views

I'm trying to load BERT "tfbert-large-uncased" but I get the error "Can't load config.json file"

I'm trying to load the pre-trained BERT model, but I'm getting an error while loading the tokenizer; it says config.json is not found. If anyone knows how to solve this issue, please help me. Model and path ...
iamhimanshu0
3 votes
0 answers
710 views

Google BERT and antonym detection

I recently learned about the following phenomenon: the Google BERT word embeddings of well-known state-of-the-art models seem to ignore the measure of semantic contrast between antonyms in terms of the ...
Moshe
  • 555
2 votes
1 answer
1k views

Using BERT to detect the language of a given word

I have words in Hebrew. Some of them are originally English, and some are 'Hebrew English', meaning words that come from English but are written with ...
jonb
  • 865
2 votes
1 answer
2k views

How does the BERT tokenizer result in an input tensor shape of (b, 24, 768)?

I understand how the BERT tokenizer works thanks to this article: https://albertauyeung.github.io/2020/06/19/bert-tokenization.html. However, I am confused about how this ends up as the final input ...
Joshua Clancy
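Worth noting: the tokenizer alone only produces token ids of shape (b, 24); the 768 dimension appears once the model's embedding layers map each id to a vector. A sketch:

```python
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

enc = tokenizer(["a batch of one sentence"], padding="max_length",
                max_length=24, return_tensors="pt")
print(enc["input_ids"].shape)                  # (1, 24): integer ids, no 768 yet
print(model(**enc).last_hidden_state.shape)    # (1, 24, 768): after embeddings + encoder
```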
2 votes
1 answer
715 views

BERT Text Classification

I am new to BERT and am trying to learn BERT fine-tuning for text classification via a Coursera course: https://www.coursera.org/projects/fine-tune-bert-tensorflow/. Based on the course, I would like to ...
plm0998
  • 35
2 votes
2 answers
2k views

Fine-tune BERT for a specific domain on a different language?

I want to fine-tune a pre-trained BERT model. However, my task uses data within a specific domain (say, biomedical data). Additionally, my data is in a language other than English (say ...
Moonreaderx
2 votes
1 answer
306 views

Trying to train model for Intent Recognition but getting float error

I'm trying to train a model for intent recognition. I tried removing all special characters and stop words but was unable to resolve this error. I tried removing integers too, but it still throws an error....
2 votes
1 answer
106 views

TensorFlow model saving and calculating an average of models [closed]

I am trying to implement and reproduce the results of federated BERT pretraining from the paper "Federated pretraining and fine-tuning of BERT using clinical notes from multiple silos". I prefer to use ...
Faseela Thayattuchira
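For the model-averaging part, one common pattern (a FedAvg-style sketch, not the paper's exact setup) is to average identically-structured Keras models weight tensor by weight tensor:

```python
import numpy as np
import tensorflow as tf

def average_models(models):
    """Average the weights of identically-structured Keras models."""
    # zip groups the k-th weight tensor of every model together, then each
    # group is averaged elementwise across models.
    avg_weights = [np.mean(group, axis=0)
                   for group in zip(*[m.get_weights() for m in models])]
    result = tf.keras.models.clone_model(models[0])  # same architecture, fresh weights
    result.set_weights(avg_weights)
    return result
```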
2 votes
1 answer
498 views

Have you encountered a similar problem, like loss jitter during training?

Background: this is about loss jitter that appears at the beginning of every training epoch. When the dataloader loads the first batch of data to feed into the network, the loss value is always ...
Timetraveler
2 votes
1 answer
123 views

Why is a throw-away column required in the BERT format?

I have recently come across BERT (Bidirectional Encoder Representations from Transformers). I saw that BERT requires a strict format for the training data. The third column needed is described as follows: ...
anegru
  • 1,093
2 votes
0 answers
138 views

How can I implement this BERT model for sequential sentences classification using HuggingFace?

I want to classify the functions of sentences in the abstracts of scientific papers, and the function of a sentence is related to the functions of its surrounding sentences. I found the model proposed ...
Tom Leung
  • 364
2 votes
1 answer
2k views

How to compute the Hessian of a large neural network in PyTorch?

How can I compute the Hessian matrix of a large neural network or a transformer model like BERT in PyTorch? I know about torch.autograd.functional.hessian, but it seems like it only calculates the Hessian of a ...
Yan Pan
  • 21
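For a BERT-sized model the full Hessian is far too large to materialize; the standard workaround is Hessian-vector products via double backprop. A sketch:

```python
import torch

def hessian_vector_product(loss, params, vec):
    """Compute H @ vec without ever forming H, via two backward passes."""
    grads = torch.autograd.grad(loss, params, create_graph=True)  # first backward
    dot = sum((g * v).sum() for g, v in zip(grads, vec))          # scalar g^T v
    return torch.autograd.grad(dot, params)                       # second backward: H v
```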
2 votes
0 answers
1k views

Finetuning Transformers in PyTorch (BERT, RoBERTa, etc.)

Alright, so there are multiple methods to fine-tune a transformer: freeze the transformer's parameters and feed only its final outputs into another model (the user trains this "another" model), ...
brucewlee
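For the "freeze the transformer" variant, the standard PyTorch pattern is to turn off gradients on the encoder; a sketch:

```python
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

for param in model.bert.parameters():  # freeze the whole BERT encoder ...
    param.requires_grad = False
# ... so only the classification head (model.classifier) is updated during training.
```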
2 votes
0 answers
431 views

String cleaning/preprocessing for BERT

My goal is to train a BERT model on Wikipedia data that I derive straight from Wikipedia. The content that I scrape from the site looks like this (example): "(148975) 2001 XA255, provisional ...
Heidedo
  • 21
2 votes
0 answers
607 views

I am getting OOM while running a pretrained BERT model on a new dataset of 20k tweets

I have a pretrained model with an accuracy of 96 after 2 epochs, and I am trying to use that model on a new dataset of 20k tweets for sentiment analysis. While doing that I am getting the error below. I haven't ...
RAMA KRISHNA
2 votes
0 answers
156 views

How to predict if a phrase is related to a short text or an article using supervised learning?

I have a set of short phrases and a set of texts. I want to predict whether a phrase is related to an article. A phrase that doesn't appear in the article may still be related. Some examples of annotated ...
landings
  • 696
2 votes
0 answers
470 views

How does BERT change the max sequence length when we do a fine-tuning task?

Suppose we use a pretrained model with a max sequence length of 128. Now I change the config file and reduce the max sequence length from 128 to 64. Next, I do the fine-tuning task, such as a simple classification ...
McCree Feng
2 votes
1 answer
75 views

TensorFlow 1.15: the inner logic of Estimator's input_fn? Or the inner logic of MirroredStrategy?

I am pretraining BERT on 1 machine with 4 GPUs, not 1 GPU. For each training step, I am wondering whether input_fn gives 1 batch to each GPU or 1 batch across all 4 GPUs. The mirrored strategy code: ...
惊天补扣
  • 1,732
2 votes
0 answers
322 views

Is it possible to vectorize the documents using Google BERT?

I would like to convert my documents to vectors using BERT, one vector for each document. Is this possible? How could it be programmed using standard or popular libraries?
user_5
  • 548
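One common recipe (a sketch, not the only option) is the sentence-transformers library, which wraps a BERT-style encoder with pooling to yield one fixed-size vector per document:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any BERT-based checkpoint works
docs = ["first document ...", "second document ..."]
vectors = model.encode(docs)                     # one vector per document
print(vectors.shape)                             # (2, 384) for this checkpoint
```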
2 votes
0 answers
359 views

How to use run_classifier.py, an example PyTorch implementation of BERT for a classification task?

How can I use the fine-tuned BERT PyTorch model for the classification (CoLA) task? I do not see the argument --do_predict in /examples/run_classifier.py. However, --do_predict exists in the original ...
Ashwin Geet D'Sa
2 votes
1 answer
161 views

bert_vocab.bert_vocab_from_dataset returning wrong vocabulary [closed]

I'm trying to build a tokenizer following the TF tutorial https://www.tensorflow.org/text/guide/subwords_tokenizer. I'm basically doing the same thing, only with a different dataset. The dataset in ...
Niccolò Tiezzi