All Questions
Tagged with bert-language-model deep-learning
169 questions
43 votes · 2 answers · 27k views
Why does the BERT transformer use the [CLS] token for classification instead of averaging over all tokens?
I am doing experiments on the BERT architecture and found out that most fine-tuning tasks take the final hidden layer as the text representation, and later pass it to other models for the further ...
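A minimal sketch of the two pooling choices being contrasted, written against the HuggingFace transformers API (not code from the question): the [CLS] vector is simply position 0 of the final hidden layer, while the averaging alternative is a mask-aware mean over all token positions.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("an example sentence", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # (batch, seq_len, 768)

cls_vector = hidden[:, 0]                        # the [CLS] token representation
mask = inputs["attention_mask"].unsqueeze(-1)    # exclude padding from the average
mean_vector = (hidden * mask).sum(1) / mask.sum(1)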
24 votes · 1 answer · 49k views
How do the max_length, padding and truncation arguments work in HuggingFace's BertTokenizerFast.from_pretrained('bert-base-uncased')?
I am working on a text classification problem where I want to use the BERT model as the base, followed by Dense layers. I want to know how the 3 arguments work. For example, if I have 3 sentences ...
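A hedged sketch of how the three arguments interact (the values below are illustrative, not from the question): truncation cuts anything longer than max_length, and padding fills anything shorter.

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
sentences = ["a short one", "a slightly longer sentence here", "one more"]
batch = tokenizer(
    sentences,
    max_length=8,          # hard cap on token count, including [CLS] and [SEP]
    padding="max_length",  # pad every sequence up to max_length
    truncation=True,       # cut sequences that exceed max_length
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # torch.Size([3, 8])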
22 votes · 6 answers · 27k views
AttributeError: module 'torch' has no attribute '_six'. BERT model in PyTorch
I tried to load a pre-trained model by using the BertModel class in PyTorch.
I have _six.py under torch, but it still shows module 'torch' has no attribute '_six'.
import torch
from pytorch_pretrained_bert ...
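For context: torch._six was removed in recent PyTorch releases, and the long-deprecated pytorch_pretrained_bert package still imports it. A common workaround (an assumption about this case, not the only fix) is to load the same weights through the maintained transformers package instead:

from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")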
12 votes · 2 answers · 12k views
How to train BERT from scratch on a new domain for both MLM and NSP?
I'm trying to train a BERT model from scratch on my own dataset using the HuggingFace library. I would like to train the model so that it has the exact architecture of the original BERT model.
In ...
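A minimal sketch of the starting point, assuming the goal is the original bert-base architecture: BertForPreTraining carries both the MLM and the NSP heads, and instantiating it from a default BertConfig (rather than from_pretrained) gives randomly initialised weights.

from transformers import BertConfig, BertForPreTraining

config = BertConfig()               # defaults match bert-base: 12 layers, hidden size 768
model = BertForPreTraining(config)  # random initialisation, trained from scratch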
11 votes · 2 answers · 14k views
Continual pre-training vs. Fine-tuning a language model with MLM
I have some custom data I want to use to further pre-train the BERT model. I've tried the following two approaches so far:
Starting with a pre-trained BERT checkpoint and continuing the pre-training ...
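A sketch of that first approach, continued MLM pre-training from a checkpoint; the DataCollatorForLanguageModeling applies the 15% masking on the fly (the corpus handling is omitted here).

from transformers import (AutoTokenizer, BertForMaskedLM,
                          DataCollatorForLanguageModeling)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")  # start from the checkpoint
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)
# model, collator and a tokenized domain corpus then go into a Trainer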
9 votes · 1 answer · 23k views
RuntimeError: The size of tensor a (4000) must match the size of tensor b (512) at non-singleton dimension 1
I'm trying to build a model for document classification. I'm using BERT with PyTorch.
I got the BERT model with the code below.
bert = AutoModel.from_pretrained('bert-base-uncased')
This is the code for ...
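The two sizes in the error are telling: 4000 is the document length in tokens, and 512 is BERT's position-embedding limit. A hedged sketch of the usual remedy, truncating at tokenization time (chunking long documents is the alternative):

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

long_document = "word " * 4000   # stand-in for a document that is too long
enc = tokenizer(long_document, truncation=True, max_length=512, return_tensors="pt")
out = bert(**enc)                # no size error: sequence length is capped at 512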
9 votes · 2 answers · 3k views
BERT output not deterministic
BERT output is not deterministic.
I expect the output values to be deterministic when I feed in the same input, but with my BERT model the values keep changing. Awkward as it sounds, the same value is returned twice, ...
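The most common cause (an assumption about this case) is that the model is still in training mode, so dropout is active on every forward pass; switching to eval mode makes the outputs repeatable.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()                       # disables dropout

inputs = tokenizer("same input", return_tensors="pt")
with torch.no_grad():
    a = model(**inputs).last_hidden_state
    b = model(**inputs).last_hidden_state
print(torch.allclose(a, b))       # True once dropout is off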
9 votes · 1 answer · 4k views
Why does the BERT model have to keep 10% of the masked tokens unchanged?
I am reading the BERT paper. In the masked language model task during pre-training, the paper says the model chooses 15% of tokens randomly. Of the chosen tokens (Ti), 80% will be replaced ...
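For reference, the paper's 80/10/10 rule applies to the 15% of selected tokens; a plain-Python sketch of the selection (keeping 10% unchanged reduces the mismatch with fine-tuning, where [MASK] never appears):

import random

def corrupt(token, vocab, mask_token="[MASK]"):
    r = random.random()
    if r < 0.8:
        return mask_token             # 80%: replace with [MASK]
    elif r < 0.9:
        return random.choice(vocab)   # 10%: replace with a random token
    return token                      # 10%: keep the original token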
7 votes · 2 answers · 6k views
The essence of learnable positional embeddings? Do they improve outcomes?
I was recently reading the BERT source code from the Hugging Face project. I noticed that the so-called "learnable position encoding" seems to refer to a specific nn.Parameter layer when it ...
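A minimal sketch of what that nn.Parameter amounts to: a trainable table of shape (max_len, hidden) added to the token embeddings, in contrast to the fixed sinusoidal table of the original Transformer.

import torch
import torch.nn as nn

class LearnedPositions(nn.Module):
    def __init__(self, max_len=512, hidden=768):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(max_len, hidden))  # learned with the model

    def forward(self, token_embeddings):        # (batch, seq_len, hidden)
        seq_len = token_embeddings.size(1)
        return token_embeddings + self.pos[:seq_len]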
7 votes · 1 answer · 2k views
Fine-tune Bert for specific domain (unsupervised)
I want to fine-tune BERT on texts that are related to a specific domain (in my case related to engineering). The training should be unsupervised since I don't have any labels or anything. Is this ...
7 votes · 1 answer · 8k views
Mismatched size on BertForSequenceClassification from Transformers and multiclass problem
I just trained a BERT model on a dataset composed of products and labels (departments) for an e-commerce website. It's a multiclass problem. I used BertForSequenceClassification to predict the ...
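A frequent cause of this mismatch (an assumption about this case) is that the classification head defaults to 2 labels; the class count has to be passed explicitly, and has to match both when training and when reloading. Assuming, say, 10 departments:

from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=10,   # must equal the number of classes at train and load time
)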
7 votes · 2 answers · 4k views
Pretraining a language model on a small custom corpus
I was curious if it is possible to use transfer learning in text generation, and re-train/pre-train it on a specific kind of text.
For example, having a pre-trained BERT model and a small corpus ...
6 votes · 1 answer · 1k views
BERT performing worse than word2vec
I am trying to use BERT for a document ranking problem. My task is pretty straightforward. I have to do a similarity ranking for an input document. The only issue here is that I don’t have labels - so ...
6 votes · 3 answers · 4k views
TypeError: Layer input_spec must be an instance of InputSpec. Got: InputSpec(shape=(None, 128, 768), ndim=3)
I am trying to use a BERT pretrained model to do a multiclass classification (of 3 classes). Here's my function to use the model and also added some extra functionalities:
def create_model(max_seq_len,...
5 votes · 2 answers · 3k views
Why are the matrices in BERT called Query, Key, and Value?
Within the transformer units of BERT, there are modules called Query, Key, and Value, or simply Q,K,V.
Based on the BERT paper and code (particularly in modeling.py), my pseudocode understanding of ...
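A single-head simplification of the computation behind the names (a sketch, not the question's pseudocode): each is a learned linear projection of the same input; queries are compared against keys, and the resulting weights mix the values.

import torch
import torch.nn as nn

hidden = 768
x = torch.randn(1, 10, hidden)                      # (batch, seq_len, hidden)
W_q, W_k, W_v = (nn.Linear(hidden, hidden) for _ in range(3))

Q, K, V = W_q(x), W_k(x), W_v(x)
scores = Q @ K.transpose(-2, -1) / hidden ** 0.5    # how well each query matches each key
attn = scores.softmax(dim=-1)
out = attn @ V                                      # attention-weighted sum of values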
4 votes · 1 answer · 7k views
BERT for time series classification
I'd like to train a transformer encoder (e.g. BERT) on time-series data for a task that can be modeled as classification. Let me briefly describe the data I'm using before talking about the issue I'm ...
4 votes · 3 answers · 27k views
OSError for huggingface model
I am trying to use a huggingface model (CamelBERT), but I am getting an error when loading the tokenizer:
Code:
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer....
4 votes · 1 answer · 2k views
Finetuning BERT on custom data
I want to train a 21-class text classification model using BERT. But I have very little training data, so I downloaded a similar dataset with 5 classes and 2 million samples.
And finetuned ...
4 votes · 1 answer · 5k views
BertModel or BertForPreTraining
I want to use BERT only for embeddings, and use the BERT output as input to a classification net that I will build from scratch.
I am not sure if I want to do finetuning for the model.
I think the ...
4 votes · 1 answer · 6k views
BERT outputs explained
The keys of the BERT encoder's output are default, encoder_outputs, pooled_output and sequence_output
As far as I know, encoder_outputs are the output of each encoder, pooled_output is the output ...
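A sketch of inspecting those keys with a TF2 BERT encoder from TensorFlow Hub (the handles below are assumptions; any TF2 BERT encoder on the hub returns the same dictionary):

import tensorflow as tf
import tensorflow_hub as hub

preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")

outputs = encoder(preprocess(tf.constant(["an example sentence"])))
print(outputs.keys())
# "sequence_output": (batch, seq_len, 768), one vector per token
# "pooled_output":   (batch, 768), a transformed [CLS] vector for classification
# "encoder_outputs": the per-layer (batch, seq_len, 768) activations of the 12 blocks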
4 votes · 1 answer · 7k views
There appear to be 1 leaked semaphore objects to clean up at shutdown
I am using macOS and used a DistilBERT model via Sentence Transformers for a chatbot implementation, and generated the API in VS Code.
But after giving 3 inputs it pops up this error:
UserWarning: ...
4 votes · 0 answers · 748 views
HuggingFace BertForMaskedLM: Expected input batch_size (3200) to match target batch_size (16)
I'm working on multiclass classification (Bengali language sentiment analysis) with a pretrained HuggingFace (BertForMaskedLM) model.
When the error occurred I knew I had to change the label (output) ...
3 votes · 1 answer · 3k views
ValueError: Unknown layer: TFBertModel. Please ensure this object is passed to the `custom_objects` argument
Here I am training the BERT model. Below is the code I used to train; when I load the saved model for prediction, it shows this error. Can anyone please help me out?
import tensorflow as tf
import logging
from ...
3 votes · 2 answers · 1k views
Multilingual BERT sentence vector captures language used more than meaning - working as intended?
Playing around with BERT, I downloaded the Huggingface Multilingual Bert and entered three sentences, saving their sentence vectors (the embedding of [CLS]), then translated them via Google Translate, ...
3 votes · 2 answers · 2k views
Sentence-Transformer Training and Validation Loss
I am using the Sentence-Transformers model and fine-tuning it (using PyTorch) on a custom dataset, which is the same as the Semantic Textual Similarity (STS) dataset.
I am unable to get (or print) the training ...
3 votes · 1 answer · 939 views
How to find the most important words/tokens/embeddings responsible for the label predicted by a text classification model in PyTorch
Let us suppose I have a model like:
class BERT_Subject_Classifier(nn.Module):
def __init__(self,out_classes,hidden1=128,hidden2=32,dropout_val=0.2):
super(BERT_Subject_Classifier, self)....
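One common way to answer this (an approach of my choosing, not code from the question): gradient-times-input saliency over the word embeddings, which gives one influence score per token for the predicted label. The model and sentence below are placeholders.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

enc = tokenizer("the topic of this text is physics", return_tensors="pt")
emb = model.bert.embeddings.word_embeddings(enc["input_ids"])  # differentiable inputs
emb.retain_grad()
logits = model(inputs_embeds=emb, attention_mask=enc["attention_mask"]).logits
logits[0, logits.argmax()].backward()                          # gradient of the top label
scores = (emb.grad * emb).sum(-1).abs().squeeze()              # one saliency per token
for tok, s in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist()), scores):
    print(tok, float(s))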
3 votes · 1 answer · 807 views
How to set output_shape of BERT preprocessing layer from tensorflow hub?
I am building a simple BERT model for text classification, using TensorFlow Hub.
import tensorflow as tf
import tensorflow_hub as tf_hub
bert_preprocess = tf_hub.KerasLayer("https://tfhub....
3 votes · 0 answers · 855 views
Same input, same model, same weights but getting different results
I'm fine-tuning Sentence-BERT to do a task like sentence cosine-similarity calculation in TensorFlow. I set up an encoder, let's say encoder1, using the code below:
from sentence_transformers import ...
3 votes · 2 answers · 2k views
How to save and load a custom siamese BERT model
I am following this tutorial on how to train a siamese bert network:
https://keras.io/examples/nlp/semantic_similarity_with_bert/
All good, but I am not sure what the best way to save the model is ...
3 votes · 0 answers · 708 views
I'm trying to load BERT "tfbert-large-uncased" but I got an error "Can't load config.json file"
I'm trying to load the pre-trained BERT model, but I'm getting an error while loading the tokenizer: it says config.json is not found.
If anyone knows how to solve these issues please help me
Model and path ...
3 votes · 0 answers · 710 views
Google BERT and antonym detection
I recently learned about the following phenomenon: Google BERT word embeddings of well-known state-of-the-art models seem to ignore the measure of semantic contrast between antonyms in terms of the ...
2 votes · 1 answer · 1k views
Using BERT in order to detect language of a given word
I have words in the Hebrew language. Some of them are originally in English, and some are 'Hebrew English', meaning words that originally come from English but are written with ...
2 votes · 1 answer · 2k views
How does the BERT tokenizer result in an input tensor shape of (b, 24, 768)?
I understand how the BERT tokenizer works thanks to this article:
https://albertauyeung.github.io/2020/06/19/bert-tokenization.html
However, I am confused about how this ends up as the final input ...
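Worth noting (an explanatory sketch, not the question's code): the tokenizer alone only yields integer ids of shape (batch, seq_len); the 768 dimension appears when the model's embedding layer looks those ids up.

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

enc = tokenizer("an example", padding="max_length", max_length=24, return_tensors="pt")
print(enc["input_ids"].shape)                    # torch.Size([1, 24]): integer ids only
print(model.embeddings(enc["input_ids"]).shape)  # torch.Size([1, 24, 768])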
2 votes · 1 answer · 715 views
BERT Text Classification
I am new to BERT and am trying to learn BERT fine-tuning for text classification via a Coursera course: https://www.coursera.org/projects/fine-tune-bert-tensorflow/
Based on the course, I would like to ...
2 votes · 2 answers · 2k views
Fine-tune BERT for a specific domain on a different language?
I want to fine-tune on a pre-trained BERT model.
However, my task uses data within a specific domain (say biomedical data).
Additionally, my data is also in a language different from English (say ...
2 votes · 1 answer · 306 views
Trying to train model for Intent Recognition but getting float error
I'm trying to train the model for intent recognition. I tried removing all special characters and stop words but was unable to resolve this error. I also tried removing integers, but it's still throwing an error. ...
2 votes · 1 answer · 106 views
Tensor flow Model saving and Calculating Average of Models [closed]
I am trying to implement and reproduce the results of federated BERT pretraining in the paper
Federated pretraining and fine-tuning of BERT using clinical notes from multiple silos.
I prefer to use ...
2 votes · 1 answer · 498 views
Have you encountered a similar problem, like loss jitter during training?
Background: it's about loss jitter that appears at the beginning of every training epoch. When the dataloader loads the first batch of data to feed into the network, the loss value always ...
2 votes · 1 answer · 123 views
Why is a throw-away column required in Bert format?
I have recently come across BERT (Bidirectional Encoder Representations from Transformers). I saw that BERT requires a strict format for the training data. The third column needed is described as follows:
...
2 votes · 0 answers · 138 views
How can I implement this BERT model for sequential sentences classification using HuggingFace?
I want to classify the functions of sentences in the abstracts of scientific papers, and the function of a sentence is related to the functions of its surrounding sentences.
I found the model proposed ...
2 votes · 1 answer · 2k views
How to compute the Hessian of a large neural network in PyTorch?
How to compute the Hessian matrix of a large neural network or transformer model like BERT in PyTorch? I know torch.autograd.functional.hessian, but it seems like it only calculates the Hessian of a ...
2 votes · 0 answers · 1k views
Finetuning Transformers in PyTorch (BERT, RoBERTa, etc.)
Alright. So there are multiple methods to fine-tune a transformer:
freeze the transformer's parameters and feed only its final outputs into another model (the user trains this other model),
...
2 votes · 0 answers · 431 views
String cleaning/preprocessing for BERT
So my goal is to train a BERT model on Wikipedia data that I derive directly from Wikipedia.
The contents that I scrape from the site look like this (example):
"(148975) 2001 XA255, provisional ...
2 votes · 0 answers · 607 views
I am getting OOM while running a pre-trained BERT model on a new dataset of 20k
I have a pre-trained model with an accuracy of 96 after 2 epochs, and I am trying to use that model on a new dataset of 20k tweets for sentiment analysis. While doing that I am getting the error below.
I haven't ...
2 votes · 0 answers · 156 views
How to predict if a phrase is related to a short text or an article using supervised learning?
I have a set of short phrases and a set of texts. I want to predict if a phrase is related to an article. A phrase that doesn't appear in the article may still be related.
Some examples of annotated ...
2 votes · 0 answers · 470 views
How does BERT change the max sequence length when we do a fine-tuning task?
Suppose we use a pretrained model with max sequence length 128.
Now I change the config file, reducing the max sequence length from 128 to 64.
Next, I do the fine-tuning task, such as a simple classification ...
2 votes · 1 answer · 75 views
TensorFlow 1.15: the inner logic of Estimator's input_fn? Or the inner logic of MirroredStrategy?
I am pretraining BERT on 1 machine with 4 GPUs, not 1 GPU.
For each training step, I am wondering whether input_fn gives each GPU its own batch or gives all 4 GPUs one batch.
The MirroredStrategy code:
...
2 votes · 0 answers · 322 views
Is it possible to vectorize the documents using Google BERT?
I would like to convert my documents to vectors using BERT, one vector for each document. Is it possible? How could it be programmed using standard or popular libraries?
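One popular route (a library of my choosing, not named in the question; the model name is an illustration): the sentence-transformers package, which wraps BERT-family encoders and returns one fixed-size vector per document.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative model choice
docs = ["first document text", "second document text"]
vectors = model.encode(docs)                      # array of shape (2, embedding_dim)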
2 votes · 0 answers · 359 views
How to use run_classifier.py, an example of the PyTorch implementation of BERT for a classification task?
How to use the fine-tuned bert pytorch model for classification (CoLa) task?
I do not see the argument --do_predict, in /examples/run_classifier.py.
However, --do_predict exists in the original ...
2 votes · 1 answer · 161 views
bert_vocab.bert_vocab_from_dataset returning wrong vocabulary [closed]
I'm trying to build a tokenizer following the TF tutorial https://www.tensorflow.org/text/guide/subwords_tokenizer. I'm basically doing the same thing, only with a different dataset. The dataset in ...