Questions tagged [bert-language-model]
BERT, or Bidirectional Encoder Representations from Transformers, is a method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. BERT uses Transformers (an attention mechanism that learns contextual relations between words or subwords in a text) to generate a language model.
1,802 questions
94 votes · 10 answers · 95k views
How to use BERT for long text classification?
We know that BERT has a maximum input length of 512 tokens. So if an article is much longer than 512 tokens, say 10,000 tokens,
how can BERT be used?
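One common workaround, sketched below under assumptions not stated in the question (model name bert-base-uncased, a binary classifier, an overlap of 128 tokens): split the long text into overlapping 512-token chunks with the tokenizer's overflow feature, run the classifier on each chunk, and average the logits.

import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

def classify_long_text(text):
    # Chunk the text into overlapping 512-token windows instead of truncating it.
    enc = tokenizer(text, max_length=512, truncation=True, stride=128,
                    return_overflowing_tokens=True, padding=True,
                    return_tensors="pt")
    enc.pop("overflow_to_sample_mapping")   # bookkeeping key, not a model input
    with torch.no_grad():
        logits = model(**enc).logits        # one row of logits per chunk
    return logits.mean(dim=0)               # naive aggregation: average over chunks

Other aggregation strategies (max pooling over chunks, a small recurrent head, or long-context models such as Longformer) are also common answers to this question.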
50 votes · 10 answers · 125k views
CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)
I got the following error when I ran my PyTorch deep learning model in Google Colab
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
1370 ret = ...
46 votes · 5 answers · 58k views
ValueError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]] - Tokenizing BERT / Distilbert Error
def split_data(path):
    df = pd.read_csv(path)
    return train_test_split(df, test_size=0.1, random_state=100)

train, test = split_data(DATA_DIR)
train_texts, train_labels = train['text'].to_list(), ...
43 votes · 2 answers · 27k views
Why does the BERT transformer use the [CLS] token for classification instead of averaging over all tokens?
I am experimenting with the BERT architecture and found that most fine-tuning tasks take the final hidden layer as the text representation, and later pass it to other models for further ...
39 votes · 3 answers · 36k views
dropout(): argument 'input' (position 1) must be Tensor, not str when using Bert with Huggingface
My code was working fine and when I tried to run it today without changing anything I got the following error:
dropout(): argument 'input' (position 1) must be Tensor, not str
Would appreciate if ...
31 votes · 6 answers · 40k views
How to cluster similar sentences using BERT
For ELMo, FastText and Word2Vec, I'm averaging the word embeddings within a sentence and using HDBSCAN/KMeans clustering to group similar sentences.
A good example of the implementation can be seen ...
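A hedged sketch of the sentence-level analogue: instead of averaging word vectors, encode whole sentences with a Sentence-BERT model from the sentence-transformers library and cluster the resulting vectors. The model name all-MiniLM-L6-v2, the example sentences and the cluster count are illustrative assumptions, not from the question.

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

sentences = ["How do I reset my password?",
             "I forgot my login credentials.",
             "What is the shipping cost to Canada?"]

model = SentenceTransformer("all-MiniLM-L6-v2")   # any sentence-level BERT model works here
embeddings = model.encode(sentences)              # shape: (n_sentences, dim)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for sentence, label in zip(sentences, labels):
    print(label, sentence)

HDBSCAN can be swapped in for KMeans exactly as with the averaged Word2Vec vectors.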
24 votes · 1 answer · 49k views
How do the max_length, padding and truncation arguments work in HuggingFace's BertTokenizerFast.from_pretrained('bert-base-uncased')?
I am working on a text classification problem where I want to use the BERT model as the base, followed by dense layers. I want to know how the three arguments work. For example, if I have 3 sentences ...
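A small sketch (with made-up sentences, not the asker's data) showing how the three arguments interact: truncation cuts off anything longer than max_length, and padding="max_length" pads everything shorter up to it, so every row ends up exactly max_length tokens long.

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
sentences = ["short", "a slightly longer sentence", "an even longer example sentence right here"]

enc = tokenizer(sentences,
                max_length=8,          # upper bound on tokens, including [CLS] and [SEP]
                truncation=True,       # cut sequences longer than max_length
                padding="max_length",  # pad shorter sequences up to max_length
                return_tensors="pt")

print(enc["input_ids"].shape)      # (3, 8): every row is exactly max_length long
print(enc["attention_mask"][0])    # 1s for real tokens, 0s for [PAD] positions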
24 votes · 1 answer · 62k views
PyTorch BERT TypeError: forward() got an unexpected keyword argument 'labels'
Training a BERT model using PyTorch transformers (following the tutorial here).
The following statement in the tutorial:
loss = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=...
23 votes · 3 answers · 21k views
Cased vs. uncased BERT models in spaCy and training data
I want to use spaCy's pretrained BERT model for text classification, but I'm a little confused about cased/uncased models. I read somewhere that cased models should only be used when there is a chance ...
22 votes · 6 answers · 27k views
AttributeError: module 'torch' has no attribute '_six'. Bert model in Pytorch
I tried to load a pre-trained model using the BertModel class in PyTorch.
I have _six.py under torch, but it still shows module 'torch' has no attribute '_six'
import torch
from pytorch_pretrained_bert ...
21 votes · 1 answer · 30k views
PyTorch: RuntimeError: Input, output and indices must be on the current device
I am running a BERT model on torch. It's a multi-class sentiment classification task with about 30,000 rows. I have already put everything on CUDA, but I'm not sure why I'm getting the following runtime ...
19 votes · 5 answers · 68k views
Pytorch: IndexError: index out of range in self. How to solve?
This training code is based on the run_glue.py script found here:
# Set the seed value all over the place to make this reproducible.
seed_val = 42
random.seed(seed_val)
np.random.seed(seed_val)
torch....
18 votes · 1 answer · 12k views
BertForSequenceClassification vs. BertForMultipleChoice for sentence multi-class classification
I'm working on a text classification problem (e.g. sentiment analysis), where I need to classify a text string into one of five classes.
I just started using the Huggingface Transformer package and ...
17 votes · 2 answers · 33k views
The size of tensor a (707) must match the size of tensor b (512) at non-singleton dimension 1
I am trying to do text classification using a pretrained BERT model. I trained the model on my dataset, and I am now in the testing phase. I know that BERT can only take up to 512 tokens, so I wrote an if condition ...
17 votes · 5 answers · 67k views
Transformer: Error importing packages. "ImportError: cannot import name 'SAVE_STATE_WARNING' from 'torch.optim.lr_scheduler'"
I am working on a machine learning project on Google Colab; it seems there has recently been an issue when trying to import packages from transformers. The error message says:
ImportError: cannot import ...
17 votes · 2 answers · 11k views
Difficulty in understanding the tokenizer used in the RoBERTa model
from transformers import AutoModel, AutoTokenizer
tokenizer1 = AutoTokenizer.from_pretrained("roberta-base")
tokenizer2 = AutoTokenizer.from_pretrained("bert-base-cased")
sequence = "A Titan RTX has ...
16 votes · 2 answers · 31k views
Download pre-trained sentence-transformers model locally
I am using the SentenceTransformers library (here: https://pypi.org/project/sentence-transformers/#pretrained-models) for creating embeddings of sentences using the pre-trained model bert-base-nli-...
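A hedged sketch of one way to keep the model locally: load it once (which downloads it into the Hugging Face cache), save it to a directory of your choice, and load from that path afterwards. The target directory name is a placeholder.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bert-base-nli-mean-tokens")   # downloads on first call
model.save("./local-bert-base-nli-mean-tokens")            # writes weights + config locally

# later, offline:
model = SentenceTransformer("./local-bert-base-nli-mean-tokens")
embeddings = model.encode(["a test sentence"])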
16 votes · 3 answers · 23k views
How to understand hidden_states of the returns in BertModel? (huggingface-transformers)
Returns last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)): Sequence of hidden-states at the output of the last layer of the model.
pooler_output (torch....
15 votes · 3 answers · 21k views
BERT sentence embeddings from transformers
I'm trying to get sentence vectors from hidden states in a BERT model. Looking at the huggingface BertModel instructions here, which say:
from transformers import BertTokenizer, BertModel
tokenizer = ...
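A minimal sketch of the usual continuation of that snippet (not the asker's exact code): run the model, take last_hidden_state, and mean-pool over the real (non-padding) tokens to get one vector per sentence. The example sentences are placeholders.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = ["The cat sat on the mat.", "Dogs are great companions."]
enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    last_hidden = model(**enc).last_hidden_state       # (batch, seq_len, 768)

mask = enc["attention_mask"].unsqueeze(-1)             # zero out [PAD] positions
sentence_vectors = (last_hidden * mask).sum(1) / mask.sum(1)
print(sentence_vectors.shape)                          # (2, 768)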
15 votes · 3 answers · 35k views
Python: BERT Error - Some weights of the model checkpoint at were not used when initializing BertModel
I am creating an entity extraction model in PyTorch using bert-base-uncased but when I try to run the model I get this error:
Error:
Some weights of the model checkpoint at D:\Transformers\bert-entity-...
15 votes · 6 answers · 40k views
With BERT Text Classification, ValueError: too many dimensions 'str' error occurring
Trying to make a sentiment classifier for texts with a BERT model but getting ValueError: too many dimensions 'str'.
This is the DataFrame of the train data values, so they are the train_labels:
0 notr
...
15 votes · 2 answers · 10k views
BertModel transformers outputs string instead of tensor
I'm following this tutorial that codes a sentiment analysis classifier using BERT with the huggingface library, and I'm seeing very odd behavior. When trying the BERT model with a sample text I get a ...
14 votes · 1 answer · 14k views
PyTorch torch.no_grad() versus requires_grad=False
I'm following a PyTorch tutorial which uses the BERT NLP model (feature extractor) from the Huggingface Transformers library. There are two pieces of interrelated code for gradient updates that I don'...
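A small illustrative sketch (not from the tutorial) of the difference: requires_grad=False marks specific tensors or parameters as frozen, so they never receive gradients, while torch.no_grad() is a context that disables graph construction for whatever runs inside it.

import torch

w = torch.randn(3, requires_grad=True)
frozen = torch.randn(3, requires_grad=False)   # e.g. a frozen parameter

y = (w * frozen).sum()
y.backward()
print(w.grad)        # populated
print(frozen.grad)   # None: this leaf never takes part in gradient updates

with torch.no_grad():
    z = (w * 2).sum()        # no graph is recorded inside this block
print(z.requires_grad)       # False, so calling z.backward() would raise an error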
13 votes · 4 answers · 8k views
How to fine-tune BERT on unlabeled data?
I want to fine-tune BERT on a specific domain. I have texts of that domain in text files. How can I use these to fine-tune BERT?
I am looking here currently.
My main objective is to get sentence ...
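A hedged sketch of the usual answer, domain-adaptive pre-training with masked language modelling, assuming the domain text sits in a plain-text file with one example per line. The file path, hyperparameters and output directory are placeholders; newer code would typically build the dataset with the datasets library instead of the legacy LineByLineTextDataset helper.

from transformers import (BertTokenizerFast, BertForMaskedLM,
                          DataCollatorForLanguageModeling, LineByLineTextDataset,
                          Trainer, TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

dataset = LineByLineTextDataset(tokenizer=tokenizer,
                                file_path="domain_corpus.txt", block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm=True, mlm_probability=0.15)

args = TrainingArguments(output_dir="bert-domain-mlm",
                         num_train_epochs=1, per_device_train_batch_size=16)
Trainer(model=model, args=args, data_collator=collator,
        train_dataset=dataset).train()

The adapted checkpoint can then be loaded into a sentence-embedding or classification head as usual.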
12 votes · 2 answers · 12k views
How to train BERT from scratch on a new domain for both MLM and NSP?
I’m trying to train BERT model from scratch using my own dataset using HuggingFace library. I would like to train the model in a way that it has the exact architecture of the original BERT model.
In ...
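A minimal sketch (not the asker's code) of the architectural side of the question: a BertConfig with default values reproduces the original bert-base architecture, and BertForPreTraining carries both pre-training heads, MLM and NSP. The sentence pair and labels below are dummies just to show the two losses; a real run would mask 15% of tokens and stream sentence pairs from the corpus.

import torch
from transformers import BertConfig, BertForPreTraining, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")  # reuse the original vocab
config = BertConfig()                 # defaults match the original bert-base architecture
model = BertForPreTraining(config)    # randomly initialised, i.e. trained from scratch

# Encode a sentence pair the way BERT pre-training expects: [CLS] A [SEP] B [SEP]
enc = tokenizer("The cat sat on the mat.", "It was very fluffy.", return_tensors="pt")

# labels: MLM targets (here just the unmasked inputs); next_sentence_label: 0 = B follows A
outputs = model(**enc, labels=enc["input_ids"], next_sentence_label=torch.tensor([0]))

print(outputs.loss)                           # combined MLM + NSP loss
print(outputs.prediction_logits.shape)        # (1, seq_len, vocab_size): MLM head
print(outputs.seq_relationship_logits.shape)  # (1, 2): NSP head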
12 votes · 8 answers · 37k views
SSLError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /dslim/bert-base-NER/resolve/main/tokenizer_config.json
I am facing the issue below while loading the pretrained BERT model from HuggingFace, due to an SSL certificate error.
Error:
SSLError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries ...
12 votes · 1 answer · 8k views
What is the difference between Sentence Encodings and Contextualized Word Embeddings?
I have seen both terms used while reading papers about BERT and ELMo so I wonder if there is a difference between them.
12 votes · 4 answers · 11k views
Training TFBertForSequenceClassification with custom X and Y data
I am working on a text classification problem, for which I am trying to train my model on TFBertForSequenceClassification, given in the huggingface-transformers library.
I followed the example given on ...
12 votes · 3 answers · 37k views
OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index']
When I load the BERT pretrained model online I get this error OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index'] found in directory uncased_L-12_H-768_A-12 or '...
12 votes · 2 answers · 5k views
Get probability of multi-token word in MASK position
It is relatively easy to get a token's probability according to a language model, as the snippet below shows. You can get the output of a model, restrict yourself to the output of the masked token, ...
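For reference, a minimal single-token version of that idea (the multi-token case repeats it over several [MASK] positions). The sentence and target word are illustrative.

import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = f"The capital of France is {tokenizer.mask_token}."
enc = tokenizer(text, return_tensors="pt")
mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = model(**enc).logits                 # (1, seq_len, vocab_size)

probs = logits[0, mask_pos].softmax(dim=-1)
target_id = tokenizer.convert_tokens_to_ids("paris")
print(probs[target_id].item())                   # P("paris" | context) at the masked position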
11 votes · 2 answers · 14k views
Continual pre-training vs. Fine-tuning a language model with MLM
I have some custom data I want to use to further pre-train the BERT model. I’ve tried the two following approaches so far:
Starting with a pre-trained BERT checkpoint and continuing the pre-training ...
11 votes · 1 answer · 2k views
What's the difference between the "self-attention mechanism" and a "fully-connected" layer?
I am confused by these two structures. In theory, the outputs of both are connected to their inputs. What magic makes the "self-attention mechanism" more powerful than a fully-connected layer?
11 votes · 2 answers · 13k views
How to use Transformers for text classification?
I have two questions about how to use the TensorFlow implementation of the Transformer for text classification.
First, it seems people mostly use only the encoder layer to do the text classification ...
11 votes · 3 answers · 15k views
Transformers pretrained model with dropout setting
I'm trying to use the huggingface pretrained model bert-base-uncased from the transformers library, but I want to increase the dropout. There isn't any mention of this in the from_pretrained method, but Colab ran the object ...
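A hedged sketch of one way to do this: override the dropout fields on the config and hand the config to from_pretrained. The 0.3 values are arbitrary examples.

from transformers import BertConfig, BertModel

config = BertConfig.from_pretrained("bert-base-uncased",
                                    hidden_dropout_prob=0.3,
                                    attention_probs_dropout_prob=0.3)
model = BertModel.from_pretrained("bert-base-uncased", config=config)
print(model.config.hidden_dropout_prob)   # 0.3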
11 votes · 1 answer · 6k views
What is so special about special tokens?
What exactly is the difference between a "token" and a "special token"?
I understand the following:
what a typical token is
what a typical special token is: MASK, UNK, SEP, etc.
when ...
11 votes · 2 answers · 3k views
Removing SEP token in Bert for text classification
Given a sentiment classification dataset, I want to fine-tune Bert.
As you know, BERT was created to predict the next sentence given the current sentence. Thus, to make the network aware of this, ...
10 votes · 4 answers · 14k views
Is it necessary to do stopword removal, stemming/lemmatization for text classification while using spaCy or BERT?
Are stopword removal, stemming and lemmatization necessary for text classification when using spaCy, BERT or other advanced NLP models to get the vector embedding of the text?
text="The ...
10 votes · 2 answers · 21k views
How to add a new special token to the tokenizer?
I want to build a multi-class classification model for which I have conversational data as input for the BERT model (using bert-base-uncased).
QUERY: I want to ask a question.
ANSWER: Sure, ask away.
...
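A hedged sketch of the usual approach: register markers such as [QUERY] and [ANSWER] as additional special tokens and resize the embedding matrix so the model accepts the new ids. The marker names follow the conversational format in the question; the classifier setup is an assumption.

from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

tokenizer.add_special_tokens({"additional_special_tokens": ["[QUERY]", "[ANSWER]"]})
model.resize_token_embeddings(len(tokenizer))   # grow the embeddings for the new ids

print(tokenizer.tokenize("[QUERY] I want to ask a question. [ANSWER] Sure, ask away."))
# the new markers stay intact instead of being split into word pieces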
10 votes · 3 answers · 9k views
Using trained BERT Model and Data Preprocessing
When using pre-trained BERT embeddings from PyTorch (which are then fine-tuned), should the text data fed into the model be pre-processed as in any standard NLP task?
For instance, should ...
10 votes · 1 answer · 14k views
How to get intermediate layers' output of pre-trained BERT model in HuggingFace Transformers library?
(I'm following this pytorch tutorial about BERT word embeddings, and in the tutorial the author accesses the intermediate layers of the BERT model.)
What I want is to access the last, let's say, 4 ...
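A hedged sketch of one way to do that: ask the model to return all hidden states and slice off the last four. The model name and the choice to sum the layers are illustrative.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

enc = tokenizer("Here is some text to encode.", return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**enc).hidden_states   # tuple: embedding layer + 12 encoder layers

last_four = torch.stack(hidden_states[-4:])      # (4, 1, seq_len, 768)
token_vectors = last_four.sum(dim=0)             # e.g. sum the last four layers per token
print(len(hidden_states), token_vectors.shape)   # 13, then (1, seq_len, 768)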
10 votes · 1 answer · 3k views
How to use an existing huggingface-transformers model with spaCy?
I'm here to ask you guys if it is possible to use an existing trained huggingface-transformers model with spaCy.
My first naive attempt was to load it via spacy.load('bert-base-uncased'), but it didn't ...
10 votes · 3 answers · 12k views
BertTokenizer - when encoding and decoding sequences extra spaces appear
When using Transformers from HuggingFace I am facing a problem with the encoding and decoding method.
I have the following string:
test_string = 'text with percentage%'
Then I am running the ...
9 votes · 2 answers · 12k views
How to find the closest word to a vector using BERT
I am trying to get the textual representation (or the closest word) of a given word embedding using BERT. Basically I am trying to get functionality similar to gensim:
>>> your_word_vector = ...
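A hedged sketch of a gensim-style nearest-word lookup against BERT's static input embedding matrix. Note that contextual output vectors live in a related but different space, so this only approximates "closest word"; here the query vector is simply an existing row of the matrix.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

emb = model.get_input_embeddings().weight.detach()          # (vocab_size, 768)
your_word_vector = emb[tokenizer.convert_tokens_to_ids("king")]

sims = torch.nn.functional.cosine_similarity(your_word_vector.unsqueeze(0), emb)
best = sims.topk(5).indices
print(tokenizer.convert_ids_to_tokens(best.tolist()))        # nearest vocabulary tokens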
9 votes · 1 answer · 23k views
RuntimeError: The size of tensor a (4000) must match the size of tensor b (512) at non-singleton dimension 1
I'm trying to build a model for document classification. I'm using BERT with PyTorch.
I got the BERT model with the code below.
bert = AutoModel.from_pretrained('bert-base-uncased')
This is the code for ...
9 votes · 1 answer · 24k views
BERT tokenizer & model download
I'm a beginner. I'm working with BERT. However, due to the security of the company network, the following code cannot download the BERT model directly.
tokenizer = BertTokenizer.from_pretrained('bert-...
9 votes · 2 answers · 7k views
How to get all documents per topic in bertopic modeling
I have a dataset and am trying to convert it to topics using BERTopic modeling, but the problem is I can't get all the documents of a topic. BERTopic only returns 3 documents per topic.
topic_model = ...
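A hedged sketch of the usual workaround: fit_transform returns one topic id per document, so the full assignment can be grouped manually instead of relying on the few representative documents the model stores per topic. The 20-newsgroups corpus is only a stand-in for the asker's dataset.

import pandas as pd
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

docs = fetch_20newsgroups(subset="train",
                          remove=("headers", "footers", "quotes")).data[:500]

topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)

df = pd.DataFrame({"document": docs, "topic": topics})
docs_per_topic = df.groupby("topic")["document"].apply(list)
print(docs_per_topic)   # every document assigned to each topic, not just 3 exemplars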
9 votes · 2 answers · 3k views
BERT output not deterministic
BERT output is not deterministic.
I expect the output values to be deterministic when I feed in the same input, but with my BERT model the values keep changing. Awkwardly enough, the same value is returned twice, ...
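The usual culprit is dropout being active because the model is in training mode; a hedged sketch of the check:

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()   # make sure dropout is disabled (model.train() re-enables it)

enc = tokenizer("the same input twice", return_tensors="pt")
with torch.no_grad():
    out1 = model(**enc).last_hidden_state
    out2 = model(**enc).last_hidden_state

print(torch.allclose(out1, out2))   # True once dropout is off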
9 votes · 2 answers · 12k views
Outputting attention for bert-base-uncased with huggingface/transformers (torch)
I was following a paper on BERT-based lexical substitution (specifically trying to implement equation (2) - if someone has already implemented the whole paper that would also be great). Thus, I wanted ...
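A hedged sketch of retrieving the attention matrices: request output_attentions and read one tuple entry per layer, each of shape (batch, heads, seq_len, seq_len). The example sentence is a placeholder.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

enc = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")
with torch.no_grad():
    attentions = model(**enc).attentions       # tuple with one entry per layer

print(len(attentions), attentions[0].shape)    # 12, then (1, 12, seq_len, seq_len)
print(attentions[-1][0, 0])                    # head 0 of the last layer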
9 votes · 4 answers · 32k views
How to resolve ERROR: Could not build wheels for hdbscan, which is required to install pyproject.toml-based projects
I am trying to install bertopic and I got this error:
pip install bertopic
Collecting bertopic
> Using cached bertopic-0.11.0-py2.py3-none-any.whl (76 kB)
> Collecting ...
9 votes · 1 answer · 7k views
Clause extraction / long sentence segmentation in python
I'm currently working on a project involving sentence vectors (from a RoBERTa pretrained model). These vectors are lower quality when sentences are long, and my corpus contains many long sentences ...