All Questions
Tagged with bert-language-model pytorch
424 questions

50 votes · 10 answers · 125k views
CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)
I got the following error when I ran my PyTorch deep learning model in Google Colab
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
1370 ret = ...
39 votes · 3 answers · 36k views
dropout(): argument 'input' (position 1) must be Tensor, not str when using Bert with Huggingface
My code was working fine and when I tried to run it today without changing anything I got the following error:
dropout(): argument 'input' (position 1) must be Tensor, not str
Would appreciate if ...
24 votes · 1 answer · 49k views
How do the max_length, padding and truncation arguments work in HuggingFace's BertTokenizerFast.from_pretrained('bert-base-uncased')?
I am working on a text classification problem where I want to use the BERT model as the base, followed by Dense layers. I want to know how the 3 arguments work. For example, if I have 3 sentences ...
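A minimal sketch of how those three arguments interact (assuming bert-base-uncased can be downloaded): padding='max_length' pads every sequence up to max_length, truncation=True cuts anything longer, and both counts include the [CLS] and [SEP] special tokens.
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
sentences = ["a short sentence", "a slightly longer example sentence", "hi"]
enc = tokenizer(sentences, padding='max_length', truncation=True, max_length=8, return_tensors='pt')
print(enc['input_ids'].shape)    # torch.Size([3, 8]) - every row padded/truncated to length 8
print(enc['attention_mask'])     # 1 for real tokens, 0 for padding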
24 votes · 1 answer · 62k views
PyTorch BERT TypeError: forward() got an unexpected keyword argument 'labels'
Training a BERT model using PyTorch transformers (following the tutorial here).
The following statement in the tutorial:
loss = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=...
22 votes · 6 answers · 27k views
AttributeError: module 'torch' has no attribute '_six'. Bert model in Pytorch
I tried to load a pre-trained model using the BertModel class in PyTorch.
I have _six.py under torch, but it still shows module 'torch' has no attribute '_six'
import torch
from pytorch_pretrained_bert ...
21 votes · 1 answer · 30k views
PyTorch: RuntimeError: Input, output and indices must be on the current device
I am running a BERT model on torch. It's a multi-class sentiment classification task with about 30,000 rows. I have already put everything on CUDA, but I am not sure why I'm getting the following runtime ...
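That error typically means the model and the batch tensors are not on the same device. A minimal sketch of the usual fix, assuming a recent transformers version (the model name and label count below are placeholders):
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=5).to(device)

batch = tokenizer(["an example review"], return_tensors='pt')
# every input tensor must live on the same device as the model
batch = {k: v.to(device) for k, v in batch.items()}
labels = torch.tensor([3], device=device)
outputs = model(**batch, labels=labels)
print(outputs.loss.item())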
19 votes · 5 answers · 68k views
Pytorch: IndexError: index out of range in self. How to solve?
This training code is based on the run_glue.py script found here:
# Set the seed value all over the place to make this reproducible.
seed_val = 42
random.seed(seed_val)
np.random.seed(seed_val)
torch....
18 votes · 1 answer · 12k views
BertForSequenceClassification vs. BertForMultipleChoice for sentence multi-class classification
I'm working on a text classification problem (e.g. sentiment analysis), where I need to classify a text string into one of five classes.
I just started using the Huggingface Transformer package and ...
17 votes · 2 answers · 33k views
The size of tensor a (707) must match the size of tensor b (512) at non-singleton dimension 1
I am trying to do text classification using a pretrained BERT model. I trained the model on my dataset, and in the testing phase, since I know that BERT can only take up to 512 tokens, I wrote an if condition ...
17 votes · 2 answers · 11k views
Difficulty in understanding the tokenizer used in Roberta model
from transformers import AutoModel, AutoTokenizer
tokenizer1 = AutoTokenizer.from_pretrained("roberta-base")
tokenizer2 = AutoTokenizer.from_pretrained("bert-base-cased")
sequence = "A Titan RTX has ...
16 votes · 3 answers · 23k views
How to understand hidden_states of the returns in BertModel?(huggingface-transformers)
Returns last_hidden_state (torch.FloatTensor of shape (batch_size,
sequence_length, hidden_size)): Sequence of hidden-states at the
output of the last layer of the model.
pooler_output (torch....
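A minimal sketch of what those returns look like for bert-base-uncased, assuming a recent transformers version where outputs come back as a dataclass:
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True)

inputs = tokenizer("hello world", return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size), e.g. [1, 4, 768]
print(len(outputs.hidden_states))       # 13 for bert-base: embedding output + one tensor per layer
print(outputs.pooler_output.shape)      # (batch_size, hidden_size): the [CLS] state through a Linear + tanh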
15 votes · 3 answers · 35k views
Python: BERT Error - Some weights of the model checkpoint at were not used when initializing BertModel
I am creating an entity extraction model in PyTorch using bert-base-uncased but when I try to run the model I get this error:
Error:
Some weights of the model checkpoint at D:\Transformers\bert-entity-...
14 votes · 1 answer · 14k views
PyTorch torch.no_grad() versus requires_grad=False
I'm following a PyTorch tutorial which uses the BERT NLP model (feature extractor) from the Huggingface Transformers library. There are two pieces of interrelated code for gradient updates that I don'...
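A minimal sketch of the difference: torch.no_grad() is a context manager that skips graph construction for everything computed inside it, while requires_grad=False freezes specific parameters so they receive no gradient even though the rest of the graph is still built.
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)

# 1) torch.no_grad(): no autograd graph at all inside the block
with torch.no_grad():
    y = layer(torch.randn(1, 4))
print(y.requires_grad)        # False

# 2) requires_grad=False: the frozen weights get no gradient, but gradients
#    still flow to anything else that requires them
for p in layer.parameters():
    p.requires_grad = False
x = torch.randn(1, 4, requires_grad=True)
layer(x).sum().backward()
print(layer.weight.grad)      # None - the weight was frozen
print(x.grad is not None)     # True - the gradient still reached the input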
13 votes · 4 answers · 8k views
How to fine tune BERT on unlabeled data?
I want to fine tune BERT on a specific domain. I have texts of that domain in text files. How can I use these to fine tune BERT?
I am looking here currently.
My main objective is to get sentence ...
12 votes · 4 answers · 11k views
Training TFBertForSequenceClassification with custom X and Y data
I am working on a text classification problem, for which I am trying to train my model on TFBertForSequenceClassification provided in the huggingface-transformers library.
I followed the example given on ...
12 votes · 3 answers · 37k views
OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index']
When I load the BERT pretrained model online I get this error OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index'] found in directory uncased_L-12_H-768_A-12 or '...
12 votes · 2 answers · 5k views
Get probability of multi-token word in MASK position
It is relatively easy to get a token's probability according to a language model, as the snippet below shows. You can get the output of a model, restrict yourself to the output of the masked token, ...
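For reference, a minimal sketch of the single-token case the excerpt describes (bert-base-uncased and the word 'paris' are only illustrative); the multi-token case then has to combine the scores of several sub-word pieces, which is what the question is about:
import torch
from transformers import BertTokenizerFast, BertForMaskedLM

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors='pt')
mask_pos = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    logits = model(**inputs).logits
probs = logits[0, mask_pos].softmax(dim=-1)
print(probs[0, tokenizer.convert_tokens_to_ids('paris')].item())  # P("paris" | context)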
11 votes · 1 answer · 2k views
What's the difference between the "self-attention mechanism" and the "fully-connected" layer?
I am confused by these two structures. In theory, the outputs of both are connected to all of their inputs. What magic makes the 'self-attention mechanism' more powerful than a fully-connected layer?
10 votes · 3 answers · 9k views
Using trained BERT Model and Data Preprocessing
When using pre-trained BERT embeddings from PyTorch (which are then fine-tuned), should the text data fed into the model be pre-processed as in any standard NLP task?
For instance, should ...
10 votes · 3 answers · 12k views
BertTokenizer - when encoding and decoding sequences extra spaces appear
When using Transformers from HuggingFace I am facing a problem with the encoding and decoding methods.
I have the following string:
test_string = 'text with percentage%'
Then I am running the ...
9 votes · 1 answer · 23k views
RuntimeError: The size of tensor a (4000) must match the size of tensor b (512) at non-singleton dimension 1
I'm trying to build a model for document classification. I'm using BERT with PyTorch.
I got the BERT model with the code below.
bert = AutoModel.from_pretrained('bert-base-uncased')
This is the code for ...
9 votes · 1 answer · 24k views
BERT tokenizer & model download
I'm a beginner and I'm working with BERT. However, due to the security of the company network, the following code does not download the BERT model directly.
tokenizer = BertTokenizer.from_pretrained('bert-...
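A common workaround, sketched under the assumption that some machine with network access can fetch the files: download the model artifacts there (config.json, vocab.txt, pytorch_model.bin), copy the folder inside the company network, and point from_pretrained at the local directory (the path below is a placeholder):
from transformers import BertTokenizer, BertModel

local_dir = './bert-base-uncased'              # folder copied from a machine with internet access
tokenizer = BertTokenizer.from_pretrained(local_dir)
model = BertModel.from_pretrained(local_dir)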
9 votes · 1 answer · 8k views
How do I use BertForMaskedLM or BertModel to calculate perplexity of a sentence?
I want to use BertForMaskedLM or BertModel to calculate the perplexity of a sentence, so I wrote code like this:
import numpy as np
import torch
import torch.nn as nn
from transformers import ...
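A minimal sketch of one common approach, pseudo-perplexity with BertForMaskedLM: mask each token in turn, score it, and exponentiate the average negative log-likelihood. This is one reasonable definition for a masked LM, not the only one.
import math
import torch
from transformers import BertTokenizerFast, BertForMaskedLM

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

def pseudo_perplexity(sentence):
    ids = tokenizer(sentence, return_tensors='pt')['input_ids'][0]
    nll, n = 0.0, 0
    for i in range(1, len(ids) - 1):            # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id     # mask one position at a time
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        nll -= logits[0, i].log_softmax(dim=-1)[ids[i]].item()
        n += 1
    return math.exp(nll / n)

print(pseudo_perplexity("The cat sat on the mat."))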
8 votes · 1 answer · 9k views
How to calculate perplexity of a sentence using huggingface masked language models?
I have several masked language models (mainly Bert, Roberta, Albert, Electra). I also have a dataset of sentences. How can I get the perplexity of each sentence?
From the huggingface documentation ...
8 votes · 3 answers · 5k views
How to compute mean/max of HuggingFace Transformers BERT token embeddings with attention mask?
I'm using the HuggingFace Transformers BERT model, and I want to compute a summary vector (a.k.a. embedding) over the tokens in a sentence, using either the mean or max function. The complication is ...
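A minimal sketch of masked pooling (the model name is illustrative): zero out padding positions before averaging, and push them to -inf before taking the max, so padding never contributes.
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

enc = tokenizer(["a short one", "a somewhat longer sentence here"], padding=True, return_tensors='pt')
with torch.no_grad():
    hidden = model(**enc).last_hidden_state            # (batch, seq_len, 768)

mask = enc['attention_mask'].unsqueeze(-1).float()     # (batch, seq_len, 1)
mean_pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
max_pooled = hidden.masked_fill(mask == 0, float('-inf')).max(dim=1).values
print(mean_pooled.shape, max_pooled.shape)             # both (2, 768)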
7 votes · 1 answer · 5k views
How exactly should the input file be formatted for the language model finetuning (BERT through Huggingface Transformers)?
I wanted to employ the examples/run_lm_finetuning.py from the Huggingface Transformers repository on a pretrained Bert model. However, from following the documentation it is not evident how a corpus ...
7 votes · 2 answers · 14k views
The model did not return a loss from the inputs - LabSE error
I want to fine-tune LabSE for question answering using the SQuAD dataset, and I got this error:
ValueError: The model did not return a loss from the inputs, only the following keys: last_hidden_state,...
7 votes · 2 answers · 6k views
The essence of learnable positional embeddings: does a learned embedding improve outcomes?
I was recently reading the BERT source code from the Hugging Face project. I noticed that the so-called "learnable position encoding" seems to refer to a specific nn.Parameter layer when it ...
7 votes · 1 answer · 8k views
Mismatched size on BertForSequenceClassification from Transformers and multiclass problem
I just trained a BERT model on a Dataset composed by products and labels (departments) for an e-commerce website. It's a multiclass problem. I used BertForSequenceClassification to predict the ...
7 votes · 1 answer · 4k views
ModuleNotFoundError: No module named 'torch.utils._pytree'
I have installed PyTorch 1.7.1, and it works very well. However, when I try to run this code:
import transformers
from transformers import BertTokenizer
from transformers.models.bert.modeling_bert ...
6 votes · 1 answer · 8k views
huggingface bert showing poor accuracy / f1 score [pytorch]
I am trying BertForSequenceClassification for a simple article classification task.
No matter how I train it (freeze all layers but the classification layer, all layers trainable, last k layers ...
6 votes · 1 answer · 34k views
Pytorch expects each tensor to be equal size
When running this code: embedding_matrix = torch.stack(embeddings)
I got this error:
RuntimeError: stack expects each tensor to be equal size, but got [7, 768] at entry 0 and [8, 768] at entry 1
I'm ...
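One common fix, sketched with random tensors of the shapes from the error: pad the shorter sequences to a common length before stacking.
import torch
from torch.nn.utils.rnn import pad_sequence

embeddings = [torch.randn(7, 768), torch.randn(8, 768)]        # different token counts
embedding_matrix = pad_sequence(embeddings, batch_first=True)  # zero-pads to (2, 8, 768)
print(embedding_matrix.shape)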
5 votes · 1 answer · 13k views
TypeError: linear(): argument 'input' (position 1) must be Tensor, not str
So I've been trying to work through an example of BERT that I found on GitHub, as it's the first time I'm trying to use BERT and see how it works. The repository I'm working with is the following: https://...
5 votes · 1 answer · 5k views
How to get the probability of a particular token(word) in a sentence given the context
I'm trying to calculate the probability or any type of score for words in a sentence using NLP. I've tried this approach with GPT2 model using Huggingface Transformers library, but, I couldn't get ...
5 votes · 1 answer · 818 views
How to save only the classifier-layer parameters of a pretrained BERT model, due to memory concerns?
I fine-tuned the pretrained model here by freezing all layers except the classifier layers, and I saved the weight file with PyTorch in .bin format.
Now instead of loading the 400 MB pre-trained ...
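A minimal sketch of the idea (model name, label count and file name are placeholders): keep only the classifier head's entries from the state_dict, save those, and later rebuild the backbone from the hub and load the head on top with strict=False.
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=5)
classifier_state = {k: v for k, v in model.state_dict().items() if k.startswith('classifier.')}
torch.save(classifier_state, 'classifier_only.bin')        # a few kB instead of ~400 MB

model2 = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=5)
model2.load_state_dict(torch.load('classifier_only.bin'), strict=False)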
5 votes · 2 answers · 2k views
BERT-based NER model giving inconsistent prediction when deserialized
I am trying to train an NER model using the HuggingFace transformers library on Colab cloud GPUs, pickle it and load the model on my own CPU to make predictions.
Code
The model is the following:
from ...
5 votes · 2 answers · 3k views
Can I use BERT as a feature extractor without any finetuning on my specific data set?
I'm trying to solve a multilabel classification task of 10 classes with a relatively balanced training set consisting of ~25K samples and an evaluation set consisting of ~5K samples.
I'm using the ...
5 votes · 1 answer · 2k views
Does BertForSequenceClassification classify on the CLS vector?
I'm using the Huggingface Transformer package and BERT with PyTorch. I'm trying to do 4-way sentiment classification and am using BertForSequenceClassification to build a model that eventually leads ...
5 votes · 1 answer · 9k views
Get the value of '[UNK]' in BERT
I have designed a model based on BERT to solve an NER task. I am using the transformers library with the "dccuchile/bert-base-spanish-wwm-cased" pre-trained model. The problem comes when my model detects an ...
4 votes · 3 answers · 5k views
How to apply max_length to truncate the token sequence from the left in a HuggingFace tokenizer?
In the HuggingFace tokenizer, applying the max_length argument specifies the length of the tokenized text. I believe it truncates the sequence to max_length-2 (if truncation=True) by cutting the ...
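A minimal sketch using the truncation_side attribute, which is available on reasonably recent transformers releases (older versions need manual slicing of the token ids instead):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
tokenizer.truncation_side = 'left'
ids = tokenizer("one two three four five six seven", truncation=True, max_length=6).input_ids
print(tokenizer.convert_ids_to_tokens(ids))
# excess tokens are now dropped from the left, e.g. ['[CLS]', 'four', 'five', 'six', 'seven', '[SEP]']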
4 votes · 1 answer · 788 views
pytorch model evaluation slow when deployed on kubernetes
I would like to make the result of a text classification model (finBERT pytorch model) available through an endpoint that is deployed on Kubernetes.
The whole pipeline is working but it's super slow ...
4 votes · 1 answer · 2k views
Correct Way to Fine-Tune/Train HuggingFace's Model from scratch (PyTorch)
For example, I want to train a BERT model from scratch but using the existing configuration. Is the following code the correct way to do so?
model = BertModel.from_pretrained('bert-base-cased')
model....
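For comparison, a minimal sketch of the difference: from_pretrained() loads the trained weights, whereas building the model from a configuration alone gives the same architecture with randomly initialised weights, which is what "from scratch" usually means.
from transformers import BertConfig, BertModel

config = BertConfig.from_pretrained('bert-base-cased')   # architecture/configuration only
model = BertModel(config)                                 # random weights, bert-base-cased layout
# model = BertModel.from_pretrained('bert-base-cased')    # this would start from the trained weights instead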
4 votes · 1 answer · 5k views
Why do we need state_dict = state_dict.copy()
I want to load the weights of a pre-trained model onto my local model. I don't understand why state_dict = state_dict.copy() is necessary if the two networks have the same state_dict names.
# copy ...
4 votes · 2 answers · 4k views
How to convert model.safetensor to pytorch_model.bin?
I'm fine-tuning a pre-trained BERT model and I have a weird problem:
When I'm fine-tuning using the CPU, the code saves the model like this:
With the "pytorch_model.bin". But when I use ...
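A minimal sketch of the conversion (paths are placeholders): read the tensors out of the safetensors file and re-save them in the classic pickle format; alternatively, save_pretrained(..., safe_serialization=False) writes pytorch_model.bin directly.
import torch
from safetensors.torch import load_file

state_dict = load_file('model_dir/model.safetensors')     # dict of tensors
torch.save(state_dict, 'model_dir/pytorch_model.bin')     # classic PyTorch format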
4 votes · 1 answer · 2k views
PyTorch tokenizers: how to truncate tokens from left?
As we can see in the below code snippet, specifying max_length and truncation for a tokenizer cuts excess tokens from the left:
tokenizer("hello, my name", truncation=True, max_length=6).input_ids
...
4 votes · 1 answer · 5k views
PyTorch GPU memory leak during inference
I am trying to encode documents sentence-wise with a huggingface transformer module. I'm using the very small google/bert_uncased_L-2_H-128_A-2 pretrained model with the following code:
def ...
4 votes · 1 answer · 3k views
How to process TransformerEncoderLayer output in pytorch
I am trying to use bio-bert sentence embeddings for text classification of longer pieces of text.
As it currently stands I standardize the number of sentences in each piece of text (some sentences are ...
4 votes · 0 answers · 469 views
How to train a Masked Language Model with a big text corpus(200GB) using PyTorch?
Recently I have been training a masked language model with a big text corpus (200 GB) using transformers. The training data is too big to fit into a computer equipped with 512 GB of memory and 8 V100 (32 GB) GPUs. Is it ...
4 votes · 0 answers · 1k views
Word embeddings with BERT and map tensors to words
I am trying to aggregate BERT embeddings at the token level. For each token in the corpus vocabulary, I would like to create a list of all their contextual embeddings and average them to get one ...
3 votes · 4 answers · 18k views
Cannot import BertModel from transformers
I am trying to import BertModel from transformers, but it fails. This is the code I am using:
from transformers import BertModel, BertForMaskedLM
This is the error I get
ImportError: cannot import name '...