All Questions
Tagged with bert-language-model nlp
687 questions
94 votes · 10 answers · 95k views
How to use BERT for long text classification?
We know that BERT has a maximum length limit of 512 tokens, so if an article is much longer than 512 tokens, say 10,000 tokens of text, how can BERT be used?
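A common workaround for this limit (a sketch, not necessarily what the accepted answer does) is to split the document into overlapping chunks of at most 512 tokens, run the classifier on each chunk, and aggregate the per-chunk logits; the model name, stride, and averaging step below are illustrative choices.

    # Sketch: classify a long document by windowing over its tokens
    # and averaging the per-chunk logits.
    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    model.eval()

    def classify_long_text(text, max_len=512, stride=256):
        # Tokenize without truncation, then slide a window over the token ids.
        ids = tokenizer.encode(text, add_special_tokens=False)
        chunk_logits = []
        for start in range(0, max(1, len(ids)), stride):
            chunk = ids[start:start + max_len - 2]          # leave room for [CLS]/[SEP]
            input_ids = tokenizer.build_inputs_with_special_tokens(chunk)
            input_ids = torch.tensor([input_ids])
            with torch.no_grad():
                logits = model(input_ids).logits
            chunk_logits.append(logits)
            if start + max_len - 2 >= len(ids):
                break
        return torch.cat(chunk_logits).mean(dim=0)          # aggregate chunk predictions

    print(classify_long_text("some very long article ..."))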
50 votes · 10 answers · 125k views
CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)
I got the following error when I ran my PyTorch deep learning model in Google Colab:
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
1370 ret = ...
31 votes · 6 answers · 40k views
How to cluster similar sentences using BERT
For ELMo, FastText and Word2Vec, I'm averaging the word embeddings within a sentence and using HDBSCAN/KMeans clustering to group similar sentences.
A good example of the implementation can be seen ...
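A minimal sketch of the analogous BERT-based pipeline, assuming the sentence-transformers and scikit-learn packages; the model name all-MiniLM-L6-v2, the example sentences, and the cluster count are illustrative:

    # Sketch: embed sentences with sentence-transformers and cluster with KMeans.
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    sentences = [
        "The cat sits on the mat.",
        "A dog is playing in the garden.",
        "Stocks fell sharply on Monday.",
        "Markets dropped after the announcement.",
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(sentences)          # shape: (n_sentences, dim)

    kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(embeddings)
    for sentence, label in zip(sentences, kmeans.labels_):
        print(label, sentence)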
22 votes · 6 answers · 27k views
AttributeError: module 'torch' has no attribute '_six'. BERT model in PyTorch
I tried to load a pre-trained model using the BertModel class in PyTorch.
I have _six.py under torch, but it still shows module 'torch' has no attribute '_six'.
import torch
from pytorch_pretrained_bert ...
21 votes · 1 answer · 30k views
PyTorch: RuntimeError: Input, output and indices must be on the current device
I am running a BERT model on torch. It's a multi-class sentiment classification task with about 30,000 rows. I have already put everything on CUDA, but I'm not sure why I'm getting the following runtime ...
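This error typically means at least one tensor (often the input ids or labels) is still on the CPU while the model is on the GPU; a minimal sketch of moving both the model and every batch tensor to the same device, assuming a model and dataloader already exist:

    # Sketch: put the model and every input tensor on the same device.
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)                       # 'model' is your BERT classifier

    for batch in dataloader:                       # 'dataloader' yields dicts of tensors
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(input_ids=batch["input_ids"],
                        attention_mask=batch["attention_mask"],
                        labels=batch["labels"])
        loss = outputs.loss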
17 votes · 5 answers · 67k views
Transformer: Error importing packages. "ImportError: cannot import name 'SAVE_STATE_WARNING' from 'torch.optim.lr_scheduler'"
I am working on a machine learning project on Google Colab; it seems that recently there has been an issue when trying to import packages from transformers. The error message says:
ImportError: cannot import ...
17 votes · 2 answers · 11k views
Difficulty in understanding the tokenizer used in the RoBERTa model
from transformers import AutoModel, AutoTokenizer
tokenizer1 = AutoTokenizer.from_pretrained("roberta-base")
tokenizer2 = AutoTokenizer.from_pretrained("bert-base-cased")
sequence = "A Titan RTX has ...
16 votes · 3 answers · 23k views
How to understand hidden_states of the returns in BertModel? (huggingface-transformers)
Returns last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)): Sequence of hidden-states at the output of the last layer of the model.
pooler_output (torch....
15 votes · 3 answers · 35k views
Python: BERT Error - Some weights of the model checkpoint at were not used when initializing BertModel
I am creating an entity extraction model in PyTorch using bert-base-uncased but when I try to run the model I get this error:
Error:
Some weights of the model checkpoint at D:\Transformers\bert-entity-...
13 votes · 4 answers · 8k views
How to fine-tune BERT on unlabeled data?
I want to fine-tune BERT on a specific domain. I have texts of that domain in text files. How can I use these to fine-tune BERT?
I am looking here currently.
My main objective is to get sentence ...
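Fine-tuning on unlabeled domain text usually means continuing the masked-language-model objective; a hedged sketch using the HuggingFace Trainer, where the file path, block size, and hyperparameters are placeholders:

    # Sketch: continue masked-language-model training on raw domain text.
    from transformers import (BertTokenizerFast, BertForMaskedLM,
                              DataCollatorForLanguageModeling, LineByLineTextDataset,
                              Trainer, TrainingArguments)

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")

    # One sentence per line in a plain text file (path is a placeholder).
    dataset = LineByLineTextDataset(tokenizer=tokenizer,
                                    file_path="domain_corpus.txt",
                                    block_size=128)
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                               mlm=True, mlm_probability=0.15)

    args = TrainingArguments(output_dir="bert-domain-mlm",
                             num_train_epochs=1,
                             per_device_train_batch_size=16)

    Trainer(model=model, args=args,
            data_collator=collator, train_dataset=dataset).train()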
12 votes · 2 answers · 12k views
How to train BERT from scratch on a new domain for both MLM and NSP?
I’m trying to train a BERT model from scratch on my own dataset using the HuggingFace library. I would like to train the model in a way that it has the exact architecture of the original BERT model.
In ...
12 votes · 1 answer · 8k views
What is the difference between Sentence Encodings and Contextualized Word Embeddings?
I have seen both terms used while reading papers about BERT and ELMo so I wonder if there is a difference between them.
12 votes · 4 answers · 11k views
Training TFBertForSequenceClassification with custom X and Y data
I am working on a text classification problem, for which I am trying to train my model with TFBertForSequenceClassification from the huggingface-transformers library.
I followed the example given on ...
11 votes · 2 answers · 14k views
Continual pre-training vs. Fine-tuning a language model with MLM
I have some custom data I want to use to further pre-train the BERT model. I’ve tried the two following approaches so far:
Starting with a pre-trained BERT checkpoint and continuing the pre-training ...
11 votes · 2 answers · 13k views
How to use Transformers for text classification?
I have two questions about how to use the TensorFlow implementation of the Transformer for text classification.
First, it seems people mostly used only the encoder layer to do the text classification ...
11 votes · 1 answer · 6k views
What is so special about special tokens?
What exactly is the difference between a "token" and a "special token"?
I understand the following:
what is a typical token
what is a typical special token: MASK, UNK, SEP, etc
when ...
10 votes · 4 answers · 14k views
Is it necessary to do stopword removal and stemming/lemmatization for text classification when using spaCy or BERT?
Are stopword removal, stemming, and lemmatization necessary for text classification when using spaCy, BERT, or other advanced NLP models to get the vector embedding of the text?
text="The ...
10 votes · 3 answers · 9k views
Using trained BERT Model and Data Preprocessing
When using pre-trained BERT embeddings from PyTorch (which are then fine-tuned), should the text data fed into the model be pre-processed like in any standard NLP task?
For instance, should ...
9 votes · 2 answers · 12k views
How to find the closest word to a vector using BERT
I am trying to get the textual representation (or the closest word) of a given word embedding using BERT. Basically I am trying to get similar functionality as in gensim:
>>> your_word_vector = ...
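One hedged way to mimic gensim's most_similar is to compare the query vector against BERT's static input embedding matrix with cosine similarity; this only finds the nearest vocabulary token, not a context-dependent neighbour:

    # Sketch: find the vocabulary token whose input embedding is closest
    # (by cosine similarity) to a given vector.
    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    emb = model.get_input_embeddings().weight.detach()   # (vocab_size, hidden_size)

    def closest_tokens(vector, top_k=5):
        vector = vector / vector.norm()
        matrix = emb / emb.norm(dim=1, keepdim=True)
        scores = matrix @ vector                          # cosine similarity per token
        best = torch.topk(scores, top_k).indices
        return tokenizer.convert_ids_to_tokens(best.tolist())

    # Example: the embedding of "king" should be its own nearest neighbour.
    king_id = tokenizer.convert_tokens_to_ids("king")
    print(closest_tokens(emb[king_id]))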
9 votes · 2 answers · 7k views
How to get all documents per topic in BERTopic modeling
I have a dataset and am trying to convert it to topics using BERTopic modeling, but the problem is I can't get all the documents of a topic. BERTopic only returns 3 documents per topic.
topic_model = ...
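get_representative_docs() only stores a few exemplar documents per topic; a hedged workaround is to keep the per-document topic assignments returned by fit_transform and group the documents yourself (docs below stands for your own list of texts):

    # Sketch: group every document by its assigned topic using the
    # assignments returned by fit_transform.
    from collections import defaultdict
    from bertopic import BERTopic

    topic_model = BERTopic()
    topics, probs = topic_model.fit_transform(docs)   # docs: your list of texts

    docs_per_topic = defaultdict(list)
    for doc, topic in zip(docs, topics):
        docs_per_topic[topic].append(doc)

    print(len(docs_per_topic[0]))     # all documents assigned to topic 0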
9 votes · 2 answers · 3k views
BERT output not deterministic
BERT output is not deterministic.
I expect the output values to be deterministic when I put in the same input, but with my BERT model the values keep changing. Awkwardly, the same value is returned twice, ...
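The usual cause is that dropout is still active because the model is in training mode; calling model.eval() and disabling gradients normally makes repeated forward passes identical. A minimal sketch:

    # Sketch: dropout is active in training mode, so identical inputs give
    # different outputs; model.eval() disables it.
    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")
    model.eval()                                    # turn off dropout

    inputs = tokenizer("the same sentence", return_tensors="pt")
    with torch.no_grad():
        out1 = model(**inputs).last_hidden_state
        out2 = model(**inputs).last_hidden_state

    print(torch.allclose(out1, out2))               # True once dropout is off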
9 votes · 1 answer · 7k views
Clause extraction / long sentence segmentation in python
I'm currently working on a project involving sentence vectors (from a RoBERTa pretrained model). These vectors are lower quality when sentences are long, and my corpus contains many long sentences ...
9 votes · 1 answer · 4k views
Why does the BERT model have to keep 10% of masked tokens unchanged?
I am reading the BERT model paper. In the masked language model task during pre-training, the paper says the model will choose 15% of tokens randomly. Of the chosen tokens (Ti), 80% will be replaced ...
9 votes · 1 answer · 8k views
How do I use BertForMaskedLM or BertModel to calculate perplexity of a sentence?
I want to use BertForMaskedLM or BertModel to calculate perplexity of a sentence, so I write code like this:
import numpy as np
import torch
import torch.nn as nn
from transformers import ...
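Since BERT is not a left-to-right language model, "perplexity" is usually approximated by masking one position at a time and averaging the negative log-likelihoods (a pseudo-perplexity); a hedged sketch:

    # Sketch: pseudo-perplexity for a masked LM, masking one position at a time.
    import math
    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    def pseudo_perplexity(sentence):
        ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
        nlls = []
        for i in range(1, len(ids) - 1):            # skip [CLS] and [SEP]
            masked = ids.clone()
            masked[i] = tokenizer.mask_token_id
            with torch.no_grad():
                logits = model(masked.unsqueeze(0)).logits[0, i]
            log_probs = torch.log_softmax(logits, dim=-1)
            nlls.append(-log_probs[ids[i]].item())
        return math.exp(sum(nlls) / len(nlls))

    print(pseudo_perplexity("The quick brown fox jumps over the lazy dog."))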
8 votes · 1 answer · 18k views
How is the number of parameters calculated in the BERT model?
The paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Devlin & Co. calculated for the base model size 110M parameters (i.e. L=12, H=768, A=12) ...
8 votes · 1 answer · 9k views
How to calculate perplexity of a sentence using huggingface masked language models?
I have several masked language models (mainly Bert, Roberta, Albert, Electra). I also have a dataset of sentences. How can I get the perplexity of each sentence?
From the huggingface documentation ...
8 votes · 1 answer · 4k views
Uni-directional Transformer VS Bi-directional BERT
I just finished reading the Transformer paper and the BERT paper, but I couldn't figure out why the Transformer is uni-directional and BERT is bi-directional, as mentioned in the BERT paper. As they don't use ...
8 votes · 1 answer · 14k views
How to store Word vector Embeddings?
I am using BERT Word Embeddings for sentence classification task with 3 labels. I am using Google Colab for coding. My problem is, since I will have to execute the embedding part every time I restart ...
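One hedged option is to compute the embeddings once, save the tensor to Google Drive, and reload it in later sessions instead of re-running BERT; the path below is a placeholder and embeddings stands for whatever tensor your code produces:

    # Sketch: compute embeddings once, save the tensor to disk, reload later.
    import torch

    # embeddings: a (num_sentences, hidden_size) tensor produced by your BERT code
    torch.save(embeddings, "/content/drive/MyDrive/sentence_embeddings.pt")

    # In a later Colab session (after mounting Drive again):
    embeddings = torch.load("/content/drive/MyDrive/sentence_embeddings.pt")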
8 votes · 1 answer · 3k views
HuggingFace BERT `inputs_embeds` giving unexpected result
The HuggingFace BERT TensorFlow implementation allows us to feed in a precomputed embedding in place of the embedding lookup that is native to BERT. This is done using the model's call method's ...
7 votes · 3 answers · 19k views
Why can't I import functions in bert after pip install bert
I am a beginner with BERT, and I am trying to use the BERT files given on GitHub: https://github.com/google-research/bert
However, I cannot import files (such as run_classifier, optimization and so on)...
7 votes · 2 answers · 14k views
The model did not return a loss from the inputs - LaBSE error
I want to fine-tune LaBSE for question answering using the SQuAD dataset, and I got this error:
ValueError: The model did not return a loss from the inputs, only the following keys: last_hidden_state,...
7 votes · 1 answer · 8k views
max_seq_length for transformer (Sentence-BERT)
I'm using sentence-BERT from Huggingface in the following way:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
model.max_seq_length = 512
model....
7 votes · 1 answer · 6k views
Passing multiple sentences to BERT?
I have a dataset with paragraphs that I need to classify into two classes. These paragraphs are usually 3-5 sentences long. The overwhelming majority of them are less than 500 words long. I would like ...
7 votes · 1 answer · 14k views
How does padding in the huggingface tokenizer work?
I tried the following tokenization example:
tokenizer = BertTokenizer.from_pretrained(MODEL_TYPE, do_lower_case=True)
sent = "I hate this. Not that.",
_tokenized = tokenizer(sent, ...
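A hedged illustration of the two common padding modes: padding=True pads to the longest sequence in the batch, while padding="max_length" pads every sequence to the given max_length; the sentences and length below are illustrative.

    # Sketch: how the two common padding modes differ.
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    sentences = ["I hate this.", "Not that."]

    # Pad to the longest sentence in this batch.
    batch_longest = tokenizer(sentences, padding=True, return_tensors="pt")

    # Pad every sentence to a fixed length of 12 tokens.
    batch_fixed = tokenizer(sentences, padding="max_length", max_length=12,
                            truncation=True, return_tensors="pt")

    print(batch_longest["input_ids"].shape)   # (2, length of longest sentence)
    print(batch_fixed["input_ids"].shape)     # (2, 12)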
7 votes · 1 answer · 2k views
Fine-tune Bert for specific domain (unsupervised)
I want to fine-tune BERT on texts that are related to a specific domain (in my case related to engineering). The training should be unsupervised since I don't have any labels or anything. Is this ...
7 votes · 1 answer · 2k views
Use BERT under spaCy to get sentence embeddings
I am trying to use BERT to get sentence embeddings. Here is how I am doing it:
import spacy
nlp = spacy.load("en_core_web_trf")
nlp("The quick brown fox jumps over the lazy dog")....
7 votes · 1 answer · 2k views
What is the significance of the magnitude/norm of BERT word embeddings?
We generally compare similarity between word embeddings with cosine similarity, but this only takes into account the angle between the vectors, not the norm. With word2vec, the norm of the vector ...
7 votes · 1 answer · 5k views
Token indices sequence length error when using encode_plus method
I got a strange error when trying to encode question-answer pairs for BERT using the encode_plus method provided in the Transformers library.
I am using data from this Kaggle competition. Given a ...
6 votes · 2 answers · 11k views
BERT get sentence embedding
I am replicating code from this page. I have downloaded the BERT model to my local system and am getting sentence embeddings.
I have around 500,000 sentences for which I need sentence embedding and it is ...
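For hundreds of thousands of sentences, running BERT in batches (on GPU if available) with mean pooling over the last hidden state is a common speed-up; a hedged sketch where the batch size and pooling choice are illustrative:

    # Sketch: batched sentence embeddings via mean pooling over the last hidden state.
    import torch
    from transformers import BertTokenizer, BertModel

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased").to(device).eval()

    def embed(sentences, batch_size=64):
        all_vecs = []
        for i in range(0, len(sentences), batch_size):
            batch = tokenizer(sentences[i:i + batch_size], padding=True,
                              truncation=True, return_tensors="pt").to(device)
            with torch.no_grad():
                hidden = model(**batch).last_hidden_state          # (B, T, H)
            mask = batch["attention_mask"].unsqueeze(-1)           # ignore padding
            vecs = (hidden * mask).sum(1) / mask.sum(1)
            all_vecs.append(vecs.cpu())
        return torch.cat(all_vecs)

    print(embed(["a first sentence", "a second, longer sentence"]).shape)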
6 votes · 1 answer · 11k views
BertWordPieceTokenizer vs BertTokenizer from HuggingFace
I have the following pieces of code and trying to understand the difference between BertWordPieceTokenizer and BertTokenizer.
BertWordPieceTokenizer (Rust based)
from tokenizers import ...
6 votes · 2 answers · 6k views
Can you train a BERT model from scratch with task specific architecture?
BERT pre-training of the base model is done with a language modeling approach, where we mask a certain percentage of tokens in a sentence and make the model learn to predict those masked tokens. Then, I think in ...
6 votes · 3 answers · 5k views
How to stop BERT from breaking apart specific words into word-piece
I am using a pre-trained BERT model to tokenize text into meaningful tokens. However, the text has many specific words and I don't want the BERT model to break them into word-pieces. Is there any ...
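One hedged approach is to register the domain-specific words as whole tokens in the tokenizer and resize the model's embedding matrix to match; the new embedding rows start randomly initialised, so some further training is usually needed. The example words are illustrative:

    # Sketch: register domain words as whole tokens so they are not split.
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    new_words = ["electrocardiogram", "myocarditis"]     # illustrative domain terms
    tokenizer.add_tokens(new_words)
    model.resize_token_embeddings(len(tokenizer))        # new rows are randomly initialised

    print(tokenizer.tokenize("the electrocardiogram was normal"))
    # ['the', 'electrocardiogram', 'was', 'normal'] instead of word-pieces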
6 votes · 2 answers · 3k views
How to test masked language model after training it?
I have followed this tutorial for masked language modelling from Hugging Face using BERT, but I am unsure how to actually deploy the model.
Tutorial: https://github.com/huggingface/notebooks/blob/...
6 votes · 1 answer · 5k views
Using BERT to generate similar word or synonyms through word embeddings
As we all know, the BERT model has strong word embedding capabilities, probably better than word2vec or any other model.
I want to create a model on BERT word embedding to generate synonyms or ...
6 votes · 1 answer · 8k views
Sliding window for long text in BERT for Question Answering
I've read a post which explains how the sliding window works, but I cannot find any information on how it is actually implemented.
From what I understand, if the input is too long, a sliding window can be ...
6 votes · 1 answer · 9k views
Using BERT Embeddings in Keras Embedding layer
I want to use BERT word vector embeddings in the embedding layer of an LSTM instead of the usual default embedding layer. Is there any way I can do it?
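Rather than copying vectors into a keras.layers.Embedding, a common pattern (sketched here, with the sequence length, LSTM size, and label count as assumptions) is to use TFBertModel itself as the first layer and run the LSTM over its hidden states:

    # Sketch: use TFBertModel as the "embedding layer" of a Keras model
    # and run an LSTM over its hidden states.
    import tensorflow as tf
    from transformers import TFBertModel

    bert = TFBertModel.from_pretrained("bert-base-uncased")

    input_ids = tf.keras.Input(shape=(128,), dtype=tf.int32, name="input_ids")
    attention_mask = tf.keras.Input(shape=(128,), dtype=tf.int32, name="attention_mask")

    hidden = bert(input_ids, attention_mask=attention_mask)[0]   # last hidden state (B, 128, 768)
    x = tf.keras.layers.LSTM(64)(hidden)
    output = tf.keras.layers.Dense(2, activation="softmax")(x)

    model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=output)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.summary()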
6 votes · 2 answers · 7k views
Latest Pre-trained Multilingual Word Embedding
Are there any recent pre-trained multilingual word embeddings (where multiple languages are jointly mapped to the same vector space)?
I have looked at the following but they don't fit my needs:
FastText / ...
6 votes · 1 answer · 939 views
How to predict the probability of an empty string using BERT
Suppose we have a template sentence like this:
"The ____ house is our meeting place."
and we have a list of adjectives to fill in the blank, e.g.:
"yellow"
"large"
&...
6 votes · 0 answers · 6k views
How to add index to python FAISS incrementally
I am using Faiss to index my huge dataset of embeddings, generated from a BERT model. I want to add the embeddings incrementally; it works fine if I only add them with faiss.IndexFlatL2, but ...
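With an IVF index the extra step is training the index once on a representative sample before calling add(); after that, batches of embeddings can be added incrementally. A hedged sketch with synthetic data, where the dimension and nlist are illustrative:

    # Sketch: train an IVF index once, then add embedding batches incrementally.
    import numpy as np
    import faiss

    d = 768                                   # BERT embedding dimension
    nlist = 100                               # number of IVF cells
    quantizer = faiss.IndexFlatL2(d)
    index = faiss.IndexIVFFlat(quantizer, d, nlist)

    training_sample = np.random.random((10000, d)).astype("float32")
    index.train(training_sample)              # must be trained before add()

    for _ in range(5):                        # add batches as they are produced
        batch = np.random.random((2000, d)).astype("float32")
        index.add(batch)

    print(index.ntotal)                       # 10000 vectors indexed so far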
5 votes · 1 answer · 5k views
How to get the probability of a particular token(word) in a sentence given the context
I'm trying to calculate the probability or any type of score for words in a sentence using NLP. I've tried this approach with GPT2 model using Huggingface Transformers library, but, I couldn't get ...