All Questions

12 votes
8 answers
37k views

SSLError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /dslim/bert-base-NER/resolve/main/tokenizer_config.json

I am facing the issue below while loading a pretrained BERT model from HuggingFace, due to an SSL certificate error. Error: SSLError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries ...
Nikita Malviya's user avatar
9 votes
1 answer
9k views

BERT document embedding

I am trying to do document embedding using BERT. The code I use is a combination of two sources. I use BERT Document Classification Tutorial with Code, and BERT Word Embeddings Tutorial. Below is the ...
MRM's user avatar
  • 1,159
8 votes
1 answer
14k views

How to store Word vector Embeddings?

I am using BERT Word Embeddings for a sentence classification task with 3 labels. I am using Google Colab for coding. My problem is, since I will have to execute the embedding part every time I restart ...
PeakyBlinder's user avatar
  • 1,107
6 votes
1 answer
9k views

Using BERT Embeddings in Keras Embedding layer

I want to use the BERT Word Vector Embeddings in the Embeddings layer of LSTM instead of the usual default embedding layer. Is there any way I can do it?
PeakyBlinder's user avatar
  • 1,107
6 votes
0 answers
6k views

How to add index to python FAISS incrementally

I am using Faiss to index the embeddings of my huge dataset, generated by a BERT model. I want to add the embeddings incrementally. It works fine if I only add them with faiss.IndexFlatL2, but ...
DevPy's user avatar
  • 467
6 votes
0 answers
2k views

How to slice string depending on length of tokens

When I use (with a long test_text and short question): from transformers import BertTokenizer import torch from transformers import BertForQuestionAnswering tokenizer = BertTokenizer.from_pretrained('...
5 votes
3 answers
6k views

AttributeError: 'str' object has no attribute 'dim' in pytorch

I got the following error output in PyTorch when I passed model predictions into the model. Does anyone know what's going on? Following is the model architecture that I created; in the error output, ...
Bei Zhao's user avatar
5 votes
1 answer
8k views

run python parameters in Google Colab

I am running a python file in Google Colab and getting an error. I am following a bert text classification example from this link; https://appliedmachinelearning.blog/2019/03/04/state-of-the-art-text-...
Mass17's user avatar
  • 1,575
5 votes
1 answer
612 views

Cast topic modeling outcome to dataframe

I have used BertTopic with KeyBERT to extract some topics from some docs from bertopic import BERTopic topic_model = BERTopic(nr_topics="auto", verbose=True, n_gram_range=(1, 4), ...
xavi's user avatar
  • 80
5 votes
1 answer
9k views

Get the value of '[UNK]' in BERT

I have designed a model based on BERT to solve an NER task. I am using the transformers library with the "dccuchile/bert-base-spanish-wwm-cased" pre-trained model. The problem comes when my model detects an ...
Javier Jiménez de la Jara's user avatar
5 votes
1 answer
6k views

Unable to use custom dataset: AttributeError: 'list' object has no attribute 'keys'

I am trying to train a classification model with a custom dataset using Huggingface Transformers, but I keep getting errors. The last error seems solvable, but somehow I do not understand how. What am I ...
Tommaso De Lorenzo's user avatar
4 votes
1 answer
7k views

How to prepare text for BERT - getting error

I am trying to learn BERT for text classification. I am having some problems preparing data for BERT. From my dataset, I am segregating the sentiments and reviews as: X = df['sentiments'] y = ...
K C's user avatar
  • 433
3 votes
3 answers
5k views

What is the difference between pooled output and sequence output in a BERT layer?

everyone! I was reading about Bert and wanted to do text classification with its word embeddings. I came across this line of code: pooled_output, sequence_output = self.bert_layer([input_word_ids, ...
mitra mirshafiee's user avatar
3 votes
1 answer
2k views

Bert pre-trained model giving random output each time

I was trying to add an additional layer after huggingface bert transformer, so I used BertForSequenceClassification inside my nn.Module Network. But, I see the model is giving me random outputs when ...
3 votes
1 answer
3k views

How to combine embedding vectors of BERT with other features?

I am working on a classification task with 3 labels (0,1,2 = neg, pos, neu). Data are sentences. So to produce vectors/embeddings of sentences, I use a Bert encoder to get embeddings for each sentence ...
emma's user avatar
  • 323
3 votes
3 answers
2k views

Transformers pipeline model directory

I'm using Huggingface's Transformers pipeline function to download the model and the tokenizer. My Windows PC downloaded them, but I don't know where they are stored on my PC. Can you please help ...
Luan Tran's user avatar
  • 404
3 votes
0 answers
164 views

How to get the most similar match using BERT from a pandas column to an input string?

I am trying to find the most similar match in a column of a pandas dataframe to an input string that is not in English (Swedish). This is what I have tried. I have encoded both my input string and the ...
Vai's user avatar
  • 179
2 votes
1 answer
2k views

zsh: no matches found: bertopic[visualization] [duplicate]

I am trying to install bertopic[visualization] on my MacBook Pro using pip3 install bertopic[visualization], but I get an error whenever I run the above command. The error is as given ...
Nayana Madhu's user avatar
  • 1,195
2 votes
2 answers
1k views

Map BERTopic topic IDs back to the training dataframe

I have trained a BERTopic model on a dataframe of length 400k. I want to map the topic of each document to a new column inside the dataframe. I could do that by running a for loop on all the ...
Vai's user avatar
  • 179
2 votes
2 answers
5k views

"Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers." ValueError: Input is not valid

I am using the BERT tokenizer for French and I am getting this error, but I cannot seem to solve it. Any suggestions are welcome. Traceback (most recent call last): File "training_cross_data_2....
emma's user avatar
  • 323
2 votes
1 answer
2k views

RuntimeError: Given groups=3, weight of size 12 64 3 768, expected input[32, 12, 30, 768] to have 192 channels, but got 12 channels instead

I started working with Pytorch recently so my understanding of it isn't quite strong. I previously had a 1 layer CNN but wanted to extend it to 2 layers, but the input and output channels have been ...
KoKo's user avatar
  • 379
2 votes
1 answer
3k views

How to freeze some layers of BERT in fine tuning in tf2.keras

I am trying to fine-tune 'bert-base-uncased' on a dataset for a text classification task. Here is the way I am downloading the model: import tensorflow as tf from transformers import ...
Masoud's user avatar
  • 108
2 votes
1 answer
3k views

Calculate precision, recall, f1 score for custom dataset for multiclass classification Huggingface library

I am trying to do multiclass classification for the sentence pair task. I uploaded my custom dataset of train and test separately in the hugging face data set and trained my model and tested it and ...
Alex Kujur's user avatar
2 votes
2 answers
2k views

Fine-tune BERT for a specific domain on a different language?

I want to fine-tune on a pre-trained BERT model. However, my task uses data within a specific domain (say biomedical data). Additionally, my data is also in a language different from English (say ...
Moonreaderx's user avatar
2 votes
1 answer
949 views

BERT: How to use bert-as-service with BioBERT?

BioBERT throws the error shown below, but I am able to run other BERT versions (uncased_L-12_H-768_A-12 and SciBERT) using the statement below: bert-serving-start -model_dir C:\Users\...
Soumyaansh's user avatar
  • 8,870
2 votes
0 answers
496 views

How to resolve the mismatch of pre-trained model parameter and current parameter?

I'm using a pre-trained BERT model for an NER task (bert-base-NER) and I need more token categories than the model has (PER, LOC, ORG, MIS, O). Based on that, I created my own dataset, which includes 7 categories, ...
onevholy's user avatar
2 votes
1 answer
388 views

BERTopic: pop from empty list IndexError while Inferencing

I have trained a BERTopic model on Colab, and when I try to use it locally I get an IndexError. IndexError: Failed in nopython mode pipeline (step: analyzing bytecode) pop from empty list The ...
Vai's user avatar
  • 179
2 votes
0 answers
500 views

Using RoBERTa model with transformers-interpret library

I've been trying to use transformers-interpret library and have been successful in getting the results for facebook's BART model, but not for the RoBERTa. My code goes as follows for the BART model : ...
Malik's user avatar
  • 49
2 votes
0 answers
558 views

TypeError: dropout(): argument 'input' (position 1) must be Tensor, not str Bert Model

Hi, I encountered this error when I was training my BERT model for sentiment analysis, where my classes have 3 outcomes and my input data is text. I got the above error while training the model. I ...
DDM's user avatar
  • 313
2 votes
2 answers
2k views

Python RuntimeError: input sequence

I am trying to run NER on Indonesian text. I've read some resources that say the BERT model has positional embeddings only for the first 512 subtokens, so the model can't work with longer sequences. ...
winnie's user avatar
  • 135
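For the 512-subtoken limit described in the question above, one common workaround is to split a long token sequence into overlapping windows and run the model on each window separately. The following is a minimal sketch in plain Python, not the asker's code: `split_into_windows`, `max_len`, and `stride` are hypothetical names, and the default of 510 assumes two positions are reserved for the [CLS] and [SEP] special tokens.

```python
from typing import List


def split_into_windows(
    token_ids: List[int], max_len: int = 510, stride: int = 128
) -> List[List[int]]:
    """Split token ids into windows of at most max_len tokens.

    Consecutive windows overlap by `stride` tokens so that an entity
    cut at a window boundary is seen whole in at least one window.
    (Hypothetical helper; names and defaults are assumptions.)
    """
    if max_len <= 0 or not 0 <= stride < max_len:
        raise ValueError("need max_len > 0 and 0 <= stride < max_len")
    if not token_ids:
        return []

    step = max_len - stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the input
        start += step
    return windows


# Usage: 1000 fake token ids, windows of up to 512 with 128 overlap.
ids = list(range(1000))
windows = split_into_windows(ids, max_len=512, stride=128)
print([len(w) for w in windows])
```

Predictions for tokens in the overlapping regions then have to be merged, for example by keeping the prediction from the window where the token is farthest from an edge; that merging step is a design choice and is not shown here.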
1 vote
2 answers
5k views

Pytorch - Caught StopIteration in replica 1 on device 1 error while Training on GPU

I am trying to train a BertPunc model on the train2012 data used in the git link: https://github.com/nkrnrnk/BertPunc. While running on the server, with 4 GPUs enabled, below is the error I get: ...
Varun kadekar's user avatar
1 vote
1 answer
1k views

BERTopic Embeddings ValueError when transform a new text

I have created embeddings using SentenceTransformer and trained a BERTopic model on those embeddings. sentence_model = SentenceTransformer("all-MiniLM-L6-v2") embeddings = sentence_model....
Vai's user avatar
  • 179
1 vote
2 answers
2k views

How can I train an XGBoost with a generator?

I'm attempting to stack a BERT TensorFlow model with an XGBoost model in Python. To do this, I have trained the BERT model and have a generator that takes the predictions from BERT (which ...
DrRaspberry's user avatar
1 vote
1 answer
1k views

BERT binary text classification gets different results every run

I do binary text classification with BERT from the Simpletransformer. I work in Colab with GPU runtime type. I have generated train and test set with the sklearn StratifiedKFold Method. I have two ...
rambutan's user avatar
  • 389
1 vote
1 answer
298 views

get contrastive_logits_per_image with flava model using huggingface library

I have used the code for the FLAVA model from this link: https://huggingface.co/docs/transformers/model_doc/flava#transformers.FlavaModel.forward.example But I am getting the following error: '...
lazytux's user avatar
  • 167
1 vote
1 answer
525 views

How to add simple custom pytorch-crf layer on top of TokenClassification model using pytorch and Trainer

I followed this link, but it's implemented in Keras. Cannot add CRF layer on top of BERT in keras for NER Model description Is it possible to add simple custom pytorch-crf layer on top of ...
MAC's user avatar
  • 1,455
1 vote
1 answer
551 views

AI Based Deduplication using Textual Similarity Measure in Python

Given I have a dataframe that contains rows like this ID Title Abstract Keywords Author Year 5875 Textual Similarity: A Review Textual Similarity has been used for measuring ... X, Y, Z James Thomas ...
saving_space's user avatar
1 vote
0 answers
80 views

Python BERTopic 'numpy.float64' object cannot be interpreted as an integer

I am trying to replicate the Topic Modeling exercise from this article titled NLP Tutorial: Topic Modeling in Python with BerTopic. The article comes from the website HackerNoon if you'd prefer to ...
user432299's user avatar
1 vote
0 answers
193 views

What if I have too many documents labelled in -1 cluster in bertopic?

I'm generating topics using bertopic on multilingual dataset (mainly Russian and English). I'm reducing the number of topics to 140. After generating topics, I'm analyzing its quality using the ...
ApaarBawa's user avatar
1 vote
0 answers
146 views

Classification report in multi label

I am trying to use BERT for a multi-label task. My dataset has 1,000 samples. I first use train_test_split to take 80% of my dataset as a training set and 20% as a validation set. It is reasonable to say that ...
David's user avatar
  • 11
1 vote
1 answer
499 views

ValueError: [E109] Component 'tagger' could not be run. Did you forget to call `initialize()`?

I use jupyter notebook for writing code, but our team wants me to write code using visual studio code so we can do version control and merges in Git. I set up my environment with new versions of ...
GILO Technologies's user avatar
1 vote
1 answer
653 views

How to read BertForMaskedLM with BertModel?

I have fine-tuned BertForMaskedLM and now I want to read it with BertModel. But my saved model looks like this: BertForMaskedLM( (bert): BertModel( (embeddings): BertEmbeddings( (...
1 vote
0 answers
3k views

How to pop elements from a tensor in Pytorch?

I want to drop/pop elements from a tensor in Pytorch, something similar to pop operation in python. In the following code , if the condition is met, it removes two elements from the array, current and ...
Ara's user avatar
  • 145
1 vote
0 answers
630 views

finBert Model - Config JSON File - Outputs Nothing

This is for running the ProsusAI finBert Model. (https://github.com/ProsusAI/finBERT - GitHub) (https://huggingface.co/ProsusAI/finbert - HuggingFace) I downloaded the pytorch_model.bin file and used ...
Calculate's user avatar
  • 345
1 vote
1 answer
931 views

Getting predict.proba from BERT classifier

I have a classifier on top of BERT, and I would like to see the predicted probabilities for creating the ROC curve. How do I get them? The predicted probabilities will be used to calculate the TPR ...
rickyfajrin93's user avatar
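As a rough illustration of the question above: if the classifier head returns raw logits (one score per class, which is an assumption about the asker's model), class probabilities can be obtained by applying a softmax to those logits. A minimal sketch in plain Python, with `softmax` as a hypothetical helper:

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    # Subtract the maximum first so that exp() cannot overflow.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: made-up logits for a binary classifier (negative, positive).
probs = softmax([-0.3, 1.2])
positive_score = probs[1]  # the score a ROC curve would be computed from
print(probs, positive_score)
```

For a binary task, the probability of the positive class for each example is what ROC tooling typically expects as the score; thresholding it at different values yields the TPR and FPR points.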
1 vote
1 answer
2k views

Cannot import name 'network' from 'tensorflow.python.keras.engine'

When trying to load BERT QA I get the following ImportError: "Cannot import name 'network' from 'tensorflow.python.keras.engine'". The full error log follows below. Following this post, ...
user810643's user avatar
1 vote
0 answers
633 views

Microsoft LayoutLM model error with huggingface

I was trying to utilize the https://github.com/microsoft/unilm/tree/master/layoutlm for document classification purpose, but was constantly getting "OSError: Unable to load weights from pytorch ...
Riya Paul's user avatar
1 vote
0 answers
61 views

How to add a local BERT to Bert_module_hub

I just want to add my local BERT here: bert_module = hub.Module( BERT_MODEL_HUB, trainable=True) How do I add my local BERT? I have Tensorflow==1.15 and python==3.7 def create_model(is_predicting, ...
Bold Ganbaatar's user avatar
1 vote
1 answer
557 views

'list' object has no attribute 'shape'

I am passing an embedding matrix to the embedding layer in Keras model = Sequential() model.add(Embedding(max_words, 30, input_length=max_len, weights=[all])) model.add(BatchNormalization()) model.add(...
PeakyBlinder's user avatar
  • 1,107
1 vote
0 answers
441 views

BERT - modify run_squad.py predictions file

I'm new to BERT and I'm trying to edit the output of run_squad.py to build a question answering system and obtain an output file with the following structure: { "data": [ { "...
hera hoc's user avatar