All Questions

16 votes
2 answers
31k views

Download pre-trained sentence-transformers model locally

I am using the SentenceTransformers library (here: https://pypi.org/project/sentence-transformers/#pretrained-models) for creating embeddings of sentences using the pre-trained model bert-base-nli-...
asked by neha tamore
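A minimal sketch of the usual approach, assuming the sentence-transformers package is installed and the target directory is writable (the local path below is illustrative): download the model once, save it to disk, and load it from that path afterwards.

from sentence_transformers import SentenceTransformer

# Download once (requires internet access), then persist to a local directory.
model = SentenceTransformer('bert-base-nli-mean-tokens')
model.save('./bert-base-nli-mean-tokens-local')  # illustrative path

# Later, load entirely from the local copy without re-downloading.
model = SentenceTransformer('./bert-base-nli-mean-tokens-local')
embeddings = model.encode(['A sentence to embed.'])
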
7 votes
1 answer
8k views

max_seq_length for transformer (Sentence-BERT)

I'm using sentence-BERT from Huggingface in the following way: from sentence_transformers import SentenceTransformer model = SentenceTransformer('all-MiniLM-L6-v2') model.max_seq_length = 512 model....
asked by BlackHawk (779)
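As a rough sketch (this pokes at sentence-transformers internals, so treat the attribute path as an assumption): max_seq_length only controls truncation, and it cannot usefully exceed the position-embedding limit of the underlying transformer.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Default truncation length used by encode().
print(model.max_seq_length)

# Architectural upper bound of the wrapped Hugging Face model.
print(model[0].auto_model.config.max_position_embeddings)

# Raising the value within that bound just truncates less aggressively.
model.max_seq_length = 512
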
6 votes
1 answer
4k views

Fine-tuning BERT sentence transformer model

I am using a pre-trained BERT sentence transformer model, as described here https://www.sbert.net/docs/training/overview.html , to get embeddings for sentences. I want to fine-tune these pre-trained ...
asked by Fiori (301)
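A minimal sketch of the fit() training loop described on that page, using made-up sentence pairs with similarity labels in [0, 1]; the loss and hyperparameters are illustrative choices, not the only ones.

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer('bert-base-nli-mean-tokens')

# Toy training pairs with gold similarity scores (illustrative data).
train_examples = [
    InputExample(texts=['A man is eating food.', 'A man is eating a meal.'], label=0.9),
    InputExample(texts=['A man is eating food.', 'A plane is taking off.'], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)

# Fine-tune and save the adapted model.
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
model.save('./fine-tuned-sbert')
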
5 votes
2 answers
258 views

Same sentences produce different vectors in XLNet

I have computed the vectors for two identical sentences using XLNet via embedding-as-service, but the model produces different embeddings for the two sentences, hence the cosine similarity is ...
asked by Anoop kottappuram
5 votes
0 answers
2k views

Decode sentence representation derived from SentenceTransformer

Is it possible to decode a sentence representation derived from SentenceTransformer back to a sentence? See example from the documentation from sentence_transformers import SentenceTransformer model = ...
asked by KoKo (379)
4 votes
1 answer
763 views

Restrict Vocab for BERT Encoder-Decoder Text Generation

Is there any way to restrict the vocabulary of the decoder in a Huggingface BERT encoder-decoder model? I'd like to force the decoder to choose from a small vocabulary when generating text rather than ...
asked by Joseph Harvey
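One way this is commonly done with Hugging Face's generate() is prefix_allowed_tokens_fn, which limits the candidate token ids at every decoding step; the checkpoint, the allowed word list, and the prompt below are all illustrative, and an untrained encoder-decoder will of course not produce meaningful text.

from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Illustrative restricted vocabulary: only these tokens may be generated.
allowed_words = ['yes', 'no', 'maybe', '.']
allowed_ids = sorted({i for w in allowed_words for i in tokenizer(w, add_special_tokens=False)['input_ids']})

inputs = tokenizer('Should the decoder vocabulary be restricted?', return_tensors='pt')
outputs = model.generate(
    inputs['input_ids'],
    max_length=8,
    # Called at every decoding step; returning the same id list for every
    # prefix forces the decoder to choose only from the restricted set.
    prefix_allowed_tokens_fn=lambda batch_id, input_ids: allowed_ids,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
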
4 votes
2 answers
4k views

How to convert model.safetensors to pytorch_model.bin?

I'm fine-tuning a pre-trained BERT model and I have a weird problem: when fine-tuning on the CPU, the code saves the model with a "pytorch_model.bin" file. But when I use ...
asked by Gabriel Henrique
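A minimal sketch of one straightforward conversion, assuming the safetensors package is installed (the checkpoint directory is illustrative): load the tensors into an ordinary state dict and re-save it in the classic PyTorch format.

import torch
from safetensors.torch import load_file

# Read the safetensors checkpoint into a regular state dict...
state_dict = load_file('checkpoint/model.safetensors')

# ...and write it back out in the pytorch_model.bin format.
torch.save(state_dict, 'checkpoint/pytorch_model.bin')
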
3 votes
3 answers
1k views

String comparison with BERT seems to ignore "not" in sentence

I implemented a string comparison method using SentenceTransformers and BERT like the following: from sentence_transformers import SentenceTransformer from sklearn.metrics.pairwise import cosine_similarity ...
asked by Tiago Bachiega de Almeida
3 votes
2 answers
2k views

Sentence-Transformer Training and Validation Loss

I am fine-tuning the Sentence-Transformers model (using PyTorch) on a custom dataset that is the same as the Semantic Textual Similarity (STS) dataset. I am unable to get (or print) the training ...
asked by Abhas kumar
3 votes
1 answer
5k views

How to save Sentence-BERT output vectors to a file?

I am using BERT to get the similarity between multi-term words. Here is the code I used for embedding: from sentence_transformers import SentenceTransformer model = SentenceTransformer('bert-large-...
asked by Sahar Rezazadeh
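Since encode() returns a NumPy array by default, one simple option (file names are illustrative) is to persist the vectors with NumPy and reload them later.

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('bert-large-nli-mean-tokens')
terms = ['heart attack', 'myocardial infarction', 'kidney failure']

embeddings = model.encode(terms)        # shape: (len(terms), hidden_dim)
np.save('embeddings.npy', embeddings)   # write the vectors to disk

loaded = np.load('embeddings.npy')      # read them back later
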
2 votes
1 answer
752 views

ReadError: file could not be opened successfully. But I am not sure where the tar file is stored to resolve this

I am using biobert-embeddings==0.1.2 and torch==1.2.0 versions to embed some documents. But, I get the following error when I try to load the model by from biobert_embedding.embedding import ...
asked by satish cc
2 votes
3 answers
2k views

SimpleTransformers Error: VersionConflict: tokenizers==0.9.4? How do I fix this?

I'm trying to execute the simpletransformers example from their site on google colab. Example: from simpletransformers.classification import ClassificationModel, ClassificationArgs import pandas as pd ...
asked by Reema Q Khan
2 votes
1 answer
966 views

How to know if a word belongs to a Transformer model?

I use the Python library sentence_transformers with the RoBERTa and FlauBERT models. I use cosine scores to compute similarity, but for some words it doesn't work well. Those words seem to be the ones ...
asked by Nathan Redin
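A rough heuristic sketch (the checkpoint is illustrative): if the tokenizer encodes a word as a single non-unknown token, the word exists in the model's vocabulary as one unit; otherwise it is split into sub-word pieces, which is often where similarity scores degrade.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('roberta-base')  # illustrative checkpoint

def known_as_single_token(word: str) -> bool:
    # A word the model "knows" is encoded as exactly one non-unk token id.
    ids = tokenizer(word, add_special_tokens=False)['input_ids']
    return len(ids) == 1 and ids[0] != tokenizer.unk_token_id

print(known_as_single_token('house'))        # common word: expected True
print(known_as_single_token('xylotomous'))   # rare word: expected False (split into pieces)
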
2 votes
0 answers
292 views

Sentence Transformers cannot get embeddings for a large number of images

When I try to get embeddings from images I get an error like 'too many open files'. I have 50,000 images, and I do not want to split the images into different folders and then concatenate the embeddings (it is ...
asked by Vadim (39)
2 votes
0 answers
326 views

Performing MLM pretraining on BERT pretrained model to use model in Sentence Transformer for semantic similarity

I have an NLP use case: computing semantic similarity between sentences that are very specific to my use case. I want to use the Sentence Transformers library to do this, which provides state-of-the-...
asked by Martin Becuwe
2 votes
0 answers
192 views

Error loading quantized BERT model from local repository

After quantizing the BERT model, it works without any issue. But if I save the quantized model and load, it does not work. It shows an error message: 'LinearPackedParams' object has no attribute '...
asked by user3190883
1 vote
1 answer
1k views

BERTopic Embeddings ValueError when transforming a new text

I have created embeddings using SentenceTransformer and trained a BERTopic model on those embeddings. sentence_model = SentenceTransformer("all-MiniLM-L6-v2") embeddings = sentence_model....
asked by Vai (179)
1 vote
1 answer
2k views

How to list all documents/words per topic in BERTopic topic modelling?

I read the docs, but the topics only show 3 or 4 documents each, whereas the count is 2000+. Is there a way I can see all the assigned documents, instead of three/four documents per ...
asked by Noob Coder
1 vote
1 answer
281 views

Error while using bert-base-nli-mean-tokens bert model

I am using this code: model = SentenceTransformer('bert-base-nli-mean-tokens') body = list(data['preprocessedBody']) bodyEmbedding = model.encode(body, show_progress_bar = True) However, I am getting ...
asked by python_pi (113)
1 vote
1 answer
27 views

bert-base-uncased does not use newly added suffix token

I want to add custom tokens to the BertTokenizer. However, the model does not use the new token. from transformers import BertTokenizer tokenizer = BertTokenizer.from_pretrained("bert-base-...
asked by Lulacca (13)
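For reference, the usual missing step after add_tokens is resizing the embedding matrix; note also that tokens added this way are matched as whole tokens, not merged into WordPiece as "##" suffixes of existing words (the example token is illustrative).

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# Register a new whole-word token with the tokenizer...
tokenizer.add_tokens(['covid19'])  # illustrative token

# ...and grow the embedding matrix so the new id has a trainable vector.
model.resize_token_embeddings(len(tokenizer))

print(tokenizer.tokenize('symptoms of covid19'))  # 'covid19' is now one token
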
1 vote
0 answers
137 views

Fine-tune SentenceTransformer/SBERT for Extractive Text Summarization

Newbie here in NLP. I want to build extractive text summarization. I tried reading https://huggingface.co/blog/how-to-train-sentence-transformers, and I think there is a way to fine-tune the model with ...
asked by Python Beginner
1 vote
0 answers
164 views

Sequence to sequence classification (predicting sequence of labels) using Transformers

I'm looking for a way to feed a transformer (a HuggingFace trained model) a sequence of sentences (introducing context) in order to predict a sequence of labels. The goal is to predict each sentence by ...
asked by Keren L (11)
1 vote
0 answers
105 views

How can I fine-tune a sentence transformer without any labels?

I only have product descriptions and nothing else. I need to match similar products using cosine similarity. I have achieved this by taking embeddings from the Sentence Transformer. However, I need to ...
asked by Margam Rohith Kumar
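One unsupervised option in sentence-transformers is TSDAE, a denoising auto-encoder objective trained on the raw descriptions themselves; a minimal sketch under that assumption (the texts are made up, and nltk's punkt data is needed for the built-in noise function).

import nltk
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.datasets import DenoisingAutoEncoderDataset

nltk.download('punkt')  # used by the dataset's deletion-noise function

model = SentenceTransformer('all-MiniLM-L6-v2')

# Unlabeled product descriptions (illustrative).
descriptions = [
    'Stainless steel water bottle, 750 ml, leak-proof lid.',
    'Insulated steel flask for hot and cold drinks, 0.75 litre.',
    'Wireless optical mouse with USB receiver.',
]

# TSDAE builds (noisy sentence, original sentence) pairs automatically.
train_dataset = DenoisingAutoEncoderDataset(descriptions)
train_dataloader = DataLoader(train_dataset, batch_size=2, shuffle=True)
train_loss = losses.DenoisingAutoEncoderLoss(model, tie_encoder_decoder=True)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, weight_decay=0, scheduler='constantlr')
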
1 vote
1 answer
605 views

FastBert TypeError: forward() got an unexpected keyword argument 'masked_lm_labels'

I am following this tutorial and I have an error in this step: lm_learner.fit(epochs=30, lr=1e-4, validate=True, schedule_type="warmup_cosine", ...
asked by Catapultaa
1 vote
1 answer
654 views

save_pretrained function with a fine-tuned BERT model with a CNN

class MixModel(nn.Module): def __init__(self,pre_trained='bert-base-uncased'): super().__init__() config = BertConfig.from_pretrained('bert-base-uncased', output_hidden_states=...
asked by Shorouk Adel
1 vote
0 answers
1k views

What is the maximum text length in tokens that can be given as input for a summarisation task using sentence transformer models?

Most BERT models take a maximum input length of 512 tokens. I used the sentence transformer model multi-qa-distilbert-cos-v1 with bert-extractive-summarizer for a summarisation task. A text with 792 ...
asked by pheonix4821
1 vote
0 answers
346 views

Can't load pretrained model to generate embeddings

I am using this code to generate sentence embeddings with the Hugging Face transformers library, and I am getting this error. I can't seem to resolve this problem. Any pointers will help. Thanks. from ...
asked by Maak (33)
1 vote
0 answers
1k views

Improve the model prediction time in huggingface transformer models without GPU

I am using Hugging Face transformers models for quite a few tasks; they work well, but the only problem is the response time. It takes around 6-7 seconds to generate a result, while sometimes it even takes ...
asked by DevPy (467)
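On a CPU-only box, dynamic int8 quantization of the linear layers is one commonly suggested way to cut latency; a minimal sketch (the checkpoint and input are illustrative, and the accuracy impact should be measured before relying on it).

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = 'distilbert-base-uncased-finetuned-sst-2-english'  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.eval()

# Replace nn.Linear weights with int8 versions; activations stay in float.
quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

inputs = tokenizer('The response time is finally acceptable.', return_tensors='pt')
with torch.no_grad():
    logits = quantized(**inputs).logits
print(logits.argmax(dim=-1))
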
1 vote
0 answers
669 views

Why are the three embedding results so different from transformer models?

I want to get short text embedding from transformer models, so I had tested 3 ways to compute it. All 3 cases are using models from Huggingface Hub. inputs = tokenizer(text, padding=True, ...
asked by marlon (6,847)
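For context, the three common candidates (pooler_output, the [CLS] hidden state, and a mask-aware mean of all token states) are genuinely different quantities, which is why they disagree; the sentence-transformers checkpoints are built around mean pooling, sketched roughly below (checkpoint and text are illustrative).

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

inputs = tokenizer(['a short example text'], padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    token_states = model(**inputs).last_hidden_state       # (batch, seq, hidden)

# Mask-aware mean pooling: average only over real (non-padding) tokens.
mask = inputs['attention_mask'].unsqueeze(-1).float()       # (batch, seq, 1)
sentence_embedding = (token_states * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)
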
0 votes
1 answer
1k views

sentence transformer using huggingface/transformers pre-trained model vs SentenceTransformer

This page has two scripts. When should one use the 1st method shown below vs the 2nd? As nli-distilroberta-base-v2 is trained specifically for finding sentence embeddings, won't that always be better than the first ...
asked by user2543622 (6,258)
0 votes
1 answer
803 views

bert sentence_transformers list index out of range

I'm trying to use sentence_transformers to get BERT embeddings, but it can't process, for example, 300 documents; I keep getting IndexError: list index out of range. How do I fix that? from ...
asked by Aska (141)
0 votes
1 answer
4k views

Pytorch model object has no attribute 'predict' BERT

I trained a BertClassifier model using PyTorch. After creating my best.pt, I would like to put my model into production and use it to predict and classify starting from a sample, so I resume them ...
asked by Chiara (380)
0 votes
2 answers
6k views

Cannot find the PyTorch model when loading a BERT model in Python

I am following this article to find the text similarity. The code I have is this: from sentence_transformers import SentenceTransformer from tqdm import tqdm from sklearn.metrics.pairwise import ...
asked by Feyzi Bagirov
0 votes
0 answers
16 views

Improving Similarity Measurement of Event Dates in Sentence Transformer Models

I'm developing a system to compute the similarity between textual descriptions of events using the sentence-transformers library. Despite trying various models, I am particularly struggling to capture ...
asked by ashfak (85)
0 votes
0 answers
27 views

How to evaluate the performance of sentence embedding models against benchmark dataset

I am relatively new to this field and would like guidance on how to effectively test an embedding model using a benchmark dataset. Specifically, I have acquired a few embedding models related to ...
asked by Muhammad Daniyal
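For a similarity-style benchmark (pairs of sentences with gold scores), one sketch is sentence-transformers' EmbeddingSimilarityEvaluator, which reports how well cosine similarities correlate with the gold labels; the pairs and scores below are made-up stand-ins for a real benchmark file.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer('all-MiniLM-L6-v2')

# Illustrative benchmark rows: two sentences plus a gold similarity in [0, 1].
sentences1 = ['A man is playing a guitar.', 'A dog runs in the park.']
sentences2 = ['Someone is playing an instrument.', 'The stock market fell today.']
gold_scores = [0.9, 0.05]

evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores)
print(evaluator(model))  # correlation between cosine similarity and the gold scores
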
0 votes
0 answers
28 views

Using the golden dataset in Augmented SBERT training

I use the Augmented SBERT (Domain-Transfer) training strategy. In the code example they use the golden dataset (STSb) for the training evaluator. Here are two code snippets from the example of sentence-...
asked by Christian01
0 votes
1 answer
62 views

classification report for adapters with transformers

I used this code, but I want to calculate a classification report, especially the F1 score, and I do not know how to do that. import numpy as np from transformers import TrainingArguments, AdapterTrainer, ...
asked by Shorouk Adel
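One common pattern with the Trainer/AdapterTrainer API is a compute_metrics callback built on scikit-learn; a minimal sketch (how it would be wired into the trainer is shown in the trailing comment, and the metric choice is illustrative).

import numpy as np
from sklearn.metrics import classification_report, f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Print the full per-class report and log macro-F1 with the trainer metrics.
    print(classification_report(labels, preds, digits=4))
    return {'macro_f1': f1_score(labels, preds, average='macro')}

# trainer = AdapterTrainer(model=model, args=training_args,
#                          train_dataset=train_ds, eval_dataset=eval_ds,
#                          compute_metrics=compute_metrics)
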
0 votes
0 answers
126 views

Bert Supervised model topics per class with only one class

I am trying to use a BERT supervised model for topic modeling. I don't have the liberty to use topic_model = BERTopic(verbose=True). I have to download the pre-trained model locally and use it. I have ...
asked by Shekar Tippur
0 votes
0 answers
28 views

What model can we use for sentence classification using the CSAbstruct dataset?

Trying to train a model for sentence classification on the CSAbstruct dataset: https://github.com/allenai/sequential_sentence_classification/tree/master/data/CSAbstruct Tried with the RoBERTa base model ...
asked by snk_24 (9)
0 votes
1 answer
407 views

BERT sentence embeddings as input features for support vector regression

I used the bert-base-multilingual-cased Tokenizer and model to extract sentence embeddings from Instagram captions. from transformers import BertTokenizer, BertModel tokenizer = BertTokenizer....
asked by Tara van Mierlo
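Once each caption is a fixed-length vector, the embeddings can be used directly as the feature matrix for scikit-learn's SVR; a minimal sketch using a sentence-transformers multilingual model instead of raw BERT pooling (the captions, targets, and model name are illustrative).

import numpy as np
from sklearn.svm import SVR
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('distiluse-base-multilingual-cased-v2')  # illustrative multilingual model

captions = ['great day at the beach #sun', 'new product launch today', 'coffee and a good book']
targets = np.array([120.0, 45.0, 80.0])  # illustrative regression targets (e.g. like counts)

X = model.encode(captions)               # (n_samples, embedding_dim) feature matrix
svr = SVR(kernel='rbf', C=1.0)
svr.fit(X, targets)

print(svr.predict(model.encode(['sunset vibes at the beach'])))
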
0 votes
0 answers
90 views

Where did the Transformer embedding numbers come from?

I'm a student studying Transformers. When I vectorize words with a Transformer (BERT) and get a 768-dimensional vector for each word, I'm confused about where these numbers come from; is ...
asked by intodarkmoon
0 votes
1 answer
686 views

How to add weights in BERT loss function

I have an unbalanced dataset of size N with the following classes: class 1 - size 0.554*N, class 2 - size 0.271*N, class 3 - size 0.185*N. I'm trying to solve an NER task by fine-tuning BERT "dslim/bert-large-NER", ...
asked by dkagramanyan
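One common pattern is to subclass Trainer and apply a class-weighted cross-entropy in compute_loss; the weights below simply invert the class frequencies given in the question, and the three-label setup is illustrative.

import torch
from torch import nn
from transformers import Trainer

# Inverse-frequency weights for classes of size 0.554*N, 0.271*N, 0.185*N.
class_weights = torch.tensor([1 / 0.554, 1 / 0.271, 1 / 0.185], dtype=torch.float)

class WeightedLossTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop('labels')
        outputs = model(**inputs)
        logits = outputs.logits  # (batch, seq_len, num_labels) for token classification
        loss_fct = nn.CrossEntropyLoss(weight=class_weights.to(logits.device), ignore_index=-100)
        loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
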
0 votes
1 answer
95 views

How to solve natural language inference using SentenceBERT?

How can I solve natural language inference using fine-tuned SentenceBERT models (e.g. sentence-transformers/all-MiniLM-L6-v2 on Hugging Face) to obtain better sentence vectors? Many of these models have ...
asked by tedmosby
0 votes
1 answer
232 views

Print out the text value of the points in a cluster when using UMAP, HDBSCAN, and a BERT sentence transformer

I have seen a number of questions similar to this but my cluster labels consist of sentence embeddings, thus a better question may be how do I get text values from the sentence embeddings? How can I ...
asked by Tam (105)
0 votes
0 answers
577 views

Sentence Transformers - IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

I am using DistilBERT from sentence_transformers library on kaggle, but when I import my model and try to encode a sentence with it : modelB = SentenceTransformer('../input/sentence-transformer-models/...
asked by Khadija
0 votes
1 answer
3k views

How to load Bert pretrained model with SentenceTransformers from local path?

I am using the SentenceTransformer library to use a pre-trained BERT model. I downloaded the file in Google Colab and saved it with these commands: from sentence_transformers import SentenceTransformer ...
asked by Sahar Rezazadeh
0 votes
0 answers
357 views

How to download BERT models and load them in Python?

How do I download BERT models and load them in Python? from sentence_transformers import SentenceTransformer model = SentenceTransformer('bert-base-nli-mean-tokens') How do I save the pretrained model and ...
asked by Nithin Reddy
0 votes
1 answer
344 views

How can I train a bert model for representational learning task that is domain specific?

I am trying to generate good sentence embeddings for a specific type of text using sentence transformer models, but testing the similarity and clustering using k-means doesn't give good ...
asked by adit94 (1)
-1 votes
2 answers
497 views

FileNotFound error downloading RoBERTa model with sentence transformers

I've already downloaded the "roberta-large-nli-stsb-mean-tokens" model, but it starts downloading again and again. Note: This is not related to space, the machine has space. And this error ...
asked by Arjit Yadav
-1 votes
1 answer
97 views

How does NLP model know the output length during translation tasks?

Translating English to French, we may have this: Input: "Please help me translate this sentence" 6 tokens Output: "Merci de m'aider à traduire cette phrase" 7 ...
asked by Worldbuffer