When using pre-trained BERT embeddings from PyTorch (which are then fine-tuned), should the text data fed into the model be pre-processed as in any standard NLP task?
For instance, should stemming, removal of low-frequency words, and lower-casing be performed, or should the raw text simply be passed to `transformers.BertTokenizer`?
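
To make the question concrete, here is a minimal sketch of the "no pre-processing" option I have in mind, using `bert-base-uncased` as an example checkpoint (the specific model name and sample sentence are just for illustration):

```python
from transformers import BertTokenizer

# Load the tokenizer that matches the pre-trained checkpoint
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Pass raw, unprocessed text directly -- no stemming or manual lower-casing
encoded = tokenizer("The quick brown FOXES were running!", return_tensors="pt")

# Inspect how the tokenizer itself handles casing and word forms
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"][0]))
```

Is this the intended usage, or should the text be cleaned beforehand?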