
[Solved] Some weights of the model checkpoint at distilbert-base-multilingual-cased were not used when initializing DistilBertForSequenceClassification: [‘vocab_projector.bias’, ‘vocab_layer_norm.bias’, ‘vocab_layer_norm.weight’, ‘vocab_transform.weight’, ‘vocab_transform.bias’]

Last Updated on 2023-06-19 by Clay

Problem

When using the transformers package, we often see the following warning message when we load a checkpoint into a task-specific model class with a different head, such as AutoModelForSequenceClassification, AutoModelForSeq2SeqLM…

Some weights of the model checkpoint at distilbert-base-multilingual-cased were not used when initializing DistilBertForSequenceClassification: ['vocab_projector.bias', 'vocab_layer_norm.bias', 'vocab_layer_norm.weight', 'vocab_transform.weight', 'vocab_transform.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-multilingual-cased and are newly initialized: ['classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
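For example, the warning above can be reproduced with a minimal snippet like the one below (the num_labels value is a hypothetical setting for an imagined downstream task):

from transformers import AutoModelForSequenceClassification

# Loading an MLM-pretrained checkpoint into a sequence-classification class:
# the MLM head weights (vocab_transform, vocab_layer_norm, vocab_projector)
# are dropped, and a new pre_classifier/classifier head is randomly initialized.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-multilingual-cased",
    num_labels=2,  # hypothetical label count for the downstream task
)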

This warning message is generated by the transformers library from Hugging Face. It informs us that the model being initialized (DistilBertForSequenceClassification) is not utilizing certain weights from the original pre-trained model (distilbert-base-multilingual-cased). This might be due to differences in architecture or training objectives between the original pre-trained model and the target model.

Additionally, the message indicates that certain weights in DistilBertForSequenceClassification were not initialized from the pre-trained checkpoint but newly created. The warning therefore suggests that we should fine-tune the model on a downstream task before using it for prediction or inference.

In other words, if you see this warning message while initializing a model for training, it’s perfectly normal: these task-specific heads are intended to be trained on the downstream task.

However, be aware that if you see this warning during the inference stage, when loading a model you have already trained, it’s a serious issue. It indicates that the architecture of your saved model does not match the task class you are loading it into, so something has gone wrong somewhere.
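As a rough sanity check at inference time, saving the fine-tuned model and reloading it into the same task class should not trigger the “newly initialized” part of the warning; a minimal sketch, assuming a hypothetical output directory:

from transformers import AutoModelForSequenceClassification

# After fine-tuning, save the whole model (backbone + classification head).
model.save_pretrained("./my_finetuned_model")  # hypothetical path

# At inference time, load it back with the same task class; no weights
# should be reported as unused or newly initialized here.
model = AutoModelForSequenceClassification.from_pretrained("./my_finetuned_model")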


Solution

from transformers import logging as transformer_logging

# Ignore transformers warnings
transformer_logging.set_verbosity_error()

With this setup, only error-level messages will be printed, and all warning messages will be ignored.

However, it’s important to note that while this approach can make our output cleaner, it may also cause us to overlook some important warnings that might affect our model. Therefore, it’s recommended to use this method only when we are sure that these warnings will not impact our model’s training and prediction.
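If you prefer to silence the messages only around model initialization, you can restore the default verbosity afterwards; a minimal sketch:

from transformers import AutoModelForSequenceClassification
from transformers import logging as transformer_logging

# Suppress warnings only while the model is being initialized...
transformer_logging.set_verbosity_error()
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-multilingual-cased"
)

# ...then restore the default WARNING level so later messages are not lost.
transformer_logging.set_verbosity_warning()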

