
Using AutoModel.from_pretrained() In Transformers To Load Customized Model Architecture

Last Updated on 2024-08-22 by Clay

To this day, many AI applications and open-source projects are built on top of the HuggingFace transformers package. A large number of models and libraries are written to be compatible with the transformers format, and even share the same functions and methods, which makes them more widely adopted.

Under this premise, I came across an open-source training framework that conveniently wraps the automatic loading of Transformer architectures. The unavoidable problem, however, is that I wanted to run experiments with my own custom model. I tried several approaches, hoping that by simply passing the local path of my model to AutoModel.from_pretrained(), I could load my custom model architecture. This article records the methods that worked.

In short, there are two approaches:

  1. The register() method for AutoConfig and AutoModel
  2. The “auto_map” parameter setting in the model’s Config

Custom Model

Let’s assume we have the following custom Mistral model with a bidirectional attention mechanism (naturally, I built it by extending the original Mistral model):

from transformers import MistralConfig, MistralModel
from transformers.models.mistral.modeling_mistral import MistralPreTrainedModel


# Custom config: give it a unique `model_type` so it does not clash with "mistral"
class MistralBiConfig(MistralConfig):
    model_type = "bimistral"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)


# Custom model: `config_class` must point to the custom config class itself
class MistralBiModel(MistralModel):
    _no_split_modules = ["ModifiedMistralDecoderLayer"]
    config_class = MistralBiConfig

    def __init__(self, config: MistralConfig):
        MistralPreTrainedModel.__init__(self, config)
        ...


Note that when registering a custom Transformer model, config.model_type must be a unique name that does not conflict with any existing architecture, and the model’s config_class must be set to your own config class (not a string). This step is crucial, because transformers really does check whether these are set correctly.
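
Because model_type is defined on the config class, any config saved with save_pretrained() records it in config.json. A minimal sketch to illustrate this; "./bimistral-test" is just a placeholder output directory:

config = MistralBiConfig()
config.save_pretrained("./bimistral-test")
# The saved config.json now contains "model_type": "bimistral"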


The register() Method for AutoConfig and AutoModel

First, let’s record the registration method. After registering, AutoConfig and AutoModel can resolve the “bimistral” model type directly to MistralBiConfig and MistralBiModel.

from transformers import AutoModel, AutoConfig

AutoConfig.register("bimistral", MistralBiConfig)
AutoModel.register(MistralBiConfig, MistralBiModel)


After that, you can change the model_type in the model checkpoint’s config.json to the newly registered “bimistral“, and AutoModel.from_pretrained() will automatically load this model architecture.
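
A minimal sketch of loading after registration. I assume here that the custom classes live in a local modeling_bimistral.py, and "/path/to/bimistral-checkpoint" is a placeholder for your local model directory; the registration lines are repeated because they have to run in every new process:

from transformers import AutoConfig, AutoModel
from modeling_bimistral import MistralBiConfig, MistralBiModel

# Register the custom classes (must be repeated in every new Python process)
AutoConfig.register("bimistral", MistralBiConfig)
AutoModel.register(MistralBiConfig, MistralBiModel)

# "/path/to/bimistral-checkpoint" is a placeholder: a local directory whose
# config.json already has "model_type": "bimistral"
model = AutoModel.from_pretrained("/path/to/bimistral-checkpoint")
print(type(model))  # should be MistralBiModel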

However, note that during my testing I found the registration has to be done at the beginning of each script; it is not permanently recorded inside the transformers package. This is reasonable, since the registration only lives for the current Python process.


The “auto_map” Parameter Setting in the Model’s Config

This method is a bit simpler, but it required me to modify some of the framework’s source code. All we need to do is add an “auto_map” entry to the model’s config.json, for example:

{
  "_name_or_path": "Mistral-7B",
  "model_type": "mistral",
  "architectures": [
    "MistralForCausalLM"
  ],
  "auto_map": {
    "AutoModel": "modeling_bimistral.MistralBiModel",
    "AutoModelForCausalLM": "modeling_bimistral.MistralBiForCausalLM",
    "AutoModelForSequenceClassification": "modeling_bimistral.MistralBiForSequenceClassification"
  },
...


We just need to place the custom model architecture in the referenced .py file (here, modeling_bimistral.py inside the model directory), and AutoModel can then load the specified classes. However, when loading you will need to pass the trust_remote_code=True parameter, which is why I mentioned needing to modify the source code of the framework I want to use. (Though, there’s nothing wrong with that.)
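
A minimal sketch of loading via auto_map; "/path/to/bimistral-checkpoint" is again a placeholder for a local directory containing both the config.json above and modeling_bimistral.py:

from transformers import AutoModel

# config.json maps "AutoModel" to "modeling_bimistral.MistralBiModel",
# so the class is loaded from the local modeling_bimistral.py file
model = AutoModel.from_pretrained(
    "/path/to/bimistral-checkpoint",
    trust_remote_code=True,
)
print(type(model))  # should be MistralBiModel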

