
[Solved] “TypeError: not a string” at Using AutoTokenizer.from_pretrained()

Today I used the transformers package to build an ALBERT model for an NLP task in an offline environment, and I got an error message when loading the ALBERT tokenizer:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/home/clay/transformers_model/albert_chinese_tiny")


The error message is:

TypeError: not a string


It told me the parameter I passed was not a string, but that is not the real problem. Of course the parameter I passed is a STRING!
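As far as I can tell, the error comes from the files inside the model directory rather than the argument itself: AlbertTokenizer expects a SentencePiece file (spiece.model), while albert_chinese_tiny ships a WordPiece vocab.txt, so the tokenizer ends up passing None instead of a file path down to SentencePiece. A hypothetical helper (guess_tokenizer_class is my own name, not part of transformers) can sketch the check:

```python
import os
import tempfile

def guess_tokenizer_class(model_dir):
    """Hypothetical helper: inspect which vocabulary file a local
    checkpoint directory actually contains. AlbertTokenizer needs a
    SentencePiece file (spiece.model); BertTokenizer needs a
    WordPiece vocab.txt."""
    files = set(os.listdir(model_dir))
    if "spiece.model" in files:
        return "AlbertTokenizer"
    if "vocab.txt" in files:
        return "BertTokenizer"
    return "unknown"

# Simulate an albert_chinese_tiny-style checkpoint, which ships
# vocab.txt (WordPiece) but no spiece.model (SentencePiece).
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "vocab.txt"), "w").close()
    open(os.path.join(d, "config.json"), "w").close()
    print(guess_tokenizer_class(d))  # → BertTokenizer
```

In other words, the directory looks like a BERT checkpoint to the tokenizer machinery, even though the model weights are ALBERT.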


Solution

So far, I have seen three solutions; which one works depends on your situation.

  1. Path error (in an online environment, you can simply pass the model name, e.g. voidful/albert_chinese_tiny, and it will be downloaded; offline, this does not work, so double-check the local path.)
  2. Update the transformers package (in 2020 the package had a BUG related to this, and updating solved it.)
  3. Use BertTokenizer instead (my case)
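If you are not sure which case applies, one pattern is to try the loaders in order and fall back when the TypeError appears. This is a minimal sketch with a hypothetical helper (load_with_fallback is my own name); in real code the loaders list would be [AutoTokenizer.from_pretrained, BertTokenizer.from_pretrained], shown here with toy stand-ins so the pattern is self-contained:

```python
def load_with_fallback(path, loaders):
    """Hypothetical helper: try each tokenizer loader in turn and
    return the first result that succeeds, re-raising the last
    TypeError if all of them fail."""
    last_err = None
    for load in loaders:
        try:
            return load(path)
        except TypeError as err:  # e.g. "TypeError: not a string"
            last_err = err
    raise last_err

# Toy stand-ins for the real from_pretrained functions:
def auto_loader(path):
    raise TypeError("not a string")   # mimics the ALBERT failure

def bert_loader(path):
    return f"BertTokenizer({path})"   # mimics a successful load

print(load_with_fallback("/home/clay/transformers_model/albert_chinese_tiny",
                         [auto_loader, bert_loader]))
# → BertTokenizer(/home/clay/transformers_model/albert_chinese_tiny)
```

This keeps the AutoTokenizer path working for normal models while still loading checkpoints like this one.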

If you get the same error when using ALBERT, you may need to use BertTokenizer to load the weights.

Please refer to the documentation: https://huggingface.co/voidful/albert_chinese_tiny

That was my case: when AutoTokenizer loads this model, it automatically uses the AlbertTokenizer class, which does not match this model's vocabulary files.

So change the code to:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("/home/clay/transformers_model/albert_chinese_tiny")


And the problem disappeared.

