Last Updated on 2022-08-09 by Clay
Today I used the transformers package to build an ALBERT model for an NLP task in an offline environment, and I got an error message when loading the ALBERT tokenizer:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/home/clay/transformers_model/albert_chinese_tiny")
The error message is:
TypeError: not a string
The message says the parameter I passed is not a string, but that is not the real problem. The parameter I passed is, of course, a string!
Solution
So far I have seen three possible solutions; which one works depends on your situation.
- Fix the path (In an online environment, you can simply pass the model name, e.g. voidful/albert_chinese_tiny, and it will be downloaded automatically. This does not work offline.)
- Update the transformers package (In 2020 the package had a bug that caused this error, and upgrading fixed it.)
- Use BertTokenizer instead (This was my case.)
If you get the same error when using ALBERT, you may need to load the weights with BertTokenizer instead.
Please refer to the model card: https://huggingface.co/voidful/albert_chinese_tiny
This was exactly my problem: when AutoTokenizer loads this model, it automatically selects the AlbertTokenizer class.
So change the code to:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("/home/clay/transformers_model/albert_chinese_tiny")
And the problem disappears.
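To see why AlbertTokenizer fails here, you can check which vocabulary files the local model directory actually contains. AlbertTokenizer expects a SentencePiece file (spiece.model), while BertTokenizer expects a WordPiece vocabulary (vocab.txt); if only vocab.txt is present, the SentencePiece path resolves to None and triggers "not a string". Below is a minimal diagnostic sketch; the helper name check_tokenizer_files is my own, and the directory path is the one from above.

```python
import os

def check_tokenizer_files(model_dir):
    """Report which tokenizer files exist in a local model directory.

    AlbertTokenizer loads a SentencePiece model ("spiece.model"), while
    BertTokenizer loads a WordPiece vocabulary ("vocab.txt").  If the
    directory ships only vocab.txt, AutoTokenizer picks AlbertTokenizer,
    finds no spiece.model, and the None path raises "TypeError: not a string".
    """
    names = ("spiece.model", "vocab.txt", "config.json")
    return {name: os.path.exists(os.path.join(model_dir, name)) for name in names}

# Hypothetical local path -- replace with your own model directory.
print(check_tokenizer_files("/home/clay/transformers_model/albert_chinese_tiny"))
```

If the report shows vocab.txt but no spiece.model, that confirms the model was packaged with a BERT-style vocabulary, and BertTokenizer is the right loader.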
References
- https://stackoverflow.com/questions/70709572/typeerror-not-a-string-parameters-in-autotokenizer-from-pretrained
- https://github.com/huggingface/transformers/issues/5040
- https://github.com/huggingface/transformers/issues/3673
Read More
- [Machine Learning] CodeBERT Introduction (With Example)
- [PyTorch] How to Use HuggingFace Transformers Package (With BERT Example)
- [Solved] huggingface/tokenizers: The current process just got forked. after parallelism has already been used. Disabling parallelism to avoid deadlocks
- [Machine Learning] Introduction of 'pytorch-lightning' package