When we want to translate text from one language into another, many of us simply open the Google Translate website.
Google Translate has a strong reputation, and as NLP-related deep learning techniques have matured in recent years, its translation quality has improved year after year.
So when we need a program to translate an article or a piece of text, connecting it to Google Translate is the natural choice.
When it comes to connecting Python to Google Translate, I think there are basically three approaches:
- Use the Google Translate API (probably the most stable approach)
- Write your own crawler
- Use a package someone else has already written
Naturally, using an existing package is the fastest. Today I want to record how to use googletrans, a Python package that calls Google Translate directly. Install it with pip and it is ready to use, which is very convenient.
Preparation
Install the googletrans Python package with the following command:
sudo pip3 install googletrans
Note that this package has some limitations:
- A single string must not exceed 15,000 characters
- When the Google Translate web page changes, the package stops working properly until it is updated
- An HTTP 5xx error most likely means that Google has blocked your IP address
Even so, it is a great package, because it is so easy to use.
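The 15,000-character limit means a longer article has to be sent in pieces. Below is a minimal sketch of one way to do that in plain Python. It is not part of googletrans: the helper name `split_for_translation` and its choice of break characters (newlines plus Chinese and English sentence-ending punctuation) are my own assumptions.

```python
def split_for_translation(text, limit=15000, breakpoints="\n。！？.!?"):
    """Split `text` into chunks of at most `limit` characters.

    Hypothetical helper (not part of googletrans). Each chunk is cut just
    after the last break character inside the window when one exists,
    otherwise at exactly `limit` characters. Joining the chunks gives back
    the original text.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            # Index of the last break character inside text[start:end], or -1.
            cut = max((text.rfind(ch, start, end) for ch in breakpoints), default=-1)
            if cut >= start:
                end = cut + 1  # keep the punctuation with its sentence
        chunks.append(text[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    long_text = "我覺得今天天氣不好。" * 3000  # 30,000 characters
    pieces = split_for_translation(long_text)
    print(len(pieces), [len(p) for p in pieces])
```

Each chunk could then be passed to `translator.translate()` on its own and the `.text` results joined. Cutting at sentence boundaries is a design choice: a sentence split across two requests loses context and is likely to be translated worse.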
Translate
Since we are connecting to Google Translate, the most important feature is, of course, translation!
import googletrans

# Initialize the translator
translator = googletrans.Translator()

# Basic translation: source language is auto-detected, target defaults to English
results = translator.translate('我覺得今天天氣不好。')
print(results)
print(results.text)
Output:
Translated(src=zh-CN, dest=en, text=I think the weather is bad today., pronunciation=None, extra_data="{'translat…")
I think the weather is bad today.
As you can see, without any settings the package automatically detects the most likely language of the input string, and the output language defaults to English.
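Because an HTTP 5xx error tends to mean Google is rejecting our requests, one common mitigation is to slow down and retry with exponential backoff. The sketch below is a generic wrapper, not a googletrans feature: `translate_with_retry` and the offline `FlakyTranslator` stand-in are hypothetical, and the wrapper catches any `Exception` because I am not pinning down which exception type googletrans raises on a failed request.

```python
import time


def translate_with_retry(translator, text, dest="en", retries=3,
                         delay=1.0, sleep=time.sleep):
    """Call translator.translate(text, dest=dest), retrying on failure.

    `translator` is assumed to be a googletrans.Translator or any object with
    a compatible translate(text, dest=...) method. Any Exception triggers a
    retry; the last one is re-raised once `retries` extra attempts are used.
    """
    for attempt in range(retries + 1):
        try:
            return translator.translate(text, dest=dest)
        except Exception:
            if attempt == retries:
                raise
            # Exponential backoff: wait delay, 2*delay, 4*delay, ...
            sleep(delay * 2 ** attempt)


class FlakyTranslator:
    """Hypothetical offline stand-in: fails `failures` times, then echoes."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def translate(self, text, dest="en"):
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("simulated HTTP 503")
        return f"[{dest}] {text}"


if __name__ == "__main__":
    waits = []
    flaky = FlakyTranslator(failures=2)
    print(translate_with_retry(flaky, "我覺得今天天氣不好。", sleep=waits.append))
    print(waits)  # [1.0, 2.0]
```

With the real package, the call would look like `translate_with_retry(googletrans.Translator(), '我覺得今天天氣不好。').text`. Backing off only reduces request pressure; if your IP address is already blocked, retrying will not fix it.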
We can also specify the output language. Let's look at an example:
print('English:', translator.translate('我覺得今天天氣不好', dest='en').text)
print('Japanese:', translator.translate('我覺得今天天氣不好', dest='ja').text)
print('Korean:', translator.translate('我覺得今天天氣不好', dest='ko').text)
Output:
English: I think the weather is bad today
Japanese: 私は、今日の天気は悪いだと思います
Korean: 나는 날씨가 오늘 나쁜 생각
Setting the dest parameter selects the language to translate into.
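When the same sentence needs several target languages, the `dest` calls can be collected into a dictionary. A minimal sketch, assuming only that `translate(text, dest=...)` returns an object with a `.text` attribute as in the earlier output; `translate_to_many`, `EchoTranslator`, and `EchoResult` are hypothetical names, and the stand-ins exist only so the example runs offline.

```python
from dataclasses import dataclass


def translate_to_many(translator, text, dests=("en", "ja", "ko")):
    """Translate `text` into every language code in `dests`.

    Returns a dict mapping each destination code to the translated text,
    making one translator.translate(text, dest=code) call per code.
    """
    return {code: translator.translate(text, dest=code).text for code in dests}


@dataclass
class EchoResult:
    """Hypothetical stand-in for the result object; only .text is used."""
    text: str


class EchoTranslator:
    """Hypothetical offline stand-in for googletrans.Translator."""

    def translate(self, text, dest="en"):
        return EchoResult(text=f"<{dest}> {text}")


if __name__ == "__main__":
    for code, translated in translate_to_many(EchoTranslator(), "我覺得今天天氣不好").items():
        print(f"{code}: {translated}")
```

With the real package, `translate_to_many(googletrans.Translator(), '我覺得今天天氣不好')` would send one request per language code, so it is worth keeping the list short given the IP-blocking limitation mentioned earlier.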
Detect language
Interestingly, if we have text in an unknown language, we can use this package to detect which language it is.
Take the following Japanese text as an example (yes, I know it is Japanese):
# Detect
unknown_sentence = 'おはよう'
results = translator.detect(unknown_sentence)
print(results)
print(results.lang)
Output:
Detected(lang=ja, confidence=1.0)
ja
The result confirms that this text is Japanese.
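A detected language is only as useful as its confidence. The sketch below shows one hedged way to act on the result: accept the detected code when the confidence clears a threshold, and fall back to a default otherwise. `pick_language`, `FakeDetected`, and the 0.5 threshold are illustrative assumptions, not part of googletrans; the helper relies only on the `lang` and `confidence` fields seen in the output above.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the printed Detected(lang=..., confidence=...)
FakeDetected = namedtuple("FakeDetected", ["lang", "confidence"])


def pick_language(detected, fallback="en", min_confidence=0.5):
    """Return detected.lang when detection is confident enough, else fallback.

    `detected` is assumed to expose a `lang` code and a numeric `confidence`;
    a missing (None) confidence is treated as "not confident".
    """
    confidence = getattr(detected, "confidence", None)
    if confidence is None or confidence < min_confidence:
        return fallback
    return detected.lang


if __name__ == "__main__":
    print(pick_language(FakeDetected("ja", 1.0)))  # ja
    print(pick_language(FakeDetected("ja", 0.2)))  # en (fallback)
```

With a real translator this would be `pick_language(translator.detect('おはよう'))`, which, given the `confidence=1.0` output shown above, returns `'ja'`.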
Get Language Codes
Features such as specifying the target language and detecting the language both depend on knowing the relevant language codes.
For example, everyone knows that "en" means English. But what about "af"?
I looked it up: "af" stands for Afrikaans. Yes, exactly as I expected!
In fact, googletrans itself can list all the language codes for us.
from pprint import pprint
pprint(googletrans.LANGCODES)
Output:
{'Filipino': 'fil',
'Hebrew': 'he',
'afrikaans': 'af',
'albanian': 'sq',
'amharic': 'am',
'arabic': 'ar',
'armenian': 'hy',
'azerbaijani': 'az',
'basque': 'eu',
'belarusian': 'be',
'bengali': 'bn',
'bosnian': 'bs',
'bulgarian': 'bg',
'catalan': 'ca',
'cebuano': 'ceb',
'chichewa': 'ny',
'chinese (simplified)': 'zh-cn',
'chinese (traditional)': 'zh-tw',
'corsican': 'co',
'croatian': 'hr',
'czech': 'cs',
'danish': 'da',
'dutch': 'nl',
'english': 'en',
'esperanto': 'eo',
'estonian': 'et',
'filipino': 'tl',
'finnish': 'fi',
'french': 'fr',
'frisian': 'fy',
'galician': 'gl',
'georgian': 'ka',
'german': 'de',
'greek': 'el',
'gujarati': 'gu',
'haitian creole': 'ht',
'hausa': 'ha',
'hawaiian': 'haw',
'hebrew': 'iw',
'hindi': 'hi',
'hmong': 'hmn',
'hungarian': 'hu',
'icelandic': 'is',
'igbo': 'ig',
'indonesian': 'id',
'irish': 'ga',
'italian': 'it',
'japanese': 'ja',
'javanese': 'jw',
'kannada': 'kn',
'kazakh': 'kk',
'khmer': 'km',
'korean': 'ko',
'kurdish (kurmanji)': 'ku',
'kyrgyz': 'ky',
'lao': 'lo',
'latin': 'la',
'latvian': 'lv',
'lithuanian': 'lt',
'luxembourgish': 'lb',
'macedonian': 'mk',
'malagasy': 'mg',
'malay': 'ms',
'malayalam': 'ml',
'maltese': 'mt',
'maori': 'mi',
'marathi': 'mr',
'mongolian': 'mn',
'myanmar (burmese)': 'my',
'nepali': 'ne',
'norwegian': 'no',
'pashto': 'ps',
'persian': 'fa',
'polish': 'pl',
'portuguese': 'pt',
'punjabi': 'pa',
'romanian': 'ro',
'russian': 'ru',
'samoan': 'sm',
'scots gaelic': 'gd',
'serbian': 'sr',
'sesotho': 'st',
'shona': 'sn',
'sindhi': 'sd',
'sinhala': 'si',
'slovak': 'sk',
'slovenian': 'sl',
'somali': 'so',
'spanish': 'es',
'sundanese': 'su',
'swahili': 'sw',
'swedish': 'sv',
'tajik': 'tg',
'tamil': 'ta',
'telugu': 'te',
'thai': 'th',
'turkish': 'tr',
'ukrainian': 'uk',
'urdu': 'ur',
'uzbek': 'uz',
'vietnamese': 'vi',
'welsh': 'cy',
'xhosa': 'xh',
'yiddish': 'yi',
'yoruba': 'yo',
'zulu': 'zu'}
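Going the other way, from a code such as "af" back to a language name, only takes inverting this dictionary. Note that the output contains pairs that differ only in capitalization ('Filipino'/'filipino', 'Hebrew'/'hebrew') and map to different codes, so a case-insensitive name lookup can return more than one code. Below is a minimal sketch using an excerpt copied from the output above; the helper names are hypothetical, and with the real package you would pass `googletrans.LANGCODES` instead of the excerpt.

```python
# Excerpt copied from the pprint output above (hypothetical constant name).
LANGCODES_EXCERPT = {
    "Filipino": "fil",
    "Hebrew": "he",
    "afrikaans": "af",
    "english": "en",
    "filipino": "tl",
    "hebrew": "iw",
    "japanese": "ja",
}


def code_to_name(langcodes):
    """Invert a {language name: code} mapping into {code: language name}."""
    return {code: name for name, code in langcodes.items()}


def find_codes(langcodes, name):
    """Return every code whose language name matches `name`, ignoring case.

    A list is returned because some names appear twice with different
    capitalization and different codes ('Hebrew' -> 'he', 'hebrew' -> 'iw').
    """
    wanted = name.casefold()
    return sorted(code for lang, code in langcodes.items()
                  if lang.casefold() == wanted)


if __name__ == "__main__":
    print(code_to_name(LANGCODES_EXCERPT)["af"])    # afrikaans
    print(find_codes(LANGCODES_EXCERPT, "Hebrew"))  # ['he', 'iw']
```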