Last Updated on 2020-12-07 by Clay
Introduction
A word cloud is a well-known visualization in the natural language processing domain. At first I thought it simply counted vocabulary frequencies and displayed the high-frequency words in a larger size.
But there is more to it than that: the shapes and the styles of the characters all have to be learned, so it is not as simple as I thought.
I am a person who likes new things, so today I studied the wordcloud module in Python, and I record my experience here.
wordcloud example
Before the first use, we need to install it with the following command:
pip3 install wordcloud
Then we need some text before we can start "counting word frequencies". The text I selected here comes from a number of articles about "word vectors" that I had saved before for teaching. To test the effect of the word cloud, I selected ten of them, combined them, and then segmented the text.
Since it is an English corpus, the word segmentation tool I chose is NLTK. If you are interested, you can refer to what I wrote before: NLTK Tutorial —— A Python package
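At its core, a word cloud is built on a frequency table. As a rough sketch of that counting step (using only the standard library, independent of the wordcloud package), with a made-up sample sentence standing in for the real corpus:

```python
from collections import Counter

# A made-up sample sentence standing in for the real corpus
text = "word vectors map words to vectors and vectors encode meaning"

# Naive whitespace tokenization; NLTK's word_tokenize handles punctuation better
tokens = text.lower().split()

# Count how often each token appears
freq = Counter(tokens)

# The most frequent words would be drawn largest in the cloud
print(freq.most_common(3))
```

This is essentially what generate() does internally before it lays out the words.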
The following is a simple sample code:
# -*- coding: utf-8 -*-
import nltk
from wordcloud import WordCloud

# Read the corpus file
text = open('data.txt', 'r', encoding='utf-8').read()

# Tokenize with NLTK, then join the tokens back with spaces
text = ' '.join(nltk.word_tokenize(text))

# Count the word frequencies and build the word cloud
cloud = WordCloud().generate(text)

# Save the word cloud as an image
cloud.to_file('output.png')
Output:
You can use the generate() function to count the words and build the word cloud, and use the to_file() function to save it as a picture.