
Quickly Generate a Word Cloud with the Python Module wordcloud


Introduction

Word cloud is a well-known term in the natural language processing field. At first I thought it simply counted word frequencies and displayed the high-frequency words in a larger size.

But there is more to it than that: the shape of the cloud and the style of the characters each take some know-how, so it is not as simple as I thought.

I like trying new things, so today I studied the wordcloud module in Python, and I am recording my experience here.


wordcloud example

Before using it for the first time, install it with the following command:

pip3 install wordcloud

Then we need some text before we can start “counting word frequency”. The text I selected here comes from a number of tutorial articles on “Word Vector” that I saved earlier. To test the word cloud, I picked ten of them, combined them, and then tokenized the result.

Since it is an English corpus, the tokenization tool I chose is NLTK. If you are interested, you can refer to what I wrote before: NLTK Tutorial —— A Python package

The following is a simple sample code:

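# -*- coding: utf-8 -*-
import nltk
from wordcloud import WordCloud

# word_tokenize needs NLTK tokenizer data; newer NLTK releases may ask for 'punkt_tab' instead
nltk.download('punkt')

# Read the combined corpus and tokenize it with NLTK
with open('data.txt', 'r', encoding='utf-8') as f:
    text = f.read()
text = ' '.join(nltk.word_tokenize(text))

# Count the words, build the word cloud, and save it as an image
cloud = WordCloud().generate(text)
cloud.to_file('output.png')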


Output:

You can use the generate() method to count the words and build the word cloud, and the to_file() method to save it as an image.
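To make the "counting word frequency" step more concrete, here is a minimal sketch that counts the words by hand and passes the counts to generate_from_frequencies(). The helper names word_frequencies and save_cloud, the sample sentence, the output file name, and the constructor settings are all my own assumptions for illustration. Note that generate() does its own tokenizing and stopword filtering, so its result will not match this simple count exactly.

```python
import re
from collections import Counter


def word_frequencies(text, min_length=3):
    """Count lowercase word occurrences, skipping very short tokens.

    Hypothetical helper: a plain regex split, not what WordCloud does internally.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if len(w) >= min_length)


def save_cloud(frequencies, path="output_freq.png"):
    """Render a word cloud from a {word: count} mapping and save it to path."""
    # Imported here so the counting part runs even without wordcloud installed
    from wordcloud import WordCloud

    # Illustrative settings only; adjust the size, background, and word limit as needed
    cloud = WordCloud(width=800, height=400,
                      background_color="white", max_words=100)
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(path)


# Made-up sample sentence, standing in for the real corpus
sample = "Word vectors map words to vectors; similar words get similar vectors."
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

Calling save_cloud(freqs) would then write the image. Counting the words yourself is handy when you want to filter or re-weight them before drawing the cloud.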

