
NLP

[NLP][Python] NLP tool for Chinese text: HanLP

HanLP (Han Language Processing) is an open-source project on GitHub that provides many functions:

  • Segmentation
  • Part-of-speech tagging
  • Named entity recognition
  • Keyword extraction
  • Text summarization
  • Traditional-to-Simplified Chinese conversion
  • Text recommendation
  • Text classification
  • Word2Vec
  • ...

To read more of its documentation, see: https://github.com/hankcs/HanLP

Or, to try a demo: http://hanlp.com/


[Python] Convert GloVe model to a format Gensim can read


Introduction

Anyone familiar with natural language processing (NLP) has probably come across GloVe and the Python package Gensim.

GloVe (Global Vectors for Word Representation) is a paper published by the Stanford NLP Group, and it is also an open-source pre-trained word embedding model. When you see "GloVe" on the Internet today, it usually refers to this open-source pre-trained model.

Gensim is a Python package that includes an implementation of Word2Vec, proposed by Google in 2013, so we can easily train a word vector model on our own corpus with it.
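The conversion named in the title can be sketched with the standard library alone. The sketch below assumes the usual text layouts: a GloVe file has one `word v1 v2 ...` row per line and no header, while the word2vec text format starts with a `<vocab_size> <vector_dim>` line. The function name, the `demo` helper, and the toy vectors are made up for illustration, and tokens are assumed to contain no spaces.

```python
import os
import tempfile
from typing import Tuple


def convert_glove_to_word2vec(glove_path: str, output_path: str) -> Tuple[int, int]:
    """Copy a GloVe text file to output_path with a "<count> <dim>" header.

    Returns (vocab_size, vector_dim). Assumes tokens contain no spaces.
    """
    # First pass: count the rows and read the dimension from the first row.
    vocab_size = 0
    vector_dim = 0
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            if vocab_size == 0:
                vector_dim = len(line.rstrip().split(" ")) - 1
            vocab_size += 1

    # Second pass: write the header, then copy the vectors unchanged.
    with open(glove_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        dst.write(f"{vocab_size} {vector_dim}\n")
        for line in src:
            if line.strip():
                dst.write(line if line.endswith("\n") else line + "\n")
    return vocab_size, vector_dim


def demo() -> str:
    """Convert a tiny made-up GloVe file and return the converted text."""
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "toy_glove.txt")
        dst_path = os.path.join(tmp, "toy_word2vec.txt")
        with open(src_path, "w", encoding="utf-8") as f:
            f.write("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
        convert_glove_to_word2vec(src_path, dst_path)
        with open(dst_path, encoding="utf-8") as f:
            return f.read()
```

A file converted this way should be loadable with Gensim's `KeyedVectors.load_word2vec_format(path, binary=False)`. Reading the file twice keeps memory use low, which matters because pre-trained GloVe files are large.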


A Python module for quickly generating word clouds: wordcloud


Introduction

"Word cloud" is a well-known term in the natural language processing domain. At first I thought it just meant counting how often each word appears and drawing the high-frequency words larger.

But there is more to it than that: the shape of the cloud and the style of the characters each take some know-how, so it is not as simple as I thought.

I like trying new things, so today I studied the wordcloud module in Python and recorded my experience here.
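The basic idea mentioned above, counting word frequencies and drawing frequent words larger, can be sketched in plain Python. This is only an illustration of the counting step, not how the wordcloud module works internally; the function names, the simple regex tokenizer, and the linear size scaling are assumptions made for the example.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def word_frequencies(text: str, top_n: int = 5) -> List[Tuple[str, int]]:
    """Return the top_n most frequent lowercase words with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)


def font_sizes(frequencies: List[Tuple[str, int]],
               min_size: int = 10, max_size: int = 60) -> Dict[str, int]:
    """Scale counts linearly so the most frequent word gets max_size."""
    if not frequencies:
        return {}
    top = max(count for _, count in frequencies)
    return {
        word: max(min_size, round(max_size * count / top))
        for word, count in frequencies
    }


sample = "cat dog cat bird cat dog"
sizes = font_sizes(word_frequencies(sample))
print(sizes)
```

The wordcloud package wraps this kind of counting together with layout and rendering; for example, `WordCloud().generate(text)` builds a cloud directly from raw text.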


[Solved] OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

Today, when I was using Python for an NLP task, I got an error message:

OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
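
This error is raised by spaCy when `spacy.load('en')` cannot find an installed model under that name. One common remedy, sketched here under assumptions, is to install a model package and load it by its full package name. The package name `en_core_web_sm` and both helper functions are assumptions made for this example.

```python
import importlib.util


def is_model_installed(package_name: str) -> bool:
    """Return True if a top-level Python package with this name is importable."""
    return importlib.util.find_spec(package_name) is not None


def load_english_model(package_name: str = "en_core_web_sm"):
    """Load a spaCy model by full package name, with a clearer error if missing."""
    if not is_model_installed(package_name):
        raise RuntimeError(
            f"Model package '{package_name}' is not installed. "
            f"Try: python -m spacy download {package_name}"
        )
    # Imported lazily so the check above works even without spaCy installed.
    import spacy
    return spacy.load(package_name)
```

Checking for the package first turns the E050 error into a message that says which command to run.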