NLP

[NLP][Python] NLP tool for Chinese text: HanLP

Clay
2021-04-01
NLP, Python

HanLP (Han Language Processing) is an open-source project on GitHub that provides many functions:

Segmentation
Part-of-speech tagging
Named entity recognition
Keyword extraction
Text summarization
Convert Traditional to Simplified
Text recommendation
Text classification
Word2Vec
…

For more documentation, see: https://github.com/hankcs/HanLP

To try an online demo, visit: http://hanlp.com/

[NLP][Python] Use OpenCC to convert simplified Chinese to traditional Chinese via opencc-python-reimplemented

Clay
2021-03-31
NLP, Python

Chinese characters can be roughly divided into “Traditional” and “Simplified”. If you need to convert one to the other, I think the most convenient tool in Python is OpenCC.
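Why not just use a one-to-one character table? Because one simplified character can map to several traditional ones: 发 becomes 發 in 发现 (discover) but 髮 in 头发 (hair). The sketch below illustrates a phrase-first, longest-match conversion using a tiny hypothetical dictionary. It is not OpenCC's implementation or data, only an illustration of the problem such a converter has to solve.

```python
# Tiny hypothetical dictionaries for illustration only (NOT OpenCC's real data files)
PHRASES = {"头发": "頭髮", "发现": "發現"}
CHARS = {"头": "頭", "发": "發", "现": "現"}


def s2t(text):
    """Simplified -> Traditional: try the longest phrase first, then fall back
    to a per-character mapping, and leave unknown characters unchanged."""
    max_len = max(len(p) for p in PHRASES)
    out = []
    i = 0
    while i < len(text):
        # Phrase candidates only (length >= 2); single characters go to the else branch
        for size in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += size
                break
        else:
            out.append(CHARS.get(text[i], text[i]))
            i += 1
    return "".join(out)


naive = "".join(CHARS.get(c, c) for c in "我发现头发")  # character-by-character only
converted = s2t("我发现头发")
print(naive)      # 头发 is wrongly rendered as 頭發
print(converted)  # 头发 is correctly rendered as 頭髮
```

In practice you would not maintain such tables yourself; with opencc-python-reimplemented installed, `OpenCC('s2t').convert(text)` (imported via `from opencc import OpenCC`) performs the conversion with OpenCC's full dictionaries.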

[NLP][Python] How to use CKIP to analyze Traditional Chinese

Clay
2021-03-30
NLP, Python

If you want to use a Python NLP toolkit to analyze Traditional Chinese text, CKIP is your first choice. CKIP is developed by the Institute of Information Science, Academia Sinica, in Taiwan, and has achieved top rankings in many competitions.

[NLP][Python] Use “Jieba” package to segment Chinese words

Clay
2021-03-29
NLP, Python

Word segmentation is a crucial step when processing Chinese. In English, words are separated by spaces, but Chinese text has no spaces between words.

Let’s take an example.
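To make the idea concrete, here is a toy sketch. It is not how Jieba works internally (Jieba builds a word graph from a prefix dictionary, picks the most probable path with dynamic programming, and uses an HMM for unknown words); it swaps in the simpler forward maximum matching technique with a tiny hypothetical vocabulary, just to show what a segmenter produces compared with splitting English on spaces.

```python
def forward_max_match(sentence, vocab, max_word_len=4):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word, falling back to a single character if nothing matches."""
    words = []
    i = 0
    while i < len(sentence):
        for size in range(min(max_word_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is always accepted, so the loop always advances
            if size == 1 or candidate in vocab:
                words.append(candidate)
                i += size
                break
    return words


# English: spaces already mark word boundaries
english_words = "I like natural language processing".split()

# Chinese: a hypothetical mini-vocabulary, for illustration only
vocab = {"喜欢", "自然", "语言", "自然语言", "处理"}
chinese_words = forward_max_match("我喜欢自然语言处理", vocab)

print(english_words)
print(chinese_words)
```

With Jieba installed, `jieba.lcut("我喜欢自然语言处理")` likewise returns a list of words; its exact output depends on Jieba's own dictionary.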

[NLP] What is “Word Embedding”

Clay
2021-03-07
NLP

Introduction

“Word Embedding” is a technique often used in natural language processing (NLP); its core idea is to convert text into a numerical format (vectors of numbers).
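A minimal sketch of the idea, using a hypothetical three-word vocabulary: a one-hot vector is the simplest way to turn a word into numbers, while an embedding maps each word to a short dense vector. The dense values below are made up by hand; real embeddings (Word2Vec, GloVe, etc.) are learned from a corpus.

```python
# Hypothetical toy vocabulary
vocab = ["king", "queen", "apple"]
word_to_index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """One-hot encoding: a vector as long as the vocabulary with a single 1."""
    vec = [0] * len(vocab)
    vec[word_to_index[word]] = 1
    return vec


# Made-up 3-dimensional dense embeddings (real ones are learned, not hand-written)
embeddings = {
    "king": [0.8, 0.3, 0.1],
    "queen": [0.7, 0.4, 0.1],
    "apple": [0.1, 0.2, 0.9],
}


def embed(sentence):
    """Convert a whitespace-separated sentence into a list of word vectors."""
    return [embeddings[word] for word in sentence.lower().split()]


print(one_hot("queen"))
print(embed("King apple"))
```

Unlike one-hot vectors, which are all equally far apart, dense vectors can place related words (here, "king" and "queen") close to each other, which is what makes embeddings useful.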

[Python] Convert Glove model to a format Gensim can read

Clay
2021-01-02
NLP, Python

Python is the most popular programming language!

Introduction

Anyone familiar with natural language processing (NLP) has probably heard of GloVe and the Python package Gensim.

GloVe (Global Vectors for Word Representation) is a paper published by the Stanford NLP Group, and it is also an open-source pre-trained word embedding model. The GloVe you often see on the Internet usually refers to this open-source pre-trained model.

Gensim is a Python library that includes an implementation of Word2Vec, the method proposed by Google in 2013; with it, we can easily train a word vector model on our own corpus.
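GloVe's text files store one word per line followed by its vector values, whereas the word2vec text format that Gensim reads additionally expects a first line with the vocabulary size and the vector dimension. A minimal sketch of the conversion, assuming a UTF-8, space-separated GloVe file; the function name, file names, and sample vectors are made up for the demo.

```python
import os
import tempfile


def glove_to_word2vec(glove_path, output_path):
    """Prepend the '<num_words> <dim>' header that word2vec text format expects.
    Streams the file twice so large GloVe files are not loaded into memory."""
    # First pass: count vectors and read the dimension from the first line
    num_words, dim = 0, None
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            if dim is None:
                dim = len(line.split()) - 1  # first token is the word itself
            num_words += 1
    if dim is None:
        raise ValueError("GloVe file is empty")

    # Second pass: write the header, then copy the vectors unchanged
    with open(glove_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        dst.write(f"{num_words} {dim}\n")
        for line in src:
            if line.strip():
                dst.write(line.rstrip("\n") + "\n")
    return num_words, dim


# Demo with a tiny made-up GloVe file
with tempfile.TemporaryDirectory() as tmp:
    glove_file = os.path.join(tmp, "glove.txt")
    w2v_file = os.path.join(tmp, "glove.w2v.txt")
    with open(glove_file, "w", encoding="utf-8") as f:
        f.write("hello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
    counts = glove_to_word2vec(glove_file, w2v_file)
    with open(w2v_file, encoding="utf-8") as f:
        converted = f.read()

print(counts)
print(converted)
```

The converted file can then be loaded with `gensim.models.KeyedVectors.load_word2vec_format(path, binary=False)`. As far as I know, Gensim 4.x can also read a GloVe file directly by passing `no_header=True` to that same method, so check which version you have installed.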

Quickly generate word cloud Python module: wordcloud

Clay
2020-12-07
NLP, Packages, Python

Introduction

Word clouds are a well-known visualization in the natural language processing domain. At first I thought a word cloud simply counted word frequencies and displayed the high-frequency words in larger type.

But there is more to it: the shape of the cloud and the styling of the text are also worth learning, and they are not as simple as I thought.

I like trying new things, so today I studied the wordcloud module in Python and am recording my experience here.
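The "count frequencies, draw frequent words larger" idea can be sketched in a few lines. The linear mapping from counts to font sizes below is a hypothetical choice of mine (the `min_size`/`max_size` values are arbitrary); the wordcloud module computes sizes differently (see its `relative_scaling` parameter) and also handles layout, masks, and colors.

```python
from collections import Counter


def font_sizes(text, min_size=10, max_size=60, top_n=None):
    """Map each word's frequency linearly onto [min_size, max_size]."""
    counts = Counter(text.lower().split())
    most_common = counts.most_common(top_n)  # top_n=None keeps every word
    if not most_common:
        return {}
    highest = most_common[0][1]
    lowest = most_common[-1][1]
    span = (highest - lowest) or 1  # avoid division by zero when all counts are equal
    return {
        word: round(min_size + (count - lowest) / span * (max_size - min_size))
        for word, count in most_common
    }


sizes = font_sizes("nlp python nlp cloud nlp python")
print(sizes)
```

With the real package, `WordCloud().generate(text)` or `WordCloud().generate_from_frequencies(freq_dict)` performs the counting (for `generate`) and the layout for you.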

[Solved] OSError: [E050] Can’t find model ‘en’. It doesn’t seem to be a shortcut link, a Python package or a valid path to a data directory.

Clay
2020-05-12 (updated 2021-07-05)
NLP, Packages, Python

Today, while using Python for an NLP task, I got an error message:

OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

How to calculate Cosine Similarity (With code)

Clay
2020-03-27 (updated 2021-06-19)
NLP, Python

Cosine similarity is a common method for measuring text similarity. The basic concept is very simple: compute the cosine of the angle between two vectors.
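Concretely, for vectors A and B the value is cos θ = (A · B) / (‖A‖ ‖B‖), which ranges from -1 to 1 (and from 0 to 1 for non-negative word counts). A minimal pure-Python sketch, assuming simple bag-of-words counts as the text vectors (TF-IDF weights or embeddings are common alternatives):

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """cos(theta) = dot(a, b) / (|a| * |b|)"""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def text_cosine(s1, s2):
    """Turn two sentences into word-count vectors over a shared vocabulary."""
    c1, c2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    vocab = sorted(set(c1) | set(c2))
    return cosine_similarity([c1[w] for w in vocab], [c2[w] for w in vocab])


print(cosine_similarity([1, 0], [0, 1]))        # orthogonal vectors
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction
print(text_cosine("I like NLP", "I like Python"))
```

Because only the angle matters, the vector length (for example, how long a document is) does not affect the score: [1, 2, 3] and [2, 4, 6] are treated as identical in direction.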

[NLP][Python] How to use NLTK package to process NLP tasks

Clay
2019-08-16 (updated 2021-04-07)
NLP, Python, Python Tutorial

NLTK stands for Natural Language Toolkit, a Python package for natural language processing.

Although NLTK can also process Chinese, its support for Chinese is not as good as for English, so today's examples all use an English corpus.
