Last Updated on 2021-03-29 by Clay
Stanford CoreNLP is a toolkit dedicated to Natural Language Processing (NLP).
As the name implies, this useful tool was developed by Stanford University. (We thank them!)
The tool includes the following functions:
- Tokenize
- Part of speech (POS)
- Named entity recognition (NER)
- Constituency Parser
- Dependency Parser
...... and many more functions, which will be introduced later.
Today, I want to record my experience of how to use these tools and share it with you. The tools are not very difficult, but there are some traps I ran into when I was getting started. (Basically, the official tutorial is very clear.)
I will introduce how to use this tool with Python, step by step. Python can call the APIs of Stanford CoreNLP.
Before we begin, I have to remind you: if you only want to understand how the tool works, you can simply go to their online demo website:
http://corenlp.run/
— Text to annotate —: Enter the sentence you want to analyze.
— Annotations —: Select the functions you want to run.
— Language —: Select the language you want.
Press the Submit button, and the graphical result will be displayed. It's very convenient!
Step 1: Download Stanford CoreNLP
https://stanfordnlp.github.io/CoreNLP/
First, go to the website, and you will find the "Download CoreNLP 3.9.2" button. (Warning! Your version may differ from mine!)
But it looks like this:
Click it to download the Stanford CoreNLP package. But if the language you want to parse is not English, you also have to download the language model you need.
Language model:
For example, if you want to parse Chinese: after downloading the Stanford CoreNLP zip file, first unzip it. Here we get a folder named "stanford-corenlp-full-2018-10-05". (Of course, again, this is the version I downloaded; yours may be different.)
Then we put the Chinese model "stanford-chinese-corenlp-2018-10-05-models.jar" into the folder we just unzipped.
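To double-check the layout, here is a small sketch that verifies the model jar really landed inside the unzipped folder. (The folder and jar names are the version-specific ones from above; adjust them to whatever version you downloaded.)

```python
from pathlib import Path

def has_language_model(corenlp_dir, jar_name):
    """Return True if the given model jar sits inside the CoreNLP folder."""
    return (Path(corenlp_dir) / jar_name).is_file()

# Names from the 2018-10-05 release; yours may differ.
print(has_language_model('stanford-corenlp-full-2018-10-05',
                         'stanford-chinese-corenlp-2018-10-05-models.jar'))
```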
Step 2: Install Python's Stanford CoreNLP package
If you usually install Python packages from the terminal, this is easy for you:
pip3 install stanfordcorenlp
Type this in your terminal to start the download.
If you are using an IDE like PyCharm, click File -> Settings -> Project Interpreter -> the + symbol, search for "stanfordcorenlp", and install the package once you find it.
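Either way, you can confirm the package is visible to your interpreter with a quick standard-library check ("stanfordcorenlp" here is simply the package name from the pip command above):

```python
import importlib.util

def is_installed(package_name):
    """Return True if the package can be found by the current interpreter."""
    return importlib.util.find_spec(package_name) is not None

print(is_installed('stanfordcorenlp'))
```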
Step 3: Write Python code
from stanfordcorenlp import StanfordCoreNLP
from opencc import OpenCC

# Preset
nlp = StanfordCoreNLP('stanford-corenlp-full-2018-10-05/', lang='zh', memory='8g')
cc = OpenCC('t2s')
First, we need to import the package "stanfordcorenlp".
The package opencc is a tool for converting Traditional Chinese to Simplified Chinese. That is a requirement of my own project, so you can ignore this import if you don't need it. (The Simplified Chinese support in Stanford CoreNLP is better than the Traditional Chinese support.)
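For readers unfamiliar with the conversion, here is a toy sketch of the idea behind OpenCC's "t2s" mode, using a tiny hand-made character table. (This is illustration only; the real library ships full character and phrase tables covering thousands of entries.)

```python
# Tiny, hand-picked Traditional -> Simplified table (illustrative only).
T2S = {'蘋': '苹', '紅': '红', '蘿': '萝', '蔔': '卜'}

def to_simplified(text):
    """Convert character by character, leaving unknown characters unchanged."""
    return ''.join(T2S.get(ch, ch) for ch in text)

print(to_simplified('紅蘋果'))  # -> 红苹果
```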
Under the "Preset" comment, the path on the first line has to point to the folder we unzipped; of course, your folder may have a different name from mine. By the way, if you want to process Chinese, you have to set the language to "zh". If you only use English, you can skip this setting.
And then we can get started with Stanford CoreNLP!
Next, we use a classic sentence, "I eat a big and red apple.", to test.
# The sentence you want to parse
sentence = 'I eat a big and red apple.'

# POS
print('POS:', nlp.pos_tag(sentence))

# Tokenize
print('Tokenize:', nlp.word_tokenize(sentence))

# NER
print('NER:', nlp.ner(sentence))

# Parser
print('Parser:')
print(nlp.parse(sentence))
print(nlp.dependency_parse(sentence))

# Close Stanford Parser
nlp.close()
Output:
POS:
[('I', 'PN'), ('eat', 'VV'), ('a', 'CD'), ('big', 'JJ'), ('and', 'CC'), ('red', 'JJ'), ('apple', 'NN'), ('.', 'PU')]
Tokenize:
['I', 'eat', 'a', 'big', 'and', 'red', 'apple', '.']
NER:
[('I', 'O'), ('eat', 'O'), ('a', 'NUMBER'), ('big', 'O'), ('and', 'O'), ('red', 'O'), ('apple', 'O'), ('.', 'O')]
Parser:
(ROOT
(IP
(IP
(NP (PN I))
(VP (VV eat)
(NP
(QP (CD a))
(ADJP
(ADJP (JJ big))
(CC and)
(JJ red))
(NP (NN apple)))))
(PU .)))
[('ROOT', 0, 2), ('nsubj', 2, 1), ('dep', 7, 3), ('conj', 7, 4), ('cc', 7, 5), ('amod', 7, 6), ('dobj', 2, 7), ('punct', 2, 8)]
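The dependency triples above are not very readable on their own. Assuming each triple is (relation, head_index, dependent_index) with 1-based token positions and 0 standing for the virtual ROOT (which matches the output shown here), a small helper can turn them into relation(head, dependent) form:

```python
# Token list and dependency triples as produced in the output above.
tokens = ['I', 'eat', 'a', 'big', 'and', 'red', 'apple', '.']
deps = [('ROOT', 0, 2), ('nsubj', 2, 1), ('dep', 7, 3), ('conj', 7, 4),
        ('cc', 7, 5), ('amod', 7, 6), ('dobj', 2, 7), ('punct', 2, 8)]

def readable(tokens, deps):
    """Map 1-based head/dependent indices back to words (0 -> ROOT)."""
    words = ['ROOT'] + tokens
    return [f'{rel}({words[head]}, {words[dep]})' for rel, head, dep in deps]

for line in readable(tokens, deps):
    print(line)  # e.g. nsubj(eat, I), dobj(eat, apple), ...
```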
The above is the basic program for calling Stanford CoreNLP from Python!
You can try any other features you want!
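One caveat: the wrapper keeps a background Java process alive, so forgetting nlp.close() can leave it running. A defensive pattern is contextlib.closing, which calls close() even if an annotation call raises an exception. The sketch below uses a dummy stand-in class so it is self-contained; in the real script you would wrap the StanfordCoreNLP instance itself.

```python
from contextlib import closing

# Hypothetical stand-in with the same close() contract as StanfordCoreNLP.
class FakeNLP:
    def __init__(self):
        self.closed = False
    def pos_tag(self, sentence):
        return [(word, 'XX') for word in sentence.split()]
    def close(self):
        self.closed = True

# closing() guarantees close() is called when the block exits, even on error.
with closing(FakeNLP()) as nlp:
    tags = nlp.pos_tag('I eat a big and red apple')

print(tags[0])
```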
Error Report
If you run into a problem while executing the program, or an error you cannot resolve, you may refer to my other problem record: [Solved] Stanford CoreNLP Error
Howdy! Quick question that's totally off topic. Do you know how to make your site mobile friendly? My web site looks weird when browsing from my iPhone. I'm trying to find a theme or plugin that might be able to resolve this issue. If you have any recommendations, please share. Appreciate it!
Uh… on my mobile device, your website looks quite stable.
I hope you have solved the problem.
Greetings! I've been reading your web site for a while now and finally got the courage to go ahead and give you a shout out from Lubbock, Texas! Just wanted to say keep up the excellent job!
Thank you for reading.
Writing these articles helps me, and I hope they can help others, too.
Let's work hard together. =)
Hi, do you need an internet connection to run Stanford CoreNLP? Does it work remotely, from a server? Or can I run it offline, locally?
PS: Thank you for the useful article!
You can run it offline; the effectiveness of the analysis depends on the model you use. 🤓