
[NLP][Python] How to use Stanford CoreNLP

Last Updated on 2021-03-29 by Clay

Stanford CoreNLP is a toolkit dedicated to Natural Language Processing (NLP).

As the name implies, it was developed by Stanford University. (Many thanks to them!)

The tool's functions include:

  • Tokenization
  • Part-of-speech (POS) tagging
  • Named entity recognition (NER)
  • Constituency parsing
  • Dependency parsing
    ...... and many more, which I will introduce later.

Today, I want to record my experience of using these tools and share it with you. The tools are not very difficult to use, but there were a few traps I ran into when I was starting out. (That said, the official tutorial is very clear.)

I will introduce, step by step, how to use this tool from Python, which can call the Stanford CoreNLP APIs.
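As far as I understand, when you give the stanfordcorenlp wrapper a local folder, it starts a CoreNLP server (a Java process) in the background and sends your text to it over HTTP. The sketch below shows roughly what such a request looks like. It is a minimal sketch, not the wrapper's own code: it assumes a CoreNLP server listening at http://localhost:9000 (9000 is the standalone server's default port; adjust as needed), and build_request is a hypothetical helper name. It only builds the request, so it runs without a server.

```python
import json
import urllib.parse
import urllib.request


def build_request(text, annotators=("tokenize", "ssplit", "pos"),
                  url="http://localhost:9000"):
    """Build (but do not send) a POST request for a CoreNLP server.

    Hypothetical helper: assumes a CoreNLP server is running at `url`.
    """
    # CoreNLP reads its settings from a JSON "properties" URL parameter
    properties = {
        "annotators": ",".join(annotators),
        "outputFormat": "json",
    }
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    # The raw text to annotate goes in the request body
    return urllib.request.Request(
        url=f"{url}/?{query}",
        data=text.encode("utf-8"),
        method="POST",
    )


req = build_request("I eat a big and red apple.")
print(req.get_method(), req.full_url)

# Sending it requires a running server:
# with urllib.request.urlopen(req) as resp:
#     result = json.loads(resp.read().decode("utf-8"))
```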

Before we begin, a reminder: if you only want to see how the tool works, you can simply visit the online demo website:
http://corenlp.run/

  • Text to annotate: enter the sentence you want to analyze.
  • Annotations: choose the functions you want to run.
  • Language: select the language of your text.

Press the Submit button, and the results are displayed graphically. It's very convenient!


Step 1: Download Stanford CoreNLP

https://stanfordnlp.github.io/CoreNLP/

First, go to the website and find the "Download CoreNLP 3.9.2" link. (Warning! The version you see may differ from mine!)


Click it to download the Stanford CoreNLP package. If the language you want to parse is not English, you also need to download the corresponding language model.

Language model:

For example, if you want to parse Chinese, first unzip the downloaded Stanford CoreNLP zip file. This gives us a folder named "stanford-corenlp-full-2018-10-05" (again, this is the version I downloaded; yours may differ).

Then put the Chinese model "stanford-chinese-corenlp-2018-10-05-models.jar" into the folder we just unzipped.
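The unzip-and-copy steps above can also be checked from Python. This is a minimal sketch using only the standard library; unpack_corenlp and has_chinese_model are hypothetical helpers, and the code assumes the zip contains a single top-level folder, as it did in my case.

```python
import zipfile
from pathlib import Path


def unpack_corenlp(zip_path, dest="."):
    """Extract the CoreNLP zip and return the top-level folder it contains."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        # Assumption: everything sits under one folder,
        # e.g. stanford-corenlp-full-2018-10-05/
        top = {name.split("/")[0] for name in zf.namelist()}
    if len(top) != 1:
        raise ValueError(f"expected one top-level folder, found {sorted(top)}")
    return Path(dest) / top.pop()


def has_chinese_model(corenlp_dir):
    """Return True if a Chinese model jar sits directly in the CoreNLP folder."""
    return any(Path(corenlp_dir).glob("stanford-chinese-corenlp-*-models.jar"))


# Example (file names are the ones from my download; yours may differ):
# folder = unpack_corenlp("stanford-corenlp-full-2018-10-05.zip")
# print(folder, has_chinese_model(folder))
```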


Step 2: Install Python's Stanford CoreNLP package

If you usually install Python packages from the terminal, this step is easy:

pip3 install stanfordcorenlp

Type this into your terminal to start the download and installation.

If you are using an IDE such as PyCharm, click File -> Settings -> Project Interpreter, click the + symbol, search for "stanfordcorenlp", and install it.
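Stanford CoreNLP itself is written in Java, so besides the Python package you also need a Java runtime on your PATH. A quick way to check both from Python is sketched below; check_prerequisites is a hypothetical helper, and it only checks that the module and the `java` command can be found, not that their versions are compatible.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report whether the Python wrapper and a Java runtime can be found."""
    return {
        # `pip3 install stanfordcorenlp` provides a module of the same name
        "stanfordcorenlp": importlib.util.find_spec("stanfordcorenlp") is not None,
        # CoreNLP is a Java program, so the `java` command must be on the PATH
        "java": shutil.which("java") is not None,
    }


print(check_prerequisites())
```

If either value is False, fix that before moving on to Step 3.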


Step 3: Write Python code

from stanfordcorenlp import StanfordCoreNLP
from opencc import OpenCC

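# Preset
# Path to the unzipped CoreNLP folder; lang='zh' loads the Chinese models,
# and memory sets the Java heap size of the background CoreNLP server
nlp = StanfordCoreNLP('stanford-corenlp-full-2018-10-05/', lang='zh', memory='8g')
# Converter from Traditional Chinese to Simplified Chinese
cc = OpenCC('t2s')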


First, we import the package "stanfordcorenlp".

The opencc package converts Traditional Chinese to Simplified Chinese. My project needed it, but you can skip this import. (I use it because Stanford CoreNLP's Chinese support works better with Simplified Chinese than with Traditional Chinese.)

Under the "Preset" comment, the path passed to StanfordCoreNLP must point to the folder we unzipped; of course, your folder may have a different name from mine. Also, to process Chinese you must set the language to "zh". If you only use English, you can skip this setting, since English is the default.
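Because the language is fixed when the StanfordCoreNLP object is created, you might want to choose lang from the text itself. The sketch below is a crude heuristic, not a real language detector: the hypothetical pick_lang helper returns "zh" whenever the text contains a character from the CJK Unified Ideographs block (U+4E00 to U+9FFF) and "en" otherwise.

```python
import re

# CJK Unified Ideographs block; a rough heuristic, not a language detector
_CJK = re.compile(r"[\u4e00-\u9fff]")


def pick_lang(text):
    """Return 'zh' if the text contains Chinese characters, otherwise 'en'."""
    return "zh" if _CJK.search(text) else "en"


print(pick_lang("I eat a big and red apple."))   # en
print(pick_lang("我吃了一个又大又红的苹果。"))     # zh
```

The result could then be passed as `lang=pick_lang(sentence)` when creating the StanfordCoreNLP object. As far as I can tell, each object started from a local folder launches its own server, so for mixed input it is probably cheaper to keep one object per language than to recreate them for every sentence.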

Now we can start using Stanford CoreNLP!

Next, let's test it with the classic sentence "I eat a big and red apple."

# The sentence you want to parse
sentence = 'I eat a big and red apple.'

# POS
print('POS:', nlp.pos_tag(sentence))

# Tokenize
print('Tokenize:', nlp.word_tokenize(sentence))

# NER
print('NER:', nlp.ner(sentence))

# Parser
print('Parser:')
print(nlp.parse(sentence))
print(nlp.dependency_parse(sentence))

# Shut down the background CoreNLP server
nlp.close()


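Output (since the pipeline above was started with lang='zh', the Chinese models are applied even to this English sentence, which is why the tags come from the Chinese tag set, e.g. PN, VV and PU, and "a" is treated as a number):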

POS: [('I', 'PN'), ('eat', 'VV'), ('a', 'CD'), ('big', 'JJ'), ('and', 'CC'), ('red', 'JJ'), ('apple', 'NN'), ('.', 'PU')]
Tokenize: ['I', 'eat', 'a', 'big', 'and', 'red', 'apple', '.']
NER: [('I', 'O'), ('eat', 'O'), ('a', 'NUMBER'), ('big', 'O'), ('and', 'O'), ('red', 'O'), ('apple', 'O'), ('.', 'O')]
Parser:
 (ROOT
   (IP
     (IP
       (NP (PN I))
       (VP (VV eat)
         (NP
           (QP (CD a))
           (ADJP
             (ADJP (JJ big))
             (CC and)
             (JJ red))
           (NP (NN apple)))))
     (PU .)))

 [('ROOT', 0, 2), ('nsubj', 2, 1), ('dep', 7, 3), ('conj', 7, 4), ('cc', 7, 5), ('amod', 7, 6), ('dobj', 2, 7), ('punct', 2, 8)]
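The dependency output is a list of (relation, head index, dependent index) triples, where the indices are 1-based token positions and 0 stands for the artificial ROOT node. Below is a small sketch that swaps the indices for words, assuming a single sentence whose tokens come from word_tokenize(); label_dependencies is a hypothetical helper, and the data is copied from the output above.

```python
def label_dependencies(tokens, triples):
    """Replace the 1-based indices from dependency_parse() with words.

    Index 0 stands for the artificial ROOT node.
    """
    words = ["ROOT"] + list(tokens)
    return [(rel, words[head], words[dep]) for rel, head, dep in triples]


tokens = ['I', 'eat', 'a', 'big', 'and', 'red', 'apple', '.']
triples = [('ROOT', 0, 2), ('nsubj', 2, 1), ('dep', 7, 3), ('conj', 7, 4),
           ('cc', 7, 5), ('amod', 7, 6), ('dobj', 2, 7), ('punct', 2, 8)]

for rel, head, dep in label_dependencies(tokens, triples):
    print(f"{rel}({head}, {dep})")
```

For this sentence it prints nsubj(eat, I), dobj(eat, apple), amod(apple, red) and so on, which is easier to read than the raw numbers.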

The code in Step 3 is a basic program that calls Stanford CoreNLP from Python!

You can try any other features you want!
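Other features can be reached through the wrapper's annotate() method, which takes a properties dict and returns CoreNLP's raw output as a string. Below is a minimal sketch for reading lemmas out of the JSON format, assuming an English pipeline with the lemma annotator enabled; lemmas_from_json is a hypothetical helper, and the sample string is hand-written in CoreNLP's JSON layout (trimmed to the fields used), not real output.

```python
import json


def lemmas_from_json(annotated):
    """Collect (word, lemma) pairs from CoreNLP's JSON output string."""
    doc = json.loads(annotated)
    return [
        (tok["word"], tok["lemma"])
        for sentence in doc["sentences"]
        for tok in sentence["tokens"]
    ]


# With a running English pipeline the string would come from something like:
# props = {'annotators': 'tokenize,ssplit,pos,lemma', 'outputFormat': 'json'}
# annotated = nlp.annotate('I ate two apples.', properties=props)
# Hand-written sample in the same layout (not real CoreNLP output):
annotated = json.dumps({"sentences": [{"tokens": [
    {"word": "I", "lemma": "I"},
    {"word": "ate", "lemma": "eat"},
    {"word": "two", "lemma": "two"},
    {"word": "apples", "lemma": "apple"},
    {"word": ".", "lemma": "."},
]}]})

print(lemmas_from_json(annotated))
```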


Error Report

If you run into a problem while executing the program, or an error you cannot resolve, you may refer to my other post: [Solved] Stanford CoreNLP Error
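One defensive habit that may help with some errors (for example, a port still held by a server from an earlier run) is to make sure nlp.close() runs even when the program fails halfway. Below is a minimal sketch using the standard library's contextlib.closing; analyze and make_nlp are hypothetical names, and any object with word_tokenize(), pos_tag() and close() will do.

```python
from contextlib import closing


def analyze(make_nlp, sentence):
    """Run a few annotators and always shut the pipeline down afterwards.

    make_nlp: zero-argument callable returning an object that has
    word_tokenize(), pos_tag() and close(), e.g. a StanfordCoreNLP instance.
    """
    # closing() calls nlp.close() on exit, even if an exception is raised
    with closing(make_nlp()) as nlp:
        return {
            "tokens": nlp.word_tokenize(sentence),
            "pos": nlp.pos_tag(sentence),
        }


# Usage with the real wrapper (requires the CoreNLP folder and Java):
# from stanfordcorenlp import StanfordCoreNLP
# result = analyze(lambda: StanfordCoreNLP('stanford-corenlp-full-2018-10-05/'),
#                  'I eat a big and red apple.')
```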
