Last Updated on 2023-11-27 by Clay
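Introduction
Neo4j is a graph database. Compared with traditional databases, the focus of a graph database is the "graph", that is, the relationships and connections between nodes (entities). Each node can represent an object (such as a person, a thing, or a place), while each edge represents a relationship between nodes (such as "friend of", "owns", or "located in").
Neo4j is a leader among graph databases and is available in a Community Edition and an Enterprise Edition. It lets us model complex data relationships flexibly and provides high-performance querying and storage.
It is worth mentioning that Neo4j's query language is Cypher, a declarative language designed specifically for graph data structures, which makes queries concise and intuitive.
With the rise of large language models (LLMs), not only has the accompanying RAG architecture taken off (see my earlier post, [Paper Reading] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection), but Graph RAG architectures have also been proposed to improve the quality of LLM answers.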
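Installation
The following records the installation process on Linux (Debian/Ubuntu and similar distributions).
For other operating systems, see the official site: https://neo4j.com/docs/operations-manual/current/installation/
For other distributions, see: https://neo4j.com/docs/operations-manual/current/installation/linux/
The official documentation notes that if you want to use a specific version of Java, you need to set it up before starting the installation; otherwise, the package manager installs OpenJDK by default.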
wget -O - https://debian.neo4j.com/neotechnology.gpg.key | sudo apt-key add -
echo 'deb https://debian.neo4j.com stable latest' | sudo tee -a /etc/apt/sources.list.d/neo4j.list
sudo apt-get update
After that, run apt list -a neo4j to check which versions are available.
sudo apt install neo4j=1:5.14.0
Once the installation finishes, inspect and start the neo4j service.
systemctl cat neo4j.service
systemctl start neo4j.service
Next, open http://localhost:7474/browser/ and use :server connect to connect to the database. The default username and password are both neo4j; be sure to change the password afterwards.
Basic Neo4j Commands
CREATE
Create node
CREATE
(p1:Person {name: "Clay", age: 28}),
(p2:Person {name: "Wendy", age: 23})
Create relationship
MATCH (p1:Person {name: "Clay"}), (p2:Person {name: "Wendy"})
CREATE (p1)-[:KNOWS]->(p2)
MATCH
Once the nodes and the relationship are created, run MATCH (n) RETURN n to view the current graph.
MERGE
MERGE is similar to CREATE, but when a matching node already exists, it does not create a duplicate the way CREATE would.
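No MERGE query is shown above, so here is a minimal sketch of how one might be issued from Python. It assumes the official neo4j driver (installed later in this post) and a transaction object that exposes run; the function name merge_person and the ON CREATE SET clause are my own illustrative choices, not something taken from the examples in this post.

```python
# Hypothetical helper: MERGE a Person by name, setting age only on first creation.
MERGE_PERSON = (
    "MERGE (p:Person {name: $name}) "
    "ON CREATE SET p.age = $age "
    "RETURN p.name AS name, p.age AS age"
)


def merge_person(tx, name, age):
    """Create the Person node only if no Person with this name exists yet."""
    result = tx.run(MERGE_PERSON, name=name, age=age)
    record = result.single()
    return {"name": record["name"], "age": record["age"]}
```

Inside a session it could be called as session.execute_write(merge_person, "Clay", 28). Running it twice should leave a single Clay node, whereas two CREATE statements would leave two.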
DELETE
To delete all data, including relationships, use:
MATCH (n) DETACH DELETE n
WHERE
WHERE is a filter condition, which anyone who has learned SQL will find familiar. Here, I restrict the query to people older than 25.
MATCH (person: Person)
WHERE person.age > 25
RETURN person
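The same filter can be driven from Python with a query parameter instead of a hard-coded 25. This is only a sketch: it assumes the official neo4j driver (installed later in this post), and the function name people_older_than and the returned column aliases are hypothetical.

```python
def people_older_than(tx, min_age):
    """Return (name, age) pairs for Person nodes whose age exceeds min_age."""
    query = (
        "MATCH (person:Person) "
        "WHERE person.age > $min_age "
        "RETURN person.name AS name, person.age AS age"
    )
    # $min_age is bound by the driver, so the value is never spliced into the text.
    return [(record["name"], record["age"]) for record in tx.run(query, min_age=min_age)]
```

Because it only reads data, it could be called as session.execute_read(people_older_than, 25).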
SET
The SET command updates the property values of a node or relationship, or adds new properties.
MATCH (person:Person {name: "Wendy"})
SET person.age = 26
After the update, we look at the people older than 25 once more.
MATCH (person: Person)
WHERE person.age > 25
RETURN person
REMOVE
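REMOVE removes properties or labels from nodes and relationships. To delete a relationship itself, use DELETE, as in the following example: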
MATCH (p1:Person {name: "Clay"})-[knows:KNOWS]->(p2:Person {name: "Wendy"})
DELETE knows
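To make the difference between the two clauses concrete, here is a small sketch that issues one of each from Python. It assumes the official neo4j driver (installed later in this post); the helper names remove_age and delete_knows are hypothetical, and the age property is just the one used in the earlier examples.

```python
# REMOVE drops a property; the node stays in the graph.
REMOVE_AGE = "MATCH (p:Person {name: $name}) REMOVE p.age"

# DELETE drops the relationship itself.
DELETE_KNOWS = (
    "MATCH (:Person {name: $src})-[r:KNOWS]->(:Person {name: $dst}) "
    "DELETE r"
)


def remove_age(tx, name):
    """Remove the age property from the named Person."""
    tx.run(REMOVE_AGE, name=name).consume()


def delete_knows(tx, src, dst):
    """Delete the KNOWS relationship and report how many were removed."""
    summary = tx.run(DELETE_KNOWS, src=src, dst=dst).consume()
    return summary.counters.relationships_deleted
```

Both would be run through session.execute_write, for example session.execute_write(delete_knows, "Clay", "Wendy").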
ORDER BY
Before sorting, first create a number of nodes with random values.
FOREACH (x IN RANGE(1, 30) |
CREATE (:RandomNode {value: rand()*100}))
First, look at ascending order:
MATCH (n:RandomNode)
RETURN n.value
ORDER BY n.value ASC
For descending order, just change ASC to DESC.
MATCH (n:RandomNode)
RETURN n.value
ORDER BY n.value DESC
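When the sort direction has to be chosen at run time, note that ASC and DESC are keywords and cannot be passed as Cypher parameters, while the LIMIT value can. The sketch below picks the keyword from a boolean so no user text ends up in the query. It assumes the official neo4j driver (installed later in this post); the names build_sorted_query and sorted_values, and the added LIMIT clause, are my own additions.

```python
def build_sorted_query(descending=False):
    """Build the ORDER BY query; the direction comes from a fixed pair of keywords."""
    direction = "DESC" if descending else "ASC"
    return (
        "MATCH (n:RandomNode) "
        "RETURN n.value AS value "
        f"ORDER BY n.value {direction} "
        "LIMIT $limit"
    )


def sorted_values(tx, limit=5, descending=False):
    """Return up to `limit` RandomNode values in the requested order."""
    result = tx.run(build_sorted_query(descending), limit=limit)
    return [record["value"] for record in result]
```

For the five largest values, one could call session.execute_read(sorted_values, 5, True).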
Using Neo4j from Python
First, install the Neo4j driver.
pip3 install neo4j
Then you can use the following code to connect for the first time.
from neo4j import GraphDatabase


class HelloWorldExample:

    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def print_greeting(self, message):
        with self.driver.session() as session:
            greeting = session.execute_write(self._create_and_return_greeting, message)
            print(greeting)

    @staticmethod
    def _create_and_return_greeting(tx, message):
        result = tx.run("CREATE (a:Greeting) "
                        "SET a.message = $message "
                        "RETURN a.message + ', from node ' + id(a)", message=message)
        return result.single()[0]


if __name__ == "__main__":
    greeter = HelloWorldExample("bolt://localhost:7687", "neo4j", "<PASSWORD>")
    greeter.print_greeting("hello, world")
    greeter.close()
The above is the example from the official site; you need to fill in the password of the neo4j user yourself.
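Rather than pasting the password into the source file, one option is to read the connection settings from environment variables. This is a sketch under my own assumptions: the variable names NEO4J_URI, NEO4J_USERNAME and NEO4J_PASSWORD are hypothetical (nothing in the driver reads them automatically), and the defaults simply mirror the URI and user used in this post.

```python
import os


def load_neo4j_settings(env=None):
    """Return (uri, user, password), refusing to continue without a password."""
    env = os.environ if env is None else env
    password = env.get("NEO4J_PASSWORD")
    if not password:
        raise RuntimeError("Set NEO4J_PASSWORD before connecting")
    return (
        env.get("NEO4J_URI", "bolt://localhost:7687"),
        env.get("NEO4J_USERNAME", "neo4j"),
        password,
    )


def check_connection(env=None):
    """Open a driver with the loaded settings and fail fast if the server is unreachable."""
    # Imported here so load_neo4j_settings stays usable without the driver installed.
    from neo4j import GraphDatabase

    uri, user, password = load_neo4j_settings(env)
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        driver.verify_connectivity()
```

The tuple from load_neo4j_settings() can be unpacked straight into the class constructor, as in HelloWorldExample(*load_neo4j_settings()).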
Below is a version I rewrote to run an arbitrary command (it assumes the query returns a column named n):
# coding: utf-8
from typing import Any

from neo4j import GraphDatabase


class HelloWorldExample:

    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def run_command(self, command: str):
        with self.driver.session() as session:
            result = session.execute_write(self._run_command, command=command)
            for node in result:
                print(node)

    @staticmethod
    def _run_command(tx, command: str) -> Any:
        result = tx.run(command)
        # Collect the "n" column inside the transaction, while the result is still open
        return [record["n"] for record in result]


if __name__ == "__main__":
    greeter = HelloWorldExample("bolt://localhost:7687", "neo4j", "<PASSWORD>")
    greeter.run_command("MATCH (n) RETURN n")
    greeter.close()
Output:
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:0' labels=frozenset({'RandomNode'}) properties={'value': 14.91331766897468}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:1' labels=frozenset({'RandomNode'}) properties={'value': 64.12597979227405}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:2' labels=frozenset({'RandomNode'}) properties={'value': 6.659881736827988}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:3' labels=frozenset({'RandomNode'}) properties={'value': 62.344004262015886}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:4' labels=frozenset({'RandomNode'}) properties={'value': 47.06140788848291}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:5' labels=frozenset({'RandomNode'}) properties={'value': 98.48678651276849}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:6' labels=frozenset({'RandomNode'}) properties={'value': 11.624137783292642}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:7' labels=frozenset({'RandomNode'}) properties={'value': 73.20785865608688}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:8' labels=frozenset({'RandomNode'}) properties={'value': 5.725035007107349}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:9' labels=frozenset({'RandomNode'}) properties={'value': 39.15713857517649}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:10' labels=frozenset({'RandomNode'}) properties={'value': 81.65845901535084}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:11' labels=frozenset({'RandomNode'}) properties={'value': 76.64097851356087}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:12' labels=frozenset({'RandomNode'}) properties={'value': 61.20185745578327}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:13' labels=frozenset({'RandomNode'}) properties={'value': 84.321903673718}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:14' labels=frozenset({'RandomNode'}) properties={'value': 99.06982902742982}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:15' labels=frozenset({'RandomNode'}) properties={'value': 14.973763461153355}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:16' labels=frozenset({'RandomNode'}) properties={'value': 48.75925569130533}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:17' labels=frozenset({'RandomNode'}) properties={'value': 65.62099589037868}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:18' labels=frozenset({'RandomNode'}) properties={'value': 16.454027920004965}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:19' labels=frozenset({'RandomNode'}) properties={'value': 33.57229784313483}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:20' labels=frozenset({'RandomNode'}) properties={'value': 82.57752301141993}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:21' labels=frozenset({'RandomNode'}) properties={'value': 92.66588311559443}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:22' labels=frozenset({'RandomNode'}) properties={'value': 52.42799428949757}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:23' labels=frozenset({'RandomNode'}) properties={'value': 33.97996393406255}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:24' labels=frozenset({'RandomNode'}) properties={'value': 89.20285500184697}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:25' labels=frozenset({'RandomNode'}) properties={'value': 34.886005957548505}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:26' labels=frozenset({'RandomNode'}) properties={'value': 59.9946545591492}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:27' labels=frozenset({'RandomNode'}) properties={'value': 84.47136098014757}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:28' labels=frozenset({'RandomNode'}) properties={'value': 10.115251856095897}>
<Node element_id='4:1899af0f-0304-4a53-a50f-537b83484eb2:29' labels=frozenset({'RandomNode'}) properties={'value': 20.454310053933554}>
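The version above only works when the query returns a column named n. A possible generalization, sketched here, converts every record to a plain dict so any RETURN clause works and parameters can be passed along. It assumes a reasonably recent 5.x release of the neo4j Python driver, where driver.execute_query is available; the function name run_query is hypothetical.

```python
def run_query(driver, query, **params):
    """Run a query and return each record as a plain dict, whatever its columns are."""
    # execute_query returns a (records, summary, keys) triple.
    records, _summary, _keys = driver.execute_query(query, parameters_=params)
    # Record.data() maps column names to plain values; nodes become property dicts.
    return [record.data() for record in records]
```

With the driver from the class above, it could be called as run_query(greeter.driver, "MATCH (n:RandomNode) RETURN n.value AS value LIMIT $k", k=3), which would yield a list of dicts keyed by value.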