在 Neo4j 中使用分层树
介绍
我的名字是 Tomaz Bratanic。我想演示一下如何在 Neo4j 中处理分层位置树。根据我在导入/查询它们过程中学到的知识,我提出了一些应该遵循的基本规则,以便获得正确的查询结果。
位置树规则
-
所有关系都从子节点指向父节点,向上层级发展。
-
我们为所有关系使用单一类型。(PARENT;FROM;IS_IN)
-
每个节点都有一条指向其父节点的出度关系。
-
每个节点可以有一个或多个来自其子节点的入度关系。
联系方式
-
twitter: @tb_tomaz
-
github: https://github.com/tomasonjo
导入
让我们将一些数据加载到我们的图中进行探索。
添加约束和索引
首先,我们需要添加索引和约束,因为它们会优化我们的查询。下面过程中的第一个数组设置索引,第二个数组包含唯一约束。您需要安装 APOC 库。
CREATE INDEX ON :County(name);
CREATE INDEX ON :City(name);
CREATE INDEX ON :ZipCode(name);
CREATE INDEX ON :Address(name);
CREATE CONSTRAINT ON (h:Hospital) ASSERT h.id IS UNIQUE;
CREATE CONSTRAINT ON (s:State) ASSERT s.name IS UNIQUE;
位置分层树导入
您可以注意到,我们没有采用标准方法,即分别合并每个节点,而是以模式及其父节点在分层树中的方式合并它们,因为某些县/市/地址具有相同的名称。
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/tomasonjo/hospitals-neo4j/master/Hospital%20General%20Information.csv" as row
WITH row
WHERE row.State = 'New York'
// state name is unique
MERGE (state:State{name:row.State})
// merge by pattern with their parents
MERGE (state)<-[:IS_IN]-(county:County{name:row.`County Name`})
MERGE (county)<-[:IS_IN]-(city:City{name:row.City})
MERGE (city)<-[:IS_IN]-(zip:ZipCode{name:row.`ZIP Code`})
MERGE (zip)<-[:IS_IN]-(address:Address{name:row.Address})
// for entities, it is best to have an id system
MERGE (h:Hospital{id:row.`Provider ID`})
MERGE (h)-[:IS_IN]->(address)
其他医院信息
我们还将导入有关医院的一些其他信息,例如其评级、所有权等。
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/tomasonjo/hospitals-neo4j/master/Hospital%20General%20Information.csv" as row
WITH row
WHERE row.State = 'New York'
MATCH (h:Hospital{id:row.`Provider ID`})
SET h.phone=row.`Phone Number`,
h.emergency_services = row.`Emergency Services`,
h.name= row.`Hospital Name`,
h.mortality = row.`Mortality national comparison`,
h.safety = row.`Safety of care national comparison`,
h.timeliness = row.`Timeliness of care national comparison`,
h.experience = row.`Patient experience national comparison`,
h.effectiveness = row.`Effectiveness of care national comparison`
MERGE (type:HospitalType{name:row.`Hospital Type`})
MERGE (h)-[:HAS_TYPE]->(type)
MERGE (ownership:Ownership{name: row.`Hospital Ownership`})
MERGE (h)-[:HAS_OWNERSHIP]->(ownership)
MERGE (rating:Rating{name:row.`Hospital overall rating`})
MERGE (h)-[:HAS_RATING]->(rating)
地理空间导入
最后要导入的是医院的地理空间信息。
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/tomasonjo/hospitals-neo4j/master/gpsinfo.csv" as row
WITH row
WHERE row.State = 'New York'
MATCH (hospital:Hospital {id:row.id})
SET hospital.latitude = toFloat(row.latitude),
hospital.longitude = toFloat(row.longitude)
空间查询示例
假设您在自由岛
迷路了,并且想要找到最近的 10 家医院。距离以米为单位。**注意:在 Neo4j 沙箱中不起作用。**
WITH "Liberty Island, Manhattan" as myLocation
CALL apoc.spatial.geocodeOnce(myLocation) YIELD location
WITH point({longitude: location.longitude, latitude: location.latitude}) as myPosition,100 as distanceInKm
MATCH (h:Hospital)-->(rating:Rating)
WHERE exists(h.latitude) and
distance(myPosition, point({longitude:h.longitude,latitude:h.latitude})) < (distanceInKm * 100)
RETURN h.name as hospital,rating.name as rating,distance(myPosition,
point({longitude:h.longitude,latitude:h.latitude})) as distance
ORDER BY distance LIMIT 10
数据验证
验证 #1
我们可以检查是否有任何:Address
具有多个向上层级发展的关系。每个节点都有一条指向其父节点的出度关系规则。
MATCH (a:Address)
WHERE size((a)-[:IS_IN]->()) > 1
RETURN a
查询
让我们运行一些查询并了解我们的数据。
按所有权计算平均评分
MATCH (r)<-[:HAS_RATING]-(h:Hospital)-[:HAS_OWNERSHIP]->(o)
RETURN o.name as ownership,avg(toInteger(r.name)) as averageRating
ORDER BY averageRating DESC LIMIT 15
每个城市的医院数量
MATCH (h:Hospital)-[:IS_IN*3..3]->(city)
RETURN city.name as city,count(h) as NumberOfHospitals
ORDER BY NumberOfHospitals DESC LIMIT 15
按评分排名前 10 的州
MATCH (r)<-[:HAS_RATING]-(h:Hospital)-[:IS_IN*5..5]->(state)
WHERE NOT r.name="Not Available"
RETURN state.name as state,avg(toInteger(r.name)) as averageRating,count(h) as numberOfHospitals
ORDER BY averageRating DESC LIMIT 15
此页面是否有帮助?