在 Neo4j 中处理层次结构树
引言
我叫 Tomaz Bratanic。我想演示如何在 Neo4j 中处理层次结构位置树。根据我在导入/查询这些树时学到的知识,我总结了一些基本规则,您应该遵循这些规则才能获得正确的查询结果。
位置树规则
-
所有关系都指向从子节点到父节点,向上构建层次结构。
-
所有关系使用单一类型。(PARENT;FROM;IS_IN)
-
每个节点只有一个指向其父节点的出站关系。
-
每个节点可以有一个或多个来自其子节点的入站关系。
联系方式
-
twitter: @tb_tomaz
-
github: https://github.com/tomasonjo
导入
让我们向图谱中加载一些数据进行探索。
添加约束和索引
首先,我们需要添加索引和约束,因为它们将优化我们的查询。下面过程中的第一个数组设置索引,第二个数组包含唯一约束。您需要安装 APOC 库。
CREATE INDEX ON :County(name);
CREATE INDEX ON :City(name);
CREATE INDEX ON :ZipCode(name);
CREATE INDEX ON :Address(name);
CREATE CONSTRAINT ON (h:Hospital) ASSERT h.id IS UNIQUE;
CREATE CONSTRAINT ON (s:State) ASSERT s.name IS UNIQUE;
位置层次结构树导入
您可能会注意到,我们没有采用单独合并每个节点的标准方法,而是根据节点及其父节点在层次结构树中的模式进行合并,因为一些县/市/地址名称相同。
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/tomasonjo/hospitals-neo4j/master/Hospital%20General%20Information.csv" as row
WITH row
WHERE row.State = 'New York'
// state name is unique
MERGE (state:State{name:row.State})
// merge by pattern with their parents
MERGE (state)<-[:IS_IN]-(county:County{name:row.`County Name`})
MERGE (county)<-[:IS_IN]-(city:City{name:row.City})
MERGE (city)<-[:IS_IN]-(zip:ZipCode{name:row.`ZIP Code`})
MERGE (zip)<-[:IS_IN]-(address:Address{name:row.Address})
// for entities, it is best to have an id system
MERGE (h:Hospital{id:row.`Provider ID`})
MERGE (h)-[:IS_IN]->(address)
附加医院信息
我们还将导入一些关于医院的附加信息,例如它们的评级、所有权等。
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/tomasonjo/hospitals-neo4j/master/Hospital%20General%20Information.csv" as row
WITH row
WHERE row.State = 'New York'
MATCH (h:Hospital{id:row.`Provider ID`})
SET h.phone=row.`Phone Number`,
h.emergency_services = row.`Emergency Services`,
h.name= row.`Hospital Name`,
h.mortality = row.`Mortality national comparison`,
h.safety = row.`Safety of care national comparison`,
h.timeliness = row.`Timeliness of care national comparison`,
h.experience = row.`Patient experience national comparison`,
h.effectiveness = row.`Effectiveness of care national comparison`
MERGE (type:HospitalType{name:row.`Hospital Type`})
MERGE (h)-[:HAS_TYPE]->(type)
MERGE (ownership:Ownership{name: row.`Hospital Ownership`})
MERGE (h)-[:HAS_OWNERSHIP]->(ownership)
MERGE (rating:Rating{name:row.`Hospital overall rating`})
MERGE (h)-[:HAS_RATING]->(rating)
地理空间数据导入
最后要导入的是医院的地理空间信息。
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/tomasonjo/hospitals-neo4j/master/gpsinfo.csv" as row
WITH row
WHERE row.State = 'New York'
MATCH (hospital:Hospital {id:row.id})
SET hospital.latitude = toFloat(row.latitude),
hospital.longitude = toFloat(row.longitude)
空间查询示例
假设您在Liberty Island
迷路了,想找到最近的10家医院。距离以米为单位。注意:在 Neo4j Sandbox 中不起作用。
WITH "Liberty Island, Manhattan" as myLocation
CALL apoc.spatial.geocodeOnce(myLocation) YIELD location
WITH point({longitude: location.longitude, latitude: location.latitude}) as myPosition,100 as distanceInKm
MATCH (h:Hospital)-->(rating:Rating)
WHERE exists(h.latitude) and
distance(myPosition, point({longitude:h.longitude,latitude:h.latitude})) < (distanceInKm * 100)
RETURN h.name as hospital,rating.name as rating,distance(myPosition,
point({longitude:h.longitude,latitude:h.latitude})) as distance
ORDER BY distance LIMIT 10
数据验证
验证 #1
我们可以检查是否有任何:Address
节点有多个向上构建层次结构的关系。每个节点只有一个指向其父节点的出站关系规则。
MATCH (a:Address)
WHERE size((a)-[:IS_IN]->()) > 1
RETURN a
查询
让我们运行一些查询并了解我们的数据。
按所有权划分的平均评级
MATCH (r)<-[:HAS_RATING]-(h:Hospital)-[:HAS_OWNERSHIP]->(o)
RETURN o.name as ownership,avg(toInteger(r.name)) as averageRating
ORDER BY averageRating DESC LIMIT 15
每个城市的医院数量
MATCH (h:Hospital)-[:IS_IN*3..3]->(city)
RETURN city.name as city,count(h) as NumberOfHospitals
ORDER BY NumberOfHospitals DESC LIMIT 15
按评级排名的前10个州
MATCH (r)<-[:HAS_RATING]-(h:Hospital)-[:IS_IN*5..5]->(state)
WHERE NOT r.name="Not Available"
RETURN state.name as state,avg(toInteger(r.name)) as averageRating,count(h) as numberOfHospitals
ORDER BY averageRating DESC LIMIT 15
此页面有帮助吗?