GraphGists

在 Neo4j 中处理层次结构树

hospitalmeta

引言

我叫 Tomaz Bratanic。我想演示如何在 Neo4j 中处理层次结构位置树。根据我在导入/查询这些树时学到的知识,我总结了一些基本规则,您应该遵循这些规则才能获得正确的查询结果。

位置树规则

  • 所有关系都指向从子节点到父节点,向上构建层次结构。

  • 所有关系使用单一类型。(PARENT;FROM;IS_IN)

  • 每个节点只有一个指向其父节点的出站关系。

  • 每个节点可以有一个或多个来自其子节点的入站关系。

导入

让我们向图谱中加载一些数据进行探索。

添加约束和索引

首先,我们需要添加索引和约束,因为它们将优化我们的查询。下面过程中的第一个数组设置索引,第二个数组包含唯一约束。您需要安装 APOC 库。

CREATE INDEX ON :County(name);
CREATE INDEX ON :City(name);
CREATE INDEX ON :ZipCode(name);
CREATE INDEX ON :Address(name);

CREATE CONSTRAINT ON (h:Hospital) ASSERT h.id IS UNIQUE;
CREATE CONSTRAINT ON (s:State) ASSERT s.name IS UNIQUE;

位置层次结构树导入

您可能会注意到,我们没有采用单独合并每个节点的标准方法,而是根据节点及其父节点在层次结构树中的模式进行合并,因为一些县/市/地址名称相同。

LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/tomasonjo/hospitals-neo4j/master/Hospital%20General%20Information.csv" as row
WITH row
WHERE row.State = 'New York'
// state name is unique
MERGE (state:State{name:row.State})
// merge by pattern with their parents
MERGE (state)<-[:IS_IN]-(county:County{name:row.`County Name`})
MERGE (county)<-[:IS_IN]-(city:City{name:row.City})
MERGE (city)<-[:IS_IN]-(zip:ZipCode{name:row.`ZIP Code`})
MERGE (zip)<-[:IS_IN]-(address:Address{name:row.Address})
// for entities, it is best to have an id system
MERGE (h:Hospital{id:row.`Provider ID`})
MERGE (h)-[:IS_IN]->(address)

附加医院信息

我们还将导入一些关于医院的附加信息,例如它们的评级、所有权等。

LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/tomasonjo/hospitals-neo4j/master/Hospital%20General%20Information.csv" as row
WITH row
WHERE row.State = 'New York'
MATCH (h:Hospital{id:row.`Provider ID`})
SET h.phone=row.`Phone Number`,
    h.emergency_services = row.`Emergency Services`,
    h.name= row.`Hospital Name`,
    h.mortality = row.`Mortality national comparison`,
    h.safety = row.`Safety of care national comparison`,
    h.timeliness = row.`Timeliness of care national comparison`,
    h.experience = row.`Patient experience national comparison`,
    h.effectiveness = row.`Effectiveness of care national comparison`
MERGE (type:HospitalType{name:row.`Hospital Type`})
MERGE (h)-[:HAS_TYPE]->(type)
MERGE (ownership:Ownership{name: row.`Hospital Ownership`})
MERGE (h)-[:HAS_OWNERSHIP]->(ownership)
MERGE (rating:Rating{name:row.`Hospital overall rating`})
MERGE (h)-[:HAS_RATING]->(rating)

地理空间数据导入

最后要导入的是医院的地理空间信息。

LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/tomasonjo/hospitals-neo4j/master/gpsinfo.csv" as row
WITH row
WHERE row.State = 'New York'
MATCH (hospital:Hospital {id:row.id})
SET hospital.latitude = toFloat(row.latitude),
    hospital.longitude = toFloat(row.longitude)

空间查询示例

假设您在Liberty Island迷路了,想找到最近的10家医院。距离以米为单位。注意:在 Neo4j Sandbox 中不起作用。

WITH "Liberty Island, Manhattan" as myLocation
CALL apoc.spatial.geocodeOnce(myLocation) YIELD location
WITH point({longitude: location.longitude, latitude: location.latitude}) as myPosition,100 as distanceInKm
MATCH (h:Hospital)-->(rating:Rating)
WHERE exists(h.latitude) and
  distance(myPosition, point({longitude:h.longitude,latitude:h.latitude})) < (distanceInKm * 100)
RETURN h.name as hospital,rating.name as rating,distance(myPosition,
       point({longitude:h.longitude,latitude:h.latitude})) as distance
ORDER BY distance LIMIT 10

数据验证

验证 #1

我们可以检查是否有任何:Address节点有多个向上构建层次结构的关系。每个节点只有一个指向其父节点的出站关系规则。

MATCH (a:Address)
WHERE size((a)-[:IS_IN]->()) > 1
RETURN a

验证 #2

我们还可以检查位置树中所有路径的长度。根据我们制定的规则,每家医院必须有且只有一个位置路径,因为每家医院有且只有一个地址。

MATCH path=(h:Hospital)-[:IS_IN*..10]->(location)
WHERE NOT (location)-[:IS_IN]->()
RETURN distinct(length(path)) as length,
       count(*) as numberOfPaths,
       count(distinct(h)) as numberOfHospitals

验证 #3

检查每个节点有多少个标签。这在学习时很有用。您不希望节点没有标签。

MATCH (n)
RETURN size(labels(n)) as size,count(*) as count

查询

让我们运行一些查询并了解我们的数据。

按所有权划分的平均评级

MATCH (r)<-[:HAS_RATING]-(h:Hospital)-[:HAS_OWNERSHIP]->(o)
RETURN o.name as ownership,avg(toInteger(r.name)) as averageRating
ORDER BY averageRating DESC LIMIT 15

每个城市的医院数量

MATCH (h:Hospital)-[:IS_IN*3..3]->(city)
RETURN city.name as city,count(h) as NumberOfHospitals
ORDER BY NumberOfHospitals DESC LIMIT 15

按评级排名的前10个州

MATCH (r)<-[:HAS_RATING]-(h:Hospital)-[:IS_IN*5..5]->(state)
WHERE NOT r.name="Not Available"
RETURN state.name as state,avg(toInteger(r.name)) as averageRating,count(h) as numberOfHospitals
ORDER BY averageRating DESC LIMIT 15

哪些州拥有效率高于平均水平的医院数量最多

MATCH (h:Hospital)-[:IS_IN*5..5]->(state)
WHERE h.effectiveness = "Above the National average"
RETURN state.name as state,h.effectiveness,count(h) as numberOfHospitals
ORDER BY numberOfHospitals DESC LIMIT 15

哪些州拥有死亡率低于平均水平的医院数量最多

MATCH (h:Hospital)-[:IS_IN*5..5]->(state)
WHERE h.mortality = "Below the National average"
RETURN state.name as state,h.mortality,count(h) as numberOfHospitals
ORDER BY numberOfHospitals DESC LIMIT 15
© . All rights reserved.