City of London Crime Analysis

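I have been working with the Police UK Open Data dataset for some time. Until now, that meant writing server-side scripts to crawl the data and load it into a database for further processing, which is tedious. Fortunately, Neo4j now provides a data import feature, so I am taking up the challenge of rebuilding a small part of my existing application with Neo4j. :)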

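Here is what it does:

Stores crime cases as :Case nodes
Stores when a crime happened in :Month and :Year nodes
Stores where a crime happened in :Place nodes
Stores the type of crime in :Category nodes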
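
As a rough sketch of the graph shape this produces, the following creates a single case by hand. The labels, property keys, and relationship types are the ones used by the import query later in this post; every property value here is made up for illustration. It is meant for reading rather than for running alongside the import, since it would add an extra case to the counts.

```cypher
// Illustrative only: one hand-made case with hypothetical property values.
CREATE (y:Year {value: 2015}),
       (m:Month {value: 9}),
       (p:Place {name: 'On or near Example Street'}),
       (ctg:Category {name: 'Example crime type'}),
       (c:Case {ref: 'example-ref', lat: 51.515, lon: -0.09, outcome: 'n/a'}),
       (m)-[:YEAR_OF]->(y),      // the month belongs to a year
       (c)-[:HAPPEN_IN]->(m),    // the case happened in that month
       (c)-[:TYPE_OF]->(ctg),    // the case has a crime category
       (c)-[:AT]->(p);           // the case happened at a place
```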

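Police UK Open Data

To keep the demo small, we import the data directly from this dataset.

Update: importing directly from Police UK ran into a minor problem, because the CSV file contains stray single quotes ('). I therefore created a cleaned CSV file with all stray quotes removed.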

It contains more than 400 crime cases recorded in the City of London in September 2015.

Setting up indexes on nodes

create index on :Year(value);
create index on :Month(value);
create index on :Place(name);
create index on :Category(name);
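
The statements above use the index syntax that was current when this was written. Newer Neo4j releases use a different form. A sketch of the equivalent statements, assuming a 4.x or later server (on those versions `toInt` in the import query would likewise need to become `toInteger`):

```cypher
// Assumes Neo4j 4.x or later; index names are omitted so the server generates them.
CREATE INDEX FOR (y:Year) ON (y.value);
CREATE INDEX FOR (m:Month) ON (m.value);
CREATE INDEX FOR (p:Place) ON (p.name);
CREATE INDEX FOR (ctg:Category) ON (ctg.name);
```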

load csv with headers from
"https://www.dropbox.com/s/0uffu7j65dn2uz1/2015-09-city-of-london-street.csv?dl=1" as csv
with csv as crimecsv
where crimecsv.`Location` is not null
merge (p:Place {name: crimecsv.`Location`})
with crimecsv, split(crimecsv.`Month`, "-") AS yearMonth
where yearMonth[0] is not null
merge (y:Year {value: toInt(yearMonth[0])})
with crimecsv, split(crimecsv.`Month`, "-") AS yearMonth
where yearMonth[1] is not null
merge (m:Month {value: toInt(yearMonth[1])})
with crimecsv
where crimecsv.`Crime type` is not null
merge (ctg:Category {name: crimecsv.`Crime type`})
with crimecsv
where crimecsv.`Latitude` is not null
merge (c:Case {ref: case when crimecsv.`Crime ID` is null then '' else crimecsv.`Crime ID` end ,lat: toFloat(crimecsv.`Latitude`), lon: toFloat(crimecsv.`Longitude`), outcome: case when crimecsv.`Last outcome category` is null then 'n/a' else crimecsv.`Last outcome category` end })

with crimecsv, split(crimecsv.`Month`, "-") AS yearMonth
match (xc:Case {ref: case when crimecsv.`Crime ID` is null then '' else crimecsv.`Crime ID` end ,lat: toFloat(crimecsv.`Latitude`), lon: toFloat(crimecsv.`Longitude`), outcome: case when crimecsv.`Last outcome category` is null then 'n/a' else crimecsv.`Last outcome category` end }),
(xy:Year {value: toInt(yearMonth[0])}),
(xm:Month {value: toInt(yearMonth[1])}),
(xp:Place {name: crimecsv.`Location`}),
(xctg:Category {name: crimecsv.`Crime type`})

// MERGE keeps a single YEAR_OF link per month/year pair instead of one per CSV row
merge (xm)-[:YEAR_OF]->(xy)
create (xc)-[:HAPPEN_IN]->(xm),
(xc)-[:TYPE_OF]->(xctg),
(xc)-[:AT]->(xp);
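
After the import, a quick sanity check helps confirm what was created. The two queries below are a minimal sketch that counts nodes per label and relationships per type. A `YEAR_OF` total much larger than the number of :Month nodes would indicate duplicated relationships.

```cypher
// Count nodes per label combination.
MATCH (n)
RETURN labels(n) AS label, count(*) AS nodes
ORDER BY nodes DESC;

// Count relationships per type.
MATCH ()-[r]->()
RETURN type(r) AS relationship, count(*) AS total
ORDER BY total DESC;
```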

High-crime areas

Show the top 5 places where the most crimes occurred in September 2015.

MATCH (y:Year {value:2015})<-[:YEAR_OF]-(m:Month {value:9})<-[:HAPPEN_IN]-(c:Case)-[:AT]->(p:Place)
RETURN p.name AS `Places`, count(DISTINCT c) AS Occurrences
ORDER BY Occurrences DESC LIMIT 5

Common crime types

Which crime types occurred most often in September 2015?

MATCH (y:Year {value:2015})<-[:YEAR_OF]-(m:Month {value:9})<-[:HAPPEN_IN]-(c:Case)-[:TYPE_OF]->(ctg:Category)
RETURN ctg.name AS `Crimes`, count(DISTINCT c) AS Occurrences
ORDER BY Occurrences DESC LIMIT 5

Crime case status

How many cases are under investigation, closed, and so on?

MATCH (y:Year {value:2015})<-[:YEAR_OF]-(m:Month {value:9})<-[:HAPPEN_IN]-(c:Case)
RETURN c.outcome AS `Status`, count(DISTINCT c) AS Occurrences
ORDER BY Occurrences DESC

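Future work

This is a very simple application that only performs an import plus simple aggregation and counting. I look forward to using spatial queries and the OpenStreetMap API in the future.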
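
As a first step toward spatial queries, the `lat` and `lon` properties already stored on each :Case can be filtered with plain comparisons. This is only a sketch of a bounding-box filter, not a real spatial index lookup, and the four bounds are hypothetical values chosen for illustration.

```cypher
// Hypothetical bounding box; adjust the four bounds to the area of interest.
MATCH (c:Case)-[:TYPE_OF]->(ctg:Category)
WHERE c.lat >= 51.510 AND c.lat <= 51.520
  AND c.lon >= -0.100 AND c.lon <= -0.080
RETURN ctg.name AS Crimes, count(DISTINCT c) AS Occurrences
ORDER BY Occurrences DESC;
```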

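I had not used Neo4j for a long time, and it now has many features that it did not have before. I am glad to be trying it again. By the way, this application was written directly in AsciiDoc, without running a Neo4j server on my computer. That is awesome!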
