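Media, Politics and Graphs

Here is Rik Van Bruggen's original blog post.

My good friend and fellow Neo4j community member Ron recently pointed me to some remarkable work. Thomas Boeschoten, of the Utrecht Data School (among others), has published some amazing analyses of Dutch talk shows from several angles, using Gephi as one of his tools. Some of his results are fascinating, and they look very cool too.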

…

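I won't try to walk you through the depth of Thomas' research; I just want to use Neo4j to explore the dataset he generously shared.

Importing the dataset

Rik originally imported a dataset about 20 times larger from Gephi, but this GraphGist uses a sampled version of the original data.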

create (_0:`SHOW` {`Modularity Name`:"B&vD", `id`:"B&vD", `label`:"B&vD", `modularity_class`:3, `weighted outdegree`:0.000000})
create (_1:`SHOW` {`Modularity Name`:"P&W", `id`:"P&W", `label`:"P&W", `modularity_class`:4, `weighted outdegree`:0.000000})
create (_2:`SHOW` {`Modularity Name`:"DWDD", `id`:"DWDD", `label`:"DWDD", `modularity_class`:5, `weighted outdegree`:0.000000})
create (_3:`SHOW` {`Modularity Name`:"Zomergasten", `id`:"Zomergasten", `label`:"Zomergasten", `modularity_class`:0, `weighted outdegree`:0.000000})
create (_5:`SHOW` {`Modularity Name`:"KvdB", `id`:"KvdB", `label`:"KvdB", `modularity_class`:1, `weighted outdegree`:0.000000})
create (_11:`SHOW` {`Modularity Name`:"Jinek", `id`:"Jinek op Zondag", `label`:"Jinek op Zondag", `modularity_class`:2, `weighted outdegree`:0.000000})
create (_16:`SHOW` {`Modularity Name`:"Jinek", `id`:"Jinek", `label`:"Jinek", `modularity_class`:2, `weighted outdegree`:0.000000})
create (_35:`GUEST` {`Modularity Name`:"Jinek", `id`:"Annejet van der Zijl", `label`:"Annejet van der Zijl", `modularity_class`:2, `weighted outdegree`:4.000000})
create (_42:`GUEST` {`Modularity Name`:"KvdB", `Partij`:"VVD", `id`:"Arend Jan Boekestijn", `label`:"Arend Jan Boekestijn", `modularity_class`:1, `weighted outdegree`:12.000000})
create (_48:`GUEST` {`Modularity Name`:"P&W", `id`:"Arthur Japin", `label`:"Arthur Japin", `modularity_class`:4, `weighted outdegree`:5.000000})
create (_55:`GUEST` {`Modularity Name`:"DWDD", `id`:"Barry Atsma", `label`:"Barry Atsma", `modularity_class`:5, `weighted outdegree`:6.000000})
create (_57:`GUEST` {`Modularity Name`:"P&W", `id`:"Bart Chabot", `label`:"Bart Chabot", `modularity_class`:4, `weighted outdegree`:72.000000})
create (_84:`GUEST` {`Modularity Name`:"B&vD", `id`:"Cees Geel", `label`:"Cees Geel", `modularity_class`:3, `weighted outdegree`:5.000000})
create (_106:`GUEST` {`Modularity Name`:"DWDD", `id`:"Derk Sauer", `label`:"Derk Sauer", `modularity_class`:5, `weighted outdegree`:19.000000})
create (_119:`GUEST` {`Modularity Name`:"KvdB", `Partij`:"CDA", `id`:"Dries van Agt", `label`:"Dries van Agt", `modularity_class`:1, `weighted outdegree`:12.000000})
create (_128:`GUEST` {`Modularity Name`:"B&vD", `id`:"Ellen Ten Damme", `label`:"Ellen Ten Damme", `modularity_class`:3, `weighted outdegree`:8.000000})
create (_137:`GUEST` {`Modularity Name`:"KvdB", `id`:"Ernst Daniël Smid", `label`:"Ernst Daniël Smid", `modularity_class`:1, `weighted outdegree`:6.000000})
create (_170:`GUEST` {`Modularity Name`:"DWDD", `Partij`:"PVV", `id`:"Geert Wilders", `label`:"Geert Wilders", `modularity_class`:5, `weighted outdegree`:13.000000})
create (_180:`GUEST` {`Modularity Name`:"DWDD", `id`:"Giel Beelen", `label`:"Giel Beelen", `modularity_class`:5, `weighted outdegree`:155.000000})
create (_190:`GUEST` {`Modularity Name`:"DWDD", `id`:"Hadewych Minis", `label`:"Hadewych Minis", `modularity_class`:5, `weighted outdegree`:7.000000})
create (_192:`GUEST` {`Modularity Name`:"P&W", `Partij`:"VVD", `id`:"Halbe Zijlstra", `label`:"Halbe Zijlstra", `modularity_class`:4, `weighted outdegree`:14.000000})
create (_210:`GUEST` {`Modularity Name`:"P&W", `Partij`:"SP", `id`:"Harry van Bommel", `label`:"Harry van Bommel", `modularity_class`:4, `weighted outdegree`:18.000000})
create (_243:`GUEST` {`Modularity Name`:"P&W", `Partij`:"GL", `id`:"Ineke van Gent", `label`:"Ineke van Gent", `modularity_class`:4, `weighted outdegree`:6.000000})
create (_245:`GUEST` {`Modularity Name`:"P&W", `id`:"Ingeborg Beugel", `label`:"Ingeborg Beugel", `modularity_class`:4, `weighted outdegree`:13.000000})
create (_252:`GUEST` {`Modularity Name`:"KvdB", `id`:"Jaap Jongbloed", `label`:"Jaap Jongbloed", `modularity_class`:1, `weighted outdegree`:8.000000})
create (_279:`GUEST` {`Modularity Name`:"B&vD", `id`:"Jenny Arean", `label`:"Jenny Arean", `modularity_class`:3, `weighted outdegree`:6.000000})
create (_291:`GUEST` {`Modularity Name`:"P&W", `Partij`:"PVDA", `id`:"Job Cohen", `label`:"Job Cohen", `modularity_class`:4, `weighted outdegree`:53.000000})
create (_302:`GUEST` {`Modularity Name`:"P&W", `Partij`:"GL", `id`:"Jolande Sap", `label`:"Jolande Sap", `modularity_class`:4, `weighted outdegree`:23.000000})
create (_330:`GUEST` {`Modularity Name`:"Jinek", `id`:"Lange Frans", `label`:"Lange Frans", `modularity_class`:2, `weighted outdegree`:4.000000})
create (_380:`GUEST` {`Modularity Name`:"P&W", `Partij`:"VVD", `id`:"Melanie Schultz-Maas van Haegen Geesteranus", `label`:"Melanie Schultz-Maas van Haegen Geesteranus", `modularity_class`:4, `weighted outdegree`:7.000000})
create (_423:`GUEST` {`Modularity Name`:"P&W", `id`:"Peter Paul de Vries", `label`:"Peter Paul de Vries", `modularity_class`:4, `weighted outdegree`:33.000000})
create (_429:`GUEST` {`Modularity Name`:"P&W", `id`:"Peter Verhaar", `label`:"Peter Verhaar", `modularity_class`:4, `weighted outdegree`:24.000000})
create (_445:`GUEST` {`Modularity Name`:"DWDD", `id`:"Ramsey Nasr", `label`:"Ramsey Nasr", `modularity_class`:5, `weighted outdegree`:10.000000})
create (_448:`GUEST` {`Modularity Name`:"B&vD", `id`:"René Froger", `label`:"René Froger", `modularity_class`:3, `weighted outdegree`:8.000000})
create (_509:`GUEST` {`Modularity Name`:"KvdB", `id`:"Thomas Dekker", `label`:"Thomas Dekker", `modularity_class`:1, `weighted outdegree`:4.000000})
create (_552:`PARTY` {`name`:"CDA"})
create (_553:`PARTY` {`name`:"SP"})
create (_554:`PARTY` {`name`:"PVDA"})
create (_555:`PARTY` {`name`:"D66"})
create (_556:`PARTY` {`name`:"CU"})
create (_557:`PARTY` {`name`:"VVD"})
create (_558:`PARTY` {`name`:"PVV"})
create (_559:`PARTY` {`name`:"GL"})
create (_560:`PARTY` {`name`:"50PLUS"})
create (_561:`PARTY` {`name`:"?"})
create (_562:`PARTY` {`name`:"SGP"})
create (_563:`PARTY` {`name`:"EenNL"})
create (_564:`PARTY` {`name`:"PVDD"})
create (_565:`PARTY` {`name`:"LPF"})
create (_566:`PARTY` {`name`:"TROTS"})
create (_567:`GENDER` {`name`:"Male"})
create (_568:`GENDER` {`name`:"Female"})
create _35-[:`HAS_GENDER`]->_568
create _35-[:`VISITED` {`quantity`:1}]->_11
create _35-[:`VISITED` {`quantity`:1}]->_5
create _35-[:`VISITED` {`quantity`:1}]->_1
create _35-[:`VISITED` {`quantity`:1}]->_0
create _42-[:`AFFILIATED_WITH`]->_557
create _42-[:`HAS_GENDER`]->_567
create _42-[:`VISITED` {`quantity`:1}]->_11
create _42-[:`VISITED` {`quantity`:4}]->_5
create _42-[:`VISITED` {`quantity`:3}]->_2
create _42-[:`VISITED` {`quantity`:2}]->_1
create _42-[:`VISITED` {`quantity`:2}]->_0
create _48-[:`HAS_GENDER`]->_567
create _48-[:`VISITED` {`quantity`:1}]->_11
create _48-[:`VISITED` {`quantity`:1}]->_2
create _48-[:`VISITED` {`quantity`:3}]->_1
create _55-[:`HAS_GENDER`]->_567
create _55-[:`VISITED` {`quantity`:1}]->_11
create _55-[:`VISITED` {`quantity`:4}]->_2
create _55-[:`VISITED` {`quantity`:1}]->_1
create _57-[:`HAS_GENDER`]->_567
create _57-[:`VISITED` {`quantity`:2}]->_5
create _57-[:`VISITED` {`quantity`:8}]->_2
create _57-[:`VISITED` {`quantity`:41}]->_1
create _57-[:`VISITED` {`quantity`:21}]->_0
create _84-[:`HAS_GENDER`]->_567
create _84-[:`VISITED` {`quantity`:2}]->_2
create _84-[:`VISITED` {`quantity`:1}]->_1
create _84-[:`VISITED` {`quantity`:2}]->_0
create _106-[:`HAS_GENDER`]->_567
create _106-[:`VISITED` {`quantity`:1}]->_16
create _106-[:`VISITED` {`quantity`:1}]->_11
create _106-[:`VISITED` {`quantity`:7}]->_2
create _106-[:`VISITED` {`quantity`:4}]->_1
create _106-[:`VISITED` {`quantity`:6}]->_0
create _119-[:`AFFILIATED_WITH`]->_552
create _119-[:`HAS_GENDER`]->_567
create _119-[:`VISITED` {`quantity`:1}]->_11
create _119-[:`VISITED` {`quantity`:4}]->_5
create _119-[:`VISITED` {`quantity`:1}]->_3
create _119-[:`VISITED` {`quantity`:2}]->_2
create _119-[:`VISITED` {`quantity`:4}]->_1
create _128-[:`HAS_GENDER`]->_568
create _128-[:`VISITED` {`quantity`:2}]->_11
create _128-[:`VISITED` {`quantity`:1}]->_5
create _128-[:`VISITED` {`quantity`:1}]->_1
create _128-[:`VISITED` {`quantity`:4}]->_0
create _137-[:`HAS_GENDER`]->_567
create _137-[:`VISITED` {`quantity`:2}]->_5
create _137-[:`VISITED` {`quantity`:3}]->_2
create _137-[:`VISITED` {`quantity`:1}]->_1
create _170-[:`AFFILIATED_WITH`]->_558
create _170-[:`HAS_GENDER`]->_567
create _170-[:`VISITED` {`quantity`:1}]->_11
create _170-[:`VISITED` {`quantity`:3}]->_5
create _170-[:`VISITED` {`quantity`:7}]->_2
create _170-[:`VISITED` {`quantity`:2}]->_0
create _180-[:`HAS_GENDER`]->_567
create _180-[:`VISITED` {`quantity`:154}]->_2
create _180-[:`VISITED` {`quantity`:1}]->_0
create _190-[:`HAS_GENDER`]->_568
create _190-[:`VISITED` {`quantity`:1}]->_11
create _190-[:`VISITED` {`quantity`:1}]->_5
create _190-[:`VISITED` {`quantity`:3}]->_2
create _190-[:`VISITED` {`quantity`:1}]->_1
create _190-[:`VISITED` {`quantity`:1}]->_0
create _192-[:`AFFILIATED_WITH`]->_557
create _192-[:`HAS_GENDER`]->_567
create _192-[:`VISITED` {`quantity`:1}]->_11
create _192-[:`VISITED` {`quantity`:2}]->_5
create _192-[:`VISITED` {`quantity`:11}]->_1
create _210-[:`AFFILIATED_WITH`]->_553
create _210-[:`HAS_GENDER`]->_567
create _210-[:`VISITED` {`quantity`:1}]->_11
create _210-[:`VISITED` {`quantity`:3}]->_5
create _210-[:`VISITED` {`quantity`:2}]->_2
create _210-[:`VISITED` {`quantity`:9}]->_1
create _210-[:`VISITED` {`quantity`:3}]->_0
create _243-[:`AFFILIATED_WITH`]->_559
create _243-[:`HAS_GENDER`]->_568
create _243-[:`VISITED` {`quantity`:1}]->_5
create _243-[:`VISITED` {`quantity`:1}]->_2
create _243-[:`VISITED` {`quantity`:4}]->_1
create _245-[:`HAS_GENDER`]->_568
create _245-[:`VISITED` {`quantity`:1}]->_5
create _245-[:`VISITED` {`quantity`:1}]->_2
create _245-[:`VISITED` {`quantity`:10}]->_1
create _245-[:`VISITED` {`quantity`:1}]->_0
create _252-[:`HAS_GENDER`]->_567
create _252-[:`VISITED` {`quantity`:1}]->_11
create _252-[:`VISITED` {`quantity`:2}]->_5
create _252-[:`VISITED` {`quantity`:1}]->_2
create _252-[:`VISITED` {`quantity`:3}]->_1
create _252-[:`VISITED` {`quantity`:1}]->_0
create _279-[:`HAS_GENDER`]->_568
create _279-[:`VISITED` {`quantity`:2}]->_2
create _279-[:`VISITED` {`quantity`:2}]->_1
create _279-[:`VISITED` {`quantity`:2}]->_0
create _291-[:`AFFILIATED_WITH`]->_554
create _291-[:`HAS_GENDER`]->_567
create _291-[:`VISITED` {`quantity`:5}]->_5
create _291-[:`VISITED` {`quantity`:18}]->_2
create _291-[:`VISITED` {`quantity`:24}]->_1
create _291-[:`VISITED` {`quantity`:6}]->_0
create _302-[:`AFFILIATED_WITH`]->_559
create _302-[:`HAS_GENDER`]->_568
create _302-[:`VISITED` {`quantity`:1}]->_11
create _302-[:`VISITED` {`quantity`:3}]->_5
create _302-[:`VISITED` {`quantity`:8}]->_2
create _302-[:`VISITED` {`quantity`:11}]->_1
create _330-[:`HAS_GENDER`]->_567
create _330-[:`VISITED` {`quantity`:1}]->_11
create _330-[:`VISITED` {`quantity`:1}]->_2
create _330-[:`VISITED` {`quantity`:1}]->_1
create _330-[:`VISITED` {`quantity`:1}]->_0
create _380-[:`AFFILIATED_WITH`]->_557
create _380-[:`HAS_GENDER`]->_568
create _380-[:`VISITED` {`quantity`:1}]->_11
create _380-[:`VISITED` {`quantity`:1}]->_2
create _380-[:`VISITED` {`quantity`:3}]->_1
create _380-[:`VISITED` {`quantity`:2}]->_0
create _423-[:`HAS_GENDER`]->_567
create _423-[:`VISITED` {`quantity`:1}]->_11
create _423-[:`VISITED` {`quantity`:2}]->_5
create _423-[:`VISITED` {`quantity`:29}]->_1
create _423-[:`VISITED` {`quantity`:1}]->_0
create _429-[:`HAS_GENDER`]->_567
create _429-[:`VISITED` {`quantity`:2}]->_11
create _429-[:`VISITED` {`quantity`:1}]->_5
create _429-[:`VISITED` {`quantity`:21}]->_1
create _445-[:`HAS_GENDER`]->_567
create _445-[:`VISITED` {`quantity`:5}]->_2
create _445-[:`VISITED` {`quantity`:4}]->_1
create _445-[:`VISITED` {`quantity`:1}]->_0
create _448-[:`HAS_GENDER`]->_567
create _448-[:`VISITED` {`quantity`:2}]->_2
create _448-[:`VISITED` {`quantity`:2}]->_1
create _448-[:`VISITED` {`quantity`:4}]->_0
create _509-[:`HAS_GENDER`]->_567
create _509-[:`VISITED` {`quantity`:1}]->_5
create _509-[:`VISITED` {`quantity`:1}]->_2
create _509-[:`VISITED` {`quantity`:1}]->_1
create _509-[:`VISITED` {`quantity`:1}]->_0

…

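However, when I started up the server, I quickly found that I had some work to do :) ... The graph Thomas created had no real "database-style" model (for example, nothing in it was normalized), and it looked a bit boring in the Neo4j Browser.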

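I needed to add some structure to all of this in order to query it meaningfully.

Adding a model

After browsing the data, I settled on the following data model: GUEST nodes are connected to SHOW nodes by VISITED relationships (each carrying a quantity property), to PARTY nodes by AFFILIATED_WITH relationships, and to GENDER nodes by HAS_GENDER relationships.

You can see that it is not a very large graph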

MATCH (n)
RETURN head(labels(n)) as labels,count(*) as count

But it is quite densely connected: there are many relationships between the nodes

MATCH (n)-[r]->(m)
RETURN head(labels(n)) as start, type(r) as rel, head(labels(m)) as end, count(*) as count

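So now I can run some more interesting queries on the data and see whether, as in Thomas' research, I can find out something interesting about this dataset. Let's give it a go with some Cypher queries!

Let's start with some simple queries. Let's see how many guests have visited each of the talk shows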

match (g:GUEST)-[v:VISITED]->(sh:SHOW)
return sh.id as Show, count(v) as NrOfVisits
order by NrOfVisits desc;

We immediately get a feel for which are the main talk shows
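Note that the query above counts VISITED relationships, that is, one per guest and show. A possible variation, sketched here on the assumption that the `quantity` property holds the number of times that guest appeared on that show (which is how the sample data reads, but is not stated explicitly), reports both figures side by side. The aliases `NrOfGuests` and `NrOfAppearances` are made up for this example.

```cypher
// One VISITED relationship exists per (guest, show) pair,
// so count(v) is the number of distinct guests per show.
// sum(v.quantity) adds up the per-guest appearance counts (assumed meaning).
MATCH (g:GUEST)-[v:VISITED]->(sh:SHOW)
RETURN sh.id AS Show,
       count(v) AS NrOfGuests,
       sum(v.quantity) AS NrOfAppearances
ORDER BY NrOfAppearances DESC;
```

If a show has a few very frequent guests, the two columns can rank the shows differently.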

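But next, let's see how many of these talk show guests are politicians (or at least have a political affiliation). Let's extend the query a little

match (g:GUEST)-[v:VISITED]->(sh:SHOW),
(g)-[:AFFILIATED_WITH]->(p:PARTY)
return sh.id as Show, count(v) as NrOfVisits
order by NrOfVisits desc;

and see whether the shows rank differently

Interesting. As you can see, there are indeed some differences.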
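The same pattern can also be grouped by party instead of by show. The following is only a sketch of that idea, again assuming `quantity` is the number of appearances; the aliases are illustrative.

```cypher
// Per party: how many affiliated guests appear in the data,
// and how many appearances they add up to across all shows.
MATCH (p:PARTY)<-[:AFFILIATED_WITH]-(g:GUEST)-[v:VISITED]->(sh:SHOW)
RETURN p.name AS Party,
       count(DISTINCT g) AS Guests,
       sum(v.quantity) AS Appearances
ORDER BY Appearances DESC;
```

`count(DISTINCT g)` is needed here because a guest who visited several shows matches the pattern once per show.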

Now let's look at the dataset from another angle: gender. Let's look at the distribution of male and female guests across all these talk shows

match (g:GUEST)-[:HAS_GENDER]->(gen:GENDER),
(g)-[v:VISITED]->(sh:SHOW)
return gen.name, count(v)
order by gen.name ASC;

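We can clearly see that men still dominate these talk shows

If we add the political dimension again and look at the gender distribution of the political guests on the talk shows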

match (g:GUEST)-[:HAS_GENDER]->(gen:GENDER),
(g)-[v:VISITED]->(sh:SHOW),
(g)-[:AFFILIATED_WITH]->(p:PARTY)
return gen.name, count(v)
order by gen.name ASC;

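then we can see that the distribution is largely the same

I'm sure there are plenty of other queries one could think of, but I'll do just one more in this post: let's look at the overlap in guest visits between different talk shows. To do that, we just need to compute the paths between two talk shows: DWDD and P&W.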
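Both gender queries aggregate over all shows at once. To see whether the balance differs from show to show, one option is to add the show to the grouping key. This is a sketch with illustrative aliases, and it assumes `quantity` is the number of appearances.

```cypher
// Gender breakdown per show: distinct guests and summed appearances.
MATCH (gen:GENDER)<-[:HAS_GENDER]-(g:GUEST)-[v:VISITED]->(sh:SHOW)
RETURN sh.id AS Show,
       gen.name AS Gender,
       count(DISTINCT g) AS Guests,
       sum(v.quantity) AS Appearances
ORDER BY Show, Gender;
```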

match p = AllShortestPaths((s1:SHOW {id:"DWDD"})-[*..2]-(s2:SHOW {id:"P&W"}))
return nodes(p)
limit 5;

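The result is as you would expect: lots of overlap, at least between these two (see above: the largest) talk shows. Hence the "limit 5" in the query, to keep my poor Neo4j Browser from blowing up.

Wrapping up

That's all I have for now. You can download the database here. The queries I used above are all on GitHub.

From my perspective, datasets like this are extremely interesting and powerful. I would love to see more work like Thomas', from my own country and from abroad, that looks at these questions from a broader perspective. In any case, I want to thank and compliment Thomas on his work, and I look forward to your feedback.

Hope this was useful.

Cheers

Rik

Once more, the link to the original blog post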
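When only the size of the overlap matters, rather than the paths themselves, the shared guests can be matched directly. This is a minimal sketch, not the query from the original post; `SharedGuests` is an illustrative alias.

```cypher
// Guests with a VISITED relationship to both shows.
// Each such guest matches the pattern exactly once.
MATCH (s1:SHOW {id: "DWDD"})<-[:VISITED]-(g:GUEST)-[:VISITED]->(s2:SHOW {id: "P&W"})
RETURN count(g) AS SharedGuests;
```

Replacing the last line with `RETURN g.id ORDER BY g.id` would list the shared guests by name. Because this returns a single number or a flat list instead of paths, no `limit` is needed to keep the browser responsive.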
