GraphGists

How to extract information about the users who publish this kind of content

This gist implements a blog post I wrote about the relationships between social networks.


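In the blog example, I extracted the information through the YouTube, Twitter, and SoundCloud APIs using REST API calls. In this example, I have prepared the dataset instead. By relating different social networks to each other, we can extract account information. Twitter is the starting point: it is a continuous stream of posts, and from each tweet we can extract several elements (hashtags, mentions, URLs). URLs are the element that links one social network to another, and that is what gives this approach its potential.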
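As a rough sketch of that extraction step done inside the graph, the query below can be run once the sample dataset in this gist is loaded. It splits each tweet's text on spaces and keeps the tokens that start with `http`, `@`, or `#`. This naive whitespace tokenizer is an assumption for illustration, not the method from the blog post (which used the platform APIs): punctuation attached to a token stays attached.

```cypher
// Naive tokenizer (assumption): split the tweet text on single spaces
MATCH (t:TWITTER)
WITH t, split(t.text, " ") AS words
// Keep the tokens that look like URLs, mentions, or hashtags
RETURN t.screen_name AS user,
       [w IN words WHERE w STARTS WITH "http"] AS urls,
       [w IN words WHERE w STARTS WITH "@"] AS mentions,
       [w IN words WHERE w STARTS WITH "#"] AS hashtags
```

The sample tweets contain no hashtags, so that column comes back as an empty list for every row.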

Sample dataset

Entities

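  • Tweets (TWITTER) with screen_name, text, timestamp, and country properties

  • URLs (URL, plus SOCIALNETWORK when the link points to a social network) with a url property

  • YouTube videos (YOUTUBE) with title, channelid, tags, ytAgeRestricted, and update properties

  • SoundCloud tracks (SOUNDCLOUD) with title, description, and upload properties

  • YouTube channels (CHANNELYOUTUBE) with screen_name, description, googleplusid, relatedvideos, and channelcreate properties

  • Google+ users (GOOGLEPLUS) with displayName, aboutMe, image, location, and gender properties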

// Nodes
CREATE (Tweet1:TWITTER {screen_name: "Alberto", text: "Metallica with Jason Newsted Creeping death LIVE San Francisco, USA 2011... https://#/RuI72pyx2v via @YouTube", timestamp: "1450687966", country: "ES"})
CREATE (Tweet2:TWITTER {screen_name: "Pedro", text: "RT Metallica with Jason Newsted Creeping death LIVE San Francisco, USA 2011... https://#/RuI72pyx2v via @YouTube", timestamp: "1450612800", country: "ES"})
CREATE (Tweet3:TWITTER {screen_name: "Eva", text: "RT Metallica with Jason Newsted Creeping death LIVE San Francisco, USA 2011... https://#/RuI72pyx2v via @YouTube", timestamp: "1450609980", country: "ES"})
CREATE (Tweet4:TWITTER {screen_name: "BadGuy", text: "LIVE Figth in school, https://#/ZcQ72pyx2v", timestamp: "1450687966", country: "ES"})
CREATE (Tweet5:TWITTER {screen_name: "Mariano", text: "Metallica https://#/RuI72pyx2v via @Soundcloud", timestamp: "1450612800", country: "ES"})
CREATE (Tweet6:TWITTER {screen_name: "Miguel", text: "BBC NEWS https://#/RuI72pyx2v", timestamp: "1450609980", country: "ES"})
CREATE (URL1:URL:SOCIALNETWORK {url: "https://youtu.be/ASZXbb3a24t" })
CREATE (URL2:URL:SOCIALNETWORK {url:"https://youtu.be/CURLzg0ia5w" })
CREATE (URL3:URL:SOCIALNETWORK {url:"https://soundcloud.com/hassan-awaly/8ik0r4axk78m" })
CREATE (URL4:URL {url:"https://bbc.in/1MZFNWu"})
CREATE (YouTubeVideo1:YOUTUBE {title: "Metallica",channelid: "456456456", tags: "music", ytAgeRestricted: "false", update :"1450685966" })
CREATE (YouTubeVideo2:YOUTUBE {title:"School fight", channelid: "123123123", tags: "violence", ytAgeRestricted: "true", update :"1450614800" })
CREATE (Track1:SOUNDCLOUD {title:"Queen - we are rock you", description: "The best song ever" , upload :"1450604980" })
CREATE (Channel1:CHANNELYOUTUBE {screen_name: "Alberto", description:"I'm the best", googleplusid:"123321", relatedvideos:"Trailers 2016" , channelcreate :"1450604123" })
CREATE (Channel2:CHANNELYOUTUBE {screen_name:"BadGuy", description:"I'm a bad guy", googleplusid:"678876", relatedvideos:"Why not?" , channelcreate :"1450234123" })
CREATE (Googleuser1:GOOGLEPLUS {displayName:"BadGuy", aboutMe:"I'm 24 years old, (personal information)", image: "https://lh3.googleusercontent.com/ry5g21lx8j8/photo.jpg", location: "Spain", gender: "male" })
CREATE (Googleuser2:GOOGLEPLUS {displayName:"Geroma", aboutMe:"Not all, (personal information)", image: "https://lh3.googleusercontent.com/asfdi23594/photo2.jpg", location: "USA", gender: "female" })


// Relations.
//TWITTER - URLS
CREATE (Tweet1)-[:PUBLISHED {time:'4/17/2014'}]->(URL1)
CREATE (Tweet2)-[:PUBLISHED {time:'5/15/2014'}]->(URL1)
CREATE (Tweet3)-[:PUBLISHED {time:'3/28/2014'}]->(URL1)
CREATE (Tweet4)-[:PUBLISHED {time:'3/20/2014'}]->(URL2)
CREATE (Tweet5)-[:PUBLISHED {time:'7/24/2014'}]->(URL3)
CREATE (Tweet6)-[:PUBLISHED {time:'7/24/2014'}]->(URL4)
// URL - SOCIAL NETWORK
CREATE (URL1)-[:RELATED]->(YouTubeVideo1)
CREATE (URL2)-[:RELATED]->(YouTubeVideo2)
CREATE (URL3)-[:RELATED]->(Track1)
// YOUTUBEVIDEO - YOUTUBECHANNEL
CREATE (YouTubeVideo1)-[:AUTHOR]->(Channel1)
CREATE (YouTubeVideo2)-[:AUTHOR]->(Channel2)
// YOUTUBECHANNEL - GOOGLE+
CREATE (Channel2)-[:LINK]->(Googleuser1)
CREATE (Channel1)-[:LINK]->(Googleuser2)

Graph

Queries

Identifying URLs in tweets

A tweet may or may not contain a URL. This query keeps only the tweets that published one and returns their text.

MATCH (n1:URL)<-[:PUBLISHED]-(n2:TWITTER)
RETURN n2.text
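To see how far a single link spreads, the same PUBLISHED pattern can be grouped by URL. This is a sketch over the sample dataset; the aliases `url`, `tweets`, and `users` are illustrative names, not part of the original gist.

```cypher
// Count the tweets that published each URL and list who published it
MATCH (u:URL)<-[:PUBLISHED]-(t:TWITTER)
RETURN u.url AS url, count(t) AS tweets, collect(t.screen_name) AS users
ORDER BY tweets DESC
```

With the sample data, https://youtu.be/ASZXbb3a24t comes first, because three tweets (Alberto, Pedro, and Eva) point to it.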

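These URLs point either to social networks or to other platforms. We are only interested in social networks, so we also require the SOCIALNETWORK label.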

MATCH (n1:URL:SOCIALNETWORK)<-[:PUBLISHED]-(n2:TWITTER)
RETURN n2.text
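The complementary filter can help check the labelling. The sketch below negates the SOCIALNETWORK label to list the tweets whose URL points somewhere else. In the sample dataset, that is only the BBC link.

```cypher
// Tweets whose URL does NOT carry the SOCIALNETWORK label
MATCH (u:URL)<-[:PUBLISHED]-(t:TWITTER)
WHERE NOT u:SOCIALNETWORK
RETURN t.text, u.url
```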

Identifying YouTube content

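YouTube is where we can extract the most information, because a video leads to the channel that uploaded it and, from there, to a linked Google+ account. So we keep only the social-network URLs that point to a YouTube video.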

MATCH (n1:URL:SOCIALNETWORK)<-[:PUBLISHED]-(n2:TWITTER)
WITH n1 AS URL
MATCH (n3:YOUTUBE)<-[:RELATED]-(URL)
RETURN distinct n3
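The `WITH` chain above can also be written as a single path pattern. The sketch below should return the same videos on the sample dataset. It projects a few properties instead of whole nodes and keeps the URL that led to each video; the aliases are illustrative.

```cypher
// One pattern: tweet -> social-network URL -> YouTube video
MATCH (t:TWITTER)-[:PUBLISHED]->(u:URL:SOCIALNETWORK)-[:RELATED]->(v:YOUTUBE)
RETURN DISTINCT v.title AS title, v.channelid AS channel, u.url AS url
```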

Identifying inappropriate videos on YouTube

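Next we filter on the age-restriction flag (ytAgeRestricted). Age-restricted videos are the ones that may contain inappropriate content, so they are the ones we care about here.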

MATCH (n1:URL:SOCIALNETWORK)<-[:PUBLISHED]-(n2:TWITTER)
WITH n1 AS URL
MATCH (n3:YOUTUBE)<-[:RELATED]-(URL)
WITH n3 AS VIDEO
WHERE VIDEO.ytAgeRestricted = "true"
RETURN DISTINCT VIDEO
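A compact variant, sketched under the same data model, puts the filter in a `WHERE` clause on a single pattern. It also counts how many tweets shared each restricted video. Note that ytAgeRestricted is stored as the string "true" in this dataset, not as a boolean, so the comparison must use a string.

```cypher
// ytAgeRestricted is a string in the sample data, so compare against "true"
MATCH (t:TWITTER)-[:PUBLISHED]->(:URL:SOCIALNETWORK)-[:RELATED]->(v:YOUTUBE)
WHERE v.ytAgeRestricted = "true"
RETURN v.title AS video, count(t) AS tweets, collect(DISTINCT t.screen_name) AS sharedBy
```

On the sample data, this returns a single row for "School fight", shared by BadGuy.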

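Returning account information

Finally, we follow the whole chain: from the tweet, through the URL and the age-restricted video, to the YouTube channel that uploaded the video and the Google+ account linked to that channel.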

MATCH (n5:GOOGLEPLUS)<-[:LINK]-(n4:CHANNELYOUTUBE)<-[:AUTHOR]-(n3:YOUTUBE {ytAgeRestricted: "true"})
      <-[:RELATED]-(n2:URL:SOCIALNETWORK)<-[:PUBLISHED]-(n1:TWITTER)
RETURN n1, n3, n4, n5
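If only the account details are needed rather than whole nodes, the same path can be projected into a small table. This is a sketch over the sample dataset, written in the forward direction of the relationships; the column aliases are illustrative.

```cypher
// Same chain as above, written forwards, returning only the account fields
MATCH (t:TWITTER)-[:PUBLISHED]->(:URL:SOCIALNETWORK)-[:RELATED]->(v:YOUTUBE)
      -[:AUTHOR]->(c:CHANNELYOUTUBE)-[:LINK]->(g:GOOGLEPLUS)
WHERE v.ytAgeRestricted = "true"
RETURN t.screen_name AS tweetedBy, v.title AS video,
       c.screen_name AS channel, g.displayName AS googleUser,
       g.location AS location
```

With the sample data, this points back to the BadGuy channel and the BadGuy Google+ account located in Spain.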

Conclusion

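This example shows the power of linking social networks. We can find the people who publish inappropriate content and notify the authorities by starting from the content itself, rather than by searching for the people.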
