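Detecting Inappropriate Content on Social Networks
How to extract information about the users who publish such content
This gist implements a blog post I wrote about the relationships between social networks.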

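In the blog example I extracted the information through the REST APIs of YouTube, Twitter, and SoundCloud. For this gist I prepared a sample dataset instead. By linking different social networks we can extract account information. The starting point is Twitter, a continuous stream of posts: from each tweet we can extract several elements (hashtags, mentions, URLs). The URL is the element that links one social network to another, and that is what gives this approach its potential.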
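Sample dataset
Entities
- Tweets with screen_name, text, timestamp, and country properties.
- URLs with a url property.
- YouTube videos with title, channelid, tags, ytAgeRestricted, and update properties.
- Tracks with title, description, and upload properties.
- Channels with screen_name, description, googleplusid, relatedvideos, and channelcreate properties.
- Google users with displayName, aboutMe, image, location, and gender properties.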
//Nodes.
CREATE (Tweet1:TWITTER {screen_name: "Alberto", text: "Metallica with Jason Newsted Creeping death LIVE San Francisco, USA 2011... https://#/RuI72pyx2v via @YouTube", timestamp: "1450687966", country: "ES"})
CREATE (Tweet2:TWITTER {screen_name: "Pedro", text: "RT Metallica with Jason Newsted Creeping death LIVE San Francisco, USA 2011... https://#/RuI72pyx2v via @YouTube", timestamp: "1450612800", country: "ES"})
CREATE (Tweet3:TWITTER {screen_name: "Eva", text: "RT Metallica with Jason Newsted Creeping death LIVE San Francisco, USA 2011... https://#/RuI72pyx2v via @YouTube", timestamp: "1450609980", country: "ES"})
CREATE (Tweet4:TWITTER {screen_name: "BadGuy", text: "LIVE Fight in school, https://#/ZcQ72pyx2v", timestamp: "1450687966", country: "ES"})
CREATE (Tweet5:TWITTER {screen_name: "Mariano", text: "Metallica https://#/RuI72pyx2v via @Soundcloud", timestamp: "1450612800", country: "ES"})
CREATE (Tweet6:TWITTER {screen_name: "Miguel", text: "BBC NEWS https://#/RuI72pyx2v", timestamp: "1450609980", country: "ES"})
CREATE (URL1:URL:SOCIALNETWORK {url: "https://youtu.be/ASZXbb3a24t" })
CREATE (URL2:URL:SOCIALNETWORK {url:"https://youtu.be/CURLzg0ia5w" })
CREATE (URL3:URL:SOCIALNETWORK {url:"https://soundcloud.com/hassan-awaly/8ik0r4axk78m" })
CREATE (URL4:URL {url:"https://bbc.in/1MZFNWu"})
CREATE (YouTubeVideo1:YOUTUBE {title: "Metallica",channelid: "456456456", tags: "music", ytAgeRestricted: "false", update :"1450685966" })
CREATE (YouTubeVideo2:YOUTUBE {title:"School fight", channelid: "123123123", tags: "violence", ytAgeRestricted: "true", update :"1450614800" })
CREATE (Track1:SOUNDCLOUD {title:"Queen - we are rock you", description: "The best song ever" , upload :"1450604980" })
CREATE (Channel1:CHANNELYOUTUBE {screen_name: "Alberto", description:"I'm the best", googleplusid:"123321", relatedvideos:"Trailers 2016" , channelcreate :"1450604123" })
CREATE (Channel2:CHANNELYOUTUBE {screen_name:"BadGuy", description:"I'm a bad guy", googleplusid:"678876", relatedvideos:"Why not?" , channelcreate :"1450234123" })
CREATE (Googleuser1:GOOGLEPLUS {displayName:"BadGuy", aboutMe:"I'm 24 years old, (personal information)", image: "https://lh3.googleusercontent.com/ry5g21lx8j8/photo.jpg", location: "Spain", gender: "male" })
CREATE (Googleuser2:GOOGLEPLUS {displayName:"Geroma", aboutMe:"Not all, (personal information)", image: "https://lh3.googleusercontent.com/asfdi23594/photo2.jpg", location: "USA", gender: "female" })
// Relations.
//TWITTER - URLS
CREATE (Tweet1)-[:PUBLISHED {time:'4/17/2014'}]->(URL1)
CREATE (Tweet2)-[:PUBLISHED {time:'5/15/2014'}]->(URL1)
CREATE (Tweet3)-[:PUBLISHED {time:'3/28/2014'}]->(URL1)
CREATE (Tweet4)-[:PUBLISHED {time:'3/20/2014'}]->(URL2)
CREATE (Tweet5)-[:PUBLISHED {time:'7/24/2014'}]->(URL3)
CREATE (Tweet6)-[:PUBLISHED {time:'7/24/2014'}]->(URL4)
// URL - SOCIAL NETWORK
CREATE (URL1)-[:RELATED]->(YouTubeVideo1)
CREATE (URL2)-[:RELATED]->(YouTubeVideo2)
CREATE (URL3)-[:RELATED]->(Track1)
// YOUTUBEVIDEO - YOUTUBECHANNEL
CREATE (YouTubeVideo1)-[:AUTHOR]->(Channel1)
CREATE (YouTubeVideo2)-[:AUTHOR]->(Channel2)
// YOUTUBECHANNEL - GOOGLE+
CREATE (Channel2)-[:LINK]->(Googleuser1)
CREATE (Channel1)-[:LINK]->(Googleuser2)
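Before running the queries below, it can help to confirm that the dataset loaded as expected. The following is a minimal sketch, assuming the CREATE statements above were executed as a single query against an otherwise empty database; the column aliases are illustrative names, not part of the original gist.

```cypher
// Count the nodes per label combination to check the load.
// URL nodes that also carry SOCIALNETWORK are grouped separately
// from plain URL nodes, because labels(n) returns every label.
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC
```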
Identifying URLs in tweets
A tweet may or may not contain a URL. This query returns the text of only those tweets that published one.
MATCH (n1:URL)<-[:PUBLISHED]-(n2:TWITTER)
RETURN n2.text
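To see which account shared which link, the same pattern can return the author and the URL instead of the tweet text. This is a sketch that assumes the sample dataset above is loaded; the aliases user and url are illustrative.

```cypher
// List each tweet's author next to the URL the tweet published.
MATCH (t:TWITTER)-[:PUBLISHED]->(u:URL)
RETURN t.screen_name AS user, u.url AS url
ORDER BY user
```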
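Identifying URLs that link to social networks
These URLs point either to social networks or to other sites. We are only interested in the social networks.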
MATCH (n1:URL:SOCIALNETWORK)<-[:PUBLISHED]-(n2:TWITTER)
RETURN n2.text
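The complementary check, listing the links that are left out by this filter, may be useful when deciding whether the SOCIALNETWORK label was applied correctly. A minimal sketch, assuming the sample dataset above; with that data it should match only the tweet that links to the BBC URL.

```cypher
// URLs published in tweets that do NOT carry the SOCIALNETWORK label.
MATCH (t:TWITTER)-[:PUBLISHED]->(u:URL)
WHERE NOT u:SOCIALNETWORK
RETURN t.screen_name AS user, u.url AS url
```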
Identifying the YouTube social network
To extract as much information as possible, we want the URLs that point to YouTube.
MATCH (n1:URL:SOCIALNETWORK)<-[:PUBLISHED]-(n2:TWITTER)
WITH n1 AS URL
MATCH (n3:YOUTUBE)<-[:RELATED]-(URL)
RETURN distinct n3
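The query above returns each video once. If the number of tweets sharing a video is also of interest, the DISTINCT can be swapped for an aggregation. This is a sketch under the assumption that the sample dataset is loaded; the aliases video and tweets are illustrative.

```cypher
// Count how many tweets link to each YouTube video.
MATCH (t:TWITTER)-[:PUBLISHED]->(:URL:SOCIALNETWORK)-[:RELATED]->(v:YOUTUBE)
RETURN v.title AS video, count(t) AS tweets
ORDER BY tweets DESC
```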
Identifying inappropriate videos on YouTube
Filter on the age restriction: the age-restricted videos are the ones that matter here.
MATCH (:TWITTER)-[:PUBLISHED]->(:URL:SOCIALNETWORK)-[:RELATED]->(video:YOUTUBE)
WHERE video.ytAgeRestricted = "true"
RETURN DISTINCT video
Returning the account information
MATCH (n5:GOOGLEPLUS)<--(n4:CHANNELYOUTUBE)<--(n3:YOUTUBE {ytAgeRestricted:"true"})<-[:RELATED]-(n2:URL:SOCIALNETWORK)<-[:PUBLISHED]-(n1:TWITTER)
RETURN n1,n3,n4,n5
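Returning whole nodes is convenient in the graph view. For a tabular report, the same path can project only the identifying properties. The sketch below assumes the sample dataset and the AUTHOR and LINK relationship types it defines; the column aliases are illustrative.

```cypher
// Follow the chain tweet -> URL -> restricted video -> channel -> Google user
// and return one row per tweet with the properties that identify the account.
MATCH (t:TWITTER)-[:PUBLISHED]->(:URL:SOCIALNETWORK)-[:RELATED]->
      (v:YOUTUBE {ytAgeRestricted: "true"})-[:AUTHOR]->(c:CHANNELYOUTUBE)-[:LINK]->(g:GOOGLEPLUS)
RETURN t.screen_name AS twitter_user,
       v.title       AS video,
       c.screen_name AS channel,
       g.displayName AS google_user,
       g.location    AS location
```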