GraphGists

A Multidimensional Approach to Recommender Systems

Introduction

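Whenever recommender systems come up, I think of the theme song of the 80s sitcom Cheers. It goes like this:

"You wanna go where people know, people are all the same" - content-based

"You wanna be where you can see, our troubles are all the same" - collaborative filtering

"Taking a break from all your worries sure would help a lot. Wouldn't you like to get away?" ... Where to? With whom? - context-aware.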

Here I present an approach that uses Neo4j to recommend restaurants based on contextual information.

What is context?

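Context is any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and the application themselves (AK Dey & GD Abowd - ACM Conference on Human Factors in Computer Systems (CHI 2000), Vol.5/Iss.1, pp.4-7, 2001).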

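The most common types of context are identity ("who"), activity ("what"), time ("when"), and location ("where"). Together they can be used to work out "why" a situation is occurring.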
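The four primary context types fit naturally into a small record. The sketch below is only an illustration. The class name, the field names, and the example values are hypothetical and do not come from the dataset used later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """One situation described by the four primary context types."""
    identity: str  # who
    activity: str  # what
    time: str      # when
    location: str  # where


# Hypothetical example values, not taken from the dataset.
evening_out = Context(identity="U1001", activity="dinner with friends",
                      time="Friday evening", location="downtown")
print(evening_out.activity)
```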

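Traditional recommender systems work with two entities, users and items, so they are two-dimensional. A multidimensional approach adds contextual information as further dimensions, which makes the recommendations more personalized.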
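A common way to formalize this difference is to compare the rating functions the two kinds of system estimate. The symbols below are a notational sketch rather than something defined in this article. Each $C_j$ stands for one context dimension, such as smoking, cuisine, or budget in the restaurant example.

```latex
\begin{aligned}
R_{\text{2D}} &\colon \text{User} \times \text{Item} \to \text{Rating} \\
R_{\text{MD}} &\colon \text{User} \times \text{Item} \times C_1 \times \dots \times C_k \to \text{Rating}
\end{aligned}
```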

Application to restaurant recommendation

Restaurant and consumer data from Mexico were imported from this dataset: https://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data

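The user data covers location, habits (such as smoking and drinking), marital status, cuisine, budget, and many other attributes. Each user context (for example smoking, cuisine, or budget) can be treated as one dimension.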


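The restaurant data covers location, cuisine, smoking, alcohol, price, and other attributes. Restaurants therefore also have multiple dimensions, which lets us offer restaurant choices that fit the user's preferences more closely.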

The three-dimensional approach is illustrated below.

[Figure: illustration of the three-dimensional approach]

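The user cube has three dimensions: user, smoking, and cuisine. The restaurant cube also has three: restaurant, smoking, and cuisine. The colored slices of the user cube are users who like the same cuisine but differ in their smoking preference. These selections are then used to pick restaurants that match the users' preferences. The colored slices of the restaurant cube are those matching restaurants.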
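The slicing described above can be mimicked with sets of tuples, one tuple per filled cell of a cube. This is a minimal sketch under simplifying assumptions. The IDs are made up, and both cubes share the hypothetical labels "smoking" and "non-smoking". In the real data the two sides encode smoking differently: users have "false", restaurants have "none".

```python
from collections import defaultdict

# Toy cubes: each cell is one (entity, smoking, cuisine) combination.
# Entity IDs and the shared smoking labels are hypothetical.
user_cube = {
    ("U1", "non-smoking", "Mexican"),
    ("U2", "smoking", "Mexican"),
    ("U3", "non-smoking", "Japanese"),
}
restaurant_cube = {
    ("R1", "non-smoking", "Mexican"),
    ("R2", "smoking", "Mexican"),
    ("R3", "non-smoking", "Japanese"),
}


def slice_by_cuisine(cube, cuisine):
    """Take the slice for one cuisine and group its entities by smoking preference."""
    groups = defaultdict(set)
    for entity, smoking, c in cube:
        if c == cuisine:
            groups[smoking].add(entity)
    return dict(groups)


def recommend_by_slice(user_cube, restaurant_cube, cuisine):
    """For each smoking preference among users who like `cuisine`,
    return (those users, restaurants in the matching restaurant slice)."""
    users = slice_by_cuisine(user_cube, cuisine)
    restaurants = slice_by_cuisine(restaurant_cube, cuisine)
    return {pref: (uids, restaurants.get(pref, set()))
            for pref, uids in users.items()}


result = recommend_by_slice(user_cube, restaurant_cube, "Mexican")
for pref, (uids, rids) in sorted(result.items()):
    print(pref, sorted(uids), sorted(rids))
# non-smoking ['U1'] ['R1']
# smoking ['U2'] ['R2']
```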

The same approach applies to higher dimensions.
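Each additional context is just one more dimension to match, so the filter can be written without fixing the number of dimensions. The function and records below are hypothetical. They show that the same code handles a three-dimensional and a four-dimensional preference set, and that adding a dimension can only narrow the result.

```python
from typing import List, Mapping


def filter_by_dimensions(items: List[Mapping[str, str]],
                         prefs: Mapping[str, str]) -> List[Mapping[str, str]]:
    """Keep the items whose value matches the preference on every given dimension."""
    return [item for item in items
            if all(item.get(dim) == value for dim, value in prefs.items())]


# Hypothetical restaurant records.
restaurants = [
    {"name": "R1", "cuisine": "Japanese", "smoking": "none", "price": "medium", "ambience": "familiar"},
    {"name": "R2", "cuisine": "Japanese", "smoking": "none", "price": "medium", "ambience": "quiet"},
    {"name": "R3", "cuisine": "Mexican", "smoking": "none", "price": "medium", "ambience": "familiar"},
]
three_d = {"cuisine": "Japanese", "smoking": "none", "price": "medium"}
four_d = {**three_d, "ambience": "familiar"}  # one more dimension

print([r["name"] for r in filter_by_dimensions(restaurants, three_d)])  # ['R1', 'R2']
print([r["name"] for r in filter_by_dimensions(restaurants, four_d)])   # ['R1']
```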

Data model

[Figure: data model]

Metadata

[Figure: metadata]

Setup

Four data files are used: Users_50.csv (50 selected users), UCuisine.csv (user/cuisine), Restaurants.csv, and RestCuisine.csv (restaurant/cuisine).
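Before loading into Neo4j, a file such as UCuisine.csv can be read into a user-to-cuisines mapping, which is what the CUISINE relationships later encode. The sketch assumes hypothetical column names `uid` and `cuisine`, since the real header may differ. It reads an in-memory sample instead of the actual file.

```python
import csv
import io
from collections import defaultdict


def load_user_cuisines(f):
    """Map each user ID to the set of cuisines that user likes.

    Assumes hypothetical columns `uid` and `cuisine`; adjust to the real header.
    """
    likes = defaultdict(set)
    for row in csv.DictReader(f):
        likes[row["uid"]].add(row["cuisine"])
    return dict(likes)


# In-memory stand-in for UCuisine.csv (made-up rows).
sample = io.StringIO("uid,cuisine\nU1001,Mexican\nU1001,American\nU1002,Japanese\n")
likes = load_user_cuisines(sample)
print(sorted(likes["U1001"]))  # ['American', 'Mexican']
```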

Cypher queries used to set up the database

Use this query to get a clearer view of the USER_PROFILE and RESTAURANT paths:

MATCH (c)-[r:USER_PROFILE|RESTAURANT]->(n)-[]->(p)
WHERE n.uid IN ['U1001', 'U1002', 'U1003'] OR n.pid IN [132609, 132613, 132630]
RETURN c, n, p LIMIT 20;
[Figure: result of the visualization query]

The left half shows the restaurants and the right half the user profiles.

Recommending restaurants based on user preferences

Three preferences: Mexican food, non-smoking, and medium price.

// Users with the selected preferences
MATCH (c)-[]->(n)-[:CUISINE]->(r)-[:LIKES]->(t:Food {name: "Mexican"})
WITH COLLECT(n) AS nodes, t
UNWIND nodes AS n1

MATCH (c)-[]->(n1)-[:HABITS]->(q)-[:SMOKER]->(v:Smoker {attr1: "false"})
WITH COLLECT(n1) AS nodes, t, v
UNWIND nodes AS n2

MATCH (c)-[]->(n2)-[:HABITS]->(q)-[:BUDGET]->(v1:Budget {attr13: "medium"})
// Keep one row per (cuisine, budget) pair instead of one row per matching user
WITH DISTINCT t, v1

// To list only the matching users, stop here and use this instead of the WITH above:
//RETURN n2.uid AS User, t.name AS Cuisine, v.attr1 AS Smoker, v1.attr13 AS Budget;


// Find the restaurants that match the user preferences
MATCH (c)-[]->(n2)-[:REST_CUISINE]->(p:Cusine {name: t.name})
WITH COLLECT(n2) AS pn, v1
UNWIND pn AS n3
MATCH (c)-[]->(n3)-[:FEATURES]->(q1:Features {price: v1.attr13, smoking: "none"})
WITH COLLECT(n3) AS pn

UNWIND pn AS n4
WITH DISTINCT n4
MATCH (c)-[]->(n4)-[:ADDRESS]-(k)
RETURN n4.name AS Restaurant, k.city AS City;

[Figures: results of the three-preference query]
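The logic of the three-preference query can also be traced in plain Python, without a Neo4j instance. This is a minimal sketch on hypothetical toy records, not the author's implementation. The keys `smoker` and `budget` stand in for the graph properties `attr1` and `attr13`, and the restaurant names and cities are made up. As in the query, the users only decide whether any restaurants are returned. Restaurants are filtered on the cuisine, on a price equal to the budget, and on the fixed smoking value "none".

```python
# Hypothetical toy records (smoker ~ attr1, budget ~ attr13 in the graph).
users = [
    {"uid": "U1", "cuisines": {"Mexican"}, "smoker": "false", "budget": "medium"},
    {"uid": "U2", "cuisines": {"Mexican"}, "smoker": "true", "budget": "medium"},
    {"uid": "U3", "cuisines": {"Japanese"}, "smoker": "false", "budget": "low"},
]
restaurants = [
    {"name": "R1", "city": "A", "cuisines": {"Mexican"}, "price": "medium", "smoking": "none"},
    {"name": "R2", "city": "B", "cuisines": {"Mexican"}, "price": "high", "smoking": "none"},
    {"name": "R3", "city": "A", "cuisines": {"Mexican", "Bar"}, "price": "medium", "smoking": "section"},
]


def recommend_for_preferences(users, restaurants, cuisine, smoker, budget):
    # Step 1: users whose profile matches all three selected preferences.
    selected = [u for u in users
                if cuisine in u["cuisines"] and u["smoker"] == smoker and u["budget"] == budget]
    if not selected:
        return []
    # Step 2: restaurants with the cuisine, price == budget, and no smoking area.
    # The set plays the role of WITH DISTINCT n4 in the query.
    return sorted({(r["name"], r["city"]) for r in restaurants
                   if cuisine in r["cuisines"] and r["price"] == budget and r["smoking"] == "none"})


print(recommend_for_preferences(users, restaurants, "Mexican", "false", "medium"))
# [('R1', 'A')]
```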

Four preferences: Japanese food, non-smoking, medium price, and ambience (going out with friends).

MATCH (c)-[]->(n)-[:CUISINE]->(r)-[:LIKES]->(t:Food {name: "Japanese"})
WITH COLLECT(n) AS nodes, t
UNWIND nodes AS n1

MATCH (c)-[]->(n1)-[:HABITS]->(q)-[:SMOKER]->(v:Smoker {attr1: "false"})
WITH COLLECT(n1) AS nodes, t, v
UNWIND nodes AS n2

MATCH (c)-[]->(n2)-[:HABITS]->(q)-[:BUDGET]->(v1:Budget {attr13: "medium"})
WITH COLLECT(n2) AS nodes, t, v, v1
UNWIND nodes AS n3

MATCH (c)-[]->(n3)-[:HABITS]->(q)-[:AMBIENCE]->(v2:Ambnce {attr4: "friends"})
WITH COLLECT(n3) AS nodes, t, v, v1, v2
UNWIND nodes AS n4

// Keep one row per (cuisine, budget) pair instead of one row per matching user
WITH DISTINCT t, v1

// To list only the matching users, stop here and use this instead of the WITH above:
//RETURN n4.uid AS User, t.name AS Cuisine, v.attr1 AS Smoker, v1.attr13 AS Budget, v2.attr4 AS Ambience;


// The query matches the user ambience "friends" against the restaurant ambience "familiar"
MATCH (c)-[]->(n2)-[:REST_CUISINE]->(p:Cusine {name: t.name})
WITH COLLECT(n2) AS pn, v1
UNWIND pn AS n3
MATCH (c)-[]->(n3)-[:FEATURES]->(q1:Features {price: v1.attr13, smoking: "none", ambience: "familiar"})
WITH COLLECT(n3) AS pn2

UNWIND pn2 AS n4
WITH DISTINCT n4
MATCH (c)-[]->(n4)-[:ADDRESS]-(k)

RETURN n4.name AS Restaurant, k.city AS City;

[Figures: results of the four-preference query]
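The four-preference query shows that user and restaurant data do not always share a vocabulary. The user smoking value "false" becomes "none" on the restaurant side, the budget is compared with the restaurant price, and the user ambience "friends" is matched against "familiar". A small translation table makes these mappings explicit. The sketch below is hypothetical: the dictionaries and the function name are not part of the GraphGist. It encodes only the pairs that appear in the queries and fails loudly on anything else.

```python
# Hypothetical translation table from user-side preferences to restaurant-side
# constraints. The two value pairs are the ones hard-coded in the queries;
# further entries would have to be added by hand.
VALUE_MAP = {
    ("smoker", "false"): ("smoking", "none"),
    ("ambience", "friends"): ("ambience", "familiar"),
}
# Dimensions whose value is copied unchanged: cuisine name, and budget -> price.
SAME_VALUE = {"cuisine": "cuisine", "budget": "price"}


def to_restaurant_constraints(user_prefs):
    """Translate user preferences into the vocabulary of the restaurant data."""
    constraints = {}
    for dim, value in user_prefs.items():
        if (dim, value) in VALUE_MAP:
            rest_dim, rest_value = VALUE_MAP[(dim, value)]
            constraints[rest_dim] = rest_value
        elif dim in SAME_VALUE:
            constraints[SAME_VALUE[dim]] = value
        else:
            raise KeyError(f"no restaurant-side mapping for {dim}={value!r}")
    return constraints


prefs = {"cuisine": "Japanese", "smoker": "false", "budget": "medium", "ambience": "friends"}
print(to_restaurant_constraints(prefs))
# {'cuisine': 'Japanese', 'smoking': 'none', 'price': 'medium', 'ambience': 'familiar'}
```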

Conclusion

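This approach gives more personalized restaurant recommendations based on user preferences. One limitation is the availability of suitable datasets: the more information a dataset provides, the more it helps the analysis and the results.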
