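Quote Fraud
1. Introduction
Insurance quote fraud is the deceptive practice of providing false or misleading information while obtaining an insurance quote. Individuals or organisations engaged in this kind of fraud deliberately manipulate data such as personal details, assets, or claims history in order to obtain a lower premium.
"Research shows that half of UK consumers think lying is harmless"
By misrepresenting their circumstances, they aim to trick the insurer into offering a better rate or better cover than they would otherwise receive. Insurance quote fraud not only deceives the insurer, it can also affect other policyholders by pushing premiums up. Insurers use measures such as data validation and cross-checking to detect and prevent it.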
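2. Scenario
Insurance quote fraud is a significant business problem for insurers worldwide. According to industry reports, fraudulent activity costs the insurance industry billions of dollars every year. A recent study by the Insurance Information Institute suggests that roughly 10-20% of insurance claims are fraudulent, and the quote is the first stage at which fraud can occur. The impact is far-reaching: it damages insurers' profitability, raises premiums for honest policyholders, and erodes trust in the industry. Detecting and preventing quote fraud has become a top priority for insurers, prompting them to adopt advanced technology and data analytics to mitigate this pervasive problem.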
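3. Solution
To combat insurance quote fraud, businesses are turning to advanced technology for effective solutions. One such technology is Neo4j, a graph database that offers powerful data modelling and analysis capabilities. With Neo4j, insurers can connect and analyse the complex relationships in their data, uncover patterns, detect fraud networks, and strengthen their fraud detection algorithms. Neo4j's graph-based approach lets insurers identify fraudulent activity efficiently, reduce risk, and improve overall operational efficiency in tackling quote fraud.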
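4. Modelling
This section shows examples of Cypher queries run on an example graph. The aim is to illustrate what the queries look like and to provide a guide on how to structure your data in a real setting. We do this on a small graph containing only a few nodes. The example graph is based on the data model below.
4.1. Data Model
4.1.1 Required Fields
The following fields are required to get started: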
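Quote node
- firstname: the applicant's first name
- surname: the applicant's surname
- dob: the applicant's date of birth
- postcode: the applicant's postcode
- passport: the applicant's passport number
- created_date: the datetime at which the quote or application was submitted
You can add properties to this node to monitor anything you wish during the quote/application process. In my data model and test data you may notice a change_info property. It is there for demonstration purposes only, to make it easier to see what has changed since the previous quote.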
NEXT_QUOTE relationship
- diff_seconds: the time difference, in seconds, between the previous quote and the current quote
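As a rough illustration of the model above, the following plain-Python sketch shows how the diff_seconds value on a NEXT_QUOTE relationship relates to the timestamps of two consecutive quotes. It is not part of the Neo4j setup: the `Quote` dataclass and the `diff_seconds` helper are hypothetical names, and the timestamp property is assumed to be called created_date, as in the demo data.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass
class Quote:
    """Plain-Python stand-in for a Quote node (illustrative only)."""
    firstname: str
    surname: str
    dob: date
    postcode: str
    passport: int
    created_date: datetime


def diff_seconds(previous: Quote, current: Quote) -> int:
    """Whole seconds between two quotes, as stored on NEXT_QUOTE."""
    return int((current.created_date - previous.created_date).total_seconds())


# A fixed reference time keeps the example repeatable.
now = datetime(2024, 1, 1, 12, 0, 0)
q1 = Quote("Micheal", "Down", date(1988, 2, 2), "YO30 7DW", 584699531,
           now - timedelta(minutes=9))
q2 = Quote("Michael", "Down", date(1988, 2, 2), "YO30 7DW", 584699531,
           now - timedelta(minutes=4))

print(diff_seconds(q1, q2))  # 300
```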
4.2. Demo Data
The following Cypher statement creates the example graph in the Neo4j database:
// Create quote nodes
CREATE (q1:Quote {firstname: "Micheal", surname: "Down", dob: date("1988-02-02"), postcode: "YO30 7DW", longitude: -1.0927426, latitude: 53.96372145, passport: 584699531, created_date: datetime()-duration({years: 1, months: 1, minutes: 9}), change_info: "first quote"})
CREATE (q2:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "YO30 7DW", longitude: -1.0927426, latitude: 53.96372145, passport: 584699531, created_date: datetime()-duration({years: 1, months: 1, minutes: 4}), change_info: "name change ea to ae"})
CREATE (q3:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "YO30 7DW", longitude: -1.0927426, latitude: 53.96372145, passport: 584699531, created_date: datetime()-duration({years: 1, months: 1, minutes: 3}), change_info: "postcode_change"})
CREATE (q4:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "PA62 6AA", longitude: -5.851487, latitude: 56.359258, passport: 584699530, created_date: datetime()-duration({years: 1, months: 1}), change_info: "passport number"})
CREATE (q5:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "PA62 6AA", longitude: -5.851487, latitude: 56.359258, passport: 584699530, created_date: datetime()-duration({months: 1}), change_info: "quote 1yr later"})
CREATE (q6:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "PA62 6AA", longitude: -5.851487, latitude: 56.359258, passport: 584699530, created_date: datetime(), change_info: "quote 1m later"})
// Create all relationships
CREATE (q1)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q1.created_date, q2.created_date).seconds}]->(q2)
CREATE (q2)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q2.created_date, q3.created_date).seconds}]->(q3)
CREATE (q3)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q3.created_date, q4.created_date).seconds}]->(q4)
CREATE (q4)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q4.created_date, q5.created_date).seconds}]->(q5)
CREATE (q5)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q5.created_date, q6.created_date).seconds}]->(q6)
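5. Cypher Queries
5.1. View all quotes in the chain
In this query, we will identify quote chains based on the following requirement:
- A quote is connected to another quote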
// View all quotes
MATCH path=()-[r:NEXT_QUOTE]->()
RETURN path;
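5.2. Split the Quote chain by time difference
In the world of quotes, the time between quotes is a very important factor. Consider the following scenario.
Buying car insurance: car insurance is normally bought annually, with cover lasting 12 months. When last year's quote is compared with a new one, significant differences are therefore to be expected.
- No-claims bonus: (hopefully) one year more than the previous year.
- Vehicle age: one year older.
- Mileage: we would expect this to be higher. It also depends on factors such as the person's age, job, and address.
To identify differences between quotes, we should divide them into short time intervals, much like a web session.
In this query, we will identify quote chains based on the following requirement:
- Each quote occurs within 3600 seconds (1 hour) of the previous one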
// Split Quote Chain
MATCH path=()-[rel:NEXT_QUOTE]->()
WHERE rel.diff_seconds < 3600
RETURN path;
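The problem with this query is that, when viewed as a table, it shows the message Started streaming 3 records. In effect, Neo4j has returned 3 separate records, one for each relationship that satisfies the condition, and sent them to the browser for display. That may look fine visually, but it is a problem when you want to analyse the path as a whole. The next query addresses this.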
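The session-style split described in this section can be sketched outside the database. The snippet below is a minimal plain-Python illustration, not the Cypher implementation: `split_into_sessions` is a hypothetical helper, and the gap values are approximations of the demo data (the year and month are rounded to 365 and 30 days).

```python
def split_into_sessions(diffs, max_gap=3600):
    """Group consecutive quotes into sessions.

    diffs[i] is the gap in seconds between quote i and quote i + 1,
    so n gaps describe n + 1 quotes. Returns lists of quote indexes.
    """
    sessions = [[0]]
    for index, gap in enumerate(diffs, start=1):
        if gap < max_gap:
            sessions[-1].append(index)  # close enough: same session
        else:
            sessions.append([index])    # gap too long: start a new session
    return sessions


# Gaps between the six demo quotes: 5 min, 1 min, 3 min, ~1 year, ~1 month.
demo_gaps = [300, 60, 180, 365 * 86400, 30 * 86400]

print(split_into_sessions(demo_gaps))  # [[0, 1, 2, 3], [4], [5]]
```

The first session holds four quotes joined by three short gaps, which corresponds to the 3 records the query above streams back.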

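5.3. Single Quote path record
This is an upgraded version of the previous Cypher query. It uses more advanced pattern matching and returns a single record per chain. It retains the characteristics of the previous version:
- A single chain
- Each quote is within 1 hour of the previous one
- All quotes occurred within the last 1000 days
- Returns 1 record for further analysis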
MATCH path=(firstQ)-[r:NEXT_QUOTE*..1000]->(lastQ)
WHERE
// Path termination condition (first)
(not exists{ (firstQ)<-[:NEXT_QUOTE]-() } or exists{ (firstQ)<-[x:NEXT_QUOTE]-() where x.diff_seconds >= 3600 } )
AND
// Path termination condition (last)
(not exists{ (lastQ)-[:NEXT_QUOTE]->() } or exists{ (lastQ)-[x:NEXT_QUOTE]->() where x.diff_seconds >= 3600 } )
AND
// No gaps condition (if you remove this condition then gaps are allowed and you get spurious longer chains that verify the end of path but not the max diff condition)
all(x in relationships(path) where x.diff_seconds < 3600 )
AND
// Filter based on quote in the last N days
firstQ.created_date > datetime() - duration({days: 1000})
AND
// Where there are more than one quote in the chain otherwise there is nothing to compare against
length(path)> 1
RETURN path
Looking at the table view again, you can now see that only one record is returned.

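5.4. Create a SIMILARITY relationship with scores
To score the quotes, we must create a connection that brings together all of the quote properties so that they can be assessed both individually and as a whole.
In this query, we will identify quote chains based on the following requirements:
- Get the full Quote chain for all Quote nodes within the last 1000 days
- Get the full Quote chain for all Quote nodes where the time difference between each individual quote is less than 1 hour
- Calculate the property scores
- Write a new SIMILARITY relationship for the Quote chain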
// Create Similarity Relationship
MATCH path=(firstQ)-[r:NEXT_QUOTE*..1000]->(lastQ)
WHERE
// Path termination condition (first)
(NOT EXISTS{ (firstQ)<-[:NEXT_QUOTE]-() } OR EXISTS{ (firstQ)<-[x:NEXT_QUOTE]-() WHERE x.diff_seconds >= 3600 } )
AND
// Path termination condition (last)
(NOT EXISTS{ (lastQ)-[:NEXT_QUOTE]->() } OR EXISTS{ (lastQ)-[x:NEXT_QUOTE]->() WHERE x.diff_seconds >= 3600 } )
AND
// No gaps condition (if you remove this condition then gaps are allowed and you get spurious longer chains that verify the end of path but not the max diff condition)
ALL(x IN relationships(path) WHERE x.diff_seconds < 3600 )
AND
// Filter based on quote in the last N days
firstQ.created_date > datetime() - duration({days: 1000})
AND
// Where there are more than one quote in the chain otherwise there is nothing to compare against
length(path)> 1
WITH nodes(path) as nodes
// Iterate over the list in chain order we create an array [0,1,2,3... length - 2]
UNWIND range(0,size(nodes)-2) as index
// For each position (index) in the list take the node at that position (current) and the rest
WITH nodes[index] as current, nodes[index+1..size(nodes)] as rest
// Iterate over the rest keeping current to get all pairs of nodes without repetitions
UNWIND rest as subsequent
WITH current, subsequent,
// Build up similarity scores for all properties
// Strings
apoc.text.levenshteinSimilarity(current.firstname, subsequent.firstname) AS firstname,
apoc.text.levenshteinSimilarity(current.surname, subsequent.surname) AS surname,
apoc.text.levenshteinSimilarity(current.postcode, subsequent.postcode) AS postcode,
// Numbers
(current.passport - subsequent.passport) AS passport_number,
apoc.text.levenshteinSimilarity(toString(current.passport), toString(subsequent.passport)) AS passport_similarity,
// Dates
duration.inDays(current.dob, subsequent.dob).days AS dob,
// Location
toInteger(point.distance(point({longitude: current.longitude, latitude: current.latitude}), point({longitude: subsequent.longitude, latitude: subsequent.latitude}))) AS location
// Create :SIMILARITY Relationship
CREATE (current)-[:SIMILARITY {
// Add change string for simplicity
change: subsequent.change_info,
// Strings
firstname: firstname,
surname: surname,
postcode: postcode,
// Numbers
passport_number: passport_number,
passport_similarity: passport_similarity,
// Dates
dob: dob,
// Location
location: location,
// Calculated Similarity Score
similarity_score: (firstname + surname + postcode + passport_similarity ) / 4
}]->(subsequent)
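To make the scoring concrete, here is a small plain-Python sketch of how the four string similarities combine into similarity_score. It assumes that apoc.text.levenshteinSimilarity normalises the edit distance by the length of the longer string, which is an assumption about APOC's behaviour rather than something this guide states. The `levenshtein`, `similarity` and `similarity_score` helpers are hypothetical names.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def similarity_score(current: dict, subsequent: dict) -> float:
    """Average of the four string similarities, as in the CREATE clause."""
    parts = [
        similarity(current["firstname"], subsequent["firstname"]),
        similarity(current["surname"], subsequent["surname"]),
        similarity(current["postcode"], subsequent["postcode"]),
        similarity(str(current["passport"]), str(subsequent["passport"])),
    ]
    return sum(parts) / len(parts)


q1 = {"firstname": "Micheal", "surname": "Down",
      "postcode": "YO30 7DW", "passport": 584699531}
q2 = {"firstname": "Michael", "surname": "Down",
      "postcode": "YO30 7DW", "passport": 584699531}

print(round(similarity_score(q1, q2), 3))  # 0.929
```

Only the first name differs between these two quotes (two substitutions in seven characters), so the averaged score stays high. Note that dob and location are stored on the relationship but do not contribute to similarity_score.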
View the newly created relationships:
// View all SIMILARITY relationships
MATCH path=()-[r:SIMILARITY]->()
RETURN path;
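The two UNWIND steps in the scoring query pair each quote with every later quote in the chain, without repetition. A minimal Python sketch of the same pairing is shown below. The node names are placeholders, and the four-quote chain is an assumption based on the demo data, where the first four quotes fall within one hour of each other.

```python
from itertools import combinations

# Placeholder names for the quotes in one chain, in chain order.
chain = ["q1", "q2", "q3", "q4"]

# Index-based pairing, mirroring UNWIND range(0, size(nodes) - 2)
# followed by UNWIND over nodes[index + 1..].
manual_pairs = []
for index in range(len(chain) - 1):
    current = chain[index]
    for subsequent in chain[index + 1:]:
        manual_pairs.append((current, subsequent))

# The same result using the standard library.
pairs = list(combinations(chain, 2))

print(len(pairs))           # 6
print(pairs == manual_pairs)  # True
```

Under these assumptions, six SIMILARITY relationships should appear for the demo chain.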
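5.5. Static scoring
In this query, we will score the quote chain based on the following requirement:
- Calculate a score from the SIMILARITY relationships created by the query in 5.4, then return it to the user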
// Calculate static Fraud Score
MATCH (a)-[r:SIMILARITY]->(b)
WHERE a.created_date > datetime() - duration({days: 1000})
// Aggregate once; avg() is null and count() is 0 when nothing matches
WITH avg(r.similarity_score) AS Similarity, count(r) AS comparisons
WITH Similarity, comparisons, toInteger(Similarity * 100) AS pct
RETURN Similarity,
CASE
WHEN comparisons = 0 THEN 'Additional Quote Needs Adding'
WHEN pct > 70 THEN 'LOW'
// Boundary values 70 and 50 are classed as MEDIUM (an assumption; adjust to your risk policy)
WHEN pct >= 50 THEN 'MEDIUM'
ELSE 'HIGH'
END AS Fraud_Level;
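The banding logic can also be expressed as a small function, which may be convenient for unit-testing thresholds before changing the query. This is a hedged sketch: `fraud_level` is a hypothetical helper, and treating a percentage of exactly 50 or 70 as MEDIUM is an assumption rather than a rule stated by this guide.

```python
from typing import List


def fraud_level(scores: List[float]) -> str:
    """Map a list of similarity scores (0..1) to a fraud level.

    High similarity between quotes means few changes, hence LOW risk.
    """
    if not scores:
        return "Additional Quote Needs Adding"
    # Truncate to a whole percentage, as toInteger() does in the query.
    pct = int(sum(scores) / len(scores) * 100)
    if pct > 70:
        return "LOW"
    if pct >= 50:  # assumption: the boundary values count as MEDIUM
        return "MEDIUM"
    return "HIGH"


print(fraud_level([0.95, 0.9]))  # LOW
print(fraud_level([0.6]))        # MEDIUM
print(fraud_level([0.3]))        # HIGH
print(fraud_level([]))           # Additional Quote Needs Adding
```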
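5.6. Real-time fraud scoring
For our final Cypher query, we will add a new quote to Neo4j and run the fraud score calculation to get a real-time response showing the similarity score. This code can be used behind an API or directly in Cypher, which gives an immediate indication of fraud.
In this query, we will work on the quote chain based on the following requirements:
- Get the previous quote
- Create a new Quote at the end of the chain
- Get the full Quote chain for all Quote nodes within the last 1000 days
- Get the full Quote chain for all Quote nodes where the time difference between each individual quote is less than 1 hour
- Calculate the property scores
- Write new SIMILARITY relationships for the Quote chain
- Calculate the score, then return it to the user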
// // // Realtime Quote Score // // //
// Get last `Quote` node in quote chain
MATCH (last:Quote)
WITH last
ORDER BY last.created_date DESC
LIMIT 1
WITH last
// Create new quote node
MERGE (current:Quote {
change_info: "changed dob",
created_date: datetime(),
dob: Date("1978-11-30"),
firstname: "Michael",
surname: "Down",
latitude: 56.359258,
longitude: -5.851487,
passport: 584699530,
postcode: "PA62 6AA"
})
WITH last, current, duration.inSeconds(last.created_date, current.created_date) AS time
// Create relationship
CREATE (last)-[:NEXT_QUOTE {diff_seconds: time.seconds}]->(current)
WITH current
// Minimum comparison
MATCH path=(firstQ)-[r:NEXT_QUOTE*0..100]->(current)
WHERE
// Path termination condition (first)
(NOT EXISTS{ (firstQ)<-[:NEXT_QUOTE]-() } OR EXISTS{ (firstQ)<-[x:NEXT_QUOTE]-() WHERE x.diff_seconds >= 3600 } )
AND
// Path termination condition (last): the path ends at the newly created quote
(NOT EXISTS{ (current)-[:NEXT_QUOTE]->() } OR EXISTS{ (current)-[x:NEXT_QUOTE]->() WHERE x.diff_seconds >= 3600 } )
AND
// No gaps condition (if you remove this condition then gaps are allowed and you get spurious longer chains that verify the end of path but not the max diff condition)
ALL(x IN relationships(path) WHERE x.diff_seconds < 3600 )
AND
// Filter based on quote in the last N days
firstQ.created_date > datetime() - duration({days: 1000})
AND
// Where there are more than one quote in the chain otherwise there is nothing to compare against
length(path)> 1
//let's keep just the nodes in the chain
UNWIND nodes(path)[0..-1] as subsequent
WITH current, subsequent,
// Build up similarity scores for all properties
// Strings
apoc.text.levenshteinSimilarity(current.firstname, subsequent.firstname) AS firstname,
apoc.text.levenshteinSimilarity(current.surname, subsequent.surname) AS surname,
apoc.text.levenshteinSimilarity(current.postcode, subsequent.postcode) AS postcode,
// Numbers
(current.passport - subsequent.passport) AS passport_number,
apoc.text.levenshteinSimilarity(toString(current.passport), toString(subsequent.passport)) AS passport_similarity,
// Dates
duration.inDays(current.dob, subsequent.dob).days AS dob,
// Location
toInteger(point.distance(point({longitude: current.longitude, latitude: current.latitude}), point({longitude: subsequent.longitude, latitude: subsequent.latitude}))) AS location
// Create :SIMILARITY Relationship
CREATE (current)-[:SIMILARITY {
// Add change string for simplicity
change: subsequent.change_info,
// Strings
firstname: firstname,
surname: surname,
postcode: postcode,
// Numbers
passport_number: passport_number,
passport_similarity: passport_similarity,
// Dates
dob: dob,
// Location
location: location,
// Calculated Similarity Score
similarity_score: (firstname + surname + postcode + passport_similarity ) / 4
}]->(subsequent)
WITH *
// Quote - 3 - Calculate Fraud Score
MATCH (a)-[r:SIMILARITY]->(b)
WHERE a.created_date > datetime() - duration({days: 1000})
// Aggregate once; avg() is null and count() is 0 when nothing matches
WITH avg(r.similarity_score) AS Similarity, count(r) AS comparisons
WITH Similarity, comparisons, toInteger(Similarity * 100) AS pct
RETURN Similarity,
CASE
WHEN comparisons = 0 THEN 'Run Again'
WHEN pct > 70 THEN 'LOW'
// Boundary values 70 and 50 are classed as MEDIUM (an assumption; adjust to your risk policy)
WHEN pct >= 50 THEN 'MEDIUM'
ELSE 'HIGH'
END AS Fraud_Level;