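Quote Fraud

1. Introduction

Insurance quote fraud is the deceptive practice of providing false or misleading information while obtaining an insurance quote. Individuals or organisations engaged in this type of fraud deliberately manipulate data such as personal details, assets, or claims history in order to obtain lower premiums.

“Research shows that half of UK consumers believe lying does no harm”
- LexisNexis

By misrepresenting their circumstances, they aim to deceive insurers into offering more favourable rates or coverage than they would otherwise receive. Insurance quote fraud not only defrauds insurers but can also affect other policyholders by driving up premiums. Insurers use measures such as data validation and cross-checking to detect and prevent this kind of fraud.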

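2. Scenario

Insurance quote fraud is a significant business problem for insurers worldwide. According to industry reports, fraudulent activity costs the insurance industry billions of dollars every year. A recent study by the Insurance Information Institute suggests that roughly 10-20% of insurance claims are fraudulent, and the quote is the earliest stage at which fraud can occur. The impact is far-reaching: it damages insurers' profitability, raises premiums for honest policyholders, and erodes trust in the industry. Detecting and preventing insurance quote fraud has therefore become a top priority for insurers, prompting them to adopt advanced technology and data analytics to mitigate this pervasive problem.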

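3. Solution

To combat insurance quote fraud, businesses are turning to advanced technology for effective solutions. One such technology is Neo4j, a graph database that offers powerful data modelling and analysis capabilities. With Neo4j, insurers can connect and analyse the complex relationships in their data, uncover patterns, detect fraud networks, and strengthen their fraud detection algorithms. Neo4j's graph-based approach lets insurers identify fraudulent activity efficiently, reduce risk, and improve the overall operational efficiency of their fight against insurance quote fraud.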

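3.1. How Can Graph Databases Help?

  1. Real-time fraud detection: Neo4j's real-time data processing helps insurers detect and prevent fraud by quickly identifying anomalies and suspicious patterns across quotes, policies, and claims.

  2. Graph data modelling: By modelling data as a graph, Neo4j helps insurers detect and prevent fraud more accurately, making it possible to identify hidden relationships and patterns between entities such as policyholders, claims, agents, and fraud indicators.

  3. Network analysis: Neo4j's graph algorithms and traversal capabilities help insurers identify fraud networks and patterns involving multiple policies, claimants, or agents.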

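4. Modelling

This section shows examples of Cypher queries run on an example graph. The aim is to illustrate what the queries look like and to offer guidance on how to structure your data in a real setting. We do this on a small graph containing only a few nodes. The example graph is based on the following data model.

4.1. Data Model

[Figure: insurance quote fraud data model]

4.1.1. Required Fields

The following fields are required to get started.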

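Quote node

  • firstname: the applicant's first name

  • surname: the applicant's surname

  • dob: the applicant's date of birth

  • postcode: the applicant's postcode

  • passport: the applicant's passport number

  • created_date: the datetime at which the quote or application was submitted

You can add properties to this node to monitor anything you wish during the quote or application process. In my data model and test data you may notice a change_info property. Note that this property is for demonstration purposes only; it makes it easier to see what changed since the previous quote.

NEXT_QUOTE relationship

  • diff_seconds: the time difference, in seconds, between the previous quote and the current quote.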
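As a rough illustration outside the database, the value stored in diff_seconds can be reproduced in a few lines of Python. This is only a sketch: the helper name diff_seconds and the sample timestamps are assumptions made for the example and are not part of the data model.

```python
from datetime import datetime, timedelta, timezone


def diff_seconds(previous: datetime, current: datetime) -> int:
    """Whole seconds elapsed between two quote timestamps (hypothetical helper)."""
    return int((current - previous).total_seconds())


# Sample timestamps invented for illustration: two quotes five minutes apart.
first_quote = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
second_quote = first_quote + timedelta(minutes=5)

print(diff_seconds(first_quote, second_quote))  # 300
```

A small value means the applicant requested another quote almost immediately, which is the situation the later queries look for.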

4.2. Demo Data

The following Cypher statement creates the example graph in the Neo4j database

// Create quote nodes
CREATE (q1:Quote {firstname: "Micheal", surname: "Down", dob: date("1988-02-02"), postcode: "YO30 7DW", longitude: -1.0927426, latitude: 53.96372145, passport: 584699531, created_date: datetime()-duration({years: 1, months: 1, minutes: 9}), change_info: "first quote"})
CREATE (q2:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "YO30 7DW", longitude: -1.0927426, latitude: 53.96372145, passport: 584699531, created_date: datetime()-duration({years: 1, months: 1, minutes: 4}), change_info: "name change ea to ae"})
CREATE (q3:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "YO30 7DW", longitude: -1.0927426, latitude: 53.96372145, passport: 584699531, created_date: datetime()-duration({years: 1, months: 1, minutes: 3}), change_info: "postcode_change"})
CREATE (q4:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "PA62 6AA", longitude: -5.851487, latitude: 56.359258, passport: 584699530, created_date: datetime()-duration({years: 1, months: 1}), change_info: "passport number"})
CREATE (q5:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "PA62 6AA", longitude: -5.851487, latitude: 56.359258, passport: 584699530, created_date: datetime()-duration({months: 1}), change_info: "quote 1yr later"})
CREATE (q6:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "PA62 6AA", longitude: -5.851487, latitude: 56.359258, passport: 584699530, created_date: datetime(), change_info: "quote 1m later"})


// Create all relationships
CREATE (q1)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q1.created_date, q2.created_date).seconds}]->(q2)
CREATE (q2)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q2.created_date, q3.created_date).seconds}]->(q3)
CREATE (q3)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q3.created_date, q4.created_date).seconds}]->(q4)
CREATE (q4)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q4.created_date, q5.created_date).seconds}]->(q5)
CREATE (q5)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q5.created_date, q6.created_date).seconds}]->(q6)

4.3. Neo4j Schema

If you call

// Show neo4j schema
CALL db.schema.visualization()

you will see the following response

[Figure: insurance quote fraud data schema]

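5. Cypher Queries

5.1. View All Quotes in a Chain

In this query, we will identify quote chains based on the following requirement

  • One quote is connected to another quote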

// View all quotes
MATCH path=()-[r:NEXT_QUOTE]->()
RETURN path;

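5.2. Split Quotes Based on Time Difference

In the world of quotes, the time between quotes is a very important factor. Consider the following scenario.

Buying car insurance: car insurance is usually bought once a year for a 12-month term. When comparing last year's quote with a new one, significant differences are therefore to be expected.

  • No claims bonus: (hopefully) one year more than the previous year.

  • Vehicle age: one year older.

  • Mileage: expected to be higher. It also depends on factors such as a person's age, job, and address.

To identify differences between quotes, we should divide them into short time intervals, much like a web session.

In this query, we will identify quote chains based on the following requirement

  • All quotes occur within 3600 seconds (1 hour) of one another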

// Split Quote Chain
MATCH path=()-[rel:NEXT_QUOTE]->()
WHERE rel.diff_seconds < 3600
RETURN path;

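The problem with this query is that, when viewed as a table, it shows the message Started streaming 3 records. In essence, Neo4j has returned 3 separate records that match the path pattern and sent them to the browser for display. While this may look appealing visually, it is a problem when you want to analyse the path as a whole. The next query addresses this.

[Figure: insurance quote fraud data stream 3 records]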

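5.3. Single Quote Path Record

This is an upgraded version of the previous Cypher query. It uses more advanced pattern matching and returns one record per chain, which is a single record for the demo data. It keeps the same characteristics as the previous version.

  • Single chain

  • All quotes are within 1 hour of one another

  • All quotes occurred within the last 1000 days

  • Returns 1 record for further analysis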

MATCH path=(firstQ)-[r:NEXT_QUOTE*..1000]->(lastQ)
WHERE

    // Path termination condition (first)
    (NOT EXISTS{ (firstQ)<-[:NEXT_QUOTE]-() } OR EXISTS{ (firstQ)<-[x:NEXT_QUOTE]-() WHERE x.diff_seconds >= 3600 } )
    AND

    // Path termination condition (last)
    (NOT EXISTS{ (lastQ)-[:NEXT_QUOTE]->() } OR EXISTS{ (lastQ)-[x:NEXT_QUOTE]->() WHERE x.diff_seconds >= 3600 } )
    AND

    // No gaps condition (if you remove this condition then gaps are allowed and you get spurious longer chains that verify the end of path but not the max diff condition)
    ALL(x IN relationships(path) WHERE x.diff_seconds < 3600 )
    AND

    // Filter based on quote in the last N days
    firstQ.created_date > datetime() - duration({days: 1000})
    AND

    // Where there are more than one quote in the chain otherwise there is nothing to compare against
    length(path)> 1

RETURN path

As the table view now shows, only one record is returned

[Figure: insurance quote fraud data stream 1 record]
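Taken together, the termination and no-gap conditions pick out maximal runs of quotes in which every consecutive gap is below the threshold. The same idea can be sketched in plain Python. The function name split_into_sessions is hypothetical, and the gap values are approximations chosen to mirror the demo data (300 s, 60 s, 180 s, then roughly a year and roughly a month).

```python
def split_into_sessions(gaps, max_gap=3600):
    """Group quote positions into sessions.

    gaps[i] is the number of seconds between quote i and quote i + 1.
    A new session starts whenever a gap reaches max_gap.
    """
    sessions = [[0]]
    for index, gap in enumerate(gaps, start=1):
        if gap < max_gap:
            sessions[-1].append(index)
        else:
            sessions.append([index])
    return sessions


# Gaps loosely mirroring the demo chain q1..q6 (approximate values).
demo_gaps = [300, 60, 180, 365 * 86400, 30 * 86400]
sessions = split_into_sessions(demo_gaps)
print(sessions)  # [[0, 1, 2, 3], [4], [5]]

# Mirror length(path) > 1: keep sessions with more than one relationship,
# that is, at least three quotes.
comparable = [s for s in sessions if len(s) - 1 > 1]
print(comparable)  # [[0, 1, 2, 3]]
```

Only the first four quotes form a session worth comparing, which matches the single record returned by the query.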

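5.4. Create SIMILARITY Relationships with Scores

To score quotes, we need a connection that brings all of the quote properties together so that they can be assessed both individually and as a whole.

In this query, we will identify quote chains based on the following requirements

  • Get the complete quote chain for all Quote nodes within the last 1000 days

  • Get the complete quote chain for all Quote nodes where the time between consecutive quotes is no more than 1 hour

  • Calculate the property scores

  • Write the quote chain to new SIMILARITY relationships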

// Create Similarity Relationship
MATCH path=(firstQ)-[r:NEXT_QUOTE*..1000]->(lastQ)
WHERE

    // Path termination condition (first)
    (NOT EXISTS{ (firstQ)<-[:NEXT_QUOTE]-() } OR EXISTS{ (firstQ)<-[x:NEXT_QUOTE]-() WHERE x.diff_seconds >= 3600 } )
    AND

    // Path termination condition (last)
    (NOT EXISTS{ (lastQ)-[:NEXT_QUOTE]->() } OR EXISTS{ (lastQ)-[x:NEXT_QUOTE]->() WHERE x.diff_seconds >= 3600 } )
    AND

    // No gaps condition (if you remove this condition then gaps are allowed and you get spurious longer chains that verify the end of path but not the max diff condition)
    ALL(x IN relationships(path) WHERE x.diff_seconds < 3600 )
    AND

    // Filter based on quote in the last N days
    firstQ.created_date > datetime() - duration({days: 1000})
    AND

    // Where there are more than one quote in the chain otherwise there is nothing to compare against
    length(path)> 1

WITH nodes(path) as nodes

// Iterate over the list in chain order we create an array [0,1,2,3... length - 2]
UNWIND range(0,size(nodes)-2) as index

// For each position (index) in the list take the node at that position (current) and the rest
WITH nodes[index] as current, nodes[index+1..size(nodes)] as rest

// Iterate over the rest keeping current to get all pairs of nodes without repetitions
UNWIND rest as subsequent

WITH current, subsequent,

// Build up similarity scores for all properties
// Strings
apoc.text.levenshteinSimilarity(current.firstname, subsequent.firstname) AS firstname,
apoc.text.levenshteinSimilarity(current.surname, subsequent.surname) AS surname,
apoc.text.levenshteinSimilarity(current.postcode, subsequent.postcode) AS postcode,

// Numbers
(current.passport - subsequent.passport) AS passport_number,
apoc.text.levenshteinSimilarity(toString(current.passport), toString(subsequent.passport)) AS passport_similarity,

// Dates
duration.inDays(current.dob, subsequent.dob).days AS dob,

// Location
toInteger(point.distance(point({longitude: current.longitude, latitude: current.latitude}), point({longitude: subsequent.longitude, latitude: subsequent.latitude}))) AS location

// Create :SIMILARITY Relationship
CREATE (current)-[:SIMILARITY {
    // Add change string for simplicity
    change: subsequent.change_info,

    // Strings
    firstname: firstname,
    surname: surname,
    postcode: postcode,

    // Numbers
    passport_number: passport_number,
    passport_similarity: passport_similarity,

    // Dates
    dob: dob,

    // Location
    location: location,

    // Calculated Similarity Score
    similarity_score: (firstname + surname + postcode + passport_similarity ) / 4
}]->(subsequent)

View the newly created relationships

// View all SIMILARITY relationships
MATCH path=()-[r:SIMILARITY]->()
RETURN path;
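To make the scoring easier to follow, here is a minimal Python sketch of the same calculation. It assumes that the string similarity is normalised as 1 minus the edit distance divided by the length of the longer string, which is my understanding of how apoc.text.levenshteinSimilarity behaves rather than something stated in this guide. The function names are hypothetical; the sample records are copied from the demo data.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, lower as they diverge (assumed normalisation)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def similarity_score(current: dict, subsequent: dict) -> float:
    """Average of the four string similarities used in the query."""
    fields = ("firstname", "surname", "postcode", "passport")
    return sum(similarity(str(current[f]), str(subsequent[f])) for f in fields) / len(fields)


q1 = {"firstname": "Micheal", "surname": "Down", "postcode": "YO30 7DW", "passport": 584699531}
q2 = {"firstname": "Michael", "surname": "Down", "postcode": "YO30 7DW", "passport": 584699531}
q4 = {"firstname": "Michael", "surname": "Down", "postcode": "PA62 6AA", "passport": 584699530}

# Every (earlier, later) pair in chain order, like the two UNWIND steps.
for earlier, later in combinations([q1, q2, q4], 2):
    print(earlier["firstname"], "->", later["postcode"], round(similarity_score(earlier, later), 3))
```

Note that the score averages only the four string comparisons; the passport difference, date of birth difference, and location distance are stored on the relationship but do not contribute to similarity_score.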

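5.5. Static Scoring

In this query, we will identify quote chains based on the following requirement

  • Calculate a score from the SIMILARITY relationships created by the query in 5.4, then return it to the user.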

// Calculate static Fraud Score
MATCH (a)-[r:SIMILARITY]->(b)
WHERE a.created_date > datetime() - duration({days: 1000})
// Aggregate once so an empty result does not cause a division by zero
WITH avg(r.similarity_score) AS similarity, count(r) AS comparisons
RETURN similarity AS Similarity,
CASE
    WHEN comparisons = 0 THEN 'Additional Quote Needs Adding'
    // Boundary values (exactly 70 or 50) fall into the riskier band
    WHEN toInteger(similarity * 100) > 70 THEN 'LOW'
    WHEN toInteger(similarity * 100) > 50 THEN 'MEDIUM'
    ELSE 'HIGH'
END AS Fraud_Level
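The banding logic can also be expressed as a small Python function, which may be convenient when the score is consumed by an application. This is a sketch under stated assumptions: the function name fraud_level is hypothetical, and scores of exactly 70 or 50 percent are assumed to fall into the riskier band.

```python
def fraud_level(similarity_scores):
    """Return (average similarity, fraud level) for a list of scores in [0, 1]."""
    if not similarity_scores:
        return None, "Additional Quote Needs Adding"
    average = sum(similarity_scores) / len(similarity_scores)
    percent = int(average * 100)  # truncates, like toInteger() in the query
    if percent > 70:
        level = "LOW"
    elif percent > 50:
        level = "MEDIUM"
    else:
        level = "HIGH"
    return average, level


print(fraud_level([0.95, 0.9]))
print(fraud_level([0.6]))
print(fraud_level([]))
```

A high average similarity means the quotes in the chain barely differ, so the fraud level is low; heavily edited quotes pull the average down and raise the level.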

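5.6. Real-Time Fraud Scoring

For our final Cypher query, we add a new quote to Neo4j and run the fraud score calculation to get a real-time response showing the similarity score. This code can be used behind an API or directly in Cypher, and gives an immediate indication of fraud.

In this query, we will identify quote chains based on the following requirements

  • Get the previous quote

  • Create a new Quote at the end of the chain

  • Get the complete quote chain for all Quote nodes within the last 1000 days

  • Get the complete quote chain for all Quote nodes where the time between consecutive quotes is no more than 1 hour

  • Calculate the property scores

  • Write the quote chain to new SIMILARITY relationships

  • Calculate the score, then return it to the user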

// // // Realtime Quote Score // // //

// Get last `Quote` node in quote chain
MATCH (last:Quote)
WITH last
ORDER BY last.created_date DESC
LIMIT 1
WITH last
// Create new quote node
MERGE (current:Quote {
    change_info: "changed dob",
    created_date: datetime(),
    dob: date("1978-11-30"),
    firstname: "Michael",
    surname: "Down",
    latitude: 56.359258,
    longitude: -5.851487,
    passport: 584699530,
    postcode: "PA62 6AA"
})
WITH last, current, duration.inSeconds(datetime(last.created_date), datetime(current.created_date)) AS time
// Create relationship
CREATE (last)-[:NEXT_QUOTE {diff_seconds: time.seconds}]->(current)

WITH current

// Minimum comparison
MATCH path=(firstQ)-[r:NEXT_QUOTE*0..100]->(current)
WHERE

    // Path termination condition (first)
    (NOT EXISTS{ (firstQ)<-[:NEXT_QUOTE]-() } OR EXISTS{ (firstQ)<-[x:NEXT_QUOTE]-() WHERE x.diff_seconds >= 3600 } )
    AND

    // Path termination condition (last)
    (NOT EXISTS{ (current)-[:NEXT_QUOTE]->() } OR EXISTS{ (current)-[x:NEXT_QUOTE]->() WHERE x.diff_seconds >= 3600 } )
    AND

    // No gaps condition (if you remove this condition then gaps are allowed and you get spurious longer chains that verify the end of path but not the max diff condition)
    ALL(x IN relationships(path) WHERE x.diff_seconds < 3600 )
    AND

    // Filter based on quote in the last N days
    firstQ.created_date > datetime() - duration({days: 1000})
    AND

    // Where there are more than one quote in the chain otherwise there is nothing to compare against
    length(path)> 1

//let's keep just the nodes in the chain
UNWIND nodes(path)[0..-1] as subsequent

WITH current, subsequent,

// Build up similarity scores for all properties
// Strings
apoc.text.levenshteinSimilarity(current.firstname, subsequent.firstname) AS firstname,
apoc.text.levenshteinSimilarity(current.surname, subsequent.surname) AS surname,
apoc.text.levenshteinSimilarity(current.postcode, subsequent.postcode) AS postcode,

// Numbers
(current.passport - subsequent.passport) AS passport_number,
apoc.text.levenshteinSimilarity(toString(current.passport), toString(subsequent.passport)) AS passport_similarity,

// Dates
duration.inDays(current.dob, subsequent.dob).days AS dob,

// Location
toInteger(point.distance(point({longitude: current.longitude, latitude: current.latitude}), point({longitude: subsequent.longitude, latitude: subsequent.latitude}))) AS location

// Create :SIMILARITY Relationship
CREATE (current)-[:SIMILARITY {
    // Add change string for simplicity
    change: subsequent.change_info,

    // Strings
    firstname: firstname,
    surname: surname,
    postcode: postcode,

    // Numbers
    passport_number: passport_number,
    passport_similarity: passport_similarity,

    // Dates
    dob: dob,

    // Location
    location: location,

    // Calculated Similarity Score
    similarity_score: (firstname + surname + postcode + passport_similarity ) / 4
}]->(subsequent)

WITH *

// Quote - 3 - Calculate Fraud Score
MATCH p=(a)-[r:SIMILARITY]->(b)
WHERE a.created_date > datetime() - duration({days: 1000})
RETURN avg(r.similarity_score) AS Similarity,
CASE
    WHEN COUNT(relationships(p)) = 0 THEN 'Run Again'
    WHEN toInteger(sum(r.similarity_score)/COUNT(relationships(p)) * 100) > 70 THEN 'LOW'
    WHEN toInteger(sum(r.similarity_score)/COUNT(relationships(p)) * 100) < 70 AND toInteger(sum(r.similarity_score)/COUNT(relationships(p)) * 100) > 50 THEN 'MEDIUM'
    WHEN toInteger(sum(r.similarity_score)/COUNT(relationships(p)) * 100) < 50 THEN 'HIGH'
END AS Fraud_Level;
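The new quote in this example changes only the date of birth. The queries store the date of birth difference and the location distance on the SIMILARITY relationship, but neither value enters similarity_score, so it can be useful to inspect them separately. The sketch below reproduces both values in Python. The helper names are hypothetical, and the haversine formula with a mean Earth radius is an assumption that only approximates what point.distance returns for geographic points.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an assumption of this sketch


def dob_difference_days(current: date, subsequent: date) -> int:
    """Days from current.dob to subsequent.dob, as in duration.inDays(...).days."""
    return (subsequent - current).days


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> int:
    """Great-circle (haversine) distance in whole metres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    h = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return int(2 * EARTH_RADIUS_M * asin(sqrt(h)))


# New quote (dob 1978-11-30) compared with an earlier quote (dob 1988-02-02).
print(dob_difference_days(date(1978, 11, 30), date(1988, 2, 2)))

# Distance between the two demo postcodes, YO30 7DW and PA62 6AA.
print(distance_m(53.96372145, -1.0927426, 56.359258, -5.851487))
```

A large date of birth difference alongside a high similarity_score is the kind of signal that the averaged string score alone would miss.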