Text Functions

See the Cypher Manual for the built-in Cypher string functions and operators.

Compare strings with Levenshtein distance

Compare the given STRING values using the StringUtils.distance(text1, text2) method (Levenshtein).

RETURN apoc.text.distance("Levenshtein", "Levenstein") // 1
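To show what this distance counts, here is a minimal Python sketch of the classic dynamic-programming Levenshtein algorithm. Insertions, deletions and substitutions each cost 1. It illustrates the metric only. It is not APOC's code, which the text above says delegates to StringUtils.distance, and the name `levenshtein` is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


print(levenshtein("Levenshtein", "Levenstein"))  # 1 (the missing "h")
```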

Compare the given strings using the Sørensen–Dice coefficient formula.

Compute the similarity assuming Locale.ENGLISH

RETURN apoc.text.sorensenDiceSimilarity("belly", "jolly") // 0.5

Compute the similarity with an explicit locale

RETURN apoc.text.sorensenDiceSimilarity("halım", "halim", "tr-TR") // 0.5
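A minimal Python sketch of the coefficient follows. It assumes the coefficient is computed over adjacent character pairs (bigrams) after lower-casing, which reproduces both example values above. That assumption about APOC's internals is not confirmed by this page. Python's `str.lower()` is not locale-aware, and this is exactly where APOC's locale argument matters, for example with the Turkish dotted and dotless i. The function names are hypothetical.

```python
from collections import Counter


def bigrams(s: str) -> Counter:
    """Multiset of adjacent character pairs, e.g. 'belly' -> be, el, ll, ly."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def sorensen_dice(a: str, b: str) -> float:
    """2 * |shared bigrams| / (|bigrams(a)| + |bigrams(b)|)."""
    a, b = a.lower(), b.lower()  # NOT locale-aware, unlike APOC's locale parameter
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:  # both strings are shorter than 2 characters
        return 1.0 if a == b else 0.0
    shared = sum((ba & bb).values())  # multiset intersection
    return 2 * shared / total


print(sorensen_dice("belly", "jolly"))  # 0.5 (shared: "ll", "ly")
print(sorensen_dice("halım", "halim"))  # 0.5 (shared: "ha", "al")
```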

Check whether 2 words can be matched in a fuzzy way with `fuzzyMatch`

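Depending on the length of the given STRING (allowed distance: 0 if the length is < 3, 1 if it is < 5, otherwise 2), more characters may be edited to match the second STRING (Levenshtein distance).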

RETURN apoc.text.fuzzyMatch("The", "the") // true
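The threshold rule above can be sketched in Python as follows. It assumes that the length of the first argument selects the threshold and that the comparison is case-sensitive, so "The" and "the" differ by one edit. Both points are assumptions, and `fuzzy_match` is a hypothetical name. A compact edit-distance helper is included so that the block runs on its own.

```python
def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance (insert/delete/substitute cost 1).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fuzzy_match(text1: str, text2: str) -> bool:
    n = len(text1)  # assumption: the first argument's length sets the threshold
    max_edits = 0 if n < 3 else 1 if n < 5 else 2
    return levenshtein(text1, text2) <= max_edits


print(fuzzy_match("The", "the"))  # True: length 3 allows 1 edit
```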

Phonetic comparison functions

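The phonetic text (Soundex) functions compute the Soundex encoding of a given string. There is also a procedure that compares how similar two strings sound under the Soundex algorithm. All Soundex procedures assume by default that the language is US English.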

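`apoc.text.phonetic(text STRING)`	Returns the US English phonetic Soundex encoding of all words in the STRING.
`apoc.text.doubleMetaphone(value STRING)`	Returns the double metaphone phonetic encoding of all words in the given STRING value.
`apoc.text.clean(text STRING)`	Strips the given STRING of everything except alphanumeric characters and converts it to lower case.
`apoc.text.compareCleaned(text1 STRING, text2 STRING)`	Compares two given STRING values after stripping everything except alphanumeric characters and converting them to lower case.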

Table 1. Procedures
`apoc.text.phoneticDelta(text1 STRING, text2 STRING)`	返回两个给定 `STRING` 值之间的美式英语 Soundex 字符差异。

// will return 'H436'
RETURN apoc.text.phonetic('Hello, dear User!')

// will return 4 (very similar)
CALL apoc.text.phoneticDelta('Hello Mr Rabbit', 'Hello Mr Ribbit') YIELD delta
RETURN delta
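The following Python sketch shows the classic American Soundex rules, assuming APOC follows them:

- Keep the first letter.
- Map the remaining consonants to digits and drop vowels.
- Collapse adjacent equal codes. H and W do not separate them.
- Stop at 4 characters, padding with zeros if needed.

Non-letters are ignored, so a whole phrase yields a single code, which matches the 'H436' result above. The "difference" counts the positions (0 to 4) where two codes agree, which gives the value 4 above. The function names are hypothetical.

```python
# Soundex digit for each letter A..Z (vowels, H, W and Y map to "0").
MAPPING = "01230120022455012623010202"


def soundex(text: str) -> str:
    letters = [c for c in text.upper() if "A" <= c <= "Z"]  # ignore non-letters
    if not letters:
        return ""
    out = [letters[0]]
    last = MAPPING[ord(letters[0]) - ord("A")]
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate equal codes
        digit = MAPPING[ord(ch) - ord("A")]
        if digit != "0" and digit != last:
            out.append(digit)
            if len(out) == 4:
                break
        last = digit  # a vowel resets "last", so repeated codes count again
    return "".join(out).ljust(4, "0")


def difference(text1: str, text2: str) -> int:
    """Number of positions (0-4) where the two Soundex codes agree."""
    return sum(x == y for x, y in zip(soundex(text1), soundex(text2)))


print(soundex("Hello, dear User!"))                          # H436
print(difference("Hello Mr Rabbit", "Hello Mr Ribbit"))      # 4
```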

Text formatting

Format the given STRING with the given parameters and an optional language parameter.

Without the language parameter ('en' by default)

RETURN apoc.text.format('ab%s %d %.1f %s%n',['cd', 42, 3.14, true]) AS value // abcd 42 3.1 true

With the language parameter

RETURN apoc.text.format('ab%s %d %.1f %s%n',['cd', 42, 3.14, true],'it') AS value // abcd 42 3,1 true

String search

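The indexOf function returns the position of the first occurrence of the given lookup string in text, or -1 if it is not found. It optionally accepts from (inclusive) and to (exclusive) parameters.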

RETURN apoc.text.indexOf('Hello World!', 'World') // 6

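The indexesOf function returns the positions of all occurrences of the given lookup string in text, or an empty list if none are found. It optionally accepts from (inclusive) and to (exclusive) parameters.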

RETURN apoc.text.indexesOf('Hello World!', 'o',2,9) // [4,7]
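Python's `str.find` already implements the "first match within [from, to), else -1" contract, so both functions can be sketched on top of it. Two points are assumptions: that a match must lie entirely inside the range, and that indexesOf reports overlapping matches. The function names are hypothetical.

```python
from typing import List, Optional


def index_of(text: str, lookup: str, start: int = 0, end: Optional[int] = None) -> int:
    """Position of the first match inside text[start:end], or -1 if there is none."""
    stop = len(text) if end is None else end
    return text.find(lookup, start, stop)


def indexes_of(text: str, lookup: str, start: int = 0, end: Optional[int] = None) -> List[int]:
    """All match positions inside text[start:end] (overlapping matches included)."""
    stop = len(text) if end is None else end
    positions = []
    pos = text.find(lookup, start, stop)
    while pos != -1:
        positions.append(pos)
        pos = text.find(lookup, pos + 1, stop)  # resume one character later
    return positions


print(index_of("Hello World!", "World"))      # 6
print(indexes_of("Hello World!", "o", 2, 9))  # [4, 7]
```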

Get the substring starting at the matched index

Returns World!

WITH 'Hello World!' AS text
WITH text, size(text) AS len
WITH text, len, apoc.text.indexOf(text, 'World',3) as index
RETURN substring(text, case index when -1 then len-1 else index end, len);

Regular expressions

Returns 'HelloWorld'

RETURN apoc.text.replace('Hello World!', '[^a-zA-Z]', '')

Returns all groups matching the given regular expression pattern

RETURN apoc.text.regexGroups('abc <link xxx1>yyy1</link> def <link xxx2>yyy2</link>','<link (\\w+)>(\\w+)</link>') AS result

// [["<link xxx1>yyy1</link>", "xxx1", "yyy1"], ["<link xxx2>yyy2</link>", "xxx2", "yyy2"]]
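The shape of this result can be reproduced with Python's `re.finditer`. Each inner list holds the whole match followed by every capture group. This is an analogy, not APOC's implementation, and `regex_groups` is a hypothetical name. Note that the Cypher literal '\\w' is the regex `\w`, written here as a raw string.

```python
import re


def regex_groups(text: str, pattern: str) -> list:
    """For every match: the whole match followed by each capture group."""
    return [[m.group(0), *m.groups()] for m in re.finditer(pattern, text)]


text = "abc <link xxx1>yyy1</link> def <link xxx2>yyy2</link>"
print(regex_groups(text, r"<link (\w+)>(\w+)</link>"))
```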

Returns all groups matching the given regular expression pattern, mapped to their group names

RETURN apoc.text.regexGroupsByName(
  'abc <link xxx1>yyy1</link> def <link xxx2>yyy2</link>',
  '<link (?<firstPart>\\w+)>(?<secondPart>\\w+)</link>'
) AS output;

// [{"group": "<link xxx1>yyy1</link>", "matches": {"firstPart": "xxx1", "secondPart": "yyy1"}}, {"group": "<link xxx2>yyy2</link>", "matches": {"firstPart": "xxx2", "secondPart": "yyy2"}}]
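A Python analogue of the named-group variant uses `Match.groupdict()`. One syntax difference matters: Python's `re` module spells named groups `(?P<name>...)`, whereas the Java-style pattern above uses `(?<name>...)`. The function name is hypothetical.

```python
import re


def regex_groups_by_name(text: str, pattern: str) -> list:
    """For every match: the whole match plus a map of group name -> captured value."""
    return [{"group": m.group(0), "matches": m.groupdict()} for m in re.finditer(pattern, text)]


text = "abc <link xxx1>yyy1</link> def <link xxx2>yyy2</link>"
print(regex_groups_by_name(text, r"<link (?P<firstPart>\w+)>(?P<secondPart>\w+)</link>"))
```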

Split and join

Splits on the given regular expression and returns ['Hello', 'World']

RETURN apoc.text.split('Hello   World', ' +')

Returns 'Hello World'

RETURN apoc.text.join(['Hello', 'World'], ' ')

Data cleaning

Returns 'helloworld'

RETURN apoc.text.clean('Hello World!')

Returns true

RETURN apoc.text.compareCleaned('Hello World!', '_hello-world_')

Returns only 'Hello World!'

UNWIND ['Hello World!', 'hello worlds'] AS text
WITH text WHERE apoc.text.filterCleanMatches(text, 'hello_world')
RETURN text

Cleaning is useful for tidying up slightly dirty, inconsistently formatted text data so that it can be compared non-exactly.

Cleaning strips all non-alphanumeric characters (including spaces) from the string and converts it to lower case.
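The cleaning rule just described fits in a few lines of Python. This sketch relies on `str.isalnum()`, which accepts any Unicode letter or digit and leaves accents in place. Whether APOC treats non-ASCII characters the same way is an assumption. The names are hypothetical, and the final line mirrors the filtering query above.

```python
def clean(text: str) -> str:
    """Keep only alphanumeric characters, then lower-case the result."""
    return "".join(ch for ch in text if ch.isalnum()).lower()


def compare_cleaned(text1: str, text2: str) -> bool:
    return clean(text1) == clean(text2)


print(clean("Hello World!"))                             # helloworld
print(compare_cleaned("Hello World!", "_hello-world_"))  # True
print([t for t in ["Hello World!", "hello worlds"] if compare_cleaned(t, "hello_world")])  # ['Hello World!']
```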

Case change functions

Capitalize the first letter of a word with capitalize

RETURN apoc.text.capitalize("neo4j") // "Neo4j"

Capitalize the first letter of every word in the text with capitalizeAll

RETURN apoc.text.capitalizeAll("graph database") // "Graph Database"

Lower-case the first letter of the string with decapitalize

RETURN apoc.text.decapitalize("Graph Database") // "graph Database"

Lower-case the first letter of every word with decapitalizeAll

RETURN apoc.text.decapitalizeAll("Graph Databases") // "graph databases"

Swap the case of every character in a string with swapCase

RETURN apoc.text.swapCase("Neo4j") // nEO4J

Convert a string to lower camel case with camelCase

RETURN apoc.text.camelCase("FOO_BAR");    // "fooBar"
RETURN apoc.text.camelCase("Foo bar");    // "fooBar"
RETURN apoc.text.camelCase("Foo22 bar");  // "foo22Bar"
RETURN apoc.text.camelCase("foo-bar");    // "fooBar"
RETURN apoc.text.camelCase("Foobar");     // "foobar"
RETURN apoc.text.camelCase("Foo$$Bar");   // "fooBar"

Convert a string to upper camel case with upperCamelCase

RETURN apoc.text.upperCamelCase("FOO_BAR");   // "FooBar"
RETURN apoc.text.upperCamelCase("Foo bar");   // "FooBar"
RETURN apoc.text.upperCamelCase("Foo22 bar"); // "Foo22Bar"
RETURN apoc.text.upperCamelCase("foo-bar");   // "FooBar"
RETURN apoc.text.upperCamelCase("Foobar");    // "Foobar"
RETURN apoc.text.upperCamelCase("Foo$$Bar");  // "FooBar"

Convert a string to snake case with snakeCase (as the examples show, the words are joined with hyphens)

RETURN apoc.text.snakeCase("test Snake Case"); // "test-snake-case"
RETURN apoc.text.snakeCase("FOO_BAR");         // "foo-bar"
RETURN apoc.text.snakeCase("Foo bar");         // "foo-bar"
RETURN apoc.text.snakeCase("fooBar");          // "foo-bar"
RETURN apoc.text.snakeCase("foo-bar");         // "foo-bar"
RETURN apoc.text.snakeCase("Foo  bar");        // "foo-bar"

Convert a string to upper case with toUpperCase (as the examples show, the words are joined with underscores)

RETURN apoc.text.toUpperCase("test upper case"); // "TEST_UPPER_CASE"
RETURN apoc.text.toUpperCase("FooBar");          // "FOO_BAR"
RETURN apoc.text.toUpperCase("fooBar");          // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo-bar");         // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo--bar");        // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo$$bar");        // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo 22 bar");      // "FOO_22_BAR"
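All four conversions can be read as "split into words, then re-join with a case rule and a separator." The Python sketch below uses one word-splitting regex, an assumption chosen because it reproduces every example above; it is not APOC's code. The regex only handles ASCII letters, and the names are hypothetical.

```python
import re

# A word is either a run of capitals not followed by a lower-case letter ("FOO"),
# or an optional capital followed by lower-case letters/digits ("Foo22", "bar").
# Any other character ("_", "-", " ", "$") only separates words.
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z0-9]+")


def words(text: str) -> list:
    return WORD.findall(text)


def camel_case(text: str) -> str:
    w = words(text)
    return "".join([w[0].lower()] + [x.capitalize() for x in w[1:]]) if w else ""


def upper_camel_case(text: str) -> str:
    return "".join(x.capitalize() for x in words(text))


def snake_case(text: str) -> str:  # hyphen-separated, as in the examples above
    return "-".join(x.lower() for x in words(text))


def upper_case(text: str) -> str:  # underscore-separated upper case
    return "_".join(x.upper() for x in words(text))


print(camel_case("Foo$$Bar"))        # fooBar
print(upper_camel_case("Foo22 bar")) # Foo22Bar
print(snake_case("fooBar"))          # foo-bar
print(upper_case("foo 22 bar"))      # FOO_22_BAR
```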

Base64 decoding and encoding

Encode or decode a string in base64 or base64Url

Base64 encoding

RETURN apoc.text.base64Encode("neo4j") // bmVvNGo=

Base64 decoding

RETURN apoc.text.base64Decode("bmVvNGo=") // neo4j

Base64 URL encoding

RETURN apoc.text.base64UrlEncode("http://neo4j.com/?test=test") // aHR0cDovL25lbzRqLmNvbS8_dGVzdD10ZXN0

Base64 URL decoding

RETURN apoc.text.base64UrlDecode("aHR0cDovL25lbzRqLmNvbS8_dGVzdD10ZXN0") // http://neo4j.com/?test=test
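Python's standard `base64` module reproduces the values above and shows the only difference between the two variants. The URL-safe alphabet uses `-` and `_` where standard Base64 uses `+` and `/`, which is why the URL example contains `_`. The sketch assumes UTF-8 text. The padding-tolerant decode is an addition of this sketch, not documented APOC behaviour, and the function names are hypothetical.

```python
import base64


def b64_encode(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def b64_decode(data: str) -> str:
    return base64.b64decode(data).decode("utf-8")


def b64url_encode(text: str) -> str:
    # URL-safe alphabet: "-" and "_" instead of "+" and "/"
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")


def b64url_decode(data: str) -> str:
    padded = data + "=" * (-len(data) % 4)  # tolerate input whose "=" padding was dropped
    return base64.urlsafe_b64decode(padded).decode("utf-8")


print(b64_encode("neo4j"))                          # bmVvNGo=
print(b64url_encode("http://neo4j.com/?test=test")) # aHR0cDovL25lbzRqLmNvbS8_dGVzdD10ZXN0
```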

Random string

You can generate a random string of a specified length by calling apoc.text.random with a length parameter and an optional string of valid characters.

The valid parameter accepts the following regex patterns, or you can provide a string of letters and/or characters.

Pattern	Description
A-Z	Uppercase letters A-Z
a-z	Lowercase letters a-z
0-9	Digits 0-9 inclusive

The following call returns a random string containing uppercase letters, digits, and the . and $ characters.

RETURN apoc.text.random(10, "A-Z0-9.$")
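The following Python sketch shows how a valid string such as "A-Z0-9.$" can be expanded into a character pool and sampled. Any `X-Y` triple is read as an inclusive range, and every other character is taken literally. This parsing rule is an assumption inferred from the table above, not APOC's documented grammar. The sketch draws characters with `secrets.choice`, and the function names are hypothetical.

```python
import secrets


def expand_valid(valid: str) -> str:
    """Expand "A-Z0-9.$"-style specs into the set of allowed characters."""
    chars = []
    i = 0
    while i < len(valid):
        if i + 2 < len(valid) and valid[i + 1] == "-":  # a range such as A-Z
            chars.extend(chr(c) for c in range(ord(valid[i]), ord(valid[i + 2]) + 1))
            i += 3
        else:  # a literal character such as "." or "$"
            chars.append(valid[i])
            i += 1
    return "".join(dict.fromkeys(chars))  # drop duplicates, keep order


def random_text(length: int, valid: str) -> str:
    pool = expand_valid(valid)
    return "".join(secrets.choice(pool) for _ in range(length))


print(random_text(10, "A-Z0-9.$"))
```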

Hashing functions

apoc.util.sha1([values])

Computes the SHA1 of the concatenation of all string values in the list

apoc.util.md5([values])

Computes the MD5 of the concatenation of all string values in the list
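"Hash of the concatenation" means the list is joined into one string before hashing. In the Python sketch below, ["a", "bc"] therefore hashes exactly like "abc". The sketch assumes string values, UTF-8 bytes and lower-case hex output; none of these is confirmed by this page. The function names are hypothetical.

```python
import hashlib


def sha1_of(values: list) -> str:
    """Hex SHA-1 of all values concatenated: ["a", "bc"] hashes like "abc"."""
    return hashlib.sha1("".join(values).encode("utf-8")).hexdigest()


def md5_of(values: list) -> str:
    """Hex MD5 of all values concatenated."""
    return hashlib.md5("".join(values).encode("utf-8")).hexdigest()


print(sha1_of(["a", "bc"]))  # a9993e364706816aba3e25717850c26c9cd0d89d
print(md5_of(["a", "bc"]))   # 900150983cd24fb0d6963f7d28e17f72
```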