Azure Synapse Analytics

Azure Synapse Analytics (formerly SQL Data Warehouse) is a cloud-based enterprise data warehouse that uses massively parallel processing (MPP) to quickly run complex queries across petabytes of data.

Prerequisites

You need a running Azure Synapse Analytics instance. If you do not have one, you can create one here.

Dependencies

Azure Synapse Analytics works with Spark only on the Databricks Runtime, because the required connector is not publicly available.

Authentication

The Azure Synapse connector uses three types of network connections:

  • Spark driver to Azure Synapse

  • Spark driver and executors to the Azure storage account

  • Azure Synapse to the Azure storage account

To choose the authentication method that best fits your use case, we recommend reviewing the official Azure Synapse documentation.
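The examples below leave the JDBC connection string as a placeholder. As a hedged illustration only, the string can be assembled from its parts as sketched here; the `<server>.sql.azuresynapse.net` host format, port, and option keys follow common SQL Server JDBC conventions and are assumptions, not values taken from this page, so verify them against your own workspace:

```python
def synapse_jdbc_url(server: str, database: str, user: str, password: str) -> str:
    """Assemble a SQL Server-style JDBC URL for a Synapse dedicated SQL pool.

    The host suffix, port, and properties below follow typical SQL Server
    JDBC syntax and should be checked against your workspace settings.
    """
    return (
        f"jdbc:sqlserver://{server}.sql.azuresynapse.net:1433;"
        f"database={database};user={user};password={password};"
        "encrypt=true;trustServerCertificate=false;loginTimeout=30;"
    )

# Hypothetical workspace, database, and credentials for illustration.
url = synapse_jdbc_url("myworkspace", "mydb", "loader", "s3cret")
```

The resulting string is what you would pass to the `url` option of the reads and writes shown below.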

From Azure Synapse Analytics to Neo4j

Depending on the authentication method you choose, the following example shows how to import data from an Azure Synapse Analytics table into Neo4j as nodes:

Scala:

// Step (1)
// Load a table into a Spark DataFrame
import org.apache.spark.sql.{DataFrame, SaveMode}

val azureDF: DataFrame = spark.read
  .format("com.databricks.spark.sqldw")
  .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>")
  .option("dbTable", "CUSTOMER")
  .load()

// Step (2)
// Save the `azureDF` as nodes with labels `Person` and `Customer` into Neo4j
azureDF.write
  .format("org.neo4j.spark.DataSource")
  .mode(SaveMode.ErrorIfExists)
  .option("url", "neo4j://<host>:<port>")
  .option("labels", ":Person:Customer")
  .save()
Python:

# Step (1)
# Load a table into a Spark DataFrame
azureDF = (spark.read
  .format("com.databricks.spark.sqldw")
  .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>")
  .option("dbTable", "CUSTOMER")
  .load())

# Step (2)
# Save the `azureDF` as nodes with labels `Person` and `Customer` into Neo4j
(azureDF.write
  .format("org.neo4j.spark.DataSource")
  .mode("ErrorIfExists")
  .option("url", "neo4j://<host>:<port>")
  .option("labels", ":Person:Customer")
  .save())
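With `ErrorIfExists`, re-running the job fails once the nodes already exist. A minimal sketch of an idempotent variant, assuming a live Spark session, the same `azureDF` as above, and the connector's `Overwrite` save mode with the `node.keys` option (the key column name `customer_id` is hypothetical):

```python
# Fragment only: requires a running Spark session and Neo4j instance.
# `node.keys` names the DataFrame columns used to match existing nodes,
# so repeated runs merge rather than duplicate.
(azureDF.write
  .format("org.neo4j.spark.DataSource")
  .mode("Overwrite")
  .option("url", "neo4j://<host>:<port>")
  .option("labels", ":Person:Customer")
  .option("node.keys", "customer_id")  # hypothetical key column
  .save())
```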

From Neo4j to Azure Synapse Analytics

Depending on the authentication method you choose, the following example shows how to import data from Neo4j into an Azure Synapse Analytics table:

Scala:

// Step (1)
// Load `:Person:Customer` nodes as DataFrame
import org.apache.spark.sql.DataFrame

val neo4jDF: DataFrame = spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "neo4j://<host>:<port>")
  .option("labels", ":Person:Customer")
  .load()

// Step (2)
// Save the `neo4jDF` as table CUSTOMER into Azure Synapse Analytics
neo4jDF.write
  .format("com.databricks.spark.sqldw")
  .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>")
  .option("dbTable", "CUSTOMER")
  .save()
Python:

# Step (1)
# Load `:Person:Customer` nodes as DataFrame
neo4jDF = (spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "neo4j://<host>:<port>")
  .option("labels", ":Person:Customer")
  .load())

# Step (2)
# Save the `neo4jDF` as table CUSTOMER into Azure Synapse Analytics
(neo4jDF.write
  .format("com.databricks.spark.sqldw")
  .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>")
  .option("dbTable", "CUSTOMER")
  .save())