Amazon Redshift
Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and machine learning to deliver the best price performance at any scale.
Prerequisites
You need a running Amazon Redshift instance. If you do not have one, you can create one here.
From Redshift to Neo4j
In the Databricks Runtime
In this case, a good starting point is the Databricks guide.
import org.apache.spark.sql.{DataFrame, SaveMode}

// Step (1)
// Load a table into a Spark DataFrame
val redshiftDF: DataFrame = spark.read
.format("com.databricks.spark.redshift")
.option("url", "jdbc:redshift://<the-rest-of-the-connection-string>")
.option("dbtable", "CUSTOMER")
.option("tempdir", "s3a://<your-bucket>/<your-directory-path>")
.load()
// Step (2)
// Save the `redshiftDF` as nodes with labels `Person` and `Customer` into Neo4j
redshiftDF.write
.format("org.neo4j.spark.DataSource")
.mode(SaveMode.ErrorIfExists)
.option("url", "neo4j://<host>:<port>")
.option("labels", ":Person:Customer")
.save()
# Step (1)
# Load a table into a Spark DataFrame
redshiftDF = (spark.read
.format("com.databricks.spark.redshift")
.option("url", "jdbc:redshift://<the-rest-of-the-connection-string>")
.option("dbtable", "CUSTOMER")
.option("tempdir", "s3a://<your-bucket>/<your-directory-path>")
.load())
# Step (2)
# Save the `redshiftDF` as nodes with labels `Person` and `Customer` into Neo4j
(redshiftDF.write
.format("org.neo4j.spark.DataSource")
.mode("ErrorIfExists")
.option("url", "neo4j://<host>:<port>")
.option("labels", ":Person:Customer")
.save())
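Both the Scala and Python variants stage data through the S3 tempdir, so Spark needs credentials for that bucket. A minimal Scala sketch, assuming static keys purely for illustration (forward_spark_s3_credentials is the connector option that reuses them for Redshift's COPY/UNLOAD statements; in production an IAM role is preferable):

// Hypothetical static credentials, for illustration only.
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "<your-access-key-id>")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "<your-secret-access-key>")

val redshiftDF = spark.read
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://<the-rest-of-the-connection-string>")
  .option("dbtable", "CUSTOMER")
  .option("tempdir", "s3a://<your-bucket>/<your-directory-path>")
  // Let the connector forward the S3 credentials above to Redshift
  .option("forward_spark_s3_credentials", "true")
  .load()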
In any other Spark runtime with the Redshift community dependency
In this case, a good starting point is the Redshift community repository.
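Outside Databricks, both connectors must be on the classpath. A minimal sbt sketch, assuming a Scala 2.12 build; the version placeholders are assumptions, so check each repository for current releases and exact coordinates:

libraryDependencies ++= Seq(
  // Community Redshift connector for Spark
  "io.github.spark-redshift-community" %% "spark-redshift" % "<version>",
  // Neo4j Connector for Apache Spark
  "org.neo4j" %% "neo4j-connector-apache-spark" % "<version>"
)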
import org.apache.spark.sql.{DataFrame, SaveMode}

// Step (1)
// Load a table into a Spark DataFrame
val redshiftDF: DataFrame = spark.read
.format("io.github.spark_redshift_community.spark.redshift")
.option("url", "jdbc:redshift://<the-rest-of-the-connection-string>")
.option("dbtable", "CUSTOMER")
.option("tempdir", "s3a://<your-bucket>/<your-directory-path>")
.load()
// Step (2)
// Save the `redshiftDF` as nodes with labels `Person` and `Customer` into Neo4j
redshiftDF.write
.format("org.neo4j.spark.DataSource")
.mode(SaveMode.ErrorIfExists)
.option("url", "neo4j://<host>:<port>")
.option("labels", ":Person:Customer")
.save()
# Step (1)
# Load a table into a Spark DataFrame
redshiftDF = (spark.read
.format("io.github.spark_redshift_community.spark.redshift")
.option("url", "jdbc:redshift://<the-rest-of-the-connection-string>")
.option("dbtable", "CUSTOMER")
.option("tempdir", "s3a://<your-bucket>/<your-directory-path>")
.load())
# Step (2)
# Save the `redshiftDF` as nodes with labels `Person` and `Customer` into Neo4j
(redshiftDF.write
.format("org.neo4j.spark.DataSource")
.mode("ErrorIfExists")
.option("url", "neo4j://<host>:<port>")
.option("labels", ":Person:Customer")
.save())
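Note that ErrorIfExists makes a re-run of the same job fail once the nodes exist. If the load has to be idempotent, the Neo4j connector can merge on a key instead; a sketch reusing redshiftDF from Step (1), where C_CUSTKEY is a hypothetical key column of the CUSTOMER table:

// Merge on a key instead of failing when nodes already exist.
// `C_CUSTKEY` is a hypothetical column; use your table's actual key.
redshiftDF.write
  .format("org.neo4j.spark.DataSource")
  .mode(SaveMode.Overwrite)
  .option("url", "neo4j://<host>:<port>")
  .option("labels", ":Person:Customer")
  .option("node.keys", "C_CUSTKEY")
  .save()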
From Neo4j to Redshift
In the Databricks Runtime
In this case, a good starting point is the Databricks guide.
import org.apache.spark.sql.DataFrame

// Step (1)
// Load `:Person:Customer` nodes as DataFrame
val neo4jDF: DataFrame = spark.read.format("org.neo4j.spark.DataSource")
.option("url", "neo4j://<host>:<port>")
.option("labels", ":Person:Customer")
.load()
// Step (2)
// Save the `neo4jDF` as table CUSTOMER into Redshift
neo4jDF.write
.format("com.databricks.spark.redshift")
.option("url", "jdbc:redshift://<the-rest-of-the-connection-string>")
.option("dbtable", "CUSTOMER")
.option("tempdir", "s3a://<your-bucket>/<your-directory-path>")
.mode("error")
.save()
# Step (1)
# Load `:Person:Customer` nodes as DataFrame
neo4jDF = (spark.read.format("org.neo4j.spark.DataSource")
.option("url", "neo4j://<host>:<port>")
.option("labels", ":Person:Customer")
.load())
# Step (2)
# Save the `neo4jDF` as table CUSTOMER into Redshift
(neo4jDF.write
.format("com.databricks.spark.redshift")
.option("url", "jdbc:redshift://<the-rest-of-the-connection-string>")
.option("dbtable", "CUSTOMER")
.option("tempdir", "s3a://<your-bucket>/<your-directory-path>")
.mode("error")
.save())
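Loading by labels returns every property of the matched nodes. When the CUSTOMER table only needs some of them, the connector also accepts a Cypher query for the read; a sketch, where the name and age properties are assumptions about your graph:

// Read a projection of the nodes via Cypher instead of whole nodes.
val neo4jDF = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("url", "neo4j://<host>:<port>")
  .option("query", "MATCH (c:Person:Customer) RETURN c.name AS name, c.age AS age")
  .load()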
In any other Spark runtime with the Redshift community dependency
In this case, a good starting point is the Redshift community repository.
import org.apache.spark.sql.DataFrame

// Step (1)
// Load `:Person:Customer` nodes as DataFrame
val neo4jDF: DataFrame = spark.read.format("org.neo4j.spark.DataSource")
.option("url", "neo4j://<host>:<port>")
.option("labels", ":Person:Customer")
.load()
// Step (2)
// Save the `neo4jDF` as table CUSTOMER into Redshift
neo4jDF.write
.format("io.github.spark_redshift_community.spark.redshift")
.option("url", "jdbc:redshift://<the-rest-of-the-connection-string>")
.option("dbtable", "CUSTOMER")
.option("tempdir", "s3a://<your-bucket>/<your-directory-path>")
.mode("error")
.save()
# Step (1)
# Load `:Person:Customer` nodes as DataFrame
neo4jDF = (spark.read.format("org.neo4j.spark.DataSource")
.option("url", "neo4j://<host>:<port>")
.option("labels", ":Person:Customer")
.load())
# Step (2)
# Save the `neo4jDF` as table CUSTOMER into Redshift
(neo4jDF.write
.format("io.github.spark_redshift_community.spark.redshift")
.option("url", "jdbc:redshift://<the-rest-of-the-connection-string>")
.option("dbtable", "CUSTOMER")
.option("tempdir", "s3a://<your-bucket>/<your-directory-path>")
.mode("error")
.save())
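All of the examples above omit Neo4j authentication for brevity. Against a secured instance you would also pass the connector's basic-auth options; a minimal sketch with placeholder credentials:

// Basic-auth options for a secured Neo4j instance (placeholder values).
val neo4jDF = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("url", "neo4j://<host>:<port>")
  .option("authentication.basic.username", "<username>")
  .option("authentication.basic.password", "<password>")
  .option("labels", ":Person:Customer")
  .load()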