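Back up, aggregate, and restore (online)

To perform a backup, Neo4j uses the Admin Service, which is available only inside the Kubernetes cluster, and access to it should be protected. For more information, see Accessing Neo4j.

Prepare to back up a database to a cloud provider (AWS, GCP, and Azure) bucket

You can use the neo4j/neo4j-admin Helm chart to back up Neo4j databases to any cloud provider (AWS, GCP, and Azure) bucket. From Neo4j 5.10, the neo4j/neo4j-admin Helm chart also supports backing up multiple databases. From 5.13, it also supports workload identity integration for GCP, AWS, and Azure. From 5.14, it also supports MinIO (an AWS S3-compatible object storage API) for non-TLS/SSL endpoints.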

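Prerequisites

Before you back up a database and upload it to your bucket, verify that you have the following:

Create a Kubernetes secret

You can create a Kubernetes secret with the credentials that can access the cloud provider bucket using one of the following options:

GCP:

Create a secret named gcpcreds using your GCP service account JSON key file. The JSON key file contains all the details of the service account that has access to the bucket.

    kubectl create secret generic gcpcreds --from-file=credentials=/path/to/gcpcreds.json

AWS:

  1. Create a credentials file in the following format: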

    [ default ]
    region = us-east-1
    aws_access_key_id = <your-aws_access_key_id>
    aws_secret_access_key = <your-aws_secret_access_key>
  2. Create a secret named awscreds from the credentials file:

    kubectl create secret generic awscreds --from-file=credentials=/path/to/your/credentials

Azure:

  1. Create a credentials file in the following format:

    AZURE_STORAGE_ACCOUNT_NAME=<your-azure-storage-account-name>
    AZURE_STORAGE_ACCOUNT_KEY=<your-azure-storage-account-key>
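  2. Create a secret named azurecreds from the credentials file. The name must match the secretName value used in the backup-values.yaml file.

    kubectl create secret generic azurecreds --from-file=credentials=/path/to/your/credentials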
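
The same kind of secret can also be declared as a manifest and applied with kubectl apply -f, which may be more convenient when secrets are managed declaratively. The following is a minimal sketch for the AWS case and is not part of the chart: the placeholder values must be replaced, and the key under stringData is called credentials so that it matches the --from-file=credentials=... form used in the commands above. Because secrets are namespace-scoped, it should be created in the namespace where the backup chart is installed.

```yaml
# Hypothetical manifest, equivalent in effect to:
#   kubectl create secret generic awscreds --from-file=credentials=/path/to/your/credentials
apiVersion: v1
kind: Secret
metadata:
  name: awscreds            # referenced as backup.secretName
type: Opaque
stringData:
  # The key name "credentials" is what backup.secretKeyName refers to.
  credentials: |
    [ default ]
    region = us-east-1
    aws_access_key_id = <your-aws_access_key_id>
    aws_secret_access_key = <your-aws_secret_access_key>
```

Kubernetes base64-encodes stringData values on creation, so the resulting secret has the same shape as one created from a file.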

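Configure the backup parameters

You can configure the backup parameters in the backup-values.yaml file either by using the secretName and secretKeyName parameters or by mapping a Kubernetes service account to the workload identity integration.

The following examples show the minimum configuration required to perform a backup to a cloud provider bucket. For more information about the available backup parameters, see Backup parameters.

Configure the backup-values.yaml file using the secretName and secretKeyName parameters

GCP: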

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.10.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName:  "standalone-admin" #This is the Neo4j Admin Service name.
  database: "neo4j,system"
  cloudProvider: "gcp"
  secretName: "gcpcreds"
  secretKeyName: "credentials"

consistencyCheck:
  enabled: true

AWS:

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.10.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName:  "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "aws"
  secretName: "awscreds"
  secretKeyName: "credentials"

consistencyCheck:
  enabled: true

Azure:

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.10.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName:  "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "azure"
  secretName: "azurecreds"
  secretKeyName: "credentials"

consistencyCheck:
  enabled: true
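
The examples above use jobSchedule "* * * * *", which in standard cron syntax fires every minute. For anything beyond a quick test, a less frequent schedule is probably what you want. The fragment below is a sketch, assuming the chart passes jobSchedule unchanged to the schedule field of the Kubernetes CronJob it creates; the chosen time is only an example.

```yaml
neo4j:
  # Five-field cron expression: minute hour day-of-month month day-of-week.
  # "0 2 * * *" means 02:00 every day.
  jobSchedule: "0 2 * * *"
```

Merge this key into the neo4j section of your backup-values.yaml file rather than adding a second neo4j section.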

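Configure the backup-values.yaml file using service account workload identity integration

In some cases, it can be useful to assign a Kubernetes service account with workload identity integration to the Neo4j backup pod. This is particularly relevant when you want to improve security and have more precise access control for the pod. Doing so ensures that secure access to resources is granted based on the pod's identity within the cloud ecosystem. For more information about setting up a service account with workload identity, see Google Kubernetes Engine (GKE) → Use Workload Identity, Amazon EKS → Configure a Kubernetes service account to assume an IAM role, and Microsoft Azure → Use Microsoft Entra Workload ID with Azure Kubernetes Service (AKS).

To configure the Neo4j backup pod to use a Kubernetes service account with workload identity, set serviceAccountName to the name of the service account to use. For Azure deployments, you also need to set the azureStorageAccountName parameter to the name of the Azure storage account to which the backup files will be uploaded. For example:

GCP: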

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.13.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName:  "standalone-admin" #This is the Neo4j Admin Service name.
  database: "neo4j,system"
  cloudProvider: "gcp"
  secretName: ""
  secretKeyName: ""

consistencyCheck:
  enabled: true

serviceAccountName: "demo-service-account"

AWS:

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.13.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName:  "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "aws"
  secretName: ""
  secretKeyName: ""

consistencyCheck:
  enabled: true

serviceAccountName: "demo-service-account"

Azure:

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.13.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName:  "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "azure"
  azureStorageAccountName: "storageAccountName"

consistencyCheck:
  enabled: true

serviceAccountName: "demo-service-account"
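
The service account named in serviceAccountName has to exist before the chart is installed. As a rough sketch, such an account is usually an ordinary Kubernetes ServiceAccount carrying a provider-specific annotation. The annotation keys below are the ones used by GKE, EKS, and AKS workload identity; the account name is taken from the examples above, and every identifier value is a placeholder. Keep only the annotation for your provider.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: demo-service-account   # must match serviceAccountName in backup-values.yaml
  annotations:
    # GKE Workload Identity: the Google service account to impersonate.
    iam.gke.io/gcp-service-account: "<gsa-name>@<project-id>.iam.gserviceaccount.com"
    # Amazon EKS (IAM roles for service accounts): the IAM role to assume.
    # eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<role-name>"
    # AKS with Microsoft Entra Workload ID: the client ID of the identity.
    # azure.workload.identity/client-id: "<client-id>"
```

The cloud-side binding (IAM policy binding, role trust policy, or federated credential) is configured separately, as described in the provider guides referenced earlier.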

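The /backups mount created by default is an emptyDir type volume. This means that the data stored in this volume is not persistent and is lost when the pod is deleted. To use a persistent volume for backups, add the following section to the backup-values.yaml file: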

tempVolume:
  persistentVolumeClaim:
    claimName: backup-pvc

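You need to create the persistent volume and persistent volume claim before installing the neo4j-admin Helm chart. For more information, see Volume mounts and persistent volumes.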
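
As an illustration, a claim named backup-pvc could look like the following. This is a sketch under assumptions: the requested size and access mode are placeholders, and it relies on the cluster having a default StorageClass that provisions the volume dynamically. Without one, a matching PersistentVolume or an explicit storageClassName has to be provided.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-pvc            # referenced by tempVolume.persistentVolumeClaim.claimName
spec:
  accessModes:
    - ReadWriteOnce           # mounted read-write by a single node at a time
  resources:
    requests:
      storage: 10Gi           # placeholder; size it for your backup files
```

Create the claim in the same namespace as the backup release, since a pod can only mount claims from its own namespace.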

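Configure the backup-values.yaml file to use MinIO

This feature is available from Neo4j 5.14.

MinIO is an AWS S3-compatible object storage API. You can specify the minioEndpoint parameter in the backup-values.yaml file to push your backups to your MinIO bucket. This endpoint must be the S3 API endpoint, otherwise the backup Helm chart fails. Only non-TLS/SSL endpoints are supported. For example: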

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.14.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName:  "standalone-admin"
  minioEndpoint: "http://demo.minio.svc.cluster.local:9000"
  database: "neo4j,system"
  cloudProvider: "aws"
  secretName: "awscreds"
  secretKeyName: "credentials"

consistencyCheck:
  enabled: true

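Prepare to back up a database to local storage

This feature is available from Neo4j 5.16.

You can use the neo4j/neo4j-admin Helm chart to back up Neo4j databases to local storage. When configuring the backup-values.yaml file, leave the cloudProvider field empty and provide a persistent volume in the tempVolume section so that the backup files persist if the pod is deleted.

You need to create the persistent volume and persistent volume claim before installing the neo4j-admin Helm chart. For more information, see Volume mounts and persistent volumes.

For example: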

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.16.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName:  "standalone-admin"
  database: "neo4j,system"
  cloudProvider: ""

consistencyCheck:
  enabled: true

tempVolume:
  persistentVolumeClaim:
    claimName: backup-pvc

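Backup parameters

To see which options are configurable on the Helm chart, use helm show values with the Helm chart neo4j/neo4j-admin.
From Neo4j 5.10, the neo4j/neo4j-admin Helm chart also supports assigning your Neo4j pods to specific nodes using nodeSelector labels, and from Neo4j 5.11, using affinity/anti-affinity rules or tolerations. For more information, see Assigning backup pods to specific nodes and the Kubernetes official documentation on Affinity and anti-affinity rules and Taints and tolerations.

For example: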

helm show values neo4j/neo4j-admin
## @param nameOverride String to partially override common.names.fullname
nameOverride: ""
## @param fullnameOverride String to fully override common.names.fullname
fullnameOverride: ""
# disableLookups will disable all the lookups done in the helm charts
# This should be set to true when using ArgoCD since ArgoCD uses helm template and the helm lookups will fail
# You can enable this when executing helm commands with --dry-run command
disableLookups: false

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.21.0"
  podLabels: {}
#    app: "demo"
#    acac: "dcdddc"
  podAnnotations: {}
#    ssdvvs: "svvvsvs"
#    vfsvswef: "vcfvgb"
  # define the backup job schedule . default is * * * * *
  jobSchedule: ""
  # default is 3
  successfulJobsHistoryLimit:
  # default is 1
  failedJobsHistoryLimit:
  # default is 3
  backoffLimit:
  #add labels if required
  labels: {}

backup:
  # Ensure the bucket is already existing in the respective cloud provider
  # In case of azure the bucket is the container name in the storage account
  # bucket: azure-storage-container
  bucketName: ""

  #address details of the neo4j instance from which backup is to be done (serviceName or ip either one is required)

  #ex: standalone-admin.default.svc.cluster.local:6362
  # admin service name -  standalone-admin
  # namespace - default
  # cluster domain - cluster.local
  # port - 6362

  #ex: 10.3.3.2:6362
  # admin service ip - 10.3.3.2
  # port - 6362

  databaseAdminServiceName: ""
  databaseAdminServiceIP: ""
  #default name is 'default'
  databaseNamespace: ""
  #default port is 6362
  databaseBackupPort: ""
  #default value is cluster.local
  databaseClusterDomain: ""
  # specify minio endpoint ex: http://demo.minio.svc.cluster.local:9000
  # please ensure this endpoint is the s3 api endpoint or else the backup helm chart will fail
  # as of now it works only with non tls endpoints
  # to be used only when aws is used as cloudProvider
  minioEndpoint: ""

  #name of the database to backup ex: neo4j or neo4j,system (You can provide comma separated database names)
  # In case of comma separated databases failure of any single database will lead to failure of complete operation
  database: ""
  # cloudProvider can be either gcp, aws, or azure
  # if cloudProvider is empty then the backup will be done to the /backups mount.
  # the /backups mount can point to a persistentVolume based on the definition set in tempVolume
  cloudProvider: ""



  # name of the kubernetes secret containing the respective cloud provider credentials
  # Ensure you have read,write access to the mentioned bucket
  # For AWS :
  # add the below in a file and create a secret via
  # 'kubectl create secret generic awscred --from-file=credentials=/demo/awscredentials'

  #  [ default ]
  #  region = us-east-1
  #  aws_access_key_id = XXXXX
  #  aws_secret_access_key = XXXX

  # For AZURE :
  # add the storage account name and key in below format in a file create a secret via
  # 'kubectl create secret generic azurecred --from-file=credentials=/demo/azurecredentials'

  #  AZURE_STORAGE_ACCOUNT_NAME=XXXX
  #  AZURE_STORAGE_ACCOUNT_KEY=XXXX

  # For GCP :
  # create the secret via the gcp service account json key file.
  # ex: 'kubectl create secret generic gcpcred --from-file=credentials=/demo/gcpcreds.json'
  secretName: ""
  # provide the keyname used in the above secret
  secretKeyName: ""
  # provide the azure storage account name
  # this to be provided when you are using workload identity integration for azure
  azureStorageAccountName: ""
  #setting this to true will not delete the backup files generated at the /backups mount
  keepBackupFiles: true

  #Below are all neo4j-admin database backup flags / options
  #To know more about the flags read here : https://neo4j.ac.cn/docs/operations-manual/current/backup-restore/online-backup/
  pageCache: ""
  includeMetadata: "all"
  type: "AUTO"
  keepFailed: false
  parallelRecovery: false
  verbose: true
  heapSize: ""

  # https://neo4j.ac.cn/docs/operations-manual/current/backup-restore/aggregate/
  # Performs aggregate backup. If enabled, NORMAL BACKUP WILL NOT BE DONE only aggregate backup
  # fromPath supports only s3 or local mount. For s3 , please set cloudProvider to aws and use either serviceAccount or creds
  aggregate:
    enabled: false
    verbose: true
    keepOldBackup: false
    parallelRecovery: false
    # Only AWS S3 or local mount paths are supported
    # For S3 provide the complete path , Ex: s3://bucket1/bucket2
    fromPath: ""
    # database name to aggregate. Can contain * and ? for globbing.
    database: ""

#Below are all neo4j-admin database check flags / options
#To know more about the flags read here : https://neo4j.ac.cn/docs/operations-manual/current/tools/neo4j-admin/consistency-checker/
consistencyCheck:
  enable: false
  checkIndexes: true
  checkGraph: true
  checkCounts: true
  checkPropertyOwners: true
  #The database name for which consistency check needs to be done.
  #Defaults to the backup.database values if left empty
  #The database name here should match with one of the database names present in backup.database. If not , the consistency check will be ignored
  database: ""
  maxOffHeapMemory: ""
  threads: ""
  verbose: true

# Set to name of an existing Service Account to use if desired
# Follow the following links for setting up a service account with workload identity
# Azure - https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview?tabs=go
# GCP - https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
# AWS - https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html
serviceAccountName: ""

# Volume to use as temporary storage for files before they are uploaded to cloud. For large databases local storage may not have sufficient space.
# In that case set an ephemeral or persistent volume with sufficient space here
# The chart defaults to an emptyDir, use this to overwrite default behavior
#tempVolume:
#  persistentVolumeClaim:
#    claimName: backup-pvc

# securityContext defines privilege and access control settings for a Pod. Making sure that we don't run Neo4j as root user.
securityContext:
  runAsNonRoot: true
  runAsUser: 7474
  runAsGroup: 7474
  fsGroup: 7474
  fsGroupChangePolicy: "Always"

# default ephemeral storage of backup container
resources:
  requests:
    ephemeralStorage: "4Gi"
    cpu: ""
    memory: ""
  limits:
    ephemeralStorage: "5Gi"
    cpu: ""
    memory: ""

# nodeSelector labels
# please ensure the respective labels are present on one of nodes or else helm charts will throw an error
nodeSelector: {}
#  label1: "true"
#  label2: "value1"

# set backup pod affinity
affinity: {}
#  podAffinity:
#    requiredDuringSchedulingIgnoredDuringExecution:
#      - labelSelector:
#          matchExpressions:
#            - key: security
#              operator: In
#              values:
#                - S1
#        topologyKey: topology.kubernetes.io/zone
#  podAntiAffinity:
#    preferredDuringSchedulingIgnoredDuringExecution:
#      - weight: 100
#        podAffinityTerm:
#          labelSelector:
#            matchExpressions:
#              - key: security
#                operator: In
#                values:
#                  - S2
#          topologyKey: topology.kubernetes.io/zone

#Add tolerations to the Neo4j pod
tolerations: []
#  - key: "key1"
#    operator: "Equal"
#    value: "value1"
#    effect: "NoSchedule"
#  - key: "key2"
#    operator: "Equal"
#    value: "value2"
#    effect: "NoSchedule"
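
The address comments in the values above suggest that the backup target is assembled from the admin service name, namespace, cluster domain, and port. The fragment below sketches how those parts might be overridden for a Neo4j instance running in another namespace. The namespace neo4j-prod is hypothetical, and the resolved address in the last comment is an expectation derived from those comments, not verified chart behavior.

```yaml
backup:
  databaseAdminServiceName: "standalone-admin"
  databaseNamespace: "neo4j-prod"          # hypothetical; the default is "default"
  databaseClusterDomain: "cluster.local"   # default, shown explicitly
  databaseBackupPort: "6362"               # default, shown explicitly
  # Expected target: standalone-admin.neo4j-prod.svc.cluster.local:6362
```

Alternatively, databaseAdminServiceIP can be set instead of the service name, since the values state that either one is required.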

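Back up your database(s)

To back up your database(s), install the neo4j-admin Helm chart using the configured backup-values.yaml file.

  1. Install the neo4j-admin Helm chart using the backup-values.yaml file:

    helm install backup-name neo4j/neo4j-admin -f /path/to/your/backup-values.yaml

    The neo4j/neo4j-admin Helm chart installs a cron job that launches a pod based on the job schedule. This pod performs a backup of one or more databases, runs a consistency check of the backup file(s), and uploads them to the cloud provider bucket.

  2. Monitor the backup pod logs using kubectl logs pod/<neo4j-backup-pod-name> to check the progress of the backup.

  3. Check that the backup files and the consistency check reports have been uploaded to the cloud provider bucket or local storage.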

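Aggregate a database backup chain

The aggregate backup command turns a backup chain into a single backup file. This is useful when you have a backup chain that you want to restore to a different cluster, or when you want to archive a backup chain. For more information about the benefits of the aggregate backup chain operation, its syntax, and the available options, see Aggregate a database backup chain.

The neo4j-admin Helm chart supports aggregating a backup chain stored in an AWS S3 bucket or on a local mount. If enabled, the normal backup is not performed; only the aggregate backup is.

  1. To aggregate a backup chain stored in an AWS S3 bucket or on a local mount, provide the following information in your backup-values.yaml file:

    If your backup chain is stored on AWS S3, you need to set cloudProvider to aws and use either creds or serviceAccount to connect to your AWS S3 bucket. For example:

    Connect to your AWS S3 bucket using the awscreds secret

    neo4j: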
      image: "neo4j/helm-charts-backup"
      imageTag: "5.21.0"
      jobSchedule: "* * * * *"
      successfulJobsHistoryLimit: 3
      failedJobsHistoryLimit: 1
      backoffLimit: 3
    
    backup:
    
      cloudProvider: "aws"
      secretName: "awscreds"
      secretKeyName: "credentials"
    
      aggregate:
        enabled: true
        verbose: false
        keepOldBackup: false
        parallelRecovery: false
        fromPath: "s3://bucket1/bucket2"
        # Database name to aggregate. Can contain * and ? for globbing.
        database: "neo4j"
    
    resources:
      requests:
        ephemeralStorage: "4Gi"
      limits:
        ephemeralStorage: "5Gi"

    Connect to your AWS S3 bucket using serviceAccount

    neo4j:
      image: "neo4j/helm-charts-backup"
      imageTag: "5.21.0"
      jobSchedule: "* * * * *"
      successfulJobsHistoryLimit: 3
      failedJobsHistoryLimit: 1
      backoffLimit: 3
    
    backup:
    
      cloudProvider: "aws"
    
      aggregate:
        enabled: true
        verbose: false
        keepOldBackup: false
        parallelRecovery: false
        fromPath: "s3://bucket1/bucket2"
        # Database name to aggregate. Can contain * and ? for globbing.
        database: "neo4j"
    
    #The service account must already exist in your cloud provider account and have the necessary permissions to manage your S3 bucket, as well as to download and upload files. See the example policy below.
    #{
    #   "Version": "2012-10-17",
    #    "Id": "Neo4jBackupAggregatePolicy",
    #    "Statement": [
    #        {
    #            "Sid": "Neo4jBackupAggregateStatement",
    #            "Effect": "Allow",
    #            "Action": [
    #                "s3:ListBucket",
    #                "s3:GetObject",
    #                "s3:PutObject",
    #                "s3:DeleteObject"
    #            ],
    #            "Resource": [
    #                "arn:aws:s3:::mybucket/*",
    #                "arn:aws:s3:::mybucket"
    #            ]
    #        }
    #    ]
    #}
    serviceAccountName: "my-service-account"
    
    resources:
      requests:
        ephemeralStorage: "4Gi"
      limits:
        ephemeralStorage: "5Gi"

    Aggregate a backup chain stored on a local mount

    neo4j:
      image: "neo4j/helm-charts-backup"
      imageTag: "5.21.0"
      successfulJobsHistoryLimit: 1
      failedJobsHistoryLimit: 1
      backoffLimit: 1
    
    backup:
    
      aggregate:
        enabled: true
        verbose: false
        keepOldBackup: false
        parallelRecovery: false
        fromPath: "/backups"
        # Database name to aggregate. Can contain * and ? for globbing.
        database: "neo4j"
    
    tempVolume:
      persistentVolumeClaim:
        claimName: aggregate-pv-pvc
    
    resources:
      requests:
        ephemeralStorage: "4Gi"
      limits:
        ephemeralStorage: "5Gi"
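  2. Install the neo4j-admin Helm chart using the configured backup-values.yaml file:

    helm install backup-name neo4j/neo4j-admin -f /path/to/your/backup-values.yaml
  3. Monitor the pod logs using kubectl logs pod/<neo4j-aggregate-backup-pod-name> to check the progress of the aggregate backup operation.

  4. Verify that the aggregated backup file has replaced the backup chain in the cloud provider bucket or on local storage.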

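Restore a single database

To restore a single offline database or a database backup, you first need to delete the database that you want to replace, unless you want to restore the backup as an additional database in your DBMS. Then, use the restore command of neo4j-admin to restore the database backup. Finally, use the Cypher command CREATE DATABASE name to create the restored database in the system database.

Delete the database that you want to replace

Before you restore the database backup, you have to delete the database that you want to replace with that backup, using the Cypher command DROP DATABASE name against the system database. If you want to restore the backup as an additional database in your DBMS, you can proceed to the next section.

For Neo4j cluster deployments, you run the Cypher command DROP DATABASE name on only one of the cluster servers. The command is automatically routed from there to the other cluster members.

  1. Connect to the Neo4j DBMS:

    kubectl exec -it <release-name>-0 -- bash
  2. Connect to the system database using cypher-shell:

    cypher-shell -u neo4j -p <password> -d system
  3. Delete the database that you want to replace with the backup:

    DROP DATABASE neo4j;
  4. Exit the Cypher Shell command-line console:

    :exit;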

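Restore the database backup

Use the neo4j-admin database restore command to restore the database backup, and then use the Cypher command CREATE DATABASE name to create the restored database in the system database. For information about the command syntax, options, and usage, see Restore a database backup.

For Neo4j cluster deployments, restore the database backup on each cluster server.

  1. Run the neo4j-admin database restore command to restore the database backup:

    neo4j-admin database restore neo4j --from-path=/backups/neo4j --expand-commands
  2. Connect to the system database using cypher-shell:

    cypher-shell -u neo4j -p <password> -d system
  3. Create the neo4j database.

    For Neo4j cluster deployments, you run the Cypher command CREATE DATABASE name on only one of the cluster servers.

    CREATE DATABASE neo4j;
  4. Open the browser at http://<external-ip>:7474/browser/ and check that all data has been successfully restored.

  5. Execute a Cypher command against the neo4j database, for example:

    MATCH (n) RETURN n

    If you have backed up your database with the --include-metadata option, you can manually restore the users and roles metadata. For more information, see Restore a database backup → Example.

To restore the system database, follow the steps described in Dump and load databases (offline).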