Back up, aggregate, and restore (online)
To perform a backup, Neo4j uses the Admin Service, which is available only within the Kubernetes cluster, and access to it should be secured. For more information, see Accessing Neo4j.
Prepare to back up a database to a cloud provider (AWS, GCP, and Azure) bucket
You can back up your Neo4j database to a bucket in any of the cloud providers (AWS, GCP, and Azure) using the neo4j/neo4j-admin Helm chart. Starting from Neo4j 5.10, the neo4j/neo4j-admin Helm chart also supports backing up multiple databases. From 5.13 onwards, the neo4j/neo4j-admin Helm chart also supports workload identity integration for GCP, AWS, and Azure. From 5.14 onwards, the neo4j/neo4j-admin Helm chart also supports MinIO (an AWS S3-compatible object store API) for non-TLS/SSL endpoints.
Prerequisites

Before you back up your database and upload it to a bucket, verify that you have the following:
- A cloud provider bucket (AWS, GCP, or Azure) with read and write permissions, so that backups can be uploaded.
- Credentials to access the cloud provider bucket, such as a service account JSON key file for GCP, a credentials file for AWS, or storage account credentials for Azure.
- A service account with workload identity, if you want to use workload identity integration to access the cloud provider bucket.
  - For more information about setting up a service account with workload identity on GCP and AWS, see Google Kubernetes Engine (GKE) → Use Workload Identity and Amazon EKS → Configure a Kubernetes service account to assume an IAM role.
  - For more information about setting up an Azure storage account with workload identity, see Microsoft Azure → Use Microsoft Entra Workload ID with Azure Kubernetes Service (AKS).
- A Kubernetes cluster running on one of the cloud providers, with the Neo4j Helm chart installed. For more information, see Quickstart: Deploy a standalone instance or Quickstart: Deploy a cluster.
- A MinIO server (an AWS S3-compatible object store API), if you want to push backups to a MinIO bucket. For more information, see the MinIO official documentation.
- The latest Neo4j Helm chart. You can update the repository to get the latest chart with:
helm repo update
Create a Kubernetes secret

You can create a Kubernetes secret with the credentials that can access the cloud provider bucket using one of the following options:
Create a secret named gcpcreds using your GCP service account JSON key file. The JSON key file contains all the details of the service account that has access to the bucket:
kubectl create secret generic gcpcreds --from-file=credentials=/path/to/gcpcreds.json
- Create a credentials file in the following format:

[default]
region = us-east-1
aws_access_key_id = <your-aws_access_key_id>
aws_secret_access_key = <your-aws_secret_access_key>
- Create a secret named awscreds from the credentials file:

kubectl create secret generic awscreds --from-file=credentials=/path/to/your/credentials
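The two AWS steps above can be combined into a short shell sketch. The /tmp path is an arbitrary choice for illustration, and the key values are placeholders, not real credentials:

```shell
# Write an AWS credentials file in the format the chart expects.
# The access key values here are placeholders; substitute your own.
CREDS_FILE=/tmp/neo4j-backup-awscreds

cat > "$CREDS_FILE" <<'EOF'
[default]
region = us-east-1
aws_access_key_id = <your-aws_access_key_id>
aws_secret_access_key = <your-aws_secret_access_key>
EOF

# Create the Kubernetes secret from the file (requires cluster access,
# so it is shown commented out here):
# kubectl create secret generic awscreds --from-file=credentials="$CREDS_FILE"
echo "wrote $CREDS_FILE"
```

The secret key name defaults to the key used with --from-file (here, credentials), which is the value later referenced by secretKeyName in backup-values.yaml.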
- Create a credentials file in the following format:

AZURE_STORAGE_ACCOUNT_NAME=<your-azure-storage-account-name>
AZURE_STORAGE_ACCOUNT_KEY=<your-azure-storage-account-key>
- Create a secret named azurecred from the credentials file:

kubectl create secret generic azurecred --from-file=credentials=/path/to/your/credentials
Configure the backup parameters

You can configure the backup parameters in the backup-values.yaml file either by using the secretName and secretKeyName parameters or by mapping a Kubernetes service account to the workload identity integration.

The following examples show the minimum configuration required to perform a backup to a cloud provider bucket. For more information about the available backup parameters, see Backup parameters.
Configure the backup-values.yaml file using the secretName and secretKeyName parameters
GCP:
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.10.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin" # This is the Neo4j Admin Service name.
  database: "neo4j,system"
  cloudProvider: "gcp"
  secretName: "gcpcreds"
  secretKeyName: "credentials"
consistencyCheck:
  enabled: true
AWS:
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.10.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "aws"
  secretName: "awscreds"
  secretKeyName: "credentials"
consistencyCheck:
  enabled: true
Azure:
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.10.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "azure"
  secretName: "azurecred"
  secretKeyName: "credentials"
consistencyCheck:
  enabled: true
Configure the backup-values.yaml file using a service account with workload identity integration

In some scenarios, it can be useful to assign a Kubernetes service account with workload identity integration to the Neo4j backup pod. This is particularly important when you want to improve security and have more precise access control for the pod. Doing so ensures secure access to resources based on the pod's identity within the cloud ecosystem. For more information about setting up a service account with workload identity, see Google Kubernetes Engine (GKE) → Use Workload Identity, Amazon EKS → Configure a Kubernetes service account to assume an IAM role, and Microsoft Azure → Use Microsoft Entra Workload ID with Azure Kubernetes Service (AKS).

To configure the Neo4j backup pod to use a Kubernetes service account with workload identity, set serviceAccountName to the name of the service account you want to use. For Azure deployments, you also need to set the azureStorageAccountName parameter to the name of the Azure storage account to which the backup files will be uploaded. For example:
GCP:
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.13.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin" # This is the Neo4j Admin Service name.
  database: "neo4j,system"
  cloudProvider: "gcp"
  secretName: ""
  secretKeyName: ""
consistencyCheck:
  enabled: true
serviceAccountName: "demo-service-account"
AWS:
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.13.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "aws"
  secretName: ""
  secretKeyName: ""
consistencyCheck:
  enabled: true
serviceAccountName: "demo-service-account"
Azure:
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.13.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "azure"
  azureStorageAccountName: "storageAccountName"
consistencyCheck:
  enabled: true
serviceAccountName: "demo-service-account"
The /backups mount created by default is an emptyDir type volume. This means that the data stored in this volume is not persistent and will be lost when the pod is deleted. To use a persistent volume for backups, add the following section to the backup-values.yaml file:

tempVolume:
  persistentVolumeClaim:
    claimName: backup-pvc
You must create the persistent volume and persistent volume claim before installing the neo4j-admin Helm chart. For more information, see Volume mounts and persistent volumes.
Configure the backup-values.yaml file to use MinIO

This feature is available starting from Neo4j 5.14.

MinIO is an AWS S3-compatible object store API. You can specify the minioEndpoint parameter in the backup-values.yaml file to push backups to your MinIO bucket. This endpoint must be an s3 API endpoint, or the backup Helm chart will fail. Only non-TLS/SSL endpoints are supported. For example:
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.14.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin"
  minioEndpoint: "http://demo.minio.svc.cluster.local:9000"
  database: "neo4j,system"
  cloudProvider: "aws"
  secretName: "awscreds"
  secretKeyName: "credentials"
consistencyCheck:
  enabled: true
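Because the chart fails on TLS endpoints, a quick local check of the endpoint value before installing can save a failed run. This is a hypothetical helper, not part of the chart:

```shell
# The backup chart only accepts plain-HTTP (non-TLS) s3 API endpoints for MinIO.
# Reject https:// values before they reach backup-values.yaml.
MINIO_ENDPOINT="http://demo.minio.svc.cluster.local:9000"

case "$MINIO_ENDPOINT" in
  https://*) echo "unsupported: TLS endpoints are not accepted"; exit 1 ;;
  http://*)  echo "ok" ;;
  *)         echo "unsupported: expected an http:// s3 API endpoint"; exit 1 ;;
esac
```

Note that this only validates the URL scheme; whether the endpoint actually serves the s3 API is still up to your MinIO deployment.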
Prepare to back up a database to local storage

This feature is available starting from Neo4j 5.16.

You can back up your Neo4j database to local storage using the neo4j/neo4j-admin Helm chart. When configuring the backup-values.yaml file, leave the cloudProvider field empty and provide a persistent volume in the tempVolume section to ensure the backup files persist if the pod is deleted.
You must create the persistent volume and persistent volume claim before installing the neo4j-admin Helm chart. For more information, see Volume mounts and persistent volumes.

For example:
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.16.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin"
  database: "neo4j,system"
  cloudProvider: ""
consistencyCheck:
  enabled: true
tempVolume:
  persistentVolumeClaim:
    claimName: backup-pvc
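Creating the claim referenced above has to happen before the chart is installed. A minimal sketch of such a claim is shown below; the claim name matches the example, but the storage class, access mode, and size are assumptions you must adapt to your cluster:

```yaml
# Hypothetical PVC backing the /backups mount; create it before installing
# the neo4j-admin chart. Adjust storageClassName and storage size as needed.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard   # assumption: a dynamic-provisioning class exists
  resources:
    requests:
      storage: 10Gi
```

Apply it with kubectl apply -f backup-pvc.yaml, or create a matching PersistentVolume first if your cluster does not provision volumes dynamically.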
Backup parameters

To see which options are configurable on the Helm chart, use helm show values with the neo4j/neo4j-admin Helm chart.

Starting from Neo4j 5.10, the neo4j/neo4j-admin Helm chart also supports assigning your Neo4j pods to specific nodes using nodeSelector labels, and from Neo4j 5.11, affinity/anti-affinity rules or tolerations. For more information, see Assign backup pods to specific nodes, and the Kubernetes official documentation on affinity and anti-affinity rules and on taints and tolerations.

For example:
helm show values neo4j/neo4j-admin
## @param nameOverride String to partially override common.names.fullname
nameOverride: ""
## @param fullnameOverride String to fully override common.names.fullname
fullnameOverride: ""
# disableLookups will disable all the lookups done in the helm charts
# This should be set to true when using ArgoCD since ArgoCD uses helm template and the helm lookups will fail
# You can enable this when executing helm commands with --dry-run command
disableLookups: false

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.21.0"
  podLabels: {}
  #  app: "demo"
  #  acac: "dcdddc"
  podAnnotations: {}
  #  ssdvvs: "svvvsvs"
  #  vfsvswef: "vcfvgb"
  # define the backup job schedule. default is * * * * *
  jobSchedule: ""
  # default is 3
  successfulJobsHistoryLimit:
  # default is 1
  failedJobsHistoryLimit:
  # default is 3
  backoffLimit:
  # add labels if required
  labels: {}

backup:
  # Ensure the bucket already exists in the respective cloud provider
  # In case of azure the bucket is the container name in the storage account
  # bucket: azure-storage-container
  bucketName: ""

  # address details of the neo4j instance from which the backup is to be taken (either serviceName or ip is required)
  # ex: standalone-admin.default.svc.cluster.local:6362
  #   admin service name - standalone-admin
  #   namespace - default
  #   cluster domain - cluster.local
  #   port - 6362
  # ex: 10.3.3.2:6362
  #   admin service ip - 10.3.3.2
  #   port - 6362
  databaseAdminServiceName: ""
  databaseAdminServiceIP: ""
  # default name is 'default'
  databaseNamespace: ""
  # default port is 6362
  databaseBackupPort: ""
  # default value is cluster.local
  databaseClusterDomain: ""

  # specify the minio endpoint, ex: http://demo.minio.svc.cluster.local:9000
  # please ensure this endpoint is the s3 API endpoint or else the backup helm chart will fail
  # as of now it works only with non-TLS endpoints
  # to be used only when aws is used as cloudProvider
  minioEndpoint: ""

  # name of the database(s) to back up, ex: neo4j or neo4j,system (you can provide comma-separated database names)
  # If comma-separated database names are provided, the failure of any single database causes the complete operation to fail
  database: ""

  # cloudProvider can be either gcp, aws, or azure
  # if cloudProvider is empty then the backup is written to the /backups mount.
  # the /backups mount can point to a persistentVolume based on the definition set in tempVolume
  cloudProvider: ""

  # name of the kubernetes secret containing the respective cloud provider credentials
  # Ensure you have read and write access to the mentioned bucket
  # For AWS :
  # add the below in a file and create a secret via
  # 'kubectl create secret generic awscred --from-file=credentials=/demo/awscredentials'
  # [ default ]
  # region = us-east-1
  # aws_access_key_id = XXXXX
  # aws_secret_access_key = XXXX
  # For AZURE :
  # add the storage account name and key in the below format in a file and create a secret via
  # 'kubectl create secret generic azurecred --from-file=credentials=/demo/azurecredentials'
  # AZURE_STORAGE_ACCOUNT_NAME=XXXX
  # AZURE_STORAGE_ACCOUNT_KEY=XXXX
  # For GCP :
  # create the secret via the gcp service account json key file.
  # ex: 'kubectl create secret generic gcpcred --from-file=credentials=/demo/gcpcreds.json'
  secretName: ""
  # provide the key name used in the above secret
  secretKeyName: ""

  # provide the azure storage account name
  # this is to be provided when you are using workload identity integration for azure
  azureStorageAccountName: ""

  # setting this to true will not delete the backup files generated at the /backups mount
  keepBackupFiles: true

  # Below are all neo4j-admin database backup flags / options
  # To know more about the flags, read here: https://neo4j.ac.cn/docs/operations-manual/current/backup-restore/online-backup/
  pageCache: ""
  includeMetadata: "all"
  type: "AUTO"
  keepFailed: false
  parallelRecovery: false
  verbose: true
  heapSize: ""

  # https://neo4j.ac.cn/docs/operations-manual/current/backup-restore/aggregate/
  # Performs an aggregate backup. If enabled, a NORMAL BACKUP IS NOT performed; only the aggregate backup runs
  # fromPath supports only s3 or a local mount. For s3, set cloudProvider to aws and use either a serviceAccount or creds
  aggregate:
    enabled: false
    verbose: true
    keepOldBackup: false
    parallelRecovery: false
    # Only AWS S3 or local mount paths are supported
    # For S3 provide the complete path, ex: s3://bucket1/bucket2
    fromPath: ""
    # database name to aggregate. Can contain * and ? for globbing.
    database: ""

# Below are all neo4j-admin database check flags / options
# To know more about the flags, read here: https://neo4j.ac.cn/docs/operations-manual/current/tools/neo4j-admin/consistency-checker/
consistencyCheck:
  enable: false
  checkIndexes: true
  checkGraph: true
  checkCounts: true
  checkPropertyOwners: true
  # The database name for which the consistency check needs to be done.
  # Defaults to the backup.database values if left empty
  # The database name here should match one of the database names present in backup.database. If not, the consistency check is ignored
  database: ""
  maxOffHeapMemory: ""
  threads: ""
  verbose: true

# Set to the name of an existing Service Account to use if desired
# Follow these links for setting up a service account with workload identity
# Azure - https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview?tabs=go
# GCP - https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
# AWS - https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html
serviceAccountName: ""

# Volume to use as temporary storage for files before they are uploaded to the cloud. For large databases, local storage may not have sufficient space.
# In that case, set an ephemeral or persistent volume with sufficient space here
# The chart defaults to an emptyDir; use this to overwrite the default behavior
#tempVolume:
#  persistentVolumeClaim:
#    claimName: backup-pvc

# securityContext defines privilege and access control settings for a Pod. Making sure that we don't run Neo4j as the root user.
securityContext:
  runAsNonRoot: true
  runAsUser: 7474
  runAsGroup: 7474
  fsGroup: 7474
  fsGroupChangePolicy: "Always"

# default ephemeral storage of the backup container
resources:
  requests:
    ephemeralStorage: "4Gi"
    cpu: ""
    memory: ""
  limits:
    ephemeralStorage: "5Gi"
    cpu: ""
    memory: ""

# nodeSelector labels
# please ensure the respective labels are present on one of the nodes or else the helm chart will throw an error
nodeSelector: {}
#  label1: "true"
#  label2: "value1"

# set backup pod affinity
affinity: {}
#  podAffinity:
#    requiredDuringSchedulingIgnoredDuringExecution:
#      - labelSelector:
#          matchExpressions:
#            - key: security
#              operator: In
#              values:
#                - S1
#        topologyKey: topology.kubernetes.io/zone
#  podAntiAffinity:
#    preferredDuringSchedulingIgnoredDuringExecution:
#      - weight: 100
#        podAffinityTerm:
#          labelSelector:
#            matchExpressions:
#              - key: security
#                operator: In
#                values:
#                  - S2
#          topologyKey: topology.kubernetes.io/zone

# Add tolerations to the Neo4j pod
tolerations: []
#  - key: "key1"
#    operator: "Equal"
#    value: "value1"
#    effect: "NoSchedule"
#  - key: "key2"
#    operator: "Equal"
#    value: "value2"
#    effect: "NoSchedule"
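The databaseAdminService* parameters above combine into the address the backup job dials. A small shell sketch showing how the documented defaults compose into the example address from the comments:

```shell
# Compose the Neo4j Admin Service address from the chart parameters.
# Values mirror the documented defaults and the 'standalone-admin' example.
ADMIN_SERVICE="standalone-admin"   # backup.databaseAdminServiceName
NAMESPACE="default"                # backup.databaseNamespace (default: default)
CLUSTER_DOMAIN="cluster.local"     # backup.databaseClusterDomain (default: cluster.local)
BACKUP_PORT="6362"                 # backup.databaseBackupPort (default: 6362)

ADMIN_ADDRESS="${ADMIN_SERVICE}.${NAMESPACE}.svc.${CLUSTER_DOMAIN}:${BACKUP_PORT}"
echo "$ADMIN_ADDRESS"   # standalone-admin.default.svc.cluster.local:6362
```

If you set databaseAdminServiceIP instead, the address is simply ip:port (for example, 10.3.3.2:6362) and the namespace and cluster domain parameters do not apply.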
Back up your database

To back up your database(s), install the neo4j-admin Helm chart using the configured backup-values.yaml file:

1. Install the neo4j-admin Helm chart with the backup-values.yaml file:

helm install backup-name neo4j-admin -f /path/to/your/backup-values.yaml

The neo4j/neo4j-admin Helm chart installs a cron job, which launches a pod based on the job schedule. This pod performs a backup of one or more databases, runs a consistency check of the backup files, and uploads them to the cloud provider bucket.

2. Monitor the backup pod logs using kubectl logs pod/<neo4j-backup-pod-name> to check the progress of the backup.

3. Check that the backup files and the consistency check reports have been uploaded to the cloud provider bucket or local storage.
Aggregate a database backup chain

The aggregate backup command turns a backup chain into a single backup file. This is useful when you have a backup chain that you want to restore to a different cluster, or when you want to archive a backup chain. For more information about the benefits of the aggregate backup chain operation, its syntax, and available options, see Aggregate a database backup chain.

The neo4j-admin Helm chart supports aggregating backup chains stored in an AWS S3 bucket or a local mount. If enabled, a normal backup is not performed; only the aggregate backup runs.
- To aggregate a backup chain stored in an AWS S3 bucket or a local mount, provide the following information in your backup-values.yaml file.

If your backup chain is stored on AWS S3, set cloudProvider to aws and use either creds or a serviceAccount to connect to your AWS S3 bucket. For example:

Using the awscreds secret to connect to your AWS S3 bucket:

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.21.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  cloudProvider: "aws"
  secretName: "awscreds"
  secretKeyName: "credentials"
  aggregate:
    enabled: true
    verbose: false
    keepOldBackup: false
    parallelRecovery: false
    fromPath: "s3://bucket1/bucket2"
    # Database name to aggregate. Can contain * and ? for globbing.
    database: "neo4j"
resources:
  requests:
    ephemeralStorage: "4Gi"
  limits:
    ephemeralStorage: "5Gi"
Using a serviceAccount to connect to your AWS S3 bucket:

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.21.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  cloudProvider: "aws"
  aggregate:
    enabled: true
    verbose: false
    keepOldBackup: false
    parallelRecovery: false
    fromPath: "s3://bucket1/bucket2"
    # Database name to aggregate. Can contain * and ? for globbing.
    database: "neo4j"
#The service account must already exist in your cloud provider account and have the necessary permissions to manage your S3 bucket, as well as to download and upload files. See the example policy below.
#{
#  "Version": "2012-10-17",
#  "Id": "Neo4jBackupAggregatePolicy",
#  "Statement": [
#    {
#      "Sid": "Neo4jBackupAggregateStatement",
#      "Effect": "Allow",
#      "Action": [
#        "s3:ListBucket",
#        "s3:GetObject",
#        "s3:PutObject",
#        "s3:DeleteObject"
#      ],
#      "Resource": [
#        "arn:aws:s3:::mybucket/*",
#        "arn:aws:s3:::mybucket"
#      ]
#    }
#  ]
#}
serviceAccountName: "my-service-account"
resources:
  requests:
    ephemeralStorage: "4Gi"
  limits:
    ephemeralStorage: "5Gi"
If your backup chain is stored on a local mount, point fromPath at the mount and provide a persistent volume via tempVolume. For example:

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.21.0"
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  backoffLimit: 1
backup:
  aggregate:
    enabled: true
    verbose: false
    keepOldBackup: false
    parallelRecovery: false
    fromPath: "/backups"
    # Database name to aggregate. Can contain * and ? for globbing.
    database: "neo4j"
tempVolume:
  persistentVolumeClaim:
    claimName: aggregate-pv-pvc
resources:
  requests:
    ephemeralStorage: "4Gi"
  limits:
    ephemeralStorage: "5Gi"
- Install the neo4j-admin Helm chart using the configured backup-values.yaml file:

helm install backup-name neo4j-admin -f /path/to/your/backup-values.yaml

- Monitor the pod logs using kubectl logs pod/<neo4j-aggregate-backup-pod-name> to check the progress of the aggregate backup operation.

- Verify that the aggregated backup files have replaced the backup chain in your cloud provider bucket or local storage.
Restore a single database

To restore a single offline database or a database backup, you first need to drop the database you want to replace, unless you want to restore the backup as an additional database in your DBMS. Then, restore the database backup using the neo4j-admin restore command. Finally, create the restored database in the system database using the Cypher command CREATE DATABASE name.
Drop the database you want to replace

Before you restore a database backup, you must drop the database you want to replace with that backup, using the Cypher command DROP DATABASE name against the system database. If you want to restore the backup as an additional database in your DBMS, you can proceed to the next section.

For Neo4j cluster deployments, you need to run the Cypher command on only one of the cluster servers.
1. Connect to the Neo4j DBMS:

kubectl exec -it <release-name>-0 -- bash

2. Connect to the system database using cypher-shell:

cypher-shell -u neo4j -p <password> -d system

3. Drop the database you want to replace with the backup:

DROP DATABASE neo4j;

4. Exit the Cypher Shell command-line console:

:exit;
Restore the database backup

Restore the database backup using the neo4j-admin database restore command, and then create the restored database in the system database using the Cypher command CREATE DATABASE name. For information about the command syntax, options, and usage, see Restore a database backup.

For Neo4j cluster deployments, restore the database backup on each cluster server.
1. Run the neo4j-admin database restore command to restore the database backup:

neo4j-admin database restore neo4j --from-path=/backups/neo4j --expand-commands

2. Connect to the system database using cypher-shell:

cypher-shell -u neo4j -p <password> -d system

3. Create the neo4j database. For Neo4j cluster deployments, you need to run the Cypher command CREATE DATABASE name on only one of the cluster servers:

CREATE DATABASE neo4j;

4. Open http://<external-ip>:7474/browser/ in a browser and verify that all data has been successfully restored.

5. Execute a Cypher command against the neo4j database, for example:

MATCH (n) RETURN n

If you have backed up your database with the --include-metadata option, you can manually restore the users and roles metadata. For more information, see Restore a database backup → Example.