modules/ROOT/pages/scalability/scaling-with-neo4j.adoc
24 additions & 33 deletions
@@ -4,17 +4,19 @@
4
4
5
5
Neo4j offers various options for scaling, tailored to specific use cases and requirements. Here are some of the supported scaling strategies:
6
6
7
-
* *Data replication via Neo4j clustering (read scalability)*-- via Neo4j clustering.
8
-
A Neo4j cluster is a collection of servers running Neo4j that are configured to communicate with each other.
9
-
These may be used to host databases, and the databases may be configured to replicate across servers in the cluster, thus achieving read scalability or high availability.
10
-
A minimum of three servers is required for the cluster to be fault-tolerant.
11
-
Neo4j cluster is good for:
12
-
13
-
** Horizontal, READ scalability
14
-
** Always on, highly available with disaster recovery and rolling upgrades (Neo4j 5.0+)
15
-
** Flexible infrastructure from 1 to many copies of the same database
7
+
* *Data replication via Neo4j analytics clustering (read scalability)*-- A Neo4j cluster is a high-availability cluster with multi-DB support.
8
+
It is a collection of servers running Neo4j that are configured to communicate with each other.
9
+
This means that servers and databases are decoupled: servers provide computation and storage power for databases to use.
10
+
Each database relies on its own cluster architecture, organized into primaries (with a minimum of 3 for high availability) and secondaries (for read scaling).
11
+
Scalability, allocation/reallocation, service elasticity, load balancing, and automatic routing are automatically provided (or they can be finely controlled).
12
+
+
13
+
xref:clustering/setup/analytics-cluster.adoc[Neo4j analytics cluster] is good for:
14
+
15
+
** Horizontal, read scalability
16
+
** Always on, highly available with disaster recovery and rolling upgrades (Neo4j 5.0+).
17
+
** Flexible infrastructure from 1 to many copies of the same database.
16
18
** Servers may be service-specific (analytical/transactional workloads, data science, reporting, etc.).
* *Data federation and sharding via composite database*-- using federated queries, Neo4j allows you to query multiple Neo4j databases with a single query.
20
22
The data is partitioned into smaller, more manageable pieces, called shards.
@@ -35,24 +37,27 @@ This allows, in theory, the unlimited growth of a graph.
35
37
label:preview[Preview feature] xref:scalability/sharded-property-databases/overview.adoc[Property sharding] (part of Infinigraph) allows you to decouple the properties attached to nodes and relationships and store them in separate graphs.
36
38
This architecture enables the independent scaling of property data, allowing for the handling of high volumes, heavy queries, and high read concurrency.
37
39
38
-
The following table summarizes the similarities and differences between composite databases and sharded property databases:
40
+
The following table summarizes the similarities and differences between analytics clustering, composite databases and sharded property databases:
39
41
40
-
.Similarities and differences between composite databases and sharded property databases
41
-
[cols="2,4a,4a",frame="topbot",options="header"]
42
+
.Similarities and differences between analytics clustering, composite databases and sharded property databases
`CALL {} IN TRANSACTION` for multiple, isolated read/write transactions with manual error handling
67
73
| Parallel read & write transactions on all shards +
68
74
Standard transaction management
69
75
70
76
| Data load
77
+
|
71
78
| Manually orchestrated import +
72
79
Ad-hoc, project-based, sharded import
73
80
| Initial and incremental data import via neo4j-admin and Aura importer
74
81
75
82
| Cypher queries
83
+
|
76
84
| Parallel execution on shards. +
77
85
Single database queries must be modified according to the *sharding rules*. +
78
86
Automated shard pruning using sharding functions.
@@ -81,39 +89,22 @@ Single database *queries run as is*. +
81
89
Automated shard pruning based on node selection.
82
90
83
91
| User tools
92
+
|
84
93
| Work with Browser and Cypher Shell. +
85
94
Tools used on individual shards and Bloom are not supported on composite databases.
86
95
| All tools supported.
87
96
88
97
| Admin tools
98
+
|
89
99
| Tools used on individual shards are not supported on composite databases.
90
100
| All tools supported.
91
101
92
102
| Libraries
103
+
|
93
104
| Supported on individual shards.
94
105
| All libraries supported.
95
106
|===
96
107
97
-
== Neo4j clustering
98
-
99
-
xref:clustering/index.adoc[Neo4j cluster] is a high-availability cluster with multi-DB support.
100
-
This means that servers and databases are decoupled: servers provide computation and storage power for databases to use.
101
-
Each database relies on its own cluster architecture, organized into primaries (with a minimum of 3) and secondaries (for read scaling).
102
-
Scalability, allocation/reallocation, service elasticity, load balancing, and automatic routing are automatically provided (or they can be finely controlled).
Online resharding (adding new shards, removing old ones, relocating data to accommodate the new topology) is currently not supported.
33
-
You can reshard your data via the `neo4j-admin database copy` command.
34
-
See xref:scalability/sharded-property-databases/data-ingestion.adoc#splitting-existing-db-into-shards[Splitting an existing database into shards] for more information.
35
-
36
-
Alternatively, you can select more shards than needed to start with and allow space for their data to grow, as the Neo4j cluster allows databases to be moved based on server availability.
32
+
=== Resizing
33
+
You can resize a sharded property database by adding or removing property shards.
34
+
You can select more shards than needed to start with and allow space for their data to grow, as the Neo4j cluster allows databases to be moved based on server availability.
37
35
For example, ten property shards can be initially hosted on five servers (two shards per server), and additional servers can be added as needed.
38
36
For details on managing databases and servers in a cluster, see xref:clustering/databases.adoc[Managing databases in a cluster] and xref:clustering/servers.adoc[Managing servers in a cluster].
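As a rough illustration of the ten-shards-on-five-servers example above, the following Python sketch spreads shard indexes over a list of servers in round-robin order. It is only a toy model and does not reflect how Neo4j actually allocates databases to servers; the function `place_shards` and the server names are hypothetical.

```python
from collections import defaultdict


def place_shards(num_shards: int, servers: list[str]) -> dict[str, list[int]]:
    """Assign shard indexes to servers round-robin (illustrative only)."""
    if not servers:
        raise ValueError("at least one server is required")
    placement: dict[str, list[int]] = defaultdict(list)
    for shard in range(num_shards):
        # Shard i goes to server i modulo the number of servers.
        placement[servers[shard % len(servers)]].append(shard)
    return dict(placement)


# Ten property shards on five servers, then on ten servers (hypothetical names).
five = place_shards(10, [f"server-{i}" for i in range(1, 6)])
ten = place_shards(10, [f"server-{i}" for i in range(1, 11)])

print({server: len(shards) for server, shards in five.items()})
print({server: len(shards) for server, shards in ten.items()})
```

With five servers, each server hosts two shards; once ten servers are available, each shard can sit on its own server without changing the shard count.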
39
37
38
+
=== Resharding
39
+
40
+
You can reshard your data via the `neo4j-admin database copy` command.
41
+
See xref:scalability/sharded-property-databases/data-ingestion.adoc#splitting-existing-db-into-shards[Splitting an existing database into shards] for more information.
42
+
40
43
//TODO: We should talk about co-location, adding/removing servers in a cluster and say what is supported and what is not.
41
44
42
-
[[backup-and-recovery]]
43
-
== Backup and recovery
45
+
[[backup-and-restore]]
46
+
== Backup and restore
44
47
45
48
A sharded property database is a database made up of multiple databases.
46
49
This means that when you want to back up a database, you must back up all the shards individually, resulting in a sharded property database backup that is composed of multiple smaller backup chains.
@@ -126,7 +129,7 @@ Failure to meet this requirement will make a given replica of a property shard u
126
129
127
130
If a property shard replica does fall behind the transaction log range available on the graph shard, you can recover it by:
128
131
129
-
. Connecting to the server hosting the affected replica using the _bolt://_ scheme.
132
+
. Connecting to the `system` database on the server hosting the affected replica using the _bolt://_ scheme.
130
133
. Quarantining the replica using xref:procedures.adoc#procedure_dbms_quarantineDatabase[`dbms.quarantineDatabase()`].
131
134
. Unquarantining the replica using xref:procedures.adoc#procedure_dbms_unquarantineDatabase[`dbms.unquarantineDatabase()`] with the `replaceStateReplaceStore` option.
132
135
This will force the replica to copy the database store files from another replica of the property shard.
@@ -135,7 +138,7 @@ If all replicas of a given property shard are behind, then the sharded property
135
138
This is an irrecoverable state.
136
139
Up until this point, losing replicas reduces fault tolerance, but the database remains available.
137
140
When a sharded property database becomes irrecoverable, it needs to be dropped and recreated from a backup.
138
-
See <<backup-and-recovery, Backup and recovery>>.
141
+
See <<backup-and-restore, Backup and restore>>.
139
142
140
143
One mechanism to avoid property shards falling out of range of the graph shard’s transaction log is to set a sufficiently large transaction log prune time on the graph shard.
141
144
See xref:scalability/sharded-property-databases/limitations-and-considerations.adoc#setting-suitable-tx-log-retention-policy[Setting a suitable transaction log retention policy].
| By default, the sharded property database is disabled.footnote:[This setting is a feature toggle behind which the sharded property database is developed. See xref:scalability/sharded-property-databases/overview.adoc[Property sharding overview].]
18
+
| By default, the sharded property database is disabled.footnote:[Property sharding is a preview feature. For details, see xref:scalability/sharded-property-databases/overview.adoc[Property sharding overview].]
19
19
20
20
| db.query.default_language=CYPHER_25
21
21
| Ensures that any database created will use Cypher 25 (unless users specifically override the default version in the `CREATE DATABASE` command).
22
-
See xref:configuration/cypher-version-configuration.adoc[Configure the Cypher default version] and link: shttps://neo4j.com/docs/cypher-manual/25/queries/select-version/[Cypher Manual -> Select Cypher version].
22
+
See xref:configuration/cypher-version-configuration.adoc[Configure the Cypher default version] and link:https://neo4j.com/docs/cypher-manual/25/queries/select-version/[Cypher Manual -> Select Cypher version].
= `CREATE DATABASE` command with sharded databases
5
+
6
+
You can create a sharded database using the Cypher command `CREATE DATABASE` (requires Cypher 25, introduced alongside Neo4j 2025.06.0).
7
+
For details on configuring the Cypher version, see xref:configuration/cypher-version-configuration.adoc[Configure the Cypher default version].
8
+
9
+
10
+
== Syntax
11
+
12
+
[options="header", width="100%", cols="1m,5a"]
|===
| Command | Syntax

| CREATE DATABASE
|
[source, syntax, role="noheader"]
----
CREATE DATABASE name [IF NOT EXISTS]
[[SET] GRAPH SHARD {
[TOPOLOGY n PRIMAR{Y\|IES} [m SECONDAR{Y\|IES}]]
}]
[SET] PROPER{TY\|IES} {
COUNT n [TOPOLOGY m REPLICA[S]]
}
[OPTIONS "{" option: value[, ...] "}"]
[WAIT [n [SEC[OND[S]]]]\|NOWAIT]
----
|===
31
+
32
+
When creating a sharded database, the following are created:
33
+
34
+
* A virtual sharded database `<name>`.
35
+
* A single graph shard with the name `<name>-g000`.
36
+
* A number of property shards with the name `<name>-p000<index>`.
37
+
The `COUNT` value in `SET PROPERTY SHARDS` specifies the number of property shards.
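The naming scheme above can be sketched in Python as follows. This is a minimal illustration, not part of Neo4j: the helper `shard_names` is hypothetical, and it assumes that property shards carry a three-digit, zero-padded index starting at `000`.

```python
def shard_names(database: str, property_shard_count: int) -> list[str]:
    """List the shard names expected for a sharded database (illustrative only)."""
    if property_shard_count < 1:
        raise ValueError("a sharded database needs at least one property shard")
    # A single graph shard, always suffixed -g000.
    names = [f"{database}-g000"]
    # Assumption: property shards are numbered 000, 001, ... with zero padding.
    names += [f"{database}-p{index:03d}" for index in range(property_shard_count)]
    return names


for name in shard_names("foo", 3):
    print(name)
```

A list like this can be handy when a task has to be repeated for every shard of a sharded database.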
38
+
39
+
[NOTE]
40
+
====
41
+
`CREATE OR REPLACE` does not replace an existing sharded database.
42
+
====
43
+
44
+
== Options
45
+
46
+
The `CREATE DATABASE` command can have a map of options, e.g., `OPTIONS {key: 'value'}`.
47
+
For sharded databases, only the seeding option is supported.
48
+
49
+
The following table describes the `seedURI` option:
50
+
51
+
[frame="topbot", grid="cols", cols="<1s,<4"]
52
+
|===
53
+
| *Key*
54
+
m| seedURI
55
+
| *Value*
56
+
a| URI to a folder containing all the backups or a list of dumps/backups.
57
+
58
+
[NOTE]
59
+
The folder notation only works for backups, not dumps.
60
+
61
+
When specifying each artifact manually, the key of the map is the name of the shard.
62
+
The graph shard is named `databaseName-g000`. Property shards are named starting from `databaseName-p000`, and the last one is `databaseName-px`, where `x = numShards - 1`.
63
+
| *Description*
64
+
a| Defines an identical seed from an external source, which will be used to seed all servers. For more information, see xref::database-administration/standard-databases/seed-from-uri.adoc[Seed from a URI].