Commit c1ad025

committed
apply suggestions from review
1 parent e0ba7df commit c1ad025

15 files changed

+152
-173
lines changed

modules/ROOT/content-nav.adoc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -171,7 +171,7 @@
171171
*** xref:scalability/composite-databases/delete-composite-databases.adoc[]
172172
*** xref:scalability/composite-databases/querying-composite-databases.adoc[]
173173
*** xref:scalability/composite-databases/sharding-with-copy.adoc[]
174-
*** xref:scalability/composite-databases/scaling-with-composite-databases.adoc[]
174+
//*** xref:scalability/composite-databases/scaling-with-composite-databases.adoc[]
175175
** Property sharding (Preview feature)
176176
*** xref:scalability/sharded-property-databases/overview.adoc[]
177177
*** xref:scalability/sharded-property-databases/planning-and-sizing.adoc[]

modules/ROOT/pages/scalability/composite-databases/scaling-with-composite-databases.adoc

Lines changed: 0 additions & 3 deletions
This file was deleted.

modules/ROOT/pages/scalability/scaling-with-neo4j.adoc

Lines changed: 24 additions & 33 deletions
@@ -4,17 +4,19 @@
44

55
Neo4j offers various options for scaling, tailored to specific use cases and requirements. Here are some of the supported scaling strategies:
66

7-
* *Data replication via Neo4j clustering (read scalability)* -- via Neo4j clustering.
8-
A Neo4j cluster is a collection of servers running Neo4j that are configured to communicate with each other.
9-
These may be used to host databases, and the databases may be configured to replicate across servers in the cluster, thus achieving read scalability or high availability.
10-
A minimum of three servers is required for the cluster to be fault-tolerant.
11-
Neo4j cluster is good for:
12-
13-
** Horizontal, READ scalability
14-
** Always on, highly available with disaster recovery and rolling upgrades (Neo4j 5.0+)
15-
** Flexible infrastructure from 1 to many copies of the same database
7+
* *Data replication via Neo4j analytics clustering (read scalability)* -- A Neo4j cluster is a high-availability cluster with multi-DB support.
8+
It is a collection of servers running Neo4j that are configured to communicate with each other.
9+
This means that servers and databases are decoupled: servers provide computation and storage power for databases to use.
10+
Each database relies on its own cluster architecture, organized into primaries (with a minimum of 3 for high availability) and secondaries (for read scaling).
11+
Scalability, allocation/reallocation, service elasticity, load balancing, and automatic routing are automatically provided (or they can be finely controlled).
12+
+
13+
xref:clustering/setup/analytics-cluster.adoc[Neo4j analytics cluster] is good for:
14+
15+
** Horizontal, read scalability
16+
** Always on, highly available with disaster recovery and rolling upgrades (Neo4j 5.0+).
17+
** Flexible infrastructure from 1 to many copies of the same database.
1618
** Servers may be service-specific (analytical/transactional workloads, data science, reporting, etc.).
17-
Multi-region, multi-tenant, SaaS-style scalability
19+
Multi-region, multi-tenant, SaaS-style scalability.
1820

1921
* *Data federation and sharding via composite database* -- using federated queries, Neo4j allows you to query multiple Neo4j databases with a single query.
2022
The data is partitioned into smaller, more manageable pieces, called shards.
@@ -35,24 +37,27 @@ This allows, in theory, the unlimited growth of a graph.
3537
label:preview[Preview feature] xref:scalability/sharded-property-databases/overview.adoc[Property sharding] (part of Infinigraph) allows you to decouple the properties attached to nodes and relationships and store them in separate graphs.
3638
This architecture enables the independent scaling of property data, allowing for the handling of high volumes, heavy queries, and high read concurrency.
3739
38-
The following table summarizes the similarities and differences between composite databases and sharded property databases:
40+
The following table summarizes the similarities and differences between analytics clustering, composite databases and sharded property databases:
3941

40-
.Similarities and differences between composite databases and sharded property databases
41-
[cols="2,4a,4a",frame="topbot",options="header"]
42+
.Similarities and differences between analytics clustering, composite databases and sharded property databases
43+
[cols="2,4a,4a,4a",frame="topbot",options="header"]
4244
|===
4345
|
46+
| Analytics cluster
4447
| Composite database
4548
| Sharded property database
4649

4750

4851
| Typical use cases
52+
|
4953
| *Federated data* +
5054
Time-based sharding +
5155
*Application-based access*
5256
| *Graphs with a large volume of properties* +
5357
Ideal for vector and full-text search
5458

5559
| Scalability
60+
|
5661
| *Data volume: unlimited* +
5762
Read concurrency: horizontal scale on multiple instances +
5863
*Write concurrency: horizontal scale depending on the graph model*
@@ -61,18 +66,21 @@ Read concurrency: horizontal scale on multiple instances +
6166
*Write concurrency: single instance*
6267

6368
| Transactions
69+
|
6470
| Parallel read transactions +
6571
Single-shard write transactions +
6672
`CALL {} IN TRANSACTION` for multiple, isolated read/write transactions with manual error handling
6773
| Parallel read & write transactions on all shards +
6874
Standard transaction management
6975

7076
| Data load
77+
|
7178
| Manually orchestrated import +
7279
Ad-hoc, project-based, sharded import
7380
| Initial and incremental data import via neo4j-admin and Aura importer
7481

7582
| Cypher queries
83+
|
7684
| Parallel execution on shards. +
7785
Single database queries must be modified according to the *sharding rules*. +
7886
Automated shard pruning using sharding functions.
@@ -81,39 +89,22 @@ Single database *queries run as is*. +
8189
Automated shard pruning based on node selection.
8290

8391
| User tools
92+
|
8493
| Work with Browser and Cypher Shell. +
8594
Tools used on individual shards and Bloom are not supported on composite databases.
8695
| All tools supported.
8796

8897
| Admin tools
98+
|
8999
| Tools used on individual shards are not supported on composite databases.
90100
| All tools supported.
91101

92102
| Libraries
103+
|
93104
| Supported on individual shards.
94105
| All libraries supported.
95106
|===
96107

97-
== Neo4j clustering
98-
99-
xref:clustering/index.adoc[Neo4j cluster] is a high-availability cluster with multi-DB support.
100-
This means that servers and databases are decoupled: servers provide computation and storage power for databases to use.
101-
Each database relies on its own cluster architecture, organized into primaries (with a minimum of 3) and secondaries (for read scaling).
102-
Scalability, allocation/reallocation, service elasticity, load balancing, and automatic routing are automatically provided (or they can be finely controlled).
103-
104-
image::scalability/cluster.png[title="some title.", role="middle"]
105-
106-
107-
== Composite databases
108-
109-
Composite databases enable queries to access multiple graphs simultaneously.
110-
They provide:
111-
112-
* *Data Federation:* the ability to access data available in distributed sources in the form of disjoint graphs.
113-
* *Data Sharding:* the ability to access data available in distributed sources in the form of a common graph partitioned on multiple databases.
114-
115-
For more information, see xref:scalability/composite-databases/concepts.adoc[Composite databases].
116-
117108
//TODO
118109
//Admin considerations
119110

modules/ROOT/pages/scalability/sharded-property-databases/admin-operations.adoc

Lines changed: 12 additions & 9 deletions
@@ -29,18 +29,21 @@ ENABLE SERVER 'serverId' OPTIONS { allowedDatabases: ['foo-sharded-p000'] }
2929

3030
== Resizing and resharding
3131

32-
Online resharding (adding new shards, removing old ones, relocating data to accommodate the new topology) is currently not supported.
33-
You can reshard your data via the `neo4j-admin database copy` command.
34-
See xref:scalability/sharded-property-databases/data-ingestion.adoc#splitting-existing-db-into-shards[Splitting an existing database into shards] for more information.
35-
36-
Alternatively, you can select more shards than needed to start with and allow space for their data to grow, as the Neo4j cluster allows databases to be moved based on server availability.
32+
=== Resizing
33+
You can resize a sharded property database by adding or removing property shards.
34+
You can select more shards than needed to start with and allow space for their data to grow, as the Neo4j cluster allows databases to be moved based on server availability.
3735
For example, ten property shards can be initially hosted on five servers (two shards per server), and additional servers can be added as needed.
3836
For details on managing databases and servers in a cluster, see xref:clustering/databases.adoc[Managing databases in a cluster] and xref:clustering/servers.adoc[Managing servers in a cluster].
3937

38+
=== Resharding
39+
40+
You can reshard your data via the `neo4j-admin database copy` command.
41+
See xref:scalability/sharded-property-databases/data-ingestion.adoc#splitting-existing-db-into-shards[Splitting an existing database into shards] for more information.
42+
4043
//TODO: We should talk about co-location, adding/removing servers in a cluster and say what is supported and what is not.
4144

42-
[[backup-and-recovery]]
43-
== Backup and recovery
45+
[[backup-and-restore]]
46+
== Backup and restore
4447

4548
A sharded property database is a database made up of multiple databases.
4649
This means that when you want to back up a database, you must back up all the shards individually, resulting in a sharded property database backup that is composed of multiple smaller backup chains.
@@ -126,7 +129,7 @@ Failure to meet this requirement will make a given replica of a property shard u
126129

127130
If a property shard replica does fall behind the transaction log range available on the graph shard, you can recover it by:
128131

129-
. Connecting to the server hosting the affected replica using the _bolt://_ scheme.
132+
. Connecting to the `system` database on the server hosting the affected replica using the _bolt://_ scheme.
130133
. Quarantining the replica using xref:procedures.adoc#procedure_dbms_quarantineDatabase[`dbms.quarantineDatabase()`].
131134
. Unquarantining the replica using xref:procedures.adoc#procedure_dbms_unquarantineDatabase[`dbms.unquarantineDatabase()`] with the `replaceStateReplaceStore` option.
132135
This will force the replica to copy the database store files from another replica of the property shard.
@@ -135,7 +138,7 @@ If all replicas of a given property shard are behind, then the sharded property
135138
This is an irrecoverable state.
136139
Up until this point, losing replicas reduces fault tolerance, but the database remains available.
137140
When a sharded property database becomes irrecoverable, it needs to be dropped and recreated from a backup.
138-
See <<backup-and-recovery, Backup and recovery>>.
141+
See <<backup-and-restore, Backup and restore>>.
139142

140143
One mechanism to avoid property shards falling out of range of the graph shard’s transaction log is to set a sufficiently large transaction log prune time on the graph shard.
141144
See xref:scalability/sharded-property-databases/limitations-and-considerations.adoc#setting-suitable-tx-log-retention-policy[Setting a suitable transaction log retention policy].

modules/ROOT/pages/scalability/sharded-property-databases/altering-sharded-databases.adoc

Lines changed: 1 addition & 7 deletions
@@ -81,10 +81,4 @@ SET TOPOLOGY 1 PRIMARY 2 SECONDARIES;
8181
----
8282
ALTER DATABASE `foo-sharded-p000`
8383
SET TOPOLOGY 2 REPLICAS;
84-
----
85-
86-
[NOTE]
87-
====
88-
Resharding is currently not supported.
89-
When the database is operational, altering a property shard can only be done by altering the number of replicas per graph shard.
90-
====
84+
----

modules/ROOT/pages/scalability/sharded-property-databases/configuration.adoc

Lines changed: 6 additions & 2 deletions
@@ -15,14 +15,18 @@ To enable the property sharding in your cluster, you must configure the followin
1515
| Configuration setting | Description
1616

1717
| internal.dbms.sharded_property_database.enabled=true
18-
| By default, the sharded property database is disabled.footnote:[This setting is a feature toggle behind which the sharded property database is developed. See xref:scalability/sharded-property-databases/overview.adoc[Property sharding overview].]
18+
| By default, the sharded property database is disabled.footnote:[Property sharding is a preview feature. For details, see xref:scalability/sharded-property-databases/overview.adoc[Property sharding overview].]
1919

2020
| db.query.default_language=CYPHER_25
2121
| Ensures that any database created will use Cypher 25 (unless users specifically override the default version in the `CREATE DATABASE` command).
22-
See xref:configuration/cypher-version-configuration.adoc[Configure the Cypher default version] and link: shttps://neo4j.com/docs/cypher-manual/25/queries/select-version/[Cypher Manual -> Select Cypher version].
22+
See xref:configuration/cypher-version-configuration.adoc[Configure the Cypher default version] and link:https://neo4j.com/docs/cypher-manual/25/queries/select-version/[Cypher Manual -> Select Cypher version].
2323

2424
| internal.dbms.cluster.experimental_protocol_version.dbms_enabled=true
2525
| Allows users to take valid backups of a sharded database.
26+
27+
|internal.dbms.single_raft_enabled=true
28+
| Allows a sharded property database to start with 1 primary for the graph shard and scale up to 3 at a later date.
29+
It is not needed if you will always run 3 primary graph shard.
2630
|===
2731

2832

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
1+
:description: This page describes how to create a sharded property database using the `CREATE DATABASE` command.
2+
:page-role: new-2025.10 enterprise-edition not-on-aura
3+
:keywords: sharded property database, CREATE DATABASE, Cypher 25
4+
= `CREATE DATABASE` command with sharded databases
5+
6+
You can create a sharded database using the Cypher command `CREATE DATABASE` (requires Cypher 25, introduced alongside Neo4j 2025.06.0).
7+
For details on configuring the Cypher version, see xref:configuration/cypher-version-configuration.adoc[Configure the Cypher default version].
8+
9+
10+
== Syntax
11+
12+
[options="header", width="100%", cols="1m,5a"]
13+
|===
14+
| Command | Syntax
15+
16+
| CREATE DATABASE
17+
|
18+
[source, syntax, role="noheader"]
19+
----
20+
CREATE DATABASE name [IF NOT EXISTS]
21+
[[SET] GRAPH SHARD {
22+
[TOPOLOGY n PRIMAR{Y\|IES} [m SECONDAR{Y\|IES}]]
23+
}]
24+
[SET] PROPER{TY\|IES} {
25+
COUNT n [TOPOLOGY m REPLICA[S]]
26+
}
27+
[OPTIONS "{" option: value[, ...] "}"]
28+
[WAIT [n [SEC[OND[S]]]]\|NOWAIT]
29+
----
30+
|===
31+
32+
When creating a sharded database, the following are created:
33+
34+
* A virtual sharded database `<name>`.
35+
* A single graph shard with the name `<name>-g000`.
36+
* A number of property shards with the name `<name>-p000<index>`.
37+
The count property in `SET PROPERTY SHARDS` specifies the number of property shards.
38+
39+
[NOTE]
40+
====
41+
`CREATE OR REPLACE` does not replace an existing sharded database.
42+
====
43+
44+
== Options
45+
46+
The `CREATE DATABASE` command can have a map of options, e.g., `OPTIONS {key: 'value'}`.
47+
For sharded databases, only the seeding option is supported.
48+
49+
The following table describes the `seedUri` option:
50+
51+
[frame="topbot", grid="cols", cols="<1s,<4"]
52+
|===
53+
| *Key*
54+
m| seedURI
55+
| *Value*
56+
a| URI to a folder containing all the backups or a list of dumps/backups.
57+
58+
[NOTE]
59+
The folder notation only works for backups, not dumps.
60+
61+
When specifying each artifact manually, the key of the map is the name of the shard.
62+
The graph shard is named `databaseName-g000`, and the property shards are named `databaseName-p000` through `databaseName-px`, where `x = numShards - 1`.
63+
| *Description*
64+
a| Defines an identical seed from an external source, which will be used to seed all servers. For more information, see xref::database-administration/standard-databases/seed-from-uri.adoc[Seed from a URI].
65+
| *Example*
66+
|
67+
[source, syntax, role="noheader"]
68+
----
69+
seedUri: {
70+
`foo-sharded-g000`: "s3://bucket/folder/foo-g000.backup",
71+
`foo-sharded-p000`: "s3://bucket/folder/foo-p001.backup",
72+
`foo-sharded-p001`: "s3://bucket/folder/foo-p002.backup"
73+
}
74+
----
75+
Or
76+
[source, syntax, role="noheader"]
77+
----
78+
seedUri: "s3://bucket/folder/"
79+
----
80+
|===
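The shard naming rule in the table above can be sketched in Python. This is a minimal illustration and not part of the commit: `shard_names` and `seed_uri_map` are hypothetical helpers, the three-digit zero padding is inferred from the `-g000` and `-p000` examples, and the `<shard>.backup` file naming under the base URI is an assumption made only for this example.

```python
def shard_names(database_name: str, num_property_shards: int) -> list[str]:
    """Return the graph shard name followed by the property shard names."""
    if num_property_shards < 1:
        raise ValueError("expected at least one property shard")
    # A sharded database has a single graph shard, named <name>-g000.
    names = [f"{database_name}-g000"]
    # Assumption: property shard indexes are zero-padded to three digits
    # and run from 0 to num_property_shards - 1.
    names += [f"{database_name}-p{i:03d}" for i in range(num_property_shards)]
    return names


def seed_uri_map(database_name: str, num_property_shards: int, base_uri: str) -> dict[str, str]:
    """Build a shard-name -> artifact-URI map, one entry per shard."""
    base = base_uri.rstrip("/")
    # Assumption: each artifact is stored as <shard name>.backup under base_uri.
    return {
        name: f"{base}/{name}.backup"
        for name in shard_names(database_name, num_property_shards)
    }


if __name__ == "__main__":
    for shard, uri in seed_uri_map("foo-sharded", 2, "s3://bucket/folder/").items():
        print(f"{shard}: {uri}")
```

A map built this way has one key per shard, which is the shape the manual form of the seed option expects; the actual artifact file names can be anything, as the example in the table shows.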
81+
82+
== Default numbers for topology
83+
84+
The sharded property databases use the Neo4j cluster topology.
85+
Therefore, you need to consider how the following settings will affect the creation of your sharded property database.
86+
87+
[options="header", width="100%", cols="4m,1m,1m,3a"]
88+
|===
89+
| Configuration setting
90+
| Default value
91+
| Valid values
92+
| Description
93+
94+
|initial.dbms.default_primaries_count
95+
| 1
96+
| [1-10]
97+
| The default number of primaries for the graph shard when the database is created.
98+
99+
|initial.dbms.default_secondaries_count
100+
| 0
101+
| [0-19]
102+
| The default number of secondaries for the graph shard when the database is created.
103+
|===
