Commit 840cdbf

committed
editorial updates
1 parent 08a2a8e commit 840cdbf

11 files changed

+65
-60
lines changed

modules/ROOT/pages/scalability/concepts.adoc

Lines changed: 13 additions & 12 deletions
@@ -3,12 +3,12 @@
33
= Concepts
44

55
Scalability is a crucial aspect of database management, allowing a system to handle changing demands by adding and removing resources to meet the demands of a database's workload.
6-
Neo4j supports multiple strategies to achieve scalability, enabling systems to handle larger datasets, more concurrent users, and higher query complexity without compromising performance or availability, i.e. system's resiliency.
6+
Neo4j supports multiple strategies to achieve scalability, enabling systems to handle larger datasets, more concurrent users, and higher query complexity without compromising performance or availability, i.e. the system's resiliency.
77
The three main strategies are:
88

9-
* Clustering -- for horizontal read scalability.
10-
* Composite databases -- for federated queries and distributed data management.
11-
* Property sharding -- for handling massive property-heavy graphs.
9+
* xref:clustering/setup/analytics-cluster.adoc[Analytics clustering] -- for horizontal read scalability.
10+
* xref:scalability/composite-databases/concepts.adoc[Composite databases] -- for federated queries and distributed data management.
11+
* xref:scalability/sharded-property-databases/overview.adoc[Property sharding] -- for handling massive property-heavy graphs.
1212
1313
== What is scalability?
1414

@@ -38,8 +38,8 @@ There are two primary methods to achieve scalability:
3838

3939
== What is database scalability?
4040

41-
Database scalability is the ability of the database management system (DBMS) to handle changing demands.
42-
To scale properly, a database needs to use strategies that cover all areas: data access, data manipulation in memory, and database computing.
41+
Database scalability is the ability of a database management system (DBMS) to handle changing demands.
42+
To scale properly, a database must apply strategies that cover all areas: data access, data manipulation in memory, and database computing.
4343

4444
Strategies include:
4545

@@ -51,9 +51,9 @@ Strategies include:
5151

5252
** *Shared Everything*: All servers share data and memory.
5353
Flexible, but prone to contention. +
54-
In this model, data on disk and in memory are shared among all servers in a cluster.
54+
In this model, data on disk and in memory is shared among all servers in a cluster.
5555
Requests are satisfied by any combination of servers.
56-
This approach introduces complexity as the cluster must implement a way to avoid contention when multiple servers try to update the same data simultaneously.
56+
This approach introduces complexity, as the cluster must implement a way to avoid contention when multiple servers try to update the same data simultaneously.
5757

5858
** *Shared Nothing*: Each server manages its own partition (shard).
5959
More fault-tolerant, eliminates single points of failure. +
@@ -68,10 +68,11 @@ Graph database scalability refers to the ability of a database to handle differe
6868
It includes:
6969

7070
* *Data volume* - involves ensuring a consistent SLA in both query and administration response times, even as the size of the data for storage and retrieval expands. +
71-
Volume depends on data type(s). Vectors occupy a large data space.
71+
Volume depends on data type(s).
72+
Vectors occupy a large data space.
7273

7374
* *Query volume*
74-
** Read queries + write queries
75+
** Read queries + write queries.
7576
** Queries and user concurrency -- the aim is to ensure a linear response time during the execution of concurrent queries against the same database.
7677
** Query complexity -- provide response time in line with the complexity of a query. The complexity of a query can be set by the combination of:
7778
*** Steps to execute
@@ -82,9 +83,9 @@ Volume depends on data type(s). Vectors occupy a large data space.
8283

8384
* *Admin volume*
8485
** Data ingestion/extraction -- When scaling data ingestion/extraction, the goal is to maintain a linear response time when ingesting or extracting an increasing set of data.
85-
This objective holds true irrespective of the volume of stored data, assuming a similar data structure.
86+
This objective remains true regardless of the volume of stored data, provided a similar data structure is used.
8687
** Multi-tenancy -- In SaaS and AaaS environments, the scaling cost for tenants should exhibit linearity.
87-
For more general services like DBaaS (e.g., Aura), scalability should also be linear, considering all five scalability factors mentioned here.
88+
For more general services, such as DBaaS (e.g., Aura), scalability should also be linear, considering all five scalability factors mentioned here.
8889

8990

9091

modules/ROOT/pages/scalability/scaling-with-neo4j.adoc

Lines changed: 4 additions & 4 deletions
@@ -74,15 +74,15 @@ Ad-hoc, project-based, sharded import
7474

7575
| Cypher queries
7676
| Parallel execution on shards. +
77-
Single database queries must be modified, depending on the *sharding rules*. +
77+
Single database queries must be modified according to the *sharding rules*. +
7878
Automated shard pruning using sharding functions.
7979
| Parallel execution on shards. +
8080
Single database *queries run as is*. +
8181
Automated shard pruning based on node selection.
8282

8383
| User tools
8484
| Work with Browser and Cypher Shell. +
85-
Tools used on individual shards, Bloom is not supported on composite databases.
85+
Tools can be used on individual shards, but Bloom is not supported on composite databases.
8686
| All tools supported.
8787

8888
| Admin tools
@@ -98,15 +98,15 @@ Tools used on individual shards, Bloom is not supported on composite databases.
9898

9999
xref:clustering/index.adoc[Neo4j cluster] is a high-availability cluster with multi-DB support.
100100
This means that servers and databases are decoupled: servers provide computation and storage power for databases to use.
101-
Each database relies on its own cluster architecture, organized in primaries (>=3) and secondaries (for read scaling).
101+
Each database relies on its own cluster architecture, organized into primaries (with a minimum of 3) and secondaries (for read scaling).
102102
Scalability, allocation/reallocation, service elasticity, load balancing, and automatic routing are automatically provided (or they can be finely controlled).
103103

104104
image::scalability/cluster.png[title="some title.", role="middle"]
105105

106106

107107
== Composite databases
108108

109-
Composite databases allow queries to access multiple graphs at once.
109+
Composite databases enable queries to access multiple graphs simultaneously.
110110
They provide:
111111

112112
* *Data Federation:* the ability to access data available in distributed sources in the form of disjoint graphs.

modules/ROOT/pages/scalability/sharded-property-databases/admin-operations.adoc

Lines changed: 10 additions & 6 deletions
@@ -3,10 +3,12 @@
33
:keywords: sharded property databases, sharding, admin operations, aliases, servers, backup, recovery, failover
44
= Admin operations
55

6+
Sharded property databases are managed similarly to standard Neo4j databases, with some differences in certain administrative operations.
7+
68
== Managing aliases for sharded databases
79

810
When creating an alias for a sharded database, use the virtual database name when specifying it as the alias target.
9-
The following example shows how to create the alias foo for the sharded database foo-sharded:
11+
The following example shows how to create the alias `foo` for the sharded database `foo-sharded`:
1012

1113
[source, cypher]
1214
----
@@ -22,7 +24,7 @@ The following example shows how to enable a server and allow allocating the prop
2224

2325
[source, cypher]
2426
----
25-
ENABLE SERVER server name OPTIONS { allowedDatabases: [foo-sharded-p000] }
27+
ENABLE SERVER 'serverId' OPTIONS { allowedDatabases: ['foo-sharded-p000'] }
2628
----
2729

2830
== Resizing and resharding
@@ -31,7 +33,9 @@ Online resharding (adding new shards, removing old ones, relocating data to acco
3133
You can reshard your data via the `neo4j-admin database copy` command.
3234
See xref:scalability/sharded-property-databases/data-ingestion.adoc#splitting-existing-db-into-shards[Splitting an existing database into shards] for more information.
3335

34-
Alternatively, you can select more shards than needed to start with and allow space for their data to grow, as the Neo4j cluster allows databases to be moved based on server availability. For example, 10 property shards can be initially hosted on 5 servers (2 shards per server), and additional servers can be added as needed.
36+
Alternatively, you can select more shards than needed to start with and allow space for their data to grow, as the Neo4j cluster allows databases to be moved based on server availability.
37+
For example, ten property shards can be initially hosted on five servers (two shards per server), and additional servers can be added as needed.
38+
For details on managing databases and servers in a cluster, see xref:clustering/databases.adoc[Managing databases in a cluster] and xref:clustering/servers.adoc[Managing servers in a cluster].
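
The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of spreading a fixed number of property shards over a growing set of servers. The `spread_shards` helper and the server names are hypothetical, and in a real cluster Neo4j's own allocator decides placement.

```python
from collections import defaultdict


def spread_shards(shard_names, servers):
    """Round-robin the shards over the servers (illustrative only)."""
    placement = defaultdict(list)
    for index, shard in enumerate(shard_names):
        placement[servers[index % len(servers)]].append(shard)
    return dict(placement)


# Ten property shards, named after the `foo-sharded-p000` pattern used above.
shards = [f"foo-sharded-p{i:03d}" for i in range(10)]

# Hypothetical server names; five servers give two shards per server.
five_servers = [f"server-{n}" for n in range(1, 6)]
initial = spread_shards(shards, five_servers)

# Growing to ten servers leaves one shard per server without resharding.
ten_servers = [f"server-{n}" for n in range(1, 11)]
grown = spread_shards(shards, ten_servers)
```

Starting with more shards than servers is what leaves room to grow: the shard count stays fixed while the number of shards per server drops.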
3539

3640
//TODO: We should talk about co-location, adding/removing servers in a cluster and say what is supported and what is not.
3741

@@ -45,7 +49,7 @@ Backup chains for each shard are produced using the neo4j-admin database backup.
4549
For the graph shard, its backup chain must contain one full artefact and 0+ differential artefacts.
4650
Each property shard’s backup chain must contain only one full backup and no differential backups.
4751
In practical terms, this means that to back up a sharded property database, you start with a full backup of the graph shard and then all of the property shards; any subsequent differential backups would only need to be of the graph shard.
48-
This is because the transaction log of the property shards is the same as the graph shard log and is just filtered when applied, so only the graph shard log is required for a restore.
52+
This is because the transaction log of the property shards is the same as the graph shard log and is simply filtered when applied, so only the graph shard log is required for a restore.
4953

5054
For example, assume there is a sharded property database called `foo` with a graph shard and 2 property shards.
5155
A backup must be taken of each shard, for example:
@@ -109,8 +113,8 @@ To form a valid sharded property database backup, you need to:
109113
* Take a full backup of the property shard `foo-p000` so that its store at least includes transaction 5.
110114
* Take a differential backup of the graph shard so that at least transaction 12 is included in its transaction log, so `foo-p001` is included in its range.
111115

112-
Once a valid sharded properties database backup is formed, then differential backups can be performed by taking differential backups of the graph shard, extending the range of the graph shard chain.
113-
Continuing with the example, the graph chain contains transactions from 11-36, property shard 1’s store files are at 13, and property shard 2’s store files are at 30.
116+
Once a valid sharded property database backup is created, differential backups can be performed by taking differential backups of the graph shard, extending the range of the graph shard chain.
117+
Continuing with the example, the graph chain contains transactions from 11 to 36, property shard 1’s store files are at 13, and property shard 2’s store files are at 30.
114118
You then take a differential backup of the graph shard containing transactions 37 to 50.
115119
At restore time, all databases can be recovered up to transaction 50 and made consistent.
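
The consistency condition in this example can be checked mechanically. The sketch below assumes, as a simplification drawn from the numbers above, that a backup set is restorable when every property shard's store transaction ID falls inside the range covered by the graph shard's chain. `is_restorable` and its arguments are hypothetical names, not part of `neo4j-admin`.

```python
def is_restorable(graph_chain, property_store_txs):
    """Return True if every property shard store lies within the graph chain.

    graph_chain: (first_tx, last_tx) covered by the graph shard backup chain.
    property_store_txs: mapping of shard name to the tx ID of its full backup.
    """
    first_tx, last_tx = graph_chain
    return all(first_tx <= tx <= last_tx for tx in property_store_txs.values())


# Values from the example: chain 11 to 36, property stores at 13 and 30.
stores = {"property shard 1": 13, "property shard 2": 30}
before_differential = is_restorable((11, 36), stores)

# A differential backup of the graph shard extends the chain to transaction 50.
after_differential = is_restorable((11, 50), stores)

# A property shard backed up beyond the end of the chain would not be covered.
uncovered = is_restorable((11, 36), {"property shard 1": 40})
```

Extending the graph shard chain never invalidates a set that was already valid, which is why only the graph shard needs further differential backups.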
116120

modules/ROOT/pages/scalability/sharded-property-databases/altering-sharded-databases.adoc

Lines changed: 2 additions & 2 deletions
@@ -3,8 +3,8 @@
33

44
= Altering sharded property databases
55

6-
A sharded database can be altered on two levels.
7-
It is possible to change the entire sharded database with ALTER DATABASE or alter a specific shard with `ALTER DATABASE <shard-name>`.
6+
You can alter a sharded property database on two levels.
7+
It is possible to change the entire sharded database with `ALTER DATABASE` or alter a specific shard with `ALTER DATABASE <shard-name>`.
88

99
== Syntax
1010

modules/ROOT/pages/scalability/sharded-property-databases/configuration.adoc

Lines changed: 2 additions & 4 deletions
@@ -2,10 +2,9 @@
22
:description: This page describes the system requirements and configuration settings for sharded property databases.
33
= System requirements and configuration
44

5-
65
== System requirements
76

8-
The sharded property database requires the same xref:installation/requirements.adoc[system requirements] as Neo4j 2025.06 and later versions.
7+
The sharded property database requires the same xref:installation/requirements.adoc[system requirements] as Neo4j 2025.10 and later versions.
98

109
== Configuration settings
1110

@@ -16,8 +15,7 @@ To enable the property sharding in your cluster, you must configure the followin
1615
| Configuration setting | Description
1716

1817
| internal.dbms.sharded_property_database.enabled=true
19-
| By default, the sharded property database is disabled. This setting is a feature toggle behind which the sharded property database is developed.
20-
See xref:scalability/sharded-property-databases/overview.adoc[Property sharding overview].
18+
| By default, the sharded property database is disabled.footnote:[This setting is a feature toggle behind which the sharded property database is developed. See xref:scalability/sharded-property-databases/overview.adoc[Property sharding overview].]
2119

2220
| db.query.default_language=CYPHER_25
2321
| Ensures that any database created will use Cypher 25 (unless users specifically override the default version in the `CREATE DATABASE` command).

modules/ROOT/pages/scalability/sharded-property-databases/deleting-sharded-databases.adoc

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
44
= Deleting sharded property databases
55

66
Sharded databases can be deleted using the `DROP DATABASE` command.
7-
Note that all database aliases must be dropped before dropping a database.
7+
Note that you must drop all database aliases before dropping a database.
88

99
.Syntax
1010
[options="header", width="100%", cols="1m,5a"]

modules/ROOT/pages/scalability/sharded-property-databases/limitations-and-considerations.adoc

Lines changed: 20 additions & 21 deletions
@@ -6,11 +6,11 @@
66

77
=== CDC
88

9-
CDC is not supported in this version.
9+
CDC is not supported in this version.
1010

1111
=== Unsupported procedures
1212

13-
The following procedures are not supported by sharded databases:
13+
The following procedures are not supported by sharded property databases:
1414

1515
* cdc.earliest()
1616
* cdc.current()
@@ -29,14 +29,14 @@ The following procedures are not supported by sharded databases:
2929

3030
[NOTE]
3131
====
32-
It is strongly recommended not to use `dbms.setConfigValue()` on sharded databases as sharded databases run in a clustered environment, which means the procedure must be run against each cluster member and is not propagated to other members.
33-
In particular, `dbms.setConfigValue()` cannot be used to set read-only behaviour as the two settings `server.databases.read_only` and `server.databases.writable` are not compatible with sharded databases.
32+
It is strongly recommended not to use `dbms.setConfigValue()` on sharded property databases, as sharded property databases run in a clustered environment, which means the procedure must be run against each cluster member and is not propagated to other members.
33+
In particular, `dbms.setConfigValue()` cannot be used to set read-only behavior as the two settings `server.databases.read_only` and `server.databases.writable` are not compatible with sharded property databases.
3434
The correct way of setting read/write access is by using `ALTER DATABASE`.
35+
See xref:scalability/sharded-property-databases/altering-sharded-databases.adoc[Altering sharded property databases] for details.
3536
====
3637

3738
=== Property-based access control (PBAC)
3839

39-
4040
PBAC is not supported in this version.
4141

4242
=== `USE graph.byElementId()`
@@ -48,8 +48,7 @@ Calling `USE graph.byElementId(<element-id>)` with an element of a sharded datab
4848
=== Queries with `MERGE` clause
4949

5050
`MERGE` queries are very slow at any meaningful scale.
51-
Due to their plan, they are likely to cause a nested loop join, which does not perform well on SPD at the moment.
52-
We are looking to fix this soon.
51+
Due to their plan, they are likely to cause a nested loop join, which does not perform well on sharded property databases at the moment.
5352

5453
=== Filtering on properties in paths
5554

@@ -63,20 +62,22 @@ WHERE k.creationDate=1268465841718
6362
RETURN n,k,m
6463
----
6564

66-
This could be rewritten to be:
65+
This could be rewritten to perform better as follows:
6766

6867
[source, cypher]
6968
----
7069
MATCH (n:Person)-[k:KNOWS {creationDate: 1268465841718}]->+(m:Person)
7170
RETURN n,k,m
7271
----
7372

74-
Which would perform much better, but not all queries can be rewritten in this way.
73+
However, not all queries can be rewritten in this way.
7574

7675
=== Call in transactions for batch write operations
7776

78-
Because of the write architecture, creating larger transactions when doing write operations that can be batched will give large performance benefits.
79-
For example:
77+
Because of the write architecture, batching larger transactions during write operations gives significant performance benefits.
78+
This is also true for single-instance databases, but the performance difference is more pronounced in sharded property databases.
79+
80+
For example, consider the following query:
8081

8182
[source, cypher]
8283
----
@@ -94,7 +95,7 @@ FOR each update IN node_updates DO
9495
END FOR
9596
----
9697

97-
can be rewritten in a much more performant way as follows:
98+
It can be rewritten as follows to perform better:
9899

99100
[source, cypher]
100101
----
@@ -110,28 +111,25 @@ SET n.name = u.name,
110111
n.age = u.age
111112
----
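
On the client side, the batched form amounts to grouping individual updates into lists and sending each list as one query parameter. A minimal Python sketch of that grouping follows. The `batches` helper, the batch size, the `id` field, and the sample data are assumptions for illustration, and no driver call is shown.

```python
def batches(updates, size):
    """Yield consecutive slices of `updates`, each holding at most `size` items."""
    for start in range(0, len(updates), size):
        yield updates[start:start + size]


# Hypothetical updates; each dict would become one row of the UNWIND parameter.
node_updates = [{"id": i, "name": f"name-{i}", "age": 20 + i} for i in range(10)]

# With a batch size of 4, ten updates become three transactions instead of ten.
grouped = list(batches(node_updates, 4))
```

A suitable batch size depends on the workload; the value used here is arbitrary.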
112113

113-
This is the same advice that would be given for a non-sharded Neo4j database, but it is doubly important for a property-sharded database.
114-
115114
== Other considerations
116115

117116
=== `neo4j-admin database copy` to a sharded property database
118117

119-
When using the `neo4j-admin database copy --property-shard-count > 0` command to split an existing database into shards, it is not possible to copy in place, meaning you cannot replace your existing database with a sharded property database.
120-
You must specify a new name or set `--to-path-data` and `--to-path-txn` or `--target-location={path|uri}`
121-
`--target-format={database|backup}` to a new DBMS location.
118+
When using the `neo4j-admin database copy --property-shard-count > 0` command to split an existing database into shards, it is not possible to copy in place, meaning you cannot replace your existing database with a sharded property database.
119+
Instead, you must specify a new name or set `--to-path-data` and `--to-path-txn` or `--target-location={path|uri}` and `--target-format={database|backup}` to a new DBMS location.
122120

123121
=== `USE` clause with sharded databases
124122

125123
When targeting a sharded database in a `USE` clause, use its virtual database name or an alias in the graph reference.
124+
Targeting a shard directly is not supported.
125+
126126
For example:
127127

128128
[source, cypher]
129129
----
130130
USE `neo4j-sharded` MATCH (n) RETURN n
131131
----
132132

133-
Targeting a shard directly is not supported.
134-
135133
=== Cypher 5
136134

137135
Cypher 5 is unsupported for sharded property databases.
@@ -145,7 +143,7 @@ See xref:configuration/cypher-version-configuration.adoc[Configure the Cypher de
145143
Property shards pull transaction log entries from the graph shard and apply them to their stores.
146144
Thus, there is a requirement that the graph shard may not prune an entry from its transaction log until each replica of each property shard has pulled and applied that entry.
147145
Failure to maintain this requirement can render a sharded property database irrecoverable.
148-
In order to ensure enough transaction logs are kept, you must set db.tx_log.rotation.retention_policy accordingly.
146+
In order to ensure enough transaction logs are kept, you must set xref:configuration/configuration-settings.adoc#config_db.tx_log.rotation.retention_policy[`db.tx_log.rotation.retention_policy`] accordingly.
149147
A suitable heuristic is to ensure that the transaction log kept covers the transactions written between successive full backups of the sharded property database.
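
The heuristic can be turned into a rough sizing estimate. The figures below (write rate, backup interval, safety margin) are invented for illustration, and `required_log_bytes` is a hypothetical helper; actual values depend on the workload.

```python
def required_log_bytes(write_bytes_per_hour, hours_between_full_backups, margin=1.5):
    """Estimate the transaction log volume to retain between full backups."""
    return int(write_bytes_per_hour * hours_between_full_backups * margin)


# Assumed workload: 2 GiB of transaction log per hour, daily full backups.
GIB = 1024 ** 3
estimate = required_log_bytes(2 * GIB, 24)
estimate_gib = estimate / GIB
```

An estimate of this kind can guide the value chosen for `db.tx_log.rotation.retention_policy`, with the margin covering backups that run late.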
150148

151149
[NOTE]
@@ -156,7 +154,8 @@ It is important to ensure that there is space for the transaction logs and that
156154

157155
=== Controlling the property shard transaction log pull frequency
158156

159-
The interval at which property shards pull transaction log entries from the graph shard is controlled by `internal.dbms.sharded_property_database.property_pull_interval` (defaults to 10ms).Write performance can often be improved by setting this value lower at the cost of more polling on the graph shard from the property shards, which has unknown consequences at the moment.
157+
The interval at which property shards pull transaction log entries from the graph shard is controlled by `internal.dbms.sharded_property_database.property_pull_interval` (defaults to 10ms).
158+
Write performance can often be improved by setting this value lower at the cost of more polling on the graph shard from the property shards, which has unknown consequences at the moment.
160159

161160

162161
`
