Skip to content

Commit 49cd0e1

Browse files
committed
store: Split subgraph_deployment table
We split the table `subgraphs.subgraph_deployment` into two tables, `subgraphs.head` and `subgraphs.deployment` where the `head` table only contains the metadata that changes on every block. This should help with situations where the `subgraph_deployment` table gets very bloated since the `head` table that gets bloated through frequent changes has much smaller rows than the current `subgraph_deployment` table. Rows in `subgraph_deployment` can grow as big as 500k, whereas rows in the `head` table will only take about 350 bytes at most. Updates will also be marginally better on the `heads` table since it only has one index rather than the two that `subgraph_deployment` has.
1 parent da48b19 commit 49cd0e1

File tree

13 files changed

+638
-290
lines changed

13 files changed

+638
-290
lines changed

docs/implementation/metadata.md

Lines changed: 43 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@
77
List of all known subgraph names. Maintained in the primary, but there is a background job that periodically copies the table from the primary to all other shards. Those copies are used for queries when the primary is down.
88

99
| Column | Type | Use |
10-
|-------------------|--------------|-------------------------------------------|
10+
| ----------------- | ------------ | ----------------------------------------- |
1111
| `id` | `text!` | primary key, UUID |
1212
| `name` | `text!` | user-chosen name |
1313
| `current_version` | `text` | `subgraph_version.id` for current version |
@@ -18,29 +18,27 @@ List of all known subgraph names. Maintained in the primary, but there is a back
1818

1919
The `id` is used by the hosted explorer to reference the subgraph.
2020

21-
2221
### `subgraphs.subgraph_version`
2322

2423
Mapping of subgraph names from `subgraph` to IPFS hashes. Maintained in the primary, but there is a background job that periodically copies the table from the primary to all other shards. Those copies are used for queries when the primary is down.
2524

2625
| Column | Type | Use |
27-
|---------------|--------------|-------------------------|
26+
| ------------- | ------------ | ----------------------- |
2827
| `id` | `text!` | primary key, UUID |
2928
| `subgraph` | `text!` | `subgraph.id` |
3029
| `deployment` | `text!` | IPFS hash of deployment |
3130
| `created_at` | `numeric` | UNIX timestamp |
3231
| `vid` | `int8!` | unused |
3332
| `block_range` | `int4range!` | unused |
3433

35-
3634
## Managing a deployment
3735

3836
Directory of all deployments. Maintained in the primary, but there is a background job that periodically copies the table from the primary to all other shards. Those copies are used for queries when the primary is down.
3937

4038
### `public.deployment_schemas`
4139

4240
| Column | Type | Use |
43-
|--------------|----------------|----------------------------------------------|
41+
| ------------ | -------------- | -------------------------------------------- |
4442
| `id` | `int4!` | primary key |
4543
| `subgraph` | `text!` | IPFS hash of deployment |
4644
| `name` | `text!` | name of `sgdNNN` schema |
@@ -52,36 +50,46 @@ Directory of all deployments. Maintained in the primary, but there is a backgrou
5250

5351
There can be multiple copies of the same deployment, but at most one per shard. The `active` flag indicates which of these copies will be used for queries; `graph-node` makes sure that there is always exactly one for each IPFS hash.
5452

55-
### `subgraphs.subgraph_deployment`
53+
### `subgraphs.head`
54+
55+
Details about a deployment that change on every block. Maintained in the
56+
shard alongside the deployment's data in `sgdNNN`.
57+
58+
| Column | Type | Use |
59+
| ----------------- | ---------- | -------------------------------------------- |
60+
| `id` | `integer!` | primary key, same as `deployment_schemas.id` |
61+
| `block_hash` | `bytea` | current subgraph head |
62+
| `block_number` | `numeric` | |
63+
| `entity_count` | `numeric!` | total number of entities |
64+
| `firehose_cursor` | `text` | |
65+
66+
### `subgraphs.deployment`
5667

5768
Details about a deployment to track sync progress etc. Maintained in the
5869
shard alongside the deployment's data in `sgdNNN`. The table should only
59-
contain frequently changing data, but for historical reasons contains also
60-
static data.
61-
62-
| Column | Type | Use |
63-
|--------------------------------------|------------|----------------------------------------------|
64-
| `id` | `integer!` | primary key, same as `deployment_schemas.id` |
65-
| `deployment` | `text!` | IPFS hash |
66-
| `failed` | `boolean!` | |
67-
| `synced` | `boolean!` | |
68-
| `earliest_block_number` | `integer!` | earliest block for which we have data |
69-
| `latest_ethereum_block_hash` | `bytea` | current subgraph head |
70-
| `latest_ethereum_block_number` | `numeric` | |
71-
| `entity_count` | `numeric!` | total number of entities |
72-
| `graft_base` | `text` | IPFS hash of graft base |
73-
| `graft_block_hash` | `bytea` | graft block |
74-
| `graft_block_number` | `numeric` | |
75-
| `reorg_count` | `integer!` | |
76-
| `current_reorg_depth` | `integer!` | |
77-
| `max_reorg_depth` | `integer!` | |
78-
| `fatal_error` | `text` | |
79-
| `non_fatal_errors` | `text[]` | |
80-
| `health` | `health!` | |
81-
| `last_healthy_ethereum_block_hash` | `bytea` | |
82-
| `last_healthy_ethereum_block_number` | `numeric` | |
83-
| `firehose_cursor` | `text` | |
84-
| `debug_fork` | `text` | |
70+
contain data that changes fairly infrequently, but for historical reasons
71+
contains also static data.
72+
73+
| Column | Type | Use |
74+
| ------------------------------------ | ------------- | ---------------------------------------------------- |
75+
| `id` | `integer!` | primary key, same as `deployment_schemas.id` |
76+
| `subgraph` | `text!` | IPFS hash |
77+
| `earliest_block_number` | `integer!` | earliest block for which we have data |
78+
| `health` | `health!` | |
79+
| `failed` | `boolean!` | |
80+
| `fatal_error` | `text` | |
81+
| `non_fatal_errors` | `text[]` | |
82+
| `graft_base` | `text` | IPFS hash of graft base |
83+
| `graft_block_hash` | `bytea` | graft block |
84+
| `graft_block_number` | `numeric` | |
85+
| `reorg_count` | `integer!` | |
86+
| `current_reorg_depth` | `integer!` | |
87+
| `max_reorg_depth` | `integer!` | |
88+
| `last_healthy_ethereum_block_hash` | `bytea` | |
89+
| `last_healthy_ethereum_block_number` | `numeric` | |
90+
| `debug_fork` | `text` | |
91+
| `synced_at` | `timestamptz` | time when deployment first reach chain head |
92+
| `synced_at_block_number` | `integer` | block number where deployment first reach chain head |
8593

8694
The columns `reorg_count`, `current_reorg_depth`, and `max_reorg_depth` are
8795
set during indexing. They are used to determine whether a reorg happened
@@ -94,7 +102,7 @@ Details about a deployment that rarely change. Maintained in the
94102
shard alongside the deployment's data in `sgdNNN`.
95103

96104
| Column | Type | Use |
97-
|-------------------------|------------|------------------------------------------------------|
105+
| ----------------------- | ---------- | ---------------------------------------------------- |
98106
| `id` | `integer!` | primary key, same as `deployment_schemas.id` |
99107
| `spec_version` | `text!` | |
100108
| `description` | `text` | |
@@ -115,7 +123,7 @@ but there is a background job that periodically copies the table from the
115123
primary to all other shards.
116124

117125
| Column | Type | Use |
118-
|---------|-------|---------------------------------------------|
126+
| ------- | ----- | ------------------------------------------- |
119127
| id | int4! | primary key, ref to `deployment_schemas.id` |
120128
| node_id | text! | name of index node |
121129

@@ -147,7 +155,7 @@ should have the 'account-like' optimization turned on.
147155
Details about features that a deployment uses, Maintained in the primary.
148156

149157
| Column | Type | Use |
150-
|----------------|-----------|-------------|
158+
| -------------- | --------- | ----------- |
151159
| `id` | `text!` | primary key |
152160
| `spec_version` | `text!` | |
153161
| `api_version` | `text` | |

node/src/manager/commands/unused_deployments.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -64,7 +64,7 @@ pub fn record(store: Arc<SubgraphStore>) -> Result<(), Error> {
6464
let recorded = store.record_unused_deployments()?;
6565

6666
for unused in store.list_unused_deployments(unused::Filter::New)? {
67-
if recorded.iter().any(|r| r.deployment == unused.deployment) {
67+
if recorded.iter().any(|r| r.subgraph == unused.deployment) {
6868
add_row(&mut list, unused);
6969
}
7070
}
Lines changed: 110 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,110 @@
1+
create table subgraphs.subgraph_deployment (
2+
id int4 primary key,
3+
4+
deployment text unique not null,
5+
6+
latest_ethereum_block_hash bytea,
7+
latest_ethereum_block_number numeric,
8+
entity_count numeric NOT NULL,
9+
firehose_cursor text,
10+
11+
earliest_block_number integer DEFAULT 0 NOT NULL,
12+
13+
graft_base text,
14+
graft_block_hash bytea,
15+
graft_block_number numeric,
16+
17+
health text NOT NULL,
18+
failed boolean NOT NULL,
19+
fatal_error text,
20+
non_fatal_errors text[] DEFAULT '{}'::text[],
21+
22+
reorg_count integer DEFAULT 0 NOT NULL,
23+
current_reorg_depth integer DEFAULT 0 NOT NULL,
24+
max_reorg_depth integer DEFAULT 0 NOT NULL,
25+
26+
last_healthy_ethereum_block_hash bytea,
27+
last_healthy_ethereum_block_number numeric,
28+
29+
debug_fork text,
30+
31+
synced_at timestamp with time zone,
32+
synced_at_block_number integer,
33+
34+
constraint subgraph_deployment_health_new_check
35+
check ((health = any (array['failed', 'healthy', 'unhealthy'])))
36+
);
37+
38+
insert into subgraphs.subgraph_deployment
39+
(id, deployment,
40+
latest_ethereum_block_hash, latest_ethereum_block_number,
41+
entity_count, firehose_cursor,
42+
earliest_block_number,
43+
graft_base, graft_block_hash, graft_block_number,
44+
health, failed, fatal_error, non_fatal_errors,
45+
reorg_count, current_reorg_depth, max_reorg_depth,
46+
last_healthy_ethereum_block_hash, last_healthy_ethereum_block_number,
47+
debug_fork,
48+
synced_at, synced_at_block_number)
49+
select h.id, d.subgraph,
50+
h.block_hash, h.block_number,
51+
h.entity_count, h.firehose_cursor,
52+
earliest_block_number,
53+
graft_base, graft_block_hash, graft_block_number,
54+
health, failed, fatal_error, non_fatal_errors,
55+
reorg_count, current_reorg_depth, max_reorg_depth,
56+
last_healthy_block_hash, last_healthy_block_number,
57+
debug_fork,
58+
synced_at, synced_at_block_number
59+
from subgraphs.head h, subgraphs.deployment d
60+
where h.id = d.id;
61+
62+
alter table subgraphs.copy_state
63+
drop constraint copy_state_dst_fkey,
64+
add constraint copy_state_dst_fkey
65+
foreign key (dst) references
66+
subgraphs.subgraph_deployment(id) on delete cascade;
67+
68+
alter table subgraphs.subgraph_error
69+
drop constraint subgraph_error_subgraph_id_fkey,
70+
add constraint subgraph_error_subgraph_id_fkey
71+
foreign key (subgraph_id) references
72+
subgraphs.subgraph_deployment(deployment) on delete cascade;
73+
74+
alter table subgraphs.subgraph_manifest
75+
drop constraint subgraph_manifest_id_fkey,
76+
add constraint subgraph_manifest_new_id_fkey
77+
foreign key (id) references
78+
subgraphs.subgraph_deployment(id) on delete cascade;
79+
80+
alter table subgraphs.table_stats
81+
drop constraint table_stats_deployment_fkey,
82+
add constraint table_stats_deployment_fkey
83+
foreign key (deployment) references
84+
subgraphs.subgraph_deployment(id) on delete cascade;
85+
86+
drop view info.subgraph_info;
87+
88+
create view info.subgraph_info as
89+
select ds.id AS schema_id,
90+
ds.name AS schema_name,
91+
ds.subgraph,
92+
ds.version,
93+
s.name,
94+
CASE
95+
WHEN s.pending_version = v.id THEN 'pending'::text
96+
WHEN s.current_version = v.id THEN 'current'::text
97+
ELSE 'unused'::text
98+
END AS status,
99+
d.failed,
100+
d.synced_at
101+
from deployment_schemas ds,
102+
subgraph_deployment d,
103+
subgraph_version v,
104+
subgraph s
105+
where d.id = ds.id
106+
and v.deployment = d.deployment
107+
and v.subgraph = s.id;
108+
109+
drop table subgraphs.deployment;
110+
drop table subgraphs.head;
Lines changed: 116 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,116 @@
1+
create table subgraphs.head (
2+
id int4 primary key,
3+
entity_count int8 not null,
4+
block_number int4,
5+
block_hash bytea,
6+
firehose_cursor text
7+
);
8+
9+
create table subgraphs.deployment (
10+
id int4 primary key,
11+
12+
subgraph text unique not null,
13+
14+
earliest_block_number int4 default 0 not null,
15+
16+
health text not null,
17+
failed boolean not null,
18+
fatal_error text,
19+
non_fatal_errors text[] default '{}'::text[],
20+
21+
graft_base text,
22+
graft_block_hash bytea,
23+
graft_block_number int4,
24+
25+
reorg_count int4 default 0 not null,
26+
current_reorg_depth int4 default 0 not null,
27+
max_reorg_depth int4 default 0 not null,
28+
29+
last_healthy_block_hash bytea,
30+
last_healthy_block_number int4,
31+
32+
debug_fork text,
33+
34+
synced_at timestamptz,
35+
synced_at_block_number int4,
36+
37+
constraint deployment_health_new_check
38+
check ((health = any (array['failed', 'healthy', 'unhealthy']))),
39+
constraint deployment_id
40+
foreign key (id) references subgraphs.head(id) on delete cascade
41+
);
42+
43+
insert into subgraphs.head
44+
(id, block_hash, block_number, entity_count, firehose_cursor)
45+
select id, latest_ethereum_block_hash,
46+
latest_ethereum_block_number, entity_count, firehose_cursor
47+
from subgraphs.subgraph_deployment;
48+
49+
insert into subgraphs.deployment
50+
(id, subgraph, failed, graft_base, graft_block_hash, graft_block_number,
51+
fatal_error, non_fatal_errors, reorg_count, current_reorg_depth,
52+
max_reorg_depth,
53+
last_healthy_block_hash, last_healthy_block_number,
54+
debug_fork, earliest_block_number,
55+
health,
56+
synced_at, synced_at_block_number)
57+
select
58+
id, deployment, failed, graft_base, graft_block_hash, graft_block_number,
59+
fatal_error, non_fatal_errors, reorg_count, current_reorg_depth,
60+
max_reorg_depth,
61+
last_healthy_ethereum_block_hash, last_healthy_ethereum_block_number,
62+
debug_fork, earliest_block_number,
63+
health,
64+
synced_at, synced_at_block_number
65+
from subgraphs.subgraph_deployment;
66+
67+
-- Support joining with subgraph_error
68+
create index deployment_fatal_error
69+
on subgraphs.deployment(fatal_error);
70+
71+
alter table subgraphs.copy_state
72+
drop constraint copy_state_dst_fkey,
73+
add constraint copy_state_dst_fkey
74+
foreign key (dst) references subgraphs.deployment(id) on delete cascade;
75+
76+
alter table subgraphs.subgraph_error
77+
drop constraint subgraph_error_subgraph_id_fkey,
78+
add constraint subgraph_error_subgraph_id_fkey
79+
foreign key (subgraph_id) references
80+
subgraphs.deployment(subgraph) on delete cascade;
81+
82+
alter table subgraphs.subgraph_manifest
83+
drop constraint subgraph_manifest_new_id_fkey,
84+
add constraint subgraph_manifest_id_fkey
85+
foreign key (id) references subgraphs.deployment(id) on delete cascade;
86+
87+
alter table subgraphs.table_stats
88+
drop constraint table_stats_deployment_fkey,
89+
add constraint table_stats_deployment_fkey
90+
foreign key (deployment) references subgraphs.deployment(id)
91+
on delete cascade;
92+
93+
drop view info.subgraph_info;
94+
95+
drop table subgraphs.subgraph_deployment;
96+
97+
create view info.subgraph_info as
98+
select ds.id as schema_id,
99+
ds.name as schema_name,
100+
ds.subgraph,
101+
ds.version,
102+
s.name,
103+
CASE
104+
WHEN s.pending_version = v.id THEN 'pending'
105+
WHEN s.current_version = v.id THEN 'current'
106+
ELSE 'unused'
107+
END AS status,
108+
d.failed,
109+
d.synced_at
110+
from deployment_schemas ds,
111+
subgraphs.deployment d,
112+
subgraphs.subgraph_version v,
113+
subgraphs.subgraph s
114+
where d.id = ds.id
115+
and v.deployment = d.subgraph
116+
and v.subgraph = s.id;

store/postgres/src/chain_store.rs

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2394,10 +2394,10 @@ impl ChainStoreTrait for ChainStore {
23942394
where name = $2)), -1)::int as block
23952395
from (
23962396
select min(d.latest_ethereum_block_number) as block
2397-
from subgraphs.subgraph_deployment d,
2397+
from subgraphs.deployment d,
23982398
subgraphs.subgraph_deployment_assignment a,
23992399
deployment_schemas ds
2400-
where ds.subgraph = d.deployment
2400+
where ds.id = d.id
24012401
and a.id = d.id
24022402
and not d.failed
24032403
and ds.network = $2) a;";

0 commit comments

Comments
 (0)