Graph-Vector Bench

Skip to how to run

Data Model / Schema

Nodes: User (Regular Node) + Item (Vector Node)
Edges: Interacted (User -> Item)

Dataset Configuration

./dataset_config/large.toml

10,000 User nodes
- uniformly distributed across 25 countries
500,000 Item vectors
- uniformly distributed across 1000 categories
400 Interactions per User on average
- Poisson sampled; about 3.9 million edges with our seed
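
As a quick sanity check on the figures above, the nominal sizes can be derived with a few lines of arithmetic. This is only a sketch: the struct and its field names are hypothetical and are not taken from large.toml or from the benchmark's source.

```rust
/// Nominal shape of a dataset, using the figures listed for large.toml.
/// Field names are illustrative assumptions, not the config file's keys.
pub struct DatasetShape {
    pub users: u64,
    pub countries: u64,
    pub items: u64,
    pub categories: u64,
    pub mean_interactions_per_user: u64,
}

pub const LARGE: DatasetShape = DatasetShape {
    users: 10_000,
    countries: 25,
    items: 500_000,
    categories: 1_000,
    mean_interactions_per_user: 400,
};

impl DatasetShape {
    /// Average number of Users per country under a uniform split.
    pub fn users_per_country(&self) -> u64 {
        self.users / self.countries
    }

    /// Average number of Items per category under a uniform split.
    pub fn items_per_category(&self) -> u64 {
        self.items / self.categories
    }

    /// Expected edge count: the Poisson mean times the number of Users.
    pub fn expected_edges(&self) -> u64 {
        self.users * self.mean_interactions_per_user
    }
}
```

With these inputs the averages come out to 400 Users per country, 500 Items per category and 4,000,000 expected edges. The roughly 3.9 million edges quoted above is the count observed with the authors' seed, which this sketch does not try to reproduce.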

User Node

Fields:

country: u8

Item Vector

Dimensions: 1536
Precision: f32 (f64 on Helix, where f32 is currently unsupported)

Fields:

category: u16
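
To make the schema concrete, the two node types and the edge could be modelled roughly as below. This is a sketch under assumptions: the `u64` id type, the struct names and the `embedding` field name are hypothetical, and only `country: u8`, `category: u16`, the 1536 dimensions and the f32 precision come from the description above. The helper estimates raw vector storage, ignoring any index or per-record overhead.

```rust
/// Vector dimensionality stated in the schema.
pub const DIMENSIONS: usize = 1536;

/// Regular node. The id type is an assumption.
pub struct User {
    pub id: u64,
    pub country: u8,
}

/// Vector node. `embedding` is expected to hold DIMENSIONS components.
pub struct Item {
    pub id: u64,
    pub category: u16,
    pub embedding: Vec<f32>,
}

/// Directed edge User -> Item.
pub struct Interacted {
    pub from_user: u64,
    pub to_item: u64,
}

/// Raw bytes needed to store the vectors alone.
pub fn raw_vector_bytes(items: u64, dimensions: u64, bytes_per_component: u64) -> u64 {
    items * dimensions * bytes_per_component
}
```

For 500,000 Items this gives 3,072,000,000 bytes at f32 and 6,144,000,000 bytes at f64, so the f64 fallback doubles the raw vector footprint.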

Queries

1. PointGet

Given an Item id, return its id and category.

Represents any kind of random access by id, e.g. viewing a product page or a user profile.
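
The semantics of PointGet can be illustrated with an in-memory map. This is a minimal sketch of what the query returns, not how any of the benchmarked databases execute it; the function name and the `u64` id type are assumptions.

```rust
use std::collections::HashMap;

/// Look up one Item by id and return (id, category), or None if the id is unknown.
/// `items` maps Item id -> category.
pub fn point_get(items: &HashMap<u64, u16>, id: u64) -> Option<(u64, u16)> {
    items.get(&id).map(|&category| (id, category))
}
```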

2. OneHop

Given a User id, follow its Interacted edges and return the id and category of every Item reached.

Represents viewing a collection, e.g. all posts a User has seen or everyone they've followed.
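
A rough in-memory picture of OneHop uses an adjacency list for the Interacted edges. The names and the `u64` id type are assumptions made for illustration; the real queries run inside each database.

```rust
use std::collections::HashMap;

/// Follow every Interacted edge from `user_id` and return (id, category)
/// for each Item reached. Unknown users yield an empty result.
/// `interacted` maps User id -> Item ids; `item_category` maps Item id -> category.
pub fn one_hop(
    interacted: &HashMap<u64, Vec<u64>>,
    item_category: &HashMap<u64, u16>,
    user_id: u64,
) -> Vec<(u64, u16)> {
    interacted
        .get(&user_id)
        .map(|item_ids| {
            item_ids
                .iter()
                // Skip edges that point at an Item with no stored category.
                .filter_map(|id| item_category.get(id).map(|&c| (*id, c)))
                .collect::<Vec<_>>()
        })
        .unwrap_or_default()
}
```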

3. OneHopFilter

Given a User id and an Item category, return all Items the User Interacted with that belong to that category.

Represents viewing things in a category, e.g. "Show me all electronics this user has viewed" or "Find all action movies this user has watched".
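
OneHopFilter adds a predicate on the Items reached by the traversal. The sketch below filters after following the edges; a database may instead push the filter into the traversal or use an index. The function name and the `u64` id type are assumptions.

```rust
use std::collections::HashMap;

/// Return the ids of Items that `user_id` Interacted with and whose
/// category equals `category`.
/// `interacted` maps User id -> Item ids; `item_category` maps Item id -> category.
pub fn one_hop_filter(
    interacted: &HashMap<u64, Vec<u64>>,
    item_category: &HashMap<u64, u16>,
    user_id: u64,
    category: u16,
) -> Vec<u64> {
    let Some(item_ids) = interacted.get(&user_id) else {
        return Vec::new();
    };
    item_ids
        .iter()
        .copied()
        .filter(|id| item_category.get(id) == Some(&category))
        .collect()
}
```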

Benchmarking

The benchmark is designed to be run on two Linux servers with systemd.

Running on a single machine will be painful: the database server reboots after switching systemd units, so you would have to run the benchmark command again after each reboot. It won't reboot again if the target database is already running.

Set up server

Clone the repo and run cargo build --release
Create a systemd unit that runs the built binary with the server argument, using the full path to ./target/release/graph-vector-bench server. It must be able to run sudo without a password in order to interact with systemd. The unit can have any name; its purpose is to start the CLI server again after the machine reboots when switching databases.
Set up Postgres, Neo4j and HelixDB. Each must have a systemd unit that can be enabled by the name postgresql, neo4j or helix respectively.
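
To illustrate why passwordless sudo and fixed unit names matter, here is a sketch of the kind of command sequence a database switch might involve. This is an assumption for illustration only and is not taken from the repository's implementation: the function is hypothetical, and it only builds the command strings rather than running them. The `systemctl enable`, `systemctl disable` and `systemctl reboot` subcommands are standard systemd commands.

```rust
/// Unit names the setup steps require.
pub const DATABASE_UNITS: [&str; 3] = ["postgresql", "neo4j", "helix"];

/// Build (but do not execute) commands that would leave only `target`
/// enabled and then reboot. Returns None for an unknown unit name.
/// Hypothetical helper: the real CLI may switch databases differently.
pub fn switch_commands(target: &str) -> Option<Vec<String>> {
    if !DATABASE_UNITS.contains(&target) {
        return None;
    }
    let mut commands = Vec::new();
    // Disable every other database so only the target starts on boot.
    for unit in DATABASE_UNITS.iter().filter(|unit| **unit != target) {
        commands.push(format!("sudo systemctl disable {}", unit));
    }
    commands.push(format!("sudo systemctl enable {}", target));
    commands.push("sudo systemctl reboot".to_string());
    Some(commands)
}
```

Because the machine reboots at the end of such a sequence, the benchmark server itself has to come back up through its own systemd unit, which is the purpose of the unit created in the second step.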

Set up client

Clone the repo and run cargo install --path .
Make a fresh directory from which you will run the CLI.
Copy the ./server_connections.toml file and the ./benchmark_config, ./matrix_config and ./dataset_config directories into the new directory.
Fill in server_connections.toml with your database connection strings. server_url is the URL of this CLI's server.
Make sure the Hugging Face CLI is installed: pipx install "huggingface_hub[cli]"
Download the vector dataset: hf download KShivendu/dbpedia-entities-openai-1M --repo-type=dataset --local-dir openai-1m. This is in preparation for upcoming vector queries.

Running benchmarks (on client)

Run graph-vector-bench <command> --help to view the parameters for the following:

For every dataset you want to populate, run graph-vector-bench populate ...
Once populated, run graph-vector-bench benchmark ...