Nodes: User (Regular Node) + Item (Vector Node)
Edges: Interacted (User -> Item)
- 10,000 User nodes
- uniformly distributed across 25 countries
- 500,000 Item vectors
- uniformly distributed across 1000 categories
- 400 Interactions per User
- on average (Poisson sampled), giving ~3.9M edges with our seed
User fields:
- country: u8
Item vector:
- Dimensions: 1536
- Precision: f32 (f64 on Helix, as f32 is currently unsupported)
Item fields:
- category: u16
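The Item count, dimensions and precision above fix the size of the raw vector data. The following is a rough back-of-the-envelope sketch, not part of the benchmark itself; it counts only the vector elements and ignores index structures and per-row overhead, which vary by database. The function name is illustrative.

```rust
// Dataset parameters taken from the description above.
const ITEM_COUNT: u64 = 500_000;
const DIMENSIONS: u64 = 1536;

/// Bytes needed to hold every Item vector at the given element width.
/// Counts raw elements only (no index, no per-row overhead).
fn raw_vector_bytes(bytes_per_element: u64) -> u64 {
    ITEM_COUNT * DIMENSIONS * bytes_per_element
}

fn main() {
    let f32_bytes = raw_vector_bytes(std::mem::size_of::<f32>() as u64);
    let f64_bytes = raw_vector_bytes(std::mem::size_of::<f64>() as u64);
    println!("f32: {:.3} GB", f32_bytes as f64 / 1e9); // f32: 3.072 GB
    println!("f64: {:.3} GB", f64_bytes as f64 / 1e9); // f64: 6.144 GB
}
```

Storing the vectors as f64 therefore doubles the raw vector payload compared to f32.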
Given an Item id, return its id and category.
Represents any kind of random access by ID, e.g. viewing a product page or a user profile.
Given a User id, follow Interacted edges and return all Items' id and category.
Represents viewing a collection, e.g. all posts a User has seen or everyone they've followed.
Given a User id and Item category, find all Items the User Interacted with filtered by category.
Represents viewing things in a category, e.g. "Show me all electronics this user has viewed" or "Find all action movies this user has watched".
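To make the three query shapes concrete, here is a minimal in-memory sketch. It is not how any of the benchmarked databases execute these queries; the `Graph` type, the method names and the `u64` id type are all assumptions made for illustration.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Item {
    id: u64, // id type is an assumption; the source does not specify it
    category: u16,
}

/// Hypothetical in-memory stand-in for a benchmarked database.
struct Graph {
    items: HashMap<u64, Item>,
    /// User id -> ids of the Items reached by Interacted edges.
    interacted: HashMap<u64, Vec<u64>>,
}

impl Graph {
    /// Query 1: random access by Item id.
    fn item_by_id(&self, item_id: u64) -> Option<Item> {
        self.items.get(&item_id).copied()
    }

    /// Query 2: follow Interacted edges from a User and return every Item.
    fn items_for_user(&self, user_id: u64) -> Vec<Item> {
        self.interacted
            .get(&user_id)
            .into_iter()
            .flatten()
            .filter_map(|id| self.items.get(id).copied())
            .collect()
    }

    /// Query 3: the same traversal, keeping only one category.
    fn items_for_user_in_category(&self, user_id: u64, category: u16) -> Vec<Item> {
        self.items_for_user(user_id)
            .into_iter()
            .filter(|item| item.category == category)
            .collect()
    }
}

/// Tiny hand-written dataset: three Items, two Users.
fn sample_graph() -> Graph {
    let items = HashMap::from([
        (1, Item { id: 1, category: 7 }),
        (2, Item { id: 2, category: 7 }),
        (3, Item { id: 3, category: 42 }),
    ]);
    let interacted = HashMap::from([(100, vec![1, 3]), (101, vec![2])]);
    Graph { items, interacted }
}

fn main() {
    let graph = sample_graph();
    println!("{:?}", graph.item_by_id(3));
    println!("{:?}", graph.items_for_user(100));
    println!("{:?}", graph.items_for_user_in_category(100, 42));
}
```

The third query is written as a filter over the second to show that it is the same one-hop traversal with a predicate on the Item side; a real database may push that predicate into the traversal instead.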
The benchmark is designed to be run on two Linux servers with systemd.
Running on a single machine will be painful: the database server reboots after switching systemd services. On a single machine you would have to run the benchmark command again after each reboot. It won't reboot again if the target database is already running.
- Clone the repo and run `cargo build --release`
- Create a systemd service that runs the built binary with the `server` arg (use the full path to `./target/release/graph-vector-bench server`). It must be able to run `sudo` without a password to interact with systemd. The service can have any name; its purpose is to start the CLI server again after the machine reboots when switching databases.
- Set up Postgres, Neo4j and HelixDB. Each must have a systemd service that can be enabled by the name `postgresql`, `neo4j` or `helix` respectively.
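The requirements above (passwordless `sudo`, fixed service names, a reboot on switch) suggest a switch sequence along the following lines. This is only a sketch of what such a sequence could look like, not the repo's actual logic: the disable/enable/reboot order and the `switch_commands` helper are assumptions. The code builds the command lines without running them.

```rust
/// Service names the databases must be enabled under.
const DATABASE_SERVICES: [&str; 3] = ["postgresql", "neo4j", "helix"];

/// Builds (without executing) an assumed switch sequence: disable every
/// other database, enable the target, then reboot the machine.
/// Returns None for a name that is not one of the three services.
fn switch_commands(target: &str) -> Option<Vec<Vec<&'static str>>> {
    if !DATABASE_SERVICES.contains(&target) {
        return None;
    }
    let mut commands = Vec::new();
    for service in DATABASE_SERVICES {
        let action = if service == target { "enable" } else { "disable" };
        commands.push(vec!["sudo", "systemctl", action, service]);
    }
    commands.push(vec!["sudo", "systemctl", "reboot"]);
    Some(commands)
}

fn main() {
    // Print the sequence for switching to Neo4j, one command per line.
    if let Some(commands) = switch_commands("neo4j") {
        for command in commands {
            println!("{}", command.join(" "));
        }
    }
}
```

Because every command goes through `sudo`, the account running the CLI server needs passwordless `sudo` for an unattended switch like this to work.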
- Clone the repo and run `cargo install --path .`
- Make a fresh directory where you will run the CLI from.
- Copy the `./server_connections.toml` file and the `./benchmark_config`, `./matrix_config` and `./dataset_config` directories into your new directory.
- Fill in `server_connections.toml` with your database connection strings. `server_url` is this CLI's server.
- Make sure the Hugging Face CLI is installed: `pipx install "huggingface_hub[cli]"`
- Run `hf download KShivendu/dbpedia-entities-openai-1M --repo-type=dataset --local-dir openai-1m`
  - This is in preparation for upcoming vector queries.
Run `graph-vector-bench <command> --help` to view the parameters for the following:
- For every dataset you want to populate, run `graph-vector-bench populate ...`
- Once populated, run `graph-vector-bench benchmark ...`