HelixDB/graph-vector-bench

Graph-Vector Bench

Skip to how to run

Data Model / Schema

Nodes: User (Regular Node) + Item (Vector Node)
Edges: Interacted (User -> Item)

Dataset Configuration

./dataset_config/large.toml

  • 10,000 User nodes
    • uniformly distributed across 25 countries
  • 500,000 Item vectors
    • uniformly distributed across 1000 categories
  • 400 Interactions per User on average
    • Poisson sampled; ~3.9M edges in total with our seed

User Node

Fields:

  • country: u8

Item Vector

Dimensions: 1536
Precision: f32 (f64 on Helix as f32 is currently unsupported)

Fields:

  • category: u16
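
To get a feel for what the precision difference means at this dataset size, the raw vector payload can be estimated from the numbers above. This is only a back-of-the-envelope sketch: `raw_vector_bytes` is a hypothetical helper, not part of the benchmark, and it ignores index structures and per-record overhead that each database adds.

```rust
/// Raw payload in bytes for `count` vectors of `dims` components,
/// each component `bytes_per_component` wide (4 for f32, 8 for f64).
/// Hypothetical helper for illustration only.
fn raw_vector_bytes(count: u64, dims: u64, bytes_per_component: u64) -> u64 {
    count * dims * bytes_per_component
}

fn main() {
    let items = 500_000; // Item vectors in ./dataset_config/large.toml
    let dims = 1536; // dimensions per Item vector

    let f32_bytes = raw_vector_bytes(items, dims, 4);
    let f64_bytes = raw_vector_bytes(items, dims, 8);

    println!("f32: {:.3} GB", f32_bytes as f64 / 1e9);
    println!("f64: {:.3} GB", f64_bytes as f64 / 1e9);
}
```

Storing the vectors as f64 doubles the raw payload compared with f32, which is worth keeping in mind when comparing storage or memory numbers across databases.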

Queries

1. PointGet

Given an Item id, return its id and category.

Represents any kind of random access by ID, e.g. viewing a product page or a user profile.

2. OneHop

Given a User id, follow its Interacted edges and return the id and category of every Item reached.

Represents viewing a collection, e.g. all posts a User has seen or everyone they have followed.

3. OneHopFilter

Given a User id and an Item category, return the Items the User has Interacted with that belong to that category.

Represents viewing things in a category, e.g. "Show me all electronics this user has viewed" or "Find all action movies this user has watched".
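
The semantics of the three queries can be sketched against a small in-memory model. This is an illustration only, not how the benchmark talks to Postgres, Neo4j or HelixDB: the `Graph` and `ItemRow` types, the `u64` id type and the method names are assumptions made for this example, and the Item vector itself is left out.

```rust
use std::collections::HashMap;

/// What each query returns for an Item: its id and category.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ItemRow {
    id: u64, // assumed id type
    category: u16,
}

/// Hypothetical in-memory stand-in for the data model.
#[derive(Default)]
struct Graph {
    items: HashMap<u64, u16>,           // Item id -> category
    interacted: HashMap<u64, Vec<u64>>, // User id -> Item ids (Interacted edges)
}

impl Graph {
    /// PointGet: random access to one Item by id.
    fn point_get(&self, item_id: u64) -> Option<ItemRow> {
        self.items
            .get(&item_id)
            .map(|&category| ItemRow { id: item_id, category })
    }

    /// OneHop: follow Interacted edges from a User to its Items.
    fn one_hop(&self, user_id: u64) -> Vec<ItemRow> {
        self.interacted
            .get(&user_id)
            .into_iter()
            .flatten()
            .filter_map(|&item_id| self.point_get(item_id))
            .collect()
    }

    /// OneHopFilter: same traversal, keeping only one category.
    fn one_hop_filter(&self, user_id: u64, category: u16) -> Vec<ItemRow> {
        self.one_hop(user_id)
            .into_iter()
            .filter(|row| row.category == category)
            .collect()
    }
}

fn main() {
    let mut g = Graph::default();
    g.items.insert(1, 7);
    g.items.insert(2, 9);
    g.items.insert(3, 7);
    g.interacted.insert(100, vec![1, 2, 3]);

    println!("{:?}", g.point_get(2));
    println!("{:?}", g.one_hop(100));
    println!("{:?}", g.one_hop_filter(100, 7));
}
```

In a real database the category filter would normally be pushed into the query itself rather than applied after fetching every neighbour; the sketch only pins down what each query is expected to return.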

Benchmarking

The benchmark is designed to be run on two Linux servers with systemd: one hosting the databases (the server) and one driving the benchmark (the client).

Running on a single machine will be painful: the database server reboots after switching systemd units. On a single machine you would have to run the benchmark command again after each reboot; it will not reboot again if the target database is already running.

Set up server

  1. Clone the repo and run cargo build --release
  2. Create a systemd unit that runs the built binary with the server argument, using the full path to ./target/release/graph-vector-bench server. The binary must be able to run sudo without a password so it can interact with systemd. The unit can have any name; its purpose is to start the CLI server again after the machine reboots when switching databases.
  3. Set up Postgres, Neo4j and HelixDB. Each must have a systemd unit that can be enabled by name: postgresql, neo4j and helix respectively.

Set up client

  1. Clone the repo and run cargo install --path . from the repo root
  2. Make a fresh directory where you will run the CLI from.
  3. Copy the ./server_connections.toml file and the ./benchmark_config, ./matrix_config and ./dataset_config directories into your new directory.
  4. Fill in server_connections.toml with your database connection strings. server_url is the address of this CLI's server (set up above).
  5. Make sure the Hugging Face CLI is installed: pipx install "huggingface_hub[cli]"
  6. Download the embeddings dataset: hf download KShivendu/dbpedia-entities-openai-1M --repo-type=dataset --local-dir openai-1m (this is in preparation for upcoming vector queries)

Running benchmarks (on client)

Run graph-vector-bench <command> --help to view the parameters for the following:

  1. For every dataset you want to populate, run graph-vector-bench populate ...
  2. Once populated, run graph-vector-bench benchmark ...
