DOC-753 | Graph ML UI #709

Merged
merged 16 commits into from
Jun 19, 2025

@bluepal-thirumala-thotapalli (Contributor) commented Jun 10, 2025

Description

TODO: Update screenshots due to name change Data Science (Suite) -> GenAI Suite

Upstream PRs

  • 3.10:
  • 3.11:
  • 3.12:
  • 3.13:

Contributor

Deploy Preview Available Via
https://deploy-preview-709--docs-hugo.netlify.app

@Simran-B changed the title from "Doc 753" to "DOC-753 | Graph ML UI" on Jun 10, 2025
Comment on lines 2 to 3
title: ArangoGraphML Web Interface
menuTitle: ArangoGraphML Web Interface
Contributor

Title to be discussed (we might rename it to just GraphML)

Contributor Author

Yes, I’ve updated the title and menuTitle to "GraphML" as suggested.

aliases:
- getting-started-with-arangographml
---
Solve high-computational graph problems with Graph Machine Learning. Apply ML on a selected graph to predict connections, get better product recommendations, classify nodes, and perform node embeddings. Configure and run the whole machine learning flow entirely in the web interface.
Contributor

We only have node classification and embeddings available as immediate options. If we mention something like link predictions, we should at least outline how to achieve that.

It would also be good to have a more technical explanation here of how GraphML works (GraphSAGE, using a depth-2 neighborhood, as mentioned in the Slack team channel).

Please also add an overview of the process instead of immediately starting with project creation etc. Users should first get an understanding of the hierarchy and the steps involved.
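
As a rough illustration of what "GraphSAGE with a depth-2 neighborhood" means, here is a minimal sketch in plain Python. It is not how GraphML is implemented: real GraphSAGE applies learned weight matrices and a non-linearity at each layer, which are omitted here, and the names `aggregate_once` and `embed` as well as the toy graph are hypothetical. The sketch only shows that after two rounds of neighbor aggregation, a node's vector depends on nodes up to two hops away.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]          # node -> list of neighbor nodes
Features = Dict[str, List[float]]     # node -> feature vector


def mean(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def aggregate_once(graph: Graph, feats: Features) -> Features:
    """One aggregation step: concatenate each node's own vector
    with the mean of its neighbors' vectors (no learned weights)."""
    result = {}
    for node, vector in feats.items():
        neighbor_vectors = [feats[n] for n in graph.get(node, [])]
        if neighbor_vectors:
            neighbor_mean = mean(neighbor_vectors)
        else:
            neighbor_mean = [0.0] * len(vector)
        result[node] = vector + neighbor_mean
    return result


def embed(graph: Graph, feats: Features, depth: int = 2) -> Features:
    """Repeat the aggregation step `depth` times (depth 2 = two hops)."""
    for _ in range(depth):
        feats = aggregate_once(graph, feats)
    return feats


# Toy path graph a - b - c with one-dimensional features (made-up values).
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": [1.0], "b": [0.0], "c": [5.0]}

# With depth 2, the vector of "a" carries information from "c",
# although "c" is two hops away from "a".
print(embed(graph, features, depth=2)["a"])
```

With `depth=1`, the vector of `a` would only reflect its direct neighbor `b`; the second round is what pulls in the influence of `c`.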

Contributor Author

I’ve addressed the points as suggested:

Mentioned only node classification and embeddings as the currently available options.

Added a brief technical explanation of how GraphML works, referencing GraphSAGE with depth 2 neighborhood, based on our Slack discussion and information from the official GraphSAGE site.

Included an overview section at the beginning to explain the overall process, hierarchy, and steps before diving into project creation.


## Prediction Phase

Once the best-performing model has been selected, the final step of the GraphML pipeline is to generate predictions for new or unlabeled data
Contributor

As I explained, we don't have the capability to only process new/unlabeled data

Contributor Author

Updated – Rewrote the section to remove the inaccurate reference to “new or unlabeled data” as suggested.
Replaced it with:

After selecting a model, you can create a Prediction Job. The Prediction Job generates predictions and persists them to the source graph, either in a new collection or within the source documents.
Let me know if any further adjustments are needed.

Comment on lines 7 to 8
aliases:
- getting-started-with-arangographml
Contributor

This seems to be copied from the getting-started.md file, needs to be removed

Contributor Author

I have removed the aliases section from the ui.md file, including the alias getting-started-with-arangographml

Comment on lines +2 to +3
title: GraphML
menuTitle: GraphML
Contributor

Looking at the bigger picture, I think we need to make some structural changes to the parent chapter to accommodate the new content.

We have a Deploy subpage for the setup, but the other page is Getting Started and covers the usage of ArangoGraphML at the API level only. I think the web interface is a lot more suitable for getting started, but perhaps it's better to just split it into a UI and an API page? Having both on one page (with tabs) would work for the steps (Featurization, Training, ...) but the rest of the UI-related content would be on its own. The benefit I see is that we could have just a single description of the options for UI and API. Should be discussed in the team.

menuTitle: GraphML
weight: 15
description: >-
Enterprise-ready, graph-powered machine learning as a cloud service or self-managed
Contributor

This doesn't say anything about the content being about the UI for GraphML

Contributor Author

I have added UI-related content and removed the previous content.


GraphML directly supports two primary machine learning tasks:

* **Node Classification:** Automatically assign a category or label to nodes in your graph. For example, you can classify customers as "likely to churn" or "high value," or identify fraudulent transactions.
Contributor

I think formally, it's always a label (a single categorical value out of the predicted likelihoods, of which the top likely is selected)
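
To make the distinction concrete, here is a minimal sketch in plain Python of how a single label can be derived from predicted likelihoods. The class names and numbers are made up for illustration and are not taken from GraphML; `pick_label` is a hypothetical helper.

```python
from typing import Dict


def pick_label(likelihoods: Dict[str, float]) -> str:
    """Return the class with the highest predicted likelihood."""
    if not likelihoods:
        raise ValueError("likelihoods must not be empty")
    return max(likelihoods, key=likelihoods.get)


# Hypothetical per-class likelihoods predicted for one node.
predicted = {"likely_to_churn": 0.7, "high_value": 0.2, "other": 0.1}

# The node receives exactly one label: the most likely class.
print(pick_label(predicted))
```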

Contributor Author

I've updated the document to use the more formal term "label" as you recommended.

GraphML directly supports two primary machine learning tasks:

* **Node Classification:** Automatically assign a category or label to nodes in your graph. For example, you can classify customers as "likely to churn" or "high value," or identify fraudulent transactions.
* **Node Embeddings:** Generate powerful numerical representations (vectors) for each node. These embeddings capture a node's features as well as its unique structural position within the graph.
Contributor

Calling the numerical representations themselves powerful seems odd.

I think there should be a mention here that it's about node similarity. Otherwise, it could be misunderstood as something that captures features and positions in an absolute way, but it's more about proximity in a high-dimensional space where closeness stands for (assumed) semantic similarity.
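
A small sketch of the similarity idea, assuming cosine similarity as the closeness measure (the comment above does not name a specific metric, so this choice is an assumption). The three-dimensional vectors and node names are invented; real embeddings have many more dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings for three nodes.
embeddings = {
    "customer_a": [0.9, 0.1, 0.0],
    "customer_b": [0.8, 0.2, 0.1],
    "customer_c": [0.0, 0.1, 0.9],
}

# The individual numbers carry no absolute meaning; only the relative
# closeness of vectors does. Here a and b are close, a and c are not.
sim_ab = cosine_similarity(embeddings["customer_a"], embeddings["customer_b"])
sim_ac = cosine_similarity(embeddings["customer_a"], embeddings["customer_c"])
print(sim_ab > sim_ac)
```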

Contributor Author

You were right: focusing on similarity makes the definition much clearer and more accurate. I've rewritten the section as you recommended.

Comment on lines +203 to +206
**Featurize new documents:** Enable this option to generate features for documents that have been added since the model was trained. This is useful for getting predictions on new data without having to retrain the model.

**Featurize outdated documents:** Enable this option to re-generate features for documents that have been modified since the last featurization. This ensures your predictions reflect the latest changes to your data.
In addition to these settings, you will also define the target data, where to store results, and whether to run the job on a recurring schedule.
Contributor

Should be an unordered list

Comment on lines +224 to +228
**Featurize New Documents:**
This option controls whether newly added documents are automatically featurized. It is useful when new data arrives after training, allowing predictions to continue without requiring a full retraining process.

**Featurize Outdated Documents:**
Enable or disable the featurization of outdated documents. Outdated documents are those whose attributes (used during featurization) have changed since the last feature computation. This ensures prediction results are based on up-to-date information.
Contributor

This is already described above

**Featurize outdated documents:** Enable this option to re-generate features for documents that have been modified since the last featurization. This ensures your predictions reflect the latest changes to your data.
In addition to these settings, you will also define the target data, where to store results, and whether to run the job on a recurring schedule.

In addition to these settings, you also define the target data, where to store results, and whether to run the job on a recurring schedule.
Contributor

Should have an internal link to the scheduling details


When scheduling is turned on, predictions run automatically based on a set CRON expression. This helps keep prediction results up to date as new data is added to the system.

#### Schedule (CRON expression)
Contributor

Can we do without a subheading?
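
For context on the CRON expressions mentioned in the quoted passage, here is a minimal sketch of how a classic five-field expression (minute, hour, day of month, month, day of week) can be matched against a point in time. Whether GraphML accepts exactly this format is an assumption, and `cron_matches` is a hypothetical helper, not part of any ArangoDB API. The sketch is simplified: it requires all five fields to match, whereas classic cron treats day of month and day of week as alternatives when both are restricted, and it does not support names such as `MON` or `7` for Sunday.

```python
from datetime import datetime


def _field_matches(field: str, value: int, minimum: int) -> bool:
    """Match one field; supports '*', '*/n', 'a-b', and comma lists."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - minimum) % int(part[2:]) == 0:
                return True
        elif "-" in part:
            low, high = (int(p) for p in part.split("-"))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if `moment` satisfies all five fields of `expression`."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError(
            "expected 5 fields: minute hour day-of-month month day-of-week"
        )
    minute, hour, day_of_month, month, day_of_week = fields
    weekday = moment.isoweekday() % 7  # cron convention: 0 = Sunday
    return (
        _field_matches(minute, moment.minute, 0)
        and _field_matches(hour, moment.hour, 0)
        and _field_matches(day_of_month, moment.day, 1)
        and _field_matches(month, moment.month, 1)
        and _field_matches(day_of_week, weekday, 0)
    )


# "0 2 * * *" reads as: every day at 02:00.
print(cron_matches("0 2 * * *", datetime(2025, 6, 19, 2, 0)))
```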

cla-bot bot commented Jun 19, 2025

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Thirumala.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@nerpaula (Contributor) left a comment

Further changes will be treated in a separate PR

@nerpaula merged commit 737a3e6 into main on Jun 19, 2025
4 of 5 checks passed
@nerpaula deleted the DOC-753 branch on June 19, 2025 12:07