DOC-753 | Graph ML UI #709
title: ArangoGraphML Web Interface
menuTitle: ArangoGraphML Web Interface
Title to be discussed (we might rename it to just GraphML)
Yes, I’ve updated the title and menuTitle to "GraphML" as suggested.
aliases:
- getting-started-with-arangographml
---
Solve high-computational graph problems with Graph Machine Learning. Apply ML on a selected graph to predict connections, get better product recommendations, classify nodes, and perform node embeddings. Configure and run the whole machine learning flow entirely in the web interface.
We only have node classification and embeddings available as immediate options. If we mention something like link predictions, we should at least outline how to achieve that.
It would also be good to have a more technical explanation here of how GraphML works (GraphSAGE, using a depth 2 neighborhood, as mentioned in the Slack team channel).
Please also add an overview of the process instead of immediately starting with project creation etc. Users should first get an understanding of the hierarchy and steps involved.
I’ve addressed the points as suggested:
* Mentioned only node classification and embeddings as the currently available options.
* Added a brief technical explanation of how GraphML works, referencing GraphSAGE with depth 2 neighborhood, based on our Slack discussion and information from the official GraphSAGE site.
* Included an overview section at the beginning to explain the overall process, hierarchy, and steps before diving into project creation.
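For readers unfamiliar with the "depth 2 neighborhood" mentioned in this thread, the following is a minimal sketch of collecting every node within two hops of a start node. It is an illustration only, not GraphML's implementation: the function name `k_hop_neighborhood` and the toy graph are hypothetical, and GraphSAGE typically samples a fixed number of neighbors per hop rather than visiting all of them.

```python
from collections import deque


def k_hop_neighborhood(adjacency, start, depth=2):
    """Return the nodes reachable from `start` within `depth` hops, excluding `start`."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    seen.discard(start)
    return seen


# Hypothetical toy graph: a simple path a - b - c - d.
toy_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}
neighborhood = k_hop_neighborhood(toy_graph, "a", depth=2)
```

With a depth of 2, node `d` is outside the neighborhood of `a`, so its features would not contribute to the representation of `a` in a two-layer aggregation.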
## Prediction Phase

Once the best-performing model has been selected, the final step of the GraphML pipeline is to generate predictions for new or unlabeled data
As I explained, we don't have the capability to only process new/unlabeled data
Updated – Rewrote the section to remove the inaccurate reference to “new or unlabeled data” as suggested.
Replaced it with:
After selecting a model, you can create a Prediction Job. The Prediction Job generates predictions and persists them to the source graph, either in a new collection or within the source documents.
Let me know if any further adjustments are needed.
aliases:
- getting-started-with-arangographml
This seems to be copied from the getting-started.md file, needs to be removed
I have removed the `aliases` section from the `ui.md` file, including the alias `getting-started-with-arangographml`.
title: GraphML
menuTitle: GraphML
Looking at the bigger picture, I think we need to make some structural changes to the parent chapter to accommodate the new content.
We have a Deploy subpage for the setup, but the other page is Getting Started and covers the usage of ArangoGraphML at the API level only. I think the web interface is a lot more suitable for getting started, but perhaps it's better to just split it into a UI and an API page? Having both on one page (with tabs) would work for the steps (Featurization, Training, ...) but the rest of the UI-related content would be on its own. The benefit I see is that we could have just a single description of the options for UI and API. Should be discussed in the team.
menuTitle: GraphML
weight: 15
description: >-
  Enterprise-ready, graph-powered machine learning as a cloud service or self-managed
This doesn't say anything about the content being about the UI for GraphML
I have added UI-related content and removed the previous content.
GraphML directly supports two primary machine learning tasks:

* **Node Classification:** Automatically assign a category or label to nodes in your graph. For example, you can classify customers as "likely to churn" or "high value," or identify fraudulent transactions.
I think formally, it's always a label: a single categorical value, selected as the most likely one out of the predicted likelihoods.
I've updated the document to use the more formal term "label" as you recommended.
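The point about a label being the top entry of the predicted likelihoods can be sketched as follows. This is an illustration under stated assumptions: the function `predicted_label` and the example scores are hypothetical and do not reflect how GraphML stores or exposes its predictions.

```python
def predicted_label(likelihoods):
    """Return the label with the highest predicted likelihood."""
    return max(likelihoods, key=likelihoods.get)


# Hypothetical per-label likelihoods for a single node.
scores = {"likely to churn": 0.7, "high value": 0.2, "other": 0.1}
label = predicted_label(scores)
```

Here `label` is the single categorical value that would be reported for the node, while the remaining likelihoods are discarded.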
GraphML directly supports two primary machine learning tasks:

* **Node Classification:** Automatically assign a category or label to nodes in your graph. For example, you can classify customers as "likely to churn" or "high value," or identify fraudulent transactions.
* **Node Embeddings:** Generate powerful numerical representations (vectors) for each node. These embeddings capture a node's features as well as its unique structural position within the graph.
Calling the numerical representations themselves powerful seems odd.
I think there should be a mention here that it's about node similarity. Otherwise, it could be misunderstood as something that captures features and positions in an absolute way, but it's more about proximity in a high-dimensional space where closeness stands for (assumed) semantic similarity.
You were right, focusing on similarity makes the definition much clearer and more accurate. I've rewritten the section as you recommended.
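The similarity reading of embeddings discussed in this thread can be illustrated with cosine similarity, one common way to measure closeness between vectors. This is a minimal sketch: the source does not say which similarity measure GraphML users should apply, and the function name and example vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 2-dimensional embeddings for three nodes.
node_a = [1.0, 0.0]
node_b = [2.0, 0.0]  # same direction as node_a
node_c = [0.0, 3.0]  # orthogonal to node_a

similar = cosine_similarity(node_a, node_b)
dissimilar = cosine_similarity(node_a, node_c)
```

A value near 1 indicates that two nodes lie close together in the embedding space, which is interpreted as (assumed) semantic similarity rather than as an absolute description of either node.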
**Featurize new documents:** Enable this option to generate features for documents that have been added since the model was trained. This is useful for getting predictions on new data without having to retrain the model.

**Featurize outdated documents:** Enable this option to re-generate features for documents that have been modified since the last featurization. This ensures your predictions reflect the latest changes to your data.
In addition to these settings, you will also define the target data, where to store results, and whether to run the job on a recurring schedule.
Should be an unordered list
**Featurize new documents:** Enable this option to generate features for documents that have been added since the model was trained. This is useful for getting predictions on new data without having to retrain the model.

**Featurize outdated documents:** Enable this option to re-generate features for documents that have been modified since the last featurization. This ensures your predictions reflect the latest changes to your data.
Should be an unordered list
**Featurize New Documents:**
This option controls whether newly added documents are automatically featurized. It is useful when new data arrives after training, allowing predictions to continue without requiring a full retraining process.

**Featurize Outdated Documents:**
Enable or disable the featurization of outdated documents. Outdated documents are those whose attributes (used during featurization) have changed since the last feature computation. This ensures prediction results are based on up-to-date information.
This is already described above
**Featurize outdated documents:** Enable this option to re-generate features for documents that have been modified since the last featurization. This ensures your predictions reflect the latest changes to your data.
In addition to these settings, you will also define the target data, where to store results, and whether to run the job on a recurring schedule.

In addition to these settings, you also define the target data, where to store results, and whether to run the job on a recurring schedule.
Should have an internal link to the scheduling details
When scheduling is turned on, predictions run automatically based on a set CRON expression. This helps keep prediction results up to date as new data is added to the system.

#### Schedule (CRON expression)
Can we do without a subheading?
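As background for the scheduling excerpt above, the sketch below splits a conventional five-field CRON expression into named fields. It assumes the standard minute/hour/day-of-month/month/day-of-week layout; the excerpt does not state which CRON dialect GraphML accepts, and `split_cron` is a hypothetical helper, not part of any ArangoDB API.

```python
# Field order of a conventional five-field CRON expression (an assumption here).
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression):
    """Split a five-field CRON expression into a dict keyed by field name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# In standard CRON syntax, "0 2 * * *" means every day at 02:00.
schedule = split_cron("0 2 * * *")
```

A check like this only validates the field count; it does not verify that each field holds a legal value.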
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Thirumala.
Co-authored-by: Simran <[email protected]>
Further changes will be treated in a separate PR
Description
TODO: Update screenshots due to name change Data Science (Suite) -> GenAI Suite
Upstream PRs