[IR2Vec] Add embeddings mode to llvm-ir2vec tool #147844

Merged
svkeerthy merged 1 commit into main from users/svkeerthy/07-09-ir2vec_tool_enhancements
Jul 17, 2025

Conversation

svkeerthy
Contributor

@svkeerthy svkeerthy commented Jul 9, 2025

Add embedding generation functionality to the llvm-ir2vec tool, complementing the existing triplet generation mode.

This change completes the IR2Vec tool by adding the embedding generation functionality, which was previously mentioned as a TODO item. The tool now supports both triplet generation for vocabulary training and embedding generation using a trained vocabulary.

Contributor Author

svkeerthy commented Jul 9, 2025

@svkeerthy svkeerthy changed the title from "IR2Vec Tool Enhancements" to "[IR2Vec] Add embeddings mode to llvm-ir2vec tool" Jul 9, 2025
@svkeerthy svkeerthy marked this pull request as ready for review July 9, 2025 22:55
Contributor

@boomanaiden154 boomanaiden154 left a comment

Premerge failures here also look relevant.

@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool branch from 7c4d86d to 5f1f3fe Compare July 11, 2025 19:54
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool_enhancements branch 2 times, most recently from bf757c0 to 684d298 Compare July 11, 2025 21:35

github-actions bot commented Jul 11, 2025

✅ With the latest revision this PR passed the Python code formatter.

@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool_enhancements branch 2 times, most recently from 2d88b38 to 6fd2dca Compare July 11, 2025 22:10
@svkeerthy svkeerthy requested a review from boomanaiden154 July 11, 2025 22:10
Contributor

@boomanaiden154 boomanaiden154 left a comment

Minor style nits, otherwise LGTM.

@@ -34,7 +42,7 @@
 #include "llvm/Support/raw_ostream.h"

 using namespace llvm;
-using namespace ir2vec;
+using namespace llvm::ir2vec;
Contributor

Instead of using statements, it might be better to wrap everything outside of main in an anonymous namespace inside the llvm::ir2vec namespace. I'm not sure what the coding standards are, but that's the pattern I see in other tools like llvm-exegesis.
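
The pattern suggested here can be sketched roughly as follows. This is an illustration only, not the code from the PR: `describeMode` is a hypothetical helper invented for the example, and the real tool's functions and options will differ.

```cpp
#include <string>

namespace llvm {
namespace ir2vec {
namespace {

// Hypothetical helper (not from the PR). Because it lives in an unnamed
// namespace, it has internal linkage and is visible only in this file.
// Code here can also refer to llvm:: and llvm::ir2vec:: names unqualified,
// so no using-directives are needed.
std::string describeMode(bool EmitEmbeddings) {
  return EmitEmbeddings ? "embeddings" : "triplets";
}

} // anonymous namespace
} // namespace ir2vec
} // namespace llvm
```

With this layout, `main` stays at global scope and would call the helper as `llvm::ir2vec::describeMode(true)`; the unnamed namespace keeps the helper out of reach of other translation units.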

Contributor Author

Done


// Generate embeddings based on the specified level
switch (Level) {
case FunctionLevel: {
Contributor

Does clang-format not let you indent here?

Contributor Author

Weirdly, yes!
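
For context, clang-format's LLVM style sets `IndentCaseLabels` to `false`, which is why `case` labels end up in the same column as `switch`. A minimal sketch of the resulting layout follows. Only `FunctionLevel` appears in the snippet under review; the other enumerators, the enum name, and `levelName` are assumptions made up for this example.

```cpp
#include <string>

// Hypothetical stand-in for the tool's level option. Only FunctionLevel
// is taken from the reviewed snippet; the rest is assumed.
enum EmbeddingLevel { InstructionLevel, BasicBlockLevel, FunctionLevel };

// Returns a human-readable name for the level, formatted the way
// clang-format's LLVM style lays out a switch: case labels are not
// indented relative to the switch keyword.
std::string levelName(EmbeddingLevel Level) {
  switch (Level) {
  case InstructionLevel: {
    return "instruction";
  }
  case BasicBlockLevel: {
    return "basic block";
  }
  case FunctionLevel: {
    return "function";
  }
  }
  return "unknown"; // Reached only for values outside the enum.
}
```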

@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool_enhancements branch from 6fd2dca to 4e92c2b Compare July 14, 2025 17:40
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool branch from 5f1f3fe to 744b38b Compare July 14, 2025 17:40
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool_enhancements branch from 4e92c2b to f975249 Compare July 14, 2025 18:02
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool branch from 744b38b to 51b0120 Compare July 14, 2025 18:11
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool_enhancements branch 2 times, most recently from ab12375 to a3b518b Compare July 14, 2025 20:45
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool branch from 51b0120 to e931cf1 Compare July 14, 2025 20:45
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool branch from e931cf1 to 0f1720f Compare July 14, 2025 23:40
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool_enhancements branch 2 times, most recently from f2498dc to 7b801df Compare July 16, 2025 22:49
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool branch from 0f1720f to 52ec5db Compare July 16, 2025 22:49
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool branch from 52ec5db to 36fe251 Compare July 16, 2025 23:32
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool_enhancements branch from 7b801df to df6bdef Compare July 16, 2025 23:32
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool branch from 36fe251 to 47d402c Compare July 16, 2025 23:46
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool_enhancements branch from df6bdef to 0ee74a8 Compare July 16, 2025 23:46
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool branch from f4181fd to 7f45a74 Compare July 17, 2025 18:04
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool_enhancements branch from 0ee74a8 to c0360c7 Compare July 17, 2025 18:04
Contributor

@kazutakahirata kazutakahirata left a comment

LGTM. Thanks!

Contributor Author

svkeerthy commented Jul 17, 2025

Merge activity

  • Jul 17, 6:58 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Jul 17, 7:04 PM UTC: Graphite rebased this pull request as part of a merge.
  • Jul 17, 7:06 PM UTC: @svkeerthy merged this pull request with Graphite.

@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool branch from 7f45a74 to 74e3b78 Compare July 17, 2025 19:00
svkeerthy added a commit that referenced this pull request Jul 17, 2025
)

Add a new LLVM tool `llvm-ir2vec`. This tool is primarily intended to generate triplets for training the vocabulary (#141834) and potentially to generate the embeddings in a standalone manner.

This PR introduces the tool with triplet generation functionality. In the upcoming PRs I'll add scripts under `utils/mlgo` to complete the vocabulary tooling. #147844 adds embedding generation logic to the tool.

(Tracking issue - #141817)
Base automatically changed from users/svkeerthy/07-09-ir2vec_tool to main July 17, 2025 19:03
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-09-ir2vec_tool_enhancements branch from c0360c7 to 537495c Compare July 17, 2025 19:04
@svkeerthy svkeerthy merged commit 70e2319 into main Jul 17, 2025
6 checks passed
@svkeerthy svkeerthy deleted the users/svkeerthy/07-09-ir2vec_tool_enhancements branch July 17, 2025 19:06
3 participants