An analysis of aiter, AMD ROCm's kernel library #51
Conversation
This commit introduces a new blog post titled "AITER Analysis: How AMD Doubled ROCm Inference Performance," authored by Minho Park. The post details the architecture and kernel strategies of AITER, highlighting its significant performance enhancements for AMD GPUs. It includes comprehensive benchmarks, an overview of supported operations, and the multi-backend strategy, along with images illustrating the concepts discussed. The content is available in both English and Korean.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a comprehensive blog post on AMD's AI Tensor Engine for ROCm (AITER). The post provides an in-depth analysis of how AITER significantly boosts inference performance on AMD GPUs, covering its architectural design, its diverse kernel backend strategies, and practical integration methods. It highlights the impact of software optimization on hardware performance, aiming to inform the community about AMD's advances in the AI accelerator ecosystem.
Code Review
This pull request introduces new English and Korean markdown posts with a detailed analysis of AMD's AITER (AI Tensor Engine for ROCm). The posts cover its architecture, its multi-backend kernel strategies (Triton, CK, HIP, ASM), its JIT compilation pipeline, and performance benchmarks demonstrating significant inference speedups on AMD GPUs. The review comments point out several style-guide violations and areas for improvement: inconsistent capitalization of technical terms, overly long sentences that hurt readability, frontmatter issues (incorrect image alt text, an improperly formatted summary field, and a missing required description field), and missing language tags in code blocks.
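The multi-backend strategy mentioned above can be pictured as a dispatch function keyed on operator characteristics. This is a hypothetical sketch in plain Python: the four backend names come from the review, but the `pick_backend` function and its selection rules are illustrative assumptions, not AITER's actual heuristics.

```python
# Hypothetical sketch of multi-backend kernel dispatch. The backends
# (Triton, CK, HIP, ASM) are the ones named in the post; the rules
# below are invented for illustration only.

def pick_backend(op: str, head_dim: int = 0) -> str:
    """Choose a kernel backend based on operator characteristics."""
    if op == "gemm":
        return "ck"  # template-based Composable Kernel for dense GEMM
    if op in ("mla_decode", "mha_prefill"):
        # Attention hot paths: hand-tuned assembly when the shape matches
        # a pre-built head_dim variant, otherwise fall back to Triton.
        return "asm" if head_dim in (64, 128) else "triton"
    if op == "fused_moe":
        return "hip"  # custom HIP C++ kernel
    return "triton"   # Triton as the general-purpose fallback

print(pick_backend("gemm"))                      # ck
print(pick_backend("mla_decode", head_dim=128))  # asm
print(pick_backend("softmax"))                   # triton
```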
- Fix frontmatter: correct alt text, convert summary from list to string, add description field
- Split long intro paragraph into shorter sentences for readability
- Add `text` language tags to bare code blocks (CUDA→HIP table, register types, head_dim combinations)
- Fix heading capitalization: quantization → Quantization
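The frontmatter fix in the first bullet might look roughly like this. The field names (summary, description) come from the commit message; the values shown are invented placeholders, not the post's actual text.

```yaml
# Hypothetical "after" state: summary is a plain string rather than a
# YAML list, and the previously missing description field is present.
title: "AITER Analysis: How AMD Doubled ROCm Inference Performance"
summary: "How the AITER kernel library boosts inference throughput on AMD GPUs."
description: "An analysis of AITER's architecture, backends, and benchmarks."
```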
content/posts/rocm-aiter/index.ko.md
> Semi Analysis is a well-known research firm in the semiconductor industry. It runs the [InferenceX](https://inferencex.semianalysis.com) benchmark, which measures and compares the real-world inference performance of major GPUs.
> According to the [InferenceX v2](https://newsletter.semianalysis.com/p/inferencex-v2-nvidia-blackwell-vs) report published in February 2026, the SGLang performance of AMD's MI300X improved **nearly 2x** between December 2025 and January 2026. At the center of this improvement was a kernel library called **AI Tensor Engine for ROCm (AITER)**.
Since this is early in the post, it would be good to mention in what respect the performance doubled (throughput, latency, ...).
Done! I've made it explicit, both in the intro and in the benchmark table header, that the figures refer to throughput.
> | Block-scale **General Matrix Multiplication (GEMM)** | **2x** |
> | Block-scale Fused **Mixture of Experts (MoE)** | **3x** |
> | MLA Decode | **17x** |
> | **Multi-Head Attention (MHA)** Prefill | **14x** |
Do the 2x, 3x, 17x, and 14x improvements here refer to gains in throughput?
Yes, that's right — they are throughput figures. I've changed the table header from "성능 향상" (performance improvement) to "Throughput 향상" (throughput improvement).
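The "Nx" figures discussed above read as plain throughput ratios: how many times higher the tokens/sec rate is after the change. A minimal sketch, with invented placeholder numbers rather than real benchmark data:

```python
# Illustrative only: a "2x" entry means throughput (e.g. tokens/sec)
# doubled between the two measurements. Numbers below are made up.

def speedup(old_tps: float, new_tps: float) -> float:
    """Throughput ratio: how many times faster the new kernel is."""
    return new_tps / old_tps

print(f"{speedup(1200.0, 2400.0):.0f}x")  # -> 2x
```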
> ### Assembly (ASM)
|  |
Done! I've added a white background to the MLA architecture image (via a `<figure>` tag) so that it remains readable in dark mode as well.
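A minimal sketch of that dark-mode fix, assuming a plain `<figure>` wrapper with an inline white background; the file path and alt text are placeholders, not the post's actual markup:

```html
<!-- Placeholder src/alt. The inline style forces a white backdrop so a
     diagram with a transparent background stays readable in dark mode. -->
<figure style="background-color: #ffffff; padding: 8px;">
  <img src="images/mla-architecture.webp" alt="MLA architecture diagram" />
</figure>
```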
Also, the file content/posts/rocm-aiter/images/aiter-mla-header.webp does not appear to be used anywhere in the post!
- Clarify '2x improvement' refers to throughput (intro + benchmark table)
- Add white background to MLA architecture image for dark mode visibility
- Remove unused aiter-mla-header.webp image
Done!
YoungHoonJun
left a comment
Thanks for the revisions! It's interesting how the library picks among four backends depending on operator characteristics...! I think it's a great post.
Great work, LGTM!
