The AI Engine development design tutorials demonstrate the two major phases of AI Engine application development: designing the application and developing the kernels.
The AI Engine development README contains important information, including the tool version, environment settings, and a table describing the platform, operating system, and supported features or flows of each tutorial. AMD recommends that you review these details before starting the AIE tutorials.
| Tutorial | Description |
| --- | --- |
| Versal Custom Thin Platform Extensible System | This is an AMD Versal™ system example design based on a VCK190 thin custom platform (minimal clocks and AXI exposed to PL) that includes HLS/RTL kernels and an AI Engine kernel, built with a full Makefile build flow. |
| LeNet Tutorial | This tutorial implements a system-level design to perform image classification using the LeNet algorithm on the AI Engine and PL, including block RAM. The design demonstrates functional partitioning between the AI Engine and PL. It also highlights memory partitioning and hierarchy among DDR memory, PL (block RAM), and AI Engine memory. |
| Super Sampling Rate FIR Filters | This tutorial provides a methodology for making appropriate implementation choices depending on the filter characteristics. It also provides examples of how to implement Super Sampling Rate (SSR) FIR filters on a Versal™ adaptive SoC AI Engine processor array. |
| Beamforming Design | This tutorial implements a beamforming system running on the AI Engine, PL, and PS, and validates the design running across these heterogeneous domains. |
| Polyphase Channelizer | This tutorial implements a system-level design (a polyphase channelizer) using a combination of AI Engine and PL/HLS kernels. |
| Prime Factor FFT-1008 | This Versal system example implements a 1008-pt FFT using the Prime Factor Algorithm. The design uses both AI Engine and PL kernels working cooperatively. AI Engine elements are hand-coded using the AIE API. PL elements use Vitis HLS. The new v++ Unified Command Line flow manages system integration in the Vitis platform. |
| 2D-FFT | This tutorial presents two implementations of a system-level design (2D-FFT): one with AI Engine, and the other with HLS using the DSP Engines. |
| FIR Filter | This tutorial implements a system-level design (FIR Filter) using AI Engines and HLS with DSP Engines. It uses the Versal device plus PL resources including lookup tables, flip-flops, and block RAMs. |
| N-Body Simulator | This tutorial presents a system-level design that uses AI Engine, PL, and PS resources. |
| Digital Down-conversion Chain | This tutorial demonstrates the steps to upgrade a 32-branch digital down-conversion chain (XAPP1351) to the latest recommended tools and coding practices, including conversion of most AI Engine intrinsics to the AIE API. The upgraded AIE API version achieves the same throughput as the original code base, while being easier to read and maintain. |
| Versal GeMM Implementation | This tutorial presents two implementations of a system-level design: one with AI Engine, and the other with RTL using the DSP Engines. For each implementation, the tutorial takes you through the hardware emulation and hardware flows in the context of a complete Versal adaptive SoC system design. |
| Bilinear Interpolation | This tutorial implements a bilinear interpolation algorithm using AI Engines. It also provides guidance for customizing the design to function with varying image resolutions, and to take advantage of multicore processing on the AI Engine array to achieve desired throughput. |
| 64K IFFT Using 2D Architecture | This Versal system example implements a 64K-pt IFFT using a 2D architecture. It decomposes 64K = 256 x 256 and builds the transform in two dimensions using row and column FFT-256. A matrix transpose is performed in between in the PL. This alternative "divide and conquer" approach is attractive in the SSR > 1 regime. |
| Implementing FFT and DFT Designs on AI Engines | This tutorial implements several techniques for mapping FFT and DFT algorithms to the AI Engine array. These include the Stockham FFT used in AMD Vitis DSPlib, hand-coded variants using the AI Engine API, and a direct form DFT using vector-matrix multiplication. It also shows how to trade off AI Engine tile resources against throughput for the Stockham FFT in DSPlib using its `TP_CASC_LEN` and `TP_PARALLEL_POWER` template parameters. This is useful when configuring DSPlib FFT library instances to serve as part of a larger 2D FFT architecture. |
| Bitonic SIMD Sorting on AI Engine for float Datatypes | This tutorial implements a Bitonic SIMD sorter on AI Engine in Versal for float data types. Two examples are given. First, a small example using N=16 demonstrates the concept and identifies strategies for vectorization and management of the vector register space. These ideas are then applied to a second larger example using N=1024. Profiling and throughput performance are compared to `std::sort()`. |
| Fractional Delay Farrow Filter | This Versal system example implements a variable fractional delay algorithm using the Farrow Filter structure. It explains common AI Engine design optimization techniques. The design uses both AI Engine and PL kernels working cooperatively. AI Engine elements are hand-coded using the AIE API. PL elements use Vitis HLS. The new v++ Unified Command Line flow manages system integration in the Vitis platform. |
| 1 Million Point float FFT @ 32 GSPS on AI Engine | This tutorial implements a 1M-point FFT for `cfloat` data types that achieves a throughput exceeding 32 GSPS, using a large portion of the AI Engine array for compute and PL URAM resources to implement a matrix transpose operation. |
| System Partitioning of a Hough Transform on AI Engine | This tutorial explains the process of planning the implementation of a well-known image processing algorithm, then mapping and partitioning it to the resources available in a Versal adaptive SoC device. It illustrates this process using the Hough Transform, a feature extraction technique for computer vision and image processing. |
| MUSIC Algorithm on AI Engine | This tutorial implements the Multiple Signal Classification (MUSIC) Algorithm on the AI Engine. MUSIC is a popular algorithm for Direction of Arrival (DOA) estimation in antenna array systems. |
| Softmax Function on AI Engine | The softmax function is an activation function often used in the output layer of a neural network designed for multi-class classification. This tutorial provides an example of how to implement the softmax function to create custom machine learning inference applications on AI Engines. |
| Time-Division Multiplexed Mixer Example | This tutorial implements a time-division multiplexed (TDM) Mixer design on AI Engine. The design shows how to perform a "corner-turning" operation using the DMA hardware resources inside the AI Engine local tile, leaving core capacity available for compute workloads. The tutorial also shows how to vectorize workloads involving phase or frequency generation without lookup tables. |
| Back-Projection Synthetic Aperture Radar on AI Engine | This tutorial builds an example design for Synthetic Aperture Radar using Vitis Libraries and custom API coding for use with the GOTCHA data set. The design achieves ~2.5 frames per second for 512 x 512 images and 586 radar pulses with fewer than 32 tiles. A large design with eight engine instances achieves close to 20 frames per second. |
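
The 2D decomposition described in the 64K IFFT row can be illustrated at a small scale. The following is a minimal sketch in plain Python, not code from the tutorial: the function names `idft` and `ifft_2d` are hypothetical, a naive inverse DFT stands in for the FFT-256 kernels, and the sizes are tiny so the result can be checked against a direct transform. It also assumes the standard Cooley-Tukey factorization, which needs a twiddle-factor multiplication between the two passes; the table above mentions only the transpose.

```python
import cmath


def idft(seq):
    """Naive unscaled inverse DFT, O(n^2). Stands in for a small FFT kernel."""
    n = len(seq)
    return [
        sum(seq[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def ifft_2d(x, n1, n2):
    """Inverse DFT of length n1 * n2 built from length-n1 and length-n2 passes.

    Hypothetical helper for illustration only.
    """
    n = n1 * n2
    assert len(x) == n

    # Row r holds the decimated samples x[r], x[r + n2], x[r + 2*n2], ...
    # This gives n2 rows, each of length n1.
    rows = [x[r::n2] for r in range(n2)]

    # First pass: one length-n1 transform per row.
    rows = [idft(row) for row in rows]

    # Twiddle factors between the passes (assumed Cooley-Tukey step).
    rows = [
        [v * cmath.exp(2j * cmath.pi * r * k1 / n) for k1, v in enumerate(row)]
        for r, row in enumerate(rows)
    ]

    # Matrix transpose: now n1 rows, each of length n2.
    rows = [list(col) for col in zip(*rows)]

    # Second pass: one length-n2 transform per row.
    rows = [idft(row) for row in rows]

    # Element [k1][k2] is output sample k1 + n1 * k2; apply the 1/n scaling.
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = rows[k1][k2] / n
    return out
```

Calling `ifft_2d(x, 256, 256)` on 65536 samples would follow the same structure as the 64K = 256 x 256 split, although the naive `idft` used here would be slow at that size; a real design replaces it with an FFT kernel.
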
Copyright © 2020–2026 Advanced Micro Devices, Inc.