Commit 3644bd5

Bhaskar VishnuVardhan Chebrolu authored and GitHub Enterprise committed
new performance category
1 parent 4c0ba41 commit 3644bd5

125 files changed (+52, -27 lines changed)

cpp_kernels/README.md

Lines changed: 0 additions & 4 deletions
@@ -7,7 +7,6 @@ This section contains HLS C/C++ Kernel Examples.
Example | Description | Key Concepts / Keywords
---------------|-----------------------|-------------------------
[array_partition/][]|This is a simple example of matrix multiplication (Row x Col) to demonstrate how to achieve better performance by array partitioning, using HLS kernel in Vitis Environment.|__Key__ __Concepts__<br> - [Kernel Optimization](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/vitis_hls_optimization_techniques.html)<br> - [HLS C Kernel](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/devckernels.html#hxx1556235054362)<br> - [Array Partition](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/vitis_hls_optimization_techniques.html#swq1539734225427)<br>__Keywords__<br> - [#pragma HLS ARRAY_PARTITION](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/hls_pragmas.html#gle1504034361378)<br> - [complete](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/hls_pragmas.html#gle1504034361378__ad410829)
10-
[axi_burst_performance/][]|This is an AXI Burst Performance check design. It measures the time it takes to write a buffer into DDR or read a buffer from DDR. The example contains 2 sets of 6 kernels each: each set having a different data width and each kernel having a different burst_length and num_outstanding parameters to compare the impact of these parameters on effective throughput.|
[bind_op_storage/][]|This is simple example of vector addition to describe how to use BIND OP and STORAGE for better implementation style.|__Key__ __Concepts__<br> - [BIND OP](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/vitis_hls_optimization_techniques.html#yew1585574779610)<br> - [BIND STORAGE](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/vitis_hls_optimization_techniques.html#yew1585574779610)<br>__Keywords__<br> - [BIND_OP](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/hls_pragmas.html#ttl1584844636775)<br> - [BIND_STORAGE](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/hls_pragmas.html#chr1584844747152)<br> - [impl](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/hls_pragmas.html#ttl1584844636775__ad411605)<br> - [op](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/hls_pragmas.html#ttl1584844636775__ad411605)<br> - [type](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/hls_pragmas.html#ttl1584844636775__ad411605)<br> - [latency](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/hls_pragmas.html#ttl1584844636775__ad411605)
[burst_rw/][]|This is simple example of using AXI4-master interface for burst read and write|__Key__ __Concepts__<br> - [burst access](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/vitis_hls_optimization_techniques.html#ddw1586913493144)<br>__Keywords__<br> - [memcpy](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/managing_interface_synthesis.html#qoa1585574520885)<br> - [max_read_burst_length](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/vitis_hls_optimization_techniques.html#mcz1586914389391)<br> - [max_write_burst_length](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/vitis_hls_optimization_techniques.html#mcz1586914389391)
[critical_path/][]|This example shows a normal coding style which could lead to critical path issue and design will give degraded timing. Example also contains better coding style which can improve design timing.|__Key__ __Concepts__<br> - Critical Path handling<br> - Improve Timing<br>
@@ -17,7 +16,6 @@ Example | Description | Key Concepts / Keywords
[dependence_inter/][]|This Example demonstrates the HLS pragma 'DEPENDENCE'.Using 'DEPENDENCE' pragma, user can provide additional dependency details to the compiler by specifying if the dependency in consecutive loop iterations on buffer is true/false, which allows the compiler to perform unrolling/pipelining to get better performance.|__Key__ __Concepts__<br> - [Inter Dependence](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/vitis_hls_optimization_techniques.html#wen1539734225565)<br>__Keywords__<br> - [DEPENDENCE](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/hls_pragmas.html#dxe1504034360397)<br> - [inter](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/hls_pragmas.html#dxe1504034360397__ad411019)<br> - [WAR](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/hls_pragmas.html#dxe1504034360397__ad411019)
[gmem_2banks/][]|This example of 2ddr is to demonstrate how to use multiple ddr and create buffers in each DDR.|__Key__ __Concepts__<br> - [Multiple Banks](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/optimizingperformance.html#uuy1504034303412)<br>__Keywords__<br> - [m_axi_auto_max_ports](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/migrating_to_vitis_hls.html#jsk1590553271532)<br> - [sp](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/vitiscommandcompiler.html#clt1568640709907__section_tfc_zxm_1jb)<br> - [connectivity](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/vitiscommandcompiler.html#qcm1528577331870__section_wgd_dxf_dnb)
[kernel_chain/][]|This is a kernel containing the cascaded Matrix Multiplication using dataflow. ap_ctrl_chain is enabled for this kernel to showcase how multiple enqueue of Kernel calls can be overlapped to give higher performance. ap_ctrl_chain allow kernel to start processing of next kernel operation before completing the current kernel operation.|__Key__ __Concepts__<br> - [ap_ctrl_chain](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/managing_interface_synthesis.html#qls1539734256651__ae476333)<br> - PLRAM<br>
20-
[kernel_global_bandwidth/][]|Bandwidth test of global to local memory.|
[lmem_2rw/][]|This is simple example of vector addition to demonstrate how to utilize both ports of Local Memory.|__Key__ __Concepts__<br> - [Kernel Optimization](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/vitis_hls_optimization_techniques.html)<br> - [2port BRAM Utilization](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/managing_interface_synthesis.html#gen1585145183590__ae401668)<br> - two read/write Local Memory<br>__Keywords__<br> - [#pragma HLS UNROLL FACTOR=2](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/hls_pragmas.html#uyd1504034366571)
[loop_pipeline/][]|This example demonstrates how loop pipelining can be used to improve the performance of a kernel.|__Key__ __Concepts__<br> - [Kernel Optimization](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/vitis_hls_optimization_techniques.html)<br> - [Loop Pipelining](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/vitis_hls_optimization_techniques.html#kcq1539734224846)<br>__Keywords__<br> - [pragma HLS PIPELINE](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/hls_pragmas.html#fde1504034360078)
[loop_reorder/][]|This is a simple example of matrix multiplication (Row x Col) to demonstrate how to achieve better pipeline II factor by loop reordering.|__Key__ __Concepts__<br> - [Kernel Optimization](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/vitis_hls_optimization_techniques.html)<br> - Loop reorder to improve II<br>__Keywords__<br> - [#pragma HLS ARRAY_PARTITION](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/hls_pragmas.html#gle1504034361378)
@@ -31,7 +29,6 @@ Example | Description | Key Concepts / Keywords

[.]:.
[array_partition/]:array_partition/
34-
[axi_burst_performance/]:axi_burst_performance/
[bind_op_storage/]:bind_op_storage/
[burst_rw/]:burst_rw/
[critical_path/]:critical_path/
@@ -41,7 +38,6 @@ Example | Description | Key Concepts / Keywords
[dependence_inter/]:dependence_inter/
[gmem_2banks/]:gmem_2banks/
[kernel_chain/]:kernel_chain/
44-
[kernel_global_bandwidth/]:kernel_global_bandwidth/
[lmem_2rw/]:lmem_2rw/
[loop_pipeline/]:loop_pipeline/
[loop_reorder/]:loop_reorder/

host/README.md

Lines changed: 0 additions & 10 deletions
@@ -14,13 +14,9 @@ Example | Description | Key Concepts / Keywords
[device_query/][]|This Example prints the OpenCL properties of the platform and its devices using OpenCL CPP APIs. It also displays the limits and capabilities of the hardware.|__Key__ __Concepts__<br> - OpenCL API<br> - Querying device properties<br>
[errors/][]|This example discuss the different reasons for errors in OpenCL and how to handle them at runtime.|__Key__ __Concepts__<br> - OpenCL API<br> - Error handling<br>__Keywords__<br> - CL_SUCCESS<br> - CL_DEVICE_NOT_FOUND<br> - CL_DEVICE_NOT_AVAILABLE
[errors_cpp/][]|This example discuss the different reasons for errors in OpenCL C++ and how to handle them at runtime.|__Key__ __Concepts__<br> - OpenCL Host API<br> - Error handling<br>__Keywords__<br> - CL_SUCCESS<br> - CL_DEVICE_NOT_FOUND<br> - CL_DEVICE_NOT_AVAILABLE<br> - CL_INVALID_VALUE<br> - CL_INVALID_KERNEL_NAME<br> - CL_INVALID_BUFFER_SIZE
17-
[hbm_bandwidth/][]|This is a HBM bandwidth check design. Design contains 3 compute units of a kernel which has access to all HBM pseudo-channels (0:31). Host application allocate buffer into all HBM banks and run these 3 compute units concurrently and measure the overall bandwidth between Kernel and HBM Memory.|
18-
[hbm_bandwidth_pseudo_random/][]|This is a HBM bandwidth example using a pseudo random 1024 bit data access pattern to mimic Ethereum Ethash workloads. The design contains 3 compute units of a kernel, reading 1024 bits from a pseudo random address in each of 2 pseudo channels and writing the results of a simple mathematical operation to a pseudo random address in 2 other pseudo channels. To maximize bandwidth the pseudo channels are used in P2P like configuration - See https://developer.xilinx.com/en/articles/maximizing-memory-bandwidth-with-vitis-and-xilinx-ultrascale-hbm-devices.html for more information on HBM memory access configurations. The host application allocates buffers in 12 HBM banks and runs the compute units concurrently to measure the overall bandwidth between kernel and HBM Memory.|__Key__ __Concepts__<br> - [High Bandwidth Memory](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/buildingdevicebinary.html#lgl1614021146997)<br> - [Multiple HBM Pseudo-channels](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/buildingdevicebinary.html#lgl1614021146997)<br> - Random Memory Access<br> - Linear Feedback Shift Register<br>__Keywords__<br> - [HBM](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/buildingdevicebinary.html#lgl1614021146997)<br> - [XCL_MEM_TOPOLOGY](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/optimizingperformance.html#utc1504034308941)<br> - [cl_mem_ext_ptr_t](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/optimizingperformance.html#utc1504034308941)
[hbm_large_buffers/][]|This is a simple example of vector addition to describe how HBM pseudo-channels can be grouped to handle buffers larger than 256 MB.|__Key__ __Concepts__<br> - [High Bandwidth Memory](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/buildingdevicebinary.html#lgl1614021146997)<br> - Multiple HBM Pseudo-channel Groups<br>__Keywords__<br> - [HBM](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/buildingdevicebinary.html#lgl1614021146997)
[hbm_rama_ip/][]|This is host application to test HBM interface bandwidth for buffers > 256 MB with pseudo random 1024 bit data access pattern, mimicking Ethereum Ethash workloads. Design contains 4 compute units of Kernel, 2 with and 2 without RAMA IP. Each compute unit reads 1024 bits from a pseudo random address in each of 2 pseudo channel groups and writes the results of a simple mathematical operation to a pseudo random address in 2 other pseudo channel groups. Each buffer is 1 GB large requiring 4 HBM banks. Since the first 2 CUs requires 4 buffers each and are then used again by the other 2 CUs, the .cfg file is allocating the buffers to all the 32 HBM banks. The host application runs the compute units concurrently to measure the overall bandwidth between kernel and HBM Memory.|__Key__ __Concepts__<br> - [High Bandwidth Memory](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/buildingdevicebinary.html#lgl1614021146997)<br> - [Multiple HBM Pseudo-channels](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/buildingdevicebinary.html#lgl1614021146997)<br> - Random Memory Access<br> - Linear Feedback Shift Register<br> - [RAMA IP](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/buildingdevicebinary.html#xxs1614054654284)<br>__Keywords__<br> - [HBM](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/buildingdevicebinary.html#lgl1614021146997)<br> - [ra_master_interface](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/buildingdevicebinary.html#xxs1614054654284)
[hbm_simple/][]|This is a simple example of vector addition to describe how to use HLS kernels with HBM (High Bandwidth Memory) for achieving high throughput.|__Key__ __Concepts__<br> - [High Bandwidth Memory](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/buildingdevicebinary.html#lgl1614021146997)<br> - Multiple HBM pseudo-channels<br>__Keywords__<br> - [HBM](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/buildingdevicebinary.html#lgl1614021146997)<br> - [XCL_MEM_TOPOLOGY](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/optimizingperformance.html#utc1504034308941)<br> - [cl_mem_ext_ptr_t](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/optimizingperformance.html#utc1504034308941)<br> - [trace_memory](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/vitiscommandcompiler.html#lpy1600804966354__section_bmy_v3z_54b)<br> - [trace_buffer_size](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/xrtini.html#tpi1504034339424__section_tnh_pks_rx)<br> - [opencl_trace](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/xrtini.html#tpi1504034339424__section_tnh_pks_rx)
22-
[host_global_bandwidth/][]|Host to global memory bandwidth test|
23-
[host_memory_bandwidth/][]|This is host memory bandwidth example.|__Key__ __Concepts__<br> - host memory<br> - bandwidth<br> - address translation unit<br>__Keywords__<br> - XCL_MEM_EXT_HOST_ONLY<br> - HOST[0]
[host_memory_copy_buffer/][]|This is simple host memory example to describe how host-only memory can be copied to device-only memory and vice-versa.|__Key__ __Concepts__<br> - host memory<br>__Keywords__<br> - XCL_MEM_EXT_HOST_ONLY<br> - CL_MEM_HOST_NO_ACCESS<br> - enqueueCopyBuffer
[host_memory_copy_kernel/][]|This is a Host Memory Example to describe how data can be copied between host-only buffer and device-only buffer using User Copy Kernel.|__Key__ __Concepts__<br> - host memory<br>__Keywords__<br> - XCL_MEM_EXT_HOST_ONLY<br> - CL_MEM_HOST_NO_ACCESS<br> - [enqueueMapBuffer](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/opencl_programming.html#czb1555520653128)
[host_memory_simple/][]|This is simple host memory example to describe how a user kernel can access the host memory. The host memory allocation is done through the host code. The kernel reads data from host memory and writes result to host memory.|__Key__ __Concepts__<br> - host memory<br> - address translation unit<br>__Keywords__<br> - XCL_MEM_EXT_HOST_ONLY<br> - HOST[0]
@@ -30,7 +26,6 @@ Example | Description | Key Concepts / Keywords
[overlap/][]|This examples demonstrates techniques that allow user to overlap Host(CPU) and FPGA computation in an application. It will cover asynchronous operations and event object.|__Key__ __Concepts__<br> - OpenCL Host API<br> - [Synchronize Host and FPGA](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/opencl_programming.html#usz1524526733752)<br> - [Asynchronous Processing](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/opencl_programming.html#usz1524526733752)<br> - [Events](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/optimizingperformance.html#bsa1504034305860)<br> - [Asynchronous memcpy](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/opencl_programming.html#usz1524526733752)<br>__Keywords__<br> - [cl_event](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/opencl_programming.html#usz1524526733752)<br> - [cl::CommandQueue](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/opencl_programming.html#llr1524522915783)<br> - [CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/optimizingperformance.html#nzy1504034306881)<br> - [enqueueMigrateMemObjects](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/opencl_programming.html#xio1524524087132)
[p2p_bandwidth/][]|This is simple example to test data transfer between SSD and FPGA.|__Key__ __Concepts__<br> - [P2P](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/Chunk938767849.html#qex1558551641915)<br> - SmartSSD<br> - XDMA<br>__Keywords__<br> - XCL_MEM_EXT_P2P_BUFFER<br> - pread<br> - pwrite
[p2p_fpga2fpga/][]|This is simple example to explain P2P transfer between two FPGA devices.|__Key__ __Concepts__<br> - [P2P](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/Chunk938767849.html#qex1558551641915)<br> - Multi-FPGA Execution<br> - XDMA<br>__Keywords__<br> - XCL_MEM_EXT_P2P_BUFFER
33-
[p2p_fpga2fpga_bandwidth/][]|This is simple example to explain performance bandwidth for P2P transfer between two FPGA devices.|__Key__ __Concepts__<br> - [P2P](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/Chunk938767849.html#qex1558551641915)<br> - Multi-FPGA Execution<br> - XDMA<br>__Keywords__<br> - XCL_MEM_EXT_P2P_BUFFER
[p2p_overlap_bandwidth/][]|This is simple example to test Synchronous and Asyncronous data transfer between SSD and FPGA.|__Key__ __Concepts__<br> - [P2P](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/Chunk938767849.html#qex1558551641915)<br> - SmartSSD<br> - XDMA<br>__Keywords__<br> - XCL_MEM_EXT_P2P_BUFFER<br> - pread<br> - pwrite
[p2p_simple/][]|This is simple example of vector increment to describe P2P between FPGA and NVMe SSD.|__Key__ __Concepts__<br> - [P2P](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/Chunk938767849.html#qex1558551641915)<br> - NVMe SSD<br> - SmartSSD<br>__Keywords__<br> - XCL_MEM_EXT_P2P_BUFFER<br> - pread<br> - pwrite<br> - O_DIRECT<br> - O_RDWR
[streaming_free_running_k2k/][]|This is simple example which demonstrate how to use and configure a free running kernel.|__Key__ __Concepts__<br> - [Free Running Kernel](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/streamingconnections.html#uug1556136182736)<br>__Keywords__<br> - [ap_ctrl_none](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/managing_interface_synthesis.html#qls1539734256651__ae476284)<br> - [stream_connect](https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/buildingdevicebinary.html#yha1565541199876)
@@ -46,13 +41,9 @@ Example | Description | Key Concepts / Keywords
[device_query/]:device_query/
[errors/]:errors/
[errors_cpp/]:errors_cpp/
49-
[hbm_bandwidth/]:hbm_bandwidth/
50-
[hbm_bandwidth_pseudo_random/]:hbm_bandwidth_pseudo_random/
[hbm_large_buffers/]:hbm_large_buffers/
[hbm_rama_ip/]:hbm_rama_ip/
[hbm_simple/]:hbm_simple/
54-
[host_global_bandwidth/]:host_global_bandwidth/
55-
[host_memory_bandwidth/]:host_memory_bandwidth/
[host_memory_copy_buffer/]:host_memory_copy_buffer/
[host_memory_copy_kernel/]:host_memory_copy_kernel/
[host_memory_simple/]:host_memory_simple/
@@ -62,7 +53,6 @@ Example | Description | Key Concepts / Keywords
[overlap/]:overlap/
[p2p_bandwidth/]:p2p_bandwidth/
[p2p_fpga2fpga/]:p2p_fpga2fpga/
65-
[p2p_fpga2fpga_bandwidth/]:p2p_fpga2fpga_bandwidth/
[p2p_overlap_bandwidth/]:p2p_overlap_bandwidth/
[p2p_simple/]:p2p_simple/
[streaming_free_running_k2k/]:streaming_free_running_k2k/
