This repository extends FlashAttention and other Transformer operators for Dragon.
Following the design principles of Dragon, this repository is devoted to unifying the modeling of Transformers across NVIDIA GPUs, Apple Silicon processors, Cambricon MLUs, and other AI accelerators.
Clone this repository to a local disk and install it:
cd flash-attention && mkdir build
cd build && cmake .. && make install -j $(nproc)
pip install ..
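After the install finishes, a quick sanity check from Python can confirm that the package is importable. This is a minimal sketch, assuming the Python package is exposed under the name flash_attn; the actual module name is an assumption and should be checked against the package metadata (e.g. via pip show).

import importlib.util

# Module name "flash_attn" is an assumption; replace it with the name
# actually installed by `pip install ..` if it differs.
spec = importlib.util.find_spec("flash_attn")
print("flash_attn importable:", spec is not None)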
We thank the authors of the following repositories: FlashAttention.