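Gemma 3 models interleave two kinds of attention layers: most layers use sliding-window (local) attention, and the rest use global attention. In Gemma 3 the global-to-local ratio is 1:5, that is, one global-attention layer for every five sliding-window layers. Find the ratio that gives the best trade-off between speed and model quality.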
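
One way to start comparing candidate ratios is to estimate how much key-value cache each one needs, since that is a large part of what makes global layers slow at long context. The sketch below is only a rough proxy, not Gemma 3's actual implementation: the helper names `layer_pattern` and `kv_cache_tokens`, the layer count, the context length, and the window size are all illustrative assumptions, and it says nothing about model quality, which has to be measured by training and evaluating each variant.

```python
def layer_pattern(num_layers: int, locals_per_global: int) -> list[str]:
    """Hypothetical helper: every (locals_per_global + 1)-th layer is global."""
    period = locals_per_global + 1
    return [
        "global" if (i + 1) % period == 0 else "local"
        for i in range(num_layers)
    ]


def kv_cache_tokens(pattern: list[str], context_len: int, window: int) -> int:
    """Hypothetical proxy: cached token positions summed over all layers.

    A global layer caches the whole context; a sliding-window layer
    caches at most `window` positions.
    """
    total = 0
    for kind in pattern:
        if kind == "global":
            total += context_len
        else:
            total += min(window, context_len)
    return total


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the text above).
    num_layers, context_len, window = 48, 32_768, 1_024

    all_global = kv_cache_tokens(["global"] * num_layers, context_len, window)
    for locals_per_global in (1, 3, 5, 7):
        pattern = layer_pattern(num_layers, locals_per_global)
        cached = kv_cache_tokens(pattern, context_len, window)
        print(
            f"{locals_per_global} local per global: "
            f"{pattern.count('global')} global layers, "
            f"cache = {cached / all_global:.1%} of all-global"
        )
```

A proxy like this only narrows the search: the cheaper configurations still need to be checked on long-context and standard benchmarks before one of them can be called the best ratio.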