@noemotiovon
Collaborator

  • Stop relying on CANN's internal device ID retrieval; use a global variable instead.
  • Enforce stricter dimension validation in aclnnArange for better compatibility across CANN versions.
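
For readers unfamiliar with the two changes, here is a minimal sketch of the general idea, not the code in this PR. Every name in it (`fake_runtime_set_device`, `g_current_device`, `set_current_device`, `arange_num_elements`, `arange_dst_is_valid`) is hypothetical, the runtime call is a stub, and making the cached ID `thread_local` is an assumption. The element-count formula `ceil((stop - start) / step)` follows ggml's arange semantics; the exact checks the PR performs may differ.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the runtime call that binds a device.
// A real backend would call into the CANN runtime here.
int g_runtime_set_calls = 0;
void fake_runtime_set_device(int32_t device) {
    (void) device;
    ++g_runtime_set_calls;
}

// Cached ID of the device bound on this thread; -1 means "none yet".
// (Assumption: one cached value per thread.)
thread_local int32_t g_current_device = -1;

// Bind a device and remember it, so later code can read the cached ID
// instead of asking the runtime which device is current.
void set_current_device(int32_t device) {
    assert(device >= 0);
    if (device == g_current_device) {
        return;  // already bound on this thread, skip the runtime call
    }
    fake_runtime_set_device(device);
    g_current_device = device;
}

int32_t current_device() {
    return g_current_device;
}

// Number of elements arange(start, stop, step) produces.
int64_t arange_num_elements(float start, float stop, float step) {
    assert(step != 0.0f);
    const double n = std::ceil((double(stop) - double(start)) / double(step));
    return n > 0.0 ? static_cast<int64_t>(n) : 0;
}

// Strict shape check for the destination tensor: it must be 1-D
// (all higher dimensions equal to 1) and hold exactly the expected count.
bool arange_dst_is_valid(const std::array<int64_t, 4> & ne,
                         float start, float stop, float step) {
    if (ne[1] != 1 || ne[2] != 1 || ne[3] != 1) {
        return false;
    }
    return ne[0] == arange_num_elements(start, stop, step);
}
```

Rejecting a mismatched destination shape before calling the operator means the behaviour no longer depends on how a particular CANN version treats unexpected dimensions.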

@github-actions bot added the labels "ggml" (changes relating to the ggml tensor library for machine learning) and "Ascend NPU" (issues specific to Ascend NPUs) on Oct 24, 2025
@noemotiovon
Collaborator Author

Multi-NPU parallel precision testing

Input:    What is the meaning of life?
Response: The meaning of life is a philosophical question that has been debated by philosophers for centuries. Some people believe that the meaning of life is to find happiness and fulfillment, while others believe that it is to serve a higher power or to contribute to society. Ultimately, the meaning of life is a personal and subjective question that varies from person to person.<|endoftext|>

Client   3, seq  123, junk =    4, prompt = 124, started decoding ...
Client   2, seq 117/128, prompt  245 t, response   71 t, time  1.52 s, speed 207.29 t/s, cache miss 0  

Input:    I want to learn how to play the piano. What would be the best way to do it?
Response: There are many ways to learn how to play the piano, including taking lessons, practicing regularly, and reading books or watching videos. You can also try online tutorials or join a local piano class. It's important to find a teacher who is experienced and patient with you, and to have a good practice routine to help you improve your skills. Good luck!<|endoftext|>

Client   2, seq  124, junk =    5, prompt = 144, started decoding ...
Client   5, seq 118/128, prompt  152 t, response   90 t, time  1.85 s, speed 130.52 t/s, cache miss 0  

Input:    What is the meaning of life?
Response: The meaning of life is a philosophical question that has been debated for centuries. Some people believe that the meaning of life is to find happiness and fulfillment, while others believe that it is to find a purpose in life and to make a positive impact on the world. The meaning of life is a complex and subjective question that is difficult to answer for everyone. It is a question that continues to be explored and debated by philosophers, scientists, and other thinkers.<|endoftext|>

Client   5, seq  125, junk =    7, prompt = 185, started decoding ...
Client   0, seq 115/128, prompt   75 t, response  128 t, time  2.64 s, speed 76.85 t/s, cache miss 0  

Input:    Are you familiar with the Special Theory of Relativity and can you explain it to me?
Response: Yes, I am familiar with the Special Theory of Relativity. It is a theory of physics that describes the relationship between the laws of physics as observed in inertial reference frames. It was developed by Albert Einstein in the early 20th century and is one of the most important theories in physics. The Special Theory of Relativity states that the laws of physics are the same for all observers, regardless of their relative motion. It also states that the speed of light in a vacuum is constant for all observers, and that the laws of physics are the same for all observers in all inertial reference frames. The Special Theory of Relativity is

Client   0, seq  126, junk =    7, prompt = 196, started decoding ...
Client   7, seq 122/128, prompt  229 t, response   74 t, time  1.44 s, speed 210.54 t/s, cache miss 0  

Input:    I want to learn how to play the piano. What would be the best way to do it?
Response: There are many ways to learn how to play the piano, but one of the best ways is to take a class with a teacher. You can also practice by listening to piano recordings and playing along with sheet music. Additionally, you can try playing the piano on your own and practicing regularly. It's important to keep practicing and to keep learning new things. Good luck!<|endoftext|>

Client   7, seq  127, junk =    5, prompt = 134, started decoding ...
Client   1, seq 119/128, prompt  118 t, response  128 t, time  2.49 s, speed 98.95 t/s, cache miss 0  

Input:    How to get a job at Google?
Response: To get a job at Google, you need to have a bachelor's degree in computer science or a related field, and you must have a strong work ethic and be willing to put in the necessary hours. You should also have a good understanding of programming and be able to work well in a team. You should also have a good understanding of Google's policies and procedures, and be willing to follow them. You should also have a good understanding of the Google culture and be willing to work hard to fit in. You should also have a good understanding of the Google job market and be willing to apply for jobs. You should also have a good understanding

Client   4, seq 120/128, prompt  137 t, response  128 t, time  2.50 s, speed 106.19 t/s, cache miss 0  

Input:    Are you familiar with the Special Theory of Relativity and can you explain it to me?
Response: Yes, I am familiar with the Special Theory of Relativity. It is a theory of physics that describes the behavior of objects moving at constant velocity in a vacuum. It was developed by Albert Einstein in the early 20th century and is based on the idea that the laws of physics are the same for all observers, regardless of their relative motion. The Special Theory of Relativity is a fundamental concept in physics and has had a significant impact on our understanding of the universe. It is also known as the "Special Theory of Relativity" because it applies only to objects moving at a constant velocity in a vacuum. The Special Theory of Rel

Client   6, seq 121/128, prompt   45 t, response  120 t, time  2.74 s, speed 60.12 t/s, cache miss 0  

Input:    How to get a job at Google?
Response: To get a job at Google, you will need to apply for a job opening on their website. You will need to provide your resume, cover letter, and any other relevant information. You will also need to apply for the job and wait for a response. If you are successful, you will be offered a job at Google. You can also apply for other job openings on their website. It is important to research the company and the job opening to ensure that you are applying for the right position. You can also use Google's job search engine to find job openings in your area. Good luck!<|endoftext|>

Client   3, seq 123/128, prompt  124 t, response  128 t, time  3.09 s, speed 81.63 t/s, cache miss 0  

Input:    Are you familiar with the Special Theory of Relativity and can you explain it to me?
Response: Yes, I am familiar with the Special Theory of Relativity. It is a theory of physics that describes the relationship between the laws of physics as observed in inertial reference frames. It was developed by Albert Einstein in the early 20th century and is one of the most important contributions of his work. The Special Theory of Relativity states that the laws of physics are the same in all inertial reference frames, and that the speed of light is constant in all inertial reference frames. It also states that the laws of physics are the same in all reference frames, and that the speed of light is constant in all reference frames. It

Client   2, seq 124/128, prompt  144 t, response  128 t, time  3.13 s, speed 86.87 t/s, cache miss 0  

Input:    What is the best way to cook a steak?
Response: The best way to cook a steak is to sear it on the outside and then grill it on the inside. This will help to sear the steak and make it more flavorful. You can also add some herbs and spices to the steak before grilling it to add some flavor. The steak should be cooked to an internal temperature of 145 degrees Fahrenheit for medium-rare. The steak should be cooked for 5-7 minutes per side. The steak should be cooked until it is cooked through and the juices are clear. The steak should be served with a side of mashed potatoes or a salad. The steak should be served with

Client   5, seq 125/128, prompt  185 t, response  128 t, time  3.10 s, speed 100.85 t/s, cache miss 0  

Input:    How to get a job at Google?
Response: To get a job at Google, you need to have a bachelor's degree in computer science or a related field, and you must have a strong work ethic and a passion for technology. You should also have excellent communication and problem-solving skills. You should also be willing to work long hours and be able to work independently. You should also be willing to take on new challenges and be able to adapt to change. You should also be willing to work with a team and be able to work well with others. You should also be willing to work in a fast-paced environment and be able to work well under pressure. You should also be willing to work

Client   0, seq 126/128, prompt  196 t, response  128 t, time  2.99 s, speed 108.49 t/s, cache miss 0  

Input:    Recommend some interesting books to read.
Response: Here are some interesting books to read:
  1. "The Alchemist" by Paulo Coelho
  2. "The Great Gatsby" by F. Scott Fitzgerald
  3. "The Catcher in the Rye" by J.D. Salinger
  4. "The Great Gatsby" by F. Scott Fitzgerald
  5. "The Catcher in the Rye" by J.D. Salinger
  6. "The Great Gatsby" by F. Scott Fitzgerald
  7. "The Catcher in the Rye" by J.D. Salinger

Client   7, seq 127/128, prompt  134 t, response  128 t, time  2.77 s, speed 94.48 t/s, cache miss 0  

Input:    Recommend some interesting books to read.
Response: Here are some interesting books to read:
1. "The Great Gatsby" by F. Scott Fitzgerald
2. "To Kill a Mockingbird" by Harper Lee
3. "1984" by George Orwell
4. "The Catcher in the Rye" by J.D. Salinger
5. "The Alchemist" by Paulo Coelho
6. "The Great Gatsby" by F. Scott Fitzgerald
7. "To Kill a Mockingbird" by Harper Lee
8. "1984" by George Orwell
9. "The Catcher in the Rye"

main: clearing the KV cache

run parameters as of 2025-10-24 06:35:39

main: n_parallel = 8, n_sequences = 128, cont_batching = 1, system tokens = 273
External prompt file: used built-in defaults
Model and path used:  /home/lichenguang25/.ollama/models/blobs/sha256-6f96e01a3f550ca08aea1e5725bb8d5a7eccc6f281c30417e9d380b8c46467bd

Total prompt tokens:  17075, speed: 510.09 t/s
Total gen tokens:     13278, speed: 396.66 t/s
Total speed (AVG):           speed: 906.74 t/s
Cache misses:             0

llama_perf_context_print:        load time =    7605.80 ms
llama_perf_context_print: prompt eval time =   13543.93 ms / 30601 tokens (    0.44 ms per token,  2259.39 tokens per second)
llama_perf_context_print:        eval time =     144.60 ms /    25 runs   (    5.78 ms per token,   172.90 tokens per second)
llama_perf_context_print:       total time =   33479.73 ms / 30626 tokens
llama_perf_context_print:    graphs reused =       1440

@noemotiovon
Collaborator Author

noemotiovon commented Oct 24, 2025

Single-NPU performance testing

Before:

# ----------------------------- llama-cli -----------------------------
llama_perf_sampler_print:    sampling time =      39.44 ms /   127 runs   (    0.31 ms per token,  3219.75 tokens per second)
llama_perf_context_print:        load time =    1709.70 ms
llama_perf_context_print: prompt eval time =      15.00 ms /    20 tokens (    0.75 ms per token,  1333.33 tokens per second)
llama_perf_context_print:        eval time =     473.70 ms /   106 runs   (    4.47 ms per token,   223.77 tokens per second)
llama_perf_context_print:       total time =    1849.35 ms /   126 tokens
llama_perf_context_print:    graphs reused =        105
llama_memory_breakdown_print: | memory breakdown [MiB]  | total    free    self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - CANN0 (Ascend910B4) | 30196 = 28468 + (1300 =   942 +      48 +     310) +         427 |

# ----------------------------- llama-bench -----------------------------
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen2 1B F16                   | 942.43 MiB |   494.03 M | CANN       |  99 |           pp512 |      3305.08 ± 24.27 |

After:

# ----------------------------- llama-cli -----------------------------
llama_perf_sampler_print:    sampling time =      86.00 ms /   344 runs   (    0.25 ms per token,  4000.05 tokens per second)
llama_perf_context_print:        load time =    1702.86 ms
llama_perf_context_print: prompt eval time =      15.24 ms /    20 tokens (    0.76 ms per token,  1312.59 tokens per second)
llama_perf_context_print:        eval time =    1405.12 ms /   323 runs   (    4.35 ms per token,   229.87 tokens per second)
llama_perf_context_print:       total time =    2182.19 ms /   343 tokens
llama_perf_context_print:    graphs reused =        321
llama_memory_breakdown_print: | memory breakdown [MiB]  | total    free    self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - CANN0 (Ascend910B4) | 30196 = 28468 + (1300 =   942 +      48 +     310) +         427 |

# ----------------------------- llama-bench -----------------------------
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen2 1B F16                   | 942.43 MiB |   494.03 M | CANN       |  99 |           pp512 |      3311.32 ± 23.50 |
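
To put the two llama-bench rows side by side, the following sketch computes the relative change in pp512 throughput and applies a rough noise check. The helper names are hypothetical, and treating a difference smaller than the sum of the two reported ± values as noise is a simplifying assumption, not a statistical test.

```cpp
#include <cmath>

// Relative change from `before` to `after`, in percent.
double relative_change_percent(double before, double after) {
    return (after - before) / before * 100.0;
}

// Rough heuristic (assumption): the difference counts as noise when it is
// no larger than the sum of the two reported +/- deviations.
bool within_noise(double before, double before_dev,
                  double after, double after_dev) {
    return std::fabs(after - before) <= before_dev + after_dev;
}

// Values copied from the llama-bench tables above (pp512, t/s).
const double kBeforeTps = 3305.08, kBeforeDev = 24.27;
const double kAfterTps  = 3311.32, kAfterDev  = 23.50;
```

Calling `relative_change_percent(kBeforeTps, kAfterTps)` gives a change of roughly 0.19%, and `within_noise(kBeforeTps, kBeforeDev, kAfterTps, kAfterDev)` holds for these numbers, which suggests the change is performance-neutral on a single NPU.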
