Replies: 12 comments
-
You should basically never do manipulations using the raw data pointer unless you are implementing a kernel. If you find yourself in that situation, the first thing to check is whether there is an op that can be used (in your case some combination of […]). If no op or combination of ops fits the bill (which is pretty unusual at this point), then that means we are missing the corresponding kernel. In which case you should file an issue, and if it's something we would add to […]
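To make the advice above concrete, here is a hypothetical before/after in NumPy (whose array API MLX's Python bindings broadly mirror): the same elementwise computation written first as a raw loop over the underlying buffer, then as a composition of ops. The specific computation (scale and shift) is just an illustration, not anything from the thread.

```python
import numpy as np

x = np.arange(6, dtype=np.float32).reshape(2, 3)

# Pointer-arithmetic style (what to avoid): walk the flat buffer
# element by element, as one would with a raw data pointer.
out_loop = x.copy()
flat = out_loop.ravel()
for i in range(flat.size):
    flat[i] = flat[i] * 2.0 + 1.0

# Op-composition style (what to prefer): the whole computation is
# expressed as vectorized ops on the array itself.
out_ops = x * 2.0 + 1.0

assert np.allclose(out_loop, out_ops)
```

The op-composition line translates directly to the C++ API (elementwise multiply and add on an `mlx::core::array`), whereas the loop has no lazy-evaluation-friendly equivalent.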
-
PS: Implementing llama using the C++ API is a great exercise, you will definitely learn it :)
-
Thank you very much for the guidance @awni. I'll resume my effort tomorrow. My goal is to have this stuff working by Friday. Also, assuming that I get this stuff working, is this the kind of code which should be contributed as an example? If so, to which repo?
-
Hi @dougdew64, what I think would be a better example is doing this but using a C++ version of the `mlx.nn` API.
-
Thanks @awni. I'll complete my current effort and share a performance comparison of my code with the code of llama2.c and llama2.cpp. I'm hoping to demonstrate that using MLX yields a performance improvement. After that, I'll start over using a C++ version of the `mlx.nn` API.
-
I think that I'm misunderstanding how to use the various MLX array access operations, such as […]. Please pardon my ignorance. I'm not a Python developer (at least not yet) and am accustomed to doing pointer arithmetic.
-
Pretty much every op you see in there should have a direct and simple translation to the C++ API, with the exception of bracket-style slicing […]
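A hedged illustration of the slicing point above: Python's bracket slicing corresponds to an explicit slice op taking per-axis start/stop indices, which is roughly how the C++ API spells it (the exact MLX `slice` signature is an assumption to verify against the headers). Shown here in NumPy:

```python
import numpy as np

a = np.arange(24).reshape(4, 6)

# Bracket-style slicing, as written in the Python implementation:
bracket = a[1:3, 2:5]

# The same selection expressed with explicit per-axis start/stop
# indices -- the shape an explicit C++ slice(a, starts, stops) call
# would take (hypothetical signature, check the MLX headers).
starts, stops = (1, 2), (3, 5)
explicit = a[tuple(slice(b, e) for b, e in zip(starts, stops))]

assert np.array_equal(bracket, explicit)
```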
-
Thank you very much for the guidance @awni. I'm laughing at myself getting my butt kicked by this exercise. Fortunately, it's fun.
-
Another point: if you find yourself assigning to an […]
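The comment above is truncated, but it presumably concerns slice assignment (`a[i:j] = v`), which has no direct analogue in a lazy, functional array API. One way to express the same update purely with ops, sketched in NumPy (the positions and values here are made up for illustration):

```python
import numpy as np

a = np.zeros(6, dtype=np.float32)
v = np.array([7.0, 9.0], dtype=np.float32)

# Instead of the in-place `a[2:4] = v`, build a new array from ops:
# slice off the untouched pieces and concatenate the update between them.
b = np.concatenate([a[:2], v, a[4:]])
```

A boolean mask plus `where` is another ops-only way to express the same update when the positions aren't contiguous.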
-
You've anticipated a future question which I was going to ask. Thanks!
-
I think that I will learn some numpy basics (https://numpy.org/doc/stable/user/basics.html) and then start over in my code.
-
@awni I'm going to do as you suggested and start over with a goal of implementing a C++ version of the mlx.nn API. It dawned on me that by doing so, I'd be able to compare my llama results with the results generated by the already-existing Python implementation, and could ask questions here when my results are different. That would be a much better support situation than I would have faced tomorrow when attempting to debug my llama2.cpp -> llama2.cpp+MLX implementation. Thank you again for providing such great support. Very much appreciated!
-
As a learning exercise, I'm re-implementing llama2.cpp atop MLX. At the moment, I'm attempting to re-implement the RoPE logic.
It seems to me that the simplest way to implement the kind of array accesses in the RoPE logic (highlighted on the right side of the screenshot) is to just do pointer arithmetic based on the MLX array's raw data pointer.
Am I correct? If not, what is the best way?
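For reference, RoPE can be expressed entirely as array ops, with no pointer arithmetic. A minimal NumPy sketch, assuming the consecutive-(even, odd)-pair rotation convention; pairing conventions differ between implementations, so treat this as one possible layout rather than llama2.cpp's exact one:

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary position embedding over consecutive (even, odd) pairs.

    x: (seq_len, head_dim) with head_dim even. The pairing convention
    here is an assumption; some implementations rotate across halves
    of the head dimension instead.
    """
    seq_len, head_dim = x.shape
    half = head_dim // 2

    # One rotation frequency per pair of dimensions.
    freqs = base ** (-2.0 * np.arange(half) / head_dim)     # (half,)
    angles = np.arange(seq_len)[:, None] * freqs[None, :]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)

    # Split into the even/odd members of each pair, rotate each pair,
    # then interleave back -- slicing, multiply, stack, reshape only.
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    rotated = np.stack(
        [x_even * cos - x_odd * sin, x_even * sin + x_odd * cos], axis=-1
    )
    return rotated.reshape(seq_len, head_dim)
```

Each line maps to an op available through the MLX array API (slice, multiply, stack/concatenate, reshape), which is why the raw data pointer isn't needed here.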