Replies: 2 comments 2 replies
There's no precise set of rules for when to make a new Primitive and when to use existing ops. Generally the default is to use existing ops unless there is a really good need for a primitive. Why? Primitives have a contract: they must be transformable under vmap, vjp, jvp, etc. Every new primitive is hence a lot of work if we don't want to break that contract (which we don't). If you can implement things as a composition of existing operations instead, then you get all of that for free. (As you also probably observed, adding a new primitive is a lot more code and requires a much deeper integration. It's more complicated, and the maintenance burden is higher. Best to avoid when possible.) So, in that case, when should you add a new Primitive?
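To make the "you get all of that for free" point concrete, here is a toy forward-mode autodiff sketch in plain Python. It is purely illustrative and has nothing to do with MLX internals: each *primitive* (`__mul__`, `__add__`, `exp`) carries its own derivative rule, and any function composed from those primitives is differentiable with no extra work.

```python
import math

class Dual:
    """Toy dual number: carries a value and a tangent (derivative)."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    # Primitive: multiplication, with its derivative (product) rule baked in.
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

    # Primitive: addition, with its derivative rule baked in.
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)

def exp(x):
    # Primitive: exp, with its rule d(exp)/dx = exp(x).
    return Dual(math.exp(x.val), math.exp(x.val) * x.tan)

# A *composed* op: f(x) = x * e^x + x. No new derivative rule needed;
# its derivative e^x (1 + x) + 1 falls out of the primitives' rules.
def composed(x):
    return exp(x) * x + x

x = Dual(0.5, 1.0)  # seed tangent 1.0 => y.tan is df/dx at 0.5
y = composed(x)
```

A new primitive in this toy world would be a new method or function that must ship its own derivative rule; a composed function ships nothing, which is the trade-off being described.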
Thanks Awni!
Today, this contract isn't enforced at compile time, right? IIRC, it seems possible to add operation code as a Primitive without adding the vmap, vjp, or jvp code. I'm not saying it should be enforced; I'm just curious whether it is, and about the design overall: where it is strict versus where it is left flexible for tinkering/hacking. IIUC, this contract exists to ensure computations can occur on the backward pass, correct? I.e., for training.
Ah, gotcha! I think I understand the "fat op" term better now. The catch with these is that they're only suitable for inference, right? I don't recall whether any of them have vjp or vmap implementations (which I'm presuming are necessary only for training), and consequently these fused fat-op kernels cannot easily or reliably implement vmap, vjp, and jvp.
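Both observations above can be sketched in a few lines of Python. This mimics only the *shape* of a Primitive-style class (virtual transform methods), not MLX's real C++ API: the contract is by convention, so a forward-only "fat" primitive builds and runs fine until someone actually asks for a gradient.

```python
class Primitive:
    """Hypothetical stand-in for a primitive base class (not MLX's API)."""
    def eval(self, inputs):
        raise NotImplementedError
    def vjp(self, primals, cotangents):
        # Default: no backward rule. Nothing forces subclasses to override.
        raise NotImplementedError("no vjp rule")
    def jvp(self, primals, tangents):
        raise NotImplementedError("no jvp rule")

class InferenceOnlyOp(Primitive):
    # Only the forward pass is provided; this "compiles" fine...
    def eval(self, inputs):
        return [sum(inputs)]

op = InferenceOnlyOp()
print(op.eval([1, 2, 3]))  # forward works: [6]

# ...but the broken contract only surfaces at runtime, when a
# gradient is requested (i.e., during training, not inference):
try:
    op.vjp([1, 2, 3], [1.0])
except NotImplementedError as e:
    print("gradient failed:", e)
```

This is the sense in which a fat op can be inference-only: the forward kernel alone satisfies nothing but the forward contract.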
While working on `pinv` support in #875, it took me an embarrassingly long time to understand what the MLX reviewers (Angelos, Awni) meant when they said "...there isn't a need for a primitive; this op can be built in the linalg namespace using svd...". As I grappled with the design of the repo, the C++ API, and the cascade of other dependencies, it took a while before I could write reliable unit tests and confidently run them to validate iterative changes.

One such iteration was a full-blown `PseudoInverse` Primitive, which called the `SVD` primitive and then handed the matrix multiplication off to LAPACK functions such as `sgemm`. Once I had reliable unit tests and a general build-change-test loop going, it was easier to remove all the Primitive code and simply build a `linalg::pinv()` C++ op that describes the pinv array as a graph of ops using `linalg::svd`.

What I am curious about is: why don't we want to build a Primitive here for pinv? Is it because the lower-level LAPACK functions it calls, svd and `cblas_sgemm`, both already have their MLX-specific functions? Is there a need to build a Primitive only when we discover a new algorithm (or package or library dependency) that provably performs a given operation, say pinv, better (in terms of speed, memory, etc.)?
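For concreteness, here is a hedged numpy sketch of the "graph of ops" version of pinv (numpy stands in for MLX's linalg ops; `pinv_from_svd` and `rcond` are illustrative names, not anything from the MLX codebase): pinv is just svd, an elementwise reciprocal, and two matmuls, so no new primitive and no direct LAPACK call is needed.

```python
import numpy as np

def pinv_from_svd(a, rcond=1e-15):
    """Moore-Penrose pseudo-inverse built as a composition of existing ops."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    # Zero out near-zero singular values instead of dividing by them.
    cutoff = rcond * s.max()
    s_inv = np.where(s > cutoff, 1.0 / s, 0.0)
    # pinv(A) = V @ diag(1/s) @ U^T; the diag is folded into a broadcast.
    return (vt.T * s_inv) @ u.T

a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(np.allclose(pinv_from_svd(a), np.linalg.pinv(a)))  # True
```

In MLX terms, because svd, division, and matmul each already have their transform rules, a pinv composed this way would inherit vjp, jvp, and vmap automatically, which is presumably the reviewers' point.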