Skip to content

Conversation

@JTischbein
Copy link
Contributor

This is a draft of the hybrid model loading. It combines mmap for host buffers and DirectIO for device buffers.

  • Allows weight streaming from disk with the CPU backend
  • In case the backend is not capable of asynchronous updating the loading falls back to mmap (allows backends to decide whether it supports DirectIO or not)

Currently only the non-Windows path is implemented. Windows path will follow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant