runtime (gc): restructure blocks GC metadata #5463
Open
niaow wants to merge 1 commit into
Originally, the blocks GC simply stored a 2-bit state value for each block, with 4 options:

- unmarked head
- marked head
- tail (continuation of an object)
- free

The GC cycled the blocks through these states appropriately. Then the allocator would search for appropriate ranges of free blocks.

This design resulted in excessive memory fragmentation due to the way that the allocator had to search for free ranges. To fix this issue, we created a data structure to track the free ranges, which is rebuilt after every GC. This mostly fixed the memory fragmentation issue.

The other issue with the original approach is that it resulted in quadratic performance degradation when scanning free lists. To solve this, we added a header to each heap object to form a linked stack. This ensured that each object only needed to be visited once.

As these improvements were made, TinyGo began practically supporting larger and larger heaps. The current structure, where we loop over individual blocks, is no longer efficient. We need to change the metadata to support more efficient traversal.

This commit changes the per-block metadata into a pair of bitmaps: an "ends" bitmap and a "visited" bitmap. The "ends" bitmap is used by the marking and sweeping logic to find the end (containing the header) of an object. The "visited" bitmap tracks blocks which have been visited by mark, including both ends and non-ends. Most operations can be performed by scanning over these bitmaps rather than looping over individual blocks.

The "visited" bitmap also fixes the last remaining case of quadratic performance degradation. Previously, when many pointers referred to the start of a large object, the marking code would scan across the whole object to find the end every time. The new marking code adds every block between the marked address and the end to the bitmap. Subsequent marks to the same object will detect the already-visited tail and stop early.
niaow (Member, Author) commented:

This change can improve sweep performance by nearly 60x on large heaps. The new bitmap scanning code processes a whole word per loop iteration (so 64 blocks on 64-bit systems), and the average resulting free range size can easily climb into the tens of thousands of blocks on large heaps.