We came across this problem during the updated ConvertToPlanarRGB tests.
Theoretically, the ConvertBits call in this script is a no-op, since no bit-depth conversion is performed.
However, if it calls GetFrame(0) in its constructor for frame-property detection, a serious slowdown occurs. When I removed the constructor-initiated GetFrame(0), speed went back to normal. As a side note: ConvertToPlanarRGB makes a similar GetFrame(0) call, but that one has no slowing effect.
```
BlankClip(60000, width=1600, height=1600, pixel_type="YUV444P8")
ConvertToPlanarRGB(bits=8)
ConvertBits(8) # tested with and without this line
```
The same bit depth was applied throughout the whole chain; 8, 16 and 32 bits were tested.
Benchmarks suggested that memory-cache size might also be involved in the problem. (Differences under 1% are only measurement errors; Avsmeter64 was used to obtain the fps data.)
X / Y: fps without / with ConvertBits(n), whose constructor calls GetFrame(0):

| size | 400x400 | 600x600 | 800x800 | 1200x1200 | 1600x1600 | 2000x2000 |
|---|---|---|---|---|---|---|
| 8 bits | 27376 / 27000 | 13200 / 13100 | 7600 / 7600 | 3351 / 3280 | 1720 / 1509 | 837 / 795 |
| 16 bits | 24000 / 24000 | 11235 / 11200 | 6600 / 6100 | 1854 / 1420 | 671 / 599 | 380 / 353 |
| float | 13900 / 14000 | 6300 / 5900 | 2600 / 1800 | 626 / 545 | 285 / 270 | 178 / 171 |
The blank clip is a single frame, generated once; it then goes into the cache and stays there. Checked, and true: we always get the very same memory address for that precalculated frame (BlankClip/ColorBars, all the same).
An in-constructor GetFrame(0) call is a usual way of obtaining clip-wide properties such as color matrix, full/limited range setting, etc., assuming that such properties found in frame #0 won't change across the whole clip. The precalculations and the dispatched functions can thus be chosen once, during filter-instance creation.
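The constructor-time property-detection pattern can be sketched roughly like this (toy types and names for illustration only, not the actual AviSynth+ API):

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy model of the pattern described above; all names are illustrative.
struct Frame {
    std::map<std::string, std::string> props;  // frame properties
};

struct Clip {
    virtual ~Clip() = default;
    virtual std::shared_ptr<Frame> GetFrame(int n) = 0;
};

struct SourceClip : Clip {
    std::shared_ptr<Frame> GetFrame(int) override {
        auto f = std::make_shared<Frame>();
        f->props["_Matrix"] = "BT.709";  // clip-wide property
        return f;
    }
};

// A filter that decides its conversion path once, at construction time,
// by peeking at frame #0.
struct ConvertFilter : Clip {
    std::shared_ptr<Clip> child;
    std::string matrix;  // cached clip-wide property

    explicit ConvertFilter(std::shared_ptr<Clip> c) : child(std::move(c)) {
        // Constructor-initiated GetFrame(0): assumes frame #0's
        // properties hold for the whole clip.
        matrix = child->GetFrame(0)->props["_Matrix"];
    }

    std::shared_ptr<Frame> GetFrame(int n) override {
        // The dispatched routine was chosen from `matrix` at construction;
        // here we simply pass the frame through.
        return child->GetFrame(n);
    }
};
```

The benefit is that per-frame calls need no property re-checks; the cost, as the rest of this report shows, is the side effects the early GetFrame(0) has on caching.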
Who is the culprit? The Avisynth frame-caching system? An unfortunate memory layout? Memory? Mutexes waiting for each other?
Bad physical frame addresses? I read that cache lines and address lookup/translation may depend on magic boundaries in processors, so I logged the read and write pointers of ConvertToPlanarRGB for the different runs. But some thousand runs (automated, logged and analyzed) did not support this hypothesis.
At this point the free AI access was over :) I quickly subscribed to my first AI month :)
I was advised to try GetFrame(100) instead. And yes, the slowdown was immediately gone.
I reverted back to GetFrame(0) and, after another piece of advice, logged the first couple of GetReadPtr and GetWritePtr pointers in ConvertToPlanarRGB and examined them, this time within the same session. For the quick case (without ConvertBits) all addresses were the same. In the slow cases the GetWritePtr pointers were alternating between two addresses. And when ConvertBits's GetFrame(0) was removed, the addresses became identical again.
Those addresses came from env->NewVideoFrame, which in turn obtains the addresses from the buffer-reusing helper, the so-called Frame Registry.
Allocations based on the Frame Registry are fast; in fact, they are mostly not allocations at all. The Frame Registry mechanism avoids OS reallocations by handing back an unused (reference count = 0) frame or video frame buffer. Unused frames and buffers are not freed immediately, only in case of memory shortage.
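A minimal sketch of such a buffer reuser, assuming only what is described above (toy code, not the real AviSynth+ FrameRegistry): released buffers are kept around and handed out again once their reference count drops to zero.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Illustrative buffer with a reference count, as in the description above.
struct Buffer {
    std::vector<unsigned char> data;
    int refcount = 0;
    explicit Buffer(size_t size) : data(size) {}
};

struct Registry {
    std::vector<std::unique_ptr<Buffer>> pool;

    Buffer* NewVideoFrame(size_t size) {
        // Reuse any unused (refcount == 0) buffer of the right size...
        for (auto& b : pool)
            if (b->refcount == 0 && b->data.size() == size) {
                b->refcount = 1;
                return b.get();
            }
        // ...and fall back to a real allocation only when none exists.
        pool.push_back(std::make_unique<Buffer>(size));
        pool.back()->refcount = 1;
        return pool.back().get();
    }

    // "Freeing" a frame only drops the refcount; memory is kept for reuse.
    static void Release(Buffer* b) { b->refcount = 0; }
};
```

Under this scheme, an acquire-release-acquire sequence of equal-sized frames returns the very same pointer, which is why identical write addresses across frames are the expected fast path.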
So in the slow case our write pointer flip-flopped between two addresses on even/odd frames. Analyzing the timings and the reference counts, and even developing a last-released-first-reserved video-frame-buffer reuse logic, it still failed and kept alternating between two addresses. (Btw: why is it slower to write to address X on even frames and address Y on odd frames instead of always writing to address X? This is subject to further investigation. Cache eviction?)
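One plausible mechanism for the even/odd flip-flop, shown here as a toy simulation (an assumption about the behavior, not the real code): if the frame produced for n is still referenced, e.g. held by a downstream cache, when frame n+1 is requested, the pool cannot reuse its buffer and must hand out a second one; once frame n is released the roles swap, so the write pointer alternates between exactly two addresses.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Buf { int refcount = 0; };

struct Pool {
    std::vector<std::unique_ptr<Buf>> bufs;
    Buf* acquire() {
        // Hand back the first unused buffer, allocate only if none is free.
        for (auto& b : bufs)
            if (b->refcount == 0) { b->refcount = 1; return b.get(); }
        bufs.push_back(std::make_unique<Buf>());
        bufs.back()->refcount = 1;
        return bufs.back().get();
    }
};

// Simulate a downstream holder keeping the previous frame alive while the
// next one is produced; returns the sequence of write addresses.
inline std::vector<Buf*> simulate(int frames) {
    Pool pool;
    std::vector<Buf*> addrs;
    Buf* held = nullptr;               // frame still referenced downstream
    for (int n = 0; n < frames; ++n) {
        Buf* cur = pool.acquire();     // write pointer for frame n
        addrs.push_back(cur);
        if (held) held->refcount = 0;  // previous frame released only now
        held = cur;
    }
    return addrs;
}
```

In this model the address sequence is X, Y, X, Y, ..., matching the logged even/odd alternation; whether the extra Cache object created by GetFrame(0) is exactly the holder in question is the open part.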
So far our good AI was helpful in detecting what is not causing the problem.
The codebase is too complex and huge to hand over to an AI in full.
In the final solution I then tried to eliminate the premature creation of per-filter AvsCache instances during the filter-instantiation phase. Calling child->GetFrame(0) constructs a Cache object for the child (in our case, for ConvertToPlanarRGB) and fills it with frame #0 content, which somehow interacts with the future :)
I made another change: a no-op filter now simply returns its child/parameter clip unaltered (e.g. during clip1.ConvertBits(16) we detect that clip1's bit depth is already 16 bits and return it directly). In this case we prevent clip1 from getting yet another MTGuard and CacheGuard object.
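The no-op short-circuit can be sketched like this (toy types and a hypothetical factory name, not the real AviSynth+ Create signature): when the requested bit depth already matches the source, the factory hands back the input clip itself instead of wrapping it in a new filter, so no extra guard or cache objects attach to it.

```cpp
#include <cassert>
#include <memory>

// Illustrative clip type carrying only a bit depth.
struct Clip {
    int bits;
    explicit Clip(int b) : bits(b) {}
    virtual ~Clip() = default;
};

struct ConvertBitsFilter : Clip {
    std::shared_ptr<Clip> child;
    ConvertBitsFilter(std::shared_ptr<Clip> c, int target)
        : Clip(target), child(std::move(c)) {}
};

// Hypothetical factory: short-circuit when no conversion is needed.
inline std::shared_ptr<Clip> CreateConvertBits(std::shared_ptr<Clip> clip,
                                               int target_bits) {
    if (clip->bits == target_bits)
        return clip;  // no-op: return the child/parameter clip unaltered
    return std::make_shared<ConvertBitsFilter>(std::move(clip), target_bits);
}
```

Returning the child by identity is what keeps the extra wrapper objects (and their caches) from ever being created for the no-op case.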