|
| 1 | +# Performance with Native Code |
| 2 | + |
| 3 | +How to assess the performance of Dart and native code, and how to improve it. |
| 4 | + |
| 5 | +## Profiling performance |
| 6 | + |
| 7 | +| Tool | Platform | Primary Use Case | Measures (Dart CPU) | Measures (Native CPU) | Measures (Dart Heap) | Measures (Native Heap) | |
| 8 | +| --------------------------------------- | --------- | --------------------------------------- | ---------------------------- | ------------------------ | -------------------- | ---------------------------------------------------------------- | |
| 9 | +| [Dart DevTools] | All | Profiles Dart VM, UI jank, Dart heap | Yes | Opaque "Native" block | Yes | Tracks "External" VM-aware memory only; misses native-heap leaks | |
| 10 | +| [Xcode Instruments (Time Profiler)] | iOS/macOS | Profiles native CPU call stacks | No | Yes (full symbolication) | No | No | |
| 11 | +| [Xcode Instruments (Leaks/Allocations)] | iOS/macOS | Profiles native heap (malloc, mmap) | No | No | No | Yes | |
| 12 | +| [Android Studio Profiler (CPU)] | Android | Profiles native C/C++ CPU execution | No | Yes (traces C++ calls) | No | No | |
| 13 | +| [Perfetto (heapprofd)] | Android | Advanced native heap profiling | No | No | No | Yes (traces malloc/free call stacks) | |
| 14 | +| [Linux perf] | Linux | Unified Dart AOT + Native CPU profiling | Yes (requires special flags) | Yes | No | No | |
| 15 | +| [Visual Studio CPU Usage Profiler] | Windows | Profiles native C/C++ CPU execution | No | Yes (traces C++ calls) | No | No | |
| 16 | +| [WPA (Heap Analysis)] | Windows | Advanced native heap profiling | No | No | No | Yes (traces malloc/free call stacks) | |
| 17 | + |
| 18 | +<!-- TODO: Add documentation for the other tools. --> |
| 19 | + |
| 20 | +### Dart DevTools |
| 21 | + |
| 22 | +To assess only the performance of the Dart code, treating native code as a
| 23 | +black box, use the Dart performance tooling. |
| 24 | + |
| 25 | +See the documentation at https://dart.dev/tools/dart-devtools and
| 26 | +https://docs.flutter.dev/perf. For FFI specifically, the most relevant views are
| 27 | +https://docs.flutter.dev/tools/devtools/cpu-profiler and
| 28 | +https://docs.flutter.dev/tools/devtools/performance#timeline-events-tab. |
| 29 | +To measure synchronous FFI calls, add synchronous timeline events; for
| 30 | +asynchronous code (using async callbacks or helper isolates), use async
| 31 | +events. |
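
The timeline events mentioned above can be sketched as follows, assuming the `Timeline` and `TimelineTask` classes from `dart:developer`. The functions `nativeAdd` and `nativeAddAsync` are hypothetical stand-ins for real FFI bindings, written in plain Dart so that the snippet runs on its own.

```dart
import 'dart:developer';

// Hypothetical stand-in for a synchronous FFI binding.
int nativeAdd(int a, int b) => a + b;

// Hypothetical stand-in for native work that completes asynchronously,
// for example through an async callback or a helper isolate.
Future<int> nativeAddAsync(int a, int b) async => a + b;

Future<void> main() async {
  // Synchronous event: covers exactly the duration of the closure.
  final sum = Timeline.timeSync('nativeAdd', () => nativeAdd(1, 2));

  // Async event: start and finish may be separated by awaits.
  final task = TimelineTask()..start('nativeAddAsync');
  final asyncSum = await nativeAddAsync(3, 4);
  task.finish();

  print('$sum $asyncSum');
}
```

The event names are arbitrary labels; choosing one name per native function should make the calls easy to find in the timeline view.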
| 32 | + |
| 33 | +### `perf` on Linux |
| 34 | + |
| 35 | +To see both Dart and native symbols in a flame graph, you can use `perf` on |
| 36 | +Linux. |
| 37 | + |
| 38 | +To run the [FfiCall benchmark] in JIT mode with `perf`: |
| 39 | + |
| 40 | +``` |
| 41 | +$ perf record -g dart --generate-perf-events-symbols benchmarks/FfiCall/dart/FfiCall.dart && \ |
| 42 | +perf report --hierarchy |
| 43 | +``` |
| 44 | + |
| 45 | +Note that Flutter apps are deployed in AOT mode, so prefer profiling in AOT
| 46 | +mode. |
| 47 | + |
| 48 | +For AOT, there is currently no [single
| 49 | +command](https://github.com/dart-lang/sdk/issues/54254). You need to use the
| 50 | +`precompiler2` command from the Dart SDK. See [building the Dart SDK] for how to
| 51 | +build the Dart SDK. |
| 52 | + |
| 53 | +``` |
| 54 | +$ pkg/vm/tool/precompiler2 benchmarks/FfiCall/dart/FfiCall.dart benchmarks/FfiCall/dart/FfiCall.dart.bin && \ |
| 55 | +perf record -g pkg/vm/tool/dart_precompiled_runtime2 --generate-perf-events-symbols benchmarks/FfiCall/dart/FfiCall.dart.bin && \ |
| 56 | +perf report --hierarchy |
| 57 | +``` |
| 58 | + |
| 59 | +To analyze a performance issue in Flutter, it is best to reproduce the issue in
| 60 | +a standalone Dart program. |
| 61 | + |
| 62 | +## Improving performance |
| 63 | + |
| 64 | +There are some typical patterns to improve performance: |
| 65 | + |
| 66 | +* To avoid dropped frames, move long-running FFI calls to a helper isolate. |
| 67 | +* Avoid copying data where possible: |
| 68 | + * Keep data in native memory, operating on [`Pointer`][]s and using |
| 69 | + [`asTypedList`][] to convert the pointers into [`TypedData`][]. |
| 70 | + * For short calls, if the memory is in Dart, avoid copying by using leaf calls
| 71 | + ([`isLeaf`][], [`isLeaf` (2)][], [`isLeaf` (3)][]) and [`address`][]. (Leaf
| 72 | + calls prevent the Dart GC from running on all isolates, which makes it safe
| 73 | + to pass native code a pointer to an object in Dart.) |
| 74 | + * Use [`Isolate.exit`][] to send large data from a helper isolate to the main |
| 75 | + isolate after a large computation. |
| 76 | +* For many small calls, limit the overhead per call. This makes a significant
| 77 | + difference for calls shorter than 1 us (one millionth of a second), and is
| 78 | + worth considering for calls of up to 10 us. |
| 79 | + * Use leaf calls ([`isLeaf`][], [`isLeaf` (2)][], [`isLeaf` (3)][]). |
| 80 | + * Prefer using [build hooks][] with [`Native`] `external` |
| 81 | + functions over [`DynamicLibrary.lookupFunction`][] and |
| 82 | + [`Pointer.asFunction`][]. |
| 83 | + |
| 84 | + For reference, the [FfiCall benchmark][] reports the following for 1000 FFI calls in AOT mode on Linux x64: |
| 85 | + ``` |
| 86 | + FfiCall.Uint8x01(RunTime): 234.61104068226345 us. |
| 87 | + FfiCall.Uint8x01Leaf(RunTime): 71.9994712538334 us. |
| 88 | + FfiCall.Uint8x01Native(RunTime): 216.07292770828917 us. |
| 89 | + FfiCall.Uint8x01NativeLeaf(RunTime): 27.64136415181509 us. |
| 90 | + ``` |
| 91 | + A single `Native` leaf call takes about 28 ns, while a non-leaf `asFunction`
| 92 | + call takes about 235 ns. For calls taking ~1000 ns, that is a ~20% speedup. |
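
The helper-isolate pattern from the list above can be sketched as follows, assuming `Isolate.run` from `dart:isolate`, which is documented to send its result back with `Isolate.exit`. The function `longRunningNativeCall` is a hypothetical stand-in for a long-running FFI call, written in plain Dart so that the snippet runs on its own.

```dart
import 'dart:isolate';
import 'dart:typed_data';

// Hypothetical stand-in for a long-running FFI call that produces a large
// buffer.
Uint8List longRunningNativeCall(int length) {
  final result = Uint8List(length);
  for (var i = 0; i < length; i++) {
    result[i] = i & 0xff;
  }
  return result;
}

Future<void> main() async {
  // Runs the closure on a new isolate, so the calling isolate stays
  // responsive while the work is in progress.
  final data = await Isolate.run(() => longRunningNativeCall(1 << 20));
  print(data.length);
}
```

In a Flutter app the `await` would sit in the main isolate, which remains free to render frames until the result arrives.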
| 93 | + |
| 94 | +## Community sources |
| 95 | + |
| 96 | +* (Video) Using Dart FFI for Compute-Heavy Tasks: |
| 97 | + https://www.youtube.com/watch?v=eJR5C0VRCjU |
| 98 | +* (Video) Maximize Speed with Dart FFI: Beginner’s Guide to High-Performance
| 99 | + Integration: https://www.youtube.com/watch?v=HF8gHAakb1Q |
| 100 | + |
| 101 | +[`address`]: https://api.dart.dev/dart-ffi/StructAddress/address.html |
| 102 | +[`asTypedList`]: https://api.dart.dev/dart-ffi/Uint8Pointer/asTypedList.html |
| 103 | +[`DynamicLibrary.lookupFunction`]: https://api.dart.dev/dart-ffi/DynamicLibraryExtension/lookupFunction.html |
| 104 | +[`isLeaf` (2)]: https://api.dart.dev/dart-ffi/NativeFunctionPointer/asFunction.html |
| 105 | +[`isLeaf` (3)]: https://api.dart.dev/dart-ffi/DynamicLibraryExtension/lookupFunction.html |
| 106 | +[`isLeaf`]: https://api.dart.dev/dart-ffi/Native/isLeaf.html |
| 107 | +[`Isolate.exit`]: https://api.dart.dev/dart-isolate/Isolate/exit.html |
| 108 | +[`Native`]: https://api.dart.dev/dart-ffi/Native-class.html |
| 109 | +[`Pointer.asFunction`]: https://api.dart.dev/dart-ffi/NativeFunctionPointer/asFunction.html |
| 110 | +[`Pointer`]: https://api.dart.dev/dart-ffi/Pointer-class.html |
| 111 | +[`TypedData`]: https://api.dart.dev/dart-typed_data/TypedData-class.html |
| 112 | +[Android Studio Profiler (CPU)]: https://developer.android.com/studio/profile |
| 113 | +[build hooks]: https://dart.dev/tools/hooks |
| 114 | +[building the Dart SDK]: https://github.com/dart-lang/sdk/blob/main/docs/Building.md |
| 115 | +[Dart DevTools]: https://dart.dev/tools/dart-devtools |
| 116 | +[FfiCall benchmark]: https://github.com/dart-lang/sdk/blob/main/benchmarks/FfiCall/dart/FfiCall.dart |
| 117 | +[Linux perf]: https://perfwiki.github.io/main/ |
| 118 | +[Perfetto (heapprofd)]: https://perfetto.dev/ |
| 119 | +[Visual Studio CPU Usage Profiler]: https://learn.microsoft.com/en-us/visualstudio/profiling/cpu-usage |
| 120 | +[WPA (Heap Analysis)]: https://learn.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-analyzer |
| 121 | +[Xcode Instruments (Leaks/Allocations)]: https://developer.apple.com/documentation/xcode/gathering-information-about-memory-use |
| 122 | +[Xcode Instruments (Time Profiler)]: https://developer.apple.com/tutorials/instruments |