Replies: 3 comments
-
Hello Peter! I totally understand what you are talking about. Quite often the culprit hides across multiple flames, and digging it out requires a bit of effort. There are not one or two but three ways to do it.
Hope this helps!
-
Hi Alex, thanks for your tips. I realized that the flamegraphs are generated from a simple txt file that contains the stack traces and the number of samples for each, so it's not hard to generate a rudimentary table from that. Here's a sample of where I got in 30 minutes:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.pprint :as pp]
         '[clojure.string :as str]
         'clj-async-profiler.post-processing)

;; Split one collapsed-stack line into the stack string and the sample count.
(defn st&nm [ln]
  (let [ix (str/last-index-of ln " ")]
    [(subs ln 0 ix) (Long/valueOf (subs ln (inc ix)))]))

;; Demunge the frames of one stack and return the set of distinct calls in it.
(defn st->calls [s]
  (set (str/split (clj-async-profiler.post-processing/demunge-java-clojure-frames s)
                  #";")))

;; Walk all stack lines, crediting each stack's samples to every call that
;; appears in it; return the per-call totals and the grand total of samples.
(defn profile->data [sts]
  (loop [mp {} tc 0 sts sts]
    (if-some [[ln & mr] sts]
      (let [[st nm] (st&nm ln)
            cls (st->calls st)]
        (recur (reduce (fn [ac cl] (update ac cl #(if % (+ % nm) nm))) mp cls)
               (+ tc nm)
               mr))
      [mp tc])))

;; Print the calls sorted by sample count, filtering out clojure.* and
;; nrepl.* frames, with each call's share of the total as a percentage.
(let [[mp tc] (profile->data
               (line-seq (io/reader "/tmp/clj-async-profiler/results/18-cpu-2020-03-03-14-04-30.txt")))]
  (println "Total calls:" tc)
  (pp/print-table
   (reduce (fn [ac [mt cn]]
             (if (re-matches #"(clojure|nrepl)\..*" mt)
               ac
               (conj ac {:method mt :count cn :usage-% (-> cn (/ tc) (* 100) long)})))
           []
           (sort-by (comp - second) mp))))
```

and the output is something like this:

This output makes me happy that flamegraphs are the default :) Nevertheless, the table output can be helpful in certain scenarios, in addition to the flamegraphs. Thanks for your time, and let me know your thoughts on this!
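P.S. For reference, the txt file is in the collapsed-stack format: one stack per line, frame names separated by semicolons, and the sample count after the last space. A minimal sketch of what the helpers above do with such a line (the stack line and names are made up for illustration, and the exact demunged names depend on clj-async-profiler's demunging):

```clojure
;; A made-up collapsed-stack line: frames separated by ";",
;; sample count after the last space.
(def sample-line
  "java.lang.Thread.run;my.app$handler.invoke;my.app$slow_fn.invoke 42")

(st&nm sample-line)
;; => ["java.lang.Thread.run;my.app$handler.invoke;my.app$slow_fn.invoke" 42]

;; st->calls demunges the Clojure frames into readable names, so this returns
;; something like #{"java.lang.Thread.run" "my.app/handler" "my.app/slow-fn"}
(st->calls (first (st&nm sample-line)))
```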
-
Since this isn't really an issue, I'm closing it. I'm open to further discussion, if there's any interest.
-
Imagine a flame graph like this:
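A simplified, made-up sketch of such a graph (frames grow upward; widths are proportional to samples):

```
|  X   |        |  X  | |  X  |   <- X's combined width dominates, but it is split up
|  Y           ||  Z  | |  W  |   <- Y looks like the big child of A
|  A                          |   <- A covers the whole graph
```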
We see most of the time was spent in A, and there in Y; however, after a closer look one might see that X is the method where most of the CPU time is actually being spent. With an arbitrarily complex call tree it gets extremely hard to find such culprits in the flame graph.

What I'm trying to say is that there are several ways to look at the results:

- A is taking longest to run, therefore we need to optimize it. Maybe Y can be optimized. Maybe calls to A can be partially cached. Maybe ...
- X took longest to run, therefore optimizing it will yield great overall benefits.

Is there a way to obtain the latter results? In the underlying async-profiler I see something similar being printed to the JVM's console output; maybe giving access to that would be a start?
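If it helps, the latter view can also be approximated from the same collapsed txt file by crediting each stack's samples only to its leaf frame, i.e. a self-time flat profile. A minimal sketch, assuming the collapsed-stack format (one stack per line, semicolon-separated frames, sample count last); the function name is mine:

```clojure
(require '[clojure.string :as str])

;; Credit each stack's samples only to its leaf (last) frame,
;; producing a self-time flat profile: frame -> samples.
(defn self-time [lines]
  (reduce (fn [acc ln]
            (let [ix    (str/last-index-of ln " ")
                  stack (subs ln 0 ix)
                  n     (Long/valueOf (subs ln (inc ix)))
                  leaf  (last (str/split stack #";"))]
              (update acc leaf (fnil + 0) n)))
          {}
          lines))

;; Usage: top 10 frames by self time, given a line-seq of the txt file.
;; (take 10 (sort-by (comp - val) (self-time lines)))
```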