Been thinking about that lately, might write a blog, but here's a thread in the meantime.

I'm focused on Android, but the challenges are similar to measuring server-side perf.

Thread 🧵⤵️ https://twitter.com/gunnarmorling/status/1309923787354312704
First you need to decide what to measure. Typically it's "time to do X", though for UIs it can also be FPS during a flow.
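The naive version of "time to do X" is just a clock around the flow. A minimal sketch, assuming an Android context; measureTimeToDoX and its block parameter are placeholders, not a real API:

```kotlin
import android.os.SystemClock

// Minimal "time to do X" measurement. elapsedRealtimeNanos() is monotonic
// and keeps ticking during deep sleep, unlike uptimeMillis().
fun measureTimeToDoX(block: () -> Unit): Long {
  val start = SystemClock.elapsedRealtimeNanos()
  block() // the flow under test
  return SystemClock.elapsedRealtimeNanos() - start
}
```

As the rest of this thread argues, this is exactly the kind of metric that falls apart in CI.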

The volatility of these metrics is the really hard part. Perf tests often run in complex systems where it's hard to get consistent, meaningful results.
One way to limit this is to make the CI system more consistent, but that's a hopeless battle… until you change your metrics.
Time is a terrible way to measure perf in CI! Too many factors influence it. Instead, I like to pick proxy metrics that have been shown to dominate the performance impact.
I've done that recently with LeakCanary: the perf-sensitive work is single-threaded, must keep memory low, and is primarily impacted by I/O reads.
So I added a layer to capture the number and size of I/O reads, and automated heap dump analysis at each step of the work to compute the exact memory retained by the worker thread (➡️ independent of GC).
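A minimal sketch of what such a counting layer could look like, assuming Okio (which LeakCanary's heap dump parsing is built on); CountingSource is a hypothetical name, not the actual implementation:

```kotlin
import okio.Buffer
import okio.ForwardingSource
import okio.Source

// Wraps any Okio Source and counts how many reads happen and how many
// bytes they return, so a test can assert on I/O instead of wall time.
class CountingSource(delegate: Source) : ForwardingSource(delegate) {
  var readCount = 0L
    private set
  var bytesRead = 0L
    private set

  override fun read(sink: Buffer, byteCount: Long): Long {
    val read = super.read(sink, byteCount)
    if (read != -1L) {
      readCount++
      bytesRead += read
    }
    return read
  }
}
```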
I/O and retained memory are much less volatile and environment-dependent than "time to do X".
Of course, it doesn't exactly track reality. But neither does a time-based test that runs in a test environment. Truth is, the only real test is running in production.
SEVs, load testing, and perf investigations are great for finding the shape of your perf problem. Once you know that shape, write simpler tests tracking proxy metrics and automate them in CI, as in the sketch below.
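For example (a sketch under assumptions: JUnit 4, the CountingSource from above, a hypothetical fixture heap dump, and a println standing in for whatever reporting hook your CI uses):

```kotlin
import java.io.File
import okio.buffer
import okio.source
import org.junit.Test

class IoReadMetricTest {
  // Runs a stand-in workload through the counting layer and records the
  // proxy metric for trend tracking, rather than failing on a hard
  // threshold (see the next tweet for why).
  @Test fun recordIoReadsForHeapAnalysis() {
    val counting = CountingSource(File("fixtures/heap_dump.hprof").source())
    counting.buffer().use { it.readByteArray() } // stand-in for the real analysis
    println("metric io_read_count=${counting.readCount} io_bytes=${counting.bytesRead}")
  }
}
```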
Once you have a metric, recording its value for each commit + automated PR comments + trend tracking & alerts likely work better than green/red thresholds: a fixed threshold fails on a "last drop" change that has low impact on its own but pushes you over the line.
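A sketch of what such a trend check could look like (all names are hypothetical; the window and tolerance would need tuning):

```kotlin
// Compares each commit's metric to a rolling baseline and alerts only on
// real drift, instead of failing a fixed red/green threshold.
data class Sample(val commit: String, val ioReadCount: Long)

fun isRegression(
  history: List<Sample>,     // previous commits, oldest first
  current: Sample,
  window: Int = 20,          // size of the rolling baseline
  tolerance: Double = 0.10   // allow 10% drift before alerting
): Boolean {
  if (history.isEmpty()) return false
  val baseline = history.takeLast(window).map { it.ioReadCount }.average()
  return current.ioReadCount > baseline * (1 + tolerance)
}
```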