Here’s a fun bit of compiler optimization weirdness discovered by someone on /r/haskell:

1. Compile this program with `clang -O2` and time it.
2. Now split the definitions of `do_work` into separate files and compile and link them.

The separately-linked version is 30% faster!
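This happens because Clang inlines `do_work` when it sits in the same file as its caller (across separately compiled files, without link-time optimization, it can’t), and for some reason that makes the code meaningfully slower! Unfortunately I don’t know x86_64 well enough to understand why, but you can see the ASM diff here: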

https://godbolt.org/z/hV6Je7 

Maybe someone who knows more can explain!
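
If you want to poke at this without splitting files, one option is to tell the compiler not to inline the function. The sketch below is only that: the body of `do_work` is a hypothetical stand-in (the program from the thread is not reproduced here), `run` is a made-up driver, and `__attribute__((noinline))` is a GCC/Clang extension rather than standard C. It is also not a perfect substitute for separate compilation, since the compiler can still see the function body and apply other interprocedural optimizations.

```c
#include <stdint.h>

/* Hypothetical stand-in for do_work: sums the squares of 0..n-1.
   The noinline attribute (GCC/Clang extension) asks the compiler to keep
   this as a real call even at -O2, roughly like the split-file build. */
__attribute__((noinline))
uint64_t do_work(uint64_t n) {
    uint64_t acc = 0;
    for (uint64_t i = 0; i < n; i++) {
        acc += i * i;
    }
    return acc;
}

/* Hypothetical driver: calls do_work repeatedly so the cost of the call
   (or of the inlined body, if the attribute is removed) dominates. */
uint64_t run(uint64_t iterations, uint64_t n) {
    uint64_t total = 0;
    for (uint64_t k = 0; k < iterations; k++) {
        total += do_work(n);
    }
    return total;
}
```

Building this with `clang -O2` once with the attribute and once without, and timing a `main` that calls `run`, would give a single-file comparison. Whether it shows the same gap as the thread's program is an open question.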
You can follow @lexi_lambda.