Here’s a fun bit of compiler optimization weirdness discovered by someone on /r/haskell:
1. Compile this program with `clang -O2` and time it.
2. Now move the definition of `do_work` into a separate file, compile each file separately, and link them.
The separately-linked version is 30% faster!
This happens because Clang inlines `do_work`, and for some reason that makes the code meaningfully slower! Unfortunately, I don’t know x86_64 assembly well enough to understand why, but you can see the ASM diff here:
https://godbolt.org/z/hV6Je7
Maybe someone who knows more can explain!