Something I hadn't really considered with memory-mapped or bulk-read IO is that once you're reading through the buffer linearly, interleaved with other processing, it's going to kick everything else out of the caches because of LRU replacement.
If you do traditional buffered IO where you keep updating the same buffer, that buffer's cache lines are just going to stay in cache and not pressure the rest of your working set.
E.g. consider a symbol table in a lexer/parser. Even given the streaming read pattern, in theory the symbol table could occupy most of a 256 KB L2 cache without any contention, so long as you were doing buffered IO with a buffer large enough to amortize each buffered read.