okay so computing the hamming distance between 2 strings of 1000 (u8/byte) characters each takes 80 nanoseconds. yes, NANOseconds

that's 1 byte every 0.08 ns on my 2.7 GHz (~0.37 ns/cycle) intel i7 cpu, or roughly 5 bytes every cycle
wait... how can 1 byte take less than 1 cycle to process?? simd! 32 bytes are processed at once in each loop iteration. therefore, each iteration takes slightly more than 6 cycles
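for anyone who wants to redo the arithmetic above, here is a tiny sketch. the function name `throughput` and its parameters are made up for illustration; the inputs are just the numbers from this thread (80 ns, 1000 bytes, 2.7 GHz, 32 bytes per iteration).

```rust
/// Hypothetical helper: turns a timing into per-byte and per-iteration figures.
/// Returns (ns per byte, bytes per cycle, cycles per loop iteration).
pub fn throughput(
    total_ns: f64,
    bytes: f64,
    clock_ghz: f64,
    bytes_per_iter: f64,
) -> (f64, f64, f64) {
    // time spent on each byte
    let ns_per_byte = total_ns / bytes;
    // one cycle at `clock_ghz` GHz lasts 1 / clock_ghz nanoseconds
    let ns_per_cycle = 1.0 / clock_ghz;
    // how many bytes fit into a single cycle
    let bytes_per_cycle = ns_per_cycle / ns_per_byte;
    // cycles needed for one iteration that handles `bytes_per_iter` bytes
    let cycles_per_iter = bytes_per_iter / bytes_per_cycle;
    (ns_per_byte, bytes_per_cycle, cycles_per_iter)
}
```

calling `throughput(80.0, 1000.0, 2.7, 32.0)` gives 0.08 ns per byte. without rounding, that works out to about 4.6 bytes per cycle and about 6.9 cycles per iteration; rounding to 5 bytes per cycle first gives 32 / 5 = 6.4 cycles.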
this makes sense, since 2 pipelined loads + 1 cycle compare + 2 cycle movemask + 1 cycle popcount + 1 cycle addition = ~6 cycles
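the loop described above (two loads, compare, movemask, popcount, add) could look roughly like this. this is a minimal sketch and not the actual benchmarked code: the function names `hamming`, `hamming_avx2` and `hamming_scalar` are assumptions for illustration, it assumes AVX2 for the 32-byte vectors, and it falls back to a plain scalar loop on cpus without AVX2.

```rust
/// Scalar reference: count positions where the bytes differ.
pub fn hamming_scalar(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).filter(|(x, y)| x != y).count() as u32
}

/// Assumed AVX2 version: handles 32 bytes per loop iteration.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn hamming_avx2(a: &[u8], b: &[u8]) -> u32 {
    use std::arch::x86_64::*;
    let chunks = a.len() / 32;
    let mut mismatches = 0u32;
    for i in 0..chunks {
        unsafe {
            // 2 unaligned 32-byte loads
            let va = _mm256_loadu_si256(a.as_ptr().add(i * 32) as *const __m256i);
            let vb = _mm256_loadu_si256(b.as_ptr().add(i * 32) as *const __m256i);
            // compare: each byte lane becomes 0xFF where the bytes are equal
            let eq = _mm256_cmpeq_epi8(va, vb);
            // movemask: collect the top bit of every lane into a 32-bit mask
            let mask = _mm256_movemask_epi8(eq) as u32;
            // popcount of the inverted mask = number of mismatches, then add
            mismatches += (!mask).count_ones();
        }
    }
    // leftover bytes that do not fill a whole 32-byte vector
    mismatches + hamming_scalar(&a[chunks * 32..], &b[chunks * 32..])
}

/// Picks the AVX2 path when the cpu supports it, else the scalar loop.
pub fn hamming(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len());
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return unsafe { hamming_avx2(a, b) };
        }
    }
    hamming_scalar(a, b)
}
```

for example, `hamming(b"karolin", b"kathrin")` counts 3 mismatches. a 1000-byte input takes 31 full vector iterations plus an 8-byte scalar tail in this sketch.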
however, the benchmarked throughput matches the theoretical throughput so well that i think there must be a bug somewhere. i need to double check
and of course, levenshtein distance takes ~8.5 us for two 1000 byte strings, which is 1 byte every 8.5 ns or 1 byte every 23 cycles

very, very fast, but my code only supports up to 30 edits between two strings and i'm not paid enough to make it handle more
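to show what an edit cap means in practice, here is a plain scalar sketch. it is not the fast implementation from this thread and makes no attempt to match its speed: the name `levenshtein_capped` and the parameter `k` are made up for illustration, and the early exit is just one simple way to give up once the distance must exceed the cap.

```rust
/// Hypothetical capped Levenshtein: returns Some(distance) if it is at most
/// `k`, otherwise None. Uses the classic two-row dynamic programming table.
pub fn levenshtein_capped(a: &[u8], b: &[u8], k: usize) -> Option<usize> {
    // the length difference is a lower bound on the edit distance
    if a.len().abs_diff(b.len()) > k {
        return None;
    }
    // prev[j] = distance between the first i bytes of a and first j bytes of b
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0usize; b.len() + 1];
    for (i, &ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        let mut row_min = curr[0];
        for (j, &cb) in b.iter().enumerate() {
            let substitute = prev[j] + (ca != cb) as usize;
            let delete = prev[j + 1] + 1;
            let insert = curr[j] + 1;
            curr[j + 1] = substitute.min(delete).min(insert);
            row_min = row_min.min(curr[j + 1]);
        }
        // every path to the final cell crosses this row and costs never
        // decrease along a path, so the final distance is at least row_min
        if row_min > k {
            return None;
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    let distance = prev[b.len()];
    if distance <= k { Some(distance) } else { None }
}
```

with a cap of 30, `levenshtein_capped(b"kitten", b"sitting", 30)` returns `Some(3)`; with a cap of 2 the same pair returns `None`.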
You can follow @daniel_c0deb0t.