Thread by @daniel_c0deb0t, okay so hamming distance of 2 strings with 1000 (u8/byte) characters each [...]

Daniel Liu

daniel_c0deb0t

okay so hamming distance of 2 strings with 1000 (u8/byte) characters each takes 80 nanoseconds. yes, NANOseconds

that's 1 byte every 0.08 ns on my 2.7 GHz (~0.37 ns/cycle) intel i7 cpu, or roughly 5 bytes every cycle

wait... how can 1 byte take less than 1 cycle to process?? simd! 32 bytes are processed at once in a loop. therefore, each loop iteration takes slightly more than 6 cycles
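the arithmetic behind those numbers, spelled out as a rough check. it only uses the figures quoted in the thread (80 ns, 1000 bytes, 2.7 GHz, 32 bytes per iteration); the rounding is approximate, so the per-iteration cost lands somewhere between 6 and 7 cycles:

```latex
\begin{align*}
\text{time per byte}      &= \frac{80\ \text{ns}}{1000\ \text{bytes}} = 0.08\ \text{ns/byte} \\
\text{time per cycle}     &= \frac{1}{2.7\ \text{GHz}} \approx 0.37\ \text{ns/cycle} \\
\text{bytes per cycle}    &\approx \frac{0.37\ \text{ns/cycle}}{0.08\ \text{ns/byte}} \approx 4.6 \;(\approx 5) \\
\text{cycles per iteration} &\approx \frac{32\ \text{bytes}}{4.6\ \text{to}\ 5\ \text{bytes/cycle}} \approx 6.4\ \text{to}\ 6.9
\end{align*}
```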

this makes sense, since 2 pipelined loads + 1 cycle compare + 2 cycle movemask + 1 cycle popcount + 1 cycle addition = ~6 cycles
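a minimal sketch of what a loop like that could look like in rust, assuming an x86_64 cpu with AVX2. this is not the actual code being benchmarked in the thread; the function names (`hamming`, `hamming_avx2`, `hamming_scalar`) are made up for illustration. it follows the same steps as the cycle count above: two loads, a byte-wise compare, a movemask, a popcount, and an add. it counts equal bytes per 32-byte chunk and subtracts at the end, then handles the leftover tail with a scalar loop:

```rust
/// Scalar reference: count positions where the bytes differ.
fn hamming_scalar(a: &[u8], b: &[u8]) -> usize {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).filter(|(x, y)| x != y).count()
}

/// AVX2 version (hypothetical name): processes 32 bytes per loop iteration.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn hamming_avx2(a: &[u8], b: &[u8]) -> usize {
    use std::arch::x86_64::*;
    assert_eq!(a.len(), b.len());
    let chunks = a.len() / 32;
    let mut equal = 0usize;
    for i in 0..chunks {
        unsafe {
            // 2 unaligned loads, one from each string
            let va = _mm256_loadu_si256(a.as_ptr().add(i * 32) as *const __m256i);
            let vb = _mm256_loadu_si256(b.as_ptr().add(i * 32) as *const __m256i);
            // byte-wise compare: 0xFF where the bytes are equal, 0x00 otherwise
            let eq = _mm256_cmpeq_epi8(va, vb);
            // movemask: collapse the 32 byte lanes into a 32-bit mask
            let mask = _mm256_movemask_epi8(eq) as u32;
            // popcount + add: number of equal bytes in this chunk
            equal += mask.count_ones() as usize;
        }
    }
    let tail = chunks * 32;
    // mismatches in the full chunks, plus a scalar pass over the tail
    (tail - equal) + hamming_scalar(&a[tail..], &b[tail..])
}

/// Picks the AVX2 path when the cpu supports it, else falls back to scalar.
pub fn hamming(a: &[u8], b: &[u8]) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return unsafe { hamming_avx2(a, b) };
        }
    }
    hamming_scalar(a, b)
}
```

calling `hamming(&a, &b)` on two equal-length byte slices returns the number of mismatching positions. a tuned version would likely unroll the loop and keep several accumulators, which this sketch does not attempt.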

however, the benchmarked throughput matches the theoretical throughput so well that i think there must be a bug somewhere. i need to double check

and of course, levenshtein distance takes ~8.5 µs for two 1000-byte strings, which is 1 byte every 8.5 ns or 1 byte every 23 cycles

very, very fast, but my code only supports up to 30 edits between two strings and i'm not paid enough to make it handle more
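to show why a cap on the number of edits helps, here is a plain scalar sketch of levenshtein distance with an edit limit `k`. it is an assumption-laden illustration, not the SIMD code from the thread: the name `levenshtein_bounded` and its signature are hypothetical. the idea is the standard banded DP: if the distance is at most `k`, the optimal alignment never strays more than `k` cells from the main diagonal, so only a strip of width `2k + 1` per row needs to be filled in:

```rust
/// Levenshtein distance between `a` and `b`, or `None` if it exceeds `k`.
/// Only cells with |i - j| <= k are computed (hypothetical helper).
pub fn levenshtein_bounded(a: &[u8], b: &[u8], k: usize) -> Option<usize> {
    let (n, m) = (a.len(), b.len());
    // the length difference is a lower bound on the distance
    if n.abs_diff(m) > k {
        return None;
    }
    const INF: usize = usize::MAX / 2; // "unreachable" marker, safe to add 1 to
    let mut prev = vec![INF; m + 1];
    let mut curr = vec![INF; m + 1];
    // row 0: turning an empty prefix of `a` into b[..j] costs j insertions
    for j in 0..=m.min(k) {
        prev[j] = j;
    }
    for i in 1..=n {
        let lo = i.saturating_sub(k);
        let hi = (i + k).min(m);
        if lo == 0 {
            curr[0] = i; // i deletions
        } else {
            curr[lo - 1] = INF; // cell just left of the band is off limits
        }
        for j in lo.max(1)..=hi {
            let sub = prev[j - 1] + (a[i - 1] != b[j - 1]) as usize;
            let del = prev[j] + 1;
            let ins = curr[j - 1] + 1;
            curr[j] = sub.min(del).min(ins);
        }
        if hi < m {
            curr[hi + 1] = INF; // cell just right of the band, read by the next row
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    let d = prev[m];
    if d <= k { Some(d) } else { None }
}
```

with `k = 30` and 1000-byte strings this touches about 61 cells per row instead of 1001, which is where most of the savings would come from even before any vectorization.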
