okay so hamming distance of 2 strings with 1000 (u8/byte) characters each takes 80 nanoseconds. yes, NANOseconds
that's 1 byte every 0.08 ns on my 2.7 GHz (~0.4 ns/cycle) intel i7 cpu, or about 5 bytes every cycle
wait... how can 1 byte take less than 1 cycle to process?? simd! 32 bytes are processed at once per loop iteration. at about 5 bytes per cycle, each iteration therefore takes slightly more than 6 cycles (32 / 5 = 6.4)
this makes sense, since 2 pipelined loads + 1 cycle compare + 2 cycle movemask + 1 cycle popcount + 1 cycle addition = ~6 cycles
however, the benchmarked throughput matches the theoretical throughput so well that i think there must be a bug somewhere. i need to double check
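for reference, here is a minimal sketch of what a 32-bytes-per-iteration loop like the one described above could look like in rust with avx2 intrinsics. this is not my actual code: the function names (`hamming`, `hamming_avx2`, `hamming_scalar`) are made up for illustration, and it assumes both inputs have the same length. it follows the same steps as the cycle estimate: two loads, one compare, one movemask, one popcount, one add.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar reference: count positions where the two byte strings differ.
fn hamming_scalar(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b).filter(|(x, y)| x != y).count()
}

/// AVX2 sketch: compares 32 bytes per loop iteration.
/// Caller must make sure the CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn hamming_avx2(a: &[u8], b: &[u8]) -> usize {
    let n = a.len().min(b.len());
    let mut mismatches = 0usize;
    let mut i = 0;
    while i + 32 <= n {
        let equal_bytes = unsafe {
            // two unaligned 32-byte loads
            let va = _mm256_loadu_si256(a.as_ptr().add(i) as *const __m256i);
            let vb = _mm256_loadu_si256(b.as_ptr().add(i) as *const __m256i);
            // bytewise compare: 0xFF where equal, 0x00 where different
            let eq = _mm256_cmpeq_epi8(va, vb);
            // movemask packs the top bit of each byte into a 32-bit mask,
            // then popcount gives the number of equal bytes
            (_mm256_movemask_epi8(eq) as u32).count_ones() as usize
        };
        mismatches += 32 - equal_bytes;
        i += 32;
    }
    // leftover tail (fewer than 32 bytes) is handled by the scalar loop
    mismatches + hamming_scalar(&a[i..n], &b[i..n])
}

/// Picks the AVX2 path when available, otherwise falls back to scalar.
fn hamming(a: &[u8], b: &[u8]) -> usize {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return unsafe { hamming_avx2(a, b) };
        }
    }
    hamming_scalar(a, b)
}
```

comparing the output against `hamming_scalar` on the same inputs is a cheap sanity check when the benchmark numbers look too good.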
and of course, levenshtein distance takes ~8.5 us for two 1000 byte strings, which is 1 byte every 8.5 ns or 1 byte every 23 cycles
very, very fast but my code only supports up to 30 edits between two strings and im not paid enough to make it handle more
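to show why capping the number of edits helps, here is a plain scalar sketch of levenshtein distance with an edit limit. this is NOT how my code works internally, just the textbook banded dp idea: if at most `max_edits` edits are allowed, only cells within `max_edits` of the main diagonal can matter, so everything else is skipped. the name `levenshtein_bounded` and the `Option` return type are assumptions made up for this example.

```rust
/// Levenshtein distance between `a` and `b`, or `None` if it exceeds `max_edits`.
/// Only a band of width 2 * max_edits + 1 around the diagonal is computed.
fn levenshtein_bounded(a: &[u8], b: &[u8], max_edits: usize) -> Option<usize> {
    let (n, m) = (a.len(), b.len());
    // the length difference alone is a lower bound on the distance
    if n.abs_diff(m) > max_edits {
        return None;
    }
    const INF: usize = usize::MAX / 2; // "unreachable" marker, safe to add small values to
    let mut prev = vec![INF; m + 1];
    let mut curr = vec![INF; m + 1];
    // row 0: turning the empty prefix of `a` into b[..j] costs j insertions
    for j in 0..=m.min(max_edits) {
        prev[j] = j;
    }
    for i in 1..=n {
        let lo = i.saturating_sub(max_edits);
        let hi = (i + max_edits).min(m);
        let start = if lo == 0 {
            curr[0] = i; // deleting the first i bytes of `a`
            1
        } else {
            curr[lo - 1] = INF; // cell just left of the band is out of reach
            lo
        };
        for j in start..=hi {
            let substitute = prev[j - 1] + (a[i - 1] != b[j - 1]) as usize;
            let delete = prev[j] + 1;
            let insert = curr[j - 1] + 1;
            curr[j] = substitute.min(delete).min(insert);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    // after the final swap, `prev` holds the last computed row
    (prev[m] <= max_edits).then_some(prev[m])
}
```

with a limit of 30 this does roughly 61 cells per row instead of 1000 for two 1000 byte strings, and returns `None` as soon as the result would be more than 30.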