This thread is about Data Oriented Design, #webassembly, and how the future of web development (especially front-end development) looks a lot like systems programming.
Browsers are becoming a lot like virtual machines, in the sense that they have a hardware abstraction layer and can run programs closer to the hardware.
Still, web development has not been catching up. The development ecosystem is polluted, and the programs that come out of the field are wasteful and inefficient. Web applications are too big, too slow, and consume too much electricity.
Power consumption is VERY important in the BIG (data centers) and in the small (mobile devices).
Data centers from companies like Google, Amazon, or Netflix consume as much energy as you can give them. There is a point where, no matter how much money you are willing to pay, you just cannot get more energy.
This fact is wonderfully explained in @chandlerc1024's 2014 talk "Efficiency with Algorithms, Performance with Data Structures": https://www.youtube.com/watch?v=fHNmRkzxHWs
The idea behind this is that you want to write code that runs FAST. Being fast means being more efficient. That translates into saving processing power, which consumes less energy, which all together means you need LESS MONEY to run your business.
Back to front-end development: newer browsers allow you to use the hardware in ways that weren't possible before. Compiling your program to #webassembly lets you use CPU features like SIMD and multithreading. Things that were unthinkable before.
In the following graph, we can see the performance gain from using the Single Instruction, Multiple Data (SIMD) feature in Chrome Canary: the same operation runs 3x faster with SIMD.
Now look at this graph showing the performance gain from multithreading in Chrome Canary, in a WebAssembly program that processes a matrix: your code is 2x faster if you process your data in multiple threads.
That's pretty cool. You can be much faster if you use modern CPU features. But remember: "The fact that you are not VERY VERY SLOW doesn't mean that you are not SLOW, and you want to be FAST," as @Jonathan_Blow said.
The biggest problem today is the time it takes to move data from memory into the CPU's registers. Take a look at this image: a photo of a Gigabyte TRX40 Aorus Pro motherboard, designed to host a Ryzen Threadripper 3970X.
An AMD Ryzen Threadripper is one of the most powerful CPUs you can buy in a computer shop. WHAT A ROCKET! But there is a detail in the image I want you to notice: the distance between the CPU and the memory bank is 5.69 cm, which means around 11 cm for a round trip.
In a hypothetical scenario where we could fetch data from memory with a ray of light traveling in a straight line (and ignoring the time the memory itself needs to look up the data), we want to know how long the CPU has to wait for the data to arrive in its registers.
We can find out by dividing the speed of light by the CPU clock rate: 300,000 km/s divided by 4.5 GHz tells us that light covers 6.67 cm in one CPU cycle. That is NOT fast enough to bring the data in one cycle! Our data is 11 cm away from the CPU!!
This is very nicely explained by @cmuratori in his video "Intro to C on Windows": https://www.youtube.com/watch?v=T4CjOB0y9nI
Of course, we don't move data from memory to the CPU with light in a vacuum traveling in a straight line; we use electricity through copper traces that have to make big detours around other electronic components on the motherboard. To work around this latency, the CPU uses caches.
In the following image, we can see how long the processor has to wait for data depending on which cache level it is located in. Think of "cycles" as the amount of time your CPU cannot work on your problem because it is waiting for data.
This was nicely explained in detail by @mike_acton in his CppCon 2014 talk "Data-Oriented Design and C++": https://www.youtube.com/watch?v=rX0ItVEVjHc
We could talk a lot about this subject, but the bottom line is: if you don't use your cache in a smart way, your program becomes very, very slow. The way to be FAST is to put the data we want to process together and process it in batches.
A good way to prove this is to traverse matrices row by row (very efficient, because the data is contiguous) versus column by column (very inefficient, because the CPU has to chase the data, jumping a whole row length in memory at every step).
Take a look at the following tests I did in different languages: C++, C, Java, JavaScript... the code is 19x faster if we manage to keep the data together. It doesn't matter which language you are using.
Take a look at these graphs and compare them with the SIMD and multithreading ones. The performance gain we get from multithreading and SIMD is NOTHING compared with the performance increase we get from code that uses the CPU cache efficiently.
WebAssembly is not better because it allows us to use multithreading and SIMD, but because it allows us to handle memory in a much more efficient way in the browser. This knowledge was taboo in the front-end world, where frameworks like Angular and React are kings. But not anymore.
As front-end developers, we need to learn about the hardware if we want to innovate and take web development to the next level. I have been looking into this subject for a while now, and I am thinking of writing a series of articles in a blog format, since the subject is huge.
Sorry for the typos. I'll try to put my experiences in a format that I can control better and where I can post code examples; I'm thinking about @Medium. Stay tuned.
You can follow @WEREMSOFT.