if you ported a libc to use an `io_uring`-style syscall interface that serializes the arguments to one ring buffer and then polls another for completion, you could make an application that you compile once per architecture, then use per-OS personality modules to run it on any OS
this mechanism is a lot like the Windows subsystems, but instead of being handled by the kernel, the subsystem code lives entirely in userspace and doesn't need any OS cooperation at all

at the same time it has low overhead since it doesn't use dynamic binary translation
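a minimal sketch of the submit/poll idea, single-threaded so it needs no atomics (a real shared-memory ring would need memory barriers). every name here (`struct sqe`, `struct cqe`, `ring_submit`, `ring_poll`, `personality_run`, the opcodes) is made up for illustration and is not the real `io_uring` ABI:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 8u /* power of two, so indices can be masked */

/* hypothetical wire format: one serialized request, one completion */
struct sqe { uint32_t op; uint64_t user_data; uint64_t args[3]; };
struct cqe { uint64_t user_data; int64_t result; };

struct ring_pair {
    struct sqe sq[RING_SLOTS]; uint32_t sq_head, sq_tail;
    struct cqe cq[RING_SLOTS]; uint32_t cq_head, cq_tail;
};

enum { OP_NOP = 0, OP_WRITE = 1 };

/* libc side: serialize the arguments into the submission ring */
static int ring_submit(struct ring_pair *r, uint32_t op, uint64_t user_data,
                       uint64_t a0, uint64_t a1, uint64_t a2)
{
    assert(r != NULL);
    if (r->sq_tail - r->sq_head == RING_SLOTS)
        return -1; /* submission ring is full */
    struct sqe *e = &r->sq[r->sq_tail & (RING_SLOTS - 1)];
    e->op = op;
    e->user_data = user_data;
    e->args[0] = a0; e->args[1] = a1; e->args[2] = a2;
    r->sq_tail++;
    return 0;
}

/* libc side: poll the completion ring; returns 1 if a completion was read */
static int ring_poll(struct ring_pair *r, struct cqe *out)
{
    if (r->cq_head == r->cq_tail)
        return 0;
    *out = r->cq[r->cq_head & (RING_SLOTS - 1)];
    r->cq_head++;
    return 1;
}

/* toy personality module: "writes" land in an in-memory sink */
struct personality { char sink[64]; size_t used; };

static void personality_run(struct personality *p, struct ring_pair *r)
{
    while (r->sq_head != r->sq_tail && r->cq_tail - r->cq_head < RING_SLOTS) {
        struct sqe e = r->sq[r->sq_head & (RING_SLOTS - 1)];
        int64_t res;
        r->sq_head++;
        switch (e.op) {
        case OP_NOP:
            res = 0;
            break;
        case OP_WRITE: { /* args: fd (ignored here), buffer, length */
            const char *buf = (const char *)(uintptr_t)e.args[1];
            size_t len = (size_t)e.args[2];
            size_t room = sizeof p->sink - p->used;
            if (len > room)
                len = room; /* short write */
            memcpy(p->sink + p->used, buf, len);
            p->used += len;
            res = (int64_t)len;
            break;
        }
        default:
            res = -ENOSYS; /* this personality doesn't know the opcode */
            break;
        }
        struct cqe *c = &r->cq[r->cq_tail & (RING_SLOTS - 1)];
        c->user_data = e.user_data;
        c->result = res;
        r->cq_tail++;
    }
}
```

the application only ever touches `ring_submit` and `ring_poll`; swapping `personality_run` for a different implementation is what would retarget it to another OS in this sketch.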
so this highlights the most interesting part of this approach: `io_uring`-style interfaces erase the boundary between "syscalls" and "message passing". you're submitting IO requests to a bus of a distributed system, which can have any internal structure https://twitter.com/balister/status/1521148602667745281
i'm imagining an OS ABI based on a single syscall that performs communication through a pair of ring buffers. ptrace()? run your application under test with an interposer, then you can trivially observe and modify anything that it does
containers? FUSE? all doable in userspace
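a rough sketch of what such an interposer could look like, assuming requests are plain structs and a "backend" is just a function that services one. all names (`struct request`, `backend_fn`, `struct interposer`, `interposer_handle`, `echo_backend`) are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_READ = 0, OP_WRITE = 1, OP_MAX = 4 };

struct request { uint32_t op; uint64_t args[3]; };

/* a backend is anything that can service a request: a kernel,
 * a personality module, or another interposer */
typedef int64_t (*backend_fn)(void *ctx, struct request *req);

struct interposer {
    backend_fn next;         /* where requests get forwarded */
    void *next_ctx;
    uint64_t seen[OP_MAX];   /* per-opcode trace counters */
    uint64_t max_write_len;  /* example policy: clamp write sizes */
};

static int64_t interposer_handle(void *ctx, struct request *req)
{
    struct interposer *ip = ctx;
    assert(ip != NULL && ip->next != NULL);
    if (req->op < OP_MAX)
        ip->seen[req->op]++;                 /* observe */
    if (req->op == OP_WRITE && req->args[2] > ip->max_write_len)
        req->args[2] = ip->max_write_len;    /* modify */
    return ip->next(ip->next_ctx, req);      /* forward */
}

/* toy backend: reports the requested length as written */
static int64_t echo_backend(void *ctx, struct request *req)
{
    (void)ctx;
    return req->op == OP_WRITE ? (int64_t)req->args[2] : 0;
}
```

since `interposer_handle` has the same signature as a backend, interposers stack: a tracer can sit on top of a sandbox that sits on top of the real thing.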
it is very similar, and that's one place i took inspiration from! modern PCI devices communicate data flow through ring buffers, and events through doorbell registers and IRQs.
we could get rid of the single syscall by having the entrypoint register 'IRQs' https://twitter.com/lachlansneff/status/1521150559612547077
(if you go for an IRQ-style interface rather than a syscall-style interface, you need to either transform all your code to CPS, or do really weird things to support threads with C, so a syscall-style interface is probably more practical. but they're equivalent)
this is how hardware generally works: the chips that run your computer are made out of FIFOs, IRQs, and shared memory. i think that an OS and userspace that are designed in the same way would be beautiful and provide interesting capabilities
although i'm not very familiar with Windows NT internals, my understanding is that what i'm describing is similar to how IRPs work https://twitter.com/pawel_lasek/status/1521152910066954241
what i'm describing is also similar to how microkernels work, but has an important difference: this OS ABI doesn't require a specific kernel architecture. POSIX applications talk POSIX to... something. it could be a monolithic kernel like Linux, or a userspace personality module
the core interface, the only thing that actually needs to run in privileged mode, would only need one extra function: passing memory capabilities around

`io_uring` actually already has that! the "fixed buffer" (and "fixed fd") functionality is capabilities in disguise
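to illustrate the "capabilities in disguise" reading: a buffer gets registered once, and later requests name it by a small index plus an offset instead of a raw pointer. this is a toy model under that assumption, not the `io_uring` implementation; `struct mem_cap`, `cap_register` and `cap_resolve` are invented names:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CAPS 4

struct mem_cap { void *base; size_t len; int writable; int valid; };
struct cap_table { struct mem_cap slots[MAX_CAPS]; };

/* register a buffer once; the returned index is the capability handle */
static int cap_register(struct cap_table *t, void *base, size_t len, int writable)
{
    assert(t != NULL);
    for (int i = 0; i < MAX_CAPS; i++) {
        if (!t->slots[i].valid) {
            t->slots[i] = (struct mem_cap){ base, len, writable, 1 };
            return i;
        }
    }
    return -1; /* table is full */
}

/* service side: turn (handle, offset, length) into a pointer,
 * or NULL if the request reaches outside what was granted */
static void *cap_resolve(const struct cap_table *t, int idx,
                         size_t off, size_t len, int want_write)
{
    if (idx < 0 || idx >= MAX_CAPS || !t->slots[idx].valid)
        return NULL;
    const struct mem_cap *c = &t->slots[idx];
    if (want_write && !c->writable)
        return NULL;
    if (off > c->len || len > c->len - off) /* written to avoid overflow */
        return NULL;
    return (char *)c->base + off;
}
```

the service never dereferences anything the application didn't explicitly hand over, which is the property you'd want from a memory capability.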
yes, exactly! if your command queue interface directly involves POSIX (rather than something lower-level like Mach message passing) then you stall a lot less, because your code doesn't need to thread results into the next requests and you can pipeline more https://twitter.com/404IdentityNot1/status/1521155687048392706
this style of interface is also perfect for virtualization. there's no longer any meaningful difference between "virtualization", "sandboxing", "syscall tracing", "subsystem implementation".
seccomp becomes a `switch(...){default: abort();}` in userspace https://twitter.com/whitequark/status/1521150798029279232
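spelled out a bit more, assuming a made-up request struct and opcode set (`struct request`, `policy_allows` and `filter_or_abort` are illustrative names, not any real API):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { OP_READ = 0, OP_WRITE = 1, OP_OPEN = 2, OP_EXEC = 3 }; /* hypothetical opcodes */

struct request { uint32_t op; uint64_t args[3]; };

/* the policy: an allowlist expressed as a plain switch */
static int policy_allows(const struct request *req)
{
    assert(req != NULL);
    switch (req->op) {
    case OP_READ:
    case OP_WRITE:
        return 1;
    default:
        return 0;
    }
}

/* run on every request pulled off the submission ring,
 * before it's forwarded to whatever actually services it */
static void filter_or_abort(const struct request *req)
{
    if (!policy_allows(req))
        abort();
}
```

the filter is ordinary userspace code, so it can be as stateful or as argument-aware as you like.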
it looks like RPC because so far i've focused on implementing POSIX syscalls and compared the approach to `io_uring`. replace CQ/SQ with "input FIFO" and "output FIFO" and now you can do arbitrary message passing https://twitter.com/evntdrvn/status/1521159520784896003
oh absolutely. this model explicitly makes the core OS a distributed (software) system that can be nicely mapped to a distributed (hardware) system that it runs on. you could transparently replace OS components with hardware, too! https://twitter.com/evntdrvn/status/1521161206148521984
scratch the "one syscall" part. it can be done in _zero syscalls_. make every ringbuffer you use have a control page that uses MMIO (with a hardware impl) or traps on access (with a software impl). a write to the SQ tail pointer returns when the request is submitted https://twitter.com/sargun/status/1521163176527548416
You can follow @whitequark.