It's running in US and EU (helps avoid atlantic routrip tax), in this one i am doing some 100s of checks, not simple CRUD work. With Go you can optimize a lot without complexity of Rust.
Specifically how can you use pgx with sqlite while pgx is a postgres-specific library? Sqlc works great with Postgres or Sqlite, Sqlc works with pgx when connecting to Postgres, but pgx can't be used with Sqlite AFAIK
Thanks for the link, bookmarking. I should note ymawky's main portability issues are unfortunately at the syscall layer rather than the asm layer. proc_info() and getdirentries64() are pretty Darwin-specific, so making it portable would require reworking that whole area rather than adjusting register/calling conventions.
Thank you! I've been obsessed with this idea for a while, finally decided to start on it, then obsessed over it for a couple weeks. I'd love to see some of your projects if you have anything similar, I'm glad I'm not the only one too! I think most programmers would benefit a lot from taking a few weeks or months to try and learn some assembly, and demystify how CPUs and compiled languages work.
Even after we've all retired (pretty soon for those who can afford it) or transitioned out of software engineering (for those who can't), we'll still get to amuse each other with home-brew projects like this. Warm fuzzy feeling - I'll take it!
I feel the guy’s suspicion towards any high level language. I exclusively programmed in assembly on C64, Amiga and the recognized that this ain’t sustainable on PC because there are more and more edge cases or different machine configurations.
I had a very hard time simply using and even utilizing C++ or Java.
C and Turbo Pascal especially was easier because the compiled code was very much resembling to hand written code.
As the author described, you can do in 4.000 lines what others can do with way less pain in 100.
So you build macros, come up with your own library and in the end you kind of build a meta language build on top of assembly because some lines are so hard to grasp that you delegate working code into a library for reuse.
It is funny how much we take conventions for numbers for granted. If you happen to know assembly and its intricacies you immediately will learn to work with a sign bits which mark negative numbers. But how do you know? Maybe you use the whole addressable space only for positive numbers.
Small things that make a huge different.
Nice article, I enjoyed your adventures and would do the same.
Thank you! The thing about eventually building your own meta language ends up happening all the time with bigger assembly projects. I do have a fair few quality-of-life macros too, but probably fewer than I should. I did end up needing to implement by hand what would be standard functions, things like atoi, itoa, strlen, memcpy, streqn.
Higher level languages are more convenient for 99% of things, but the directness of Assembly gives me a rush unlike any other. I didn't live through the C64/Amiga, but I was obsessed with old C64/ZX emulators growing up.
I'm wanting to read this repository as a learning tool, so it'd also be nice to include docs—even AI-generated docs, but obvious I'd prefer docs with your own design notes and decisions—about the architecture of the code.
Thanks, I appreciate it a lot! I tried to comment my code pretty heavily (~3000 lines of code, ~1000 lines of comments all together), since this was a learning project for myself in the first place. Hopefully those will be of some use. But separate in-depth documentation is definitely a good idea, I'll work on adding that. In the meantime I'm always down to answer any questions about it!
Honestly, read the main file, ymawky.S first. Then I'd read through get.S maybe, checking parse.S on an as-needed basis for parsing-related functions. delete.S or options.S are pretty short, too, so give those a read too.
Modularizing it into multiple files was easier than I expected it to be, you basically have other functions/labels in other files, and mark them as .global at the top. The Makefile compiles each file into their own .o, which you then link all together. You can "b" or "bl" to any label from any other file, as long as it's global and linked together. Same with data in .bss or .data, mark them as .global and they can be accessed from elsewhere.
Why stop there? Next, I'm prying open a CPU and poking the transistors with a 9V battery and paperclips to make it execute what I want. Slower, but you get so much control.
This post seems to now link to the writeup rather than the repository, sorry! The repo can be found at the top of that page, or directly here: https://github.com/imtomt/ymawky
Whoops that was my fault. Fixed now. (I emailed you, btw, that we'd changed your title, but I forgot to switch the URL back to the repo. Both links are cool.)
I'm sure I'm not the only one who has fantasized about doing something like this as a self-soothing enterprise. Kudos to you for actually doing it!
Hey, thank you! Means a lot. It's an odd sort of meditation, but is surprisingly the most almost-therapeutic project I've worked on. Something about the constraints of Assembly that really pull you into the minutiae and clears your head, maybe.
An agentic LLM should be pretty good at Arm64 assembly generation, but maintainability of large code could become an issue. Why would it not run on Linux?
I wrote it for MacOS because I don't have a Linux machine right now :( Once I get one up and running again, I'll probably work on porting this.
As for why it wouldn't run on Linux, there are some pretty big differences in the actual assembly. One pretty superficial difference is calling conventions -- MacOS uses the x16 register for syscall numbers, Linux uses x8. Calling the kernel in Mac uses "svc #0x80", in Linux it's "svc #0". That's ~120 lines that need to be replaced, but easy enough to just use sed. Syscall numbers are all different, as are the struct layouts for sigaction(), MacOS has an "sa_tramp" field that Linux doesn't have. Enforcing max processes is done here using the MacOS-specific proc_info() syscall, which can be used to get the number of children any given process has. Linux doesn't have an equivalent, so process tracking would need to be done differently. Finally, Linux has the getdents64() syscall, rather than getdirentries64(), which uses a different struct and is called differently.
I'm sure an LLM could make all those changes, but it's a pretty large codebase, so it would probably make some mistakes or miss things.
I've used Python (django/flask/fast api), Java (springboot), Ruby on Rails for writing web applications and APIs.
Nothing beats Go.
When you use HTMLX (goat) + sqlc (goat) + pgx (another goat) + Chi (yet another goat) and Sqlite (goat).
Most apps will not need anything more than Sqlite, i've several sqlite apps doing a couple of million visits per day.
Compiles to signal binary blazingly fast.
Deploy using systemd service, capture logs with alloy / Loki graphana setup, set up alerts and monitoring and go home.
And you can serve millions of requests on a server with 512MB RAM.
I don't think you'd ever need more speed than this.
Everything else is bloated, slow and doesn't give you enough room for optimization.
Here's the latency of one of my hobby projects (network latency not included): https://i.ibb.co/hJ6FQtyw/d3d6c9d15765.png
Request rate: https://i.ibb.co/Fq80nfJ4/67fcdbdb7491.png
It's running in US and EU (helps avoid atlantic routrip tax), in this one i am doing some 100s of checks, not simple CRUD work. With Go you can optimize a lot without complexity of Rust.
Can you share what some of those apps are?
How are you merging sqlc and pgx with sqlite ?
Specifically how can you use pgx with sqlite while pgx is a postgres-specific library? Sqlc works great with Postgres or Sqlite, Sqlc works with pgx when connecting to Postgres, but pgx can't be used with Sqlite AFAIK
Did you reply to the wrong submission?
no usually people believe assembly = speed/performance.
So I knew this post would bring people who were fed up with the slow performance of their web stack.
Maybe it will help someone or spark a discussion.
What's your point?
You don't need to use assembly for high performance web app when you can just use Go.
This is a great resource, thank you!
The last time I did anything in assembler was x86 under DOS. Your code makes ARM64 with a modern OS less scary than I thought it would be.
Here's a piece on writing portable ARM64 assembly: https://ariadne.space/2023/04/12/writing-portable-arm-assemb...
Thanks for the link, bookmarking. I should note ymawky's main portability issues are unfortunately at the syscall layer rather than the asm layer. proc_info() and getdirentries64() are pretty Darwin-specific, so making it portable would require reworking that whole area rather than adjusting register/calling conventions.
Gave me a warm feeling to know that someone would actually still bother to do this by hand. I'm not the only one!
Thank you! I've been obsessed with this idea for a while, finally decided to start on it, then obsessed over it for a couple weeks. I'd love to see some of your projects if you have anything similar, I'm glad I'm not the only one too! I think most programmers would benefit a lot from taking a few weeks or months to try and learn some assembly, and demystify how CPUs and compiled languages work.
Solid article. I appreciate the depth of explanation around show building. Would love to see a follow-up on the implementation details.
Even after we've all retired (pretty soon for those who can afford it) or transitioned out of software engineering (for those who can't), we'll still get to amuse each other with home-brew projects like this. Warm fuzzy feeling - I'll take it!
Thank you! This is one of the nicest things I've heard in a while.
That fake O'Reilly book cover is pure gold.
That book is exactly what inspired me to make this in the first place, haha. The subtitle of the book gave me the acronym I named it.
This is a really interesting approach. I particularly like how they handled show building. Has anyone tried implementing this in production?
I don’t know why, but this project has me irrationally excited!
I feel the guy’s suspicion towards any high level language. I exclusively programmed in assembly on C64, Amiga and the recognized that this ain’t sustainable on PC because there are more and more edge cases or different machine configurations.
I had a very hard time simply using and even utilizing C++ or Java.
C and Turbo Pascal especially was easier because the compiled code was very much resembling to hand written code.
As the author described, you can do in 4.000 lines what others can do with way less pain in 100.
So you build macros, come up with your own library and in the end you kind of build a meta language build on top of assembly because some lines are so hard to grasp that you delegate working code into a library for reuse.
It is funny how much we take conventions for numbers for granted. If you happen to know assembly and its intricacies you immediately will learn to work with a sign bits which mark negative numbers. But how do you know? Maybe you use the whole addressable space only for positive numbers.
Small things that make a huge different.
Nice article, I enjoyed your adventures and would do the same.
Thank you! The thing about eventually building your own meta language ends up happening all the time with bigger assembly projects. I do have a fair few quality-of-life macros too, but probably fewer than I should. I did end up needing to implement by hand what would be standard functions, things like atoi, itoa, strlen, memcpy, streqn.
Higher level languages are more convenient for 99% of things, but the directness of Assembly gives me a rush unlike any other. I didn't live through the C64/Amiga, but I was obsessed with old C64/ZX emulators growing up.
I'm wanting to read this repository as a learning tool, so it'd also be nice to include docs—even AI-generated docs, but obvious I'd prefer docs with your own design notes and decisions—about the architecture of the code.
Really cool project though!
Thanks, I appreciate it a lot! I tried to comment my code pretty heavily (~3000 lines of code, ~1000 lines of comments all together), since this was a learning project for myself in the first place. Hopefully those will be of some use. But separate in-depth documentation is definitely a good idea, I'll work on adding that. In the meantime I'm always down to answer any questions about it!
My first question would be where should I start reading? It seems like you modularized it into multiple assembly files (how does that even work?)
Honestly, read the main file, ymawky.S first. Then I'd read through get.S maybe, checking parse.S on an as-needed basis for parsing-related functions. delete.S or options.S are pretty short, too, so give those a read too.
Modularizing it into multiple files was easier than I expected it to be, you basically have other functions/labels in other files, and mark them as .global at the top. The Makefile compiles each file into their own .o, which you then link all together. You can "b" or "bl" to any label from any other file, as long as it's global and linked together. Same with data in .bss or .data, mark them as .global and they can be accessed from elsewhere.
Awesome. Any resource recommendations to learn ARM assembly?
Need a straight binary port now
Why stop there? Next, I'm prying open a CPU and poking the transistors with a 9V battery and paperclips to make it execute what I want. Slower, but you get so much control.
This is fucking nuts
This post seems to now link to the writeup rather than the repository, sorry! The repo can be found at the top of that page, or directly here: https://github.com/imtomt/ymawky
Whoops that was my fault. Fixed now. (I emailed you, btw, that we'd changed your title, but I forgot to switch the URL back to the repo. Both links are cool.)
I'm sure I'm not the only one who has fantasized about doing something like this as a self-soothing enterprise. Kudos to you for actually doing it!
Hey, thank you! Means a lot. It's an odd sort of meditation, but is surprisingly the most almost-therapeutic project I've worked on. Something about the constraints of Assembly that really pull you into the minutiae and clears your head, maybe.
An agentic LLM should be pretty good at Arm64 assembly generation, but maintainability of large code could become an issue. Why would it not run on Linux?
I wrote it for MacOS because I don't have a Linux machine right now :( Once I get one up and running again, I'll probably work on porting this.
As for why it wouldn't run on Linux, there are some pretty big differences in the actual assembly. One pretty superficial difference is calling conventions -- MacOS uses the x16 register for syscall numbers, Linux uses x8. Calling the kernel in Mac uses "svc #0x80", in Linux it's "svc #0". That's ~120 lines that need to be replaced, but easy enough to just use sed. Syscall numbers are all different, as are the struct layouts for sigaction(), MacOS has an "sa_tramp" field that Linux doesn't have. Enforcing max processes is done here using the MacOS-specific proc_info() syscall, which can be used to get the number of children any given process has. Linux doesn't have an equivalent, so process tracking would need to be done differently. Finally, Linux has the getdents64() syscall, rather than getdirentries64(), which uses a different struct and is called differently.
I'm sure an LLM could make all those changes, but it's a pretty large codebase, so it would probably make some mistakes or miss things.
The first paragraph of the README says this was hand written so I’m not sure why you’re bringing up LLMs