▲Serving a half billion requests per day with Rust and CGIjacob.gold

103 points by feep 2 days ago | 104 comments

neilv 6 hours ago [-]

One reason to use CGI is legacy systems. A large, complex, and important system that I inherited was still using CGI (and it worked, because a rare "10x genuinely more productive" developer built it). Many years later, to reduce peak resource usage, and speed up a few things, I made an almost drop-in replacement library, to permit it to also run with SCGI (and back out easily to CGI if there was a problem in production). https://docs.racket-lang.org/scgi/

Another reason to use CGI is if you have a very small and simple system. Say, a Web UI on a small home router or appliance. You're not going to want the 200 NPM packages, transpilers and build tools and 'environment' managers, Linux containers, Kubernetes, and 4 different observability platforms. (Resume-driven-development aside.)

A disheartening thing about most my recent Web full-stack project was that I'd put a lot of work into wrangling it the way Svelte and SvelteKit wanted, but upon finishing, wasn't happy with the complicated and surprisingly inefficient runtime execution. I realized that I could've done it in a fraction of the time and complexity -- in any language with convenient HTML generation, a SQL DB library, and an HTTP/CGI/SCGI-ish library, plus a little client-side JS).

ptsneves 6 hours ago [-]

I found that ChatGPT revived vanilla javascript and jquery for me.

Most of the chore part is done by chatgpt and the mental model of understanding what it wrote is very light and often single file. It is also easily embedded in static file generators.

On the contrary Vue/React have a lot of context required to understand and mentally parse. On react the useCallback/useEffect/useMemo make me need to manually manage dependencies. This really reminds me of manual memory management in C, with perhaps even more pitfalls. On vue the difference between computed, props and vanilla variables. I am amazed that the supposed more approachable part of tech is actually more complex than regular library/script programming.

KronisLV 4 hours ago [-]

> I found that ChatGPT revived vanilla javascript and jquery for me.

I used jQuery in a project recently where I just needed some interactivity for an internal dashboard/testing solution. I didn't have a bunch of time to setup a whole toolchain for Vue (and Pinia, Vue Router, PrimeVue, PrimeIcons, PrimeFlex and the automated component imports) because while I like using all of them and the developer experience is quite nice, the setup still takes a bit of time unless you have a nice up to date boilerplate project that's ready to go.

Not even having a build step was also really pleasant, didn't need to do complex multi-stage builds or worry that copying assets would somehow slow down a Maven build for the back end (relevant for those cases when you package your front end and back end together in the same container and use the back end to serve the front end assets, vs two separate containers where one is just a web server).

Only problem was that jQuery doesn't compose as nice, I missed the ability to nest a bunch of components. Might just have to look at Lit or something.

maxwell 4 hours ago [-]

I've had a similar experience. Generating Vue/React scaffolding is nice, but yeah debugging and refactoring require the additional context you described. I've been using web components lately on personal projects, nice to jump into comprehensible vanilla JS/HTML/CSS when needed.

pkal 6 hours ago [-]

I have recently been writing CGI scripts for the web server of our universities computer lab in Go, and it has been a nice experience. In my case, the Guestbook doesn't use SQLite but I just encode the list of entries using Go's native https://pkg.go.dev/encoding/gob format, and it worked out well -- and critically frees me from using CGO to use SQLite!

But in the end efficiency isn't my concern, as I have almost not visitors, what turns out to be more important is that Go has a lot of useful stuff in the standard library, especially the HTML templates, that allow me to write safe code easily. To test the statement, I'll even provide the link and invite anyone to try and break it: https://wwwcip.cs.fau.de/~oj14ozun/guestbook.cgi (the worst I anticipate happening is that someone could use up my storage quota, but even that should take a while).

kragen 5 hours ago [-]

How do you protect against concurrency bugs when two visitors make guestbook entries at the same time? With a lockfile? Are you sure you won't write an empty guestbook if the machine gets unexpectedly powered down during a write? To me, that's one of the biggest benefits of using something like SQLite.

pkal 5 hours ago [-]

That is exactly what I do, and it works well enough because if the power-loss were to happen, I wouldn't have lost anything of crucial value. But that is admittedly a very instance-specific advantage I have.

kragen 5 hours ago [-]

There's a fsync/close/rename dance that ext4fs recognizes as a safe, durable atomic file replacement, which is often sufficient for preventing data loss in cases like this.

masklinn 4 hours ago [-]

FWIW POSIX requires that rename(2) be atomic, it's not just ext4, any POSIX FS should work that way.

However this still requires a lockfile because while rename(2) is an atomic store it's not a full CAS, so you can have two processes reading the file concurrently, doing their internal update, writing to a temp file, then rename-ing to the target. There will be no torn version of the reference file, but the process finishing last will cancel out the changes of the other one.

The lockfile can be the "scratch" file as open(O_CREAT | O_EXCL) is also guaranteed to be atomic, however now you need a way to wait for that path to disappear before retrying.

kragen 3 hours ago [-]

I forget the particular failure mode that POSIX arguably permitted in this case. I think it had to do with fsyncing the directory? And the file ending up existing but empty after the reboot? Anyway, it broke userspace, so the ext4fs maintainers killed it. I think there were some LWN articles.

I might have been misremembering: https://lwn.net/Articles/322823/

"With ext4's delayed allocation, the metadata changes can be journalled without writing out the blocks. So in case of a crash, the metadata changes (that were journalled) get replayed, but the data changes don't."

That was without fsync, though. But apparently it's a topic of discussion more recently: https://lwn.net/Articles/789600/

I agree that it still requires a lockfile if write conflicts are not acceptable.

kragen 5 hours ago [-]

This is a followup to Gold's previous post that served 200 million requests per day with CGI, which Simon Willison wrote a post about, which we had a thread about three days ago at https://news.ycombinator.com/item?id=44476716. It addresses some of the misconceptions that were common in that thread.

Summary:

- 60 virtual AMD Genoa CPUs with 240 GB (!!!) of RAM

- bash guestbook CGI: 40 requests per second (and a warning not to do such a thing)

- Perl guestbook CGI: 500 requests per second

- JS (Node) guestbook CGI: 600 requests per second

- Python guestbook CGI: 700 requests per second

- Golang guestbook CGI: 3400 requests per second

- Rust guestbook CGI: 5700 requests per second

- C guestbook CGI: 5800 requests per second

https://github.com/Jacob2161/cgi-bin

I wonder if the gohttpd web server he was using was actually the bottleneck for the Rust and C versions?

simonw 7 hours ago [-]

I really like the code that accompanies this as an example of how to build the same SQLite powered guestbook across Bash, Python, Perl, Rust, Go, JavaScript and C: https://github.com/Jacob2161/cgi-bin

masklinn 5 hours ago [-]

Checked the rust version, it has a toctou error right at the start, which likely would not happen in a non-cgi system because you’d do your db setup on load and only then would accept requests. I assume the others are similar.

This neatly demonstrates one of the issues with CGI: they add synchronisation issues while removing synchronisation tooling.

5 hours ago [-]

simonw 5 hours ago [-]

Had to look that up: Time-Of-Check-to-Time-Of-Use

Here's that code:

  let new = !Path::new(DB_PATH).exists();
  let conn = Connection::open(DB_PATH).expect("open db");

  // ...
  if new {
      conn.execute_batch(
          r#"
          CREATE TABLE guestbook(

So the bug here would occur only the very first time the script is executed, IF two processes run it at the same time such that one of them creates the file while the other one assumes the file did not exist yet and then tries to create the tables.

That's pretty unlikely. In this case the losing script would return a 500 error to that single user when the CREATE TABLE fails.

Honestly if this was my code I wouldn't even bother fixing that.

(If I did fix it I'd switch to "CREATE TABLE IF NOT EXISTS...")

... but yeah, it's a good illustration of the point you're making about CGI introducing synchronization errors that wouldn't exist in app servers.

kragen 5 hours ago [-]

That sounds correct to me, but I think I would apply your suggested fix.

Bluestein 7 hours ago [-]

This is a veritable Rosetta stone of a repo. Wow.-

antoineleclair 4 hours ago [-]

We used CGI to add support for extensions in Disco (https://disco.cloud/).

It's so simple and it can run anything, and it was also relatively easy to have the CGI script run inside a Docker container provided by the extension.

In other words, it's so flexible that it means the extension developers would be able to use any language they want and wouldn't have to learn much about Disco.

I would probably not push to use it to serve big production sites, but I definitely think there's still a place for CGI.

In case anyone is curious, it's happening mostly here: https://github.com/letsdiscodev/disco-daemon/blob/main/disco...

0xbadcafebee 6 hours ago [-]

> No one should ever run a Bash script under CGI. It’s almost impossible to do so securely, and performance is terrible.

Actually shell scripting is the perfect language for CGI on embedded devices. Bash is ~500k and other shells are 10x smaller. It can output headers and html just fine, you can call other programs to do complex stuff. Obviously the source compresses down to a tiny size too, and since it's a script you can edit it or upload new versions on the fly. Performance is good enough for basic work. Just don't let the internet or unauthenticated requests at it (use an embedded web server with basic http auth).

kragen 5 hours ago [-]

Easy uploading of new versions is a good point, and I agree that the likely security holes in the bash script are less of a concern if only trusted users have access to it. However, about 99% of embedded devices lack an MMU, much less 50K of storage, which makes it hard to run Unix shells on them.

0xbadcafebee 5 hours ago [-]

Busybox runs MMU-less and has ash built in. It also has a web server! It can be a little chonky but you can remove unneeded components. Things like wireless routers and other devices that have a decent amount of storage are a good platform for it

kragen 5 hours ago [-]

Yeah, a lot of wireless routers would have no trouble. A lot of them do in fact have MMUs. I wonder if you could get Busybox running on an ESP32? Probably not most 8051s, though, or AVR8s.

shrubble 7 hours ago [-]

In a corporate environment, for internal use, I often see egregiously specced VMs or machines for sites that have very low requests per second. There's a commercial monitoring app that runs on K8s, 3 VMs of 128GB RAM each, to monitor 600 systems; using 500MB per system, basically, just to poll it each 5 minutes, do some pretty graphs, etc. Of course it has a complex app server integrated into the web server and so forth.

RedShift1 7 hours ago [-]

Yep. ERP vendors are the worst offenders. Last deployment for 40-ish users "needed" an 22 CPU cores and 44 GB of RAM. After long back and forths I negotiated down to 8 CPU cores and 32 GB. Looking at the usage statistics, it's 10% MAX... And it's cloud infra so paying a lot for RAM and CPU sitting unused.

ted537 6 hours ago [-]

Haha yes -- like what do you mean this CRUD app needs 20 GB of RAM and half an hour to startup?

jchw 7 hours ago [-]

Honestly, I'm just trying to understand why people want to return to CGI. It's cool that you can fork+exec 5000 times per second, but if you don't have to, isn't that significantly better? Plus, with FastCGI, it's trivial to have separate privileges for the application server and the webserver. The CGI model may still work fine, but it is an outdated execution model that we left behind for more than one reason, not just security or performance. I can absolutely see the appeal in a world where a lot of people are using cPanel shared hosting and stuff like that, but in the modern era when many are using unmanaged Linux VPSes you may as well just set up another service for your application server.

Plus, honestly, even if you are relatively careful and configure everything perfectly correct, having the web server execute stuff in a specific folder inside the document root just seems like a recipe for problems.

zokier 6 hours ago [-]

Having completely isolated ephemeral request handlers with no shared state and no persistent runtime makes very clean and nice programming model. It also makes deployments simple because there is no graceful shutdown or service management to worry about; in simplest case you can just drop in new executables and they will be automatically taken into use without any service interruption. Fundamentally CGI model allows leveraging lot of tools that Linux/UNIX has to offer.

inetknght 5 hours ago [-]

> there is no ... service management to worry about

Service management:

    systemctl start service.app

    docker run --restart=unless-stopped --name=myservice myservice:version

If it isn't written as a service, then it doesn't need management. If it is written as a service, then service management tools make managing it easy.

> there is no graceful shutdown ... to worry about

Graceful shutdown:

    kill -9

    docker kill -9 myservice

If your app/service can't handle that, then it's designed poorly.

taeric 3 hours ago [-]

I think the point is having to worry about runaway memory or other bitrot inherent in long running services?

rajaravivarma_r 4 hours ago [-]

I'm wondering the same, but honestly I have a soft corner for the old way of doing things as well, and I think it stems from it.

The performance numbers seem to show how bad it is in real world.

For testing I converted the CGI script into a FastAPI script and benchmarked it on my MacBookPro M3. I'm getting super impressive performance numbers,

Read ``` Statistics Avg Stdev Max Reqs/sec 2019.54 1021.75 10578.27 Latency 123.45ms 173.88ms 1.95s HTTP codes: 1xx - 0, 2xx - 30488, 3xx - 0, 4xx - 0, 5xx - 0 others - 0 Throughput: 30.29MB/s ``` Write (shown in the graph of the OP) ``` Statistics Avg Stdev Max Reqs/sec 931.72 340.79 3654.80 Latency 267.53ms 443.02ms 2.02s HTTP codes: 1xx - 0, 2xx - 0, 3xx - 13441, 4xx - 0, 5xx - 215 others - 572 Errors: timeout - 572 Throughput: 270.54KB/s ```

At this point, the contention might be the single SQL database. Throwing a beefy server like in the original post would increase the read performance numbers pretty significantly, but wouldn't do much on the write path.

I'm also thinking that at this age, one needs to go out of their way to do something with CGI. All macro, micro web frameworks comes with a HTTP server and there are plenty of options. I wouldn't do this for anything apart from fun.

FastAPI-guestbook.py https://gist.github.com/rajaravivarma-r/afc81344873791cb52f3...

0x000xca0xfe 6 hours ago [-]

I guess multiprocessing got a bad reputation because it used to be slow and simple so it got looked down upon as a primitive tool for less capable developers.

But the world has changed. Modern systems are excellent for multiprocessing, CPUs are fast, cores are plentiful and memory bandwidth just continues getting better and better. Single thread performance has stalled.

It really is time to reconsider the old mantras. Setting up highly complicated containerized environments to manage a fleet of anemic VMs because NodeJS' single threaded event loop chokes on real traffic is not the future.

ben-schaaf 6 hours ago [-]

That really has nothing to do with the choice to use CGI. You can just as well use rust with Axum or Actix and get a fully threaded web server without having to fork for every request.

0x000xca0xfe 6 hours ago [-]

Absolutely, I'm not recommending for everybody to go back using CGI (the protocol). I was responding to this:

> The CGI model may still work fine, but it is an outdated execution model

The CGI model of one process per request is excellent for modern hardware and really should not be scoffed at anymore IMO.

It can both utilize big machines, scale to zero, is almost leak-proof as the OS cleans up all used memory and file descriptors, is language-independent, dead simple to understand, allows for finer granularity resource control (max mem, file descriptor count, chroot) than threads, ...

How is this execution model "outdated"?

jchw 6 hours ago [-]

The part of the execution model that is dated is this:

> having the web server execute stuff in a specific folder inside the document root just seems like a recipe for problems

toast0 5 hours ago [-]

Typically I've run cgi from a directory outside the document root. That's easy, and I think was the defaults?

That said, fork+exec isn't the best for throughput. Especially if the httpd doesn't isolate forking into a separate, barebones, child process, fork+exec involves a lot of kernel work.

FastCGI or some other method to avoid forking for each request is valuable regardless of runtime. If you have a runtime with high startup costs, even more so.

SahAssar 3 hours ago [-]

> FastCGI or some other method to avoid forking for each request is valuable regardless of runtime. If you have a runtime with high startup costs, even more so.

What's the point of using FastCGI compared to a plain http server then? If you are going to have a persistent server running why not just use the protocol you are already using the semantics of?

toast0 2 hours ago [-]

I don't generally want or need my application server to serve static files, but I may want to serve them on the same hostname (or maybe I don't).

There's potential benefits for the httpd to manage specifics of client connections as well: If I'm using a single threaded process per request execution model, keep-alive connections really ruin that. Similarly with client transfer-encoding requests, does my application server need to know about that. Does my application server need to understand http/2 or http/3?

You could certainly do a reverse proxy and use HTTP instead of FastCGI as the protocol between the client facing httpd and the application server... although then you miss out on some speciality things like X-Sendfile to accelerate sending of files from the application server without actually transferring them through sockets to the httpd. You could add that to an http proxy too, I suppose.

SahAssar 2 hours ago [-]

> You could certainly do a reverse proxy and use HTTP instead of FastCGI as the protocol between the client facing httpd and the application server

That's what I meant. Things like X-Sendfile (or X-Accel-Redirect in nginx) works with http backends. Why involve a different protocol to transfer a HTTP request to a backend instead of... HTTP? I really don't get the point of FastCGI over plain HTTP when a reverse proxy is talking to a upstream backend server.

toast0 22 minutes ago [-]

I mean, that protocol doesn't really matter to me, that's why I said "FastCGI or some other method" The important bit is avoiding fork+exec on every request.

FastCGI is binary based, which has benefits, but hopefully a reverse proxy sends well-formed HTTP requests ... but maybe having the application runtime provide an http frontend encourages running the application software directly accessible, which isn't always wise... some of them are really bad at HTTP.

kragen 5 hours ago [-]

What kind of problems? Like, if the administrator put something inside that directory (Unix doesn't have folders) that the web server shouldn't execute? That kind of problems? I've literally never had that problem in my life and I've had web pages for 30 years.

jchw 5 hours ago [-]

> Like, if the administrator put something inside that directory

Path traversal bugs allowing written files to land in the cgi-bin used to be a huge exploit vector. Interestingly, some software actually relied on being able to write executable files into the document root, so the simple answer of making the permissions more limited is actually not a silver bullet.

If you've never seen or heard of this, ¯\_(ツ)_/¯

> Unix doesn't have folders

Great and very important point. Someone should go fix all of these bugs:

https://github.com/search?q=repo%3Atorvalds%2Flinux%20folder...

kragen 4 hours ago [-]

I've certainly heard of that problem, but I've never experienced it, because it's easy to avoid. At least, it's easy if you're not running certain pieces of software. I'd suggest not using Wordpress (or, ideally, PHP) and disabling ExecCGI in whatever directories you need to host untrusted executables in.

Of course, disabling ExecCGI in one directory won't help if you do have path traversal holes in your upload-handling code. I'm not convinced that disabling CGI will help if attackers can use a path traversal hole to upload malicious executables to arbitrary paths you can write to. They can overwrite your .bashrc or your FastCGI backend program or whatever you're likely to execute. CGI seems like the wrong thing to blame for that.

Why are you linking me to a "Sign in to search code on GitHub" page?

jchw 4 hours ago [-]

> Why are you linking me to a "Sign in to search code on GitHub" page?

GitHub is basically the only service I'm aware of that actually has the ability to grep over the Linux kernel. Most of the other "code search" systems either cost money to use or only search specific symbols (e.g. the one hosted on free-electrons.)

For a similar effect, grep the Linux kernel and be amazed as the term "folder" is actually used quite a lot to mean "directory" because the distinction doesn't matter anymore (and because when you're implementing filesystem drivers you have to contend with the fact that some of them do have "folders".)

0x000xca0xfe 5 hours ago [-]

Yep, that is definitely problematic. But it also allowed a sprawling ecosystem of tons of small applications that people could just download and put on their website via FTP and do the configuration in the browser afterwards.

This is easy enough for non-technical people or school kids and still how it works for many Wordpress sites.

The modern way of deploying things is safer but the extra complexity has pushed many, many folks to just put their stuff on Facebook/Instagram instead of leveling up their devops skills.

Somehow we need to get the simplicity back, I think. Preferably without all the exploits.

jchw 6 hours ago [-]

I feel it necessary to clarify that I am not suggesting we should use single-threaded servers. My go-to approach for one-offs is Go HTTP servers and reverse proxying. This will do quite well to utilize multiple CPU cores, although admittedly Go is still far from optimal.

Still, even when people run single-thread event loop servers, you can run an instance per CPU core; I recall this being common for WSGI/Python.

taeric 7 hours ago [-]

I thought the general view was that leaving the CGI model was not necessarily better for most people? In particular, I know I was at a bigger company that tried and failed many times to replace essentially a CGI model with a JVM based solution. Most of the benefits that they were supposed to see from not having the outdated execution model, as you call it, typically turned into liabilities and actually kept them from hitting the performance they claimed they would get to.

And, sadly, there is no getting around the "configure everything perfectly" problem. :(

Nzen 6 hours ago [-]

My personal interest in CGI stems from my website host offering it as a means of responding to requests [0] in addition to static assets.

[0] https://www.nearlyfreespeech.net/help/faq#CGISupport

kragen 5 hours ago [-]

Serverless is a marketing term for CGI, and you can observe that serverless is very popular.

A couple of years ago my (now) wife and I wrote a single-event Evite clone for our wedding invitations, using Django and SQLite. We used FastCGI to hook it up to the nginx on the server. When we pushed changes, we had to not just run the migrations (if any) but also remember to restart the FastCGI server, or we would waste time debugging why the problem we'd just fixed wasn't fixed. I forget what was supposed to start the FastCGI process, but it's not running now. I wish we'd used CGI, because it's not working right now, so I can't go back and check the wedding invitations until I can relogin to the server. I know that password is around here somewhere...

A VPS would barely have simplified any of these problems, and would have added other things to worry about keeping patched. Our wedding invitation RSVP did need its own database, but it didn't need its own IPv4 address or its own installation of Alpine Linux.

It probably handled less than 1000 total requests over the months that we were using it, so, no, it was not significantly better to not fork+exec for each page load.

You say "outdated", I say "boring". Boring is good. There's no need to make things more complicated and fragile than they need to be, certainly not in order to save 500 milliseconds of CPU time over months.

jchw 5 hours ago [-]

> Serverless is a marketing term for CGI, and you can observe that serverless is very popular.

No, it's not.

CGI is Common Gateway Interface, a specific technology and protocol implemented by web servers and applications/scripts. The fact that you do a fork+exec for each request is part of the implementation.

"Serverless" is a marketing term for a fully managed offering where you give a PaaS some executable code and it executes it per-request for you in isolation. What it does per request is not defined since there is no standard and everything is fully managed. Usually, rather than processes, serverless platforms usually operate on the level of containers or micro VMs, and can "pre-warm" them to try to eliminate latency, but obviously in case of serverless the user gets a programming model and not a protocol. (It could obviously be CGI under the hood, but when none of the major platforms actually do that, how fair is it to call serverless a "marketing term for CGI"?)

CGI and serverless are only similar in exactly one way: your application is written "as-if" the process is spawned each time there is a request. Beyond that, they are entirely unrelated.

> A couple of years ago my (now) wife and I wrote a single-event Evite clone for our wedding invitations, using Django and SQLite. We used FastCGI to hook it up to the nginx on the server. When we pushed changes, we had to not just run the migrations (if any) but also remember to restart the FastCGI server, or we would waste time debugging why the problem we'd just fixed wasn't fixed. I forget what was supposed to start the FastCGI process, but it's not running now. I wish we'd used CGI, because it's not working right now, so I can't go back and check the wedding invitations until I can relogin to the server. I know that password is around here somewhere...

> A VPS would barely have simplified any of these problems, and would have added other things to worry about keeping patched. Our wedding invitation RSVP did need its own database, but it didn't need its own IPv4 address or its own installation of Alpine Linux.

> It probably handled less than 1000 total requests over the months that we were using it, so, no, it was not significantly better to not fork+exec for each page load.

> You say "outdated", I say "boring". Boring is good. There's no need to make things more complicated and fragile than they need to be, certainly not in order to save 500 milliseconds of CPU time over months.

To be completely honest with you, I actually agree with your conclusion in this case. CGI would've been better than Django/FastCGI/etc.

Hell, I'd go as far as to say that in that specific case a simple PHP-FPM setup seems like it would've been more than sufficient. Of course, that's FastCGI, but it has the programming model that you get with CGI for the most part.

But that's kind of the thing. I'm saying "why would you want to fork+exec 5000 times per second" and you're saying "why do I care about fork+exec'ing 1000 times in the total lifespan of my application". I don't think we're disagreeing in the way that you think we are disagreeing...

9rx 4 hours ago [-]

> No, it's not.

It is not strictly limited to the CGI protocol, of course, but it is the marketing term for the concept of the application not acting as the server, of which CGI applications would be included. CGI, like all serverless applications, outsource the another process, such as Apache or nginx, to provide the server. Hence the literal name.

> "Serverless" is a marketing term for a fully managed offering where you give a PaaS

Fully managed offerings are most likely to be doing the marketing, so it is understandable how you might reach that conclusion, but the term is being used to sell to developers. It communicates to them, quite literally, that they don't have to make their application a server, which has been the style for networked applications for a long time now. But if you were writing a CGI application to run on your own systems, it would also be serverless.

jchw 4 hours ago [-]

The term "serverless" is a generic PaaS marketing term to refer to managed services where you don't have to manage a server to use them, e.g. "Amazon Aurora Serverless". If you're managing CGI scripts on a traditional server, you're still managing a server.

The point isn't really that the application is unaware of the server, it's that the server is entirely abstracted away from you. CGI vs serverless is apples vs oranges.

> [...] but the term is being used to sell to developers. It communicates to them, quite literally, that they don't have to make their application a server [...]

I don't agree. It is being sold to businesses, that they don't have to manage a server. The point is that you're paying someone else to be the sysadmin and getting all of the details abstracted away from you. Appealing to developers by making their lives easier is definitely a perk, but that's not why the term "serverless" exists. Before PaaSes I don't think I've ever seen anyone once call CGI "serverless".

9rx 4 hours ago [-]

> It is being sold to businesses, that they don't have to manage a server.

Do you mean a... computer? Server is a software term. It is a process that listens for network requests.

At least since CGI went out of fashion, embedding a server right in your application has been the style. Serverless sees a return to the application being less a server, pushing the networking bits somewhere else. Modern solutions may not use CGI specifically, but the idea is the same.

If you did mistakenly type "server" when you really meant "computer", PaaS offerings already removed the need for businesses to manage computers long before serverless came around. "Serverless" appeared specifically in reference to the CGI-style execution model, it being the literal description of what it is.

jchw 4 hours ago [-]

> Do you mean a... computer? Server is a software term. It is a process that listens for network requests.

Between this and the guy arguing that UNIX doesn't have "folders" I can see that these kinds of threads bring out the most insane possible lines of rhetoric. Are you sincerely telling me right now you've never seen the term "server" used to refer to computers that run servers? Jesus Christ.

Pedantry isn't a contest, and I'm not trying to win it. I'm not sitting here saying that "Serverless is not a marketing term for CGI" to pull some epic "well, actually..." I'm saying it because God damnit, it's true. Serverless was a term invented specifically by providers of computers-that-aren't-yours to give people options to not need to manage the computers-that-aren't-yours. They actually use this term serverless for many things, again including databases, where you don't even write an application or a server in the first place; we're just using "serverless" as a synonym for "serverless function", which I am fine to do, but pointing that out is important for more than just pedantry reasons because it helps extinguish the idea that "serverless" was ever meant to have anything to do with application design. It isn't and doesn't. Serverless is not a marketing term for CGI. Not even in a relaxed way, it's just not. The selling point of Serverless functions is "you give us your request handler and we'll handle running it and scaling it up".

This has nothing to do with the rise of embedding a server into your application.

kragen 4 hours ago [-]

> Serverless is not a marketing term for CGI. Not even in a relaxed way, it's just not. The selling point of Serverless functions is "you give us your request handler and we'll handle running it and scaling it up".

That was the selling point of CGI hosting though. Except that the "scaling it up" part was pretty rare. There were server farms that ran CGI scripts (NCSA had a six-server cluster with round-robin DNS when they first published a paper describing how they did it, maybe 01994) but the majority of CGI scripts were almost certainly on single-server hosting platforms.

jchw 3 hours ago [-]

With NCSA HTTPd I'm pretty sure it was literally the only way to do dynamic things at least initially. Which makes sense for the time period, I mean it's the same basic idea as inetd but for HTTP and with some differing implementation details.

Is the selling point of shared hosting and "serverless" PaaS platforms similar? To an extent it definitely is, but I think another major selling point of shared hosting was the price. For a really long time it was the only economically sane option, and even when cheap low end VPS options (usually OpenVZ-based) emerged, they were usually not as good for a lot of workloads as a similarly priced shared hosting option.

But at that point, we're basically debating whether or not the term "serverless" has merit, and that's not an argument I plan to make. I'm only trying to make the argument that serverless is about the actual abstraction of traditional server machines. Shared hosting is just about having someone else do it for you. These are similar, but different.

kragen 3 hours ago [-]

I agree that it's very much like inetd. Or the Unix shell, which launches one or more processes for each user command.

But, no, you could very easily edit the httpd source to do the dynamic things and recompile it. As an example of what you could do, stock NCSA httpd supported "server-side includes" very early on, for example, definitely in 01994, maybe in 01993. The big advantage of CGI was that it decoupled the management of the server as a whole from particular gateway programs. It didn't take all that long for people to start writing their gateways in languages that weren't C, of course, and that was a different benefit of CGI. (If you were running Plexus instead, you could hack Perl dynamic things into your server source code.) And running the CGI (or SSI) as the user who owned the file instead of as the web server came years later.

By "abstraction of traditional server machines" do you mean "load balancing"? Like, so that your web service can scale up to handle larger loads, and doesn't become unavailable when a server fails, and your code has access to the same data no matter which machine it happens to get run on? Because, as I explained above, NCSA (the site, not NCSA httpd at other sites) was doing that in the mid-90s. Is there some other way that AWS Lambda "abstracts" the servers from the point of view of Lambda customers?

With respect to the price, I guess I always sort of assumed that the main reason you'd go with "serverless" offerings rather than an EC2 VPS or equivalent was the price, too. But certainly not having to spend any time configuring and monitoring servers is an upside of CGI and Lambda and Cloud Run and whatever other "serverless" platforms there are out there.

9rx 4 hours ago [-]

> Serverless was a term invented specifically by providers of computers-that-aren't-yours to give people options to not need to manage the computers-that-aren't-yours.

No. "Cloud" was the term invented for that, inherited from networking diagrams where it was common to represent the bits you don't manage as cloud figures. Usage of "Serverless" emerged from AWS Lamba, which was designed to have an execution model much like CGI. "Severless" refers to your application being less a server. Lamba may not use CGI specifically, but the general idea is very much the same.

jchw 3 hours ago [-]

Okay. Let's ask Amazon since they invented the term:

> Serverless computing is an application development model where you can build and deploy applications on third-party managed server infrastructure. All applications require servers to run. But in the serverless model, a cloud provider manages the routine work; they provision, scale, and maintain the underlying infrastructure. The cloud provider handles several tasks, such as operating system management, security patches, file system and capacity management, load balancing, monitoring, and logging. As a result, your developers can focus on application design and still receive the benefits of cost-effective, efficient, and massively scalable server infrastructure.

Right. And that makes sense. Because again, what we're talking about when we're talking about AWS Lambda is serverless functions. But AWS also uses the term for other things that are "serverless", again, like Aurora Serverless. Aurora Serverless is basically the same idea: the infrastructure is abstracted, except for a database. This effectively means the database can transparently scale from 0 to whatever the maximum instance sizes Amazon supports without a human managing database instances.

That's also the same idea for serverless functions. It's not about whether your application has a "server" in it.

kragen 3 hours ago [-]

The only word of this that is not is a description of old-fashioned shared CGI hosting is "massively scalable". (And maybe "efficient".)

9rx 3 hours ago [-]

> Serverless computing is an application development model

Exactly. And how that development model differs from the traditional approach is that you don't have to implement a server. Deployment isn't a development model. The development is necessarily done by the time you get there.

> But AWS also uses the term for other things

The terms has expended to be used for all kinds of different things, sure. There is probably a toaster out there somewhere solid as being "Serverless" nowadays.

If we really want to get into the thick of it, "serverless" seems to go back much further, used to refer to certain P2P systems. But we know from context that isn't what we're talking about. Given the context, it is clear we are talking about "serverless" as it emerged out of Lamba, referring to systems that were CGI-esq in nature.

jchw 3 hours ago [-]

It's funny how you added that part even though Amazon's own description continues in a completely different way that doesn't emphasize this at all. That's not a mistake on Amazon's part; it's not that they forgot to mention it. The reason why it's not there is because it's not actually the point.

You're reading "application development model" and thinking "Exactly! It's all about the request handling model!" but that's not what Amazon said or meant. Consider the description of Amazon Fargate, a service that in fact can be used to run regular old web servers:

> AWS Fargate is a serverless, pay-as-you-go compute engine that lets you focus on building applications without managing servers.

I guess the next argument is that Amazon is just diluting the term and originally it meant what you think it meant, and that is the terminal state of this debate since there is no more productive things to say.

Edit: you added more but it's just more attempting to justify away things that are plainly evident... But I can't help myself. This is just non-sense:

> Deployment isn't a development model,

Software development is not just writing code.

9rx 3 hours ago [-]

> Software development is not just writing code.

But it remains that deployment is normally considered to be independent of development. If you put your binaries on a CD instead of sending it to AWS, the application will still be considered developed by most people. Deployment is a post-development activity.

> I guess the next argument is that Amazon is just diluting the term

Could be. Would it matter? The specific definition you offer didn't even emerge until ~2023, nearly a decade after Lamba was introduced, so clearly they're not hung up on some kind of definitional purity. Services like Cloud Run figured out that you could keep the server in the application, while still exhibiting the spirit of CGI, so it is not like it is hard technical requirement, but it is the technical solution that originally emerged and was named as such.

If what you are trying to say, and not conveying it well, is that it has become a marketing term for all kinds of different things, you're not wrong. Like I suggested in another comment, there is probably a "Serverless" toaster for sale out there somewhere these days.

kragen 4 hours ago [-]

> If you're managing CGI scripts on a traditional server, you're still managing a server.

Usually somebody else is managing the server, or servers, so you don't have to think about it. That's been how it's worked for 30 years.

> Before PaaSes I don't think I've ever seen anyone once call CGI "serverless".

No, because "serverless" was a marketing term invented to sell PaaSes because they thought that it would sell better than something like "CloudCGI" (as in FastCGI or SpeedyCGI, which also don't use the CGI protocol). But CGI hosting fits cleanly within the roomy confines of the term.

jchw 3 hours ago [-]

Oh my god! This could go on forever.

Having a guy named Steve manage your servers is not "serverless" by my definition, because it's not about you personally having to manage the server, it's about anyone personally having to manage it. AWS Lambda is managed by Amazon as a singular giant computer spawning micro VMs. And sure yes, some human has to sit here and do operations, but the point is that they've truly abstracted the concept of a running server from both their side and yours. It's abstracted to the degree that even asking "what machine am I running on?" doesn't even have a meaningful answer and if you did have the answer you couldn't do anything with it.

Shared hosting with a cgi-bin is closer to this, but it falls short of fully abstracting the details. You're still running on a normal-ish server with shared resources and a web server configuration and all that jazz, it's just that you don't personally have to manage it... But someone really does personally have to manage it.

And anyway, there's no reason to think that serverless platforms are limited to things that don't actually run a server. On the contrary there are "serverless" platforms that run servers! Yes, truly, as far as I know containers running under cloud run are in fact normal HTTP servers. I'm actually not an expert on serverless despite having to be on this end of the argument, but I'll let Google speak for what it means for Cloud Run to be "serverless":

> Cloud Run is a managed compute platform that enables you to run stateless containers that are invocable via HTTP requests. Cloud Run is serverless: it abstracts away all infrastructure management, so you can focus on what matters most — building great applications.

These PaaSes popularized the term to mean this from the gitgo, just because you have passionately formed a belief that it ever meant something else doesn't change a thing.

9rx 3 hours ago [-]

> On the contrary there are "serverless" platforms that run servers!

That's the trouble when a term catches on — everyone wants to jump all over it and use it as they please.

This is hardly a unique situation. Look at SQL. According to the very creator of the relational model, SQL isn't relational, but the SQL specification latched onto the term anyway because it was trendy to do so. As a result, today, I think it is fair to say that "relational" has taken on dual meaning, both referring to the model as originally conceived as well as what SQL created.

If you wish to maintain that "serverless" now refers to both an execution model and outsourced management of computer systems, I think that is fair. However, it is apparent that "serverless" was originally popularized by Lamba, named as such due to its CGI-inspired execution model. Other angles came later.

kragen 3 hours ago [-]

Codd was happy enough to tout SQL as "relational" in his Turing Award address! Maybe you mean Date? He was involved from early on but didn't invent it.

I do think that SQL falls short of the relational-data-bank ideal in a number of important ways, and I mostly agree with Date on them. I just don't agree with Date's saying he's not contradicting Codd's early work.

3 hours ago [-]

jchw 3 hours ago [-]

OK good point. Let's see what Amazon describes the selling point of AWS Lambda is in the original press release from 2014; in fact, so early that it's actually not even the final draft[1]. Surely it will mention something about developers no longer having to write network server applications since (apparently) that is what the "server" in "serverless" is referring to (although this draft actually predates the term "serverless" entirely.)

> SEATTLE – (Nov XX, 2014) – Amazon Web Services LLC (AWS), an Amazon.com company (NASDAQ:AMZN), today announced the introduction of AWS Lambda, the simplest way to run code in the cloud. Previously, running code in the cloud meant creating a cloud service to host the application logic, and then operating the service, requiring developers to be experts in everything from automating failover to security to service reliability. Lambda eliminates the operational costs and learning curve for developers by turning any code into a secure, reliable and highly available cloud service with a web accessible end point within seconds. Lambda uses trusted AWS infrastructure to automatically match resources to incoming requests, ensuring the resulting service can instantaneously scale with no change in performance or behavior. This frees developers to focus on their application logic – there is no capacity planning or up-front resource type selection required to handle additional traffic. There is no learning curve to get started with Lambda – it supports familiar platforms like Java, Node.js, Python and Ruby, with rich support for each language’s standard and third-party libraries. Lambda is priced at $XXX for each request handled by the developer’s service and $YYY for each 250ms of execution time, making it cost effective at any amount of usage. To get started, visit aws.amazon.com/lambda.

Let me emphasize some points here:

> Previously, running code in the cloud meant creating a cloud service to host the application logic...

> then operating the service, requiring developers to be experts in everything from automating failover to security to service reliability...

> Lambda eliminates the operational costs and learning curve for developers by turning any code into a secure, reliable and highly available cloud service with a web accessible end point within seconds.

> there is no capacity planning or up-front resource type selection required to handle additional traffic

It is genuinely impressive how devastatingly, horrifically incorrect the idea is that "serverless" ever had anything to do with whether your application binary has a network request server in it. It's just not a thing.

We can talk about the parallels between CGI servers and Lambda all day and all night, but I am not letting this non-sense go. Serverless is not a marketing term for CGI.

[1]: https://www.allthingsdistributed.com/2024/11/aws-lambda-turn...

kragen 2 hours ago [-]

This is great, thanks for digging it up!

It does support the thesis that Amazon was attempting to prevent customers from realizing that what they were offering was basically CGI on a big load-balanced server farm, by claiming that it was something radically new that you couldn't get before, but their value proposition is still just the value proposition of shared CGI hosting. On a big load-balanced server farm. Which, to be perfectly fair, probably was bigger than anyone else's.

There is one major difference—the accounting, where you get charged by the megabyte-millisecond or whatever. Service bureaus ("cloud computing vendors") in the 01960s did do such billing, but Linux shared CGI hosts in the 01990s generally didn't; accton(8) doesn't record good enough information for such things. While in some sense that's really the value proposition for Amazon rather than the customer, it gives customers some confidence that their site isn't going to go down because a sysadmin decided they were being a CPU hog.

I agree that there's no evidence that they were talking about "servers" in the sense of processes that listen on a socket, rather than "servers" in the sense of computers that those processes run on.

Just to be clear, I know I'm not going to convince you of anything, but I'm really appreciating how much better informed I'm becoming because of this conversation!

jchw 2 hours ago [-]

> It does support the thesis that Amazon was attempting to prevent customers from realizing that what they were offering was basically CGI on a big load-balanced server farm, by claiming that it was something radically new that you couldn't get before, but their value proposition is still just the value proposition of shared CGI hosting. On a big load-balanced server farm. Which, to be perfectly fair, probably was bigger than anyone else's.

I loathe to be of service defending Amazon's marketing BS, but I think you're saying the selling point of AWS Lambda is that it's "like CGI", and that serverless functions are substantially equivalent to CGI. I disagree. The programming model of serverless functions is definitely substantially equivalent to CGI, but the selling point of serverless functions isn't the "functions" part, it's the "serverless" part. It would've had the exact same draw and could've even had a very similar programming model (The Lambda SDK makes your applications look like a typical request handling server, probably for development purposes) and ran multi-request servers under the hood and as long as it had the same billing and management most people would've been happy with it. The thing that unites Fargate and Lambda in being "serverless" is the specific way they're abstracting infrastructure.

Amazon could've and could still launch something like CloudCGI if they wanted to, and if it used the same model as Lambda I'm sure it'd be successful. If I had to guess why they didn't, the less cynical answer is that they just felt it was outdated and wanted to make something new and shiny with a nice developer experience. The more cynical answer is probably truer, because vendor lock-in. Even if they did launch something like "CloudCGI" though, it would still be a very big departure from anything people called "CGI hosting".

9rx 2 hours ago [-]

Yup. "turning any code into a cloud service". In other words: No need to write complicated server-bits that can be easy to screw up. Just write a "function" that accepts inputs and returns outputs and let something else will worry about the rest. Just like CGI (in sprit).

It is great that you were willing to share this as it proves without a doubt that Amazon were thinking about (the concept of) CGI during this time. But perhaps all you've been trying to say, poorly, is that "serverless" is no longer limited to marketing just one thing these days?

kragen 3 hours ago [-]

> Having a guy named Steve manage your servers is not "serverless" by my definition, because it's not about you personally having to manage the server, it's about anyone personally having to manage it. AWS Lambda is managed by Amazon as a singular giant computer

Well, that's sort of true of AWS Lambda, but it's just as true of EC2 and EBS, which aren't called "serverless". Moreover, "serverless" is a marketing term used to sell the service to Amazon customers, who can't tell whether or not there's a guy named Steve working at Amazon who lovingly nurtures each server†, or whether Amazon manages their whole Lambda farm as a giant herd of anonymous nodes, so I don't think it makes sense to argue that this is what it's intended to mean. As you point out, it's kind of a nonsense term since the code does in fact run on servers. I believe you were correct the first time in your earlier comments that you are now contradicting: they call it "serverless" because the customer doesn't have to manage servers, not because their own employees don't have to manage servers (except collectively).

> enables you to run stateless containers that are invocable via HTTP requests. (...) abstracts away all infrastructure management

This is a precise description of the value proposition that old-fashioned CGI hosting offers to hosting customers. (The containers are processes instead of KVM machines like Firecracker or cgroups like Docker, but "container" is a pretty generic term.)

So I think you've very clearly established that CGI scripts are "serverless" in the sense that Google's marketing uses, and, in https://news.ycombinator.com/item?id=44512427, the sense that Amazon's marketing uses.

______

† Well, Steve would probably cost more than what Amazon charges, so customers may have a pretty good guess, but it could be a loss leader or something.

kragen 4 hours ago [-]

CGI and other "serverless" technologies have essentially the same benefits and drawbacks. Sometimes an AWS Lambda function has longer startup time than if you had a running process already waiting to service a web request, because it's spinning up (AFAIK) an entire VPS. So all the arguments for "serverless" are also arguments for CGI, and all the arguments against CGI are arguments against "serverless".

That's the sense in which I mean "Serverless is a marketing term for CGI." But you're right that it's not, strictly speaking, true, because (AFAIK, e.g.) AWS doesn't actually use the CGI protocol in between the parts of their setup, and I should have been clear about that.

PHP is great as a runtime, but it sucks as a language, so I didn't want to use it. Django in regular CGI would have been fine; I just didn't realize that was an option.

jchw 4 hours ago [-]

> CGI and other "serverless" technologies have essentially the same benefits and drawbacks. Sometimes an AWS Lambda function has longer startup time than if you had a running process already waiting to service a web request, because it's spinning up (AFAIK) an entire VPS. So all the arguments for "serverless" are also arguments for CGI, and all the arguments against CGI are arguments against "serverless".

Honestly this isn't even the right terminology. The point of "serverless" is that you don't manage a server. You can, for example, have a "serverless" database, like Aurora Serverless or Neon; those do not follow the "CGI" model.

What you're talking about is "serverless functions". The point of that is still that you don't have to manage a server, not that your function runs once per request.

To make it even clearer, there is also Google Cloud Run, which is another "serverless" platform that runs request-oriented applications, except it actually doesn't use the function call model. Instead, it runs instances of a stateful server container on-demand.

Is "serverless functions" just a marketing term for CGI? Nope. Again, CGI is a non-overlapping term that refers to a specific technology. They have the same drawbacks as far as the programming model is considered. Serverless functions have pros and cons that CGI does not and vice versa.

> because it's spinning up (AFAIK) an entire VPS

For AWS Lambda, it is spinning up Firecracker instances. I think you could conceivably consider these to not be entire VPS instances, even though they are hardware virtualization domains.

But actually it can do things that CGI does not, since all that's prescribed is the programming model and not the execution model. For example, AWS Lambda can spin up multiple instances of your program and then freeze them right before the actual request is sent, then resume them right when the requests start flowing in. And like yeah, I suppose you could build something like that for CGI programs, or implement "serverless functions" that use CGI under the hood, but the point of "serverless" is that it abstracts the "server" away and the point of CGI is that it let you run scripts under NCSA HTTPd requests.

Because the programming language models are compatible, it would be possible to adapt a CGI program to run under AWS Lambda. However, the reverse isn't necessarily true, since AWS Lambda also supports doing things that CGI doesn't, like servicing requests other than HTTP requests.

Saying that "serverless is just a marketing term for CGI" is wrong in a number of ways, and I really don't understand this point of contention. It is a return to a simpler CGI-like programming model, but it's pretty explicitly about the fact that you don't have to manage the server...

> PHP is great as a runtime, but it sucks as a language, so I didn't want to use it. Django in regular CGI would have been fine; I just didn't realize that was an option.

I'm starting to come back around to PHP. I can't argue against that it has some profound ugliness, but they've sincerely cleaned things up a lot and made life generally better. I like what they've done with PHP 7 and PHP 8 and think that it is totally suitable for simple one-off stuff. And, package management with composer seems straight-forward enough for me.

To be completely clear, I still haven't actually started a new project in PHP in over 15 years, but my opinion has gradually shifted and I fear I may see the day where I return.

I used to love Django, because I thought it was a very productive way to write apps. There are things that Django absolutely gets right, like the built-in admin panel; it's just amazing to have for a lot of things. That said, I've fallen off with Django and Python. Python may not have as butt-ugly as a past as PHP, but it has aged poorly for me. I feel like it is an easy language to write bugs in. Whereas most people agree that TypeScript is a big improvement for JavaScript development, I think many would argue that the juice just isn't worth the squeeze with gradual typing in Python, and I'd have to agree, I just feel like the type checking and ecosystem around it in Python just makes it not worth the effort. Surprisingly, PHP actually pulled ahead here, adding type annotations with some simple run-time checking, making it much easier to catch a lot of bugs that were once very common in PHP. Django has probably moved on and improved since I was last using it, but I definitely lost some of my appreciation for it. For one thing, while it has a decent ecosystem, it feels like that ecosystem is just constantly breaking. I recall running into so many issues migrating across Django versions, and dealing with things like static files. Things that really should be simple...

kragen 4 hours ago [-]

I appreciate the notes on the different nuances of "serverless".

I think you might not be very familiar with how people typically used CGI in the 01990s and 02000s, because you say "[serverless] is a return to a simpler CGI-like programming model, but it's pretty explicitly about the fact that you don't have to manage the server..." when that was the main reason to use CGI rather than something custom at the time; you could use a server that someone else managed. But you seem to think it was a difference rather than a similarity.

Why do you suppose we were running our CGI scripts under NCSA httpd before Apache came out? It wasn't because the HTTP protocol was super complicated to implement. I mean, CGI is a pretty thin layer over HTTP! But you can implement even HTTP/1.1 in an afternoon. It was because the guys in the computer center had a server (machine) and we didn't. Not only didn't we have to manage the server; they wouldn't let us!

As for Python, yeah, I'm pretty disenchanted with Python right now too, precisely because the whole Python ecosystem is just constantly breaking. And static files are kind of a problem for Django; it's optimized for serving them from a separate server.

p2detar 7 hours ago [-]

For smaller things, and I mean single-script stuff, I pretty much always use php-fpm. It’s fast, it scales, it’s low effort to run on a VPS. Shipped a side-project with a couple of PHP scripts a couple of years ago. It works to this day.

jchw 7 hours ago [-]

php-fpm does work surprisingly well. Though, on the other hand, traditional PHP using php-fpm kinda does follow the CGI model of executing stuff in the document root.

UK-Al05 6 hours ago [-]

It's very unix. A single process executable to handle a request then shuts down.

9rx 6 hours ago [-]

I suppose because they can. While there were other good reasons leave CGI behind, performance was really the only reason it got left behind. Now that performance isn't the same concern it once was...

monkeyelite 6 hours ago [-]

Think about all the problems associated with process life cycle - is a process stalled? How often should I restart a crashed process? Why is that process using so much memory? How should my process count change with demand? All of those go away when the lifecycle is tied to the request.

It’s also more secure because each request is isolated at the process level. Long lived processes leak information to other requests.

I would turn it around and say it’s the ideal model for many applications. The only concern is performance. So it makes sense that we revisit this question given that we make all kinds of other performance tradeoffs and have better hardware.

Or you know not every site is about scaling requests. It’s another way you can simplify.

> but it is an outdated execution model

Not an argument.

The opposite trend of ignoring OS level security and hoping your language lib does it right seems like the wrong direction.

jchw 6 hours ago [-]

> Think about all the problems associated with process life cycle - is a process stalled? Should I restart it? Why is that process using so much memory? How should my process count change with demand? All of those go away when the lifecycle is tied to the request.

So the upshot of writing CGI scripts is that you can... ship broken, buggy code that leaks memory to your webserver and have it work mostly alright. I mean look, everyone makes mistakes, but if you are routinely running into problems shipping basic FastCGI or HTTP servers in the modern era you really need to introspect what's going wrong. I am no stranger to writing one-off Go servers for things and this is not a serious concern.

Plus, realistically, this only gives a little bit of insulation anyway. You can definitely still write CGI scripts that explode violently if you want to. The only way you can really prevent that is by having complete isolation between processes, which is not something you traditionally do with CGI.

> It’s also more secure because each request is isolated at the process level. Long lived processes leak information to other requests.

What information does this leak, and why should I be concerned?

> Or you know not every site is about scaling requests. It’s another way you can simplify.

> > but it is an outdated execution model

> Not an argument.

Correct. That's not the argument, it's the conclusion.

For some reason you ignored the imperative parts,

> It's cool that you can fork+exec 5000 times per second, but if you don't have to, isn't that significantly better?

> Plus, with FastCGI, it's trivial to have separate privileges for the application server and the webserver.

> [Having] the web server execute stuff in a specific folder inside the document root just seems like a recipe for problems.

Those are the primary reasons why I believe the CGI model of execution is outdated.

> The opposite trend of ignoring OS level security and hoping your language lib does it right seems like the wrong direction.

CGI is in the opposite direction, though. With CGI, the default behavior is that your CGI process is going to run with similar privileges to the web server itself, under the same user. On a modern Linux server it's relatively easy to set up a separate user with more specifically-tuned privileges and with various isolation options and resource limits (e.g. cgroups.)

taeric 6 hours ago [-]

I'd push back on some of this. Specifically, the memory management that is somewhat inherent to how a CGI script works is typically easier to manage than longer life cycle things. You just tear down the entire process; instead of having to carefully tear down each thing created during the process.

Sure, it is easy to view this as the process being somewhat sloppy with regards to how it did memory. But it can also be seen as just less work. If you can toss the entire allocated range of memory, what benefit is there to carefully walking back each allocated structure? (Notably, arenas and such are efforts to get this kind of behavior in longer lived processes.)

jchw 5 hours ago [-]

True, it is simpler to just simply never free memory and let process teardown take care of it, but I'm only disagreeing with the notion that it's non-trivial to write servers that simply don't leak memory per-request. I think with modern tools, it's pretty easy for anyone to accomplish. Hell, if you can just slap Boehm GC into your C program, maybe it's trivial to accomplish with old tools, too.

taeric 5 hours ago [-]

Fair. My push was less on just not leaking memory entirely, and more that it can scale faster. Both using a GC and relying on teardown are essentially punting the problem from the specific request handling code onto something else. It was not uncommon to see GC based systems fall behind under load. Specifically because their task was more work than tearing down a process.

monkeyelite 6 hours ago [-]

> So the upshot of writing CGI scripts is that you can... ship broken, buggy code that leaks memory to your webserver and have it work mostly alright

Yes. The code is already shitty. That’s life. Let’s make the system more reliable and fault tolerant.

This argument sounds a lot like “garbage collection is for bad programmers who can’t manage their memory”.

But let me add another reason with your framing. In fire/forget programmers get used to crashing intentionally at the first sign of trouble. This makes it easy to detect failures and improve code. The incentive for long running processes is to avoid crashing, so programs get into bad states instead.

> The only way you can really prevent that is by having complete isolation between processes

Yes. That’s the idea. Separate memory spaces.

> What information does this leak

Anything that might be in a resource, or memory. Or even in the resource of a library.

> and why should I be concerned

Accessing leaked information form a prior run is a common attack.

> but if you don't have to, isn't that significantly better?

Long running processes are inherently more complex. The only benefit is performance.

> H’the web server execute stuff in a specific folder inside the document root just seems like a recipe for problems.

As opposed to? All processes have a working directory. What problems come from using the file system?

> cgroups

Yes it’s the same amount of effort to configure.

jchw 6 hours ago [-]

> Yes. The code is already shitty. That’s life. Let’s make the system more reliable so that small code mistakes are disasters.

> This argument sounds a lot like “garbage collection is for bad programmers who can’t manage their memory”.

This is not a "Simply don't make mistakes" type of argument, it's more like a "We've moved past this problem" type of argument. The choice of garbage collection as an example is a little funny, because actually I'd argue heavily in favor of using garbage collection if you're not latency-sensitive; after all, like I said, I use Go for a lot of one-off servers.

It'd be one thing if every person had to sit there and solve again the basic problems behind writing an HTTP server, but you don't anymore. Many modern platforms put a perfectly stable HTTP server right in the standard library, freeing you from even needing to install more dependencies to be able to handle HTTP requests effectively.

> > The only way you can really prevent that is by having complete isolation between processes

> Yes. That’s the idea. Web server forks, and execs. Separate memory spaces.

That's not complete isolation between processes. You can still starve the CPU or RAM, get into contention over global locks (e.g. sqlite database), do conflicting file I/O inside the same namespace. I can go on but the point is that I don't consider two processes running on the same machine to be "isolated" with each-other. ("Process isolation" is typically used to talk about isolation between processes, not isolation of workloads into processes.) If you do it badly, you can wind up with requests that sporadically fail or hang. If you do it worse, you can wind up with corruption/interleaving writes/etc.

Meanwhile, if you're running a typical Linux distro with systemd, you can slap cgroups and namespacing onto your service with the triviality of slapping some options into an INI file. (And if you're not because you hate systemd, well, all of the features are still there, you just may need to do more work to use them.)

> > What information does this leak

> Anything that might be in a resource, or memory. Or even in the resource of a library you use.

> > and why should I be concerned

> Accessing leaked information form a prior run is a common attack.

I will grant you that you can't help it if one of your dependencies (or God help you, the standard library/runtime of your programming language) is buggy and leaks global state between instantiations. Practically speaking though, if you are already not sharing state between requests this is just not a huge issue.

Sometimes it feels like we're comparing "simple program written in CGI where it isn't a big deal if it fails or has some bugs" to "complex program written using a FastCGI or HTTP server where it is a big deal if it leaks a string between users".

> As opposed to? All processes have a working directory. What problems come from using the file system?

The problem isn't the working directory, it's the fact that anything in a cgi-bin directory 1. will be exec'd if it can be 2. exists under the document root, which the webserver typically has privileges to write to.

> Yes it’s the same amount of effort to configure this.

I actually really didn't read this before writing out how easy it was to use these with systemd, so I guess refer to the point above.

0xbadcafebee 5 hours ago [-]

It's the same reason people are using SQLite for their startup's production database, or why they self-host their own e-mail server. They're tech hipsters. Old stuff is cool, man. Now if you'll excuse me, I need to typewrite a letter and OCR it into Markdown so I can save it in CVS and e-mail my editor an ar'd version of the repo so they can edit the new pages of my upcoming book ("Antique Tech: Escaping Techno-Feudalism with Old Solutions to New Problems")

g-mork 7 hours ago [-]

processless is the new serverless, it lets you fit infinite jobs in RAM thus enabling impressive economies of scale. only dinosaurs run their own processes

dengolius 5 hours ago [-]

What is the reason to choose gohttpd? I mean there are a lot of non standard libraries for go that are pretty fast or faster then gohttpd - https://github.com/valyala/fasthttp/ as example

carodgers 5 hours ago [-]

Looks like CGI was recently removed from python 3. https://docs.python.org/3/library/cgi.html

What is a modern python-friendly alternative?

kragen 5 hours ago [-]

Python has a policy against maintaining compatibility with boring technology. We discussed this at some length in this thread the other day at https://news.ycombinator.com/item?id=44477966; many people voiced their opposition to the policy. The alternatives suggested for the specific case of the cgi module were:

- wsgiref.handlers.CGIHandler, which is not deprecated yet. gvalkov provided example code for Flask at https://news.ycombinator.com/item?id=44479388

- use a language that isn't Python so you don't have to debug your code every year to make it work again when the language maintainers intentionally break it

- install the old cgi module for new Python from https://github.com/jackrosenthal/legacy-cgi

- continue using Python 3.12, where the module is still in the standard library, until mid-02028

exabrial 5 hours ago [-]

Currently in Europe. Earlier, was trying to use the onboard wifi on a train, which has frequent latency spikes as you can imagine. It never quite drops out, but latency does vary between 50ms-5000ms on most things.

I struggled for _15 mins_ on yet another f#@%ng-Javascript-based-ui-that-does-not-need-to-be-f#@%ng-Javascript, simply trying to reset my password for Venmo.

Why... oh why... do we have to have 9.1megabytes of f#@*%ng scripts just to reset a single damn password? This could be literally 1kb of HTML5 and maybe 100kb of CSS?

Anyway, this was a long way of saying I welcome FastCGI and server side rendering. Js need to be put back into the toys bin... er trash bin, where it belongs.

hedgehog 4 hours ago [-]

CGI still makes a lot of sense when there are many applications that each only get requests at a low rate. Pack them onto servers, no RAM requirement unless actively serving a request. If the most of the requests can be served straight from static files by the web server then it's really only the write rate that matters, so even a high traffic sites could be a good match. With sendfile and kTLS the static content doesn't even need to touch user space.

7 hours ago [-]

rokob 6 hours ago [-]

I’m interested why Rust and C have similarly bad tail latencies but Go doesn’t.

scraptor 5 hours ago [-]

sqlite resolves lock contention between processes with exponential backoff. When the WAL reaches 4MB it stops all writes while it gets compacted into the database. Once the compaction is over all the waiting processes probably have retry intervals in the hundred millisecond range, and as they exit they are immediately replaced with new processes with shorter initial retry intervals. I don't know enough queuing theory to state this nicely or prove it, but I imagine the tail latency for the existing processes goes up quickly as the throughput of new processes approaches the limit of the database.

twh270 6 hours ago [-]

OP posited SQLite database contention. I don't know enough about this space to agree or disagree. It would be interesting, and perhaps illuminating, to perform a similar experiment with Postgres.

bracketfocus 6 hours ago [-]

The author guessed it was a result of database contention.

I’d also be interested in getting a concrete reason though.

oxcabe 6 hours ago [-]

It'd be interesting to compare the performance of the author's approach to an analogous design that changes CGI for WASI, and scripts/binaries to Wasm.

IshKebab 6 hours ago [-]

Would it? It would be exactly the same but a bit slower because of the WASM overhead.

kragen 5 hours ago [-]

No, Linux typically takes about 1ms to fork/exit/wait and another fraction of a millisecond to exec, and was only getting about 140 requests per second per core in this configuration, while creating a new WASM context is closer to 0.1ms. I suspect the bottleneck is either the web server or the database, not the CGI processes.

andrewstuart 7 hours ago [-]

How meaningful is “per day” as a performance metric?

kragen 5 hours ago [-]

It was traditional 30 years ago to describe web site traffic levels in terms of hits per day, perhaps because "two hundred thousand hits per day" sounds more impressive than "2.3 hits per second". Consequently a lot of us have some kind of intuition for what kind of service might need to handle a thousand hits per day, a million hits per day, or a billion hits per day.

As other commenters have pointed out, peak traffic is actually more important.

diath 7 hours ago [-]

Not at all, it may be a useful marketing metric, but not a performance one. The average load does not matter when your backend can't handle the peaks.

xnx 7 hours ago [-]

True, though a lot higher spec'ed systems couldn't handle the minimum 5000 requests/second this implies.

dspillett 5 hours ago [-]

As a comparison between implementations it can be useful. It is more than a big enough number that, if the test was actually done over a day, temporary oddities are dwarfed. If the test was done over an hour and multiplied then it is meaningless: just quote the per hour figure. Same, but more so, if the tests were much shorter than an hour.

hu3 5 hours ago [-]

I work on a system for a client that averages 50 requests per second but handles 6k req/s during peaks and we have SLA of P99% <= 50ms.

So I'd say per day is not very meaningful.

8 hours ago [-]

Loading comments...

neilv 6 hours ago [-]

ptsneves 6 hours ago [-]

I found that ChatGPT revived vanilla javascript and jquery for me.

Most of the chore part is done by chatgpt and the mental model of understanding what it wrote is very light and often single file. It is also easily embedded in static file generators.

KronisLV 4 hours ago [-]

> I found that ChatGPT revived vanilla javascript and jquery for me.

Only problem was that jQuery doesn't compose as nice, I missed the ability to nest a bunch of components. Might just have to look at Lit or something.

maxwell 4 hours ago [-]

pkal 6 hours ago [-]

kragen 5 hours ago [-]

pkal 5 hours ago [-]

kragen 5 hours ago [-]

There's a fsync/close/rename dance that ext4fs recognizes as a safe, durable atomic file replacement, which is often sufficient for preventing data loss in cases like this.

masklinn 4 hours ago [-]

FWIW POSIX requires that rename(2) be atomic, it's not just ext4, any POSIX FS should work that way.

The lockfile can be the "scratch" file as open(O_CREAT | O_EXCL) is also guaranteed to be atomic, however now you need a way to wait for that path to disappear before retrying.

kragen 3 hours ago [-]

I might have been misremembering: https://lwn.net/Articles/322823/

That was without fsync, though. But apparently it's a topic of discussion more recently: https://lwn.net/Articles/789600/

I agree that it still requires a lockfile if write conflicts are not acceptable.

kragen 5 hours ago [-]

Summary:

- 60 virtual AMD Genoa CPUs with 240 GB (!!!) of RAM

- bash guestbook CGI: 40 requests per second (and a warning not to do such a thing)

- Perl guestbook CGI: 500 requests per second

- JS (Node) guestbook CGI: 600 requests per second

- Python guestbook CGI: 700 requests per second

- Golang guestbook CGI: 3400 requests per second

- Rust guestbook CGI: 5700 requests per second

- C guestbook CGI: 5800 requests per second

https://github.com/Jacob2161/cgi-bin

I wonder if the gohttpd web server he was using was actually the bottleneck for the Rust and C versions?

simonw 7 hours ago [-]

masklinn 5 hours ago [-]

This neatly demonstrates one of the issues with CGI: they add synchronisation issues while removing synchronisation tooling.

5 hours ago [-]

simonw 5 hours ago [-]

Had to look that up: Time-Of-Check-to-Time-Of-Use

Here's that code:

  let new = !Path::new(DB_PATH).exists();
  let conn = Connection::open(DB_PATH).expect("open db");

  // ...
  if new {
      conn.execute_batch(
          r#"
          CREATE TABLE guestbook(

That's pretty unlikely. In this case the losing script would return a 500 error to that single user when the CREATE TABLE fails.

Honestly if this was my code I wouldn't even bother fixing that.

(If I did fix it I'd switch to "CREATE TABLE IF NOT EXISTS...")

... but yeah, it's a good illustration of the point you're making about CGI introducing synchronization errors that wouldn't exist in app servers.

kragen 5 hours ago [-]

That sounds correct to me, but I think I would apply your suggested fix.

Bluestein 7 hours ago [-]

This is a veritable Rosetta stone of a repo. Wow.-

antoineleclair 4 hours ago [-]

We used CGI to add support for extensions in Disco (https://disco.cloud/).

It's so simple and it can run anything, and it was also relatively easy to have the CGI script run inside a Docker container provided by the extension.

In other words, it's so flexible that it means the extension developers would be able to use any language they want and wouldn't have to learn much about Disco.

I would probably not push to use it to serve big production sites, but I definitely think there's still a place for CGI.

In case anyone is curious, it's happening mostly here: https://github.com/letsdiscodev/disco-daemon/blob/main/disco...

0xbadcafebee 6 hours ago [-]

> No one should ever run a Bash script under CGI. It’s almost impossible to do so securely, and performance is terrible.

kragen 5 hours ago [-]

0xbadcafebee 5 hours ago [-]

kragen 5 hours ago [-]

Yeah, a lot of wireless routers would have no trouble. A lot of them do in fact have MMUs. I wonder if you could get Busybox running on an ESP32? Probably not most 8051s, though, or AVR8s.

shrubble 7 hours ago [-]

RedShift1 7 hours ago [-]

ted537 6 hours ago [-]

Haha yes -- like what do you mean this CRUD app needs 20 GB of RAM and half an hour to startup?

jchw 7 hours ago [-]

zokier 6 hours ago [-]

inetknght 5 hours ago [-]

> there is no ... service management to worry about

Service management:

    systemctl start service.app

    docker run --restart=unless-stopped --name=myservice myservice:version

If it isn't written as a service, then it doesn't need management. If it is written as a service, then service management tools make managing it easy.

> there is no graceful shutdown ... to worry about

Graceful shutdown:

    kill -9

    docker kill -9 myservice

If your app/service can't handle that, then it's designed poorly.

taeric 3 hours ago [-]

I think the point is having to worry about runaway memory or other bitrot inherent in long running services?

rajaravivarma_r 4 hours ago [-]

I'm wondering the same, but honestly I have a soft corner for the old way of doing things as well, and I think it stems from it.

The performance numbers seem to show how bad it is in real world.

For testing I converted the CGI script into a FastAPI script and benchmarked it on my MacBookPro M3. I'm getting super impressive performance numbers,

FastAPI-guestbook.py https://gist.github.com/rajaravivarma-r/afc81344873791cb52f3...

0x000xca0xfe 6 hours ago [-]

I guess multiprocessing got a bad reputation because it used to be slow and simple so it got looked down upon as a primitive tool for less capable developers.

ben-schaaf 6 hours ago [-]

That really has nothing to do with the choice to use CGI. You can just as well use rust with Axum or Actix and get a fully threaded web server without having to fork for every request.

0x000xca0xfe 6 hours ago [-]

Absolutely, I'm not recommending for everybody to go back using CGI (the protocol). I was responding to this:

> The CGI model may still work fine, but it is an outdated execution model

The CGI model of one process per request is excellent for modern hardware and really should not be scoffed at anymore IMO.

How is this execution model "outdated"?

jchw 6 hours ago [-]

The part of the execution model that is dated is this:

> having the web server execute stuff in a specific folder inside the document root just seems like a recipe for problems

toast0 5 hours ago [-]

Typically I've run cgi from a directory outside the document root. That's easy, and I think was the defaults?

That said, fork+exec isn't the best for throughput. Especially if the httpd doesn't isolate forking into a separate, barebones, child process, fork+exec involves a lot of kernel work.

FastCGI or some other method to avoid forking for each request is valuable regardless of runtime. If you have a runtime with high startup costs, even more so.

SahAssar 3 hours ago [-]

> FastCGI or some other method to avoid forking for each request is valuable regardless of runtime. If you have a runtime with high startup costs, even more so.

What's the point of using FastCGI compared to a plain http server then? If you are going to have a persistent server running why not just use the protocol you are already using the semantics of?

toast0 2 hours ago [-]

I don't generally want or need my application server to serve static files, but I may want to serve them on the same hostname (or maybe I don't).

SahAssar 2 hours ago [-]

> You could certainly do a reverse proxy and use HTTP instead of FastCGI as the protocol between the client facing httpd and the application server

toast0 22 minutes ago [-]

I mean, that protocol doesn't really matter to me, that's why I said "FastCGI or some other method" The important bit is avoiding fork+exec on every request.

kragen 5 hours ago [-]

jchw 5 hours ago [-]

> Like, if the administrator put something inside that directory

If you've never seen or heard of this, ¯\_(ツ)_/¯

> Unix doesn't have folders

Great and very important point. Someone should go fix all of these bugs:

https://github.com/search?q=repo%3Atorvalds%2Flinux%20folder...

kragen 4 hours ago [-]

Why are you linking me to a "Sign in to search code on GitHub" page?

jchw 4 hours ago [-]

> Why are you linking me to a "Sign in to search code on GitHub" page?

0x000xca0xfe 5 hours ago [-]

This is easy enough for non-technical people or school kids and still how it works for many Wordpress sites.

The modern way of deploying things is safer but the extra complexity has pushed many, many folks to just put their stuff on Facebook/Instagram instead of leveling up their devops skills.

Somehow we need to get the simplicity back, I think. Preferably without all the exploits.

jchw 6 hours ago [-]

Still, even when people run single-thread event loop servers, you can run an instance per CPU core; I recall this being common for WSGI/Python.

taeric 7 hours ago [-]

And, sadly, there is no getting around the "configure everything perfectly" problem. :(

Nzen 6 hours ago [-]

My personal interest in CGI stems from my website host offering it as a means of responding to requests [0] in addition to static assets.

[0] https://www.nearlyfreespeech.net/help/faq#CGISupport

kragen 5 hours ago [-]

Serverless is a marketing term for CGI, and you can observe that serverless is very popular.

It probably handled less than 1000 total requests over the months that we were using it, so, no, it was not significantly better to not fork+exec for each page load.

jchw 5 hours ago [-]

> Serverless is a marketing term for CGI, and you can observe that serverless is very popular.

No, it's not.

CGI and serverless are only similar in exactly one way: your application is written "as-if" the process is spawned each time there is a request. Beyond that, they are entirely unrelated.

> It probably handled less than 1000 total requests over the months that we were using it, so, no, it was not significantly better to not fork+exec for each page load.

To be completely honest with you, I actually agree with your conclusion in this case. CGI would've been better than Django/FastCGI/etc.

9rx 4 hours ago [-]

> No, it's not.

> "Serverless" is a marketing term for a fully managed offering where you give a PaaS

jchw 4 hours ago [-]

The point isn't really that the application is unaware of the server, it's that the server is entirely abstracted away from you. CGI vs serverless is apples vs oranges.

> [...] but the term is being used to sell to developers. It communicates to them, quite literally, that they don't have to make their application a server [...]

9rx 4 hours ago [-]

> It is being sold to businesses, that they don't have to manage a server.

Do you mean a... computer? Server is a software term. It is a process that listens for network requests.

jchw 4 hours ago [-]

> Do you mean a... computer? Server is a software term. It is a process that listens for network requests.

This has nothing to do with the rise of embedding a server into your application.

kragen 4 hours ago [-]

jchw 3 hours ago [-]

kragen 3 hours ago [-]

I agree that it's very much like inetd. Or the Unix shell, which launches one or more processes for each user command.

9rx 4 hours ago [-]

> Serverless was a term invented specifically by providers of computers-that-aren't-yours to give people options to not need to manage the computers-that-aren't-yours.

jchw 3 hours ago [-]

Okay. Let's ask Amazon since they invented the term:

That's also the same idea for serverless functions. It's not about whether your application has a "server" in it.

kragen 3 hours ago [-]

The only word of this that is not is a description of old-fashioned shared CGI hosting is "massively scalable". (And maybe "efficient".)

9rx 3 hours ago [-]

> Serverless computing is an application development model

> But AWS also uses the term for other things

The terms has expended to be used for all kinds of different things, sure. There is probably a toaster out there somewhere solid as being "Serverless" nowadays.

jchw 3 hours ago [-]

> AWS Fargate is a serverless, pay-as-you-go compute engine that lets you focus on building applications without managing servers.

Edit: you added more but it's just more attempting to justify away things that are plainly evident... But I can't help myself. This is just non-sense:

> Deployment isn't a development model,

Software development is not just writing code.

9rx 3 hours ago [-]

> Software development is not just writing code.

> I guess the next argument is that Amazon is just diluting the term

kragen 4 hours ago [-]

> If you're managing CGI scripts on a traditional server, you're still managing a server.

Usually somebody else is managing the server, or servers, so you don't have to think about it. That's been how it's worked for 30 years.

> Before PaaSes I don't think I've ever seen anyone once call CGI "serverless".

jchw 3 hours ago [-]

Oh my god! This could go on forever.

These PaaSes popularized the term to mean this from the gitgo, just because you have passionately formed a belief that it ever meant something else doesn't change a thing.

9rx 3 hours ago [-]

> On the contrary there are "serverless" platforms that run servers!

That's the trouble when a term catches on — everyone wants to jump all over it and use it as they please.

kragen 3 hours ago [-]

Codd was happy enough to tout SQL as "relational" in his Turing Award address! Maybe you mean Date? He was involved from early on but didn't invent it.

3 hours ago [-]

jchw 3 hours ago [-]

Let me emphasize some points here:

> Previously, running code in the cloud meant creating a cloud service to host the application logic...

> then operating the service, requiring developers to be experts in everything from automating failover to security to service reliability...

> there is no capacity planning or up-front resource type selection required to handle additional traffic

We can talk about the parallels between CGI servers and Lambda all day and all night, but I am not letting this non-sense go. Serverless is not a marketing term for CGI.

[1]: https://www.allthingsdistributed.com/2024/11/aws-lambda-turn...

kragen 2 hours ago [-]

This is great, thanks for digging it up!

I agree that there's no evidence that they were talking about "servers" in the sense of processes that listen on a socket, rather than "servers" in the sense of computers that those processes run on.

Just to be clear, I know I'm not going to convince you of anything, but I'm really appreciating how much better informed I'm becoming because of this conversation!

jchw 2 hours ago [-]

9rx 2 hours ago [-]

kragen 3 hours ago [-]

> enables you to run stateless containers that are invocable via HTTP requests. (...) abstracts away all infrastructure management

______

† Well, Steve would probably cost more than what Amazon charges, so customers may have a pretty good guess, but it could be a loss leader or something.

kragen 4 hours ago [-]

PHP is great as a runtime, but it sucks as a language, so I didn't want to use it. Django in regular CGI would have been fine; I just didn't realize that was an option.

jchw 4 hours ago [-]

What you're talking about is "serverless functions". The point of that is still that you don't have to manage a server, not that your function runs once per request.

> because it's spinning up (AFAIK) an entire VPS

For AWS Lambda, it is spinning up Firecracker instances. I think you could conceivably consider these to not be entire VPS instances, even though they are hardware virtualization domains.

> PHP is great as a runtime, but it sucks as a language, so I didn't want to use it. Django in regular CGI would have been fine; I just didn't realize that was an option.

To be completely clear, I still haven't actually started a new project in PHP in over 15 years, but my opinion has gradually shifted and I fear I may see the day where I return.

kragen 4 hours ago [-]

I appreciate the notes on the different nuances of "serverless".

p2detar 7 hours ago [-]

jchw 7 hours ago [-]

php-fpm does work surprisingly well. Though, on the other hand, traditional PHP using php-fpm kinda does follow the CGI model of executing stuff in the document root.

UK-Al05 6 hours ago [-]

It's very unix. A single process executable to handle a request then shuts down.

9rx 6 hours ago [-]

I suppose because they can. While there were other good reasons leave CGI behind, performance was really the only reason it got left behind. Now that performance isn't the same concern it once was...

monkeyelite 6 hours ago [-]

It’s also more secure because each request is isolated at the process level. Long lived processes leak information to other requests.

Or you know not every site is about scaling requests. It’s another way you can simplify.

> but it is an outdated execution model

Not an argument.

The opposite trend of ignoring OS level security and hoping your language lib does it right seems like the wrong direction.

jchw 6 hours ago [-]

> It’s also more secure because each request is isolated at the process level. Long lived processes leak information to other requests.

What information does this leak, and why should I be concerned?

> Or you know not every site is about scaling requests. It’s another way you can simplify.

> > but it is an outdated execution model

> Not an argument.

Correct. That's not the argument, it's the conclusion.

For some reason you ignored the imperative parts,

> It's cool that you can fork+exec 5000 times per second, but if you don't have to, isn't that significantly better?

> Plus, with FastCGI, it's trivial to have separate privileges for the application server and the webserver.

> [Having] the web server execute stuff in a specific folder inside the document root just seems like a recipe for problems.

Those are the primary reasons why I believe the CGI model of execution is outdated.

> The opposite trend of ignoring OS level security and hoping your language lib does it right seems like the wrong direction.

taeric 6 hours ago [-]

jchw 5 hours ago [-]

taeric 5 hours ago [-]

monkeyelite 6 hours ago [-]

> So the upshot of writing CGI scripts is that you can... ship broken, buggy code that leaks memory to your webserver and have it work mostly alright

Yes. The code is already shitty. That’s life. Let’s make the system more reliable and fault tolerant.

This argument sounds a lot like “garbage collection is for bad programmers who can’t manage their memory”.

> The only way you can really prevent that is by having complete isolation between processes

Yes. That’s the idea. Separate memory spaces.

> What information does this leak

Anything that might be in a resource, or memory. Or even in the resource of a library.

> and why should I be concerned

Accessing leaked information form a prior run is a common attack.

> but if you don't have to, isn't that significantly better?

Long running processes are inherently more complex. The only benefit is performance.

> H’the web server execute stuff in a specific folder inside the document root just seems like a recipe for problems.

As opposed to? All processes have a working directory. What problems come from using the file system?

> cgroups

Yes it’s the same amount of effort to configure.

jchw 6 hours ago [-]

> Yes. The code is already shitty. That’s life. Let’s make the system more reliable so that small code mistakes are disasters.

> This argument sounds a lot like “garbage collection is for bad programmers who can’t manage their memory”.

> > The only way you can really prevent that is by having complete isolation between processes

> Yes. That’s the idea. Web server forks, and execs. Separate memory spaces.

> > What information does this leak

> Anything that might be in a resource, or memory. Or even in the resource of a library you use.

> > and why should I be concerned

> Accessing leaked information form a prior run is a common attack.

> As opposed to? All processes have a working directory. What problems come from using the file system?

> Yes it’s the same amount of effort to configure this.

I actually really didn't read this before writing out how easy it was to use these with systemd, so I guess refer to the point above.

0xbadcafebee 5 hours ago [-]

g-mork 7 hours ago [-]

processless is the new serverless, it lets you fit infinite jobs in RAM thus enabling impressive economies of scale. only dinosaurs run their own processes

dengolius 5 hours ago [-]

What is the reason to choose gohttpd? I mean there are a lot of non standard libraries for go that are pretty fast or faster then gohttpd - https://github.com/valyala/fasthttp/ as example

carodgers 5 hours ago [-]

Looks like CGI was recently removed from python 3. https://docs.python.org/3/library/cgi.html

What is a modern python-friendly alternative?

kragen 5 hours ago [-]

- wsgiref.handlers.CGIHandler, which is not deprecated yet. gvalkov provided example code for Flask at https://news.ycombinator.com/item?id=44479388

- use a language that isn't Python so you don't have to debug your code every year to make it work again when the language maintainers intentionally break it

- install the old cgi module for new Python from https://github.com/jackrosenthal/legacy-cgi

- continue using Python 3.12, where the module is still in the standard library, until mid-02028

exabrial 5 hours ago [-]

I struggled for _15 mins_ on yet another f#@%ng-Javascript-based-ui-that-does-not-need-to-be-f#@%ng-Javascript, simply trying to reset my password for Venmo.

Why... oh why... do we have to have 9.1megabytes of f#@*%ng scripts just to reset a single damn password? This could be literally 1kb of HTML5 and maybe 100kb of CSS?

Anyway, this was a long way of saying I welcome FastCGI and server side rendering. Js need to be put back into the toys bin... er trash bin, where it belongs.

hedgehog 4 hours ago [-]

7 hours ago [-]

rokob 6 hours ago [-]

I’m interested why Rust and C have similarly bad tail latencies but Go doesn’t.

scraptor 5 hours ago [-]

twh270 6 hours ago [-]

OP posited SQLite database contention. I don't know enough about this space to agree or disagree. It would be interesting, and perhaps illuminating, to perform a similar experiment with Postgres.

bracketfocus 6 hours ago [-]

The author guessed it was a result of database contention.

I’d also be interested in getting a concrete reason though.

oxcabe 6 hours ago [-]

It'd be interesting to compare the performance of the author's approach to an analogous design that changes CGI for WASI, and scripts/binaries to Wasm.

IshKebab 6 hours ago [-]

Would it? It would be exactly the same but a bit slower because of the WASM overhead.

kragen 5 hours ago [-]

andrewstuart 7 hours ago [-]

How meaningful is “per day” as a performance metric?

kragen 5 hours ago [-]

As other commenters have pointed out, peak traffic is actually more important.

diath 7 hours ago [-]

Not at all, it may be a useful marketing metric, but not a performance one. The average load does not matter when your backend can't handle the peaks.

xnx 7 hours ago [-]

True, though a lot higher spec'ed systems couldn't handle the minimum 5000 requests/second this implies.

dspillett 5 hours ago [-]

hu3 5 hours ago [-]

I work on a system for a client that averages 50 requests per second but handles 6k req/s during peaks and we have SLA of P99% <= 50ms.

So I'd say per day is not very meaningful.

8 hours ago [-]