This constraint is common to most n-tier architectures (with Postgres or MySQL) as well. Obviously, part of what's interesting about Litestream is that it simplifies fail-over with SQLite.


Once you have that constraint, it means you will either have the same network latency when writing to SQLite (if it is fronted by some lightweight proxy), or have a lot more frequent failover of SQLite (if it is running embedded within the app, thus following the app's deployment schedule).

I suppose if someone decides to deploy Postgres/MySQL replicas as sidecars, then it ends up being much the same as what you get here?


Yes: nobody is claiming otherwise. SQLite drastically speeds up reads, and it speeds up writes in single-server settings. In a multi-server setting, writes have comparable (probably marginally poorer, because of database-level locking, in a naive configuration) performance to Postgres. The lay-up wins of SQLite in a multi-server environment are operational simplicity (compared to running, say, a Postgres cluster) and read acceleration.


> The lay-up wins of SQLite in a multi-server environment are operational simplicity (compared to running, say, a Postgres cluster) and read acceleration.

What's the operational simplicity? You still have to do backups and replication and SSL. Maybe you don't have to worry about connectivity between the app and the database? Maybe auth?


You don't have to manage a database server if there is no database server.


I mean, there are managed SQL services too. Comparing managed SQLite to DIY Postgres seems disingenuous.

EDIT: I didn’t expect this to be controversial, but I’d like to know where I’ve erred. If you need Litestream to make SQLite operationally simple (beyond single servers, anyway), that seems pretty analogous to needing RDS to make Postgres operationally simple, right?


>I didn’t expect this to be controversial, but I’d like to know where I’ve erred

I don't think what you are saying is controversial, but it feels like you are being contentious for no reason. Your rebuttal doesn't even make sense - telling me I can pay someone to manage it for me is not the same as it not needing management at all.

Whether I manage it or someone else does, there is still an operational or financial cost.


The parent's point seems to be that using Litestream to replicate your SQLite data is also not the same as not needing management at all.

Whether you do your own SQLite replication or Litestream does it for you, there is still an operational or financial cost.


I didn't downvote you. Postgres as a database server is operationally more complex than SQLite. Since Postgres is a network service, you have to deal with networking and security. Upgrading Postgres is a big task in and of itself. Backups have to happen over the network.

The number of network connections is another sore point. One of Postgres' biggest pain points is the low number of connections it supports. It is not uncommon to have to run a proxy in front of Postgres just to increase the number of connections.

SQLite gives you so much for free as long as you can work within its constraint, which is a single writer (for the most part).
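
For a concrete sense of what working within that constraint looks like, here is a minimal sketch using Python's standard-library sqlite3 (file name and schema invented for illustration): WAL mode keeps readers from blocking the single writer, and a busy timeout makes a second would-be writer wait instead of erroring.

    import sqlite3

    # The timeout behaves like SQLite's busy_timeout: wait for the write
    # lock instead of failing immediately when another writer holds it.
    conn = sqlite3.connect("app.db", timeout=5.0)

    conn.execute("PRAGMA journal_mode=WAL;")    # readers proceed while a write is in flight
    conn.execute("PRAGMA synchronous=NORMAL;")  # common pairing with WAL

    conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, body TEXT)")
    with conn:  # one transaction commits at a time; this is the "single writer"
        conn.execute("INSERT INTO events (body) VALUES (?)", ("hello",))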


You have a more complex network setup, actually. You have north-south traffic between your client -> LB -> servers, and you have east-west traffic between your servers for SQLite replication. Both happening on the same nodes and no isolation whatsoever. More things can go wrong and will require more tooling to disambiguate between different potential failures.

W.r.t. security, you have the same challenges securing east-west traffic as north-south traffic. W.r.t. the number of connections, Postgres has a limit on the number of connections for a reason – if you are running a multi-process or multi-threaded app framework that's talking to SQLite, you have just traded a connection limit for a concurrent process/thread access limit on SQLite. I don't know if one is better than the other – it all depends on your tooling to debug things when they inevitably fail at redline stress conditions.


You're technically right, which is to say: Not at all.

You don't replicate east-west with Litestream; you replicate north -> south, where the "North" is your writer and your "South" is some S3-like service. Basically every application has "Some S3-Like Service" as a dependency now, so that's not a new network connection.


You make a really good point about trading SQLite's write limits for Postgres's limited number of connections. My comment is mostly about having to run a proxy, which is another service that ends up being a liability. Regardless, if you need multiple writers, then the obvious solution is Postgres.

"Both happening on the same nodes and no isolation whatsoever. More things can go wrong and will require more tooling to disambiguate between different potential failures."

As a Postgres fan, I consider the extra Postgres service a far bigger liability than SQLite. The east-west networking applies to Postgres replication as well, so that part is a wash to me.


> Upgrading Postgres is a big task in and of itself.

Learnt this the hard way when I first upgraded across a major version, only to realize that the data needs to be migrated first. pg_upgrade requires the binaries of the older version, so unless you manually dump and restore, you need a copy of the data as well as the binaries of both the old and new versions of Postgres [1]. Fortunately it was just my home server.

[1] https://wiki.archlinux.org/title/PostgreSQL#Upgrading_Postgr...
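
For anyone who hasn't been through it, a rough sketch of the invocation (wrapped in Python purely for illustration; the version numbers and paths are hypothetical, the point is that both sets of binaries and both data directories have to exist side by side):

    import subprocess

    # pg_upgrade needs the OLD and NEW binaries plus the OLD and NEW data
    # directories on disk at the same time. All paths here are hypothetical.
    subprocess.run(
        [
            "/usr/lib/postgresql/14/bin/pg_upgrade",
            "--old-bindir", "/usr/lib/postgresql/13/bin",
            "--new-bindir", "/usr/lib/postgresql/14/bin",
            "--old-datadir", "/var/lib/postgresql/13/main",
            "--new-datadir", "/var/lib/postgresql/14/main",
        ],
        check=True,  # raise if pg_upgrade exits with an error
    )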


It isn't comparable to a managed PostgreSQL. There is no server. SQLite is just a file format and a library to access it.


It still has to run in a server process on a server host, and virtually all of the challenges of managing a database server are just moved up to the application layer. There are fewer actual hosts, but for the likes of fly.io or a cloud provider a difference of a handful of hosts is negligible because hosts are cattle rather than pets.

There might be advantages to SQLite (e.g., maybe Litestream's replication/backup/etc. story is simpler than Postgres's), but "number of hosts" or "number of server processes" doesn't seem compelling to me.


Of course, it has nothing to do with the number of hosts or server processes, and I didn't see anyone claim that it does.

It's difficult-to-impossible to truly run Postgres as a proper "herd of cattle", due to the need to fail over replication roles between reader and reader-writer, ensure replication is in sync on startup before allowing reads, handle upgrades carefully, etc. If you're using something like RDS or another managed Postgres, this is still the case; it's just being handled for you by someone else.

So it's not that you're just reducing the number of hosts; you're eliminating an entire class of hosts that have complex state and maintenance requirements. Your application processes are hopefully a herd of identical cattle and can be killed off and relaunched at will, and that property remains the same after introducing Litestream.


Litestream is a database server, isn’t it?


No; there's no such thing as a sqlite3 server. The database is the file(s). Litestream runs alongside everything else using sqlite3 and ensures that it's replicating. If Litestream crashes, reads from the database keep working fine (though, of course, they'll start to go stale if it doesn't come back up).

This is why we called out in the post that Litestream is "just sqlite3". It's not sitting between apps and the database.


That seems disingenuous. If sqlite3 isn't a server, then neither is apache2. But in reality they're both binaries 'serving' 'files' over an interface. You're just hosting them on the same machine, reverting to a monolith-style deployment. Which is fine, but then let's call it what it is.


> That seems disingenuous. If sqlite3 isn't a server, then neither is apache2.

Your argument really is with Dr. Richard Hipp: https://sqlite.org/serverless.html


> But in reality they're both binaries 'serving' 'files' over an interface.

By that definition fopen() is also a server.


According to Plan9, fopen() is also a server.


That’s my point, though: if you want to use SQLite in production / with Litestream, you’re comparing two databases that both need servers to function, just implemented quite differently.

In the same vein as your article, it seems very fair to say that if the Litestream server went down in production you’d have a broken app for most use cases within a few minutes. So in practical effect, Litestream’s server is about as essential.


> Litestream crashes, reads from the database keep working fine.

fly-app's litestream-base Dockerfile suggests that the litestream process supervises the app process... I guess, then, that's a limitation specific to fly.io's deployment model and not Litestream?


> have a lot more frequent failover of SQLite (if it is running embedded within the app, thus following the app's deployment schedule).

That does sound like it's going to be difficult to get right. But if Litestream eventually implements a robust solution for this problem, then I think some added complexity in the deployment process will be a reasonable price to pay for increased app performance the rest of the time.


For what it's worth, I think this problem (the complexity that bleeds into the app for handling leaders) is mostly orthogonal to the underlying database. You have the same complexity with multi-reader single-writer Postgres. But the code that makes multi-reader SQLite work is a lot easier to reason about.

Let me know if you think I'm off about that.


Unless I'm misunderstanding something, I do think using SQLite makes a significant difference in the complexity of app deployment. When using multi-region Postgres, it's true that you only want the Postgres leader to be accessed by app instances in the same region, so the app instances all have to know which region is running the leader. But multiple app instances in that region can connect to that Postgres leader, so it's easy to do a typical rolling deploy. With SQLite, only one app instance at a time can write to the database, so IIUC, there will have to be a reliable way of doing failover with every app deploy. I suppose the same thing has to happen in the Postgres scenario when updating Postgres itself, but that's way less frequent than deploying new versions of the app.


> multiple app instances in that region can connect to that Postgres leader, so it's easy to do a typical rolling deploy

This is mentioned as a drawback towards the end of the blog post, isn't it? It does seem it would make deployments rather awkward.


You can have two app instances writing to the database concurrently, as long as they are running on the same machine. Then it’s possible to deploy a new version without downtime by starting the new binary, switching traffic to the new instance, draining connections to the old instance, and shutting it down. That’s kind of how nginx, for example, upgrades itself without downtime. That’s not the usual way nowadays with containers and VMs, but it was very common before. It’s definitely doable with SQLite, and not very complex.
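
A minimal sketch of that handoff, assuming Linux and SO_REUSEPORT (nginx actually hands its listening sockets to the new binary, but the effect is similar; the port and names here are made up):

    import socket

    def make_listener(port: int) -> socket.socket:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # SO_REUSEPORT lets the old and new app instances listen on the same
        # port during the switchover; the kernel spreads new connections
        # across whichever instances currently have the port open.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind(("0.0.0.0", port))
        s.listen(128)
        return s

    # Deploy sequence (both processes run on the same machine, each with its
    # own sqlite3 connection to the same database file):
    #   1. start the new binary; it calls make_listener(8080) and begins serving
    #   2. the old instance closes its listener, so all new connections land
    #      on the new instance
    #   3. the old instance finishes its in-flight requests and exits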


Your app machine has to be overspecced so it can support running two copies of the app at once.


No. The number of requests processed is unchanged. They are just dispatched between two instances: the new one and the old one. Actually, all new requests go to the new instance, and only the pending requests are processed by the old instance.


Sure, but most applications need a certain amount of memory regardless of the number of requests they're servicing.


Agreed, RAM usage may be slightly elevated during the switch. But it doesn’t seem to be a big issue in practice; nginx, for example, updates itself like this and it works fine.


On the other hand, your application code is probably less reliable than Postgres or MySQL and now it can bring down your master, so failover is a more likely scenario. Probably not "worse", just "different".


I don't think so. Single-master, multi-reader is the most common clustering pattern for all these databases. If you lose the app server that's handling writes in any of those systems, you have the same problem. Meanwhile, when your app server is down, the sqlite database is just fine, and so is Litestream.

I'm not thinking very carefully in answering this, but I think there's a subtlety I'm not capturing well here, and that it's not the case that this pattern has poorer fail-over than clustered Postgres. I think there are more things that can go wrong with Postgres.


IMO the benefit to n-tier is that you can have multiple instances on your app tier with a single-master multi-reader database tier, and that makes the system resilient to app server crashes.

For a real-world example, some time ago I wrote an admin tool as part of a production monolith, for a rare (but frequent-enough) internal use case where a user could put in a URL, the app server would fetch the remote HTML, process it, and save the results to a database. A few months later, we started getting weird error reports that the server was crashing. Sure enough, a malformed document could cause enough recursion to exceed the Python maximum recursion depth and crash the process.

If this had been the single process running server-side SQLite, even if multithreaded and/or concurrent (with e.g. something like gevent), the crash would have taken down our entire site for however long it would take to restart the process. But since there were other app server processes eager and ready to take ongoing requests, the only disrupted user was the one trying to use that admin tool, not all the other customers using the product at the time. Said user was confused, but was very glad they didn't disrupt the critical work our customers were doing at the time!

Of course, one size doesn't fit all, and there are many services that are limited enough in scope, working with known and battle-tested components, that this wouldn't be a problem. But if I make the conservative assumption that any process that is running code that I myself wrote, no matter how simple-seeming, has a much higher chance of segfaulting or otherwise crashing than a standalone professional database... then that late-night ping is much more likely to be something that can be handled the following morning.


It's worth noting that "single-writer" refers to a single machine, not a single process. Multiple programs can write to the same SQLite DB at once (although contention can be a problem, etc.). So, if that admin tool was running on the same machine plugged into that same SQLite file, it could crash and leave the main customer-facing services running fine.
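
A small sketch of that, using Python's standard-library sqlite3 and multiprocessing (file and table names invented): two processes on one machine both write to the same file, and SQLite's locking plus a busy timeout serialize the commits.

    import sqlite3
    from multiprocessing import Process

    DB = "shared.db"

    def writer(name: str) -> None:
        # Each process opens its own connection; the timeout makes it wait
        # for the write lock instead of raising "database is locked".
        conn = sqlite3.connect(DB, timeout=10.0)
        conn.execute("PRAGMA journal_mode=WAL;")
        for i in range(100):
            with conn:  # one committed transaction at a time across both processes
                conn.execute("INSERT INTO log (who, n) VALUES (?, ?)", (name, i))
        conn.close()

    if __name__ == "__main__":
        setup = sqlite3.connect(DB)
        setup.execute("CREATE TABLE IF NOT EXISTS log (who TEXT, n INTEGER)")
        setup.close()

        procs = [Process(target=writer, args=(n,)) for n in ("admin-tool", "web-app")]
        for p in procs:
            p.start()
        for p in procs:
            p.join()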


Is simplified failover referring to the currently-in-beta async replicas? Or is there something else on the way that will make it easier to fail over?


There's nothing that's quite ready to look at for easy failover, but this is a big priority. We absolutely have to figure out failover during deploys. It's doable; we just want to reduce the operational complexity as much as we can.


I _think_ he's referring to the ease of streaming the WAL to S3 and reloading from it upon starting a new write "leader".



