
> You don't need to be Facebook or Google to have more than one service in your infrastructure that needs to authenticate a user's existing session without forcing the user to log in again.

Thank you. This middle ground between hyperscaler infrastructure and super simple web apps is where most of my career has been spent, yet the recent trend is to pretend like there are only two possible extremes: You’re either Facebook or you’re not doing anything complicated.

It has an unfortunate second-order effect of convincing people that, as soon as they encounter something more complicated than a simple web app, they need to adopt everything the hyperscalers do to solve it.

I wish we could spend more time discussing the middle ground rather than pretending it’s some sort of war between super simple and ultra complex.



I still don't think people in the middle need JWTs.

If we're talking about a web session, time-limited randomly-generated session tokens that are stored in a DB still work fine. If you really need it, put a caching layer (memcached or redis or valkey or whatever) in front of it. Yes, then you've created cache invalidation problems for yourself, but it's still less annoying than JWT.

If we're talking about authenticating API requests, long-lived randomly-generated auth tokens stored in a database work fine, generally. (But allow your users to create more than one, and make rotation and revocation easy. Depending on your application, allowing your users to scope the tokens can also be a good thing to do.) Again, put a caching layer in front of your database once you get to the scale where you need it. You probably won't need it for a while if you're sending your reads to read-only replicas.
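
For what it's worth, a minimal sketch of that pattern, with plain dicts standing in for the Postgres table and the memcached/redis/valkey layer (all names here are made up):

    import secrets
    import time

    SESSION_TTL = 12 * 3600  # 12-hour sessions

    sessions_db = {}     # stand-in for the sessions table in your DB
    sessions_cache = {}  # stand-in for memcached/redis/valkey

    def create_session(user_id):
        # 256 bits of randomness; the token itself carries no meaning
        token = secrets.token_urlsafe(32)
        sessions_db[token] = {"user_id": user_id, "expires": time.time() + SESSION_TTL}
        return token

    def authenticate(token):
        # cache-aside: try the cache, fall back to the DB, then populate the cache
        record = sessions_cache.get(token) or sessions_db.get(token)
        if record is None or record["expires"] < time.time():
            return None
        sessions_cache[token] = record
        return record["user_id"]

    def revoke(token):
        # revocation is just a delete -- but remember to invalidate the cache too
        sessions_db.pop(token, None)
        sessions_cache.pop(token, None)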

(Source: worked at Twilio for 10 years; we definitely eventually ran into scaling problems around our user/auth DB, and our initial one-auth-token-is-all-you-need setup was terrible for users, but these problems were fixed over time. Twilio does use JWTs for some things, but IMO that was unnecessary, and they created more headaches than they solved.)

I'm not saying no one ever needs JWTs, but I think they're needed in far fewer circumstances than most people think, even people who agree that JWTs should be looked upon with some skepticism. If you need to be able to log people out or invalidate sessions or disable accounts, then JWTs are going to create problems that are annoying to solve.

(One possibly-interesting solution for JWT-using systems that I haven't tried anywhere is to do the reverse: don't cache your user/auth database, but have a distributed cache of JWTs that have been revoked. The nice thing about JWTs is that they expire, so you can sweep your cache and drop tokens that have expired every night or whenever. Not sure how well this would work in practice, but maybe it's effective. One big problem is that now your caching layer needs to be fail-closed, whereas in a system where you're caching your user/auth DB, a caching layer failure can fall back to the user/auth DB... though that may melt it, of course. I also feel like it's easier to write logic bugs around "if this record is not found, allow" rather than "if this record is not found, deny".)
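
A rough sketch of that reverse approach, assuming PyJWT and a hypothetical is_revoked() lookup against the revocation cache; the deny-on-cache-failure branch is the fail-closed part mentioned above:

    import jwt  # PyJWT, assumed; "exp" is validated automatically on decode

    def is_revoked(jti):
        # hypothetical lookup against the distributed cache of revoked token ids;
        # in real life this is a network call that can fail
        revoked_ids = set()  # in-memory stand-in
        return jti in revoked_ids

    def check_token(token, signing_key):
        try:
            claims = jwt.decode(token, signing_key, algorithms=["HS256"])
        except jwt.InvalidTokenError:
            return None  # bad signature or already expired
        try:
            if is_revoked(claims.get("jti")):
                return None
        except Exception:
            # the cache is the only thing standing between a revoked token and
            # access, so a cache outage has to fail closed (deny)
            return None
        return claims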


"If we're talking about a web session, time-limited randomly-generated session tokens that are stored in a DB still work fine. If you really need it, put a caching layer (memcached or redis or valkey or whatever) in front of it. Yes, then you've created cache invalidation problems for yourself, but it's still less annoying than JWT."

You just (somewhat handwavingly) described what Google and Facebook are doing. You might not need to build this globally highly available distributed session store, and JWTs might be an ok solution for your use case too (because you are not Google or Facebook) - or not. It depends on what your requirements are. AuthN across services is somewhat complex in any case; I don't think there is an easy way around it without making tradeoffs somewhere. JWTs are a great tool to consider here.


> If we're talking about a web session, time-limited randomly-generated session tokens that are stored in a DB still work fine

This works fine for a single service, but you’re replying to a thread about the middle ground of multiple services. It’s an anti-pattern to have every service talk to the same database just to authenticate every request.

By the time you add a caching layer, you’re truly better off using an off-the-shelf OIDC identity provider and validating the ID token claims.


In my experience for medium sized services it’s still better to have everything talk to the same authentication database.

Postgres has insanely good read performance. Most companies and services are never going to reach the scale where any of this matters, and developer time is usually the more precious resource.

My advice is always, don’t get your dev team bogged down supporting all this complicated JWT stuff (token revocation, blacklisting, refresh, etc) when you are not Facebook scale / don’t have concrete data showing your service really truly needs it.


Alternatively, just don't worry about token revocation and all that complicated stuff? So you have a window of 5 minutes (or whatever your access token expiry is) that you can't revoke - is that a big deal?

A simple JWT implementation isn't that complicated, but you have to accept some limitations.
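
To make "simple" concrete, a sketch of roughly the smallest useful version, assuming PyJWT; the only knob is the 5-minute expiry, and there is deliberately no revocation path:

    import time
    import jwt  # PyJWT, assumed

    SECRET = "use-a-real-key-from-your-secret-store"
    ACCESS_TOKEN_TTL = 300  # 5 minutes: the window you accept as unrevokable

    def issue(user_id):
        now = int(time.time())
        return jwt.encode({"sub": user_id, "iat": now, "exp": now + ACCESS_TOKEN_TTL},
                          SECRET, algorithm="HS256")

    def verify(token):
        try:
            # signature and expiry are checked; nothing else is -- that's the trade-off
            return jwt.decode(token, SECRET, algorithms=["HS256"])
        except jwt.InvalidTokenError:
            return None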


If it only adds disadvantages, better not to use it though.


+1

For mostly-read flow like authentication, a centralized database can scale really well. You don't even need postgres for that.

If you have mutable state, JWT can't help you anyway.

JWTs start to make sense only when you are doing other hyperscaler stuff and can reuse parts of that architecture.


Funny, people used systems like JWT in the late 1990s. Back then you couldn’t really trust the session mechanism in your language because inevitably these had bugs and would toss your cookies for “no reason at all”.

I was inspired by https://philip.greenspun.com/panda/ circa 2001 to develop a complete user management framework based on that kind of cookie which had the advantage over other systems that the “authentication module” it took to get authentication working in a new language was maybe 40-100 lines of code. Software like PHPNuke that combined second or third rate implementations of apps all in the same codebase was the dominant paradigm then, the idea that you could pick “best of breed” applications no matter what language you were using was radical and different.

I used the framework for 10+ projects, some of which got 350,000+ active users. As an open source project it was a complete wash. Nobody got interested in user management frameworks (as opposed to writing your own buggy, insecure and hard-to-use auth system in a hurry) until around 2011 or so, when frameworks based on external services all of a sudden popped up like mushrooms. Seemed like the feature I was missing was "needs to depend on an external service that will get shut down when the vendor gets acquired".


> It’s an anti pattern to have every service talk to the same database just to authenticate every request.

Bullshit.


Do you want to expand on that? Because having a single point of failure certainly seems like a horrible practice when that single point goes down.


You’re already talking to stateful systems to do anything meaningful. An in-memory cache on top of session retrieval is so trivial and adds so few microseconds that it’s imperceptible even at large volumes of traffic.

If you’re having trouble with that, you’ve got bigger issues. Any regular work queries will take longer, and so it’s not even a meaningful area of concern if you broke down a request from end to end on a flame graph.


> You’re already talking to stateful systems to do anything meaningful.

Yeah, so? They don’t have to be talking to the same system, and in fact that is literally the point you called bullshit on originally.

> If you’re having trouble with that, you’ve got bigger issues.

That does absolutely nothing to change the fact that a SPoF is still an anti-pattern that should be avoided.

For that matter…

> A in-memory cache on top of session retrieval is so trivial and adds so few microseconds that it’s imperceptible even at large volumes of traffic.

Also does absolutely nothing to change that fact. You have done nothing to actually elaborate on why it’s totally not a horrendous idea to have everything communicate with the same database. Just because there’s a caching layer does not mean fresh data will still be available when that single point of failure goes down, which, once again, is the whole point here.


> That does absolutely nothing to change the fact that a SPoF is still an anti-pattern that should be avoided.

That statement doesn't mean anything.


You keep on saying this without actually backing it up with anything.

If you want people to believe what you say, you have to actually explain why you think something. Just saying it does not somehow make it true.


Stop talking, you're annoying.


He isn't. It's certainly annoying to see someone continually dodge supporting a claim they made though.


What an incredibly impolite way to respond to someone who is trying to tease out the point you continually failed to make.

Be better.


As you point out, in most use cases a random token will be fine and it all comes down to how and where it is stored.

But that also means you can use JWTs as the "random token" for most of your app, since the cost to produce them isn't high, and only make use of the additional capabilities when, for instance:

- you want to check signatures (e.g. reject before hitting your application layer)

- you want to store non-sensitive base64 data that you need before restoring the session

Creating and handling JWTs is only as costly and complicated as you want it to be, so there's IMHO enough flexibility to have light use with very few penalties for it.


You don’t need JWTs to pass internal permissions. We don’t, but we still extract claims from a JWT at the beginning of a user flow. Then later we only use the claims to determine which resources a user has access to.

It’s not necessarily easier than just passing the JWT, but with our internal setup, once you first pass through the authorisation system, traffic on your behalf is already trusted, so there isn't really a reason to decode your token multiple times rather than simply passing your access-permission claims along.

We do still pass your JWT between isolated “products”, where your access request doesn’t pass through Dapr but rather goes back through our central access gateway and then into the other “product”. A product is basically a collection of related services restricted to one business component, like a range of services which handles our solar plants, another business component which handles our investment portfolios, and so on.


> But allow your users to create more than one, and make rotation and revocation easy

It's shocking how often this advice isn't followed. We often see it with non-tech companies who nonetheless deliver services over the internet.


"have a distributed cache of JWTs that have been revoked. The nice thing about JWTs is that they expire, so you can sweep your cache and drop tokens that have expired every night or whenever."

Every cache has TTL, so you just set the TTL of the entry to the expiration date of the token you are caching. No need for nightly cleanups.
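
Right; with redis-py (just as an example), revocation entries can be written so they clean themselves up exactly when the token would have expired anyway:

    import time
    import redis  # redis-py, assumed

    r = redis.Redis()

    def revoke(jti, exp):
        # keep the revocation record only as long as the token could still be presented
        remaining = max(int(exp - time.time()), 1)
        r.set(f"revoked:{jti}", 1, ex=remaining)

    def is_revoked(jti):
        return r.exists(f"revoked:{jti}") > 0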


I'm not sure cache was the right word in the parent post-- you don't want to use a cache (at least one with LRU/bounded size) to store revocation without a backing store, or else the revocation could get pushed out of the cache and become ineffective. The backing store (likely a DB) would require such cleanups once the revocation record is no longer relevant.


Potentially you get both at once: use something like DynamoDB as the storage layer, which also supports TTL natively.


I would challenge your assumption. Unless you absolutely need 100% durable, consistent revocations for some reason, something like memcached is perfect here, as the worst-case scenario in case of a failure is a slight, temporary degradation in security without any visible user impact or operations nightmare (i.e. restoring backups). This assumes that your token lifetime is reasonably short (at least for access tokens); refresh tokens are a different story but only need to be tracked at the authn service, not globally.


If the revocation use case is soft, then totally fair. But if the application is potentially dangerous and the user says "Sign out all devices", I think that should be a deterministically successful operation. Similarly, if there is a compromised account in an organization, I'd like to be confident that revoking all credentials was successful.

Revocation of tokens can be done for a simple logout operation, in which case the stakes are low, but more often it is the "pull the fire alarm and get that user out", and in that case it should be reliable.


> don't cache your user/auth database, but have a distributed cache of JWTs that have been revoked

My understanding was that this is the ENTIRE benefit of JWTs (over plain session tokens): they allow you to go from an allowlist to a blocklist, which is more efficient at really large scales because you only have to store revoked sessions (until their time limit expires) rather than every session.

And if you're not doing this then there's no point in using JWTs (which will be the case for most people).

Are there any other benefits I'm missing?


> you only have to store revoked sessions (until their time limit expires) rather than every session

I don’t know of any companies that even do this. As far as I know, most use cases store nothing, except for of course the client storing the response.


Honestly, do you even need support for revoke? If you have a token whose lifetime can be measured in 2-3 minutes, I don't think the abuse potential is huge, especially when some other security measures are in place.

Thing is, the token refresh service can be stateless, but adding a revocation service basically kills JWTs' main advantage, since every time we check a token's validity, we need a query to see if it's been revoked.


Revocation is needed because you want to disable access to an intruder in the very second you detect unauthorized access using a stolen token. Same for certain kinds of banned users who must lose access immediately.

But since such a revocation list is going to be short (usually 0 entries, dozens at worst), it's trivial to replicate across all the auth service nodes (which can as well be worker nodes) or keep it in Redis replicated per DC, with sub-millisecond lookup times.

Things get harder if you want a feature like logging out other sessions, or just an explicit logout on a shared computer (think about business settings: stores, pharmacies, post offices); then you may have to keep larger revocation lists. This may still not be a problem: a million tokens is a few dozen megabytes, and again, a per-DC replicated Redis cluster would handle it trivially and very cheaply.


I still feel like the need for revocation kills the simplicity of JWT and thus the reason for its existence.

I'm of a more nuanced opinion regarding this - say you operate a movie streaming service and control access to movies via JWTs. It's not a problem if an attacker has access for two more minutes than intended.

If you are talking to a single client, I think checking the remote IP address and encoding it in the token might work to see if the token is not stolen, but don't quote me on that.


It's a complicated problem. I don't see why it should have a simple solution.


> Revocation is needed because you want to disable access to an intruder in the very second you detect

I get that this has conceptual appeal, but I doubt this makes any difference in real life. Unless you have some very sophisticated infrastructure, it takes many minutes to discover the issue and then many more minutes even to decide what to do about it. A few extra minutes to cut off access is probably not going to make a big difference one way or another.


An intruder might be not a sophisticated black hat hacker. It could be somebody who picked up an unlocked phone or keyboard.

When I had a chance to design a token-based authn/authz system, we had two types of tokens, general access (with hours of expiration, mostly read-only access) and privileged access, with expiration time set to a minute or so. All auto-refreshed on use, all separately revokable.


Sure, but isn't it still going to take you N minutes/hours/days to discover the violation? Does it make a material difference that you can revoke access this hot second as opposed to up-to-5-minutes when the token expires?

Seems to me that for most applications, the irrevocable 5-minute token seems "good enough".


All you really need for revocation in a revocation service are two fields, user id + inb (issued not before), plus a bloom filter.

To revoke a token:

1. Issue a new token to the revoker, issued at the current time (if business rules require the revoker to stay logged in).

2. Set the user's inb to the current time minus 1 second, with a TTL of 1.5x the longest token lifetime.

3. Add the user id to the bloom filter.

4. Upload the bloom filter to S3; every service downloads it every 5 minutes.

5. Then on each request, check the bloom filter. If the user id is in it, ask the revocation service whether inb is later than the token's issued-at time (and reject the token if so).

This is probably less than five hundred lines of code and pretty easy to maintain.
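
A rough sketch of steps 3-5, with a hand-rolled bloom filter to keep it self-contained; the S3 upload/download and the revocation-service call are stubbed with in-memory stand-ins:

    import hashlib
    import time

    class Bloom:
        def __init__(self, size_bits=1 << 20, hashes=5):
            self.size, self.hashes = size_bits, hashes
            self.bits = bytearray(size_bits // 8)

        def _positions(self, key):
            for i in range(self.hashes):
                h = hashlib.sha256(f"{i}:{key}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.size

        def add(self, key):
            for p in self._positions(key):
                self.bits[p // 8] |= 1 << (p % 8)

        def __contains__(self, key):
            return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

    # revocation side
    revoked_inb = {}          # user_id -> issued-not-before timestamp (step 2)
    maybe_revoked = Bloom()   # would be uploaded to S3 and re-downloaded every ~5 min

    def revoke_user(user_id):
        revoked_inb[user_id] = time.time() - 1
        maybe_revoked.add(user_id)

    # request side (step 5)
    def token_ok(user_id, issued_at):
        if user_id not in maybe_revoked:
            return True                    # fast path: definitely not revoked
        inb = revoked_inb.get(user_id)     # stand-in for a revocation-service call
        return inb is None or issued_at > inb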


Often overlooked middle ground that vastly simplifies your revocation logic: just have a single "not-issued-before" timestamp assigned to each user account. Instead of revoking a single token, you have "log out from all devices" logic, i.e. you revoke all of a user's tokens at once based on their "iat" claim (issued at). No need for revocation lists altogether; you just make sure any token's "iat" is never before the "not-issued-before" associated with the user. Sure, this is not as nice a UX as being able to revoke individual tokens, but token revocation in general is something only a fraction of your users is ever going to need.
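
A sketch of that, assuming PyJWT; the per-user cutoff would live on the user record in whatever store you already have, here it's just a dict:

    import time
    import jwt  # PyJWT, assumed

    SECRET = "example-only"
    not_issued_before = {}  # user_id -> cutoff timestamp

    def log_out_everywhere(user_id):
        # one write revokes every token issued up to this moment
        not_issued_before[user_id] = int(time.time())

    def verify(token):
        try:
            claims = jwt.decode(token, SECRET, algorithms=["HS256"])
        except jwt.InvalidTokenError:
            return None
        cutoff = not_issued_before.get(claims.get("sub"), 0)
        if claims.get("iat", 0) < cutoff:
            return None  # issued before the user's "log out from all devices" action
        return claims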


Yeah, this works very well. A nice "log me out of everywhere, including this device" link is often all you need on the settings page.

It also makes e2e testing very easy since you should be logged out after pushing that button.


No you don't; that's why even big players like Amazon, with their AWS Cognito service (OAuth/OpenID Connect), don't even support revoking access tokens (only refresh tokens).


How about invalidating the user’s refresh token and the public signing key, which forces everyone to refresh and then logs out the hacked account? If it’s really serious, lock the account before doing this so the user can’t log in again.

But yeah, if you have a revoke service, might as well just use session keys.

Edit: typo


Why not just stick your auth token in the cache? It's supposed to expire anyway.

Back in the day we used memcached for our primary store for all sorts of ephemeral things. Including user sessions.


Items are evicted from caches all the time for reasons other than expiry. Memcached, in particular, has "slabs" (spaces for objects of a certain size), and once those slabs are full, items are evicted to make space for new items.


> I still don't think people in the middle need JWTs.

Actual current example: Small company acquired another with a complementary product, but on an entirely different tech-stack and cloud-hosting.

Product owners want customers logged into Service X to be able to follow a link to Service Y and not log in again, and vice-versa.


>If we're talking about a web session, time-limited randomly-generated session tokens that are stored in a DB still work fine.

How is this better than JWT if we have 30 microservices called from front-end?


You have to be pretty big before "store the session information in Redis" doesn't work anymore.


And most of these session mechanisms are easy to work with as middleware in common web app frameworks, making it pretty simple to stick with simpler sessions if every service can get to the session store. Everyone way overcomplicates authn and sometimes barely even thinks about authorization. I have seen many a web app with a poor JWT implementation and abusable authz get broken. Sometimes the apps warranted the JWT implementation, but it is a lot harder than many devs think.


That was a battle I fought with some developer consultancy not long ago. I won't tell the whole story, but I will say that if you have issues with JWTs that are too big due to the number of groups each user has, you probably do need JWTs but you are most definitely doing them wrong, and you should educate yourself or bring in a consultant who at least gets the difference between authentication and authorization.


I made a lot of money in my life because I knew the difference. Oh, and XSS paid the bills for a couple of decades :)


Or just have infrastructure that needs to validate the session in different parts of the continent (world).


As far as system load goes, sure. Not so much uptime: keeping your session in Redis creates a single point of failure. HA/clustered Redis exists but definitely has some associated complexity.


or you have crappy code that can only handle a dozen RPS. [facepalm]


I mean, if anything this just means that session storage won't be your bottleneck.


>Thank you. This middle ground between hyperscaler infrastructure and super simple web apps is where most of my career has been spent, yet the recent trend is to pretend like there are only two possible extremes: You’re either Facebook or you’re not doing anything complicated.

100% this. I am tired of "you don't need microservices, you don't need JWT, you don't need Kubernetes, you don't need ElasticSearch, you don't need IAM, you don't need Redis, you don't need Mongo, and everything should stay in one SQL database."

Things are not being used just because they exist, because people want to be fancy, or because they don't have anything better to do. Things are being used because they solve problems, and do so with the least effort possible.


"Things are being used because they solve problems and do so with least effort possible" in an ideal world sure in the real world there are many factors that influence technical decisions often having nothing to do with actual problem being solved


Having worked at many places over the last 30 years, yes, there is definitely "resume-driven development" where people pick something they want to put on their resume to solve a problem regardless of its suitability to the task in hand.

There's also "blinker-driven development" where people pick the solution based on their own personal set of hammers rather than, again, something more suitable.

(There's loads of these though - e.g. "optimisation-driven development" where the solution MUST GO BRRRRR even if the problem could be fixed by Frank typing in "Yes" once a week. "GOF-driven development" where everything has to rigidly fit into a GOF pattern regardless of whether it actually does. "Go-driven development" where everything has to be an interface and you end up reading a method called Validate which calls an interface method Validate which calls an interface method Validate which calls an interface method Validate and you wake up screaming every morning because why just wtf why please help me pleasehelp)


If I'd find myself in a place where they do "GOF-driven development" or "Go-driven development" I'd search for another job ASAP.

I'm not saying what you describe doesn't happen, but it's my impression that most people try to adopt solutions that minimize costs and development time (which also translates to money). 99% of the time it's not "do the best thing to solve this problem" but "solve this problem as fast as possible, without adding additional costs and using as few developers as possible".


> "solve this problem as fast as possible, without adding additional costs and using as few developers as possible"

Agree with that, but from my experience that's more like 20% of the time. The rest is the various kinds of bullshit development where people are padding resumes, having the boss's pet hobby horse forced on them, chasing the latest shiny flimflam, etc.

(A decent chunk of that 30 years has been contracting and that tends to be at places with problems which might be biasing my sample set.)


Well, in big orgs people get shuffled around teams, so even if you joined a team that is aligned with the way you feel things should be done, you might end up in a totally different environment after a period of time.


> there is definitely "resume-driven development" where people pick something they want to put on their resume to solve a problem regardless of its suitability to the task in hand.

Or people in this industry are geeks and curious and like to try new stuff and technologies just for fun?

That's the case for the majority of people I've worked with.


> like to try new stuff and technologies just for fun?

Sure, but do that at home or on POCs, not in production code.


I can't afford to have an H100 at home.


chuckle ... how far down the Validate rabbit hole did you go?


Thankfully it was only 3 interfaces down.

The whole codebase is riddled with the same kind of layering but we do now have guidance about doing stuff like that ("DON'T") and a plan to simplify some of the worst offenders (like the multi-layer `Validate` hell hole.)


> everything should stay in one SQL database.

At least on different schemas.

That and don't let one concern access data from another, or you'll have to coordinate schema changes between those different concerns.


JWTs really can span this middle ground. They're helpful in answering the who-are-you question without resorting to elaborate DB work. Even middle-ground monoliths are often deployed across more than one independently operating web server (say, JVM processes), and JWTs ensure that each server answers the who-are-you question with the same answer: the code is the same, and although the process space is different on each web server, the answer is the same. So chained requests, with API.REQ.1 to Server 1 and API.REQ.2 to Server 2, will actually work. Maybe session mechanics will work, but what if you don't actually have a session and just a bunch of API requests?
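
To make that concrete, a sketch assuming PyJWT plus the cryptography package; in practice the public key would be distributed via config or a JWKS endpoint rather than generated inline:

    import time
    import jwt  # PyJWT with the cryptography extra, assumed
    from cryptography.hazmat.primitives import serialization
    from cryptography.hazmat.primitives.asymmetric import rsa

    # the issuer (login service) holds the private key...
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    private_pem = key.private_bytes(serialization.Encoding.PEM,
                                    serialization.PrivateFormat.PKCS8,
                                    serialization.NoEncryption())
    # ...every other server only ever sees the public key
    public_pem = key.public_key().public_bytes(serialization.Encoding.PEM,
                                               serialization.PublicFormat.SubjectPublicKeyInfo)

    token = jwt.encode({"sub": "user-42", "exp": int(time.time()) + 300},
                       private_pem, algorithm="RS256")

    # Server 1 and Server 2 both verify locally: no session store, same answer
    print(jwt.decode(token, public_pem, algorithms=["RS256"])["sub"])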


Querying a database for a session id isn’t elaborate work. It’s also trivial because, as TFA mentioned, literally every major web framework ecosystem has a solution for this.

God, how hard is SELECT * WHERE …, seriously.

You need to share a session across websites? Wow! Connect to the database holding the sessions.

Boring.


Ever heard of the speed of light? If you really think "just connect to the same DB" is an easy solution to the problem you describe in the general sense, then you haven't done it in a moderately complex system yet. It can be a good solution for a very limited set of circumstances, but that's about it.


It’s bullshit. A majority of high volume systems can do this just fine. This is just engineering wank.

The standard solution is to query a session table from a single location, and once you actually start to need to trim request time, it's not even the first place you look.


Assumptions you are making:

* everything is located physically close together (good luck reading from a DB table in Singapore from a service in Europe for every request)

* you have few enough client services that want to do this that the number of db connections does not become a problem (this is 100 by default for Postgres, you might need to tune this or deploy proxies, etc)

* high availability is already taken care of (you don't want that one DB server to bring down everything in case of a failure)


Several comments up this branch we were talking about the middle ground, remember.


Yes. It's super common for SaaS companies that are <1% the size of Google to have infrastructure in multiple regions.


It's not a trend. Those on the extreme ends of the spectrum are always the most vocal.


Especially since the middle is way larger than people think.



