Give me /events, not webhooks

solatic · on Jan 8, 2022

Why are blog posts like these written as if they are the One True Way?

It's a question of state management. Sometimes, it's easier for the client to manage state. Sometimes, it's easier for the server to manage state. Sometimes, nobody really cares about holes in the state. Sometimes you need at-least-once delivery, maybe-once delivery, exactly-once delivery.

People can design communication architectures for the problems they actually have as long as they appreciate the trade-offs.

OP proposes that it's no big deal for long-polling to hold an open connection. At small scale, sure. At large scale, forget about it. It's a trade-off for not losing webhooks. But what if we don't care about dropped webhooks? If they're an optimization instead of a source of record? I guess OP is OK with that?

stickfigure · on Jan 8, 2022

Perhaps could be written better, but promulgating the idea of "you should consider /events instead of webhooks" is good. Most teams just do webhooks without considering /events.

IMO webhooks are overrated. They have trust issues (requiring signatures or shared secrets), they make code hard to test (because you have to mock the other end), and they bring up versioning issues (client can't advertise the version expected). When I consume webhooks I usually poll anyway and simply use the webhook notification that "Thing ID #123456 changed, poll it". Most of the time I'd be happier polling /events on a minutewise basis.

The bit in the article about long polling seems pointless. If you're accepting webhooks, you're accepting someone else's schedule and it's probably not that urgent. Everyone has a cron-equivalent, it's easy to poll statelessly.

solatic · on Jan 8, 2022

And polling has its own issues. You have no real control over how often clients will poll, so you need some kind of caching layer or some kind of anti-abuse mechanism that keeps track of API keys and returns 429 when a client polls too frequently (but still permit polling from clients that poll at a more respectful, slower rate). That caching or anti-abuse layer has its own engineering cost and represents its own trade-offs.

stickfigure · on Jan 8, 2022

You have no control over how often clients will poll even if you have webhooks. But it's easy enough to cache /events in cheap ephemeral storage, possibly even at the http level. Use Cloudflare if you want cheap. There shouldn't be much of an engineering cost here.

solatic · on Jan 8, 2022

Except that webhooks are often client-private. Consider, for example, GitHub repository webhooks, particularly for private repositories. Storing /events in a global CDN cache is a privacy nightmare. The GitHub API has strict usage quotas and I'm sure that the engineering effort for doing that at GitHub's scale is non-trivial.

stickfigure · on Jan 8, 2022

So you make the API /clients/CLIENTID/events. What's the issue?

Webhooks have engineering issues all their own, including job queues and failure notifications. Stashing events in a table and truncating it every now and then is relatively straightforward.

edem · on Jan 8, 2022

> Why are blog posts like these written as if they are the One True Way?

Because out of all the ways the OP knows this is the best for their use case and they are none the wiser. Happens all the time.

hrishi · on Jan 8, 2022

Because of how titles and attention spans work.

I've wondered this myself for a long time, before realising that titles like these get most of the attention, enough to rise to your feed.

If this was indeed 'Consider using /events', chances are you'll never see it.

achillean · on Jan 8, 2022

We provide both long-polling and webhooks but almost everybody chooses to use webhooks. From what I've seen, a reason is that a lot of other tools/ platforms that you want to integrate with don't support long-polling. They prefer to get notified when something happens instead of keeping a persistent connection.

tonyg · on Jan 8, 2022

A webhook is (ok, squint really hard) a bit like one direction of a TCP connection: state at the sender tracking what the receiver has seen, state at the receiver for rejecting duplicates, logic for retrying and acknowledging receipt.

The state management kind of sucks though, and it isn't sufficiently abstracted over, hence the original article.

So why can't we use actual TCP connections for long-lived relationships? Because kernels make TCP more fragile than it has to be. There's no particular reason TCP connections couldn't outlive process instances, machine reboots, etc., except for the way TCP is implemented in the kernel.

pixl97 · on Jan 8, 2022

> There's no particular reason TCP connections couldn't outlive

Are we forgetting about every awful firewall and piece of garbage middleware box out there?

tonyg · on Jan 8, 2022

I should add that the WebSub (previously PubSubHubBub) spec goes some way toward spackling over the gap between raw webhooks and what's needed for long-lived TCP-like relationships. https://www.w3.org/TR/websub/

dudeinjapan · on Jan 8, 2022

In TableCheck's API we do the same--both webhooks and events. Our clients (mostly Java/"enterprise") tend to choose polling for events.

We also retry webhooks with a backoff timer until we get an HTTP 200 response.

When any event has been successfully delivered (either by webhook or by a polling query) it is flagged as "delivered" and doesn't appear in subsequent polling, unless you add an include_delivered=true param.

tonyg · on Jan 8, 2022

Most of the time, we don't care about messages: we care about replicating (selected portions) of state among stateful entities. The language primitives we've settled on - variants on message passing - don't address this need directly enough. I've been exploring a programming model where state replication is first class. https://syndicate-lang.org/about/ Sorry about the prose style!

tommiegannert · on Jan 8, 2022

Nice! I think I'm working on the same problem, but attacking it from the storage system side (nothing public yet).

I want the abstraction of networked systems to be about where the data is available, rather than who passes which message where.

Are there more people working in this area?

kasperni · on Jan 8, 2022

Thanks, looks super interesting.

danjac · on Jan 8, 2022

I've been experimenting with webhooks using the WebSub protocol[1] for my podcatcher site https://jcasts.io to keep podcast RSS feeds up to date.

The protocol itself is OK (not particularly a fan of using GET to register a new feed, but whatever). However the Google pubsubhubbub service is quite unreliable: feeds often not pinging, with very little visibility (Google's service is something like 99% of WebSub feeds; the next biggest player, Superfeedr, was acquired by Medium a few years ago and has since fallen into disrepair and no longer answers support tickets). WebSub is a good idea but never really gained sufficient mindshare, and the fact that the only significant remaining player in this space is Google hardly inspires confidence in its long-term future.

There's a relatively new service, PodPing[2], which uses some blockchain protocol (Hive) to broadcast feed updates as events. You just connect to a node and listen for events. It works quite well from what I've seen, unfortunately it requires buy-in from the publishers and only a small % of feeds (and more importantly, very few popular feeds) support PodPing.

Ultimately went back to long-polling. I wish there was something better, it's wasteful and difficult to provide up-to-the-minute updates, but it just works.

[1] https://www.w3.org/TR/websub/

[2] https://podcastindex.org/

clone1018 · on Jan 9, 2022

I've never heard of WebSub until you mentioned it. Apparently I was so intrigued by it / your post that I ended up making a little service to provide an alternative to Google's pubsubhubbub. Got it mostly feature complete as of a couple of minutes ago: https://websubhub.com/

Thanks for sharing your struggle!

nyellin · on Jan 8, 2022

Devs prefer webhooks because it's easier to write edge triggered logic than to fetch the full state and calculate the diffs.

The webhook doesn't have to actually be edge triggered though, you just want it to look that way for the user.

I would compare this to tcp/udp vs ip. When writing apps it's convenient to have lower level protocols do the transmission and error handling for you.

I think about this a lot with robusta.dev where we run edge triggered webhooks and actions for Kubernetes. We're letting developers write edge triggered logic, but it's extremely desirable to provide higher level guarantees that triggers will run eventually even if the moment of transition was missed

wruza · on Jan 8, 2022

than to fetch the full state and calculate the diffs

It's rarely a thing. Long polling request usually includes an increasing reference point like last-id or last-event-time, based on which the event source filters events or, if there is none, decides to block until they happen. It's still the same event stream, but with a little persistent counter on a client's side. The idea is that if a client encounters a data loss, it doesn't beg their server to reset some "delivered" flags from yesterday. Iow, webhooks are stateful, event streams are usually stateless.

toomuchtodo · on Jan 8, 2022

Previous discussion: https://news.ycombinator.com/item?id=27823109

alexbouchard · on Jan 8, 2022

I'm all for /events and appreciate the platforms with good support. However, we live in a world where people build event-driven and serverless architecture. The use cases go beyond replication, and webhooks are here to stay.

The thing is, you can get the best of both worlds by using webhooks in conjunction with /events reconciliation. That might seem like a lot of work, but that's what tooling is for. Webhooks are complicated to handle reliably, but it's a problem that has good tools the same way sequin (and many others) is helping developers solve the replication problem.

For webhooks, hookdeck.com (disclaimer, I'm the founder) address entirely the problems stated and will soon offer automatic reconciliation (currently running our polling Beta on with Shopify API)

wruza · on Jan 8, 2022

Funny because in the current b2b fintech project I work at I designed all protocols with /events (long-poll), but later everyone asked for a webhook instead. And their API is full of webhooks. It feels like people who build these systems don't bother to implement event streams.

lloydatkinson · on Jan 8, 2022

What makes you think they don’t implement event streams? The webhook is just an entry point at the edge of their application. It could easily be writing to a queue or some message broker. It’s simply easier when doing eg serverless because it’s hard to keep that long lived connection for indefinite amounts of time versus simply exposing a stateless HTTP API.

wruza · on Jan 8, 2022

This is an interesting idea, but why don't they just use the outside service as a queue, because it works as one, and when it doesn't, this complete direction doesn't work either.

easier when doing eg serverless because it’s hard to keep that long lived connection for indefinite amounts of time versus simply exposing a stateless HTTP API

You mean "functions" or a similar thing? It may be the reason I think. I didn't know that serverless discourages using idle persistent connections.

lloydatkinson · on Jan 8, 2022

It’s not so much that it discourages it’s more of a case of it simply won’t work because most functions have a timeout and are stopped after a few minutes at most.

conradludgate · on Jan 8, 2022

I prefer the idea of using empty webhooks. The message body contains the id of what changed, and a status like "NAME_CHANGED", but not the data that changed. Then paired with the API to retrieve the real data.

Webhooks shouldn't be assumed to be reliable, and that should be clear in the API docs. You can perform your own lookup to these data APIs to confirm the source of truth in your own data

nsteel · on Jan 8, 2022

Strava has an API like that but combines it with an insanely low daily rate-limit which makes retrieving the changed data difficult for anything other than personal apps. But that's totally an issue with their brain-dead policy rather than the idea.

taion · on Jan 8, 2022

The argument here is correct as far as it goes, but the “yes but”s are somewhat severe.

For a minimal MVP deployment, as a client likely you’re not going to have anything like that Kafka setup, but instead have something like a single polling worker, which means you have similar issues with potentially falling behind if that worker can’t keep up with the data. By contrast, with webhooks, you can take advantage of your existing load balancing logic since you’re doing this for incoming HTTP calls anyway.

At scale it seems like you’d need to integrate some kind of ad hoc sharding to this `/events` API, as otherwise you have no ability to scale out the reader. Hopefully a non-issue with third-party API integrations, but there are limits on both ends.

gls2ro · on Jan 8, 2022

I think the answer is not so easy.

For example I would prefer webhooks over /events if the chance of something happening is low. It does not make a lot of sense to keep long pulling when chances of something being return are small.

I think a sane alternative is to have webhooks with ack for example. Where if the client does not ack the push, the API will retry a couple of times with exponential wait time.

Of course if the expectation is to have a lot of events it makes sense to have /events. But somehow I feel /events and webhooks solve different cases or at least as a client I have different expectations.

pictur · on Jan 8, 2022

Is it very difficult to set up a good event listening architecture? It doesn't seem very difficult from the outside, but I would like to read about bad experiences.

conradludgate · on Jan 8, 2022

A good event system is not hard, but a reliable one is hard. Reliability in making sure events are received, processed, not lost and not processed multiple times

alexbouchard · on Jan 8, 2022

That's right. In the context of pushed events, you have very little margin for error. It's definitely "solvable," but a bit part of the problem is that for most tech teams, it's not their bread and butter. Handling webhooks reliably is just overhead and work they aren't putting into their actual product. So you end up with a lot of not-so-reliable implementations.

lloydatkinson · on Jan 8, 2022

From my experience you’d usually have a message queue/broker for distributed and/or event driven architectures. Having incoming events is just a case of another producer in the system writing to the queue.

gremlinsinc · on Jan 9, 2022

Couldn't like...

X api have a cron that pings a webhook w/ a list of event_ids that haven't been 'claimed' and their timestamp of creation. The cron could easily maybe order the hooks by outstanding and ping those webhooks more that take longer to clear, and if there's some major bottleneck for a lengthy period...email/text the devs and let them know their access as been paused pending error review on their side.

e.g.

X pings Y ['events' => ['id', 'timetamp'], ['id' => '...', 'timestamp' => '...']] Y catches it, pops it onto a queue to process 1by1 or by batches. Y => pings X /fetchByIds ['id1', 'id2', 'id3'] Y => inserts data in db, pushes out any notifications, or other jobs.

Y doesn't catch it. X tries again in x mins. X pings Y, but the oldest timestamp > 24 hours, it's flagged as 'dead api', triggering notifications to stakeholders to fix their shit.

Saving headaches on both sides of the event/hooks.

Maybe I'm missing a few steps.. but I can't fathom polling ever being a good thing, except it does kinda work better than sockets in laravel livewire, but that's because it only polls while the tab is in focus, and then only at like 10 sec increments, which is better than leaving open thousands of socket connections.... but that's a totally different use case.

Mileage will vary on this, combining it with really good event sourcing and activity logging of all events should be considered to ensure data integrity - on both sides. The ability to replace events and see timelines can come in super handy when debugging.

mmcnl · on Jan 8, 2022

Ironically a webhook is closer to a true event than polling for events on the /events endpoint.

The idea described in this article seems entirely backwards.

mbbaig · on Jan 8, 2022

Seems like this events table should be used to make webhooks better instead using long polling.

It would be the choice of the implementors whether they want to their webhooks to be ephemeral or not. Sometimes the implementation doesn't call for a super reliable and repeatable webhook.

Although the events table seems like a good default method of webhook implementation.

timokoesters · on Jan 8, 2022

Unfortunately, mobile operating system like Android make the /events approach impossible because the process will just be killed for battery optimization. So pushing is definitely required for those.

The push does not need to contain a lot of data though, so a combination of both pushing and /events is possible too.

rognjen · on Jan 8, 2022

Web hooks are server to server. You can't use web hooks to an app.

wruza · on Jan 8, 2022

Well, exact terminology aside one may see web/app push as a webhook, it's the same principle of delivering messages, but with a different addressing scheme.

But I disagree with the Android part. Androids are the ones who spam my servers with websockets and other things when I restart them, because a user didn't bother to close a page in their browser. Never seen any iOS device doing that.

rzzzt · on Jan 8, 2022

You could use the server on the receiving end of the web hook to send a notification through the platform's push service. Apps can also be reached via server-sent events, a websocket connection or long polling.

rognjen · on Jan 8, 2022

Whenever I've made use of hooks I made basically an /events on my side and saved all of the data that was pushed.

Only after it was saved was it processed.

Besides being able to replay, have a log, etc. that also allowed me to progressively handle more events and then retroactively process them if needed.

kayodelycaon · on Jan 8, 2022

The one major issue I have with webhooks is how badly some places implement them.

One of the systems I deal with just throws a single http request over fence. No retry, no handling of errors, no guarantee their system actually sent it or not. And they rolled their own “basic auth” header. At least they allow ssl. I’m afraid to check the ciphers they support. Just going with the tried and true method of as long as I don’t look, there isn’t a problem.

I end up having to fall back to polling thousands of items one at a time every couple of hours. The only reason I use their webhooks at all is so I have a chance to get “real-time” updates on state changes.

At least it’s a step above sending csv over ftp.

ricardobeat · on Jan 8, 2022

> The advantage of long-polling over websockets is code reuse and simplicity

I really don’t get this. What is simpler in polling than websockets? They both establish a connection via the same route and protocols, only WS can “long poll” after the initial payload for free. There is little reason to replace that with a comet/long poll implementation which is undoubtedly more complex and prone to break.

And what about SSE? Now that is simple.

chrismorgan · on Jan 8, 2022

JMAP’s model (RFC 8620 <https://datatracker.ietf.org/doc/html/rfc8620>) is sound and very robust, though at the cost of not being the simplest possible in all situations. In essence:

• Every type of object has a state string. e.g. in the email domain (RFC 8621), you might have Email at state "12345" and Mailbox at state "678". Think Git commit IDs or SVN commit numbers. From an implementation perspective, you could use just one state string for absolutely everything if you wanted to, it’d just mean that you’d be querying for Mailbox changes when every Email came in; it’s an efficiency thing, not a correctness thing.

• You can ask the server what’s changed since a given time, e.g. Email/changes { sinceState: "12345" } (simplifying pseudosyntax used, the full JSON is longer), and it’ll tell you the IDs that have changed and the new state string, and then you can retrieve the changes for those IDs (in the same HTTP request if you want, using back-references).

• Even queries can have states (so long as the server implements that—all this state stuff is actually optional in practice, just an optimisation thing that you want for almost all types), so that you can efficiently update search results and only fetch records that match a query. (Think an email search getting new messages that match popping up at the top of the list, but without having had to download all the ones that didn’t match to see if they matched along the way.)

• It provides push (section 7) via both ~web hooks (section 7.2) and /events (section 7.3), telling the client “this type that you said you were interested in is now at state such-and-such” and leaving you to decide what to do, rather than the typical web hooks approach of sending a heavy object with some fields you use, some you don’t, and lacking some you wanted so that you probably end up talking to the server again anyway, for each and every entity that changed. (And for Event Source, the client can even send a Last-Event-ID so that the server might be able to skip even the first message of the states.) End result is that it generally takes a little more effort to implement a client, but is actually robust, allows linear processing, and can perform vastly better and more predictably.

• (In practice, that “don’t include any data, just type state metadata” principle PushSubscription goes for is a bit limited for mobile app purposes, where messages get delivered to apps but they may not be able to make requests; so Fastmail’s apps extend it to provide the necessary data about the emails—encrypted so the message broker can’t read it—to make email notifications work.)

JMAP’s model doesn’t apply cleanly to all domains, but it does work for most, and even where it doesn’t quite work I think it’s regularly still worth picking over and seeing if you can apply some of its concepts, because it has distilled some pretty solid practices in areas like Push.

One particularly aspect of the design of JMAP that will be a limitation in some domains is that it’s modelling objects, not events at all or even deltas all that much. JMAP is very thoroughly an object synchronisation protocol. You won’t get a customer.subscription.deleted event like in Stripe’s /events, but rather when it tells you that the CustomerSubscription state has changed and you ask which objects changed it’ll tell you {"destroyed": ["id"]}—or more likely {"updated": ["id"]} and the corresponding get (again, probably sent in the same HTTP request) will report that the field "deleted" has been set to true.

charrondev · on Jan 8, 2022

At $dayJob we do send out webhooks (and have extensive integration into Zapier to make it easier for people to create them).

Additionally though all of the events go into a time series DB that we use to drive our analytics (plus other events). In addition to configurable analytics dashboards in the product we have an API for querying them.

thrower123 · on Jan 8, 2022

The biggest problem I run into with webhook-based systems is that getting security exemptions to make them work is painful in some environments that we deploy into.

People are less paranoid about you making an outbound GET than being open to handle an incoming POST.

nikolay · on Jan 9, 2022

I can't agree more! Webhooks are unreliable and requires both parties to do a very serious job to avoid lost events! Having a feed is the most rational decision!

physicles · on Jan 8, 2022

I’m another dev who built a webhook solution and wishes he had built /events instead. Many of the reasons are similar those raised by the stripe dev who holds top comment in a earlier discussion: https://news.ycombinator.com/item?id=27823109

We also have the (mis?)fortune of needing to push a high volume of data to some subscribers. The tricks required here — gzipping request bodies, multiple push workers — significantly increase complexity and demands on the engineers writing the destination endpoint. It’s much simpler for me to just worry about making sure my own endpoint is fast enough.

To this I’ll add that while Postgres replication slots are pretty amazing, they come with two significant drawbacks (at least when used with 10.4+ logical replication):

1. AFAIK there is no way to specify a retention policy. This means that if a subscriber falls too far behind, the disk fills up with unread logs. If you’re using a SaaS database, this means the DB becomes completely unresponsive and you have to call support. This reason alone makes them too dangerous to use in prod.

2. The way you consume data from the replication slot is totally different from how you consume it in a query. This means maintaining two code paths.

barefeg · on Jan 8, 2022

There was a point about web hooks being ephemeral and the events not. One could argue that it’s easier to comply with GDPR with webhooks. The example shown in the post of a deleted record would have to be removed after some time if the user requests their data to be off the platform

nsteel · on Jan 8, 2022

In terms of GDPR/privacy, is there any issue when posting private data to a 3rd-party server (i.e. webhooks) vs having that 3rd-party query your server? Or can you just say that once a secure webhook subscription is established you are off the hook (pun not intentional)?