> Not sure what the state of the art is in searchable encryption for db indexes, but just trying to do stuff that requires a scan becomes untenable due to having to read and decrypt on the client to find or aggregate it.
There are a lot of different approaches, but the one CipherSweet uses is actually simple.
First, take the HMAC() of the plaintext (or of some pre-determined transformation of the plaintext), with a static key.
Now, throw away most of it, except a few bits. Store those.
Later, when you want to query your database, perform the same operation on your query.
One of two things will happen:
1. Despite most of the bits being discarded, you will find your plaintext.
2. With overwhelming probability, you will also find some false positives. The lookup is still significantly cheaper than a full table scan (O(log N) vs. O(N)), but your library needs to filter the false positives out.
This simple abstraction gives you k-anonymity. The only difficulty is knowing how many bits to keep, which is not trivial and requires knowing the shape of your data.
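For concreteness, here's a minimal sketch of the truncated-HMAC idea in Python. This is not CipherSweet's actual API; the key, the bit count, and the in-memory "table" are all illustrative:

```python
import hashlib
import hmac

# Hypothetical static key and bit count; a real library derives a
# distinct key per indexed field and tunes the bit count to the data.
INDEX_KEY = b"static-index-key"
INDEX_BITS = 16

def blind_index(plaintext: str) -> int:
    """HMAC the plaintext with a static key, then keep only a few bits."""
    digest = hmac.new(INDEX_KEY, plaintext.encode(), hashlib.sha256).digest()
    # Throw away most of the output: keep only the low INDEX_BITS bits.
    return int.from_bytes(digest, "big") & ((1 << INDEX_BITS) - 1)

# Write path: store the truncated index next to the ciphertext.
table = [
    ("ciphertext-1", blind_index("alice@example.com")),
    ("ciphertext-2", blind_index("bob@example.com")),
]

# Read path: recompute the index for the query, fetch the matching rows,
# then decrypt and discard false positives client-side. Unrelated rows
# collide with probability ~2^-16 here, so a few will slip through.
query = blind_index("alice@example.com")
candidates = [ct for (ct, idx) in table if idx == query]
```

In a real database you'd put an ordinary B-tree index on the stored column, which is where the O(log N) lookup comes from.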
I was reading your reply and started thinking, this sounds a lot like what I did to do encrypted search with Bloom filters and indexes. I clicked on the first link and found the exact website I used when researching and building our encrypted search implementation for a health care startup. It worked fabulously well, but it definitely requires a huge amount of insight into your data (and fine-tuning if your data scales beyond your initial assumptions).
That's awesome that AWS has now rolled it into their SDK. I had to custom-build it for our Node.js implementation running with AWS's KMS infrastructure.
Are you the author of the paragonie website? The coincidence was startling. If so, I greatly thank you for the resource.
Edit
After going back and re-reading the blog post, it looks like you are the author. Again, thank you; you were super helpful.
Thanks for providing this; it is an interesting vulnerability to read about.
In terms of JWTs vs. other ways of doing this, is there any evidence that JWTs are more vulnerable than other approaches? Clearly there are vulnerabilities in other approaches as well.
I buy the statement that bearer authentication JWTs are much worse than proof-of-possession JWTs, but are bearer authentication JWTs worse than other bearer authentication approaches? What data would you need to argue that position?
> In terms of JWTs vs. other ways of doing this, is there any evidence that JWTs are more vulnerable than other approaches? Clearly there are vulnerabilities in other approaches as well.
Contrast JWTs with PASETO implementations when you make that sort of analysis.
i.e., pick any that support v3/v4 and try to attack them the same ways JWT implementations have been vulnerable, or worse: https://paseto.io
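To make that concrete, the canonical JWT failure mode is in-band algorithm negotiation: the verifier trusts the token's own "alg" header. Here's a toy Python sketch of the forgery (the naive_verify function is hypothetical, but it mirrors real vulnerabilities in early JWT libraries):

```python
import base64
import json

def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(data: str) -> bytes:
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))

# Forge a token that claims to be unsigned. No key material required.
header = b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode())
claims = b64url_encode(json.dumps({"sub": "admin"}).encode())
forged = f"{header}.{claims}."  # empty signature segment

def naive_verify(token: str) -> dict:
    """A broken verifier that trusts the attacker-controlled header."""
    header_b64, claims_b64, _sig = token.split(".")
    if json.loads(b64url_decode(header_b64))["alg"] == "none":
        return json.loads(b64url_decode(claims_b64))  # no check at all
    raise NotImplementedError("HS256/RS256 verification elided")

print(naive_verify(forged))  # {'sub': 'admin'} -- accepted unsigned
```

PASETO closes this class of bug by design: the version prefix (e.g. v4.public) pins the algorithm, so there is nothing in-band for the attacker to negotiate.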
Thanks for sharing this. I do a lot of work in this area and I had not come across PASETO before. It is an exciting project.
The nonce is especially nice because it gives the token enough entropy that, if only the signature leaks, an attacker can't brute-force the full token. This isn't always true of OIDC JWTs.
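A toy sketch of why that matters (modeling the leaked signature as a plain hash of the token; real tokens use proper signatures, but verification is just as deterministic for anyone holding the public key):

```python
import hashlib
import json
import os
import time

def fingerprint(token: bytes) -> str:
    # Stand-in for the leaked signature: anything an attacker can
    # recompute deterministically for a candidate token.
    return hashlib.sha256(token).hexdigest()

# Low-entropy token: every claim is guessable within a small window.
iat = int(time.time())
leaked = fingerprint(json.dumps({"sub": "alice", "iat": iat}).encode())

# The attacker enumerates plausible timestamps and recovers the token.
for guess in range(iat - 3600, iat + 1):
    candidate = json.dumps({"sub": "alice", "iat": guess}).encode()
    if fingerprint(candidate) == leaked:
        print("recovered full token:", candidate.decode())
        break

# With a 256-bit random nonce, the same attack needs ~2^256 guesses.
salted = json.dumps(
    {"sub": "alice", "iat": iat, "nonce": os.urandom(32).hex()}
).encode()
```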
> Packagist is mostly a graveyard of libraries that people have coded up for their exact niche use-case 8 years ago (with no updates since), with little flexibility beyond that (like you would see for libraries in other ecosystems).
Could you provide some examples? I've written a lot of PHP libraries over the years, and while some are definitely stale because nobody uses them (and thus I have no incentive to keep the lights on), the only dead packages I see are forks of other open source software to which people contributed exactly one commit (to change the package name).
> One insurance company pays, they all benefit. Companies would simply wait for someone else to pay. There would need to be some kind of government agency forcing all of them to pay in.
Sure, with some caveats: you can scope it down to only very large companies quite easily.
> They would pass the costs to customers. In essence, your suggestion leads to a Cyber tax collected by the IRS.
That's a bit oversimplified, I think. To the companies affected by the regulation, sure. Not every company would need to buy in. (At least, I hope not. Mom and pop shops aren't exactly flush with cash!)
> Next they have to divvy up the funding to pay OSS developers. Now you have the same problem you started with.
It's the spirit of the same problem, but the distribution is different.
Before, it's "make the US government funnel taxpayer dollars into OSS directly". Now it's "the US government forces megacorps to buy insurance, and the insurance companies figure out how to minimize risk by investing in the supply chain" one layer removed.
One reason why this might be better than the original version of the problem is that you can simply (but not easily; nothing in politics is ever easy) reproduce the same regulations and insurance business models in other countries, and the load is now balanced across the globe. Then the whims of individual countries' leadership are no longer a single point of failure.
In the abstract, you are correct. But the details matter.
Of course, I could be wrong. I'm not an expert on policy, economics, or law.
> What I don’t understand is why you are proposing a model (insurance) that doesn’t work in practice, and is susceptible to high levels of corruption. Why this would work differently in the case of open source.
The legal/regulatory problems here are valid concerns, but the model is a bit different.
The primary corruption in insurance has a lot to do with insurers' ability to deny paying for what they ought to cover. That's a problem. The incentives at play pretty much guarantee it will always be a problem, for which strong regulations are necessary. We don't have strong regulations in the USA for, e.g., health insurance, so I can understand why the word "insurance" is unattractive.
> Why this would work differently in the case of open source.
Great question.
The very incentive that makes insurance highly corrupt is what I'm proposing be leveraged to benefit open source developers.
A hypothetical insurance company would want to minimize its downside (paying money out) in order to maximize profits, because that's the economic system we live in today. The model I'm proposing is that investing in "the supply chain" would provide resources to offset risk.
On the other side, companies will want to minimize their spend on insurance. An insurance provider may offer reduced rates for companies that demonstrate some measurable commitment to security and responsible data handling practices (i.e., not collecting data they don't need in the first place, in case a breach does occur).
This insurance provides a currently absent mechanism for security assessors to effect positive change that protects the rest of us, even if the company doesn't actually want to put in the effort.
I've argued in a blog post [1] that we need to delineate between "open source developer" and "supplier". If we don't do that, calling thankless unpaid volunteers and hobbyists a "supply chain" is kind of insulting [2].
I don't believe that "identity verification" for F/OSS developers is a good idea. Suppliers? Sure. That can be a contract negotiation when you decide how much you pay for it.
Also, I don't think identity verification helps when your adversary is a nation state, which can just falsify government identification if it suits them.
Just because it can be beaten doesn't mean making it harder isn't useful. This person/team used a VPN. Masking your location is a big red flag for dev work like this. These things could be exposed in the UI.
People are so used to seeing artificial bureaucratic structures as more real than their real counterparts that they constantly invent such naive solutions: “Just make the gub'ment provide an official paper (with a stamp) saying that Joe Random Dude is a real developer, a father of two, and not a fan of satanic metal music, and the project will be safe.”
The VPN is just part of the picture (sock puppet accounts complaining about the speed of development, no meaningful history of other contributions from the dev, no trusted "personal network" for the dev, etc.) that in hindsight should have raised red flags.
If they are constantly on a VPN and unwilling to disclose a real location or IP, then I fail to see why they should be trusted when they don't provide anything trustworthy themselves.