
For the first two, git is based on content addressable storage, so it makes sense that anything that is ever public will never disappear.
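For context, "content addressable" means an object's ID is a hash of its bytes, so identical content always has the same ID everywhere. A minimal sketch of how git derives a blob's ID, mirroring what `git hash-object` computes (standard library only):

```python
import hashlib

def git_blob_hash(data: bytes) -> str:
    # Git serializes a file's contents as b"blob <size>\x00<content>"
    # and addresses the object by the SHA-1 of that serialization.
    header = f"blob {len(data)}\x00".encode()
    return hashlib.sha1(header + data).hexdigest()

# Matches `git hash-object` on a file containing "hello\n"
print(git_blob_hash(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

Because the ID is derived from the content itself, anyone who ever saw the content can recompute the ID, and anyone holding the ID can ask any host that still stores the object for it.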

I can sympathize with someone who gets bit by it, as it might not have occurred to them, but it’s part of the model.

The third strikes me as counter-intuitive and hard to reason about.

P.S. If you publish your keys or access tokens for well known services to GitHub and you are prominent enough, they will be found and exploited in minutes. The idea that deleting the repository is a security measure is not really worth taking seriously.



> For the first two, git is based on content addressable storage, so it makes sense that anything that is ever public will never disappear.

This isn't quite right.

Content-addressable storage is just a means of access. It does:

- not imply content cannot be deleted

- not imply content cannot be access managed

You could apply this to a git repo itself (like making some branches private and some not), but more importantly, forks are not git operations; they are higher-level GitHub operations and could very well have appropriate safeguards to make sure this cannot happen.

e.g. if GitHub had implemented forks as a `git clone`, _none of these vulnerabilities would have been a thing_

Similarly, implementing different access rights for different subsets of a fork network (or even the same git repo) technically isn't a problem either (not trivial, but quite doable).

And I mean, commits made to private repositories becoming public is always a security vulnerability, no matter how much GitHub claims it's intended.


You're right that I shouldn't have given the impression that content addressed storage means as a technical matter that public content must never disappear. The phrasing was a bit sloppy. GitHub could, as a technical matter, choose to hide content that had previously been made public.

Nonetheless, given that GitHub exists to facilitate anonymously pulling the entire history of a repository, and given that any forks would contain the full contents of that repository, it is very natural that GitHub would take the "once public always public" line.

> and I mean commits made to private repositories being public is always a security vulnerability no matter how much github claims it's intended

I specifically said the third use case was different, because it is the one that doesn't involve you explicitly choosing to publish the commits that contain your private information. I did not and would not defend GitHub on that point.


> it is very natural that GitHub would take the "once public always public" line

I don’t think that follows at all. Purging hashes without a link to a commit/repository would be pretty natural.
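For what it's worth, "purging hashes without a link to a commit/repository" is essentially the reachability walk that git's own garbage collector performs. A hypothetical sketch (names are illustrative, not GitHub internals): treat branch tips as roots, walk parent links, and anything never visited is eligible for deletion.

```python
def unreachable_commits(parents: dict, tips: set) -> set:
    """Return commits not reachable from any branch tip.

    `parents` maps commit id -> list of parent ids; `tips` are branch heads.
    A host that deleted these would simply stop serving them by hash.
    """
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return set(parents) - seen

# "c" was committed on a branch that was later deleted: nothing links to it.
history = {"a": [], "b": ["a"], "c": ["a"]}
print(unreachable_commits(history, {"b"}))  # {'c'}
```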


I agree the 3rd is by far the worst of the offenders. But even the first two should have more visibility. For example, by warning users when they delete a forked repo that its data will still be available.

The exact UX here is debatable, but I don't think security warnings buried in the docs are enough. They should be accounting for likely misunderstandings of the model.


Even if it wasn't forked, it could be cloned. Should that be part of the warning?

I wouldn't mind a disclaimer when you delete a repository that any information that repository ever contained is likely to have already been downloaded and stored. Per the comment I added, I'm not sure it would really help that much, but it would not be harmful.


> Should that be part of the warning?

It couldn't hurt, but that isn't the misunderstanding I'm worried about.

As described in the first example of the article, you can make a fork, commit to it, delete your entire fork, and yet the data will still be accessible via the parent repo, even though no one ever forked or cloned or saw your fork. That is not intuitive at all.

You can say "Well just consider any data that has ever been public compromised forever", and indeed you should, but this behavior is still surprising and could bite devs even if they know they should follow the advice in that quote.

Consider a situation like this...

Dev forks, accidentally pushes a secret or some proprietary code in a commit, and immediately deletes the fork. They figure it was only up for a very short time, now it's gone, and the risk that someone saw it is low. They don't bother rotating, because that would be a major operational pain (and yes, it shouldn't be, but for many orgs it is).

Is this dev making a mistake? Of course. That's not good security thinking. But their assessment of the risk being low might actually be correct if their very reasonable mental model of deletion were correct. But the unintuitive way GH works means that the actual risk is much higher than their reasoning led them to believe.


> It couldn't hurt, but that isn't the misunderstanding I'm worried about.

I think lots of warnings lead to people ignoring the warnings. So it could hurt by making people less aware of other warnings.


> As described in the first example of the article, you can make a fork, commit to it, delete your entire fork, and yet the data will still be accessible via the parent repo, even though no one ever forked or cloned or saw your fork. That is not intuitive at all.

But isn't that only the third vulnerability, that private forks are implicitly made public?

As I said, I won't defend that decision.


> For the first two, git is based on content addressable storage, so it makes sense that anything that is ever public will never disappear.

No one can, with a straight face, say that they don’t restrict access because “this is just how the technology works”. Doesn’t matter if it is content addressable or an append-only FS or whatever else.

Even for some technology where the data lives forever somewhere (it doesn't in Git itself, which garbage-collects unreachable objects; GitHub runs a system that keeps non-transitively-referenced commits from being garbage collected), the non-crazy thing is to put access policy logic behind the raw storage fetch.
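A toy illustration of that point, with hypothetical names (nothing here reflects GitHub's actual internals): the store stays content-addressed, but every fetch still passes through a policy check.

```python
class GuardedObjectStore:
    """Content-addressed blobs with an ACL consulted before every fetch."""

    def __init__(self):
        self.blobs = {}    # object hash -> content
        self.readers = {}  # object hash -> set of users allowed to read

    def put(self, obj_hash: str, content: bytes, readers: set) -> None:
        self.blobs[obj_hash] = content
        self.readers[obj_hash] = set(readers)

    def get(self, user: str, obj_hash: str) -> bytes:
        # Content addressing says nothing about who may fetch;
        # that is the policy layer's decision.
        if user not in self.readers.get(obj_hash, set()):
            raise PermissionError(f"{user} may not read {obj_hash}")
        return self.blobs[obj_hash]

store = GuardedObjectStore()
store.put("abc123", b"secret config", readers={"alice"})
print(store.get("alice", "abc123"))  # b'secret config'
# store.get("mallory", "abc123") would raise PermissionError
```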


> git is based on content addressable storage, so it makes sense that anything that is ever public will never disappear.

No. That doesn't make sense. It only sounds vaguely plausible at first because content addressable storage often means a distributed system where hosting nodes are controlled by multiple parties. That's not the case here; we're only talking about one host.

Imagine we were talking about a (hypothetical) Netflix CDN where content is addressed by hash rather than by UUID. Would anyone say "they forgot to check auth tokens for Frozen for one day, therefore it makes sense that everyone can watch it for free forever"?


Since Netflix neither allows anonymous users to fully download Frozen without DRM, nor allows authorized users to upload derivative works that are then redistributed to the public, I think there may be some relevant differences here.


They do remove content when their licence expires, though. So imagine instead Netflix allowing users to find and watch expired series by hash, then telling the copyright owners they can't fully delete the series because something something content-addressing.

