Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Here's apenwarr's description of the same data structure from 2009: https://apenwarr.ca/log/20091004 .

Here's a post to the git mailing list from Martin Uecker describing the same from 2005: https://lore.kernel.org/git/20050416173702.GA12605@macavity/ . From the tone of the email it sounds like he didn't consider the idea new at that point:

> The chunk boundaries should be determined deterministically from local properties of the data. Use a rolling checksum over some small window and split the file it it hits a special value (0). This is what the rsyncable patch to zlib does.

He calls it a merkle hash tree.

Edit: here's one that's one day earlier from C. Scott Ananian: https://lore.kernel.org/git/Pine.LNX.4.61.0504151232160.2763...

> We already have the rsync algorithm which can scan through a file and efficiently tell which existing chunks match (portions of) it, using a rolling checksum. (Here's a refresher: http://samba.anu.edu.au/rsync/tech_report/node2.html ). Why not treat the 'chunk' as the fundamental unit, and compose files from chunks?



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: