Ignoring bulk change commits with Git blame (2019)

BlackFingolfin · on June 30, 2021

I heartily recommend using `tig blame` from https://jonas.github.io/tig/. It lets you navigate the blame history interactively: move to the line of interest and press one key to see details of the commit that last touched it. Not the right one? Press `,` to move to before that commit. Moving past bulk changes and finding the actual origin of a line of code is usually a breeze with this tool.

I don't use tig much otherwise, but for this purpose I've not yet seen a better (and faster!) tool.

cassonmars · on June 30, 2021

This is a dangerous feature for git repository hosts like GitHub to support. It makes it all too easy for subtle vulnerabilities to be introduced and hidden from the visible blame history, with reviewers presuming trust based on historic content. Example: a codebase goes through an audit and is approved; then someone sneaks subtle changes into some files under the guise of a bulk change, adds that commit to the ignore list, and keeps making more commits to other files. Pulling a report of historic changes could then skip the change that made the file vulnerable, because it appears to predate the last audit. Hard pass.

AceJohnny2 · on June 30, 2021

Meh.

In such a case, you'll find that the older code (skipping the bulk change commit) does not have the (mis)feature you're looking for, and you'll have to go the extra step of looking through the ignored commits.

It makes the case of finding changes in some commits a bit harder (but not hard), in exchange for reducing the noise from those same commits. Having been long hampered by ugly codebases that I didn't dare touch for fear of "blame" poisoning, I think this is a net positive overall.

AceJohnny2 · on June 30, 2021

I actually encountered a similar situation years ago, when I tried to bisect a regression. The problem was that old code wasn't testable for my regression, so I had to apply a patch to it to be able to test it automatically. I kept getting nonsensical results out of the bisect, however. I later realized that my patch to make old code testable introduced the very regression I was looking for!

lilyball · on June 30, 2021

`git blame` is not the tool you use to see if a file changed. It’s the tool you use to view per-line changes. If you want to see historic changes you just view the log, which is unaffected by this.

leriksen · on June 30, 2021

That's not a problem caused by git, it's a problem caused by undisciplined developers.

aitchnyu · on June 30, 2021

Black for Python compares the AST before and after formatting as an additional safety measure. Linters could ship with an option to verify whether two inputs (the past and current versions of a file) behave the same and differ only in formatting, import order, etc.
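A minimal sketch of that kind of equivalence check for Python source, using only the standard-library `ast` module. This is not Black's own code, and `same_ast` is a hypothetical name. `ast.dump` leaves out line and column attributes by default, so spacing, line breaks and comments drop out of the comparison.

```python
import ast


def same_ast(old_src: str, new_src: str) -> bool:
    """Return True if two Python sources parse to the same AST.

    ast.dump() omits line/column attributes by default, so pure
    formatting changes (spacing, line breaks, comments) compare equal.
    """
    return ast.dump(ast.parse(old_src)) == ast.dump(ast.parse(new_src))


if __name__ == "__main__":
    before = "def f(a,b):\n\treturn a+b  # add\n"
    after = "def f(a, b):\n    return a + b\n"
    print(same_ast(before, after))                   # formatting only
    print(same_ast(after, after.replace("+", "-")))  # behavior changed
```

Note that reordering imports does change the tree, so an import-sorting linter would need a looser comparison than this one.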

jakub_g · on June 30, 2021

Simple solution: whenever a rev is skipped, add a UI warning somewhere: "3 revisions were skipped due to ignore-rev file"
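A rough sketch of the bookkeeping such a warning would need, assuming the tool already knows which commits touched the file (for example from `git log --format=%H -- <file>`). git's `--ignore-revs-file` expects the same format as `fsck.skipList`: one unabbreviated object name per line, with `#` comments, blank lines and surrounding whitespace ignored. Both function names and the sample hashes are hypothetical.

```python
from typing import List, Optional, Set


def parse_ignore_revs(text: str) -> Set[str]:
    """Parse ignore-revs file content into a set of commit hashes.

    Follows the fsck.skipList rules: text after '#' is a comment,
    and blank lines and leading/trailing whitespace are ignored.
    """
    revs = set()
    for line in text.splitlines():
        rev = line.split("#", 1)[0].strip()
        if rev:
            revs.add(rev.lower())
    return revs


def skipped_warning(file_commits: List[str], ignored: Set[str]) -> Optional[str]:
    """Return a UI warning if any of the file's commits are being ignored."""
    n = sum(1 for sha in file_commits if sha.lower() in ignored)
    if n == 0:
        return None
    noun = "revision was" if n == 1 else "revisions were"
    return f"{n} {noun} skipped due to ignore-rev file"


if __name__ == "__main__":
    sample = "# prettier reformat\n" + "ab" * 20 + "\n"  # hypothetical hash
    ignored = parse_ignore_revs(sample)
    print(skipped_warning(["ab" * 20, "cd" * 20], ignored))
```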

yissp · on June 30, 2021

The one feature of Perforce (which my current employer uses) that I miss when going back to git is the "time lapse" view. It's like git blame, but you also get a slider that scrolls through the revisions of a file. This is sort of similar, but I still think p4's version has the better UX.

acemarke · on June 30, 2021

There's a very nice Git History extension for VS Code that will do pretty much what you're describing:

https://marketplace.visualstudio.com/items?itemName=pomber.g...

srijan4 · on June 30, 2021

Emacs' Magit blame mode also supports recursive blaming: you can effectively step backward and forward in time.

koyote · on June 30, 2021

The inferiority of git tooling was the first thing I noticed when I moved from Perforce to git for large, old, commercial production code (when using git for personal projects, you're basically only using 1% of the features, since you're typically the only committer).

I prefer git's way of handling version control over Perforce's, but there's no denying that P4V is an amazing tool for browsing code with a lot of history.

sdesol · on June 30, 2021

This is a feature I plan to bring to Git navigation in the future. For example, if you go to

https://public-001.gitsense.com/insights/github/repos?p=comm...

and click on the eslintrc.json file, you'll see all the versions of this file from the last year. To quickly iterate through them, click on the version number on the left-hand side.

In the future, I want to create a slider, among other things, to make navigating history insanely easy, since blame, as this blog post points out, can leave out a lot of context.

Disclaimer: I'm the creator of the tool that I linked to above

rendaw · on June 30, 2021

It took me a while to find, but you can run `gitk file`: it shows the file and the revisions that modified it, and you can use the arrow keys to move between revisions.

forrestthewoods · on June 30, 2021

Yup. P4 timelapse view is a killer feature.

grandinj · on June 30, 2021

What I do is run a loop like this:

(1) `git blame <file>`
(2) `git blame <uninterestinghash>^ <file>`

The second step tells blame to start from the parent of the uninteresting commit (that's what the `^` means) and blame from there.

I run that loop repeatedly, picking the most recent hash each time (by date/time), until I get back to the change that interests me.
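The loop can be scripted. A sketch, assuming a hypothetical `walk_back` helper and a caller-supplied `is_uninteresting(sha)` predicate. It shells out to `git blame --porcelain -L n,n`, whose first output line starts with `<sha> <line number in that commit> <line number in the final file>`. One loud assumption: after stepping to `<sha>^` it reuses the line number the line had in the uninteresting commit. That is only a heuristic (like the manual loop, it can land on a neighbouring line if the bulk change inserted or deleted lines above it), and it will fail at the root commit, which has no parent.

```python
import subprocess
from typing import Callable, Tuple

# (commit, line number in that commit, path in that commit)
BlameResult = Tuple[str, int, str]


def git_blame_line(path: str, lineno: int, rev: str) -> BlameResult:
    """Blame one line of `path` as of `rev` via `git blame --porcelain`."""
    out = subprocess.run(
        ["git", "blame", "--porcelain", "-L", f"{lineno},{lineno}", rev, "--", path],
        check=True, capture_output=True, text=True,
    ).stdout.splitlines()
    sha, orig_lineno = out[0].split()[:2]  # "<sha> <orig line> <final line> ..."
    orig_path = path
    for line in out[1:]:
        if line.startswith("filename "):  # path as it was in that commit
            orig_path = line[len("filename "):]
            break
    return sha, int(orig_lineno), orig_path


def walk_back(
    path: str,
    lineno: int,
    rev: str,
    is_uninteresting: Callable[[str], bool],
    blame: Callable[[str, int, str], BlameResult] = git_blame_line,
    max_steps: int = 50,
) -> str:
    """Re-blame from `<sha>^` until the line's commit is an interesting one."""
    for _ in range(max_steps):
        sha, orig_lineno, orig_path = blame(path, lineno, rev)
        if not is_uninteresting(sha):
            return sha
        # HEURISTIC: assume the line sits at the same number in the parent
        # as in the uninteresting commit; this can drift if that commit
        # added or removed lines above it.
        path, lineno, rev = orig_path, orig_lineno, f"{sha}^"
    raise RuntimeError(f"no interesting commit within {max_steps} steps")
```

In a real repository, `walk_back("src/app.py", 120, "HEAD", lambda sha: sha in bulk_shas)` would return the first commit outside a hypothetical `bulk_shas` set. The `blame` parameter exists so the walking logic can be exercised without a repository.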

jacobmischka · on June 30, 2021

This seems strictly worse than the method described in the OP?

grandinj · on June 30, 2021

It's more flexible, in that I can ignore all sorts of other commits that have no bearing on the bug I'm chasing.

Effectively, I'm manually building a "timeline" of the commits that touched that code.

But whatever works for you :-)

Liquid_Fire · on June 30, 2021

You may wish to look into whether your editor/IDE's git integration already offers this feature (I believe git-gui also has it). It's much more efficient than having to copy/paste hashes back and forth and scrolling through the file every time.

Tempest1981 · on June 30, 2021

Also discussed here a few days ago:

https://news.ycombinator.com/item?id=27643608

rektide · on June 30, 2021

Current $job has been switching to prettier, and among many other changes we've been going from (the one true way) tabs to (ignoble, cowardly, top-down-forced) spaces.

I added generating a .gitignorerevs file during these conversions to our process document. What sucks, though, is that there's no standard for what to name these files, and there's no out-of-the-box support. There was a confusing, nasty mess of a thread on how to modify VS Code's ultra-popular GitLens extension to use an ignore-revs file, but 70% of the advertised "works for me" answers were wrong, often in multiple ways. We never found how to open the GUI screens people were showing in screenshots to modify the blame arguments; we had to resort to editing JSON, which scared the crap out of some weakling junior devs with shitty constitutions. The use of ${workingDir} or whatever in the config never worked, as other people later pointed out. It wasn't clear what we needed in the JSON. It was a bloody nightmare, and that was just for the very first mid-level engineer I tried getting this working with.

Holy shit, I wish git would standardize a default --ignore-revs-file so badly.

To those clutching pearls about how this is a vector for people to sneak changes in: tool up. Build safeguards. Holy shit, the complaining and fear and uncertainty and doubt over basic stuff, as if we can't see these changes happening, as if everyone is a hapless idiot who can't notice... I fear the society of losers your attitude suggests. Just get better. Write tools to notice these changes, which you would do already if you looked at your git pulls. I do.
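A small sketch of the process step of recording a bulk-change commit in an ignore-revs file, with a comment saying why it is there. `record_bulk_commit` is a hypothetical helper, and the file name `.git-blame-ignore-revs` used in the demo is a common convention rather than anything git mandates. git itself reads the file when you pass `--ignore-revs-file <file>` or set `git config blame.ignoreRevsFile <file>`.

```python
import re
import tempfile
from pathlib import Path

# --ignore-revs-file wants unabbreviated names: 40 hex chars (SHA-1) or 64 (SHA-256).
_FULL_SHA = re.compile(r"[0-9a-f]{40}|[0-9a-f]{64}")


def record_bulk_commit(ignore_file: Path, sha: str, reason: str) -> bool:
    """Append `sha` with a '# reason' comment; return False if it is already listed."""
    sha = sha.strip().lower()
    if not _FULL_SHA.fullmatch(sha):
        raise ValueError(f"expected a full commit hash, got {sha!r}")
    existing = ignore_file.read_text() if ignore_file.exists() else ""
    listed = {line.split("#", 1)[0].strip().lower() for line in existing.splitlines()}
    if sha in listed:
        return False
    with ignore_file.open("a") as f:
        if existing and not existing.endswith("\n"):
            f.write("\n")  # keep one hash per line
        f.write(f"# {reason}\n{sha}\n")
    return True


if __name__ == "__main__":
    # Demo in a scratch directory with a hypothetical hash; in a real repo
    # you would pass the output of `git rev-parse HEAD` right after the
    # formatting commit.
    path = Path(tempfile.mkdtemp()) / ".git-blame-ignore-revs"
    record_bulk_commit(path, "a" * 40, "prettier: tabs to spaces")
    print(path.read_text())
```

Rejecting short hashes up front matters because git expects full object names in this file, so an abbreviated hash would likely just be ignored rather than flagged.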

WalterGR · on June 30, 2021

We need semantic diff.

AceJohnny2 · on June 30, 2021

Indeed, but parsing has remained a hard (to implement properly) problem [1], and you likely don't want a language-agnostic tool like git to be tightly coupled with a language parser.

Maybe the popularity of language servers can help move this forward.

[1] Even smart people have trouble with it: "Amazingly, surprisingly, counterintuitively, the indentation problem is almost totally orthogonal to parsing and syntax validation." http://steve-yegge.blogspot.com/2008/03/js2-mode-new-javascr...

WorldMaker · on June 30, 2021

I found great results using syntax highlighter token streams for diffs. In my PoC I was using Pygments. It's a great compromise for "almost semantic" diffs. Syntax highlighting tokenizers have great language support, are blazing fast (we use them constantly in real time in IDEs), and work far better in "degenerate" cases that don't entirely parse/compile yet such as work-in-progress code (again because we use them all the time in text editors).
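A sketch of the token-stream idea. WorldMaker's PoC used Pygments; to stay dependency-free, this version swaps in Python's standard-library `tokenize` module, so it only understands Python source. `token_stream` and `token_diff` are hypothetical names. Comments and blank lines are dropped, and indentation tokens are compared by kind only, so a tabs-to-spaces conversion produces an empty diff.

```python
import difflib
import io
import tokenize
from typing import List, Tuple

# Pure layout: dropped entirely.
_SKIP = {tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER}
# Structure matters, exact text (tab vs spaces, "\n" vs "") does not.
_KIND_ONLY = {tokenize.INDENT, tokenize.DEDENT, tokenize.NEWLINE}


def token_stream(src: str) -> List[Tuple[int, str]]:
    """Tokenize Python source into (type, text) pairs, ignoring layout."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type in _SKIP:
            continue
        out.append((tok.type, "" if tok.type in _KIND_ONLY else tok.string))
    return out


def token_diff(old: str, new: str):
    """Return the non-equal opcodes between two token streams, as token text."""
    a, b = token_stream(old), token_stream(new)
    sm = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, [s for _, s in a[i1:i2]], [s for _, s in b[j1:j2]])
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    before = "if ok:\n\tx=1  # tabs\n"
    after = "if ok:\n    x = 1\n"
    print(token_diff(before, after))                    # formatting only
    print(token_diff(after, after.replace("1", "2")))   # a real change
```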

https://github.com/WorldMaker/tokdiff

sdesol · on June 30, 2021

I was looking at https://github.com/github/semantic for providing semantic diffs a while back and I still think it would be a good fit.

By looking at the diff between trees, you can ignore a lot of the extra noise like indentations, spaces and other styling changes.

WalterGR · on June 30, 2021

> Indeed, but parsing has remained a hard (to implement properly) problem

Indeed. But there’s a perfect one (for some definition of perfect) built into every compiler. It’s a shame that more compilers don’t do things like make the AST available.

.NET Compiler Platform (“Roslyn”) is a great counter-example.

pabs3 · on June 30, 2021

There is cregit:

https://github.com/cregit/cregit

https://lwn.net/Articles/698425/

noway421 · on June 29, 2021

A friend of mine has recently been mentioning that it's possible to hide bulk changes when introducing a linter to a repository. This is how! I think it's very useful and not well known.

remirk · on June 30, 2021

If you're interested in who added, modified or deleted a line you could also use the git pickaxe.

> git log -S"Hello World"

shows all commits that either introduced or deleted 'Hello World'.
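The git-log documentation describes `-S<string>` as looking for differences that change the number of occurrences of the string in a file. A toy model of that rule over a hypothetical, oldest-first list of `(commit, file_text)` snapshots (real `git log` lists newest first) makes one consequence visible: a commit that merely moves the string around within the same file is not reported. When you want those too, `-G<regex>` matches added or removed diff lines instead.

```python
from typing import List, Sequence, Tuple


def pickaxe(snapshots: Sequence[Tuple[str, str]], needle: str) -> List[str]:
    """Toy model of `git log -S<needle>` for one file.

    `snapshots` is oldest-first [(commit, file_text), ...]. A commit is
    reported only if it changes how many times `needle` occurs.
    """
    hits = []
    previous = 0  # the string does not exist before the first snapshot
    for commit, text in snapshots:
        count = text.count(needle)
        if count != previous:
            hits.append(commit)
        previous = count
    return hits


if __name__ == "__main__":
    history = [
        ("c1", "print('hi')\n"),
        ("c2", "print('hi')\nprint('Hello World')\n"),  # adds it
        ("c3", "print('Hello World')\nprint('hi')\n"),  # only moves it
        ("c4", "print('hi')\n"),                        # deletes it
    ]
    print(pickaxe(history, "Hello World"))  # ['c2', 'c4']
```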

makeitdouble · on June 30, 2021

This looks like a worthwhile change, helpful in specific situations when used sparingly.

Personally, I see bulk changes as legitimate changes that should appear in the blame history. Just like refactorings, they are supposed to be functionally equivalent, but we all know it's not that easy, so we need to keep an eye on when they were done.

Having massive commits clutter history also looks like a good incentive to avoid massive commits in the first place.

_ugfj · on June 30, 2021

Or you can just run git blameall and see the complete history at once: http://1dan.org/git-blameall/. I still have no idea why this isn't more popular, or part of core git.

erik_seaberg · on June 30, 2021

I’m glad this tooling is improving, but it’s better not to need it. If a trusted contributor has a good reason to rewrite a bunch of code, that is the appropriate time to reformat it; the best bots don’t compare to anyone on my team.

gregoryl · on June 30, 2021

Having had a look around, this seems somewhat unsupported by the common git tools (GitHub, GitLab, ADO, etc.), but otherwise incredibly useful. I'm going to push this into a repo we recently hammered with a `prettier` format!