I don't get the point. Point it at your relevant files, ask it to review and discuss the update, refine its understanding, and then tell it to go.
I have found that more context, comments, and info damage quality on hard problems.
For a long time now I've actually kept two views of my code:
1. The raw code with no empty space or comments.
2. Code with comments
I never give the second to my LLM. The more context you give, the lower its upper end of quality becomes. This is just a habit I've picked up using LLMs every day, hours a day, since GPT-3.5, and it lets me reach farther into extreme complexity.
I suppose I don't know what most people are using LLMs for, but the more complexity your work entails, the less noise you should inject into it. It's tempting to add massive amounts of context, but I've routinely found that fails at the higher levels of coding complexity and uniqueness. It was more apparent in earlier models; newer ones will handle tons of context, you just won't get those upper ends of quality.
Compute-to-information ratio is all that matters. Compute is capped.
> I have found that more context, comments, and info damage quality on hard problems.
There can be diminishing returns, but every time I’ve used Claude Code for a real project I’ve found myself repeating certain things over and over again and interrupting tool usage until I put it in the Claude notes file.
You shouldn’t try to put everything in there all the time, but putting key info in there has been very high ROI for me.
Disclaimer: I’m a casual user, not a hardcore vibe coder. Claude seems much more capable when you follow the happy path of common projects, but gets constantly turned around when you try to use new frameworks and tools and such.
Agreed, I don't love the CLAUDE.md that gets autogenerated. It's too wordy for me to understand and for the model to follow consistently.
I like to write my CLAUDE.md directly, with just a couple paragraphs describing the codebase at a high level, and then I add details as I see the model making mistakes.
Setting hooks has been super helpful for me: you can reject certain uses of tools (don't touch my tests for this session) with just simple scripting code.
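For anyone wondering what that looks like, here's a minimal sketch of such a hook script, assuming Claude Code's PreToolUse hooks pass the pending tool call as JSON on stdin and treat exit code 2 as "block the call and feed stderr back to the model". The tool names and the protected path here are just illustrative:

```python
#!/usr/bin/env python3
# Minimal sketch of a PreToolUse hook that rejects edits to test files.
# Assumption: the hook receives the pending tool call as JSON on stdin,
# and exiting with code 2 blocks it, with stderr shown to the model.
import json
import sys

payload = json.load(sys.stdin)
tool = payload.get("tool_name", "")
file_path = payload.get("tool_input", {}).get("file_path", "")

# "Don't touch my tests for this session": block writes under tests/.
if tool in ("Edit", "Write", "MultiEdit") and "/tests/" in file_path:
    print("Tests are off-limits for this session.", file=sys.stderr)
    sys.exit(2)

sys.exit(0)  # allow everything else
```

You then register it as a PreToolUse command hook in your Claude settings, matched against the file-editing tools.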
A git lint hook has been key. No matter how many times I told it, it lints randomly. Sometimes not at all. Sometimes before running tests (but not after fixing test failures).
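If that means a plain git pre-commit hook: it's just an executable script at .git/hooks/pre-commit that exits non-zero to abort the commit. A minimal sketch in Python, with `ruff check .` standing in for whatever lint command the project actually uses:

```python
#!/usr/bin/env python3
# .git/hooks/pre-commit (must be executable)
# Sketch: refuse the commit whenever the linter reports problems.
# `ruff check .` is a placeholder for the project's real lint command.
import subprocess
import sys

result = subprocess.run(["ruff", "check", "."])
if result.returncode != 0:
    print("Lint failed; commit aborted.", file=sys.stderr)
    sys.exit(1)

sys.exit(0)
```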
> 1. The raw code with no empty space or comments. 2. Code with comments
I like the sound of this but what technique do you use to maintain consistency across both views? Do you have a post-modification script which will strip comments and extraneous empty space after code has been modified?
As I think more on how this could work, I’d treat the fully commented code as the source of truth (SOT).
1. Run the SOT through a processor to strip comments and extra spaces (a rough sketch of that processor follows after this list). Publish to a feature branch.
2. Point Claude at the feature branch and prompt for whatever changes you need. This runs against the minimalist feature branch; the changes will be committed with comments and readable spacing for the new code.
3. Verify code changes meet expectations.
4. Diff the changes from minimal version, and merge only that code into SOT.
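For step 1, assuming the codebase is Python, here's a rough sketch of that processor using the standard tokenize module. It drops # comments and then blank lines; it won't preserve formatting byte-for-byte, and the naive blank-line pass would also eat blank lines inside triple-quoted strings:

```python
import io
import sys
import tokenize

def strip_comments_and_blanks(source: str) -> str:
    """Rough sketch: remove # comments, then blank lines, from Python source."""
    tokens = [
        tok for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type != tokenize.COMMENT
    ]
    rebuilt = tokenize.untokenize(tokens)
    # Drop blank lines and the trailing spaces left where comments used to be.
    return "\n".join(
        line.rstrip() for line in rebuilt.splitlines() if line.strip()
    ) + "\n"

if __name__ == "__main__":
    # Usage (per file): python strip_view.py < module.py > stripped_module.py
    sys.stdout.write(strip_comments_and_blanks(sys.stdin.read()))
```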
1. Run into a problem you and AI can't solve.
2. Drop all comments
3. Restart debug/design session
4. Solve it and save results
5. Revert the code to the commented version and put the update in. (A scripted sketch of steps 2-5 follows after this list.)
If that still doesn't work:
Step 2.5: drop all unrelated code from the context.
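For what it's worth, steps 2 through 5 can be scripted around git. A hedged sketch, assuming a Python codebase, a clean working tree, and a strip_comments_and_blanks() helper like the one sketched above (the strip_view module name is made up):

```python
#!/usr/bin/env python3
# Hedged sketch of the loop above: strip comments on a throwaway branch,
# run the debug session there, save the fix as a patch, switch back.
import pathlib
import subprocess

from strip_view import strip_comments_and_blanks  # hypothetical helper module

def git(*args: str) -> str:
    return subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout

def start_session() -> None:
    """Step 2: strip comments on a throwaway branch and commit that state."""
    git("switch", "-c", "llm-session")
    for path in pathlib.Path(".").rglob("*.py"):  # naive: walks every .py in the repo
        path.write_text(strip_comments_and_blanks(path.read_text()))
    git("commit", "-am", "strip comments for LLM session")
    # Step 3: now run the debug/design session against this branch.

def finish_session() -> None:
    """Steps 4-5: save the fix, then return to the commented branch."""
    # Assumes the session's edits were left uncommitted on llm-session.
    pathlib.Path("session-fix.patch").write_text(git("diff"))
    git("restore", ".")   # drop the stripped-tree edits now that they're saved
    git("switch", "-")    # back to the commented branch
    # Porting session-fix.patch across is usually manual, since its context
    # lines are the stripped code rather than the commented original.
```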
> I have found that more context, comments, and info damage quality on hard problems.
I'm skeptical this is a valid generalization over what was directly observed. [1] We would learn more if they wrote a more detailed account of their observations. [2]
I'd like to draw a parallel to another area of study, possibly unfamiliar to many of us. Anthropology faced similar issues until Geertz's 1970s reform emphasized "thick description" [3], meaning detailed, contextual observations instead of thin generalization.
[1]: I would not draw this generalization. I've found that adding guidelines (on the order of 10k tokens) to my CLAUDE.md has been beneficial across all my conversations. At the same time, I have not constructed anything close to a study of variations of my approach. And the underlying models are a moving target. I will admit that some of my guidelines were added to address issues I saw over a year ago and may be nothing more than vestigial appendages nowadays. This is why I'm reluctant to generalize.
[2]: What kind of "hard problems"? What is meant by "more" exactly? (Going from 250 to 500 tokens? 1000 to 2000? 2500 to 5000? &c) How much overlap exists between the CLAUDE.md content items? How much ambiguity? How much contradiction?
IMO within the documentation .md files the information density should be very high. Higher than trying to shove the entire codebase into context that is for sure.
You definitely don't just push the entire codebase. Previous models required you to be meticulous about your input. A function here, a class there.
Even now if I am working on REALLY hard problems I will still manually copy and paste code sections out for discussion and algorithm designs. Depends on complexity.
This is why I still believe OpenAI's o1-pro was the best model I've ever seen. The amount of compute you could throw at a problem was absurd.
Genuinely curious — how did you isolate the effect of comments/context on model performance from all the other variables that change between sessions (prompt phrasing, model variance, etc)? In other words, how did you validate the hypothesis that "turning off the comments" (assuming you mean stripping them temporarily...) resulted in an objectively superior experience?
What did your comparison process look like? It feels intuitively accurate and validates my anecdotal impression but I'd love to hear the rigor behind your conclusions!
I was already in the habit of copy-pasting relevant code sections to maximize reasoning performance and squeeze more out of earlier, weaker models on stubborn problems. (Still do this on really nasty ones.)
It's also easy to notice that LLMs create garbage comments which get worse over time. I started deleting all comments manually, alongside manual snippet selection, to get max performance.
Then I started just routinely deleting all comments before big problem-solving sessions. I was doing it enough to build some automation.
Maybe high-quality human comments improve ability? Hard to test in a hybrid codebase.