The purchasing itself can be important so it can jump on a great price. Maybe it finds what you're looking for at 1am while you're sleeping, for example. Also, if this were a business and you were going to resell the item, the AI could create the listing as soon as the item is purchased.
This reminds me so much of an old World of Warcraft addon I used in 2005-6 or so… I believe it was called ‘bottom feeder’ or something.
Basically, you would leave your character logged in sitting at the auction house. It would observe auctions for a while, and generate pricing data and sales data. Then, you would enable automatic mode, and it would automatically bid/buy any item that someone put up for sale if the price was much lower than normal.
You would leave it running overnight, or whatever, then come back, pick up all the items it bought, and then go back to the auction house and relist everything it bought at the correct price.
Basically, it would snap up auctions created by people who didn’t realize what the correct price should be and listed too cheaply. Since this was an automated system, it could beat any human to the deal.
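For flavor, the core logic was roughly like this (a rough sketch from memory, in Python rather than the Lua the addon actually used; all names are made up):

    from collections import defaultdict
    from statistics import median

    observed_prices = defaultdict(list)   # item name -> prices seen at the AH

    def record_listing(item, price):
        observed_prices[item].append(price)

    def is_bargain(item, price, threshold=0.5, min_samples=20):
        # Only act once we have enough data, and only on listings priced
        # well below the running median for that item.
        prices = observed_prices[item]
        if len(prices) < min_samples:
            return False
        return price < threshold * median(prices)

    # after scanning for a while:
    # if is_bargain(listing.item, listing.price): buy(listing)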
I made a ton of in game currency doing this.
After a few months they changed the auction rules to prevent this… add-ons could no longer directly bid on items, and you had to sit there and click “buy” whenever the script found a good deal. This severely limited the amount you could make with the script.
Basically this mirrors the eBay timeline, for the same reasons I'm guessing… eBay (like WoW) doesn’t want bots collecting arbitrage.
Yes! Sounds fun to me, not sure what you're getting at, it's not supposed to be a business idea!
Just a tool I would enjoy having because I sometimes like to buy used things.
This is just "buy low, sell high" but automated. It is no different than what many humans do every single day, just at a much faster clip and with better processing power. Used car dealerships are a great example. If you think its dumb that humans try to find price mismatches in order to make money...well you may hate the idea of capitalism, which is probably a fair take.
It's definitely real that a lot of smart productive people don't get good results when they use AI to write software.
It's also definitely real that a lot of other smart productive people are more productive when they use it.
These sorts of articles and comments here seem to be saying "I'm proof it can't be done," when really there's enough proof that it can be done that you're just proving you'll be left behind.
I agree with the idea that true agentic AI is far from perfect and is overused in a lot of low or negative ROI contexts... but I'm not convinced that, where the ROI is there, it isn't still worthwhile even if the error rate is high.
Augmented coding, as Kent Beck puts it, is filled with errors, but more and more people are starting to find it to be a 2x+ improvement for most cases.
People are spending too much time arguing that the extreme hype is extremely hyped and cataloging what can't be done, and aren't looking at the massive progress in what can be done.
Also, no one I know uses any of the models in the article at this point. They called out a 50% improvement between models spaced 6 months apart... that's also where some of the hype comes from.
I stumbled into Agentic Coding in VS Code Nightlies with Copilot using Claude Sonnet 4, and I've been silly productive. Even when half my day is meetings, you wouldn't be able to tell from my git history.
My thinking now is removed from the gory details and is a step or two up. How can I validate the changes are working? Can I understand this code? How should it be structured so I can better understand it? Is there more we can add to the AI conventions markdown in the repo to guide the Agent to make fewer mistaken assumptions?
Last night I had a file with 38 mypy errors. I turned it over to the agent and went and had a conversation with my wife for 15 minutes. I came back, it summarized the changes it made and why, I debated one of the changes with it but ultimately decided it was right.
Mypy passed. Good to go.
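For anyone wondering what that class of fix tends to look like, here's a toy sketch (not the actual file, just illustrative): the sort of Optional-narrowing error mypy flags and an agent can clear mechanically.

    from typing import Optional

    def find_age(ages: dict[str, int], name: str) -> Optional[int]:
        return ages.get(name)

    def age_next_year(ages: dict[str, int], name: str) -> int:
        age = find_age(ages, name)
        if age is None:            # the fix: narrow Optional[int] to int
            raise KeyError(name)
        return age + 1             # without the check, mypy errors on this line

    print(age_next_year({"ada": 36}, "ada"))  # 37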
I'm currently trying to get my team to really understand the power here. There are a lot of skeptics, and the AI still isn't perfect, and people who are against the AI era will latch onto that as validation, but it's exactly the opposite of the correct reaction; if anything it's validation, because as a friend of mine says:
"Today is the worst day you will have with this technology for the rest of your life."
> AI discourse would be more effective if we could all see the actual work one another is doing with it
Yes, this is a frequent problem both here and everywhere else. The discussions need to include things like the exact model version, inference parameters, what system prompt you used, what user prompt, what code you gave it, what exactly it replied, and so many more details, as currently almost every comment is "Well, I used Sonnet last week and it worked great" without any details. Not to mention discussions around local models missing basic stuff like what quantization (if any) and what hardware you're running it on. People just write "Wow, fast model" or stuff like that, and call it a day.
Although I understand why: every comment would be huge if everyone always added sufficient context. I don't know the solution to this, but it does frustrate me.
There are many examples of exactly what you're asking for, such as Kenton Varda's Cloudflare OAuth provider [1] and Simon Willison's tools [2]. I see new blog posts like this with detailed explanations of what people did pretty frequently, like Steve Klabnik's recent post [3], which, while not as detailed, has a lot of very concrete facts. There are even more blog posts from prominent devs like antirez about other things they're doing with AI, like rubber-ducking [4], if you're curious how some of the people who say "I used Sonnet last week and it was great" are actually working, because not everyone uses it to write code. I personally don't, because I care a lot about code style.
Maybe I should have been more specific: I was talking about discussions and comments on forums like HN and r/localllama, not saying that people who write blog posts aren't specific enough in their blog posts.
> The discussions need to include things like the exact model version, inference parameters, what system prompt you used, what user prompt, what code you gave it, what exactly it replied, and so many more details, as currently almost every comment is "Well, I used Sonnet last week and it worked great" without any details...Not to mention discussions around local models missing basic stuff like what quantization (if any) and what hardware you're running it on.
While I agree with "more details", the amount of details you're asking for is ... ridiculous. This is a HN comment, not a detailed study.
I'm not asking for anything, nor providing anything as "a solution", just stating a problem. The second paragraph in my comment is quite literally about that.
I feel like that would get tiresome to write, read, and sort through. I don't like everyone's workflow, but if I notice someone making a claim that indicates they might be doing something better than me, then I'm interested.
Maybe keeping your HN profile/gist/repo/webpage up to date would be better.
I don’t know about fixing Python types, but fixing TypeScript types can be very time consuming. A LOT of programming work is like this: not solving anything interesting or difficult, just time-consuming drudgery.
These tools have turned out to be great at this stuff. I don’t think I’ve turned over any interesting problems to an LLM and had it go well, but by using them to take care of drudgery, I have a lot more time to think about the interesting problems.
I would suggest that instead of asking people to post their work, try it out on whatever bullshit tasks you’ve been avoiding. And I specifically mean “tasks”. Stuff where the problem has already been solved a thousand times before.
For me comments are for discussions, not essays - from my perspective you went straight into snark about the parent's coding abilities, which kinda kills any hope of a conversation.
I trust it more with Rust than Python tbh, because with Python you need to make sure it runs every code path as the static analysis isn't as good as clippy + rust-analyzer.
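A toy illustration of what I mean (purely hypothetical code, not from any real project): this passes a casual look and the happy path, and only blows up when the other branch actually runs.

    def summarize(values, as_csv=False):
        if as_csv:
            return ",".join(values)      # TypeError when values are ints
        return sum(values)

    print(summarize([1, 2, 3]))          # 6 -- the exercised path is fine
    try:
        print(summarize([1, 2, 3], as_csv=True))
    except TypeError as err:
        print("only caught at runtime:", err)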
I agree, I've had more luck with various models writing Rust than Python, but only when they have tools available so that one way or another they can run `cargo check` and see the nice errors; otherwise it's pretty equal between the two.
I think the excellent error messages in Rust help humans as much as they help LLMs, but some of the weaker models get misdirected by the "helpful" tips, like when an error message suggests "Why don't you try .clone() here?" but the actual way to address the issue was something else.
That's true, typed languages seem to handle the slop better. One thing I've noticed specifically with Rust, though, is that agents tend to overcomplicate things. They start digging into the gnarlier bits of the language much quicker than they probably need to.
What's your workflow? I've been playing with Claude Code for personal use, usually new projects for experimentation. We have Copilot licenses through work, so I've been playing around with VS Code agent mode for the last week, usually using Sonnet 3.5, 3.7, or o4-mini. This is in a large Go project. It's been abysmal at everything other than tests. I've been trying to figure out if I'm just using the tooling wrong, but I feel like I've tried all the current "best practices": contexts, switching models for planning and coding, rules, better prompting. Nothing's worked so far.
Switch to using Sonnet 4 (it's available in VS Code Insiders for me, at least). I'm not 100% sure, but a GitHub org admin and/or you might need to enable this model in the GitHub web interface.
Write good base instructions for your agent[0][1] and keep them up to date. Have your agent help you write and critique them.
Start tasks by planning with your agent (e.g. "do not write any code."), and have your agent propose 2-3 ways to implement what you want. Jumping straight into something with a big prompt is hit or miss, especially with increased task complexity. Planning also gives your agent a chance to read and understand the context/files/code involved.
Apologies if I'm giving you info you're already aware of.
This is exactly what I was looking for. Thanks! I'm trying to give these tools a fair shot before I judge them. I've had success with detailed prompts and letting the agent jump straight in when working on small/new projects. I'll give more planning prompts a shot.
Do you change models between planning and implementation? I've seen that recommended but it's been hard to judge if that's made a difference.
Sometimes I do planning in stronger models like Gemini 2.5 Pro (started giving o3 a shot at this the past couple days) with all the relevant files in context, but often times I default to Sonnet 4 for everything.
A common pattern is to have the agent write down plans into markdown files (which you can also iterate on) when you get beyond a certain task size. This helps with more complex tasks. For large plans, individual implementation-phase-specific markdown files.
Maybe these projects can provide some assistance and/or inspiration:
I really don't get it. I've tested some agents and they can generate boilerplate. It looks quite impressive if you look at the logs; it actually seems like an autonomous intelligent agent.
But I can run commands on my local Linux box that generate boilerplate in seconds. Why do I need to subscribe to access GPU farms for that? Then the agent gets stuck at some simple bug and goes back and forth saying "yes, I figured it out and solved it now" as it keeps flipping between two broken states.
The rabid prose, the Fly.io post deriding detractors... To me it seems like the same hype as usual. Lots of words about it, the first few steps look super impressive, then it gets stuck banging against a wall. If almost all that is said is prognostication and preaching, and we haven't seen teams and organizations racing ahead on top of this new engine of growth... maybe it can't actually carry loads outside of the demo track?
It can be useful. Does it merit 100 billion dollar outlays and datacenter-cum-nuclear-powerplant projects? I hardly think so.
What commands/progs on your local Linux box? Would love to be able to quantify how inaccurate the LLMs are compared to what people already use for their boilerplate stuff.
I've found the agents incredibly hit and miss. Mostly miss. The likes of Claude Code occasionally does something surprising and it actually works (usually there's a public example it's copied wholly when you research the code it gave you, especially for niche stuff), but then the rest of the time you spend hours wrestling it into submission over something you could do in minutes, all whilst it haemorrhages context sporadically. Even tried adding an additional vector database to the likes of Claude Code to try and get around this, but it's honestly a waste of time in my experiences.
Is it "useless"? For me, yes, probably. I can't find any valid use for an LLM so far in terms of creating new things. What's already been done before? Sure. But why an LLM in that case?
The strangest thing I've seen so far is Claude Code wanting a plugin that copies values from a metadata column in WordPress, triggered by a watcher every five minutes, so it could read them later, instead of just reading the value when relevant. It could not be wrangled into behaving over this and I gave up.
Took me 2 minutes to do the whole thing by hand, and it worked first try (of course—it's PHP—not complicated compared to Verilog and DSP, at which it is spectacularly bad in its output).
It does very odd things in terms of secrets and Cloudflare Workers too.
The solutions it gives are frequently nonsensical, incomplete, or mix syntax from various languages (which it sometimes catches itself on before giving you the artifact), and they are almost always wholly inefficient, taking pointless steps to accomplish a simple task.
Giving Claude Code tutorials, docs, and repos of code is usually a shitshow too. I asked their customer support for a refund weeks ago and have heard nothing. All hype and no substance.
I can see how someone without much dev experience might be impressed by its output, especially if they're only asking it to do incredibly simplistic stuff, for which there's plenty of examples and public discourse on troubleshooting bad code, but once you get into wanting to do new things, I just don't see how anyone could think this is ever going to be viable.
I mucked around with autonomous infrastructure via Claude Code too, and just found that it did absolutely bizarre things that made no sense in terms of managing containers relative to logs, suggesting configurations et al. Better off with dumb scripts with your env vars, secrets et al.
Make sure it writes a requirements and design doc for the change it's gonna make, and review those. And ask it to ask you questions about where there's ambiguity, and to record those responses.
When it has a work plan, track the plan as a checklist that it fills out as it works.
You can also start your conversations by asking it to summarize the codebase.
My experiments with Copilot and Claude Desktop via MCP on the same codebase suggest that Copilot is trimming the context much more than Desktop. Using the same model, the outputs are just less informed.
> Is there more we can add to the AI conventions markdown in the repo to guide the Agent to make fewer mistaken assumptions?
Forgive my ignorance, but is this just a file you're adding to the context of every agent turn, or is this a formal convention in the VS Code Copilot agent? And I'm curious whether there are any resources you used to determine the structure of that document, or if it was just refined over time based on mistakes the AI was repeating.
I just finished writing one. It is essentially the onboarding doc for your project.
It is the same stuff you'd tell a new developer on your team: here are the design docs, here are the tools, the code, and this is how you build and test, and here are the parts you might get hung up on.
In hindsight, it is the doc I should have already written.
Wow, a lot of criticism. I'm considering a similar business. I think this is too expensive when printing this is so easy these days, but charging some small amount per printable coloring book would be very attractive.
The printing aspect of this wasn't too easy... I wish I could charge less but as it stands (especially with surprising API costs) I'm barely making a profit on this.
I have something similar in the building pipeline; yeah, the API costs caught me off-guard too. I knew it was going to be expensive, but this is still pretty high.
Not all fiction is equivalent. When you are actively world-building, making explicit revisions is useful for those trying to follow along. If you'd prefer a different word, you should say so.
I think that’s because the change wasn’t between TOS and TNG. It was between TOS series and TOS films; new style Klingons were in Star Trek 3 (1984) and Star Trek 4 (1986) before TNG started.
Imagine for the sake of argument TOS and TNG are 99% aligned on storytelling and need a patch to make the 1% agree, and that patch is provided in TNG retroactively.
Versus a new show comes along that is 20% aligned and would need an 80% patch to bring it into alignment.
As an executive trying to revitalize a property where fans are complaining about lack of alignment, do you understand why you might just erase the 20% rather than create 80% more of an otherwise-failed project?
>As an executive trying to revitalize a property where fans are complaining about lack of alignment, do you understand why you might just erase the 20% rather than create 80% more of an otherwise-failed project?
If you're an executive, you understand that the purpose of these properties is to make money. Discovery makes money. It has fans. It sells merchandise. You don't erase money because of the rancor of some pedantic nerds; most people do not care.
Cancellation is not the same as "erasure from canon." Lots of popular shows get cancelled all the time - Lower Decks got cancelled as well. Despite the narrative, Discovery was popular. It was a success.
Discovery is still canon. The premise of this thread is a conspiracy theory which is not factual. Star Trek Discovery is still a part of the Star Trek franchise. You can still purchase official Discovery memorabilia. You can still stream it. It has an official Blu-ray release.
My friend has been playing Dead Space on it for the last few days and has been quite happy. I think they released a patch recently. Give it another go if you’re interested!
Oh nice! And thank you, I actually meant to take a look every week at the Steam page, and you're right, they have added the Steam Deck compatible box, and it's been tested / verified now. Amazing.
Yes. The point I took away from this is that this is not at all a focus of most academic settings. This ends up leaving a huge gap and leaving candidates with an academic DS background woefully unprepared and undesirable.
That seems strange to me. People on forums like this often describe Data Science practitioners as "statisticians that can code". If academic Data Science programs aren't emphasizing data engineering as part of their curriculum, what differentiates a Data Science program from statistics or business intelligence?
> If academic Data Science programs aren't emphasizing data engineering as part of their curriculum, what differentiates a Data Science program from statistics or business intelligence?
In my experience, they're emphasizing software-based data work like machine learning, but not the (vital) peripherals like cleaning/studying/loading data or monitoring and sanity-checking outputs.
A data science student might get a process-first task like making predictions from data using KNN, regressions, t-tests, or neural nets, choosing a method and optimizing based on performance. A statistics student might focus on theory, choosing an appropriate analysis method in advance based on the dataset, and reasoning about the effects of error instead of just trying to reduce it.
But the data scientist could still be training on a clean, wholly-theoretical dataset or a highly predictable online-training environment. The result is a lot of entry-level data scientists who are mechanically talented but stymied by real-world hurdles. Issues handling dirty or inconsistent data, for one. But there are a lot of others: a tendency to do analysis in a vacuum, without taking advantage of knowledge about the domain and data source; or judging output effectiveness based on training accuracy, without asking whether the dataset is (and will stay) well-matched to the actual task.
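To make that last point concrete, here's a small made-up scikit-learn example (mine, not from any curriculum): a 1-nearest-neighbour classifier memorizes a noisy training set, so training accuracy looks perfect while held-out accuracy is noticeably worse.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # synthetic, deliberately noisy classification data
    X, y = make_classification(n_samples=1000, n_features=20,
                               flip_y=0.1, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    knn = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
    print("train accuracy:", knn.score(X_tr, y_tr))  # ~1.0 by construction
    print("test accuracy: ", knn.score(X_te, y_te))  # noticeably lower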
I don't mean that to sound dismissive; there are lots of people who do all of that well, even newly-trained. But it does seem to be a common gap in a lot of data science education.
4th year EE undergraduate student here, taking both "Data Analysis/Pattern Rec" and "Computer Vision" electives this term. My early courses prepared me more for a path focused on circuit design, but I jumped ship through exposure to wonderful, wonderful DSP. A lot of what I'm learning now is very new to me, so I appreciate comments like yours that give a sense of potential gaps in my learning. Thank you.
I'm currently working on an assignment for CV in which we extract Histogram of Oriented Gradients features from the CIFAR-10 dataset using Python, then use them to train one of three classifiers (SVM, Gaussian Naive Bayes, Logistic Regression). I had asked about preprocessing, but was told it was outside the scope of this assignment, so we're just using the dataset as-is. :(
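For what it's worth, that kind of pipeline sketches out roughly like this (my own illustration, not the actual assignment; random arrays stand in for the real CIFAR-10 images, and skimage/sklearn are assumed to be available):

    import numpy as np
    from skimage.color import rgb2gray
    from skimage.feature import hog
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    images = rng.random((200, 32, 32, 3))     # stand-in for CIFAR-10 images
    labels = rng.integers(0, 10, size=200)    # stand-in for the 10 classes

    def hog_features(img):
        # grayscale first, then a HOG descriptor for a 32x32 image
        return hog(rgb2gray(img), orientations=9,
                   pixels_per_cell=(8, 8), cells_per_block=(2, 2))

    X = np.array([hog_features(img) for img in images])

    for clf in (LinearSVC(), GaussianNB(), LogisticRegression(max_iter=1000)):
        clf.fit(X, labels)
        # in the real assignment you'd score on the CIFAR-10 test split
        print(type(clf).__name__, "train acc:", round(clf.score(X, labels), 3))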
The nice bit is, I have a research internship coming up in a lab that will have me working on actual datasets, rather than toy examples. And, there's a data science club on campus that has an explicit focus on cleaning data which I plan on regularly attending. So... hopefully I'm on the right track!
Don't worry, when you have real problems you will have time to learn. Most of the time is not even data cleaning, but debugging, getting into the details of the data or code written by somebody else to understand why something is not working (and there's always something that's not working :) ). The main differentiator is whether you have interest / patience for that or not.
I'm not familiar with academic Data Science programs but I've worked with statisticians for over fifteen years and they are usually very involved on the data engineering side. If they aren't running the systems then they are working closely with those people to test and confirm inputs and outputs before running analyses.
> they are working closely with those people to test and confirm inputs and outputs before running analyses
In terms of data science training, at least, this is often a missing element. It's easy to create classroom tasks that focus on teaching how to do analyses and neglect practical aspects like validating data and sanity-checking results. People pick it up on the job, of course, but I wouldn't be surprised if statisticians get a better academic grounding from things like reasoning about uncertainty.
(It's not a problem specific to data science, either. I've heard plenty of complaints about new engineers who are so used to made-up problems that they don't balk at ludicrous data or results when they start doing real work.)
> It's not a problem specific to data science, either. I've heard plenty of complaints about new engineers who are so used to made-up problems that they don't balk at ludicrous data or results when they start doing real work.
This is one of the reasons I think we need to better integrate technology (and general data analysis follows the same reasoning) across the curriculum: an increasing share of work (and a more rapidly increasing share of good-paying work) is knowledge-based work that involves both data analysis and working with people who are doing automation, on top of work that is primarily automation or data analysis. But we don't teach other knowledge skills in relation to automation and analysis, which too often leaves people specialized in automation and analysis and people specialized in domain skills talking to each other over a wide gap, with a lot falling through the cracks.