Outside of disciplines that use LaTeX, the ability of authors to do typesetting is pretty limited. And there are other typesetting requirements that no consumer tool makes particularly easy; for instance, due to funding requirements, many journals deposit biomedical papers with PubMed Central, which wants them in JATS XML. So publishers have to prepare a structured XML version of papers.
Accessibility in PDFs is also very difficult. I'm not sure any publishers are yet meeting PDF/UA-2 requirements for tagged PDFs, which include things like embedding MathML representations of all mathematics so screenreaders can parse the math. LaTeX only supports this experimentally, and few other tools support it at all.
At least in my experience, grad students don't pay submission fees. It usually comes out of an institutional finances account, typically assigned to the student's advisor (who is generally the corresponding author on the submission). (Not that the waiver isn't a good idea — I just don't think the grad students are the ones who would feel relieved by that arrangement.)
Also, I'm pretty sure my SIG requires LaTeX submissions anyway... I feel like I remember reading that at some point when I submitted once, but I'm not confident in that recollection.
> Outside of disciplines that use LaTeX, the ability of authors to do typesetting is pretty limited.
Since this is obviously true, and yet most journals (with some exceptions) demand you follow tedious formatting requirements or highly restrictive templates, it follows that journals are outsourcing the vast majority of their typesetting and formatting to submitters, and doing only the bare minimum themselves.
Most of the tedious formatting requirements do not match what the final typeset article looks like. The requirements are instead theoretically to benefit peer reviewers, e.g., by having double-spaced lines so they can write their comments on the paper copy that was mailed to them back when the submission guidelines were written in the 1950s.
The smarter journals have started accepting submissions in any format on the first round, and then only require enough formatting for the typesetters to do their job.
For my area, everybody uses LaTeX styles that more or less produce PDFs identical to the final versions published in proceedings. Or, at least, it's always looked close enough to me that I haven't noticed any significant differences, other than some additional information in the margins.
It didn't "survey" devs. It paid them to complete real tasks while they were randomly assigned to use AI or not, and measured the actual time taken to complete the tasks rather than just their perception of it. That is much higher-quality evidence than a convenience sample of developers who just report their perceptions.
Sure, if you're learning to write and want lots of examples of a particular style, LLMs can generate that for you. Just don't assume that is a normal writing style, or that it matches a particular genre (say, workplace communication, or academic writing, or whatever).
Our experience (https://arxiv.org/abs/2410.16107) is that LLMs like GPT-4o have a particular writing style, including both vocabulary and distinct grammatical features, regardless of the type of text they're prompted with. The style is informationally dense, features longer words, and favors certain grammatical structures (like participles; GPT-4o loooooves participles).
With Llama we're able to compare base and instruction-tuned models, and it's the instruction-tuned models that show the biggest differences. Evidently the AI companies are (deliberately or not) introducing particular writing styles with their instruction-tuning process. I'd like to get access to more base models to compare and figure out why.
Go vibe check Kimi-K2. One of the weirdest models out there now, and it's open weights - with both "base" and "instruct" versions available.
The language it uses is peculiar. It's like the entire model is a little bit ESL.
I suspect that this pattern comes from SFT and RLHF, not the optimizer or the base architecture or the pre-training dataset choices, and the base model itself would perform much more "in line" with other base models. But I could be wrong.
Goes to show just how "entangled" those AIs are, and how easy it is to affect them in unexpected ways with training. Base models have a vast set of "styles" and "language usage patterns" they could draw from - but instruct-tuning makes a certain set of base model features into the "default" persona, shaping the writing style this AI would use down the line.
I definitely know what you mean; each model definitely has its own style. I find myself mentally framing them as horses with different personalities and riding quirks.
Still, perhaps saying "copy" was a bit misleading. "Influence" would have been a more precise way of putting it. After all, there is no such thing as a "normal" writing style in the first place.
So long as you communicate with anything or anyone, I find people will naturally absorb the parts they like, most of the time without even noticing.
I don't think the AI companies are systematically working to make their models sound more human. They're working to make them better at specific tasks, but the writing styles are, if anything, even more strange as they advance.
Comparing base and instruction-tuned models, the base models are vaguely human in style, while instruction-tuned models systematically prefer certain types of grammar and style features. (For example, GPT-4o loves participial clauses and nominalizations.) https://arxiv.org/abs/2410.16107
When I've looked at more recent models like o3, there are other style shifts. The newer OpenAI models increasingly use bold, bulleted lists, and headings -- much more than, say, GPT-3.5 did.
So you get what you optimize for. OpenAI wants short, punchy, bulleted answers that sound authoritative, and that's what they get. But that's not how humans write, and so it'll remain easy to spot AI writing.
That's interesting. I had not heard that. I wonder, though, whether making them sound more human and making them better at specific tasks are mutually exclusive. (Or if perhaps making them sound more human is in fact also a valid task.)
In our studies of ChatGPT's grammatical style (https://arxiv.org/abs/2410.16107), it really loves past and present participial phrases (2-5x more usage than humans). I didn't see any here in a glance through the lightfastness section, though I didn't try running the whole article through spaCy to check. In any case it doesn't trip my mental ChatGPT detector either; it reads more like classic SEO writing you'd see all over blogs in the 20-teens.
edit: yeah, ran it through our style feature tagger and nothing jumps out. Low rate of nominalizations (ChatGPT loves those), only a few present participles, "that" as subject at a usual rate, usual number of adverbs, etc. (See table 3 of the paper.) No contractions, which is unusual for normal human writing but common when assuming a more formal tone. I think the author has just affected a particular style, perhaps deliberately.
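For anyone curious what that kind of spot check looks like, here's a minimal sketch using spaCy's part-of-speech and dependency tags as crude proxies for a few of those features. This is my own simplification for illustration, not the actual tagger from the paper, and it assumes the small English model has been downloaded.

    # Needs: pip install spacy && python -m spacy download en_core_web_sm
    import spacy
    from collections import Counter

    nlp = spacy.load("en_core_web_sm")

    def rough_style_counts(text):
        # Count a few style features using Penn Treebank tags as rough proxies.
        doc = nlp(text)
        counts = Counter()
        for tok in doc:
            if tok.tag_ == "VBG":      # present participle / gerund
                counts["present_participle"] += 1
            elif tok.tag_ == "VBN":    # past participle
                counts["past_participle"] += 1
            if tok.lower_ == "that" and tok.dep_ == "nsubj":
                counts["that_as_subject"] += 1
            if tok.pos_ == "ADV":
                counts["adverb"] += 1
        counts["tokens"] = len(doc)
        return counts

    print(rough_style_counts("Having tuned the model, we keep seeing dense, participle-heavy prose."))

You'd then normalize the counts per 1,000 tokens or so before comparing a text against human baselines.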
Tangent, but I'm curious about how your style feature tagger got "no contractions" when the article is full of them. Just in the first couple of paras we have it's, that's, I've, I'd...
Probably because the article uses the Unicode right single quotation mark instead of apostrophes, due to some automated smart-quote machinery. I'll have to adjust the tagger to handle those.
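For what it's worth, a minimal sketch of that adjustment, assuming the contraction check is a simple pattern match (the regex and function name here are my own illustration, not the actual tagger): normalize U+2019 to a plain apostrophe before matching.

    import re

    RIGHT_SINGLE_QUOTE = "\u2019"  # the curly apostrophe that smart-quote tools emit

    def find_contractions(text):
        # Treat the curly quote as an ordinary apostrophe before matching.
        normalized = text.replace(RIGHT_SINGLE_QUOTE, "'")
        return re.findall(r"\b\w+'(?:s|t|re|ve|ll|d|m)\b", normalized)

    print(find_contractions("It\u2019s odd that it's missed; I\u2019ve seen I\u2019d and that\u2019s too."))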
If the output is interpreting sources rather than just regurgitating quotes from them, you need to exert judgment to verify they support its claims. When the LLM output is about some highly technical subject, it can require expert knowledge just to judge whether the source supports the claims.
Courts have always had the power to compel parties to a current case to preserve evidence. (For example, this was an issue in the Google monopoly case, since Google employees were using chats set to erase after 24 hours.) That becomes an issue in the discovery phase, well after the defendant has an opportunity to file a motion to dismiss. So a case with no specific allegation of wrongdoing would already be dismissed.
The power does not extend to any of your hypotheticals, which are not about active cases. Courts do not accept cases on the grounds that some bad thing might happen in the future; the plaintiff must show some concrete harm has already occurred. The only thing different here is how much potential evidence OpenAI has been asked to retain.
> Courts have always had the power to compel parties to a current case to preserve evidence.
Not just that: even without a specific court order, parties to existing or reasonably anticipated litigation have a legal obligation, which attaches immediately, to preserve evidence. Courts tend to issue orders when a party presents reason to believe another party is out of compliance with that automatic obligation, or when there is a dispute over the extent of the obligation. (In this case, both factors seem to be in play.)
Lopez v. Apple (2024) seems to be a recent and useful example of this; my lay understanding is that Apple was found to have failed in its duty to switch from auto-deletion (even if that auto-deletion was contractually promised to users) to an evidence-preservation level of retention, immediately when litigation was filed.
Perhaps the larger lesson here is: if you don't want your service provider to end up being required to retain your private queries, there's really no way to guarantee it, and the only real mitigation is to choose a service provider who's less likely to be sued!
So if Amazon sues Google, claiming that it is being disadvantaged in search rankings, a court should be able to force Google to log all search activity, even when users delete it?
Maybe you misunderstood. The data is required to be retained, but there is no requirement to make it accessible to the opposition. OpenAI already has this data and presumably mines it themselves.
Courts generally require far more data to be retained than shared, even if this particular request is more lopsided than usual.
If Amazon sues Google, a legal obligation to preserve all evidence reasonably related to the subject of the suit attaches immediately when Google becomes aware of the suit, and, yes, if there is a dispute about the extent of that obligation and/or Google's actual or planned compliance with it, the court can issue an order relating to it.
> At Google's scale, what would be the hosting costs of this I wonder. Very expensive after a certain point, I would guess.
Which would be chump change[0] compared to the costs of an actual trial with multiple lawyers/law firms, expert witnesses and the infrastructure to support the legal team before, during and after trial.
> It can be just anonymised search history in this case.
Depending on the exact issues in the case, a court might allow that (more likely, it would allow only turning over anonymized data in discovery, if the issues were such that there was no clear need for more), but generally the obligation to preserve evidence does not include the right to edit evidence or replace it with reduced-information substitutes.
We found out that one was a bad idea back in the early days of the web, when AOL thought "what could the harm be?" about turning over anonymised search queries to researchers.
That sounds impossible to do well enough without being accused of tampering with evidence.
Just erasing the userid isn’t enough to actually anonymize the data, and if you scrubbed location data and entities out of the logs you might have violated the court order.
Though it might be in our best interests as a society, we should probably be honest about the risks of this tradeoff; anonymization isn't some magic wand.
So then the courts need to find who is setting their chats to be deleted and order them to stop. Or find specific infringing chatters and order OpenAI to preserve these specified users' logs. OpenAI is doing the responsible thing here.
OpenAI is the custodian of the user data, so they are responsible. If you wanted the court (i.e., the plaintiffs) to find specific infringing chatters, first they'd have to get the data from OpenAI to find who it is -- which is exactly what they're trying to do, and why OpenAI is being told to preserve the data so they can review it.
So the courts should start ordering all ISPs, browsers, and OSs to log all browsing and chat activity going forward, so they can find out which people are doing bad things on the internet.
However, if the ISP, for instance, is sued, then it immediately becomes illegal (even without a separate court order) for it to knowingly destroy evidence in its custody relevant to the issue for which it is being sued. If there is a dispute about its handling of particular such evidence, a court can and will order it specifically to preserve relevant evidence as necessary. And, with or without a court order, its destruction of relevant evidence once it knows of the suit can be the basis of both punitive sanctions and adverse findings in the case to which the evidence would have been relevant.
If those entities were custodians in charge of the data at hand in the court case, the court would order that.
This post appears to be full of people who aren't actually angry at the results of this case, but angry at how the US legal system has been working for decades, possibly centuries, since I don't know when this precedent was first set.
What privacy specifically? The courts have always been able to compel people to recount things they know which could include a conversation between you and your plumber if it was somehow related to a case.
The company records and uses this stuff internally, retention is about keeping information accurate and accessible.
Lawsuits allow, in a limited context, the sharing of non-public information held by the individuals/companies in the lawsuit. But once you submit something to OpenAI, it's now their information, not just your information.
I think that some of the people here dislike (or are alarmed by) the way that the court can compel parties to retain data which would otherwise have vanished into the ether.
> I think that some of the people here dislike (or are alarmed by) the way that the court can compel parties to retain data which would otherwise have vanished into the ether.
Maybe so, but this has been the case for hundreds of years.
After all, how on earth do you propose getting a fair hearing if the other party is allowed to destroy the evidence you asked for in your papers?
Because this is what would happen:
You: Your Honour, please ask the other party to turn over all their invoices for the period in question
Other Party: We will turn over only those invoices we have
*Other party goes back to the office and deletes everything.
The thing is, once a party in a suit asks for a certain piece of evidence, the other party can't turn around and say "Our policy is to delete everything, and our policy trumps the orders of this court".
I think your points are all valid, but… On the other hand, this sort of preservation does substantially reduce user privacy, disclosing personal information to unauthorized parties, with no guarantees of security, no audits, and few safeguards.
This is much more concerning (from a privacy perspective) than a company using cookies to track which pages on a website users have visited.
> On the other hand, this sort of preservation does substantially reduce user privacy,
Yes, that's by design and already hundreds of years old in practice.
You cannot refuse a court evidence to protect your or anyone else's privacy.
I see no reason to make an exception for rich and powerful companies.
I don't want a party to a suit having the ability to suppress evidence due to privacy concerns. There is no privacy once you get to a civil court other than what the court, at its discretion, allows, such as anonymisation.
I disagree, because the information has already been recorded, and users don't have a say in who is "authorized" to view it, whether that's someone at the company or some random 3rd party the company sells that data to.
It's the collection itself that's the problem, not how soon it's deleted once it's economically worthless.
> with no guarantees of security, no audits, and few safeguards.
The courts pay far more attention to that stuff than profit maximizing entities like OpenAI.
I agree that your assessment of the legal state of play is likely accurate. That said, it is one thing for data to be cached in the short term, and entirely different for it to be permanently stored and then sent out to parties with which the user has only a distant and likely adversarial relationship.
There are many situations in which the deletion/destruction of ‘worthless’ data is treated as a security protection. The one that comes to mind is how some countries destroy fingerprint data after it has been used for the creation of a biometric passport. Do you really think this is a futile act?
>”The courts pay far more attention to that stuff than profit maximizing entities like OpenAI.”
I would be interested to see evidence of this. The courts claim to value data security, but I have never seen an audit of discovery-related data storage, and I suspect there are substantial vulnerabilities in the legal system, including the law firms. Can a user hold the court or opposing law firm financially accountable if they fail to safeguard this data? I’ve never seen this happen.
> That said it is one thing for data to be cached in the short-term
Cached data isn't necessarily subject to data retention in the first place. Just because an ISP has parts of a message in some buffer doesn't mean it's considered a recording of that data. If Google never stores queries beyond what's needed to serve a response, then it likely wouldn't qualify.
Also, it's on the entity providing data for the discovery process to do redaction as appropriate. The only way it ends up at the other end is if it gets sent in the first place. There can be a lot of back and forth here, and as evidence that the courts care: https://www.law.cornell.edu/rules/frcp/rule_5.2
That is helpful, thanks, but I think it is not practical to redact LLM request information beyond the GDPR personally identifiable standards without just deleting everything. My (admittedly quick) read of those rules is that their ‘redacted’ information would still be readily identifiable anyway (not directly, but using basic data analysis). Their redaction standards for CC# and SIN are downright pathetic, and allow for easy recovery with modern techniques.
It's not an "invasion of privacy" for a company who already had data to be prohibited from destroying it when they are sued in a case where that data is evidence.
Yeah, sure. But understanding the legal system tells us the players and what systems exist that we might be mad at.
For me, one company being obligated to retain business records during civil litigation against another company, with review happening within the normal discovery process, is tolerable. Considering that the alternative is lawlessness, I'm fine with it.
Companies that make business records out of invading privacy? They, IMO, deserve the fury of 1000 suns.
If you cared about your privacy, why are you handing all this stuff to Sam Altman? Did he represent that OpenAI would be privacy-preserving? Have they taken any technical steps to avoid this scenario?
> So the courts should start ordering all ISPs, browsers, and OSs to log all browsing and chat activity going forward, so they can find out which people are doing bad things on the internet.
Not "all", just the ones involved in a current suit. They already routinely do this anway (Party A is involved in a suit and is ordered to retain any and all evidence for the duration of the trial, starting from the first knowledge that Party A had of the trial).
You are mischaracterising what happens; you are presenting it as "Any court, at any time, can order any party who is not involved in any suit in that court to forever hold user data".
Or you didn't read what the other comment wrote, or you're just arguing in bad faith, which is even weirder because they were only explaining how the system has always worked.
> So then the courts need to find who is setting their chats to be deleted and order them to stop.
No, actually, it doesn't. Ordering a party to stop destroying evidence relevant to a current case (which is its obligation even without a court order) irrespective of whether someone else asks it to destroy that evidence is both within the well-established power of the court, and routine.
> Or find specific infringing chatters and order OpenAI to preserve these specified users’ logs.
Under this theory, if a company had employees shredding incriminating documents at night, the court would have to name those employees before ordering them to stop.
That is ridiculous. The company itself receives that order, and is IMMEDIATELY legally required to comply - from the CEO to the newest-hired member of the cleaning staff.
The Times does not need user logs to prove such a thing, if it were true. The Times can show that it is possible by showing how its own users can access the text. Why would they need other users' data?
Because it's a copyright infringement case, the existence and scale of the infringement are relevant to both whether there is liability and, if so, how much; the issue isn't that it is possible for infringement to occur.
> We don't even know if Times uses AI to get information from other sources either
which is irrelevant at this stage. It's a legal principle that both sides can fairly discover evidence. As finding out how much OpenAI has infringed copyright is pretty critical to the case, they need to find out.
After all, if it's only once or twice, that's a couple of dollars; if it's millions of times, that's hundreds of millions.
For the most part (there are a few exceptions), in the US lawsuits are not based on "possible" harm but actual observed harm. To show that, you need actual observed user behavior.
The argument seems to be that for an expert programmer, who is capable of reading and understanding AI agent code output and merging it into a codebase, AI agents are great.
Question: If everyone uses AI to code, how does someone become an expert capable of carefully reading and understanding code and acting as an editor to an AI?
The expert skills needed to be an editor -- reading code, understanding its implications, knowing what approaches are likely to cause problems, recognizing patterns that can be refactored, knowing where likely problems lie and how to test them, holding a complex codebase in memory and knowing where to find things -- currently come from long experience writing code.
But a novice who outsources their thinking to an LLM or an agent (or both) will never develop those skills on their own. So where will the experts come from?
I think of this because of my job as a professor; many of the homework assignments we use to develop thinking skills are now obsolete because LLMs can do them, permitting the students to pass without thinking. Perhaps there is another way to develop the skills, but I don't know what it is, and in the meantime I'm not sure how novices will learn to become experts.
> Question: If everyone uses AI to code, how does someone become an expert capable of carefully reading and understanding code and acting as an editor to an AI?
Well, if everyone uses a calculator, how do we learn math?
Basically, force students to do it by hand long enough that they understand the essentials. Introduce LLMs at a point similar to when you allow students to use a calculator.
> Well, if everyone uses a calculator, how do we learn math?
Calculators have made most people a lot worse in arithmetic. Many people, for instance, don't even grasp what a "30%" discount is. I mean other than "it's a discount" and "it's a bigger discount than 20% and lower than 40%". I have seen examples where people don't grasp that 30% is roughly one third. It's just a discount, they trust it.
GPS navigation has made most people a lot worse at reading maps or generally knowing where they are. I have multiple examples where I would say something like "well we need to go west, it's late in the day so the sun will show us west" and people would just not believe me. Or where someone would follow their GPS on their smartphone around a building to come back 10m behind where they started, without even realising that the GPS was making them walk the long way around the building.
Not sure the calculator is a good example to say "tools don't make people worse with the core knowledge".
GPS has also ruined our city-level spatial awareness.
Before, you had the map. So you were aware that Fitzroy was to the west of Collingwood and both were south of Clifton Hill and so on. I had dozens of these suburbs roughly mapped out in my mind.
Driving down an unfamiliar road, one could use signs to these suburbs as a guide. I might not know exactly where I was, but I had enough of an idea to point me in the right direction.
> Before, you had the map. So you were aware that Fitzroy was to the west of Collingwood and both were south of Clifton Hill and so on. I had dozens of these suburbs roughly mapped out in my mind.
> Driving down an unfamiliar road, one could use signs to these suburbs as a guide. I might not know exactly where I was, but I had enough of an idea to point me in the right direction.
Reading those sentences feels like I am dreaming.
The exploration...
The possibilities...
Serendipitously finding you way through and getting temporarily lost at night in a big friendly suburban area with trees and in summer...
But how important is the core knowledge if it isn't necessary to achieve the outcomes people actually value? People only cared about map reading skills to the extent that it got them where they want to go. Once GPS became a thing, especially GPS on mobile phones, getting them where they want to go via map reading became irrelevant. Yes, there are corner cases where map reading or general direction finding skills are useful, but GPS does a vastly better and quicker job in the large majority of cases so our general way-finding experience has improved.
This is especially true because the general past alternative to using GPS to find some new unfamiliar place wasn't "read a map" it was "don't go there in favor of going some place you already knew" in a lot of cases. I remember the pre-GPS era, and my experience in finding new stuff is significantly better today than it was back then.
Using map reading skills as a proxy for this is a bit of a strawman. People who use GPS habitually have worse navigational and spatial awareness skills.
If you habitually use a calculator for all arithmetic, could the result not be similar? What if you reach to an LLM for all your coding, general research, etc.? These tools may vastly speed up some workflows, but your brain is a muscle.
I think you're missing the point, which is to say "those tools make us more productive, but less knowledgeable".
And you answer by saying "it's okay to be less knowledgeable (and hence depend on the tool), as long as you are more productive". Which is a different question.
But to me it's obviously not desirable: if AI allows people to completely lose all sense of critical thinking, I think it's extremely dangerous. Because whoever controls the AI controls those people. And right now, look at the techbros who control the AIs.
So the original question is: is it the case that AI reduces the skills of the people who use it? The calculator and the GPS are examples given to suggest that this isn't unlikely.
At the end of the day, it's the average productivity across a population that matters.
So GPS makes people worse at orienteering -- on average, does it get everyone where they need to go, better / faster / easier?
Sometimes, the answer is admittedly no. Google + Facebook + TikTok certainly made us less informed when they cannibalized reporting (news media origination) without creating a replacement.
But on average, I'd say calculators did make the population more mathematically productive.
After all, lots of people sucked at math before them too.
> After all, lots of people sucked at math before them too.
A calculator doesn't do maths, it does arithmetic. People sucked at maths, but I'm pretty sure they were better with arithmetic.
> At the end of the day, it's the average productivity across a population that matters.
You're pushing my example too far. My point is that AI may actually make the average developer worse. Sure, also more productive. So it will reinforce the trend that has been in the software industry for more than a decade: produce more, but worse, software.
Productivity explains why we do it. It doesn't mean it is desirable.
> And no, that's not some slight of verbal hand in measuring "productive" -- they are able to ship more value, faster.
"Ship more value faster" is exactly a verbal sleight of hand. That's the statement used by every bad product manager and finance asshole to advocate for shipping out broken code faster. It's "more value" because more code is more content, but without some form of quality guard rails you run into situations where everything breaks. I've been on teams just like that, where suddenly everything collapses and people get mad.
Do you think compilers helped teams ship more value faster from worse developers? IDEs with autocomplete? Linters?
At the end of the day, coders are being paid money to produce something.
It's not art -- it's a machine that works and does a thing.
We can do that in ways that create a greater or lesser maintenance burden, but it's still functional.
Detractors of LLM coding tools are manufacturing reasons to avoid using another tool that helps them write code.
They need to get over the misconception of what the job is. As another comment previously quipped 'If you want to write artisanal, hand-tuned assembly that's beautiful, do that on your own time for a hobby project.'
> Do you think compilers helped teams ship more value faster from worse developers? IDEs with autocomplete? Linters?
I'm tired of engaging with this false equivalence so I won't. Deterministic systems are not the same.
> It's not art -- it's a machine that works and does a thing.
That's right. But what you need to understand is that the machines we create can and do actively harm people. Leaking secure information, creating software that breaks systems and takes down critical infrastructure. We are engineers first and foremost and artists second. And that means designing systems to be robust and safe. If you can't understand that then you shouldn't be an engineer and should kindly fuck off.
There is a big difference with compilers. With compilers, the developer still needs to write every single line of code. There is a clear and unambiguous contract between the source code and what gets executed (if it's ambiguous, it is a bug).
The thread here was talking about:
> Well, if everyone uses a calculator, how do we learn math?
The question being whether or not AI will make developers worse at understanding what their code is doing. You can say that "it's okay if a website fails every 100 times, the user will just refresh and we're still more profitable". But wouldn't you agree that such a website is objectively of worse quality? It's cheaper, for sure.
Said differently: would you fly in a plane for which the autopilot was vibe coded? If not, it tells you something about the quality of the code.
Do we always want better code? I don't know. What I see is that the trend is enshittification: more profit, worse products. I don't want that.
> [With compilers] There is a clear an unambiguous contract between the source code and what gets executed
Debatable in practice. You can't tell me you believe most developers understand what their compiler is doing, to a level of unambiguity.
Whether something gets unrolled, vectorized, or NOP-padded is mysterious. Hell, even memory management is mysterious in VM-based languages now.
And yes (to the inevitable follow-up) still deterministic, but those are things that developers used to have to know, now they don't, and the world keeps spinning.
> You can say that "it's okay if a website fails every 100 times, the user will just refresh and we're still more profitable". But wouldn't you agree that such a website is objectively of worse quality? It's cheaper, for sure.
I would say that's the reality we've been living in since ~2005. How often do SaaS products have bugs? How frequently do mobile apps ship a broken feature?
There are two components here: (1) value/utility & (2) cost/time.
There are many websites out there that can easily take a 1 in 100 error rate and still be useful.
But! If such a website, by dint of its shitty design, can be built with 1/100th of the resources (or 100x websites can be built with the same), then that might be a broader win.
Not every piece of code needs to fly in space or run nuclear reactors. (Some does! And it should always have much higher standards)
> Said differently: would you fly in a plane for which the autopilot was vibe coded? If not, it tells you something about the quality of the code.
I flew in a Boeing 737 MAX. To the above, that's a domain that should have called for higher software standards, but based on the incident rate I had no issue doing so.
> Do we always want better code? I don't know. What I see is that the trend is enshittification: more profit, worse products. I don't want that.
The ultimate tradeoff is between (expensive/less, better code) and (cheaper/more, worse code).
If everything takes a minimum amount of cost/effort, then some things will never be built. If that minimum cost/effort decreases, then they can be.
You and I are of like mind regarding enshittification and declining software/product standards, but I don't think standing in front of the technological advancement train is going to slow it.
If a thing can be built more cheaply, someone will do it. And then competitors will be forced to cheapen their product as well.
Imho, the better way to fight enshittification is creating business models that reward quality (and scale).
> You and I are of like mind regarding enshittification and declining software/product standards, but I don't think standing in front of the technological advancement train is going to slow it.
Note that I'm well aware that I won't change anything. I'm really just saying that AI will help the trend of making most software become worse. It sucks, but that's how it is :-).
The glass half-full would be that effective AI coding tools (read: more competent than a minimal cost human) may actually improve average software quality!
I suppose it depends on how quickly the generative effectiveness improves.
> I'm suggesting you consider it from an objective perspective.
What is objective? That profitability is good? We're destroying our environment to the point where many of us will die from it for the sake of profitability. We're over-using limited natural resources for the sake of profitability. In my book that's not desirable at all.
Companies are profit-maximising machines. The path to more profitability tends to be enshittification: the company makes more money by making it worse for everybody. AI most definitely requires more resources and it seems like those resources will be used to do more, but of lower quality.
Surely that's profitable. But I don't think it is desirable.
I'm unconvinced that calculators have made most people a lot worse in arithmetic. There have always been people who are bad at math. It's likely there are fewer people who can quickly perform long division on paper, but it's also possible the average person is _more_ numerate because they can play around with a calculator and quickly build intuition.
Arithmetic is near-useless if you have access to a calculator. It's also a completely different skill than reasoning about numbers, which is a very useful skill.
But, logically, you need to spend time thinking about numbers to be good at reasoning about them, and the calculator is about reducing that time.
I feel there's a bit of a paradox, with many subjects, where we all know the basics are the absolute most important thing, but when we see the basics taught in the real world, it seems insultingly trivial.
I understand what you're saying, but I am legitimately unconvinced that learning long division by hand is necessary to master division. If anything, perhaps we should be asking children to derive arithmetic from use of a calculator.
I think it’s pretty hard to reason about numbers without having mastered arithmetic. Or at least beat your brain against it long enough that you understand the concepts even if you don’t have all the facts memorized.
I disagree; I think the focus on arithmetic actually enables people to say they're "bad at math" when symbolic reasoning is a completely different (and arguably much easier) skill. You can easily learn algebra without knowing long division.
Hell, if I had to do long division today without a computer I'd have to re-derive it.
I don't think it's so much about doing long division. To me, it's more about having an intuition that 30/100 is roughly "one third", and that three thirds make up the whole.
And I don't mean specifically those numbers, obviously. Same goes with 20/100, or understanding orders of magnitudes, etc.
Many people will solve a "maths problem" with their calculator, end up with a result that says that "the frog is moving at 21km/s" and not realise that it doesn't make any sense. "Well I applied the recipe, the calculator gave me this number, I assume this number is correct".
It's not only arithmetic of course, but it's part of it. Some kind of basic intuition about maths. Just look at what people were saying during Covid. I have heard so many people say completely wrong stuff because they just don't have a clue when they see a graph. And then they vote.
I agree you can learn algebra without knowing (or being good at) long division on paper, but you need to have a good conceptual understanding of what division is and I don't think a lot of people get that without the rote process of doing it over and over in elementary school.
I can do plenty of arithmetic much faster than I could type it on a calculator keypad. That's like saying hardware keyboards are near-useless if you have access to a touchscreen.
Would you be able to do your numerical work without understanding what an addition or a subtraction is?
I feel like arithmetic is part of the basics to build abstraction. If I say "y = 3x + a", somewhere I have to understand what "3 times x" means and what the "+" means, right?
Or are you saying that you can teach someone to do advanced maths without having a clue about arithmetic?
Sure there have always been people bad at math. But basic arithmetic is not really math. We used to drill it into kids but we no longer do so and I can usually see the difference between generations. For example, women in my mother’s generation were not prioritised for education but they often are pretty quick at arithmetic. But kids and young adults I come across pull out their phones for basic additions and divisions. And I find myself pulling out my phone more and more often.
I mean, it's not the end of the world, and as you've said, the raw number of numerate people is rising thanks to technology. But technology also seems to rob people of the motivation to learn somewhat useful skills, and even more so with LLMs.
For instance, you can certainly say that 381/7 is a positive number. And if I say "381/7 = 198", you can easily say that it is clearly wrong, e.g. because you immediately see that ~200 is roughly half of ~400, so it cannot be anywhere close to 1/7th.
I believe that this is an acquired skill that requires basic arithmetic. But if you need a calculator to realise that 381 is roughly twice as big as 198, then you can't do any of the reasoning above.
One may say "yeah but the point of the calculator is to not have to do the reasoning above", but I disagree. In life, we don't go around with a calculator trying to find links between stuff, like "there are 17 trees in this street, 30 cars, what happens if I do 17+30? Or 30-17? Or 30*17?". But if you have some intuition about numbers, you can often make more informed decisions ("I need to wait in one of those lines for the airport security check. This line is twice as long but is divided between three officers at the end, whereas this short line goes to only one officer. Which one is likely to be faster?").
I see what you're saying, but I just don't care that much about numbers to draw any conclusions you did about the figure you presented. I just see a string of digits.
Try standing in line at a grocery store and listening to people get upset because the amount is much higher than they thought it would be. You will hear statements like "But how is it $43? I didn't buy anything that costs more than $5"
People that failed to grasp arithmetic cannot reason about numbers to a useful degree.
> People that failed to grasp arithmetic cannot reason about numbers to a useful degree.
I think you're extrapolating far too much from such a simple interaction, which doesn't imply anything about ability to reason about numbers, just their ability to compute addition. If you say "if a is larger than b, and b is larger than c, is a larger than c?", you're testing numerical reasoning ability.
I'm not confused. A calculator does arithmetic, not maths. The question was:
> Well, if everyone uses a calculator, how do we learn math?
Which doesn't make much sense, because a calculator doesn't do maths. So I answered the question that does make sense: if everyone uses a calculator, do we still learn arithmetic? And I believe we don't.
And then, if we suck at basic arithmetic, it makes it harder to be good at maths.
But somehow I was born in the age of GPS and yet I ended up with a strong mental map and navigation skills.
I suspect there will be plenty of people who grow up in the age of LLMs and maybe by reading so much generated code, or just coding things themselves for practice, will not have a hard time learning solid coding skills. It may be easy to generate slop, but it’s also easy to access high quality guidance.
If calculators were unreliable... Well, we'd be screwed if everyone blindly trusted them and never learned math.
They'd also be a whole lot less useful. Calculators are great because they always do exactly what you tell them. It's the same with compilers, almost: imagine if your C compiler did the right thing 99.9% of the time, but would make inexplicable errors 0.1% of the time, even on code that had previously worked correctly. And then CPython worked 99.9% of the time, except it was compiled by a C compiler working 99.9% of the time, ...
But bringing it back on-topic, in a world where software is AI-generated, and tests are AI-generated (because they're repetitive, and QA is low-status), and user complaints are all fielded by chat-bots (because that's cheaper than outsourcing), I don't see how anyone develops any expertise, or how things keep working.
While I agree with your suggestion, the comparison does not hold: calculators do not tell you which numbers to input and compute. With an LLM you can just ask vaguely and get an often passable result.
Then figure out how to structure the assignment to make students show their work. If a student doesn't understand the concept, it will show in how they prompt AI.
For example, you could require that students submit all logs of AI conversations, and show all changes they made to the code produced.
For example, yesterday I asked ChatGPT how to add a copy-to-clipboard button in MudBlazor. It told me the button didn't exist, and then wrote the component for me. That saved me a bunch of research, but I needed to refactor the code for various reasons.
So, if this was for an assignment, I could turn in both my log from ChatGPT, and then show the changes I made to the code ChatGPT provided.
> a novice who outsources their thinking to an LLM or an agent (or both) will never develop those skills on their own. So where will the experts come from?
Well, if you’re a novice, don’t do that. I learn things from LLMs all the time. I get them to solve a problem that I’m pretty sure can be solved using some API that I’m only vaguely aware of, and when they solve it, I read the code so I can understand it. Then, almost always, I pick it apart and refactor it.
Hell, just yesterday I was curious about how signals work under the hood, so I had an LLM give me a simple example, then we picked it apart. These things can be amazing tutors if you’re curious. I’m insatiably curious, so I’m learning a lot.
Junior engineers should not vibe code. They should use LLMs as pair programmers to learn. If they don’t, that’s on them. Is it a dicey situation? Yeah. But there’s no turning back the clock. This is the world we have. They still have a path if they want it and have curiosity.
I agree, and it sounds like you're getting great results, but they're all going to do it. Ask anyone who grades their homework.
Heck, it's even common among expert users. Here's a study that interviewed scientists who use LLMs to assist with tasks in their research: https://doi.org/10.1145/3706598.3713668
Only a few interviewees said they read the code through to verify it does what they intend. The most common strategy was to just run the code and see if it appears to do the right thing, then declare victory. Scientific codebases rarely have unit tests, so this was purely a visual inspection of output, not any kind of verification.
> Junior engineers should not vibe code. They should use LLMs as pair programmers to learn. If they don’t, that’s on them. Is it a dicey situation? Yeah. But there’s no turning back the clock. This is the world we have. They still have a path if they want it and have curiosity.
Except it's impossible to follow your curiosity when everything in the world is pushing against it (unless you are already financially independent and only programming for fun). Junior developers compete in one of the most brutal labor markets in the world, and their deliverables are more about getting things done on time than doing things better. What they "should" do goes out the window once you step out of privilege and look at the real choices.
There is absolutely a thing where self-motivated autodidacts can benefit massively more from these new tools than people who prefer structured education.
Paradoxically, those self-motivated autodidacts have to be free of the stress and pressure of delivering things on time, so they are largely limited to recreational programmers who don't have as much skin in the game in the first place.
Trust me, I am under enormous stress and pressure right now, more than at any other time in my life. I’m not someone sitting on a mountaintop, free of all cares. I’m someone trapped in a box that’s sinking under the waves, desperately trying to find a way to escape.
I approach problems with curiosity because I know that this is the only way I’ll find a way to survive and thrive again.
This reminds me of Isaac Asimov's "Profession" short story. Most people receive their ability (and their matching assigned profession, thus the name) from a computer. They then are able to do the necessary tasks for their job, but they can't advance the art in any way. A few people aren't compatible with this technology, and they instead learn to do things themselves, which is fortunate because it's the only way to advance the arts.
Deliberate practice, which may take a form different from productive work.
I believe it's important for students to learn how to write data structures at some point. Red black trees, various heaps, etc. Students should write and understand these, even though almost nobody will ever implement one on the job.
Analogously electrical engineers learn how to use conservation laws and Ohm's law to compute various circuit properties. Professionals use simulation software for this most of the time, but learning the inner workings is important for students.
The same pattern is true of LLMs. Students should learn how to write code, but soon the code will write itself and professionals will be prompting models instead. In 5-10 years none of this will matter though because the models will do nearly everything.
I agree with all of this. But it's already very difficult to do even in a college setting -- to force students to get deliberate practice, without outsourcing their thinking to an LLM, you need various draconian measures.
And for many professions, true expertise only comes after years on the job, building on the foundation created by the college degree. If students graduate and immediately start using LLMs for everything, I don't know how they will progress from novice graduate to expert, unless they have the self-discipline to keep getting deliberate practice. (And that will be hard when everyone's telling them they're an idiot for not just using the LLM for everything)
You're talking about students, but the question was about seniors. You don't go to school to become a senior dev, you code in real-world settings, with real business pressures, for a decade or two to become a senior. The question is how are decent students supposed to grow into seniors who can independently evaluate AI-produced code if they are forced to use the magic box and accept its results before being able to understand them?
> Question: If everyone uses AI to code, how does someone become an expert capable of carefully reading and understanding code and acting as an editor to an AI?
LLMs are very much like pair programmers in my experience. For the junior engineer, they are excellent resources for learning, the way a senior engineer might be. Not only can they code what the junior can’t, they can explain questions the junior has about the code and why it’s doing what it’s doing.
For senior devs, it is a competent pair programmer, acting as an excellent resource for bouncing ideas off of, rubber ducking, writing boilerplate, and conducting code reviews.
For expert devs, it is a junior/senior dev you can offload all the trivial tasks to so you can focus on the 10% of the project that is difficult enough to require your expertise. Like a junior dev, you will need to verify what it puts together, but it’s still a huge amount of time saved.
For junior devs specifically, if they are not curious and have no interest in actually learning, they will just stop at the generated code and call it a day. That’s not an issue with the tool, it’s an issue with the dev. For competent individuals with a desire to learn and grow, LLMs represent one of the single best resources to do so. In that sense, I think that junior devs are at a greater advantage than ever before.
> That’s not an issue with the tool, it’s an issue with the dev.
Hard disagree here. It makes a difference whether you work on a task because you feel it brings you tangible progress, or because it's an artificial exercise that you could really do with one sentence to Claude if it weren't for the constraints of the learning environment. That feeling is actually demotivating for learning.
I don’t know about you, but I use LLMs as gateways to knowledge. I can set a deep research agent free on the internet with context about my current experience level, preferred learning format (books), what I’m trying to ramp up on, etc. A little while later, I have a collection of the definitive books for ramping up in a space. I then sit down and work through the book doing active recall and practice as I go. And I have the LLM there for Q&A while I work through concepts and “test the boundaries” of my mental models.
I’ve become faster at the novice -> experienced arc with LLMs, even in domains that I have absolutely no prior experience with.
But yeah, the people who just use LLMs for “magic oracle please tell me what do” are absolutely cooked. You can lead a horse to water, but you can’t make it drink.
Arguments are made consistently about how this can replace interns or juniors directly. Others say LLMs can help them learn to code.
Maybe, but not on your codebase or product and not with a seniors knowledge of pitfalls.
I wonder if this will be programming's iPhone moment, where we start seeing a lack of the deep knowledge needed to troubleshoot. I can tell you that we're already seeing a glut of security issues being explained by devs as "I asked Copilot if it was secure and it said it was fine, so I committed it".
> I can tell you that we’re already seeing a glut of security issues being explained by devs as “I asked copilot if it was secure and it said it was fine so I committed it”.
And as with Google and Stack Overflow before, the Sr Devs will smack the wrists of the Jr's that commit untested and unverified code, or said Jr's will learn not to do those things when they're woken up at 2 AM for an outage.
That's assuming the business still employs those Sr Devs so they can do the wrist smacking.
To be clear, I think any business that dumps experienced devs in favor of cheaper vibe-coding mids and juniors would be making a foolish mistake, but something being foolish has rarely stopped business types from trying.
The way the responses to this subthread show the classical "the problem doesn't exist - ok, it does exist but it's not a big deal - ok, it is a big deal but we should just adapt to it" progression makes me wonder if we found one of the few actually genuine objections to LLM coding.
Nail on head. Before, innovations in code were extensions of a human's capabilities. The LLM-driven generation could diminish the very essence of writing meaningful code, to the point where they will live in the opposite of a golden era. The dead internet theory may yet prevail.
I think a large fraction of my programming skills come from looking through open source code bases. E.g. I'd download some code and spend some time navigating through files looking for something specific, e.g. "how is X implemented?", "what do I need to change to add Y?".
I think it works a bit like pre-training: to find what you want quickly you need to have a model of coding process, i.e. why certain files were put into certain directories, etc.
I don't think this process is incompatible with LLM use...
If I were a professor, I would make my homework start the same -- here is a problem to solve.
But instead of asking for just working code, I would create a small wrapper for a popular AI. I would insist that the student use my wrapper to create the code. They must instruct the AI how to fix any non-working code until it works. Then they have to tell my wrapper to submit the code to my annotator. Then they have to annotate every line of code as to why it is there and what it is doing.
Why my wrapper? So that you can prevent them from asking it to generate the comments, and so that you know that they had to formulate the prompts themselves.
They will still be forced to understand the code.
Then double the number of problems, because with the AI they should be 2x as productive. :)
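A rough sketch of what such a wrapper could look like, assuming a hypothetical ask_model stub standing in for whichever AI backend the course uses; the log format and the submit hook are likewise made up, just to show the idea of capturing every student-written prompt verbatim for the instructor.

    import json, time

    LOG_PATH = "ai_session_log.jsonl"  # hypothetical log file the annotator reads

    def ask_model(prompt: str) -> str:
        # Stub standing in for the real AI backend; the wrapper only cares
        # that every prompt/response pair passes through this one function.
        return "stubbed response for: " + prompt

    def ask(prompt: str) -> str:
        # Students call this instead of the AI directly, so every prompt they
        # formulate themselves is recorded verbatim.
        response = ask_model(prompt)
        with open(LOG_PATH, "a", encoding="utf-8") as f:
            f.write(json.dumps({"time": time.time(),
                                "prompt": prompt,
                                "response": response}) + "\n")
        return response

    def submit(code: str, annotations: dict) -> None:
        # Hypothetical submission hook: the final code plus the student's
        # line-by-line annotations go to the instructor alongside the log.
        with open("submission.json", "w", encoding="utf-8") as f:
            json.dump({"code": code, "annotations": annotations}, f, indent=2)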
For introductory problems, the kind we use to get students to understand a concept for the first time, the AI would likely (nearly) nail it on the first try. They wouldn't have to fix any non-working code. And annotating the code likely doesn't serve the same pedagogical purpose as writing it yourself.
Students emerge from lectures with a bunch of vague, partly contradictory, partly incorrect ideas in their head. They generally aren't aware of this and think the lecture "made sense." Then they start the homework and find they must translate those vague ideas into extremely precise code so the computer can do it -- forcing them to realize they do not understand, and forcing them to make the vague understanding concrete.
If they ask an AI to write the code for them, they don't do that. Annotating has some value, but it does not give them the experience of seeing their vague understanding run headlong into reality.
I'd expect the result to be more like what happens when you show demonstrations to students in physics classes. The demonstration is supposed to illustrate some physics concept, but studies measuring whether that improves student understanding have found no effect: https://doi.org/10.1119/1.1707018
What works is asking students to make a prediction of the demonstration's results first, then show them. Then they realize whether their understanding is right or wrong, and can ask questions to correct it.
Post-hoc rationalizing an LLM's code is like post-hoc rationalizing a physics demo. It does not test the students' internal understanding in the same way as writing the code, or predicting the results of a demo.
> They will still be forced to understand the code.
But understanding is just one part of the learning process, isn't it? I assume everybody has had this feeling: the professor explains maths on the blackboard, and the student follows. The student "understands" all the steps: they make sense, and there's no question they feel like asking right now. Then the professor gives them a slightly different exercise, asks them to do the same, and the student is completely lost.
Learning is a loop: you need to accept it, get it in your memory (learn stuff by heart, be it just the vocabulary to express the concepts), understand it, then try to do it yourself. Realise that you missed many things in the process, and start at the beginning: learn new things by heart, understand more, try it again.
That loop is still there. They have to get the AI to write the right code.
And beyond that, do they really need to understand how it works? I never learned how to calculate logarithms by hand, but I know what they are for and I know when to punch the button on the calculator.
I'll never be a top tier mathematician, but that's not my goal. My goal is to calculate things that require logs.
If they can get the AI to make working code and explain why it works, do they need to know more than that, unless they want to be top in their field?
> If they can get the AI to make working code and explain why it works, do they need to know more than that, unless they want to be top in their field?
Making working code is the easy part. Making maintainable code is a completely different story.
And again, being able to explain why something works requires only superficial knowledge. This is precisely why bugs pass through code reviews: it's hard to spot a bug by reading code that looks like it should work.
I find these tools incredibly useful. But I constantly edit their output and frequently ask for changes to other people's code during review, some of which is AI generated.
But all of that editing and reviewing is informed by decades of writing code without these tools, and I don't know how I would have gotten the reps in without all that experience.
So I find myself bullish on this for myself and the experienced people I work with, but worried about training the next generation.
Yes, I feel the same way. But I worry about my kids. My 15-year-old son wanted to go into software engineering and work for a game studio. I think I'll advocate for civil engineering, but for someone who will still be working 50 years from now, it's really hard to know right now what will be a good field.
They won't, save for a relative minority of those who enjoy doing things the hard way or those who see an emerging market they can capitalize on (slop scrubbers).
I wrote this post [1] last month to share my concerns about this exact problem. It's not that using AI is bad necessarily (I do every day), but it disincentivizes real learning and competency. And once using AI is normalized to the point where true learning (not just outcome seeking) becomes optional, all hell will break loose.
> Perhaps there is another way to develop the skills
Like sticking a fork in a light socket, the only way to truly learn is to try it and see what happens.
I don't know if I'm convinced by this. If we were talking about novels, say, you don't have to be a writer to check grammar and analyze plot structure in a passable way. It is possible to learn by reading instead of doing.
Sure, you could learn about grammar, plot structure, narrative style, etc. and become a reasonable novel critic. But imagine a novice who wants to learn to do this and has access to LLMs to answer any question about plots and style that they want. What should they do to become a good LLM-assisted author?
The answer to that question is very different from how to become an author before LLMs, and I'm not actually sure what the answer is. It's not "write lots of stories and get feedback", the conventional approach, but something new. And I doubt it's "have an LLM generate lots of stories for you", since you need more than that to develop the skill of understanding plot structures and making improvements.
So the point remains that there is a step of learning that we no longer know how to do.
I've had a lot of success using LLMs to deepen my understanding of topics. Give them an argument, and have them give the best points against it. Consider them, iterate. Argue against it and let it counter. It's a really good rubber duck.
> The expert skills... currently come from long experience writing code
Do they? Is it the writing that's important? Or is it the thinking that goes along with it? What's stopping someone from going through LLM output, going back and forth on design decisions with the LLM, and ultimately making the final choice of how the tool should mold the codebase after seeing the options?
I mean, of course this requires some proactive effort on your part... but it always did.
The key point, though, is to not outsource your thinking. You can't blindly trust the output. It's a modern search engine.
HIM: AI is going to take all entry level jobs soon.
ME: So the next level one up will become entry level?
HIM: Yes.
ME: Inductively, this can continue up to the CEO. What about the CEO?
HIM: Wait...
I simply don’t believe all the jobs will go away; it feels much more like the field will just be significantly pared back. There will be more opportunities for juniors eventually if it turns out to be too high of a barrier to entry and elder programmers start to retire.
This is such a non-issue and so far down the list of questions. We've invented AI that can code, and you're asking about career progression? That's the top thing to talk about? We've given life to what is essentially an alien life form.
"What is this going to do to humans?" is probably the #1 question that should be on the mind of every engineer, every day. Being toolmakers for civilization is the entire point of our profession.
I'll take the opposite view of most people. Expertise is a bad thing. We should embrace technological changes that render expertise economically irrelevant with open arms.
Take a domain like US taxation. You can certainly become an expert in that, and many people do. Is it a good thing that US taxes are so complicated that we have a market demand for thousands of such experts? Most people would say no.
Don't get me wrong, I've been coding for more years of being alive than I haven't at this point, and I love the craft. I still think younger me would have far preferred a world where he could have just had GPT do it all for him, so he didn't need to spend his lunch hours poring over the finer points of e.g. Python iterators.
By the same logic we should allow anyone with an LLM to design ships, bridges, and airliners.
Clearly, it would be very unwise to buy a bridge designed by an LLM.
It's part of a more general problem - the engineering expectations for software development are much lower than for other professions. If your AAA game crashes, people get annoyed but no one dies. If your air traffic control system fails, you - and a large number of other people - are going to have a bad day.
The industry has a kind of glib unseriousness about engineering quality - not theoretical quality based on rules of thumb like DRY or faddy practices, but measurable reliability metrics.
The concept of reliability metrics doesn't even figure in the LLM conversation.
> We should embrace technological changes that render expertise economically irrelevant with open arms.
To use your example, is using AI to file your taxes actually "rendering [tax] expertise economically irrelevant?" Or is it just papering over the over-complicated tax system?
From the perspective of someone with access to the AI tool, you've somewhat eased the burden. But you haven't actually solved the underlying problem (with the actual solution obviously being a simpler tax code). You have, on the other hand, added an extra dependency on top of an already over-complicated system.
This. And most of the time the code isn't that complex either. The complexity of a software product often isn't in the code, it's in the solution as a whole, the why's of each decision, not the how.
However, whenever I've faced actually hard tasks, things that require going off the beaten path the AI trains on, I've found it severely lacking: no matter how much or how little context I give it, and no matter how many new chats I make, it just won't veer into truly new territory.
I never said anything about using AI to do your taxes.
I was drawing an analogy. We would probably be better off with a tax system that wasn't so complicated it creates its own specialized workforce. Similarly we would be better off with programming tools that make the task so simple that professional computer programmers feel like a 20th century anachronism. It might not be what we personally want as people who work in the field, but it's for the best.
> I never said anything about using AI to do your taxes. I was drawing an analogy.
Yeah, I was using your analogy.
> It might not be what we personally want as people who work in the field, but it's for the best.
You're inventing a narrative and borderline making a strawman argument. I said nothing about what people who work in the field "personally want." I'm talking about complexity.
> Similarly we would be better off with programming tools that make the task so simple that professional computer programmers feel like a 20th century anachronism.
My point is that if the "tools that make the task simple" don't actually simplify what's happening in the background, but rather paper over it with additional complexity, then no, we would not "be better off" with that situation. An individual with access to an AI tool might feel that he's better off; anyone without access to those tools (now or in the future) would be screwed, and the underlying complexity may still create other (possibly unforeseen) problems as that ecosystem grows.
The question then becomes whether it is possible (or will be possible) to use these LLMs effectively for coding without already being an expert. Right now, building anything remotely complicated with an LLM, without poring over every line of code it generates, is not possible.
Nowhere do I suggest using AI to do your taxes. My point was, if you think it's bad taxes are complicated enough that many people need to hire a professional to do it, you should also think it's bad programming is complicated enough that many people need to hire a professional to do it.
For what it's worth, this is a uniquely American view of copyright:
> The ONLY reason to have any law prohibiting unlicensed copying of intangible property is to incentivize the creation of intangible property.
In Europe, particularly France, copyright arose for a very different reason: to protect an author's moral rights as the creator of the work. It was seen as immoral to allow someone's work -- their intellectual offspring -- to be meddled with by others without their permission. Your work represents you and your reputation, and for others to redistribute it is an insult to your dignity.
That is why copyrights in Europe started with much longer durations than they did in the United States, and the US has gradually caught up. It is not entirely a Disney effect, but a fundamental difference in the purpose of copyright.
I'm not sure about other intellectual property rights. I do know there's a similar dichotomy for privacy: Americans tend to view it in terms of property rights (ownership of your image, data, etc.) while Europeans view it as a matter of protecting personal dignity. That leads to different decisions: for instance, there's a famous French case of an artist making a nude sketch of a portrait subject, and being unable to sell the sketch because it violated the subject's privacy rights, despite being the owner of the intellectual property.
Peter Baldwin's Copyright Wars is a good overview of the European vs American attitudes to copyright in general.
Maybe, but I'm not sure how much the style is deliberate vs. a consequence of the post-training tasks like summarization and problem solving. Without seeing the post-training tasks and rating systems it's hard to judge if it's a deliberate style or an emergent consequence of other things.
But it's definitely the case that base models sound more human than instruction-tuned variants. And the shift isn't just vocabulary, it's also in grammar and rhetorical style. There's a shift toward longer words, but also participial phrases, phrasal coordination (with "and" and "or"), and nominalizations (turning adjectives/adverbs into nouns, like "development" or "naturalness"). https://arxiv.org/abs/2410.16107
How is "development" an adverb or adjective turned into a noun??
It comes from a French word (développement), and that in turn was just a natural derivation of the verb "développer"... no adverbs or adjectives (English or otherwise) seem to come into play here.
Sorry, I should have said adjectives or verbs, as it's "develop" turned into a noun. Just like "discernment" or "punishment". The etymology isn't relevant for classifying it as a nominalization, only the grammatical function.
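For illustration, a crude way to flag candidate nominalizations is a suffix heuristic. This is only a toy sketch, not how the linked paper measures the feature (that requires real grammatical analysis), but it shows the kind of words being counted:

    # Crude suffix heuristic for spotting likely nominalizations, for
    # illustration only; a real analysis would use a part-of-speech tagger.
    import re

    NOMINAL_SUFFIXES = ("ment", "tion", "sion", "ness", "ance", "ence", "ity")

    def likely_nominalizations(text):
        words = re.findall(r"[A-Za-z]+", text.lower())
        return [w for w in words if len(w) > 6 and w.endswith(NOMINAL_SUFFIXES)]

    print(likely_nominalizations(
        "The development of naturalness in punishment and discernment."
    ))
    # ['development', 'naturalness', 'punishment', 'discernment']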