One of my students recently came to me with an interesting dilemma. His sister had written (without AI tools) an essay for another class, and her teacher told her that an "AI detection tool" had classified it as having been written by AI with "100% confidence". The teacher was going to give her a zero on the assignment.
Putting aside the ludicrous confidence score, the student's question was: how could his sister convince the teacher she had actually written the essay herself? My only suggestion was for her to ask the teacher to sit down with her and have a 30-60 minute oral discussion on the essay so she could demonstrate she in fact knew the material. It's a dilemma that an increasing number of honest students will face, unfortunately.
I wouldn't mind seeing education return to its roots of being about learning instead of credentialization. In an age where having a degree is increasingly meaningless in part due to many places simply becoming thinly veiled diploma treadmills (which are somehow nonetheless accredited), this is probably more important than ever. This is doubly so if the AI impact extremists end up being correct.
So why is the issue you described an issue? Because it's about a grade. And the reason that's relevant is because that credential will then be used to determine where she can go to university which, in turn, is a credential that will determine her breadth of options for starting her career, and so on. But why is this all done by credentials instead of simple demonstrations of skill? What somebody scored in a high school writing class should matter far less than the output somebody is capable of producing when given a prompt and an hour in a closed setting. This is how you used to apply to colleges. Here [1], for instance, is Harvard's exam from 1869. If you pass it, you're in. Simple as that.
Obviously this creates a problem of institutions starting to 'teach to the test', but with sufficiently broad testing I don't see this as a problem. If a writing class can teach somebody to write a compelling essay based on an arbitrary prompt, then that was simply a good writing class! As an aside, this would also add a major selling point to all of the top universities that offer free educational courses online. Right now I think 'normal' people are mostly uninterested in those because of the lack of widely accepted credentials, which is just so backwards - people are actively seeking to maximize credentials over maximizing learning.
This is one of the very few places I think big tech in the US has done a great job. Coding interviews can be justifiably critiqued in many ways, but it's still a much better system than raw credentialization.
I still don't understand why standardized testing gets so much pushback. Having the students do their work in a controlled environment is the obvious solution to AI and many other problems related to academic integrity.
It's also the only way that students can actually be held to the same standards. When I was a freshman in college with a 3.4 high school GPA, I was absolutely gobsmacked by how many kids with perfect 4.0+ GPAs couldn't pass the simple algebra test that the university administered to all undergraduates as a prerequisite for taking any advanced mathematics course.
Well, for one thing, people learn differently, and comparing a "standard" test result just measures how much crap someone has been able to cram into their brain. I compare it to people memorizing trivia for Jeopardy. Instead, what needs to be tested and taught is critical thinking. Yes, a general idea of history and other subjects is important, but again, it's teaching people to think about those subjects, not just memorizing a bunch of dates that will be forgotten the day after the test.
You cannot possibly do any higher-level analysis of any subject if you don't even know the base facts. It's the equivalent of saying you don't need to know your times tables to do physics. Like, theoretically it's possible to look up 4x6 every time you need to do arithmetic, but why would you not just memorize it?
If you don't even know that the American Civil War ended in 1865, how could you do any meaningful analysis of its downstream implications or causes and its relationship to other events?
More important than knowing what 4x6 is, is understanding what multiplication is, why division is really the same operation, understanding commutative, associative, distributive properties of operations, etc. All of this comes as a result of repeated drilling of multiplication problem sets. Once this has been assimilated, you can move on to more abstract concepts that build on that foundation, and at that point sure you can use a calculator to work out the product of two integers as a convenience.
That's an argument against ranking students in a way which can potentially determine their entire lives, not an argument against standardized testing. The alternative to standardized testing is GPA, which is an extremely subjective metric masquerading as some objective datapoint. Kids from different schools don't even have the same curricula, let alone the same grading standards.
Also, the teachers have a vested interest in giving the highest grades they can to as many students as they can without making it obvious that they aren't actually grading them fairly. I don't mean this as an accusation against anybody or some sort of insult against teachers as a whole; I merely mean to point out that this is what they are incentivized to do, by virtue of the fact that they are indirectly grading themselves by grading their students.
Nah. Goodhart's law is literally just "if you play a matrix game don't announce your pick in advance". It is not a real law, or not different from common sense. (By matrix game I mean what wiki calls "Normal form game[0]", e.g. rock-paper-scissors or prisoner's dilemma.)
In education, regarding exams, Goodhart's law just means that you should randomize your test questions instead of telling the students the questions before the exam. Have a wide set of questions, randomize them. The only way for students to pass is to learn the material.
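As a toy sketch of what I mean (the bank size and exam length here are invented for illustration):

    import random

    # A wide bank of questions; each sitting gets a fresh random draw.
    question_bank = [f"Question {i}" for i in range(1, 501)]  # 500 questions

    def make_exam(bank, n=20):
        """Draw n distinct questions uniformly at random."""
        return random.sample(bank, n)

    exam = make_exam(question_bank)
    # Any single leaked question has only a 20/500 = 4% chance of appearing,
    # so memorizing leaks loses to actually learning the material.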
A randomized standardized test is not more susceptible to Goodhart's law than a randomized personal test. The latter however has many additional problems.
That's not even remotely true. A randomized standardized test will still have some domain that it chooses its questions from and that domain will be perfectly susceptible to Goodhart's Law. It is already the case that no one is literally teaching "On the SAT you're going to get this problem about triangle similarity and the answer is C." When a fresh batch of students sits down in front of some year's SATs the test is still effectively "randomized" relative to the education they received. But that randomization is relative to a rigid standardized curriculum and the teaching was absolutely Goodhart'd relative to that curriculum.
"The only way for students to pass is to learn the material."
Part of Goodhart's law in this context is precisely that it overdetermines "the material" and there is no way around this.
I wish Goodhart's law was as easy to dodge as you think it is, but it isn't.
I do not believe schooling is purely an exercise in knowledge transfer, especially grade school.
School needs to provide opportunities to practice applying important skills like empathy, tenacity, self-regulation, creativity, patience, collaboration, critical thinking, and others that cannot be assessed using a multiple choice quiz taken in silence. When funding is tied to performance on trivia, all of the above suffers.
> In an age where having a degree is increasingly meaningless
I wish I could agree with you, but I think that having a degree (or rather the right degree) is more important than ever.
Basically grades exist to decide who gets a laid back high paying job, and who has to work 2 low-paying, labor-intensive jobs just to live paycheck to paycheck.
As one teacher told me once: we could have all of you practice chess, make a big tournament and you get to choose your university based on your chess ranking. It wouldn't be any less stupid than the current system.
> Basically grades exist to decide who gets a laid back high paying job, and who has to work 2 low-paying, labor-intensive jobs just to live paycheck to paycheck.
Rather:
Grades exist to decide who gets a stressful, but rather high-paying job, and who has to work 2 low-paying, labor-intensive jobs just to live paycheck to paycheck.
>What somebody scored in a high school writing class should matter far less than the output somebody is capable of producing when given a prompt and an hour in a closed setting
Right, in an ideal world we'd peer into the minds of people and compute what they know. But if we did that, our eyes would probably catch on fire like that lady in Kingdom of the Crystal Skull.
We need some way to distill the unbelievable amount of data in human brains into something that can be processed in a reasonable amount of time. We need a measurement - a degree, a GPA, something.
Imagine if in every job interview they could assume absolutely nothing. They know nothing about your education. They might start by asking you to recite your ABCs and then, finally at sunset, you might get to a coding exam. Which still won't work, because you'll just AI cheat the coding exam.
We require gatekeepers to make the system work. If we allow the gatekeepers to just rubber stamp things based on whether the work seems correct, that tells us nothing about the person themselves. We want the measurement to get close to the real understanding.
That means AI papers have to be given a 0, which means we need to know if something is AI generated. And we want to catch this at the education level, not above.
I did have interviews with a government agency many years ago that, among other things, involved a battery of tests including what I assume were foreign civil service exams. I got an offer though I didn't take it.
But assuming in-person, day-long batteries of tests for universities and companies is probably not very practical.
You can argue whether university is a very efficient use of time or money but it presumably does involve some learning and offers potential employers some level of a filter that roughly aligns with what they're looking for.
No, they used to do IQ tests which were found discriminatory (and don't correlate with job performance). Tests and problems directly or indirectly related to the job are not only legal, but are commonly used (leetcode).
In a world where some but not all programs are “diploma treadmills,” you would expect that the reputation of the bad credentials would go down and the good credentials would go up. In some sense if the credentials were really being used (and not just as a perfunctory first pass elimination), you’d expect the most elite programs to have the highest signal to noise ratio. But the market doesn’t seem to respond to changes in credentialing capability (by hiring more from programs that start focusing on the “right” things to test). Instead it’s really just a background check.
> you would expect that the reputation of the bad credentials would go down and the good credentials would go up.
We should expect this if employers can efficiently and objectively evaluate a candidate's skills without relying on credentials. When they're unable to, we should worry about this information asymmetry leading to a "market for lemons" [0]. I found an article [1] about how this could play out:
> This scenario leads to a clear case of information asymmetry since only the graduate knows whether their degree reflects real proficiency, while employers have no reliable way to verify this. This mirrors the classic “Market for Lemons” concept introduced by economist George Akerlof in 1970, where the presence of low-quality goods (or in this case, under-skilled graduates) drives down the perceived value of all goods, due to a lack of trustworthy signals.
In the US, it's also because there are so many options that it's not feasible to have a clear ranking of schools outside of the extreme ends of the spectrum.
> But the market doesn’t seem to respond to changes in credentialing capability
I mean, it certainly seems to. I've been in hiring roles in tech for 20-ish years, and have definitely seen college hiring patterns change based on credential values. Some schools have gone way up in how much we value their credentials (Waterloo), some have gone somewhat down (MIT), etc.
> This is one of the very few places I think big tech in the US has done a great job. Coding interviews can be justifiably critiqued in many ways, but it's still a much better system than raw credentialization.
Just so we're clear, the coding tests are in addition to credentialisation. I'll never forget when I worked at Big Tech (from Ireland) and I would constantly hear recruiters talk about the OK school list (basically the Ivy league). Additionally, I remember having to check the University a candidate had attended before she had an interview with one of our directors.
He was fine with her, because she had gone to Oxford. Honestly, I'm surprised that I was able to get hired there given all this nonsense.
My experience with big tech has been the polar opposite - nobody has ever cared and I've never tried to hide it either. Which one was it if you don't mind me asking?
I'm a drop out (didn't finish BSc) from a no name Northern European university and I've worked at or gotten offers from:
- Meta
- Amazon
- Google
- Microsoft
- Uber
- xAI
+ some unicorns that compete with FAANG+ locally.
I didn't include some others that have reached out for interviews which I declined at the time. The lack of a degree has literally never come up for me.
Once you have a relevant work history, a degree matters much less. It still does to some employers, however, for whom it's a simple filter on applicants: No degree? Resume into the bin.
Hiring is still a pretty non-uniform thing despite attempts to make it less so - I'm sure there are some teams and orgs at all these large companies that do it well, and some that do it less well. I think it is pretty well accepted that university brand is not a good signal, but it is an easy signal, and if the folks in the hiring process are a bit lazy and pressed for time, a bit overwhelmed by the number of inbound candidates, or don't really know how to evaluate for the role competencies, I think it's a tool that is still reached for today.
In a way, I think the hiring process at second-tier (not FAANG) companies is actually better because you have to "moneyball" a little bit - you know that you're going to lose the most-credentialed people to other companies that can beat you dollar for dollar, so you actually have to think a little more deeply about what a role really needs to find the right person.
If anything, it will get worse. There was a deficit of tech workers; from now on, there will be an excess. Which means that differentiators will be even more important.
Always stunned by how readily teachers can accuse without proof and invert "innocent until proven guilty".
Honestly, students should have a course in "how the justice system works" (or at least should work). So should the teachers.
Student unions and similar entities should exist and be ready to intervene to help students in such situations.
This is nothing new; AI will just make this happen more often, revealing how stupid so many teachers are. But when someone has spent thousands on a tool that purports to be reliable and is so quick to use, how can an average person resist it? The teacher is as lazy as the cheaters they intend to catch.
Student unions tend to focus on all sorts of other issues, I wouldn't trust them to handle cases like this.
The only way to reliably prevent the use of AI tools without punishing innocent students is to monitor the students while they work.
Schools can do that by having essays written on premises, either by hand or on computers managed by the school.
But students that are worried that they will be targeted can also do this themselves, by setting up their phone to film them while working.
And if they do this, and the teacher tries to punish someone who can prove they wrote the essay themselves, either the teacher or the school should hopefully learn that such tools can't be trusted.
It's also the case that even pre-Web and certainly pre-LLMs, different schools and even departments within schools had different rules about working with other students on problem sets. In some cases, that was pretty much the norm, in others strictly verboten.
It’s strange watching people put so much faith in these so called “AI detection tools”. Nobody really knows how they work yet they’re treated like flawless judges. In practice they’re black boxes that quietly decide who gets flagged for “fraud”, and because the tool said so everyone pretends it must be true. The result is a neat illusion that all the “cheaters” were caught, when in reality the system is mostly just picking people at random and giving the process a fake sense of certainty.
I hope this could be a "teachable moment" for all involved: have some students complete their assignments in person, then submit their "guaranteed to be not AI written" essays to said AI detection tool. Objectively measure how many false positives it reports.
When I was in college, there was a cheating scandal for the final exam where somehow people got their hands on the hardest question of the exam.
The professor noticed it (presumably from seeing poor "show your work" sections) and gave zero points on the question to everyone. And once you went to complain about your grade, she would ask you to explain the answer there in her office and work through the problem live.
I thought it was a clever and graceful way to deal with it.
I think this kind of approach is the root of (the US's) hustle culture. Instead of receiving a fair score, you get a zero and need to "hustle" and challenge your teacher.
The teacher effectively filtered out the shy boys/girls who are not brave enough to "hustle." Gracefully.
Nah, the professor wasn't American (as is often the case) and she had a tricky situation. She had strong reasons to believe people were cheating and had to sort out who did and who did not in a swift way.
This has nothing to do with American hustle culture and everything to do with that professor's judgment.
They had to challenge her first. So, yes, challenging her was the only way to get a better grade. And you still knew in advance what the questions were going to be.
Cheaters and non cheaters were punished in exactly the same way. Effectively cheating gave you an advantage and being shy gave you disadvantage.
Only if she advertised it somehow. The dick version is, of course, to tell the class that “you know, until now, if you had come in to challenge your grade I would have let you fix it. Too late now!”
Except they did not learn to not be shy. There was no such lesson. This is like saying that stealing from a student is ok, because it is teaching them thieves exist.
They learned that cheating gives advantage to the cheating individual. They also learned that reporting cheating harms them and non cheaters.
Lol, in 3rd grade algebra, a teacher called 2 of us in for cheating. She had us take the test again; I got the same exact horribly failing score (a 38%) and the cheater got a better score, so the teacher then knew who the cheater was. He just chose the wrong classmate to cheat off of.
I assume that the cheating student didn't know that he was copying answers from someone who was doing poorly. It was third graders after all; one wouldn't necessarily expect them to be able to pick the best target every time.
Oh. That would have never crossed my mind! So the cheater student was copying from GP who had worse results, and when they both redid it all by themselves the cheater answered correctly, and GP did not.
> Which, in a subject like algebra, is extremely suspicious ("how could both of them get the exact same WRONG answer?").
In Germany, the traditional sharp-tongued answer of pupils to the question "How could both of you get the exact same WRONG answer (in the test)?" is: "Well, we both have the same teacher." :-)
My son is learning algebra in 2nd grade. They don’t call it “algebra” yet nor mention “variables”, but they’re working on questions like solving “4 + ? = 9”.
He just goes to our local public elementary school.
Yeah I guess technically that's algebra but at that age it is based on memorization (you just learn that 4 + 5 = 9) and you're not actually using algebra to solve the problem e.g. "subtract 4 from both sides of the equation."
Except the power imbalance: position, experience, social, etc. meant that the vast majority just took the zero and never complained or challenged the prof. Sounds like your typical out-of-touch academic who thought they were super clever.
It's an incredible abuse of power to intentionally mark innocent students' answers wrong when they're correct. Just to solve your own problem, that you may very well be responsible for.
Knowing the way a lot of professors act, I'm not surprised, but it's always disheartening to see how many behave like petty tyrants who are happy to throw around their power over the young.
If you cheat, you should get a zero. How is this controversial?
Since high school, the expectation is that you show your work. I remember my high school calculus teacher didn't even LOOK at the final answer - only the work.
The nice thing was that if you made a trivial mistake, like adding 2 + 2 = 5, you got 95% of the credit. It worked out to be massively beneficial for students.
The same thing continued in programming classes. We wrote our programs on paper. The teacher didn't compile anything. They didn't care much if you missed a semicolon, or called a library function by a wrong name. They cared if the overall structure and algorithms were correct. It was all analyzed statically.
I understand both that this is valuable AND how many (most?) education environments are supposed to work, but 2 interesting things can happen with the best & brightest:
1. they skip what are to them the obvious steps (we all do as we achieve mastery) and then get penalized for not showing their work.
2. they inherently know and understand the task but not the mechanized minutiae. Think of learning a new language. A diligent student can work through the problem and complete an a->b translation, then go the other way, and repeat. Someone with mastery doesn't do this; they think within one language and then only pass the contextual meaning back and forth when explicitly required.
"showing your work" is really the same thing as "explain how you think" and may be great for basics in learning, but also faces levels of abstraction as you ascend towards mastery.
It's like with the justice system: if you have to choose between the risk of jailing an innocent person and the risk of letting a guilty person go free, you choose to let a guilty person go free. All the time.
Unless you're 100% sure that a student cheated, you don't punish them. And you don't ask them to prove they're innocent.
It's not great for the teacher though. They're the ones who will truly suffer from the proliferation of AI - increased complexity of work around spotting cheating 'solved' by a huge increase in time pressure. Faced with that teachers will have three options: accept AI detection as gospel without appeals and be accused of unfairness or being bad at the job by parents, spend time on appeals to the detriment of other duties leading to more accusations of being bad at the job, or leave teaching and get an easier (and probably less stressful and higher paid) job. Given those choices I'd pick the third option.
4. Use AI to talk to the student to find out if they understand.
Tests were created to save money (more students per teacher); we're just going back to the older, actually useful, method of talking to people to see if they understand what they've been taught.
You weren't asked to write an essay because someone wanted to read your essay, only so they could intuit that you've understood something.
I really believe this is the way forward, but how do you make sure the AI is speaking to the student rather than to another AI impersonating the student? You could make it in person but that's a bit sad.
Both can be true at the same time. You outlined the objective, the money is an extra constraint (and let's be honest, when isn't money an extra constraint?)
> Tests are a way of largely seeing if a response to a question was memorized.
Some tests require memorized knowledge, like what is the stall speed of your airplane. Some tests require reasoning skills, like what is the stress in this beam.
option 4b: absolve the teacher from being the gatekeeper who has to "prove" knowledge has been imparted, accepted and consolidated? It's your idea, but with explicit candor and not a sly wink :)
I agree. Most campuses use a product called Turnitin, which was originally designed to check for plagiarism. Now they claim it can detect AI-generated content with about 80% accuracy, but I don’t think anyone here believes that.
I had Turn It In mark my work as plagiarism some years ago and I had to fight it. It was clear the teacher wasn't doing their job and was blindly following the tool.
What happened is that I did a Q&A worksheet but in each section of my report I reiterated the question in italics before answering it.
The reiterated questions of course came up as 100% plagiarism because they were just copied from the worksheet.
This matches my experience pretty well. My high school was using it 15 years ago and it was a spotty, inconsistent morass even back then. Our papers were turned in over the course of the semester, and late into the year you’d get flagged for “plagiarizing” your own earlier paper.
80% accuracy could mean 0 false negatives and 20% false positives.
My point is that accuracy is a terrible metric here, and sensitivity and specificity tell us much more relevant information for the task at hand. In that formulation, a specificity < 1 is going to have false positives, and it isn't fair to those students to have to prove their innocence.
That's more like the false positive rate and false negative rate.
If we're being literal, accuracy is (number correct guesses) / (total number of guesses). Maybe the folks at turnitin don't actually mean 'accuracy', but if they're selling an AI/ML product they should at least know their metrics.
It depends on their test dataset. If the test set was written 80% by AI and 20% by humans, a tool that labels every essay as AI-written would have a reported accuracy of 80%. That's why other metrics such as specificity and sensitivity (among many others) are commonly reported as well.
Just speaking in general here -- I don't know what specific phrasing TurnItIn uses.
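To make that concrete, here is the degenerate case worked out (all numbers invented for illustration):

    # Test set that is 80% AI-written, scored by a useless "detector"
    # that labels every essay as AI.
    ai_essays, human_essays = 80, 20

    true_pos = ai_essays      # AI essays correctly flagged
    false_pos = human_essays  # human essays wrongly flagged
    true_neg = 0              # human essays correctly cleared
    false_neg = 0             # AI essays missed

    accuracy = (true_pos + true_neg) / (ai_essays + human_essays)
    sensitivity = true_pos / (true_pos + false_neg)  # share of AI essays caught
    specificity = true_neg / (true_neg + false_pos)  # share of human essays cleared

    print(f"{accuracy:.0%} {sensitivity:.0%} {specificity:.0%}")
    # 80% 100% 0% -- "80% accurate", yet every honest student gets accused.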
The promise (not saying that it works) is probably that 20% of people who cheated will not get caught. Not that 20% of the work marked as AI is actually written by humans.
I suppose 80% means you don't give them a 0 mark just because the software says it's AI; you only do so if you have other evidence reinforcing the possibility.
You're missing the false positives though; catching 80% of cheaters might be acceptable, but 20% false positives (not the same thing as 20% of the class) would not be acceptable. AI-generated content and plagiarism are completely different detection problems.
Had a professor use this but it was student-led. We had to run it through ourselves and change our stuff enough to get a high enough mark to pass TurnItIn. Avoided the false allegations problems at least.
If they are serious they should realize that "80% accuracy" is almost meaningless for this kind of classifier. They should publish a confusion matrix if they haven't already.
There have always been problems like this. I had a classmate who wrote poems and short stories since age 6. No teacher believed she wrote those herself. She became a poet, translator and writer and admitted herself later in life that she wouldn't have believed it herself.
My son recently told me his teacher used him as an example for the class as someone who wrote a good piece himself. Teacher accused all the other students of using AI.
He also told me that he had in fact used AI, but asked AI multiple times to simplify the text, and he had entered the simplified version. He liked the first version best, but was aware his teacher would consider it written by AI.
Exactly this. It really is this easy. You have the full class period to write an essay on the economic causes of the Civil War... or on gender roles in Pride and Prejudice, or on similarities and differences in morality between Stoic ideals and Christianity in the Roman Empire. Kind of like most of my '90s-era college experience.
Nothing. Word is getting around about how to do this. I anticipate that in another couple of years it'll have diffused to everyone, except the constant crew of new younglings who have to find out and be told about it from their older siblings and such.
"AI detection" wasn't even a solution in the short term and it won't be going forward. Take-home essays are dead, the teachers are collectively just hoping some superhero will swoop in and somehow save them. Sometimes such a thing is possible, but it isn't going to happen this time.
Yes. The ship has sailed and in fact it sailed away many, many years ago. Modernity now has to reckon with Brandolini's law at the scale of these AI systems, which, depending on the system you're inside of, can vary from "this is easy to refute" to "don't bother, assume it's all bullshit."
I wonder if doing this would actually be a step closer to learning (from not doing anything at all). To put it in your own style, you are forced to read the output and probably understand the basic concepts of what the LLM provides
Probably so, assuming that what it spits out is actually real and not some hallucination, but that's not at all a given. And I also assume that the people most inclined to regurgitating what an LLM spits out are also heavily overlapped with the people who are least likely to verify that the information is correct, or verify primary sources, or even think to ask for sources in the first place.
My high school history teacher gave me an F on my term paper. I asked him why, and he said it was "too good" for a high school student. The next day I dumped on his desk all the cited books, which were obscure and in my dad's extensive collection. He capitulated, but disliked me ever since.
> language models are more likely to suggest that speakers of [African American English] be assigned less-prestigious jobs, be convicted of crimes and be sentenced to death.
This one is just so extra insidious to me, because it can happen even when a well-meaning human has already "sanitized" overt references to race/ethnicity, because the model is just that good at learning (bad but real) signals in the source data.
Family law judges, in my small experience, are so uninterested in the basic facts of a case that I would actually trust an LLM to do a better job. Not quite what you mean, but maybe there is a silver lining.
We are already (in the US) living in a system of soft social-credit scores administered by ad tech firms and non-profits. So “the algorithms says you’re guilty” has already been happening in less dramatic ways.
Write it in something like Google docs that tracks changes and then share the link with the revision history.
If this is insufficient, then there are tools specifically for education contexts that track student writing process.
Detecting the whole essay being copied and pasted from an outside source is trivial. Detecting artificial typing patterns is a little more tricky, but also feasible. These methods dramatically increase the effort required to get away with having AI do the work for you, which diminishes the benefit of the shortcut and influences more students to do the work themselves. It also protects the honest students from false positives.
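For the copy/paste half, a minimal sketch of the idea (the revision-log format here is made up; real editors expose similar per-revision data):

    # Hypothetical revision log: (seconds_since_start, chars_added) per save.
    # Hand-typed documents grow in small increments; a wholesale paste of an
    # AI-generated essay shows up as one huge insertion.
    revisions = [(30, 42), (95, 57), (160, 38), (3600, 4800)]

    PASTE_THRESHOLD = 500  # chars added in a single revision; tune as needed

    def flag_bulk_insertions(log, threshold=PASTE_THRESHOLD):
        """Return revisions whose single-step growth exceeds the threshold."""
        return [(t, n) for t, n in log if n > threshold]

    print(flag_bulk_insertions(revisions))  # [(3600, 4800)]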
I thought it was a good idea at first, but it can easily be defeated by typing out the AI content manually. One can add pauses/deletions/edits, or true edits from joining ideas across different AI outputs.
> Detecting artificial typing patterns is a little more tricky, but also feasible.
Keystroke dynamics can detect artificial typing patterns (copying another source by typing it out manually). If a student has to go way out of their way to make their behavior appear authentic, then it decreases the advantage of cheating and fewer students will do it.
If the student is integrating answers from multiple AI responses then maybe that's a good thing for them to be learning and the assessment should allow it.
Not 0 time, but yes, integrity preservation is an arms race.
The best solutions are in student motivations and optimal pedagogical design. Students who want to learn, and learning systems that are optimized for rate of learning.
In some narrow contexts that is easy, but in many other contexts that is not easy, or doesn't actually solve it.
Online programs, limited infrastructure, and dishonest students exploiting accessibility programs are some examples where what you're suggesting is easier said than done.
Also AI can help students cheat in class too. Smart glasses, pens with cameras and LED screens on them (yes really), or just regular smart phones. Even switching to pen and paper won't reduce the ease of access.
Instructors don't want to police cheating, they want to teach (or do research). Either way, they don't want to police.
Students cheat when they think what they're learning is low value, the learning process is too clunky, or they place too high a value on the grade. All these imbalances can be improved with better pedagogy.
The only enduring way to actually solve the cheating crisis isn't to make it harder, it's to reduce the value of cheating. Everything else is either temporary or performative.
Depends how you work. I've rarely (never?) drafted anything and almost all of the first approach ended up in the final result. It would look pretty close to "typed in the AI answer with very minor modifications after". I'm not saying that was a great way to do it, but I definitely wouldn't want to be failed for that.
There is a fractal pattern between authentic and inauthentic writing.
Crude tools (like Google docs revision history) can protect an honest student who engages in a typical editing process from false allegations, but it can also protect a dishonest student who fabricated the evidence, and fail to protect an honest student who didn't do any substantial editing.
More sophisticated tools can do a better job of untangling the fractal, but as with fractal shaped problems the layers of complexity keep going and there's no perfect solutions, just tools that help in some situations when used by competent users.
The higher-ed professors who really care about academic integrity are rare, but they are layering many technical and logistical solutions to fight back against the dishonest students.
I don't mean formal multiple drafts. Even just editing bits, moving stuff around.
I guess some people can type out a 5,000 word assignment linearly from start to finish in 2 hours at 40wpm but that's both incredibly rare and easy to verify upon further investigation.
Not really; the timing of the saves also won't reflect the expected work needing to be put in. Unless you take the same amount of time to feed in the AI output as a normal student would to actually write / edit the paper, at which point cheating is meaningless.
> My only suggestion was for her to ask the teacher to sit down with her and have a 30-60 minute oral discussion on the essay so she could demonstrate she in fact knew the material.
This sounds like, a good solution? It’s the exception case, so shouldn’t be constant (false positives), although I suppose this fails if everyone cheats and everyone wants to claim innocence.
You hinted to it but at what point are you basically giving individual oral exams to the entire class for every assignment? There are surveys where 80% of high school students self report using AI on assignments.
I guess we could go back to giving exams soviet Russia style where you get a couple of questions that you have to answer orally in front of the whole class and that’s your grade. Not fun…
You don't need oral exams, you just need in-person. So a written test in the classroom, under exam conditions, would suffice.
In this particular resolution example, it would be quicker to ask the student some probing questions versus have them re-write (and potentially regurgitate) an essay.
My current idea for this is to have AI administer the 1:1 oral exam. I’m quite confident this would work through grade school at least.
For exams you’d need a proctored environment of some sort, say a row of conference booths so students can’t just bring notes.
You’d want to have some system for ephemeral recording so the teachers can do a risk-based audit and sample some %, eg one-two questions from each student.
Honestly, for regular weekly assignments you might not even need the heavyweight proctoring and could maybe allow notes, since you can tell if someone knows what they are talking about in conversation; it's impossible to crib-sheet your way to fluent conversational understanding.
2. Speaking about your work in front of 1-2-5 people is one thing, but being tested in front of an entire class (30 people?) is a totally different thing.
In high school English, someone (rotating order) had to give a 5-10 minute talk about something in front of the class every week/class. Seems like a pretty good idea in general.
The oral discussion does not scale well in large classes. The solution is to stop using essays for evaluation, relying on (supervised) examinations instead.
Of course, there will be complaints from many students. However, as a prof for decades, I can say that some will prefer an exam-based solution. This includes the students who are working their way through university and don't have much time for busy-work, along with students who write their essays themselves and get lower grades than those who do not.
It's not that hard to prove that you did the work and not an AI. Show your work. Explain to the teacher why you wrote what you did, why that particular approach to the narrative appealed to you and you chose that as the basis for your work. Show an outline on which the paper was based. Show rough drafts. Explain how you revised the work, where you found your references, and why you retained some sources in the paper and not others.
To wit, show the teacher that YOU did the work and not someone else. If the teacher is not willing to do this with every student they accuse of malfeasance, they need to find another job. They're lazy as hell and suck at teaching.
Computer, show "my" work and explain to the teacher why "I" wrote what "I" did, describe why that particular approach to the narrative appealed to "me" and "I" chose that as the basis of "my" work. Produce an outline on which the paper could have been based and possible rough drafts, then explain how I could have revised the work to produce the final result.
And if you do all of that, and memorize it well enough to have an in-person debate with the teacher over whether or not you did the work, then maybe that's close enough to actually doing the work?
Doesn't google docs have fairly robust edit history? If I was a student these days I'd either record my screen of me doing my homework, or at least work in google docs and share the edit history.
Yeah that was my thought. Although, I went a bit more paranoid with it.
If it looks like AI cheating software will be a problem for my children (and currently it has not been an issue), then I'm considering recording them doing all of their homework.
I suspect school admin only has so much appetite for dealing with an irate parent demanding a real time review of 10 hours of video evidence showing no AI cheating.
Not really; document editors save every few moments. Someone cheating with AI assistance will not have a saved-version pattern similar to someone writing and editing themselves. And if one did have the same pattern, it would defeat the purpose of cheating, because it would take a similar amount of time to pull off.
honest q: what would it look like from your perspective if someone worked in entirely different tools and then only moved their finished work to google docs at the end?
In this case, the school was providing chromebooks so Google Docs were the default option. Using a different computer isn’t inherently a negative signal - but if we are already talking about plagiarism concerns, I’m going to start asking questions that are likely to reveal your understanding of the content. If your understanding falters, I’m going to ask you to prove your abilities in a different way/medium/etc.
In general, I don’t really understand educators hyperventilating about LLM use. If you can’t tell what your students are independently capable of and are merely asking them to spit back content at you, you’re not doing a good job.
> In general, I don’t really understand educators hyperventilating about LLM use. If you can’t tell what your students are independently capable of and are merely asking them to spit back content at you, you’re not doing a good job.
I've had the same problem online for years, when I translate something people presume I am using Google Translate (even though in one case said language isn't on Google Translate — I checked!)... Or got the answer off Wikipedia.
One of the funniest things was being accused of plagiarising Wikipedia, when I'd actually written most of the Wikipedia article on said subject. The irony... Wikipedia doesn't just use unpaid labour, it ends up undermining the people who wrote it.
> when I'd actually written most of the Wikipedia article on said subject. The irony... Wikipedia doesn't just use unpaid labour, it ends up undermining the people who wrote it.
Surely it would be relatively easy to offer to show the edit history to prove that you actually contributed to the article? And, by doing so, would flip the situation in your favour by demonstrating your expertise?
The fact that you should have to is pretty annoying, but it's also a fairly edge case. And if a teacher or institute refuses to review that evidence, then I don't think the credential on the table is worth the paper it's printed on anyway.
The new trick being used by some professors in college classes is to mandate a specific document editor with a history view. If the document has unusual copy/paste patterns or was written in unusual haste then they may flag it. That being said, they support use of AI in the class and have confidence the student is not able to one-shot the assignment with AI.
"Please take this finished essay and write me a rough first draft version of it that looks like something someone might type in on the fly before editing"
Seems like this could be practically addressed by teachers adopting the TSA's randomized screening. That is, roll some dice to figure out which student on a given assignment comes in either for the oral discussion or-- perhaps in higher grades-- to write the essay in realtime.
It should be way easier than TSA's goal because you don't need to stop cheaters. You instead just need to ensure that you seed skills into a minimal number of achievers so that the rest of the kids see what the real target of education looks like. Kids try their best not to learn, but when the need kicks in they learn way better spontaneously from their peers than any other method.
Of course, this all assumes an effective pre-K reading program in the first place.
> Of course, this all assumes an effective pre-K reading program in the first place.
Pre-k is preschool aka kindergarten?
Is this really needed? It's really stressful for kids under 5 or 6 to read, and is there a big enough statistical difference in outcomes to justify robbing them of some of their early youth?
I started reading around 6 years old and I was probably ahead of the vast majority of kids within 6 months.
Kids starting around 6 years old have much better focus and also greatly enhanced mental abilities overall.
It creates another layer of arbitrary gate-keeping. I experience this in interviews too. If I need to think about the low-latency response I can give to derive Dijkstra's algorithm verbatim, I get accused of reading notes on another monitor, not treated as someone who studied for success.
It is becoming an awful situation where these companies are selling tools that undermine the student. Education is supposed to be the great equalizer in society, not another toggle or tool for oppression.
Years ago in the 90's my brother wrote a short story for a fourth grade assignment and the teacher accused him of plagiarism because how could a 9 year old come up with such a vivid and elaborate story so he received a zero. I forget the details but my mother went to the school making a big stink and eventually had the zero changed to an A.
The real problem here is (in this case) lazy teachers. These kinds of tools should only be used to flag potential AI generation. If the teacher read the essay and thought it reflected standard work for this student, then all would be fine. Instead they are just running the tool first to be lazy and taking its output as gospel.
This reminds me of when GPS routing devices first came onto the scene. Lots of people drove right into a lake or ocean because the device said keep going straight. (because of poorly classified multi-modal routing data)
This stuff is getting more pervasive too. I'm working on my Master's degree right now and any code I submit, I make sure it has spelling mistakes and make it god awful because I don't want to get flagged by some 3rd party utility that checks if it was AI generated or copied from someone else.
That's an interesting point. It seems it makes it cheaper to provide knowledge but more expensive to have individual assessments.
I think AI has given me some brain rot: I stress about finishing stuff on time and can't bear to spend brain energy on it (and end up spending it anyway, because AI sucks).
I suspect this is going in the wrong direction. Telling a sandboxed AI to have a long conversation with a student to ensure they actually know what they're talking about, while giving minimal hints away, seems like the scalable solution. Let students tackle the material however they will, knowing that they will walk into class the next day and be automatically grilled on it, unaided. Similarly, there's no reason a student couldn't have their essay fed into the AI and then be asked questions about what they meant in it.
Once this becomes routine the class can become e.g. 10 minutes conversation on yesterday's topic, 40 minutes lecturing and live exercises again. Which is really just reinventing the "daily quiz" approach, but again the thing we are trying to optimize for is compliance.
Yeah, this, but also as an adult: when you are a non-native speaker and you use AI to make things more concise and correct, the detector will go off. People may find some wording "AI-ish" (even though I replaced em-dashes with commas and told it to "avoid American waiter-like enthusiasm"). My reaction is: OK, you want my original? Which is much harder to read and uses 2x the amount of words? Fine.
I mean, what is the problem? It's my report! I know all the ins and outs, I take full responsibility for it. I'm the one taking this to the board of directors who will grill me on all the details. I'm up for it. So why is this so "not done"? Why do you assume I let the AI do the "thinking"? I'm appalled by your lack of trust in me.
I routinely see people accuse any writing whose style they don't like of being AI generated. There is no possible evidence for this being the case; people are just dicks.
I've intentionally changed my writing style to be less AI-like due to people thinking I'm just pasting my emails from ChatGPT.
Perhaps it's an artifact of LLMs being trained on terabytes of autistic internet commenters like me. Maybe being detected as AI by Turnitin even has some diagnostic value.
I guess you've never read the English of a Dutch person ;) During my PhD defense I was told I "should have checked with a native speaker." Pre-LLMs, I'd go to my American colleague and she'd mostly remove text and rewrite some bit to make texts much more readable.
Nowadays, often I put my text into the LLM, and say: Make more concise, include all original points, don't be enthusiastic, use business style writing.
And then it will come with some lines of which I think: Yes! That is what I meant!
I can't imagine you'd rather read my Dunglish. Sure, I could have "studied harder", but one simply is just much more clever in their native tongue, I know more words, more subtleties etc. Over time, and I believe due to LLM use I do get better at it myself! It's a language model after all, not a facts model. I can trust it to make nice sentences.
I am telling you my own preferences, as a native speaker of English. I would rather read my coworkers' original output in their voice than read someone else's writing (including a machine edit of their own text).
I doubt that very strongly and would like to talk to you again after going through 2 versions (with and without LLM) of my 25-pager to UMC management on HPC and Bioinformatics :)
I understand the sentiment, even appreciate it, but there are books that draw you into a story when your eyes hit the paper, and there are books that don't and induce yawning instead (on the same topic). That is a skill issue.
Perhaps I should add that using the LLM does not make me faster in any way, maybe even slower. But it makes the end results so much more pleasant.
"If I Had More Time, I Would Have Written a Shorter Letter". Now I can, but in similar time.
As they said, they are telling you their preference, there is nothing to doubt.
Recently there was a non-native English speaker heavily using an LLM to review their answers on a Show HN post, and it was incredibly annoying. The author did not realize it (because of their lack of skill in the language), but the AI-edited version felt fake and mechanical in tone. In that case, yes, the broken original is better because it preserves the humanity of the original answers, mistakes and all.
Ok, well it depends on the context then and the severity of the AIness (which I always try to reduce in the prompt, sometimes I’ll ask it to maintain my own style for example).
You know maybe it is annoying for native speakers to pick up subtle AI signals, but for non-natives it can be annoying to find the correct words that express what you want to say as precisely as in your mother tongue. So don’t judge too much. It’s an attempt at better communication as well.
I wrote a paper about building web applications in 10th grade a long time ago. When class was out the teacher asked me to stay for a minute after everybody left. He asked in disbelief, “did you really write that paper?”
I could see why he didn’t, so I wasn’t offended or defensive and started to tell him the steps required to build web apps and explained it in a manner he could understand using analogies. Towards the end of our conversation he could see I both knew about the topic and was enthusiastic about it. I think he was still a bit shocked that I wrote that paper, but he could tell from the way I talked about it that it was authentic.
It will be interesting to see how these situations evolve as AI gets even better. I suspect assessment will be more manual and in-person.
Accept being unjustly marked as a cheater, or submit to a 30-60 minute interrogation where the teacher thinks you're guilty and you need to prove that you're innocent.
It's only an obvious choice if you have total faith that your teacher will be fair, which you might doubt if the situation starts with "You're a cheater unless you prove otherwise". In the worst-case scenario you'll be grilled for an hour and still be marked as a cheater because you didn't convince the teacher.
I seriously think the people selling AI detection tools to teachers should be sued into the ground by a coalition of state attorneys general, and that the tools should be banned in schools.
The funny part is that Google has all the edit history data. In other words, it's a piece of cake for them to train a model that mimics the human editing process.
The only thing preventing them from doing so is the fact that Google is too big to sell a "plagiarism assistant."
So the model is going to spend hours hesitantly typing in a google doc, moving paragraphs around, cutting and pasting, reworking sentences, etc so that the timestamped history matches up with something a human could realistically have done?
I’m very tempted to write a tool that emulates human composition and types in assignments in a human-like way, just to force academia to deal with their issues sooner.