The Stanford study showed mixed results, and you can stratify the data to show that AI failures are driven by process differences as much as circumstantial differences.
The MIT study has a whole host of problems, but ultimately it boils down to this: giving your engineers Cursor and telling them to be 10x doesn't work. Beyond each individual engineer being skilled at using AI, you have to adjust your process for it. Code review is a perfect example: until you optimize the review process to reduce human friction, AI tools are going to be massively bottlenecked.