Hacker News | theodorewiles's comments

Cool idea. Note that syllable counts sometimes depend on context, so I think the syllable count needs to be a range.

"Blessed" vs. "bless-ed", for example.

"Camera" can be said as "cam-ra" or "cam-er-a", for example.
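
One way to get a range is to take the min and max over the pronunciation variants in CMUdict. A rough sketch with the `pronouncing` package (variant coverage depends on the dictionary, so treat the outputs as illustrative):

    import pronouncing

    def syllable_range(word):
        # All CMUdict pronunciation variants for the word
        phones = pronouncing.phones_for_word(word.lower())
        if not phones:
            return None  # not in the dictionary
        counts = {pronouncing.syllable_count(p) for p in phones}
        return min(counts), max(counts)

    print(syllable_range("blessed"))  # e.g. (1, 2): "blest" vs. "bless-ed"
    print(syllable_range("camera"))   # e.g. (2, 3): "cam-ra" vs. "cam-er-a"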


It's also not limited to words pronounced poetically. Some words where both variants are common, like "wicked", have different numbers of syllables depending on meaning. e.g.

Beads of sweat wicked through the wicked witch's black robes on a hot summer day.


https://www.nature.com/articles/s41598-025-97652-6

This isn't study mode, it's a different AI tutor, but:

"The median learning gains for students, relative to the pre-test baseline (M = 2.75, N = 316), in the AI-tutored group were over double those for students in the in-class active learning group."


I wonder how much this was a factor:

"The occurrence of inaccurate “hallucinations” by the current [LLMs] poses a significant challenge for their use in education. [...] we enriched our prompts with comprehensive, step-by-step answers, guiding the AI tutor to deliver accurate and high-quality explanations (v) to students. As a result, 83% of students reported that the AI tutor’s explanations were as good as, or better than, those from human instructors in the class."

Not at all dismissing the study, but if you want to replicate these results for yourself, this level of gain over a classroom setting may be tricky to achieve without having someone make class materials for the bot to present to you first.

Edit: the authors further say

"Krupp et al. (2023) observed limited reflection among students using ChatGPT without guidance, while Forero (2023) reported a decline in student performance when AI interactions lacked structure and did not encourage critical thinking. These previous approaches did not adhere to the same research-based best practices that informed our approach."

Two other studies failed to get positive results at all. YMMV a lot apparently (like, all bets are off and your learning might go in the negative direction if you don't do everything exactly as in this study)


In case you find it interesting: I deployed an early version of a "lesson administering" bot on a college campus that guides students through tutored activities of content curated by a professor in the "study mode" style -- that is, forcing them to think for themselves. We saw an immediate student performance gain on exams of about 1 stdev in the course. So with the right material and right prompting, things are looking promising.


OpenAI should figure out how to onboard teachers. The teacher uploads context for the year, and OpenAI distributes a chatbot to the class that's permanently fixed in study mode. Basically like the GPT store, but with an interface and UX tuned for a classroom.


Looks really cool - I noticed Enterprise has smart consent management?

The thing I think some enterprise customers are worried about in this space is that in many jurisdictions you legally need to disclose recording. Having a bot join the call can do that disclosure, but users hate the bot and it takes up too much visibility on many of these calls.

Would love to learn more about your approach there


yes, we’re rolling out flexible consent options based on legal needs - like chat messages, silent bots, blinking backgrounds, or consent links before/during meetings. but still figuring out if there's a more elegant way to do this. would love to hear your take as well.


Please shoot me a note - I'm trying to figure this out for my enterprise now, would love to figure out a way to get you in / trial it out.


can i send you a follow-up to the email that's on your profile?


yes


For me this benchmark suggests that an LLM will try to "force the issue", which results in compounding errors. But I think the logical counterpoint is that you may be asking the LLM to come up with an answer without all of the necessary details? Some of these are "baked into" historical transactions, which is why it does well in months 1-2.

My takeaway is scaling in the enterprise is about making implicit information explicit.


My question on all of the "can't work with big codebases" complaints is: what would a codebase designed for an LLM look like? Composed of many, many small functions that can be composed together?


I believe it’s the same as for humans: different files implementing different parts of the system with good interfaces and sensible boundaries.


Being well documented helps a lot too.

You can use an LLM to help document a codebase, but it's still an arduous task because you do need to review and fix up the generated docs. It will make, sometimes glaring sometimes subtle, mistakes. And you want your documentation to provide accuracy rather than double down on or even introduce misunderstanding.


this is a common pattern I see -- if your codebase is confusing for LLMs, it's probably confusing for people too


This fact is one of the most pleasant surprises I’ve had during this AI wave. Finally, a concrete reason to care about your docs and your code quality.


"What helps the human helps the AI" in https://blog.nilenso.com/blog/2025/05/29/ai-assisted-coding/

In future I'll go "In the name of our new darling bot, let us unit test and refactor this complicated thing".


And on top of that - can you steer an LLM to create this kind of code? In my experience the models don't really have a "taste" for detecting complexity creep and reengineering for simplicity, in the same way an experienced human does.


I am vibe coding a complex app. You can certainly keep things clean, but the trick is to enforce a rigid structure. This does add a veneer of complexity, but simplifies "implement this new module" or "add this feature across all relevant files".


And my question to that is how would that be different from a codebase designed for humans?


I think it means finer top-level granularity re: what's runnable/testable at a given moment. I've been exploring this for my own projects and although it's not a silver bullet, I think there's something to it.

----

Several codebases I've known have provided a three-stage pipeline: unit tests, integration tests, and e2e tests. Each of these batches of tests depends on the creation of one of three environments, and the code being tested is what ends up in those environments. If you're interested in a particular failing test, you can use the associated environment and just iterate on the failing test.

For humans with a bit of tribal knowledge about the project, humans who have already solved the get-my-dev-environment-set-up problem in a more or less uniform way, this works ok. Humans are better at retaining context over weeks and months, whereas you have to spin up a new session with an LLM every few hours or so. So we've created environments for ourselves that we ignore most of the time, but that are too complex to be bite sized for an agent that comes on the scene as a blank slate every few hours. There are too few steps from blank-slate to production, and each of them is too large.

But if successively more complex environments can be built on each other in arbitrarily many steps, then we could achieve finer granularity. As a nix user, my mental model for this is function composition where the inputs and outputs are environments, but an analogous model would be layers in a Dockerfile, where you test each layer before building the one on top of it.

Instead of maybe three steps, there are eight or ten. The goal would be to have both whatever code builds the environment, and whatever code tests it, paired up into bite-sized chunks so that a failure in the pipeline points you to a specific stage, which is more specific than "the unit tests are failing". Ideally test coverage and implementation complexity get distributed uniformly across those stages.

Keeping the scope of the stages small maximizes the amount of your codebase that the LLM can ignore while it works. I have a flake output and nix devshell corresponding to each stage in the pipeline, and I'm using pytest to mark tests based on which stage they should run in. So I run the agent from the devshell that corresponds with whichever stage is relevant at the moment, and I introduce it to only the tests and code that are relevant to that stage (the assumption being that all previous stages are known to be in good shape). Most of the time, it doesn't need to know that it's working on stage 5 of 9, so it "feels" like a smaller codebase than it actually is.
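
The pytest side of that looks something like this (the stage names and the test are illustrative, not my real ones):

    # conftest.py -- register one marker per pipeline stage
    import pytest

    STAGES = [f"stage{n}" for n in range(1, 10)]

    def pytest_configure(config):
        for stage in STAGES:
            config.addinivalue_line("markers", f"{stage}: tests that run in the {stage} environment")

    # test_billing.py -- a test pinned to stage 5 (hypothetical module)
    @pytest.mark.stage5
    def test_invoice_totals():
        # placeholder for a real check against the stage-5 environment
        assert sum([10, 20]) == 30

    # From the stage-5 devshell, run only that stage's tests:
    #   pytest -m stage5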

If evidence emerges that I've engaged the LLM at the wrong stage, I abandon the session and start over at the right level (now 6 of 9 or somesuch).


I found that it is beneficial to create more libraries. If I, for example, build a large integration with an API (basically a whole API client), I would in the past have kept it in the same repo, but now I make it a standalone library.


like a microservice architecture? overall architecture to get the context and then dive into a micro one?


I think the end state is LLM-facilitated micropayments: one vast clearinghouse / marketplace of human-generated, up-to-date content. Contributors get paid based on whether LLMs called their content via some kind of RAG. Maybe there are multiple aggregators / publishers.
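
The attribution piece might look roughly like this; everything here (the rate, the ledger, the index API) is a hypothetical sketch, not a real marketplace:

    from collections import defaultdict

    PAYOUT_PER_RETRIEVAL = 0.0005  # hypothetical rate, in dollars

    ledger = defaultdict(float)  # contributor_id -> accrued payout

    def retrieve_and_credit(query, index, top_k=5):
        # `index.search` is assumed to return chunks tagged with a contributor_id
        chunks = index.search(query, top_k=top_k)
        for chunk in chunks:
            ledger[chunk["contributor_id"]] += PAYOUT_PER_RETRIEVAL
        return chunks  # handed to the LLM as RAG context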


Has anyone tried to use an LLM to test questions / concepts in a broader way via spaced repetition instead of just memorization? Just wondering.


Whoever writes the tool that can Actually Make a legitimate microsoft office powerpoint slide from text will make a lot of money.

From what I have seen, most of these tools need to do more user research on what powerpoint slides actually look like in practice.

There's a lot of "you're doing it wrong, show don't tell, just keep the basics on the slide" but the people that use powerpoint to make $$$ make incredibly dense powerpoint materials that serve as reference documents, not presentation guides (i.e. they are intended as leave-behind documents that people can read in advance)

Presentations are also quite hard because:

1. It must "compile to" Powerpoint (it must compile to Powerpoint because your end users will want to make direct edits, and those end users will NOT be comfortable in markdown and in general will be very averse to change)

2. Powerpoint has no layout engine (see the sketch below)

3. Powerpoint presentations are in fact a beautiful medium in which VISUAL LAYOUT HAS SEMANTIC MEANING (powerpoint is like medieval art where larger is more important)
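
For a sense of what "compiling to Powerpoint" without a layout engine involves, here is a minimal sketch using the python-pptx library; the slide content, sizes, and coordinates are all made-up assumptions:

    from pptx import Presentation
    from pptx.util import Inches, Pt

    prs = Presentation()
    slide = prs.slides.add_slide(prs.slide_layouts[6])  # layout 6 is blank in the default template

    # Title: bigger text, because on a slide size carries semantic weight
    title = slide.shapes.add_textbox(Inches(0.5), Inches(0.3), Inches(9), Inches(1))
    title.text_frame.text = "Q3 revenue bridge"  # made-up title
    title.text_frame.paragraphs[0].font.size = Pt(32)

    # Dense body content: every position has to be computed by the generator,
    # since there is no layout engine to flow content for you
    body = slide.shapes.add_textbox(Inches(0.5), Inches(1.5), Inches(4.5), Inches(5))
    for line in ["Driver 1: pricing", "Driver 2: volume", "Driver 3: churn"]:
        p = body.text_frame.add_paragraph()
        p.text = line
        p.font.size = Pt(14)

    prs.save("deck.pptx")  # end users can edit the result directly in PowerPoint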

If anyone wants to help me build an engine that can get an LLM to ACTUALLY make powerpoints please let me know. I am sure this is a lot harder than you think it is.


I think the use case would be limited to a handful of users (slight hyperbole).

The average user is content enough with using plain PowerPoint and won’t bother with Markdown. People using Markdown are more on the “you’re doing it wrong, put the basics on the slide” side.

The people that make nice backgrounds for their talk, sometimes with a word or two, won’t get there with a text based tool either.

People that use LaTeX, markdown or some other text to slides tool are few and far between.


> 1. It must "compile to" Powerpoint

You simply cannot compete with Microsoft on Microsoft's turf. This approach is doomed.


The value in a presentation tool is that it can create a variety of presentations. The future of presentations will be variety (possibly via AI) to convey information in both optimal and creative ways.


Do you have one of these information dense powerpoints as a reference? Is there data visualization embedded in the slides? I don't know anything about this so I'm curious.


Here's one potential example of a moderate-complexity slide deck generated by serious professionals: https://media.kalzumeus.com/complex-systems/334925792-Shorti...

That is a PDF copy of the actual pitch deck Deutsche Bank used for a proposed trade to take advantage of the 2008 housing financial crash by "Shorting Home Equity Mezzanine Tranches" (an incredible and lucrative prediction they made back in 2007, when the PDF was authored). The real meat and potatoes starts on page 6, but every page after the disclaimer could be put on screen as a slide in a powerpoint.

Note how nearly every slide is a diagram with a title and potentially a caption. Each diagram carries a ton of custom annotations explaining the concepts at play. There are charts, block diagrams, process workflows, tables, and more. A minority of the pages are text-only with bulleted lists. This is an ultra high value artifact, and very little of it would have benefited from a markdown->slides automation. What makes it amazing is the sheer volume and detail of very specific information, only replicable via tremendous elbow grease.


I teach the math behind AI, so my slides are very dense. Not "text heavy", but elaborate - some animations, graphics with arrows pointing to more graphics, 3-4 slides on just explaining what all the symbols in a math equation mean, and a worked example.

I would never be able to design my slides if I used a Markdown to PPT converter.


These are the slide decks that McKinsey and BCG consultants leave behind after a 6 month contract. The deck is the work product that the consulting firm got paid 7 or 8 figures for.

They are typically 60+ page decks on something like go-to-market strategy or organizational realignment that the C-level at the hiring firm hired them to do, and that will get forwarded around to their reports and teams to implement.

Each slide is handcrafted to have a punchy title and be self contained, dense, with links to references and data sources. There's a hefty appendix section so that when someone asks a "what about X?" question, there's a slide in there about alternatives considered and a data-centric reason on why it wasn't or shouldn't be pursued.


It's consultants doing everything in PPT. Imho it is very, very annoying.


why don't they increase class sizes?


If I had to guess, it sounds like they are using CURE to cluster the source documents, then map each generated fact back to the best-matching cluster, and finally test whether that cluster actually provides / supports the fact?
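
In pseudo-pipeline form, that guess would look something like the sketch below. CURE itself isn't in scikit-learn, so agglomerative clustering stands in for it here, and the embedding, threshold, and "support" test are all placeholder assumptions:

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import AgglomerativeClustering
    from sklearn.metrics.pairwise import cosine_similarity

    def check_facts(source_docs, generated_facts, n_clusters=5, support_threshold=0.4):
        vec = TfidfVectorizer().fit(source_docs + generated_facts)
        doc_vecs = vec.transform(source_docs).toarray()
        fact_vecs = vec.transform(generated_facts).toarray()

        # 1. Cluster the source documents (CURE in the original; agglomerative here)
        labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(doc_vecs)
        centroids = np.array([doc_vecs[labels == c].mean(axis=0) for c in range(n_clusters)])

        results = []
        for fact, fact_vec in zip(generated_facts, fact_vecs):
            # 2. Map the fact to the best-matching cluster by centroid similarity
            best = int(np.argmax(cosine_similarity([fact_vec], centroids)))
            # 3. Test whether that cluster supports the fact (placeholder: nearest-doc similarity)
            support = cosine_similarity([fact_vec], doc_vecs[labels == best]).max()
            results.append((fact, best, support >= support_threshold))
        return results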

