There are a few improvements I'd suggest to that prompt if you want to maximise its performance.
1. You're really asking for hallucinations here. Asking for factual data is very unreliable, and not what these models are strong at. I'm curious how close/far the results are from ground truth.
I would definitely bet that outside of the top 5, numbers would be wobbly and outside of top... 25?, even the ranking would be difficult to trust. Why not just get this from a more trustworthy source?[0]
2. Asking in French might, in my experience, give you results that are not as solid as asking in English. Unless you're asking for a creative task where the model might get confused with EN instructions requiring an FR result, it might be better to ask in EN. And you'll save tokens.
3. Providing the model with a rough example of your output JSON seems to perform better than describing the JSON in plain language.
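To illustrate point 3, here is a minimal sketch of the two prompt styles (the schema and field names are purely illustrative, not from the original prompt):

```python
import json

# Hypothetical output schema for a ranking-type task; names are illustrative.
example_output = {
    "results": [
        {"rank": 1, "name": "Example City", "population": 123456},
    ]
}

# Style A: describe the JSON in plain language (tends to be less reliable).
prompt_described = (
    "Return a JSON object with a 'results' key holding a list of objects, "
    "each with 'rank' (int), 'name' (string) and 'population' (int)."
)

# Style B: show a rough example of the expected output (often performs better).
prompt_with_example = (
    "Return JSON matching this shape exactly:\n"
    + json.dumps(example_output, indent=2)
)
```

In practice the example-based prompt also doubles as documentation of the schema you validate against downstream.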
For some context, this snippet is just an educational demo to show what can be done with regard to structured output & data type validation.
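The validation side of such a demo can be sketched with the standard library alone (a library like Pydantic would do this more robustly; the field names here are hypothetical):

```python
import json

# Expected field types for the model's JSON output (illustrative schema).
EXPECTED_TYPES = {"rank": int, "name": str, "population": int}

def validate_city(raw: str) -> dict:
    """Parse the model's raw JSON and check each field's type."""
    data = json.loads(raw)
    for field, expected in EXPECTED_TYPES.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"field {field!r} is not a valid {expected.__name__}")
    return data

# A well-formed response passes; a type mismatch raises ValueError.
city = validate_city('{"rank": 1, "name": "Paris", "population": 2102650}')
```

The point of the demo is exactly this: the model's output is never trusted as-is, it has to round-trip through a typed schema first.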
Re 1: for more advanced cases (using the exact same stack), I am using ensemble techniques & automated comparisons to double-check, and so far this has protected the app from hallucinations quite effectively. I am definitely careful with this (but point well taken).
2/3: agreed overall! Apart from this example, I am using French only where it makes sense. It makes sense when the target audience is directly French students, for instance, or when the domain model (e.g. French literature) makes it really relevant (and translating would be worse than directly using French).
Ah, I understand your use case better! If you're teaching students this stuff, I'm in awe. I would expect it would take several years at many institutions before these tools became part of the curriculum.
https://gist.github.com/thbar/a53123cbe7765219c1eca77e03e675...