Good question - my best assessment is that it was just the text classifier, i.e. was the LLM able to "trick" the classifier into believing the text came from the IPJ?
And it came quite a long way in training. Initially the classifier scores were very low (mean around 0.05, meaning modern); over training they rose and ended close to 0.95 (IPJ). The standard deviation across the group also declined, so the consistency of the responses improved as well.
My thought on the application of this is that you could use it to create different voices for your responses, and probably even add multiple voices at a time to a single model. I chose this one to experiment with because it is easy to classify and the data is available in the public domain.
GRPO kind of opens up RL to lower tiers of hardware, and I've been able to experiment with it at home. I think this is something people can do themselves; it's fun and potentially useful in games, or possibly in applications aimed at kids with lower reading levels (e.g. using a reading-level classifier instead).
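For anyone curious what the classifier-as-reward hookup looks like, here is a minimal sketch in the shape most GRPO trainers (e.g. TRL's GRPOTrainer) expect: a function taking a batch of completions and returning one scalar per completion. The model name is a placeholder, and the "IPJ" label is whatever your own style classifier emits for the target style.

```python
from transformers import pipeline

# Placeholder model name; in practice this is a classifier fine-tuned to
# separate IPJ-era prose from modern prose.
style_clf = pipeline("text-classification", model="your-org/ipj-style-classifier")

def ipj_style_reward(completions, **kwargs):
    """Reward = classifier probability that a completion reads as IPJ prose."""
    rewards = []
    for text in completions:
        result = style_clf(text, truncation=True)[0]  # {"label": ..., "score": ...}
        # Always reward P(IPJ), whichever label the classifier reports.
        p_ipj = result["score"] if result["label"] == "IPJ" else 1.0 - result["score"]
        rewards.append(p_ipj)
    return rewards
```

Swapping in a reading-level classifier, as suggested above, would only change the model and the label handling.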
Yet, one might justly question the imperative of cultivating a distinct model for such an endeavour, when a judiciously framed prompt, enriched by apposite examples, might suffice to imbue a sophisticated engine with the desired stylistic graces. Though it is undeniable these modern engines shall wax greatly in their proportions, and the art of discovering the exact prompt to elicit their most felicitous expressions is a task far from trivial, yet, it must be admitted, the pursuit holds a certain diversion for the inquisitive mind! It is, perchance, not the creation of manifold engines, but rather the artful disposition of singular contexts, that shall bestow upon diverse interlocutors their proper and unique voices.
No, "Polyhedral Code Generation in the Real World" (the paper you just linked) describes the code generation process of polyhedral compilers. Code generation is the last step of polyhedral compilation: first you model code using polyhedra, then you optimize in the polyhedral representation, and finally you generate new code from the optimized polyhedral representation.
The survey paper in the OP compares different state-of-the-art polyhedral compilers on full benchmarks (the complete process described above), not just code generation.
I don't understand the force of this "no" when the question is "is this like...". That's a solid yes as far as I can tell, since the two papers are about the same general area.
> first you model code using polyhedra
No one does this - if your code is already known to be a "polyhedron" (a set of inequalities), then you would optimize like everyone else does: using simplex. In polyhedral compilers the hard part is finding the implicit polyhedra in loop nests. Optimization is also hard (NP-hard), but there's no way around that, so everyone just uses ISL.
> I don't understand the force of this "no" when the question is "is this like...". That's a solid yes as far as I can tell, since the two papers are about the same general area.
I understood the question as asking whether the two papers contain similar content (which they do not) and answered on that basis. You are right, of course, that they deal with the same general concept.
> In polyhedral compilers the hard part is finding the implicit polyhedra in loop nests
I was not very clear; this is what I had in mind when writing "you model code using polyhedra". I meant that the polyhedral compiler builds a model of the code using polyhedra as the first step of the compilation, not that the user would do it.
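To make that concrete, here is a toy example (not taken from either paper) of the kind of loop nest a polyhedral compiler would model: the loop counters and bounds define a set of affine inequalities, i.e. a polyhedron, which the later scheduling and code-generation passes then transform.

```python
import numpy as np

N = 6
A = np.zeros((N, N))
B = np.arange(N, dtype=float)

# A triangular loop nest. A polyhedral compiler models its iteration domain
# as the polyhedron (in ISL-style notation):
#   [N] -> { S[i, j] : 0 <= i < N and 0 <= j <= i }
# i.e. affine inequalities over the loop counters i, j and the parameter N.
for i in range(N):
    for j in range(i + 1):
        A[i, j] += B[j]
```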
The interesting advance in the Anthropic/MATS research program is the application of dictionary learning to the "superpositioned" latent representations of transformers to find more "interpretable" features. However, "interpretability" is generally scored via the explainer/interpreter paradigm, which is a bit ad hoc, and true automated circuit discovery (rather than simple concept representation) is still a bit off, AFAIK.
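In practice the dictionary-learning step is typically implemented as a sparse autoencoder trained on activations hooked from a residual stream or MLP layer. A minimal sketch, with toy dimensions and random tensors standing in for real transformer activations:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete dictionary (n_features >> d_model) over model activations."""
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # sparse feature activations
        x_hat = self.decoder(f)           # reconstruction from dictionary directions
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    # reconstruction error + L1 sparsity penalty on the feature activations
    return ((x - x_hat) ** 2).mean() + l1_coeff * f.abs().mean()

# Toy usage; real use would hook a transformer layer to collect `acts`.
acts = torch.randn(64, 768)               # batch of d_model=768 activations
sae = SparseAutoencoder(d_model=768, n_features=4096)
x_hat, f = sae(acts)
loss = sae_loss(acts, x_hat, f)
loss.backward()
```

The rows of the decoder weight matrix are the learned dictionary directions, and the "interpretability" question above is about how well those directions line up with human-legible concepts.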
How much of this is due to DNNs (e.g. VAEs, but also others) forcing embeddings to distribute in a Gaussian-ish manner? Is the data intrinsically missing geometry, or could a more subtle learning algorithm give a cleaner manifold and therefore a more efficiently indexable structure?