it seems like the raison d'être is performance rather than static correctness checking; i wonder how it compares to luajit? often my crude tests find luajit performance comparable to c compiled by gcc with optimization, but sometimes about half as fast, and presumably on a large enough program with a flat enough profile its performance would converge to slightly worse than simple bytecode interpretation. i'm never sure how much i can rely on luajit's performance, and static compilation would presumably ameliorate that problem. also, static typing would give me compile errors instead of unexplained slowdowns when i do something that frustrates the optimizer
the readme says, 'Compared to LuaJIT, Pallene aims to offer more predictable run-time performance.'
but seriously luajit is astounding. in https://gitlab.com/kragen/bubbleos/blob/master/yeso/mand.lua there is semantically a function pointer invocation per pixel (pixfunc), but luajit's trace compilation apparently hoists that entirely out of the inner loop of drawing the fractal. conventional compilation of statically typed languages, even ocaml, can't do anything like that; you need something like c++ templates to specialize the inner loop for each possible fractal being drawn
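here's roughly the shape of that inner loop, as a minimal sketch with illustrative names rather than the actual mand.lua code:

    -- minimal sketch of the pattern described above (names illustrative)
    local function mandel(cx, cy)
      local zx, zy, i = 0, 0, 0
      while zx * zx + zy * zy < 4 and i < 255 do
        zx, zy = zx * zx - zy * zy + cx, 2 * zx * zy + cy
        i = i + 1
      end
      return i
    end

    local function draw(fb, w, h, pixfunc)
      for y = 0, h - 1 do
        for x = 0, w - 1 do
          -- semantically an indirect call per pixel; luajit's trace
          -- compiler specializes the compiled loop to the pixfunc it
          -- actually observes, hoisting the dispatch out of the loop
          fb[y * w + x] = pixfunc(4 * x / w - 2, 4 * y / h - 2)
        end
      end
    end

    draw({}, 256, 256, mandel)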
For compiled code, Pallene's performance can be comparable to LuaJIT[1]. Pallene might be better in code with unpredictable branches, which don't fit in a single trace. LuaJIT has the edge in purely-interpreted code, and can inline indirect or cross-module function calls. LuaJIT is also more featureful, and has a better FFI story at the moment.
LuaJIT is really good software; it is hard to beat it at its own game. Pallene tries to get its edge elsewhere: it tracks Lua 5.4, whereas LuaJIT diverged around 5.1 and will stay that way forever. Pallene's implementation is arguably more portable, because it compiles down to C and doesn't need hand-crafted assembly.
this is fantastic, thanks! it seems like you're saying pallene is using a c compiler as its compiler backend? that seems like it could give you a big leg up on difficult optimizations like inlining recursion (compared to writing your own backend, i mean, not specifically compared to luajit)
i'll read the paper. aha, it says:
> The Pallene compiler generates C source code and uses a conventional C compiler as a backend. This is simple to implement and is also useful for portability. The generated C code for a Pallene module is standard C code that uses the same headers and system calls that the reference Lua does. This means that Pallene code should be portable to any platforms where Lua itself can run.
btw, if you're not on a microcontroller, this can even be a feasible thing to do for a jit compiler; running gcc 9.4 on a small program takes about 70 milliseconds on this micropc, 55 milliseconds to compile a small shared library. clang is of course less practical, but older gcc versions were even better, and tcc is even better, taking respectively 10ms and 7ms. you might have to cut down your header files though
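as a sketch of what that looks like from lua (header path, module name, and compiler flags are illustrative; you'd adjust them for your system):

    -- compile a tiny c snippet to a shared library with the system cc
    -- and load it with package.loadlib; names and paths are hypothetical
    local src = [[
    #include <lua.h>
    #include <lauxlib.h>
    static int add(lua_State *L) {
      lua_pushnumber(L, luaL_checknumber(L, 1) + luaL_checknumber(L, 2));
      return 1;
    }
    int luaopen_jitmod(lua_State *L) { lua_pushcfunction(L, add); return 1; }
    ]]
    local f = assert(io.open("/tmp/jitmod.c", "w")); f:write(src); f:close()
    -- with tcc instead of cc, this step is reportedly in the ~10ms range
    assert(os.execute("cc -O2 -shared -fPIC -I/usr/include/lua5.4 " ..
                      "-o /tmp/jitmod.so /tmp/jitmod.c"))
    local add = assert(package.loadlib("/tmp/jitmod.so", "luaopen_jitmod"))()
    print(add(2, 3))  --> 5.0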
the paper also has a longer section comparing pallene with luajit, which i will have to read in more detail. thank you very much for linking it!
5.4 vs. 5.1 could be an advantage for either side; minetest, for example, uses 5.1, so luajit is an option and pallene probably isn't
Afaik there was a prototype Ruby JIT that used the C compiler this way and loaded the resulting code as a shared library. I did this with a Python decorator and ctypes so I could inline and hot-reload my C extensions.
That binary search benchmark probably triggers a trace explosion in LuaJIT, like I've found quicksort does. If you're lucky the function gets blacklisted from tracing; if not, it ends up hitting the default maximum number of traces, throwing away all the JIT'ed code, and repeating the same thing over and over.
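A sketch of why (a textbook binary search; nothing here is LuaJIT-specific beyond how tracing works in general):

    -- each iteration takes a data-dependent branch, so a tracing jit
    -- records a different trace per observed path; with enough distinct
    -- paths the side exits multiply until the trace budget is exhausted
    -- or the function gets blacklisted
    local function bsearch(t, key)
      local lo, hi = 1, #t
      while lo <= hi do
        local mid = math.floor((lo + hi) / 2)
        if t[mid] == key then
          return mid
        elseif t[mid] < key then
          lo = mid + 1
        else
          hi = mid - 1
        end
      end
      return nil
    end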
> LuaJIT is really good software; it is hard to beat it at its own game. Pallene tries to get its edge elsewhere: it tracks Lua 5.4, whereas LuaJIT diverged around 5.1 and will stay that way forever.
Last I saw (admittedly some years ago), this also meant that an awful lot of the authors of Lua libraries/bindings diverged around then and de facto intend to stay that way forever and give up on anything that doesn't have LuaJIT's FFI implementation compared to mainline Lua's C bindings, and then a pile of Lua embedders also did this either because they had to keep tracking that crowd or because they really wanted the performance, and… so on. Unless this has changed in the meantime, I'm not sure what kind of edge you're hoping to get there.
Pallene unfortunately is only a subset of Lua, i.e. you cannot simply take a Lua program and compile it with Pallene. The use case of this language is not obvious to me, since if I have to re-implement the performance-critical parts of the Lua program anyway, I can just as well implement them in C using the Lua C API. Thus the claim of the Pallene authors is a bit misleading.
Concerning LuaJIT performance, I have done a lot of measurements over the years (see e.g. https://github.com/rochus-keller/Oberon/blob/master/testcase...) and would rather say that its performance is about a factor of four worse on average than an equivalent C++ program compiled with -O2. For some exceptional use cases it might perform as well as C, but not in general (one reason is the missing tracing JIT support for closures).
> I can just as well implement them in C using the Lua C API
Pallene beats C when the code uses many Lua data structures. Accessing Lua data from C via the Lua-C API has significant overhead that can erase the gains from rewriting into C. Also, rewriting from Lua to Pallene is much less work than rewriting it in C.
Although Pallene is only a subset of Lua, the idea is that you use it together with Lua. It's not meant to replace Lua entirely.
The difference is the Lua-C API. The default Lua-C API is designed for humans: it is stable and safe to use, but every operation must pay the cost of function calls and passing data through the Lua stack. Pallene bypasses the API and reaches into the Lua tables directly. This is much faster, but would be impractical without the Pallene compiler. The internal struct layouts are unstable, and unsafe if you're not very careful.
It doesn't have the Lua-to-C interop overhead. You can obviously ameliorate that overhead by working on batches in C, but if you have a large and complicated dataset in Lua and need to iterate through it in C, the overhead is paid on every crossing, so it's certainly not just "the performance of C" when you step into C.
If on the other hand you're dropping into C to do something like decode a compressed stream, then the interop overhead is negligible compared to the work done in C. However, that interop overhead will be present wherever you put the boundary layer...
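To make that concrete, here is a hedged sketch of the two shapes the boundary can take (cfun, process_one and process_all are hypothetical stand-ins for C functions registered through the Lua C API; stubbed here so the snippet runs):

    -- stand-ins for functions a real C module would register
    local cfun = {
      process_one = function(x) end,  -- crosses the boundary per item
      process_all = function(t) end,  -- crosses once, walks t on the C side
    }
    local data = {1, 2, 3}
    -- per-element crossing: the API overhead is paid once per item
    for i = 1, #data do
      cfun.process_one(data[i])
    end
    -- batched crossing: the boundary is crossed once, but the C side now
    -- pays lua_geti/lua_tonumber per element to walk the table
    cfun.process_all(data)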
maybe! tracking down unexpected performance regressions is more work than correcting type errors reported by the compiler, and your luajit results suggest that typically a c subroutine (and perhaps consequently a pallene subroutine) will enjoy a 4× speed advantage over the luajit version, which might save you a lot of optimization work elsewhere
> tracking down unexpected performance regressions is more work
In particular you have to know a lot of technical details about LuaJIT, which essentially contradicts the benefit of using Lua in the first place. In my case I used LuaJIT as a backend, directly generating bytecode and taking LuaJIT internals into account as far as possible. But it's still much slower than e.g. Mono.
> your luajit results suggest that typically a c subroutine (and perhaps consequently a pallene subroutine) will enjoy a 4× speed advantage
The Lua implementation of Are-we-fast-yet apparently doesn't consider LuaJIT internals, but is just pretty idiomatic Lua not optimized for speed. So it doesn't represent what is possible with LuaJIT performance-wise, but gives a good impression of how LuaJIT performs in general applications.
interesting, thank you for sharing your experience! in the example case my pixfunc is in fact not a closure, just a pure function, but it sounds like the performance would collapse if i made it a closure
(you can of course compile specialized versions of the inner loop for each fractal in c; 'just' use the preprocessor)
it seems plausible to me that it would be more convenient to reimplement a lua function in pallene rather than in c, especially for 'shallow' functions where the code for passing arguments and return values back and forth on the stack to lua is a substantial percentage of the total function code
but there's still the question of how good or bad the performance of the code generated by the pallene compiler is! it might be worth rewriting the code in c if gcc or clang produces faster code than pallene does, even if rewriting it in pallene would be easier
The problem is actually FNEW. If you create your function at top level and leave it as is, then FNEW is only used once. But if you require e.g. nested functions, then each call to FNEW will terminate the trace. My measurements with the Smalltalk and SOM implementations using LuaJIT showed that the benchmark essentially remains in the interpreter most of the time for that reason.
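A minimal sketch of the two shapes, assuming the FNEW behaviour described above:

    -- the closure is created inside the hot loop, so the bytecode has an
    -- FNEW per iteration and the trace aborts there
    local function slow(n)
      local acc = 0
      for i = 1, n do
        local f = function(x) return x + i end  -- FNEW on every iteration
        acc = acc + f(i)
      end
      return acc
    end

    -- hoisting the function to the top level means FNEW runs only once
    local function add(x, i) return x + i end
    local function fast(n)
      local acc = 0
      for i = 1, n do
        acc = acc + add(i, i)
      end
      return acc
    end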
> it seems plausible to me that it would be more convenient to reimplement a lua function in pallene rather than in c
If only Pallene were more similar to Lua. But essentially it's a different language that just resembles Lua a bit. So you could use any language compatible with the Lua C API if you don't like C (instead of learning Pallene). You have to rewrite the performance-critical parts anyway. Personally I prefer LuaJIT, since everything is Lua, and if you know enough details and do measurements, the performance is no longer a surprise.
> how good or bad the performance of the code generated by the pallene compiler is
There is no reason to assume that Pallene is faster than LuaJIT if things like the ones mentioned above are considered.
that must have been a huge disappointment when you ran into that problem with som! are you planning to write your own jit compiler, or is there still a way you can leverage luajit?
I ran into stability problems with the Oberon+ FFI which I was not able to solve with the LuaJIT bytecode generator, so I switched to Mono, which even has a much better debugging infrastructure. For speed I also implemented a C transpiler, the output of which comes very close to the native C++ implementation in performance. SOM was mostly an effort to find out how close we can come to Cog with a LuaJIT backend, as a follow-up to my Smalltalk LuaJIT adventure.
hmm, are you going to have to switch to .net core now? or are mono users succeeding in banding together to handle security updates in precisely the way python 2 users have failed to?
I'm not using the .NET framework, just the CLR, and for the use cases covered by Oberon+ security is of little concern. The project is still regularly updated, see https://github.com/mono/mono/commits/main/.
Ideally, CoreCLR will give you much better performance and the ability to AOT-compile your binaries to native executables, and will remove the need for multiple back-ends.
These are two different Mono implementations. The one maintained by MS has a different focus. I'm mostly interested in a lean, independent build and many different targets (e.g. Apple M1). The "much better performance" of CoreCLR is a fairy tale (see e.g. https://www.quora.com/Is-the-Mono-CLR-really-slower-than-Cor...). Actually, Mono supported AOT even before CoreCLR existed (but I don't need it, because I have a transpiler to C which produces cross-platform, near-native performance).
Unfortunately, there are a few issues with this Quora post. First, it is outdated - .NET is on an annual cadence, and statements true for 3.1 or 5 have no impact on user experience today. Second, it focuses on edge-case differences on a log scale rather than the full picture. Also note that the compiler improves significantly with each version - look at the difference between 3.1, 5 and 6. The differences between 6 and 8 are way more significant (DynamicPGO light-up, physical promotion for structs, etc.). And third, there is more to the story than microbenchmarks. Again, there is a huge rift in performance between Mono and CoreCLR, and there are classes of optimizations Mono does not and will not do, like devirtualization. All optimization work goes into CoreCLR for multiple ISAs. Mono does not support AVX2 and AVX512, which core routines in .NET's CoreLib rely on if available. It doesn't have the fancy if-conversion pass to merge branches and comparisons that CoreCLR has for ARM64, write-barrier optimizations, etc.
This is not a myth but what anyone who works closely with .NET or is a part of .NET teams will tell you.
For example, look at https://aka.ms/aspnet/benchmarks, select the mono tab, and pick coreclr and fortunes - you'll notice that Mono is more than five times slower than CoreCLR. And that is the Mono flavour which also sees performance improvements for the features that .NET adds, and optimizations focused on the targets Mono serves.
Also, the independent build is not independent - it limps along on Unity Technologies' contributions until Unity finishes its move to CoreCLR, which is in progress.
The post is good enough, using recognized methods and allowing reproducibility. It considers different CoreCLR and Mono versions, and there is no reason to assume big performance differences since. The raw data is referenced; you can use a different scale if you prefer, but the information is the same. If you are interested, take a look at the Are-we-fast-yet benchmark suite to see that it is well defined, not just a bunch of random micro benchmarks, but also includes a few larger programs (e.g. to exercise the GC). And again: this is about the CLR, not the framework. I'm not interested in the .NET framework implementation. The build works well on all platforms without any dependencies and without downloading and compiling several gigs of stuff.
I am talking about CoreCLR, which has nothing to do with .NET Framework. It has tangentially something to do with .NET's standard library, but its interaction with it is the same as Mono's. It just executes CIL assemblies, and maybe AOT-compiles them if you ask the build system for it.
The statement "there is no reason to assume big performance differences since" is naive and ignores the performance work that went into releases 7 and 8, particularly DynamicPGO which enables the whole class of new optimizations. It also ignores the steady, significant improvements in versions 3.1, 5 and 6. These are annual jumps in performance on code that had otherwise no changes.
I caution other readers to take this with an extra grain of salt and perform their own benchmarking if they do not believe my statements. Luckily, the numbers will be very easy to interpret, with the difference occasionally being as much as an order of magnitude in favour of CoreCLR.
Mono is easier to embed, which justifies its use in select scenarios, but if there is no such specific requirement, then you should use CoreCLR. Doing otherwise means harming your project and its users out of a desire to be stubborn.
If you want to convince me, try running the referenced code on the most recent Mono and CoreCLR versions and relevant architectures, and collect/report the data in an accessible way. I'm only interested in reproducible measurements, not in opinions.
> Luckily, the numbers will be very easy to interpret, with the difference occasionally being as much as an order of magnitude in favour of CoreCLR. ...
It was interesting to look at how .NET deals with arbitrary assemblies from something that isn't C# or F#. In the end, I did not bother with manually linking .dll files for ILC, so there will be only JIT results. In general, I do not think the CIL back-end for Oberon produces good code.
It is compiler-unfriendly, ignores functions provided by .NET's CoreLib (available both under Mono and CoreCLR) in favour of its own poorly performing variants, makes everything virtual, etc. It does silly things like turning every reference to a string literal into an array allocation ("test".ToCharArray()) just to do a comparison, and then treats such arrays as zero-terminated pointers despite .NET arrays already carrying their length, and more.
Overall, a disregard for what CIL, CLI and CTS are and what they offer. What are you using .NET for if not for writing truly portable C-like code? There are already pointers and manual memory management, the lowering strategy could have been much closer, and a lot of the suffering is very much self-imposed.
And the benchmark initially ran for so little time that it would favour either a pure-interpreter runtime (as long as it did very little) or natively compiled binaries. Surely your average program runs for more than 200 ms?
Anyways, you can find these and a couple other notes with data and charts here:
I did not have time to look closer at the benchmarks with 1:1 numbers, but if you look at things like JSON handling, which is compiler-sensitive, you'll note that the difference there was the biggest.
Apart from the unnecessarily arrogant remarks (which are beside the point anyway, since the same code runs on both measured VMs no matter what you think of it), I find the result interesting, and thank you very much for your effort. I had never run CoreCLR on an Apple M1 before. Apparently there was indeed a further improvement on this architecture, although the overall difference is still not that dramatic. If I analyze your numbers the same way I did before, I get a geometric-mean factor of 1.8, which would almost make it worth switching to CoreCLR, at least on Apple M1. But there are other critical requirements, e.g. that CoreCLR does not support all the platforms I need, and that the debugging support, as far as it is comparable at all, requires a completely different integration. Just for the speed-up on Apple it's not worth switching. As soon as I find some time, I will try again on x86 and x86_64 and see how it looks there compared to my 2021 measurements.
Here are some comparisons to my 2021 measurements.
- Mono 5 and Mono 6 on Windows 10 x86 achieved about the same performance (factor 1.04)
- Taking Mono 5 as a reference on Windows 10 x64, CoreCLR 3 had about the same performance, CoreCLR 5 was factor 1.3 faster than Mono 5, and CoreCLR 6 was factor 1.2 faster than Mono 5.
- The measurements done by neonsunset on Mac M1 show that CoreCLR 9 is factor 1.8 faster than Mono 6; therefore CoreCLR 9 (M1) seems to be around factor 1.4 faster than CoreCLR 5 (x64).
The issue of the poor lowering strategy remains. The back-end should be using structs, pointers and manual memory management if that is what it does in the C back-end. C# supports a full set of features to make that possible, including (now) raw function pointers. Just no C macros, sorry.
It should not even be close to Mono in performance. It should be close to C, given correct back-end output, which it isn't. At this point, outputting C# would have produced much better results that do not defeat compiler optimizations; CoreCLR performs rather well despite the CIL quality, not because of it. And even then, you can easily see that the computationally intensive or logically complex benchmarks show the biggest differences, as they stress compiler capability the most. In any case, targeting C# instead would result in guaranteed-correct CIL form, with loops written in a way that does not occasionally break loop recognition, does not introduce unnecessary memory and abstraction overhead, and does not have issues with basics like that string comparison that allocates each time for no reason.
Also note that compiler improvements are generic - most affect any platform supported, with platform-specific peephole optimizations responsible for cleaning up the codegen and eking out the last %. If-conversion pass for ARM64 is going to be an exception, but on the other hand x86_64 offers wider vectors which .NET makes use of.
You are still clinging to the same misconception. I want to measure the CLR, not the frontend output. The CLR is by definition a "Common Language Runtime", i.e. it makes the promise to execute applications written in multiple high-level languages without the requirement that these applications have to take into consideration the unique characteristics of the specific CLR or environment. There is no requirement in ISO 23271 whatsoever that the CIL fed to the CLR must meet other requirements than the ones stated in the standard. There is especially no requirement in the standard that the bytecode fed into the CLR must be optimized or meet certain patterns so that loops are recognized. My code is perfectly compliant with the standard.
The CLR is essentially a JIT processing a well-defined intermediate language. A JIT is all about optimization. In contrast to a static compiler, a JIT can also take specific system and runtime information into account for this purpose. Mono is very good at both static and dynamic optimizations, and it is especially good at handling bytecode from compilers other than the included C# compiler. Microsoft, for some reason, seems to have decided to move a lot of optimizations to their new C# compiler infrastructure, which also heavily uses inside knowledge of the CoreCLR and generates patterns which the latter is looking for (e.g. for loop detection and the like). It is therefore not surprising if code compiled with Microsoft's own C# compiler runs faster on the CoreCLR than other code.
However, I am only interested in the original purpose of the CLR. I don't want to have to use .NET or Microsoft's compiler framework in order for the CLR to achieve good performance. That is why I deliberately use unoptimized code for the performance measurements, which in particular does not take into account any of the rules that Microsoft has introduced outside of the above-mentioned standard.
Your arguments would be relevant if different C# compilers + CLR were being compared with each other, or to my Oberon+ compiler. That is a different use case. As a language developer not equipped with the resources of Microsoft or the CoreCLR team, I have been able to save a lot of effort by profiting from the CLR standard, and from Mono in particular, as foreseen by the original purpose of the CLI/CLR. My measurements and claims represent this original purpose and use case.
If I had supplied highly optimized bytecode instead, I would - besides the much larger effort - to a large extent be measuring the compiler frontend (or its consideration of the many, sparsely documented rules that the bytecode must take into account beyond the standard in order for CoreCLR to run optimally), not the CLR.
That said, if you want to immortalize yourself in the Are-we-fast-yet community, you could contribute a C# implementation of the benchmark suite. This would be a good way to measure the influence of the frontend on CLR performance, which would also be very interesting.
I won't - I've already spent enough time on this and don't have more. My points were only that repurposing the C emitter to target C# as well, with the same structs/pointers/memory management, is likely both easier and bound to produce a massively faster solution than something that emits deliberately compiler-unfriendly IL. But even if you target CIL, intentionally using incorrect structures, or ones different from the C back-end's, especially around string comparisons, sounds like ignoring what the platform can offer. You can keep discounting my suggestions as something outside the ECMA spec and to be ignored, which they aren't. At least cases like this one are few and far between, and serious projects like IronPython3, ClojureCLR or Godot make appropriate use of the runtime.
i notice that debian still doesn't have a .net core package. do you know what the obstacle is? is there a hidden licensing problem, or are the debian developers having trouble getting the released source to compile successfully? they do have a mono package so it can't be that they're morally opposed to c#
Most distro feeds do have it; hell, even on FreeBSD it's just `pkg install lang/dotnet`. I guess Debian is special just like that (not that I care). There can't be a licensing issue because it's all MIT, and everything can be built (it takes a lot of time though) from a monorepo: https://github.com/dotnet/dotnet.
thanks! yeah, i'd checked out the official guide previously, but it doesn't explain how to build from source, just how to download prebuilt binaries from some fly-by-night third-party software vendor website. that's part of what made me suspicious!
Building the entire Roslyn + runtime + additional packs + sdk installer and other bits is something that makes little sense to do for pretty much anyone but specific distro maintainers that have policies that require built-from-source only packages, organizations or select individuals that just want to do it because reasons.
Otherwise it takes way too much time, as it means compiling multiple large projects (surely you're not always building LLVM or Chromium from source?). This isn't really something the official documentation for the regular user needs to concern itself with; it belongs in the documentation in the respective repository.
i don't want to become dependent on software where in theory it's free for me to study, modify, and redistribute modified versions of it, but in practice doing so is far too difficult to be worthwhile. it sounds like you're saying .net core has slid pretty far down that slippery slope?
debian does of course always rebuild llvm and chromium from source
Have LLVM, GCC or, say, Firefox or Chromium slid pretty far down that slippery slope? It might seem you're just looking for an excuse to build a straw man, which is not a productive way to approach whatever it is that grinds your gears.
You can build .NET (like, the entire thing) on your machine (and the readme does a good job of guiding you through that); it's just not a very practical thing to do for a regular user. The exact same reason applies to OpenJDK, which is why there are usually community-maintained builds of it like Adoptium (besides addressing the support-contract trap by Oracle, which some of the "first-party" Java SDK builds are).
In the case of Debian, they are exactly the target audience for the dotnet/dotnet VMR. Why they are not using it is something I don't care about, though. My choice would be something saner for daily driving, like Fedora, anyway.
It’s a phrase of French origin that’s extremely common in English speaking countries, to the extent that it is in both the Oxford and Chambers English dictionaries.
Maybe they thought that writing raison d'être was more succinct than “the reason why this exists”. Raison d'être expresses that sentiment idiomatically in two words instead of having to add a bunch of words around “reason” (you can’t just write “the reason for this” since “reason” then becomes too vague).
Also you can expect people that both italicize foreign expressions and use the proper diacritics and stuff to prefer a fancy style of writing.
to be perfectly fair i didn't go back and add the italics until after i saw efilife's comment which used them
'reason for being' is a pretty adequate and shorter translation, but it's not valid to write it as 'reason for beinĝ', which makes it a worse alternative. also it isn't a cliché
My suggestion is that you approach this kind of question the way you did in that post, not the way you did in this thread. "I'm not a native speaker, why did you say raison d'être there?" would get your question answered without the downvotes.
English is a magpie language, and there are dozens to hundreds of undigested phrases borrowed from other languages in it, with more arriving all the time.
It's the same in most languages... the children/teens in Sweden use English phrases as slang :D quite funny to hear (it's "oh my god", "whatever", "no way" all over... but the swearing seems to remain in their language, interestingly).
Yup, it's definitely my fault here. I should have phrased this in a completely different way. Nonetheless, I still think using a foreign phrase is pointless and potentially obfuscates the meaning of a sentence, since we already have English words for that.
You are right, my fault for that. English is a dumpster fire sometimes; I could have accounted for that.
The point is that "foreign phrase" is a distinct form of English phrase.
You can't just toss any bit of any other language into English and expect it to fly. "I like Sally, she's got that joie de vivre", that's fine, "I like Sally's new hairstyle, très chic", also works. "Have you met Sally's new chienne? He's un vrai chéri!" is not English, this is code-switching and would only happen between two bilingual speakers.
English is a very large language in terms of vocabulary, but I would expect educated people to know joie de vivre and almost anyone to know très chic, although not necessarily how to spell it (and the accent grave is basically an affectation: "proper" English spelling does include various diacritics, but informally it's very rare to actually include them).
I think you have a biased view of what is common knowledge.
Certainly up north I could see those being known, as it's more likely you had French in high school.
In the south, there are many common Spanish phrases known. I know some French phrases but I have never heard anyone say the ones you mentioned, nor did I know them. I would consider myself "educated".
I think most people believe what they grew up with is common until they run into evidence that it isn't.
That being said, I've both heard and used those phrases, and not in an "educated" context. Sometimes they say what you want to say better than another phrase that uses entirely English words.
At the risk of stating some obvious things: the claim that most educated persons would know these phrases does not imply that all educated persons must know these phrases. Nor does it imply that someone isn't educated if they don't.
I'm aware that I said I would expect educated people to know joie de vivre and, in fact, I do. I find it somewhat surprising that you claim to have never heard it before, but hey, I believe you. It doesn't change my expectation, though, which is formed by many years of saying it and having the people around me understand what I said - in many US states, including Southern ones, and abroad as well.
But no, they aren't "up north" regionalisms at all, and their existence in the language has nothing to do with American elementary schools having French classes or not. They're known throughout the Commonwealth, for instance. These are just undigested bits of French which live in the language, like a priori is to Latin.
The goals and broad approach are quite similar, and I'm sure that the Pallene authors are aware that Terra exists, so I'd love to hear why they decided that a new language was worth pursuing.
tl;dr: For Pallene, Lua is a scripting language. For Terra, Lua is mainly for metaprogramming.
The key difference is that Terra cannot directly manipulate Lua data. Although Terra's syntax is similar to Lua, its type system and data structures are actually much closer to C. If we want to pass a Lua table into Terra, we must first convert it into Terra arrays/structs. This cost adds up if the code uses many Lua data structures; in the worst case it can be worse than just using Lua for everything.
Passing data between Lua and Pallene should incur no cost. If the programmer rewrites a piece of code from Lua to Pallene, it should hopefully make things faster, and never slow it down.
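In practice the usage model looks like this (fastmod and sum are hypothetical names; the sketch assumes a Pallene module compiled to a shared library, so it won't run without one):

    -- a Pallene module compiles to a shared library and is require()'d
    -- like any other Lua module; plain Lua tables are passed in directly,
    -- with no conversion step at the boundary
    local fastmod = require "fastmod"   -- hypothetical Pallene-compiled module
    local xs = {1.5, 2.5, 3.5}          -- ordinary Lua table
    print(fastmod.sum(xs))              -- reads the Lua table in place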
Terra is more focused on metaprogramming. Generating code and executing things at compile-time. Think of C++ templates, but better.
This documentation is fantastic. I'd love to have Pallene on PLDB.io. If you had a few minutes, here's a post on how to add it: https://pldb.io/blog/addLanguageDemo.html
I can also add it myself, but it's more helpful to have the language creators add their language. (But I'm happy to add it if you are too busy, and/or to assist with adding.)
I love Lua but I also prefer static types, so this looks great. I’ve used Luau[1] in the past which has gradual typing, but that’s less useful unless it’s enforced as a coding standard.
It also offers speed improvements and other optimisations specifically for games development, since LuaJIT cannot be used on several target platforms.
One of the interesting things about Pallene is that it claims to reduce the overhead between the Lua code and C++, which is hugely important for game dev.