
This is an important distinction between statically compiled and interpreted languages which is often lost in discussions that focus on developer usability.

For many years I worked on my own game engine which combined a C++ core with an interpreted scripting language. I had developed a system of language bindings which allowed the interpreted language portion to call functions from C++. The engine quickly grew to a multiple-gigabyte executable (in the debug build), and no matter how much I tried to optimize the size it was still unconscionably huge.

One of the reasons I eventually gave up on the project was that I realized I was overlooking a simple mathematical truth. The size was N×M, where N is the number of bindings and M is the size of each binding. I was focusing on optimizing M, the size each binding added to the executable, while not just ignoring N but actually increasing it every time I added bindings for a new library I wanted to call from the game engine.
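To put rough numbers on that, here is a minimal sketch of the arithmetic. The function name and the figures are made up for illustration and are not measurements from my engine; the point is only that halving M buys nothing if N doubles in the meantime.

```cpp
#include <cstdint>

// Hypothetical model: total bytes the bindings add to the executable,
// as N (number of bindings) times M (bytes each binding costs).
constexpr std::uint64_t binding_bytes(std::uint64_t n_bindings,
                                      std::uint64_t bytes_per_binding) {
    return n_bindings * bytes_per_binding;
}

// Illustrative numbers only: halve M through hard optimization work...
constexpr std::uint64_t before = binding_bytes(10'000, 4'096);
// ...while N doubles because more libraries were bound along the way.
constexpr std::uint64_t after_optimizing_m = binding_bytes(20'000, 2'048);

// The two effects cancel out exactly: the total has not moved.
static_assert(before == after_optimizing_m,
              "halving M is wiped out by doubling N");
```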

There were diminishing returns to how much I could improve M because I was relying on compiler implementations of certain things I was doing (and I was using then-new next generation C++ features that weren't well optimized); it would be a lot easier to simply reduce N. And the easiest way to do that would be some sort of tree shaking.

Unfortunately, due to the nature of interpreted code, it isn't known at compile time which things will or will not ultimately be called. That determination happens at runtime: through calls via function pointer, and through interpretation of scripts that are just strings at compile time (or even strings entered by the user at runtime).

From a compile time perspective, static usage of every bound function, feature or library already exists - it is the C++ side of the cross-language binding. That's enough to convince the linker to keep it and not discard it.
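A minimal sketch of what that looks like, assuming a simple name-to-function-pointer table. My actual binding system was more elaborate, and every name here (`play_sound`, `rumble`, `kBindings`, `lookup`) is hypothetical. The table takes the address of each bound function, which is a static reference the linker can see, while the choice of which one actually runs is made by a string at runtime.

```cpp
#include <cstring>

// Hypothetical native functions exposed to the scripting side.
int play_sound(int id) { return id + 1; }
int rumble(int strength) { return strength * 2; }

using NativeFn = int (*)(int);

struct Binding {
    const char* name;  // name the script uses
    NativeFn fn;       // address of the C++ function
};

// Taking the address of every bound function here is a static use,
// so the linker keeps each one whether or not a script ever calls it.
const Binding kBindings[] = {
    {"play_sound", &play_sound},
    {"rumble", &rumble},
};

// Which function runs is decided by a runtime string, so nothing at
// compile or link time can tell which table entries are dead.
NativeFn lookup(const char* name) {
    for (const Binding& b : kBindings) {
        if (std::strcmp(b.name, name) == 0) return b.fn;
    }
    return nullptr;  // unknown name
}
```

A script line such as `rumble(21)` would end up as `lookup("rumble")(21)` on the C++ side, and the name could just as well have been typed in by the user.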

In fact, the mere presence of the bindings caused my game executable to grow more per included library than a similar C++-only, all-statically-linked program would. If a library provided 5 overloads of a function to do a similar thing with different arguments, an all-C or all-C++ application that uses only one of them would need to include only that version in the compiled executable; the others would be stripped out during the linking step.

Since I don't necessarily know ahead of time which overload(s) I'm going to end up using from the interpreted language side of the engine, I would end up binding all 5. So my executable grew simply from adding the capability to use the library, whether or not I made use of it. Moreover, if I did use it, my executable grew even more than an equivalent C/C++-only user of the library, because I also incurred costs for all the unused overloads.
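As a rough illustration, assume a made-up library `lib` with 5 overloads of `sum` and a type-erased binding table (both hypothetical, not the real library or my real binding layer). The native caller references one overload; the binding table references all five.

```cpp
#include <cstddef>

namespace lib {  // hypothetical library with 5 overloads
int sum(int a) { return a; }
int sum(int a, int b) { return a + b; }
int sum(int a, int b, int c) { return a + b + c; }
int sum(int a, int b, int c, int d) { return a + b + c + d; }
int sum(int a, int b, int c, int d, int e) { return a + b + c + d + e; }
}  // namespace lib

// A C++-only user references a single overload; the other four are
// unreferenced and eligible for stripping at link time.
int native_only_caller() { return lib::sum(1, 2); }

// Type-erased pointer, used here only to hold addresses in one table.
using Erased = void (*)();

struct OverloadBinding {
    const char* script_name;
    Erased fn;
};

// The binding layer cannot know which overload a script will want,
// so it takes the address of all five and all five must be kept.
const OverloadBinding kSumBindings[] = {
    {"sum1", reinterpret_cast<Erased>(static_cast<int (*)(int)>(&lib::sum))},
    {"sum2", reinterpret_cast<Erased>(static_cast<int (*)(int, int)>(&lib::sum))},
    {"sum3", reinterpret_cast<Erased>(static_cast<int (*)(int, int, int)>(&lib::sum))},
    {"sum4", reinterpret_cast<Erased>(static_cast<int (*)(int, int, int, int)>(&lib::sum))},
    {"sum5", reinterpret_cast<Erased>(static_cast<int (*)(int, int, int, int, int)>(&lib::sum))},
};

constexpr std::size_t kBoundOverloads =
    sizeof(kSumBindings) / sizeof(kSumBindings[0]);
```

A real binding layer would also generate a wrapper per overload to convert script values to C++ arguments, which is where the per-binding cost M comes from on top of the function itself.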

You can see why something like Electron would have the same problem. Unused functions can't be automatically stripped out because that information isn't known at compile time. To do it by static analysis, the developer of the Electron app would have to re-run the entire build-from-source process for the Electron executable and combine it with static analysis of the app's JavaScript, to inform the linker what can be stripped out of the final executable.

And it bears mentioning that neither such a static analysis tool for Electron app JavaScript nor the compiler/linker integrations for it currently exist. In theory they could exist, but they would still have trouble with things like eval'd code.

Manual configuration would be possible, but it would necessarily be either coarse-grained or too tedious to expect most developers (of Electron itself or users of Electron) to go into that much detail. That is, you may have a manual option to include or not include Xbox 360 controller support, but probably not one for "only uses the motion controls" that leaves out the other controller features.

Either way, you wouldn't be able to add back support if JavaScript written after build time turned out to need the function or feature after all, unless you distributed a new executable. If you're building so much from source with configuration and static analysis, at that point why not write your whole application in a statically compiled language in the first place?

My thesis here is not that we should accept things like Electron being bloated because they cannot be any other way. My point is that (as happens time and again in Computer Science) we already had certain things, like tree shaking and unused symbol stripping during the linking stage of statically compiled languages, and then in the name of "progress" either they were Jedi-mind-tricked away or the people developing the new thing didn't understand what was being left behind.


