[MLton-user] replacing python in Fedora

Wed Mar 2 07:33:25 PST 2011

On Tue, Mar 1, 2011 at 9:04 PM, Christopher Cramer
<tsuyoshi at yumegakanau.org> wrote:
> On Tue, Mar 01, 2011 at 10:41:08PM +0100, Gergely Buday wrote:
>> Some suggested OCaml for a replacement of python, but got this
>> response from a Fedora board member:
>>
>> > ... Or take the opportunity to jump to a safe language .. we've
>> > already got an OCaml cross-compiler.
>>
>> Yes, let's move to something 'safer', which doesn't have ABI compatibility
>> ANYWHERE. That makes perfect sense.
>>
>> I am no ABI expert so ask you: could mlton do that?
>
> It's hard to say without better specification of what he means by "ABI".
>
> It could be that he's referring to the fact that OCaml libraries compiled
> with one version of the compiler will not link with libraries compiled
> with a different version. MLton does not do separate compilation so there
> is no linking at all (that is to say, MLton compiles the entire program
> from source every single time), other than the dynamic linking to the
> standard C library etc. that takes place when you run any executable.

Right.  And one can look at that as either a plus or a minus.  The
plus side is that every MLton-compiled program is truly standalone.
The downside is that every MLton-compiled program has its own copy of
the runtime system; hence, there would arguably be wasted memory in a
system with many MLton-compiled programs running simultaneously.  This
may also be the ABI issue; if MLton switched to a dynamically linked
runtime system, then one would have to have some policy on the
evolution and backwards compatibility of its interface.  And that is a
fairly tricky thing to get right; the garbage collector is deeply
entwined with the compiler and low-level implementation details.

On the whole, though, I suspect that MLton would do better than Python
in terms of memory usage in the user program.  In general, it's the
same argument that one can make for MLton versus other SML compilers:
MLton's whole program analysis leads to very good representation
choices, which saves memory.  Python, as a dynamically typed language,
would have the overhead of both uniform representation of data values
(which is overhead shared by most typical implementations of
statically typed languages with polymorphism) and the overhead of
dynamic type representation (which might not be that significant,
since it generally can be folded into the header word, which likely
exists for other reasons as well).
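
As a rough illustration of the uniform-representation overhead on the
Python side, here is a minimal sketch that compares a boxed integer with
the per-element cost of an unboxed array.  It assumes CPython; the exact
byte counts are implementation details and will differ between versions
and platforms, and the variable names are mine.

```python
import sys
from array import array

# In CPython every int is a separate heap object: the payload plus an
# object header (reference count and type pointer).
boxed_size = sys.getsizeof(12345)

# array('q') stores signed 64-bit integers unboxed, back to back,
# so each element costs only its payload.
unboxed = array('q', range(1000))
per_element = unboxed.itemsize

print("boxed int:", boxed_size, "bytes")
print("unboxed element:", per_element, "bytes")
```

The gap between the two numbers is the kind of per-value cost that a
compiler with whole-program knowledge of the types can hope to avoid.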

And, yes, MLton has the same "memory leaks" as any tracing garbage
collector.  If data is reachable, and hence apparently live, then
it is retained, even if it is semantically garbage.  Weak pointers and
NONE-ing out ref cells can sometimes help, but that is a burden on the
programmer.  And a NONE-ed out ref cell is little different from a
dangling pointer, so arguably, you are back in the mess that the GC
was supposed to help you avoid.
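
The same trade-off can be sketched in Python, since that is the language
under discussion.  This is only an analogy: CPython combines reference
counting with a cycle collector rather than using a purely tracing GC,
and Blob, cache, and use are hypothetical names made up for the example.

```python
import gc
import weakref

class Blob:
    """Stand-in for a large piece of data (hypothetical)."""
    def __init__(self, n):
        self.payload = bytearray(n)

# The blob is reachable through the cache, so it is retained even if
# the program never looks at it again.
cache = {"big": Blob(1_000_000)}

# A weak reference observes the object without keeping it alive.
probe = weakref.ref(cache["big"])

gc.collect()
still_alive = probe() is not None   # cache still holds a strong reference

# The analogue of NONE-ing out a ref cell: drop the strong reference by hand.
cache["big"] = None
gc.collect()
freed = probe() is None             # nothing keeps the blob alive any more

def use(entry):
    # Code that still expects a Blob now fails at run time, much like
    # following a dangling pointer.
    return len(entry.payload)
```

Calling use(cache["big"]) after the slot has been cleared raises an
AttributeError, which is the "back in the mess" part: the memory is
reclaimed, but every reader of that slot must now cope with it being empty.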