[MLton] The bool array/vector performance bug

Vesa Karvonen vesa.a.j.k at gmail.com
Sat May 31 10:19:06 PDT 2008


The bool array/vector performance bug (bool arrays and vectors use 4 bytes
per element) has been discussed on the list a number of times (you can find
the threads by googling for "bool representation", for example).  Skimming
through a couple of the discussions, I was most convinced by the reasoning
in the following posts by Wesley and Stephen:

  http://mlton.org/pipermail/mlton/2006-June/028936.html
  http://mlton.org/pipermail/mlton/2006-June/028940.html

In summary,

- don't allow ML bool in the FFI, but do
- allow C Bool ("C_Bool.t") in the FFI.

This means that one needs to explicitly convert between ML and C bools.

I think that this is pretty much the only sane alternative, because the
resulting code will be portable (the size of C bool is platform
dependent) and it allows maximal flexibility on the ML side.

The current approach, using 4-byte bools, is just plain wrong, IMO,
because a C bool is not specified to be 4 bytes.  Using 4 bytes per bool
on the ML side is also highly inefficient and (which doesn't make it wrong,
but) makes MLton look bad on some toy benchmarks
(http://shootout.alioth.debian.org/gp4/benchmark.php?test=nsieve&lang=all).
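
To put rough numbers on the inefficiency, here is a minimal sketch in C
(not ML, and not how MLton actually lays anything out) comparing the
storage needed for n flags at one 32-bit word, one byte, and one bit per
flag, plus an nsieve-style prime count over a bit-packed array.  All
function names here are hypothetical and chosen for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Bytes needed for n flags at one 32-bit word per flag
   (the 4-bytes-per-element layout discussed above). */
size_t bytes_word_per_flag(size_t n) { return n * sizeof(uint32_t); }

/* Bytes needed for n flags at one byte per flag. */
size_t bytes_byte_per_flag(size_t n) { return n * sizeof(uint8_t); }

/* Bytes needed for n flags at one bit per flag, rounded up. */
size_t bytes_bit_per_flag(size_t n) { return (n + 7) / 8; }

/* Read flag i from a bit-packed buffer (returns 0 or 1). */
int bit_get(const uint8_t *buf, size_t i) {
    return (buf[i / 8] >> (i % 8)) & 1;
}

/* Write flag i in a bit-packed buffer. */
void bit_set(uint8_t *buf, size_t i, int value) {
    uint8_t mask = (uint8_t)(1u << (i % 8));
    if (value)
        buf[i / 8] |= mask;
    else
        buf[i / 8] &= (uint8_t)~mask;
}

/* Count the primes below n with a simple sieve over bit-packed flags.
   Returns 0 if n < 3 or if the allocation fails (sketch-level error
   handling only). */
size_t count_primes_below(size_t n) {
    if (n < 3)
        return 0;
    uint8_t *composite = calloc(bytes_bit_per_flag(n), 1);
    if (composite == NULL)
        return 0;
    size_t count = 0;
    for (size_t i = 2; i < n; i++) {
        if (bit_get(composite, i))
            continue;               /* i was marked by a smaller prime */
        count++;
        for (size_t j = i + i; j < n; j += i)
            bit_set(composite, j, 1);
    }
    free(composite);
    return count;
}
```

For 1000 flags the three layouts need 4000, 1000, and 125 bytes
respectively.  The bit-packed form pays for its size with a shift and a
mask on every access, which is presumably why 8 bits per element is the
more conservative choice.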

So, what is the status of the bool FFI thingy?  And what places in the
compiler/runtime/basis lib need to be changed to:
- eliminate ML bool from the FFI,
- add C_Bool to FFI (this might already be available?), and
- change the representation of ML bool to use 8 bits (or 1 bit) (in arrays
  and vectors at least)?
Is that all?  I would guess that changing the representation is about as
difficult as knowing all the places that need to be changed.  If more
programming is needed, I'd be happy to help fix this.

-Vesa Karvonen


