[MLton] PackWord to/from nonsense

Tue Jul 7 04:08:12 PDT 2009

As I'm sure everyone has run into at some time or another, the PackWordX API
is flawed:

  val bytesPerElem : int
  val isBigEndian : bool
  val subVec  : Word8Vector.vector * int -> LargeWord.word
  val subVecX : Word8Vector.vector * int -> LargeWord.word
  val subArr  : Word8Array.array * int -> LargeWord.word
  val subArrX : Word8Array.array * int -> LargeWord.word
  val update  : Word8Array.array * int * LargeWord.word -> unit

where instead it should read something like:

  type word
  val bytesPerElem : int
  val isBigEndian : bool
  val subVec : Word8Vector.vector * int -> word
  val subArr : Word8Array.array * int -> word
  val update : Word8Array.array * int * word -> unit
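
To make the proposed shape concrete, here is a minimal sketch of such a signature layered over the existing Basis structure. The names PACK_WORD_DIRECT and PackWord32LittleDirect are hypothetical, and the sketch assumes the optional PackWord32Little structure is available (as it is in MLton). It still pays for the LargeWord round trip internally; it only shows what the interface would look like to callers.

```sml
(* Hypothetical signature name; it mirrors the proposal above. *)
signature PACK_WORD_DIRECT =
sig
  type word
  val bytesPerElem : int
  val isBigEndian : bool
  val subVec : Word8Vector.vector * int -> word
  val subArr : Word8Array.array * int -> word
  val update : Word8Array.array * int * word -> unit
end

(* Hypothetical wrapper: same operations as PackWord32Little, but typed
   at Word32.word so callers never see LargeWord.word. *)
structure PackWord32LittleDirect :> PACK_WORD_DIRECT
                                    where type word = Word32.word =
struct
  type word = Word32.word
  val bytesPerElem = PackWord32Little.bytesPerElem
  val isBigEndian = PackWord32Little.isBigEndian
  fun subVec (v, i) = Word32.fromLarge (PackWord32Little.subVec (v, i))
  fun subArr (a, i) = Word32.fromLarge (PackWord32Little.subArr (a, i))
  fun update (a, i, w) = PackWord32Little.update (a, i, Word32.toLarge w)
end
```

As in the Basis API, the index counts elements of bytesPerElem bytes, not individual bytes.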

In our networking code, I worked around this by using _prim
"Word8Array_subWordX" when compiling with MLton. This avoids the two C calls
that cast in and out of a 64-bit word for every word written into the data
stream. I recently ran into trouble on a 64-bit machine because SeqIndex.int
is not int there, and I got a PrimApp error. As a stop-gap measure, I'm open
to suggestions for an Int/Word type that is guaranteed to match SeqIndex.

It would be nice to have 'unsafe' versions without the LargeWord baggage
available somewhere, so _prim isn't needed. Armed with 'unsafe' PackWord, it
would be easy to implement faster string/Word8Array copies, as discussed
before.
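
For illustration, a word-at-a-time copy might look roughly like the sketch below. It uses the existing safe PackWord32Little (so it is bounds-checked and still goes through LargeWord); an 'unsafe' variant would have the same structure with the checks and conversions removed. The function name copyByWords is hypothetical, and it assumes the destination is at least as long as the source.

```sml
(* Hypothetical helper, not part of the Basis Library or MLton.
   Copies src into the front of dst, four bytes at a time where possible.
   Assumes Word8Array.length dst >= Word8Array.length src. *)
fun copyByWords (src : Word8Array.array, dst : Word8Array.array) : unit =
  let
    val bytes = Word8Array.length src
    val perWord = PackWord32Little.bytesPerElem
    val words = bytes div perWord
    (* PackWord indices count elements, not bytes. *)
    fun wordLoop i =
      if i < words
      then (PackWord32Little.update (dst, i, PackWord32Little.subArr (src, i));
            wordLoop (i + 1))
      else ()
    (* Copy the trailing bytes that do not fill a whole word. *)
    fun byteLoop i =
      if i < bytes
      then (Word8Array.update (dst, i, Word8Array.sub (src, i));
            byteLoop (i + 1))
      else ()
  in
    wordLoop 0;
    byteLoop (words * perWord)
  end
```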

I'll also note that PackWord represents yet another case where the basis
library expects MLton to optimize fromLarge o toLarge to nothing. I've been
getting increasingly annoyed by the costs I pay to convert between types. I
really liked Vesa's suggestion of {to/from}Fixed for the INTEGER signature.
Combining that with the optimization to turn
  x_1227: word32 = Word8Vector_subWord32 (x_1072, x_1074)
  x_1226: word64 = WordU32_extdToWord64 (x_1227)
  x_1225: word32 = WordU64_extdToWord32 (x_1226)
into
  x_1225: word32 = x_1227
I think we would be able to achieve 0-cost conversions in almost all the
cases where it is safe.
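
At the source level, the extend-then-truncate pair corresponds to something like Word32.fromLarge o Word32.toLarge. The sketch below only checks, on a few boundary values, that this composition is the identity, which is the property the rewrite relies on. It assumes LargeWord is at least as wide as Word32; the names roundTrip, samples and allIdentity are made up for the example.

```sml
(* Widen a 32-bit word to LargeWord (zero-extending), then truncate back. *)
val roundTrip : Word32.word -> Word32.word =
  Word32.fromLarge o Word32.toLarge

(* Boundary values: zero, one, largest positive-as-signed, sign bit, all ones. *)
val samples : Word32.word list =
  [0w0, 0w1, 0wx7FFFFFFF, 0wx80000000, 0wxFFFFFFFF]

(* True when widening followed by truncation returns every sample unchanged. *)
val allIdentity = List.all (fn w => roundTrip w = w) samples
```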

If that conversion optimization were placed before commonArg and knownCase, I
think Int8.fromFixed o Int8.toFixed would become a no-op even with overflow
checking:

x_1 = ...
x_2 = WordU8_sextdToWord64 x_1
x_3 = WordU64_sextdToWord8 x_2
(* from iwconv0 bounds checking: *)
x_4 = WordU8_sextdToWord64 x_3
x_5 = Word64_eq (x_2, x_4)
raise Overflow exception if x_5 is false

First comes the new optimization:
  x_3 = x_1
Then comes commonArg/commonSubexp:
  x_4 and x_3 are replaced by x_2 and x_1 respectively
Then comes knownCase:
  Word64_eq (x_2, x_2) is never false -> exception never raised
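
As a sanity check on the reasoning above, the IR can be modelled with ordinary Basis word operations and tested exhaustively over all 256 8-bit values. This is only a source-level model under assumptions: Word8.toLargeX stands in for the sign-extending conversion, Word8.fromLarge for the truncation, and the helper names are hypothetical. It says nothing about what the MLton passes actually do.

```sml
(* Model of the sign-extending widen (x_2 and x_4 in the IR above). *)
fun sext (w : Word8.word) : LargeWord.word = Word8.toLargeX w

(* Model of the truncating narrow (x_3 in the IR above). *)
fun narrow (w : LargeWord.word) : Word8.word = Word8.fromLarge w

(* Widen, narrow, then re-widen and compare, raising Overflow on mismatch. *)
fun roundTripChecked (x1 : Word8.word) : Word8.word =
  let
    val x2 = sext x1
    val x3 = narrow x2
    val x4 = sext x3
  in
    if x2 = x4 then x3 else raise Overflow
  end

(* True when, for every 8-bit input, the check passes and the result
   equals the input, i.e. the whole sequence behaves as a no-op. *)
val neverOverflows =
  List.all
    (fn i =>
       let val w = Word8.fromInt i
       in (roundTripChecked w = w) handle Overflow => false
       end)
    (List.tabulate (256, fn i => i))
```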

Am I correct in this assessment? If so, that's a pretty serious speed-up: 5
C calls and a potential branch turned into a no-op. Compared to 4 conversions
in and out of an IntInf, things look even better!