[MLton] Re: [MLton-user] How to write performant network code

Thu Jan 15 21:02:11 PST 2009

On Thu, 15 Jan 2009, Wesley W. Terpstra wrote:
> (moved from mlton-user)
>
> On Wed, Jan 14, 2009 at 10:24 PM, Wesley W. Terpstra <wesley at terpstra.ca> wrote:
>> Have you noticed that calling Word32.fromLarge o
>> PackWord32Little.subVec will generate this:
>>        call WordU32_extdToWord64
>>        call WordU64_extdToWord32
>
> In general, 64-bit Words/Ints suck in MLton 32-bit because it just
> passes the work to a C call.

A lot of the 64-bit ops could be done by the codegen.  It does 64-bit 
add/andb/neg/notb/orb/sub/xorb natively.  The comparisons and extensions 
should really be done by the codegen as well.
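
To make concrete why the comparisons and extensions are cheap enough to
emit inline, here is a minimal sketch that models a 64-bit word as two
32-bit halves.  The type w64 and the function names are made up for
illustration; this is not what MLton's codegen or basis library actually
contains, only the logic the generated instructions would have to follow.

```sml
(* Illustrative model only: a 64-bit word as (high, low) 32-bit halves. *)
type w64 = Word32.word * Word32.word

(* Unsigned extension 32 -> 64: the high half is simply zero. *)
fun extdFromW32 (w: Word32.word): w64 = (0w0, w)

(* Unsigned "extension" 64 -> 32 is a truncation: keep the low half. *)
fun extdToW32 ((_, lo): w64): Word32.word = lo

(* Unsigned less-than: compare the high halves first, and fall back to
 * the low halves only when the high halves are equal. *)
fun lt ((h1, l1): w64, (h2, l2): w64): bool =
   Word32.< (h1, h2) orelse (h1 = h2 andalso Word32.< (l1, l2))

(* Extending and then truncating gives back the original word, which is
 * why the pair of C calls quoted above does no useful work. *)
val roundTrip = extdToW32 (extdFromW32 0wxDEADBEEF)
```

Each of these is a register move, a zeroing, or a compare-and-branch on a
32-bit target, which is far less work than a C call.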

> Wouldn't it make more sense to implement
> a Word64 using Word32 * Word32 and do the arithmetic in the basis
> library? The conversion to/from LargeWord would then be automatically
> detected by the optimizer as being useless. Then we would just pick to
> use the real Word64 for 64-bit machines and the fake Word64 on 32-bit.
> The problem with my proposal is of course that tuples are not
> FFI-friendly.

I think the FFI-unfriendliness is a show stopper.  Position.int is 64-bit 
(even on a 32-bit platform), and gets passed back and forth across the FFI 
for I/O.

> I looked into the ssa directory to see how to implement an
> optimization pass that detects and simplifies these cases:
>
>    x_1107: word64 = WordU8_extdToWord64 (x_1108)
>    x_1106: word32 = WordU64_extdToWord32 (x_1107)
>
>    x_1227: word32 = Word8Vector_subWord32 (x_1072, x_1074)
>    x_1226: word64 = WordU32_extdToWord64 (x_1227)
>    x_1225: word32 = WordU64_extdToWord32 (x_1226)
>
> I'm not really sure how to do this. It seems fairly easy to detect two
> lines one after another that can be combined (like the above example),
> but I don't know how to be sure that x_122[56] are used nowhere else.

Well, clearly x_1225 is being used somewhere else: it is a pure 
operation, so if it were unused it would already have been dropped by the 
removeUnused pass, or by the shrink sub-pass that runs as a cleanup after 
every optimization pass.  It is true that x_1226 might or might not be 
unused.  But you can always introduce dead code and let one of the 
aforementioned passes clean it up.  That is, with regard to the second 
example, it suffices to transform it to:

    x_1227: word32 = Word8Vector_subWord32 (x_1072, x_1074)
    x_1226: word64 = WordU32_extdToWord64 (x_1227)
    x_1225: word32 = x_1227

This local change preserves the meaning of the program, so you can be 
confident that any uses of x_1226 and x_1225 are unaffected.  If it turns 
out that there are no longer any uses of x_1226, then removeUnused (or 
shrink) will drop it from the program.  Similarly, in the first example, 
it suffices to transform it to:

    x_1107: word64 = WordU8_extdToWord64 (x_1108)
    x_1106: word32 = WordU8_extdToWord32 (x_1108)

And, it is likely that x_1107 will be unused and subsequently dropped.
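
The two rewrites above can be sketched as a single local rule over a toy
statement list.  The datatypes below (exp, stmt) and the function names
are invented for this illustration and are much simpler than MLton's real
SSA representation; a real pass would work on the Ssa structures in the
ssa directory.  The rule only fires when the inner operation is an
unsigned widening (from <= to), since composing after a truncation would
lose bits.

```sml
(* Toy model of SSA statements; NOT MLton's actual Ssa datatypes. *)
datatype exp =
    Var of string                              (* plain copy: y = x *)
  | Extd of {from: int, to: int, arg: string}  (* WordU<from>_extdToWord<to> (arg) *)
  | Other of string                            (* anything else; left alone *)

type stmt = {var: string, exp: exp}

(* Find the defining expression of x among the statements seen so far. *)
fun lookup (env: (string * exp) list, x) =
   Option.map #2 (List.find (fn (y, _) => y = x) env)

(* Rewrite a single statement.  Assumes the program is well typed, so the
 * source size of the outer extd equals the target size of the inner one. *)
fun simplify (env, s as {var, exp}: stmt): stmt =
   case exp of
      Extd {from = _, to, arg} =>
         (case lookup (env, arg) of
             SOME (Extd {from = a, to = b, arg = w}) =>
                if a > b then s   (* inner op truncates: leave as is *)
                else if a = to then {var = var, exp = Var w}
                else {var = var, exp = Extd {from = a, to = to, arg = w}}
           | _ => s)
    | _ => s

(* One forward walk.  No statement is deleted: the intermediate extension
 * stays in place as possibly dead code for a later cleanup pass. *)
fun peephole (stmts: stmt list): stmt list =
   let
      fun loop ([], _, acc) = rev acc
        | loop (s :: rest, env, acc) =
             let val s' as {var, exp} = simplify (env, s)
             in loop (rest, (var, exp) :: env, s' :: acc)
             end
   in loop (stmts, [], [])
   end

(* The two examples from this thread. *)
val second = peephole
   [{var = "x_1227", exp = Other "Word8Vector_subWord32 (x_1072, x_1074)"},
    {var = "x_1226", exp = Extd {from = 32, to = 64, arg = "x_1227"}},
    {var = "x_1225", exp = Extd {from = 64, to = 32, arg = "x_1226"}}]

val first = peephole
   [{var = "x_1107", exp = Extd {from = 8, to = 64, arg = "x_1108"}},
    {var = "x_1106", exp = Extd {from = 64, to = 32, arg = "x_1107"}}]
```

Note that the sketch never has to ask whether x_1226 or x_1107 has other
uses, which is the whole point: the rewrite is purely local, and deciding
what is dead is left to removeUnused or shrink.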

> Also, this approach wouldn't result in nearly the same performance
> gains as the Word64 = Word32 * Word32 approach. One could also
> implement the Word64_ primapps in the x86 codegen to avoid some of the
> overhead (seems fairly straight-forward).
>
>> How about I just add a MLton.Socket.Address.toVector which simply
>> exposes the underlying Word8Vector.vector in network byte order?
>
> The (completely trivial) patch is attached. Ok to commit?

Looks fine.  The MLton.Socket interface is supposedly deprecated (a 
holdover from the pre Basis 2002 days when the Basis Library networking 
modules weren't finalized), but there doesn't seem to be a particular need 
to purge it.