[MLton-user] How to write performant network code

Matthew Fluet fluet at tti-c.org
Sun Jan 11 20:13:33 PST 2009


On Wed, 7 Jan 2009, Wesley W. Terpstra wrote:
> I have been working on a network oriented program in SML. Currently we
> have two main bottlenecks in our code (according to mlprof and
> confirmed via replacing the code with stubs).
>
> The first problem has to do with copying data when assembling a
> packet. We've already eliminated as many copies as possible through
> careful use of slices, and only one copy happens when finally
> constructing the (contiguous) packet which is to be sent. This copy is
> one of the slowest points in our code, using Word8ArraySlice.copy. The
> problem is that Word8ArraySlice.copy does a byte-by-byte copy with a
> not very optimized loop. It seems to me that all the
> WordXArray[Slice].copy[Vec] functions could just call out to memcpy.
> This should be a lot faster. Is there any objection to my preparing a
> patch to apply this optimization in the basis?

Does memcpy (or memmove, since the *Array{,Slice}.copy functions need to
work with potentially overlapping regions) do anything more than a
word-by-word copy?  If not, then it seems to me that, at least for the
Word8Array{,Slice}.copy{,Vec} functions, you could stay in SML and use the
PackWord<N>{Big,Little}.{subVec,subArr,update} functions?  Or, probably
better still, use the corresponding primitives directly, so that the
bounds check is done once for the whole copy rather than once per word.
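A minimal sketch of the word-at-a-time idea, staying in SML and using only the standard PackWord32Little functions rather than MLton's internal primitives (so each call still pays its own bounds check). It assumes both byte offsets are multiples of 4, and it falls back to the basis Word8ArraySlice.copy otherwise, including whenever source and destination are the same array, because the forward word loop is not safe for overlapping regions. The name copyWords is hypothetical.

```sml
(* Sketch: copy len bytes from src[si .. si+len-1] into dst starting at di,
   4 bytes at a time via PackWord32Little, then the leftover bytes singly.
   The fast path is used only when both offsets are 4-byte aligned and
   src and dst are different arrays; otherwise we defer to the basis
   Word8ArraySlice.copy, which handles overlapping regions correctly.
   copyWords is a hypothetical helper, not part of the Basis or MLton. *)
fun copyWords {src : Word8Array.array, si : int,
               dst : Word8Array.array, di : int, len : int} : unit =
  let
    val bpe = PackWord32Little.bytesPerElem   (* 4 bytes per word *)
    val nWords = len div bpe
    (* PackWord32Little indices count words, not bytes *)
    fun words k =
      if k >= nWords then ()
      else (PackWord32Little.update
              (dst, di div bpe + k,
               PackWord32Little.subArr (src, si div bpe + k));
            words (k + 1))
    (* copy the trailing len mod 4 bytes one at a time *)
    fun bytes j =
      if j >= len then ()
      else (Word8Array.update (dst, di + j, Word8Array.sub (src, si + j));
            bytes (j + 1))
    val aligned = si mod bpe = 0 andalso di mod bpe = 0
    val sameArray = (src = dst)   (* arrays compare by identity *)
  in
    if aligned andalso not sameArray
      then (words 0; bytes (nWords * bpe))
    else Word8ArraySlice.copy
           {src = Word8ArraySlice.slice (src, si, SOME len),
            dst = dst, di = di}
  end
```

Restricting the fast path to distinct arrays is a conservative shortcut: a fuller version would compare the two ranges and copy backwards when they overlap, as memmove does.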

> The second bottleneck comes from hashing network addresses. We need to
> identify who sent us a packet and we keep information for each peer in
> a hash table. Unfortunately, it's practically impossible to hash the
> address as obtained from Socket.recvArrFrom. The only method that
> seems to be available to us is to first convert the address to a
> string and then to hash that. Unbelievable as it may be, hashing the
> address this way is one of the slowest parts of processing the packet!
> The address type can't be passed through the FFI itself (to extract
> the 32-bit IP+16-bit port) because it is wrapped inside a datatype. So
> the only options that seem available to me at present are: 1)
> completely replace all networking calls with direct FFI, by-passing
> the basis or 2) add some extension in MLton.Socket that lets me get
> the address out as a Word8VectorSlice.slice. Any better suggestions?

Certainly, 2) is much better than 1).

A while ago, I added a primitive (structural) polymorphic hash:
   http://mlton.org/cgi-bin/viewsvn.cgi?view=rev&rev=6352
It would seem to suit your purposes: you can use it to hash any value, 
including datatypes.
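A sketch of keying a peer table directly on the socket address, assuming the primitive is exposed as MLton.hash : 'a -> Word32.word (the name and type used by later MLton releases; check what your version provides). The table is a deliberately simple fixed-size bucket array with hypothetical names (newTable, lookup, insert). Equality uses the basis Socket.sameAddr, so no conversion to a string is needed.

```sml
(* Sketch only: assumes MLton.hash : 'a -> Word32.word (structural hash).
   newTable, lookup and insert are hypothetical helper names. *)
type addr = INetSock.sock_addr

val nBuckets = 1024

(* A fixed-size array of buckets; each bucket is an association list. *)
fun newTable () : (addr * 'a) list array = Array.array (nBuckets, [])

(* Hash the address structurally, without converting it to a string. *)
fun bucketOf (a : addr) : int =
  Word32.toInt (Word32.mod (MLton.hash a, Word32.fromInt nBuckets))

fun lookup (t : (addr * 'a) list array, a : addr) : 'a option =
  case List.find (fn (a', _) => Socket.sameAddr (a, a'))
                 (Array.sub (t, bucketOf a)) of
      SOME (_, v) => SOME v
    | NONE => NONE

(* Insert the entry for a, replacing any previous entry for it. *)
fun insert (t : (addr * 'a) list array, a : addr, v : 'a) : unit =
  let
    val i = bucketOf a
    val others = List.filter (fn (a', _) => not (Socket.sameAddr (a, a')))
                             (Array.sub (t, i))
  in
    Array.update (t, i, (a, v) :: others)
  end
```

In a receive loop, the address returned by Socket.recvArrFrom could be passed straight to lookup. Because the hash is structural, a separately constructed address equal to a stored one should land in the same bucket.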
