benchmarks for the release

Henry Cejtin henry@sourcelight.com
Mon, 1 Apr 2002 02:01:00 -0600


In TextIO.inputLine, the following copies occur:
    First the read system call copies the bytes into the buffer.
    Second the inputLine code copies the next line in to a list.
    Third the list is copied into an array (the final string).
I am not proposing changing the string representation, but note that the only
reason for the list is because you don't know how big the string is going  to
be  until you get to the newline.  The alternative would be to keep a list of
the buffers which have the current  line  (usually  just  1)  and  then  copy
directly  from  them  into the final string.  This would require not re-using
buffers, but that is very cheap compared to copying the line into a list.  In
the end it changes the number of copies from 3 to 2.

Note,  an  alternative  would  be to use some more efficient structure than a
list as the temporary  space.   Ignoring  the  horrible  problems  of  thread
safety,  you  could  have  a  fixed  array which you copy bytes into until it
either overflows or you see the newline.  This still costs you an extra copy,
but  it  isn't  into  a  list.  If the fixed array overflows then you go into
doubling, so the amortized cost is still only 1 extra copy