TextIO.scanStream REALLY slow

Henry Cejtin henry@sourcelight.com
Thu, 14 Sep 2000 02:10:36 -0500


I tried to use TextIO.scanStream to read in something big, but the result is
incredibly slow, and I'm curious why.  The code that was doing the reading
was

    fun ('elem, 'state)
       makeListReader (chreader: (char, 'state) StringCvt.reader,
                       ereader: ('elem, 'state) StringCvt.reader) =
          let fun nonwsreader (state: 'state): (char * 'state) option =
                     case chreader state of
                       NONE => NONE
                     | res as SOME (ch, state') =>
                          if Char.isSpace ch
                             then nonwsreader state'
                             else res
              fun next (state: 'state): ('elem list * 'state) option =
                     case nonwsreader state of
                       NONE => NONE
                     | SOME (#"]", state') =>
                          SOME ([], state')
                     | _ => case ereader state of
                              NONE => NONE
                            | SOME (e, state') =>
                                 case nonwsreader state' of
                                   NONE => NONE
                                 | SOME (#"]", state'') =>
                                      SOME ([e], state'')
                                 | SOME (#",", state'') =>
                                      (case next state'' of
                                         NONE => NONE
                                       | SOME (ac, state''') =>
                                            SOME (e::ac, state'''))
                                 | _ => NONE
          in fn state =>
                case nonwsreader state of
                  SOME (#"[", state') =>
                     next state'
                | _ => NONE
          end

    fun makeIntListReader chreader =
           makeListReader (chreader,
                           Int.scan StringCvt.DEC chreader)

    fun makeIntListListReader chreader =
           makeListReader (chreader,
                            makeIntListReader chreader)

    val matrix = valOf (TextIO.scanStream makeIntListListReader
                                          TextIO.stdIn)

Note, I know that this isn't going to be great, because if it doesn't see an
int list list, the stream has to be left at the start, so all of that data
has to be saved until the scan finishes.
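
For reference, the Basis Library description of TextIO.scanStream gives an
implementation sketch along the following lines, which shows where the
"left at the start" behaviour comes from.  The name scanStream' is a
stand-in chosen here; whether MLton's own implementation looks like this is
an assumption.

```sml
(* scanStream' is a hypothetical name; the body follows the sketch in the
   Basis Library description of TextIO.scanStream. *)
fun scanStream' scanFn (strm: TextIO.instream) =
   let (* Snapshot of the functional stream underlying strm. *)
       val instrm = TextIO.getInstream strm
   in case scanFn TextIO.StreamIO.input1 instrm of
         NONE => NONE                    (* failure: strm is not advanced *)
       | SOME (v, instrm') =>
            (* success: move strm past everything the scan consumed *)
            (TextIO.setInstream (strm, instrm'); SOME v)
   end
```

Under this sketch the imperative stream is only advanced after scanFn
returns, so the starting point of the functional stream stays reachable for
the whole scan.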

In reading a file which held a matrix of 200 rows by 1297 columns, each
entry being 0, 1 or -1, it could only read at not quite 45K bytes per
second.  (The matrix result was being used in other parts of the program.)

I  figured  it  might be some strangeness connected with scanStream, but that
doesn't seem to be the case.  The program

    val _ = TextIO.scanStream let fun loop reader state =
                                              case reader state of
                                                NONE => NONE
                                              | SOME (_, state') =>
                                                   loop reader state'
                              in loop
                              end
                              TextIO.stdIn

reads the same file at just under 5 megabytes per second.

I can understand Int.scan not being super fast, but it seems to be the
bottleneck.  Note that even the fast case threads the state argument
through every call, although it ignores the characters it reads.

My question is: why is the integer reading so slow, and what can I do about
it?
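
One experiment that might separate the cost of Int.scan from the cost of the
list reader is to swap in a hand-written decimal scanner with the same
reader type.  The following is only a sketch: scanSmallInt is a hypothetical
name, not a Basis function, it accepts fewer forms than Int.scan (no "+"
sign), and whether it is any faster under MLton would have to be measured.

```sml
(* Hypothetical stand-in for Int.scan StringCvt.DEC.  Skips leading
   whitespace, accepts an optional "-" or "~" sign, then one or more
   decimal digits.  Overflow raises Overflow, and the most negative int
   cannot be read because the magnitude is accumulated as a positive. *)
fun scanSmallInt (getc: (char, 'state) StringCvt.reader)
                 (state: 'state): (int * 'state) option =
   let fun skipWS s =
          case getc s of
             SOME (ch, s') => if Char.isSpace ch then skipWS s' else s
           | NONE => s
       (* Accumulate digits; stop before the first non-digit. *)
       fun digits (ac, s) =
          case getc s of
             SOME (ch, s') =>
                if Char.isDigit ch
                   then digits (10 * ac + (Char.ord ch - Char.ord #"0"), s')
                   else (ac, s)
           | NONE => (ac, s)
       (* Require at least one digit, then apply the sign. *)
       fun number (neg, s) =
          case getc s of
             SOME (ch, s') =>
                if Char.isDigit ch
                   then let val (n, s'') =
                               digits (Char.ord ch - Char.ord #"0", s')
                        in SOME (if neg then ~n else n, s'')
                        end
                   else NONE
           | NONE => NONE
       val s = skipWS state
   in case getc s of
         SOME (#"-", s') => number (true, s')
       | SOME (#"~", s') => number (true, s')
       | _ => number (false, s)
   end
```

With this in place, makeIntListReader could pass scanSmallInt chreader
instead of Int.scan StringCvt.DEC chreader, and the two versions could be
timed on the same file.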

Oh, I tried to profile the code, but the profiler is broken.  The problem
isn't the mlprof program (although that is wrong too) but the PROFILE
comments inserted in the code by the MLton compiler.  They are supposed to be
only on CPS function boundaries, but refer to things which don't exist in the
CPS file and which look too frequent to be functions.

As  to the output of mlprof, my old mlprof and the new one don't quite agree.
They don't differ by too much, so it could be just random things.

Note,   all   of  this  is  with  the  vanilla  mlton-20000906,  without  any
TextIO.scanStream changes.