forwarded message from Lal George

Stephen Weeks MLton@sourcelight.com
Fri, 8 Dec 2000 13:05:40 -0800 (PST)


--DpCXMC9NIj
Content-Type: text/plain; charset=us-ascii
Content-Description: message body text
Content-Transfer-Encoding: 7bit


Just in case y'all aren't on the SML/NJ mailing list, here is a pointer they
gave to a description of their new x86 floating point code generator:

     http://cm.bell-labs.com/cm/cs/what/smlnj/compiler-notes/x86-fp.ps



--DpCXMC9NIj
Content-Type: message/rfc822
Content-Description: forwarded message
Content-Transfer-Encoding: 7bit

Message-Id: <200012082044.PAA29578744@nslocum.cs.bell-labs.com>
From: Lal George <george@research.bell-labs.com>
To: sweeks@acm.org
Subject: SML/NJ: 110.31 NEWS 
Date: Fri, 8 Dec 2000 15:44:46 -0500 (EST)

    
			S  M  L   /   N  J

                  1  1  0  .  3  1      N  E  W  S
			
  		           December 8, 2000

			      WARNING

  	This version is intended for compiler hackers. It
	ought to be stable; however, we have not run our
	full regression tests.

        http://cm.bell-labs.com/cm/cs/what/smlnj/index.html


Summary:
   o Socket related bug fixes.
   o Improvements to CM autoloading.
   o General cleanup in the use of CM libraries in the compiler.
   o A new x86 fp compilation strategy.
   o Removal of regmaps from MLRISC.


		--------------------------------
Bug Fixes:
  1514. sockets c-library broken
  1582. SysErr exception connecting to socket
  1585. getpeername in sockets



		--------------------------------
CM:

  Drastically improved link traversal code, resulting in faster load
  times for CM and CMB.

  Changed the CM tool-plugin mechanism. See the new manual.

  Made pickle-lib.cm and eliminated the use of comp-lib.cm.


		--------------------------------
SML/NJ Library:

  Fixed "where" clause to GraphSCCFn.


		--------------------------------
MLRISC:
	
 A. Intel X86 floating point:

   As of 110.31, there is an alternative floating point code generator
   and register allocator for the x86.  Since it is still experimental,
   it is turned off by default.  To turn it on, do:
 
     CM.autoload "$smlnj/compiler.cm";
     Compiler.Control.MLRISC.getFlag "x86-fast-fp" := true;

   The new floating point code generator treats the x86 fp stack as
   7 registers, plus one temporary, and directly allocates floating point
   values into these registers.  Currently, fp parameter passing is still
   done through memory, so the new code generator only benefits
   floating-point-heavy loops.  However, code compiled under the old and
   new code generators can coexist.

   The algorithm is described in: 
     http://cm.bell-labs.com/cm/cs/what/smlnj/compiler-notes/x86-fp.ps

   Benchmarks:

      We compared version 110.30 with the new fp code generator, compiling
      the PCLubIN entry in the ICFP'00 programming contest.

     (ICFP00, PCLubIN)     110.30    new fp   Speedup
     chess.gml             22.16     20.98       5.63%
     cone-fractal.gml       5.70      5.45       4.51%
     cylinder.gml           1.61      1.58       2.28%
     dice.gml               7.33      6.88       6.57%
     ellipsoid.gml          1.35      1.30       4.16%
     fov.gml                2.63      2.51       4.70%
     fractal.gml           42.08     41.03       2.56%
     golf.gml               3.09      2.95       4.75%
     holes.gml              3.72      3.50       6.40%
     house.gml              1.41      1.33       5.71%
     intercyl.gml           3.02      2.78       8.41%
     large.gml              8.01      7.81       2.64%
     pipe.gml               6.35      5.78      10.01%
     snowgoon.gml           4.70      4.31       8.95%
     spheres.gml            1.26      1.17       6.98%
     spotlight.gml          0.71      0.68       4.69% 

     By inlining Array2 in the same benchmark we get the following results:

                               110.30  new fp  Speedup
     chess.gml                 21.85s  21.46s  1.83%
     cone-fractal.gml           5.82s   5.47s  6.28%
     cylinder.gml               1.57s   1.61s -2.85%
     dice.gml                   7.57s   6.85s 10.50%
     ellipsoid.gml              1.33s   1.25s  6.74%
     fov.gml                    2.75s   2.57s  7.01%
     fractal.gml               22.64s  21.52s  5.20%
     golf.gml                   3.04s   2.92s  4.25%
     holes.gml                  3.66s   3.48s  5.11%
     house.gml                  1.39s   1.29s  7.74%
     intercyl.gml               3.00s   2.78s  7.91%
     large.gml                  7.91s   7.82s  1.13%
     pipe.gml                   6.44s   5.65s 13.98%
     snowgoon.gml               4.75s   4.29s 10.53%
     spheres.gml                1.22s   1.12s  8.36%
     spotlight.gml              0.71s   0.68s  5.62%

     Results from other benchmarks:

                           110.30    new fp   Speedup
     barnes-hut            1.714     1.696        1.0%   
     fft                   0.954     0.906        5.2%
     mandelbrot            19.91     14.99       32.8%
     matrix-multiply(a)    47.77     45.81        4.3%
     matrix-multiply(b)    17.04     15.42       10.5%
     simple                 3.02      2.69       12.3%
     tsp                    1.75      1.656       5.6%

	NOTE: Matrix multiply(b) has all bounds checking removed.

     Each test is run 10 times and I take the average.
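
     The Speedup column is not defined in the text.  The rows of the
     "other benchmarks" table are consistent with old/new - 1, expressed
     as a percentage; that reading is an assumption checked against the
     printed times, not something the announcement states.  A minimal
     sketch (the name speedup is illustrative only):

```sml
(* Assumed definition: relative speedup in percent, (old / new - 1) * 100. *)
fun speedup (old : real, new : real) : real = (old / new - 1.0) * 100.0

(* mandelbrot row: 19.91 under 110.30, 14.99 with the new fp generator. *)
val mandelbrot = speedup (19.91, 14.99)

(* Prints the value to one decimal place: 32.8% *)
val () = print (Real.fmt (StringCvt.FIX (SOME 1)) mandelbrot ^ "%\n")
```

     Some rows of the ICFP tables differ slightly from this formula when
     it is applied to the printed times, presumably because those times
     are rounded.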

     Overall, the numbers do not improve as much as I was hoping, except for
     mandelbrot.  The following benchmarks compare smlnj with mlton and C:

		     mandelbrot fft    barnes-hut 
     sml/nj 110.30   19.91      0.96   1.71       
     sml/nj new fp   14.99      0.90   1.71       
     gcc -O          14.83      
     gcc -O6         14.01      0.68
     mlton -O6       17.46      1.04   1.62
     (mlton version 20000906)


 B. Internal 'regmap' Changes:

   1. Changed the interface to CELLS and the types cell, cellkind, cellset, etc.

   2. No more regmaps!!  The attributes of a cell, including its current
      color, are accessible from the CELLS interface.  Cells can now take
      arbitrary annotations.  [They will also have a width attribute in the
      next go around.]

   3. The interfaces of STREAM etc. have changed (again, no more regmap).

   4. Some MLTREE constructors, such as IF, BCC, JMP, and CALL, have
      been simplified.  CVTI2I has been replaced by SX and ZX,
      following the lambda RTL convention.

   5. The old RA interface was getting too complicated.  There are now
      two functors, RISC_RA (in ra/risc_ra.sml) and X86RA (in x86/ra/x86RA.sml)
      which abstract out from all the ugly business.  The first is for
      RISC machines, and the second is for x86.  Please let us know if you
      use these functors.

   6. The cell change broke the peephole phases, because they used to
      pattern match on specific cell numbers.  I (Allen) hacked up a simple
      tool to translate fake SML with where clauses into real ML.  This makes
      it much easier to write the rules.  Seems to work.  (See Tools/WhereGen.)
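
   On item 4: SX and ZX are read here as sign extension and zero
   extension, which is an assumption the announcement does not spell out.
   The sketch below illustrates only that distinction, using the Basis
   Library functions Word8.toInt and Word8.toIntX.  It does not use the
   MLTREE constructors themselves, whose signatures are not given here.

```sml
(* The same 8-bit pattern widened to an int in two ways. *)
val b : Word8.word = 0wxFF

(* Zero extension: the upper bits are filled with 0, giving 255. *)
val zx = Word8.toInt b

(* Sign extension: the upper bits copy the sign bit, giving ~1. *)
val sx = Word8.toIntX b

(* Prints: 255 ~1 *)
val () = print (Int.toString zx ^ " " ^ Int.toString sx ^ "\n")
```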

--DpCXMC9NIj--