[MLton-commit] r4746

Sat Oct 21 19:33:20 PDT 2006

Added x86_64 porting notes
----------------------------------------------------------------------

A   mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/
A   mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/TODO
A   mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/bench.txt
A   mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/exec-summary.0.txt
A   mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/exec-summary.1.txt
A   mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/mltongc.txt
A   mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/semantics.txt
A   mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/status.0.txt
A   mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/status.1.txt
D   mlton/branches/on-20050822-x86_64-branch/runtime/TODO
D   mlton/branches/on-20050822-x86_64-branch/runtime/gc/mltongc.txt

----------------------------------------------------------------------

Copied: mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/TODO (from rev 4745, mlton/branches/on-20050822-x86_64-branch/runtime/TODO)

Added: mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/bench.txt
===================================================================

--- mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/bench.txt	2006-10-22 02:26:17 UTC (rev 4745)
+++ mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/bench.txt	2006-10-22 02:33:18 UTC (rev 4746)
@@ -0,0 +1,535 @@
+
+Now that the refactoring on the x86_64 branch has mostly quiesced, I
+ran the benchmark suite to verify that there weren't any major
+regressions in performance.  It is to be expected that there will be
+some variability between HEAD and the x86_64 branch, since lots of
+code has been tweaked -- both in the runtime and in the implementation
+of the Basis Library.
+
+I've run the benchmark suite on the following two systems:
+ * FedoraCore 4; gcc 4.0.2; AMD Opteron 2GHz; 4GB memory
+ * RedHat; gcc 3.2.2; Intel Pentium 1.1GHz; 2GB memory
+
+Overall, there don't appear to be any significant (unexplained)
+regressions, but the x86_64 branch does appear to be running a little
+bit slower.  I'll go over some of the highlights, but if anyone sees
+anything that they believe deserves more investigation, let me know.
+
+Reminder: on the AMD Opteron system, these are 32-bit executables
+(running on a 64-bit kernel).  However, I will note that on the
+Opteron we compile the runtime and C-codegen generated files with the
+'-mopteron' option.
+
+
+Run-time ratio:
+
+Across the board, the 'checksum' benchmark performs poorly under the
+x86_64 branch; this is easily explained by the fact that the
+'checksum' benchmark is dominated by PackWord32Little.subArr, which is
+a primitive on HEAD, but is a C-call on the x86_64 branch.  See
+revision 4418.  We should eventually turn the PackWord operations into
+more general primitives; see:
+  http://mlton.org/pipermail/mlton-user/2004-November/000556.html
+  http://mlton.org/pipermail/mlton/2004-November/026246.html
+This should also partially explain the performance of 'md5', which
+also makes use of PackWord32Little operations.
+
+
+For the native-codegen on HEAD vs x86_64 on Opteron, the outliers are:
+        checksum                2.31
+        count-graphs            1.63
+        md5                     1.41
+        ray                     1.08
+The 'count-graphs' benchmark deserves further investigation, since it
+seems to perform badly on the other configurations as well.
+
+For the native-codegen on HEAD vs x86_64 on i686, the outliers are:
+        checksum                2.18
+        count-graphs            1.74
+        md5                     1.47
+        tyan                    1.25
+        logic                   1.20
+        DLXSimulator            1.13
+        zebra                   1.12
+        zern                    1.12
+        model-elimination       1.11
+        hamlet                  1.09
+        wc-input1               1.09
+        life                    1.09
+        mlyacc                  1.08
+        flat-array              1.08
+        lexgen                  1.08
+        smith-normal-form       1.07
+
+For the C-codegen on HEAD vs x86_64 on Opteron, the outliers are:
+        checksum                4.61
+        mpuz                    2.05
+        count-graphs            1.68
+        md5                     1.60
+        tailfib                 1.53
+        zern                    1.40
+        imp-for                 1.40
+        simple                  1.26
+        matrix-multiply         1.24
+        mandelbrot              1.18
+        vector-concat           1.15
+        vliw                    1.12
+        tyan                    1.11
+        fib                     1.10
+        hamlet                  1.09
+        flat-array              1.07
+
+For the C-codegen on HEAD vs x86_64 on i686, the outliers are:
+        checksum                3.80
+        count-graphs            1.68
+        md5                     1.61
+        zern                    1.24
+        ray                     1.19
+        logic                   1.18
+        mpuz                    1.18
+        tyan                    1.16
+        vliw                    1.14
+        barnes-hut              1.13
+        fft                     1.13
+        zebra                   1.12
+        DLXSimulator            1.12
+        smith-normal-form       1.08
+        knuth-bendix            1.07
+        model-elimination       1.06
+        mlyacc                  1.06
+        wc-scanStream           1.06
+        hamlet                  1.06
+        psdes-random            1.06
+
+Since quite a few of our platforms are using the C-codegen, it's
+probably worth investigating whether there is some low-hanging fruit
+to improve its performance.
+
+
+Size:
+
+Generally, executables on the x86_64 branch are larger than those on
+HEAD.
+
+Size x86_64 - Size HEAD:
+
+system      codegen     mean    min     max
+Opteron     native      33K     0K      37K
+Opteron     C           32K     0K      37K
+Opteron     byte        56K     0K      66K
+Pentium     native      20K     0K      24K
+Pentium     C           18K     -18K    38K
+
+Much of the size increase can probably be attributed to the refactored
+runtime code and aggressive inlining within the garbage collector.  On
+the Opteron system:
+
+   text    data     bss     dec     hex filename
+  54485       1     352   54838    d636 mlton.svn.x86_64/runtime/gc.o
+  33175       4      52   33231    81cf mlton.svn.HEAD/runtime/gc.o
+  52318    1004   31040   84362   1498a mlton.svn.x86_64/runtime/bytecode/interpret.o
+  34381    1004   31040   66425   10379 mlton.svn.HEAD/bytecode/interpret.o
+ 129625    1185   34399  165209   28559 mlton.svn.x86_64/build/lib/self/libmlton.a
+  91606    1136   33303  126045   1ec5d mlton.svn.HEAD/build/lib/self/libmlton.a
+
+and on the Pentium system:
+
+   text    data     bss     dec     hex filename
+  37098      16     400   37514    928a mlton.svn.x86_64/runtime/gc.o
+  29645      16      36   29697    7401 mlton.svn.HEAD/runtime/gc.o
+  35451    1004   31424   67879   10927 mlton.svn.x86_64/runtime/bytecode/interpret.o
+  32041    1004   31040   64085    fa55 mlton.svn.HEAD/bytecode/interpret.o
+  91314    1232   82490  175036   2abbc mlton.svn.x86_64/build/lib/self/libmlton.a
+  78982    1172   33239  113393   1baf1 mlton.svn.HEAD/build/lib/self/libmlton.a
+
+
+Compile time:
+
+On the Opteron system, compile times are on average 1.7s longer on the
+x86_64 branch than on HEAD (for all codegens), with no compile time
+more than 2s longer.  I believe that this is mainly explained by the
+revised Basis Library, which is nearly 10000 lines longer (39419 lines
+for x86_64, 29604 lines for HEAD), and makes aggressive use of
+functors.  When compiling the program "val () = ()", which includes
+type-checking the Basis Library, the x86_64 branch (on Opteron)
+requires
+
+         parseAndElaborate starting
+         parseAndElaborate finished in 2.47 + 1.50 (38% GC)
+
+while HEAD requires
+
+         parseAndElaborate starting
+         parseAndElaborate finished in 1.33 + 0.97 (42% GC)
+
+
+Benchmark Data:
+
+FedoraCore 4; gcc 4.0.2; AMD Opteron 2GHz; 4GB memory
+
+MLton0 -- /home/fluet/mlton/mlton.svn.HEAD/build/bin/mlton -codegen native
+MLton1 -- /home/fluet/mlton/mlton.svn.HEAD/build/bin/mlton -codegen c
+MLton2 -- /home/fluet/mlton/mlton.svn.HEAD/build/bin/mlton -codegen bytecode
+MLton3 -- /home/fluet/mlton/mlton.svn.x86_64/build/bin/mlton -codegen native
+MLton4 -- /home/fluet/mlton/mlton.svn.x86_64/build/bin/mlton -codegen c
+MLton5 -- /home/fluet/mlton/mlton.svn.x86_64/build/bin/mlton -codegen bytecode
+run time ratio
+benchmark         MLton0 MLton1 MLton2 MLton3 MLton4 MLton5
+barnes-hut          1.00   1.05  35.52   0.99   1.05  39.91
+boyer               1.00   1.45  48.58   0.90   1.34  54.04
+checksum            1.00   0.94  74.71   2.31   4.35 109.26
+count-graphs        1.00   1.05  71.94   1.63   1.77 118.20
+DLXSimulator        1.00   1.13  42.71   1.04   1.19  47.86
+fft                 1.00   1.06  11.10   0.98   1.06  12.40
+fib                 1.00   1.49  45.77   1.00   1.63  51.21
+flat-array          1.00   2.38      *   0.97   2.54 139.95
+hamlet              1.00   2.46  52.35   1.01   2.68  58.79
+imp-for             1.00   0.92 111.76   1.01   1.30 124.50
+knuth-bendix        1.00   1.97  82.38   1.01   2.02  92.02
+lexgen              1.00   1.25  63.31   0.97   1.15  69.67
+life                1.00   1.03  79.25   0.97   1.02  89.04
+logic               1.00   1.49  44.24   1.00   1.51  49.64
+mandelbrot          1.00   1.24  76.40   1.01   1.46  86.30
+matrix-multiply     1.00   1.34  71.18   1.00   1.66  79.63
+md5                 1.00   1.31  33.23   1.41   2.10  43.49
+merge               1.00   1.17  29.43   0.96   1.12  32.95
+mlyacc              1.00   1.28  37.96   1.02   1.29  42.41
+model-elimination   1.00   1.61  39.69   1.00   1.54  44.53
+mpuz                1.00   1.02  71.92   1.01   2.08  84.50
+nucleic             1.00   1.09  34.95   0.98   1.09  39.47
+output1             1.00   2.34 117.37   1.00   1.72 131.77
+peek                1.00   0.58  86.42   1.01   0.58  96.18
+psdes-random        1.00   1.53 137.87   1.04   1.54 153.87
+ratio-regions       1.00   1.21  55.21   0.99   1.22  61.90
+ray                 1.00   1.15  28.64   1.08   1.20  32.52
+raytrace            1.00   1.56  55.36   1.01   1.52  62.11
+simple              1.00   1.59  50.06   0.99   2.00  56.12
+smith-normal-form   1.00   1.00   1.55   1.00   1.00   1.65
+tailfib             1.00   2.16 125.85   1.00   3.29 141.95
+tak                 1.00   1.21  44.07   1.00   1.26  49.04
+tensor              1.00   2.73 221.51   1.00   2.34 249.18
+tsp                 1.00   1.07  32.75   0.99   1.10  36.47
+tyan                1.00   1.23  49.00   0.99   1.36  54.39
+vector-concat       1.00   2.10 117.04   1.00   2.41 131.42
+vector-rev          1.00   2.20 108.94   1.00   2.22 123.01
+vliw                1.00   1.58  38.45   0.95   1.77  42.15
+wc-input1           1.00   1.45  66.78   1.00   1.01  72.56
+wc-scanStream       1.00   1.38  85.70   1.01   1.29  96.10
+zebra               1.00   0.79  59.80   1.02   0.81  69.07
+zern                1.00   1.37  51.00   0.99   1.93  57.92
+size
+benchmark            MLton0    MLton1    MLton2    MLton3    MLton4    MLton5
+barnes-hut          105,267   104,417   165,416   139,837   138,215   232,889
+boyer               140,514   159,758   235,153   177,957   197,533   291,874
+checksum             56,054    56,294    95,329    89,801    93,425   153,298
+count-graphs         68,882    76,202   127,057   106,213   111,337   182,690
+DLXSimulator        135,234   146,354   229,221   169,092   176,216   287,985
+fft                  67,065    75,089   119,474   100,762   108,282   175,074
+fib                  49,670    56,438    95,369    86,841    92,845   151,778
+flat-array           49,710    56,514    95,425    86,913    92,665   151,906
+hamlet            1,257,401 1,436,385 2,205,344 1,278,403 1,468,331 2,251,676
+imp-for              49,542    56,306    95,497    86,713    92,393   151,938
+knuth-bendix        115,194   124,202   187,597   150,372   155,792   247,873
+lexgen              208,859   220,971   322,626   242,029   254,149   383,194
+life                 68,046    74,486   124,033   105,377   110,749   180,674
+logic               108,498   123,142   198,321   146,089   159,877   255,202
+mandelbrot           49,606    56,666    95,385    86,921    92,777   151,938
+matrix-multiply      50,146    56,970    96,281    87,413    92,977   152,818
+md5                  83,618    85,762   131,941   120,604   123,072   194,257
+merge                51,274    57,790    97,689    88,469    94,061   154,178
+mlyacc              511,891   565,983   795,250   546,353   602,813   856,506
+model-elimination   643,424   768,560 1,045,115   662,174   784,430 1,096,923
+mpuz                 52,582    59,982   100,817    89,649    96,245   157,218
+nucleic             200,330   159,021   226,891   237,861   195,196   286,321
+output1              86,748    90,724   136,647   121,316   120,832   196,545
+peek                 82,330    84,514   130,445   117,076   117,056   190,769
+psdes-random         50,302    57,286    96,545    87,489    93,189   153,026
+ratio-regions        75,846    83,366   136,993   112,873   120,301   192,674
+ray                 189,999   206,069   294,804   210,841   221,443   345,525
+raytrace            269,012   311,606   437,745   292,472   324,700   490,412
+simple              229,022   252,368   336,575   262,402   287,880   398,698
+smith-normal-form   187,722   210,750   264,629   223,784   245,772   330,081
+tailfib              49,334    56,242    94,961    86,505    92,329   151,394
+tak                  49,750    56,386    95,377    86,953    92,561   151,842
+tensor              103,625   112,809   174,708   139,227   145,515   239,952
+tsp                  88,194    89,620   142,687   122,964   124,362   207,232
+tyan                140,858   155,018   234,685   176,684   184,844   295,409
+vector-concat        50,934    57,954    97,505    88,137    94,241   153,986
+vector-rev           50,194    57,094    96,289    87,397    93,365   152,770
+vliw                400,590   475,066   682,121   415,992   492,872   727,701
+wc-input1           107,822   111,206   171,417   142,588   144,564   235,201
+wc-scanStream       115,102   121,150   183,745   149,936   151,548   247,521
+zebra               147,134   149,246   256,645   181,800   183,968   316,545
+zern                 96,747   104,479   153,564   113,951   121,011   198,699
+compile time
+benchmark         MLton0 MLton1 MLton2 MLton3 MLton4 MLton5
+barnes-hut          3.67   5.91   3.54   5.39   7.62   5.40
+boyer               4.03   8.59   3.65   5.66  10.19   5.28
+checksum            2.73   2.91   2.74   4.41   4.59   4.48
+count-graphs        3.08   4.21   3.00   4.73   5.85   4.66
+DLXSimulator        4.23   7.94   3.89   5.89   9.70   5.64
+fft                 2.96   3.49   2.92   4.66   5.17   4.65
+fib                 2.72   2.91   2.73   4.37   4.55   4.40
+flat-array          2.72   2.92   2.74   4.42   4.57   4.40
+hamlet             46.21 100.44  42.05  45.03  98.89  40.30
+imp-for             2.76   2.94   2.75   4.44   4.61   4.46
+knuth-bendix        3.52   6.53   3.31   5.20   8.28   5.05
+lexgen              4.92  11.05   4.24   6.63  12.94   6.05
+life                2.98   4.07   2.89   4.66   5.73   4.61
+logic               3.59   6.10   3.26   5.21   7.76   4.92
+mandelbrot          2.73   2.93   2.73   4.42   4.64   4.45
+matrix-multiply     2.76   2.97   2.74   4.43   4.66   4.49
+md5                 3.07   4.14   3.01   4.80   6.09   4.80
+merge               2.75   2.98   2.74   4.39   4.65   4.42
+mlyacc             10.98  28.62   8.40  12.62  30.30   9.86
+model-elimination  11.25  36.90   8.95  12.91  38.79  10.63
+mpuz                2.79   3.13   2.76   4.45   4.81   4.45
+nucleic             5.88  12.55   5.40   7.36  14.18   7.08
+output1             3.04   4.26   2.99   4.74   6.03   4.73
+peek                2.98   4.03   2.95   4.76   5.89   4.73
+psdes-random        2.73   2.94   2.74   4.40   4.62   4.43
+ratio-regions       3.27   4.71   3.11   4.89   6.32   4.81
+ray                 4.39   9.19   3.95   6.14  11.10   5.78
+raytrace            6.15  15.08   5.24   7.86  16.82   7.11
+simple              5.07  11.42   4.42   6.76  13.30   6.15
+smith-normal-form   4.37  11.58   3.92   6.14  13.51   5.73
+tailfib             2.72   2.89   2.72   4.36   4.56   4.40
+tak                 2.72   2.89   2.71   4.38   4.59   4.40
+tensor              3.78   6.03   3.63   5.55   8.00   5.50
+tsp                 3.19   4.47   3.11   4.93   6.39   4.88
+tyan                4.13   8.46   3.77   5.83  10.41   5.55
+vector-concat       2.73   2.98   2.73   4.41   4.62   4.41
+vector-rev          2.72   2.93   2.71   4.37   4.59   4.39
+vliw                8.26  22.55   6.72   9.85  24.40   8.39
+wc-input1           3.39   5.73   3.26   5.10   7.46   5.06
+wc-scanStream       3.50   5.91   3.32   5.20   7.66   5.13
+zebra               4.13   8.83   3.62   5.75  10.52   5.34
+zern                3.04   3.80   2.99   4.74   5.63   4.78
+run time
+benchmark         MLton0 MLton1  MLton2 MLton3 MLton4  MLton5
+barnes-hut         14.30  15.05  507.90  14.21  14.99  570.63
+boyer              18.04  26.23  876.42  16.21  24.16  974.97
+checksum           42.48  40.08 3173.59  97.97 184.62 4641.33
+count-graphs       20.80  21.87 1496.06  33.84  36.84 2458.24
+DLXSimulator       17.77  20.10  758.85  18.52  21.07  850.44
+fft                14.48  15.29  160.74  14.16  15.32  179.61
+fib                34.68  51.60 1587.41  34.68  56.67 1776.10
+flat-array          7.43  17.68       *   7.23  18.84 1039.24
+hamlet             16.43  40.33  860.05  16.55  44.09  965.80
+imp-for            28.83  26.66 3222.03  29.07  37.34 3589.23
+knuth-bendix       17.29  34.10 1424.23  17.51  34.84 1590.71
+lexgen             20.57  25.65 1302.31  19.97  23.67 1433.19
+life                8.93   9.23  707.85   8.65   9.12  795.25
+logic              18.82  27.99  832.67  18.76  28.49  934.14
+mandelbrot         24.40  30.33 1864.51  24.71  35.64 2105.89
+matrix-multiply     3.30   4.43  234.57   3.30   5.48  262.39
+md5                32.37  42.48 1075.62  45.56  67.87 1407.68
+merge              14.47  16.89  425.70  13.82  16.20  476.70
+mlyacc             16.48  21.16  625.73  16.84  21.25  699.21
+model-elimination  28.66  46.19 1137.74  28.66  44.12 1276.43
+mpuz               21.92  22.26 1576.65  22.08  45.68 1852.59
+nucleic            14.80  16.06  517.07  14.48  16.16  584.01
+output1             7.19  16.79  843.77   7.20  12.40  947.25
+peek               34.60  19.99 2990.07  34.79  19.96 3327.80
+psdes-random       15.90  24.29 2192.78  16.47  24.48 2447.26
+ratio-regions      24.02  28.99 1325.99  23.87  29.37 1486.63
+ray                15.73  18.14  450.44  17.05  18.88  511.61
+raytrace           16.37  25.59  906.21  16.55  24.86 1016.67
+simple             20.16  32.05 1009.38  20.03  40.41 1131.65
+smith-normal-form  10.32  10.32   15.96  10.31  10.32   17.07
+tailfib            19.36  41.81 2436.39  19.36  63.77 2748.18
+tak                12.92  15.70  569.51  12.92  16.24  633.76
+tensor             17.30  47.15 3831.05  17.30  40.45 4309.63
+tsp                19.84  21.15  649.58  19.54  21.86  723.41
+tyan               18.70  22.97  916.13  18.60  25.49 1016.87
+vector-concat      30.16  63.24 3530.57  30.21  72.83 3964.18
+vector-rev         18.61  40.95 2027.41  18.54  41.38 2289.30
+vliw               18.69  29.49  718.62  17.68  33.00  787.79
+wc-input1          27.42  39.70 1830.85  27.33  27.72 1989.39
+wc-scanStream      14.00  19.33 1200.10  14.12  18.02 1345.82
+zebra              26.26  20.82 1570.44  26.68  21.17 1814.11
+zern               17.18  23.60  876.26  16.94  33.15  995.14
+
+
+RedHat; gcc 3.2.2; Intel Pentium 1.1GHz; 2GB memory
+
+MLton0 -- /home/fluet/mlton/mlton.svn.HEAD/build/bin/mlton -codegen native
+MLton1 -- /home/fluet/mlton/mlton.svn.HEAD/build/bin/mlton -codegen c
+MLton2 -- /home/fluet/mlton/mlton.svn.x86_64/build/bin/mlton -codegen native
+MLton3 -- /home/fluet/mlton/mlton.svn.x86_64/build/bin/mlton -codegen c
+run time ratio
+benchmark         MLton0 MLton1 MLton2 MLton3
+barnes-hut          1.00   1.03   1.05   1.16
+boyer               1.00   1.17   1.04   1.22
+checksum            1.00   0.83   2.18   3.15
+count-graphs        1.00   1.44   1.74   2.42
+DLXSimulator        1.00   1.07   1.13   1.20
+fft                 1.00   1.04   1.01   1.17
+fib                 1.00   1.35   1.00   1.32
+flat-array          1.00   1.49   1.08   1.50
+hamlet              1.00   2.01   1.09   2.13
+imp-for             1.00   1.67   1.00   1.30
+knuth-bendix        1.00   1.98   1.00   2.12
+lexgen              1.00   1.34   1.08   1.39
+life                1.00   1.25   1.09   1.30
+logic               1.00   1.30   1.20   1.53
+mandelbrot          1.00   1.08   1.00   1.04
+matrix-multiply     1.00   1.08   1.00   0.99
+md5                 1.00   1.39   1.47   2.24
+merge               1.00   1.00   1.00   1.00
+mlyacc              1.00   1.30   1.08   1.38
+model-elimination   1.00   1.35   1.11   1.43
+mpuz                1.00   1.63   0.97   1.91
+nucleic             1.00   1.06   1.02   1.10
+output1             1.00   1.73   0.94   1.57
+peek                1.00   1.98   1.00   1.39
+psdes-random        1.00   0.93   1.00   0.98
+ratio-regions       1.00   1.39   1.01   1.42
+ray                 1.00   1.05   0.99   1.25
+raytrace            1.00   1.44   1.00   1.49
+simple              1.00   1.53   0.82   1.60
+smith-normal-form   1.00   1.00   1.07   1.08
+tailfib             1.00   2.42   1.00   2.40
+tak                 1.00   1.12   1.01   1.05
+tensor              1.00   2.87   1.00   1.83
+tsp                 1.00   1.46   1.04   1.51
+tyan                1.00   1.18   1.25   1.36
+vector-concat       1.00   1.48   0.99   1.20
+vector-rev          1.00   1.20   0.93   1.01
+vliw                1.00   1.36   1.04   1.55
+wc-input1           1.00   1.90   1.09   1.55
+wc-scanStream       1.00   1.38   1.04   1.46
+zebra               1.00   1.20   1.12   1.34
+zern                1.00   1.24   1.12   1.54
+size
+benchmark            MLton0    MLton1    MLton2    MLton3
+barnes-hut           97,508    97,306   120,294   116,848
+boyer               136,927   142,863   160,418   165,470
+checksum             51,663    51,311    71,706    73,842
+count-graphs         65,295    73,315    88,674    95,194
+DLXSimulator        127,763   136,771   149,829   154,097
+fft                  62,846    70,210    82,509    88,137
+fib                  46,083    51,327    69,286    72,846
+flat-array           46,123    51,323    69,358    73,914
+hamlet            1,254,374 1,363,870 1,264,408 1,344,664
+imp-for              45,955    51,099    69,158    72,738
+knuth-bendix        107,539   125,899   130,949   146,065
+lexgen              202,036   233,364   223,350   243,438
+life                 64,491    67,455    87,870    89,438
+logic               104,943    99,311   128,614   121,806
+mandelbrot           46,019    51,251    69,318    72,890
+matrix-multiply      46,559    51,859    69,810    73,490
+md5                  76,019    75,419   100,965    99,785
+merge                47,679    52,663    70,914    74,478
+mlyacc              505,988   610,056   528,634   649,166
+model-elimination   635,421   712,925   643,211   701,051
+mpuz                 48,987    55,991    72,110    77,310
+nucleic             196,751   149,485   220,274   171,076
+output1              79,133    77,813   101,909    98,473
+peek                 74,683    77,835    97,653    98,825
+psdes-random         46,715    52,127    69,934    73,782
+ratio-regions        72,275    87,903    95,350   106,166
+ray                 180,588   193,178   190,398   206,080
+raytrace            260,753   317,931   272,662   323,609
+simple              220,727   257,329   242,305   269,807
+smith-normal-form   180,099   188,743   204,361   211,045
+tailfib              45,747    51,067    68,950    72,730
+tak                  46,163    51,307    69,398    72,890
+tensor               95,986   105,482   119,796   127,380
+tsp                  80,579    81,508   103,501   104,954
+tyan                133,243   143,675   157,293   170,973
+vector-concat        47,379    52,643    70,582    74,330
+vector-rev           46,607    51,767    69,842    73,342
+vliw                391,203   452,871   395,557   450,293
+wc-input1           100,239   107,207   123,197   128,853
+wc-scanStream       107,511   107,671   130,497   129,437
+zebra               139,535   137,751   162,385   159,665
+zern                 88,236    95,996    95,376   101,680
+compile time
+benchmark         MLton0 MLton1 MLton2 MLton3
+barnes-hut          9.28  13.52  14.43  19.60
+boyer               9.55  28.24  15.23  34.07
+checksum            6.51   6.88  11.99  12.31
+count-graphs        7.30   9.48  12.74  14.96
+DLXSimulator       10.03  18.45  15.63  23.99
+fft                 6.98   7.97  12.60  13.52
+fib                 6.43   6.80  11.88  12.50
+flat-array          6.45   6.78  11.96  14.45
+hamlet            114.74 240.97 144.92 274.80
+imp-for             6.57   6.88  12.00  12.28
+knuth-bendix        8.31  14.22  14.11  20.17
+lexgen             11.86  25.81  17.46  31.33
+life                7.10   9.10  12.50  14.70
+logic               8.52  14.08  14.15  19.53
+mandelbrot          6.44   6.86  12.02  12.35
+matrix-multiply     6.57   6.92  12.08  12.48
+md5                 7.29   9.64  12.99  15.69
+merge               6.55   6.93  11.98  12.49
+mlyacc             27.38  74.39  34.33  79.63
+model-elimination  28.79  85.30  36.07  89.53
+mpuz                6.60   7.31  12.11  12.71
+nucleic            13.67  39.16  19.36  44.77
+output1             7.26   9.64  12.94  15.49
+peek                7.10   9.18  12.86  14.92
+psdes-random        6.49   6.80  12.00  12.47
+ratio-regions       7.77  10.97  13.27  16.10
+ray                10.59  20.62  16.37  26.32
+raytrace           14.88  36.18  21.19  42.23
+simple             12.83  30.33  18.15  33.79
+smith-normal-form  10.53  79.19  16.41  97.25
+tailfib             6.69   7.00  13.06  13.25
+tak                 7.41   9.00  14.06  12.38
+tensor              9.42  14.23  16.44  20.07
+tsp                 7.68  10.41  13.33  16.61
+tyan               10.74  20.20  16.64  26.49
+vector-concat       6.56   7.31  12.01  12.41
+vector-rev          6.96   8.09  15.08  12.44
+vliw               24.07  52.93  27.06  57.14
+wc-input1           8.39  13.38  15.94  23.38
+wc-scanStream       8.23  13.28  14.53  20.53
+zebra              11.06  17.77  15.57  23.29
+zern                7.45   8.66  13.06  16.02
+run time
+benchmark         MLton0 MLton1 MLton2 MLton3
+barnes-hut         44.53  45.98  46.83  51.87
+boyer              55.60  65.19  57.62  68.05
+checksum           97.34  80.59 211.83 306.25
+count-graphs       40.15  57.72  69.80  97.06
+DLXSimulator       85.24  91.25  96.11 102.38
+fft                35.98  37.38  36.41  42.09
+fib                70.23  94.84  70.23  92.87
+flat-array         24.93  37.08  26.86  37.37
+hamlet             50.91 102.55  55.62 108.50
+imp-for            46.81  78.06  46.80  61.02
+knuth-bendix       38.03  75.16  37.99  80.62
+lexgen             44.15  58.97  47.56  61.29
+life               14.89  18.63  16.18  19.41
+logic              53.38  69.48  64.10  81.84
+mandelbrot         55.98  60.45  55.97  58.03
+matrix-multiply     7.47   8.07   7.50   7.37
+md5                53.16  73.84  78.02 118.96
+merge              77.94  77.81  77.99  77.71
+mlyacc             40.97  53.38  44.35  56.66
+model-elimination  77.74 104.80  86.44 111.25
+mpuz               41.84  68.04  40.67  80.00
+nucleic            42.22  44.77  43.24  46.29
+output1            16.02  27.75  15.08  25.12
+peek               44.63  88.53  44.52  62.01
+psdes-random       38.86  36.12  38.85  38.19
+ratio-regions      51.77  71.99  52.52  73.43
+ray                33.97  35.83  33.70  42.58
+raytrace           42.30  61.08  42.47  63.03
+simple             60.46  92.79  49.83  96.88
+smith-normal-form  35.23  35.25  37.70  38.20
+tailfib            43.77 105.75  43.79 104.84
+tak                27.60  30.86  27.76  28.95
+tensor             58.98 169.31  59.06 107.94
+tsp                59.66  87.04  62.28  90.29
+tyan               59.12  69.47  73.83  80.61
+vector-concat      85.93 126.95  85.20 102.89
+vector-rev        122.82 147.47 113.88 123.68
+vliw               53.94  73.27  56.20  83.77
+wc-input1          39.13  74.19  42.66  60.60
+wc-scanStream      32.78  45.37  34.23  48.01
+zebra              43.41  51.91  48.45  58.28
+zern               43.72  54.23  48.77  67.33

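The outlier lists in bench.txt appear to be the x86_64-branch run time
divided by the HEAD run time for the same codegen.  A minimal sketch of
that calculation, using a handful of rows copied from the Opteron "run
time" table; the 1.05 cutoff and the function names are assumptions made
for illustration, not something stated in the notes:

```python
# Run times in seconds, copied from the Opteron "run time" table.
# MLton0 = HEAD native, MLton1 = HEAD C,
# MLton3 = x86_64 native, MLton4 = x86_64 C.
RUN_TIME = {
    "checksum":     {"MLton0": 42.48, "MLton1": 40.08, "MLton3": 97.97, "MLton4": 184.62},
    "count-graphs": {"MLton0": 20.80, "MLton1": 21.87, "MLton3": 33.84, "MLton4": 36.84},
    "md5":          {"MLton0": 32.37, "MLton1": 42.48, "MLton3": 45.56, "MLton4": 67.87},
    "mpuz":         {"MLton0": 21.92, "MLton1": 22.26, "MLton3": 22.08, "MLton4": 45.68},
    "merge":        {"MLton0": 14.47, "MLton1": 16.89, "MLton3": 13.82, "MLton4": 16.20},
}


def slowdown(times, baseline, candidate):
    """Ratio of candidate run time to baseline run time, to two decimals."""
    return round(times[candidate] / times[baseline], 2)


def outliers(table, baseline, candidate, threshold=1.05):
    """Benchmarks whose slowdown exceeds threshold, worst first.

    The threshold is a hypothetical cutoff chosen for this sketch.
    """
    ratios = {name: slowdown(times, baseline, candidate)
              for name, times in table.items()}
    flagged = [(name, r) for name, r in ratios.items() if r > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


native = outliers(RUN_TIME, "MLton0", "MLton3")   # HEAD vs x86_64, native codegen
c_codegen = outliers(RUN_TIME, "MLton1", "MLton4")  # HEAD vs x86_64, C codegen

for name, ratio in c_codegen:
    print(f"{name:<20}{ratio:>8.2f}")
```

For the rows included here, the C-codegen ratios agree with the Opteron
outlier list (checksum 4.61, mpuz 2.05, count-graphs 1.68, md5 1.60).
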
Added: mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/exec-summary.0.txt
===================================================================
--- mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/exec-summary.0.txt	2006-10-22 02:26:17 UTC (rev 4745)
+++ mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/exec-summary.0.txt	2006-10-22 02:33:18 UTC (rev 4746)
@@ -0,0 +1,47 @@
+
+Notes on the status of the x86_64 port of MLton.
+=======================================================================
+
+Summary:
+
+The runtime system (i.e., garbage collector and related services) has
+been rewritten to be configurable along two independent axes: the
+native pointer size and the ML heap object pointer size.  There are no
+known functionality or performance regressions in the rewritten
+runtime relative to the mainline runtime.
+
+The next step will be to modify the Basis Library implementation (on both
+the SML and C sides) to be agnostic to the native representation of
+primitive C-types (e.g., int, long); this is important for getting the
+right representation for file descriptors, etc.  This step ensures
+that the Basis Library implementation may be shared between 32-bit and
+64-bit systems.
+
+Following that, it should be possible to push changes through the
+compiler proper to support a C-codegen in which all pointers are
+64-bit.  After shaking out bugs there, we should be able to consider
+supporting smaller ML-pointer representations and a simple native
+codegen.
+
+Timetable:
+
+It is expected that the Basis Library changes and the C-codegen will
+be completed by March 1.
+
+
+Technical Question:
+
+One of the native representations that changes from a 32-bit system to
+a 64-bit system is the GNU MP representation of arbitrary precision
+integers.  Hence, the MLton.IntInf representation datatype
+
+        datatype rep =
+           Big of Word.word Vector.vector
+         | Small of Int.int
+
+may not suffice (in the situations where Int.int and/or Word.word are
+32-bit but the host system is 64-bit).  We are considering the best
+way to accommodate IntInf in the 64-bit setting, but we recall that
+Polyspace has used MLton.IntInf.rep in the past, and wanted to ask if
+there were any particular requirements on maintaining or changing the
+interface.

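To illustrate the IntInf question in exec-summary.0.txt: GMP stores a
large integer as a vector of machine-word-sized "limbs", which are
typically 32 bits wide on a 32-bit system and 64 bits wide on a 64-bit
system.  The sketch below is only an illustration under that assumption
(the helper names are hypothetical and it does not call GMP or MLton);
it shows that the same value has a different limb vector at each width,
which is why a vector of 32-bit words may not match the native
representation on a 64-bit host:

```python
def to_limbs(n, limb_bits):
    """Split a non-negative integer into little-endian limbs of limb_bits bits."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    mask = (1 << limb_bits) - 1
    limbs = []
    while True:
        limbs.append(n & mask)   # lowest limb first
        n >>= limb_bits
        if n == 0:
            break
    return limbs


def from_limbs(limbs, limb_bits):
    """Inverse of to_limbs: rebuild the integer from its little-endian limbs."""
    return sum(limb << (i * limb_bits) for i, limb in enumerate(limbs))


n = 2**70 + 5
limbs32 = to_limbs(n, 32)   # three 32-bit limbs: [5, 0, 64]
limbs64 = to_limbs(n, 64)   # two 64-bit limbs:   [5, 64]

print(limbs32, limbs64)
```

Both vectors decode to the same integer, but their lengths and element
widths differ, so code that pattern-matches on the Big constructor would
see a different vector depending on which limb width the runtime uses.
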
Added: mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/exec-summary.1.txt
===================================================================
--- mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/exec-summary.1.txt	2006-10-22 02:26:17 UTC (rev 4745)
+++ mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/exec-summary.1.txt	2006-10-22 02:33:18 UTC (rev 4746)
@@ -0,0 +1,61 @@
+
+Notes on the status of the x86_64 port of MLton.
+=======================================================================
+
+Summary:
+
+The runtime system (i.e., garbage collector and related services) has
+been rewritten to be configurable along two independent axes: the
+native pointer size and the ML heap object pointer size.  There are no
+known functionality or performance regressions in the rewritten
+runtime relative to the mainline runtime.
+
+The Basis Library has been refactored so that it is compile-time
+configurable on the following axes:
+
+   OBJPTR -- size of an object pointer (32-bits or 64-bits)
+   HEADER -- size of an object header (32-bits or 64-bits)
+   SEQINDEX -- size of an array/vector length (32-bits or 64-bits)
+
+   DEFAULT_CHAR -- size of Char.char (8-bits; no choice according to spec)
+   DEFAULT_INT -- size of Int.int (32-bits, 64-bits, and IntInf.int)
+   DEFAULT_REAL -- size of Real.real (32-bits, 64-bits)
+   DEFAULT_WORD -- size of Word.word (32-bits, 64-bits)
+
+   C_TYPES -- sizes of various primitive C types
+
+The object pointer and object header are needed for the IntInf
+implementation.  Configuring the default sizes supports adopting
+64-bit integers and words as the default on 64-bit platforms, and also
+supports retaining 32-bit integers and words as the default on 64-bit
+platforms.  The sizes of primitive C types are determined by the
+target architecture and operating system.  This ensures that the Basis
+Library uses the right representation for file descriptors, etc., and
+ensures that the implementation may be shared between 32-bit and
+64-bit systems.  There are no known functionality or performance
+regressions in the refactored Basis Library implementation relative
+to the mainline implementation.
+
+The next step is to push changes through the compiler proper to
+support a C-codegen in which all pointers are 64-bit.  After shaking
+out bugs there, we should be able to consider supporting smaller
+ML-pointer representations and a simple native codegen.
+
+
+MLton.IntInf changes:
+
+As noted above, the object pointer size is needed by the IntInf
+implementation, which represents an IntInf.int either as a pointer to
+a vector of GNU MP mp_limb_t objects or as the upper bits of a
+pointer.  Since the representation of mp_limb_t changes from a 32-bit
+system to a 64-bit system, and the size of an object pointer may be
+compile-time configurable, we have changed the MLTON_INTINF signature
+to have the following:
+
+      structure BigWord : WORD
+      structure SmallInt : INTEGER
+
+      datatype rep =
+         Big of BigWord.word vector
+       | Small of SmallInt.int
+      val rep: t -> rep
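
The Small/Big split described in exec-summary.1.txt can be illustrated
with a minimal C sketch.  It assumes a 64-bit object pointer and a
low-bit tag of 1 marking the Small case; the tag choice and the names
objptr_t, intinf_is_small, intinf_fits_small, intinf_make_small, and
intinf_small_value are hypothetical and are not taken from the runtime
sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical object-pointer-sized word; 64 bits is assumed here. */
typedef uint64_t objptr_t;

/* Assumed convention: low bit 1 marks a small integer stored in the
 * upper 63 bits; an aligned heap pointer always has low bit 0. */
static bool intinf_is_small(objptr_t w) { return (w & 1u) != 0; }

/* True when v is representable in the 63-bit payload. */
static bool intinf_fits_small(int64_t v) {
    return v >= -(INT64_C(1) << 62) && v < (INT64_C(1) << 62);
}

/* Pack a small integer into the upper bits and set the tag. */
static objptr_t intinf_make_small(int64_t v) {
    assert(intinf_fits_small(v));
    return ((objptr_t)v << 1) | 1u;
}

/* Unpack a small integer, sign-extending the 63-bit payload without
 * relying on implementation-defined signed shifts or conversions. */
static int64_t intinf_small_value(objptr_t w) {
    uint64_t u = w >> 1;                      /* logical shift */
    if (u & (UINT64_C(1) << 62))
        u |= UINT64_C(1) << 63;               /* copy the sign bit up */
    return (u <= (uint64_t)INT64_MAX) ? (int64_t)u
                                      : -(int64_t)(~u) - 1;
}
```

With a 32-bit object pointer the same scheme would leave a 31-bit
payload, which is why the Small case is tied to a separate SmallInt
structure rather than to Int.int.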

Copied: mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/mltongc.txt (from rev 4742, mlton/branches/on-20050822-x86_64-branch/runtime/gc/mltongc.txt)

Added: mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/semantics.txt
===================================================================
--- mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/semantics.txt	2006-10-22 02:26:17 UTC (rev 4745)
+++ mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/semantics.txt	2006-10-22 02:33:18 UTC (rev 4746)
@@ -0,0 +1,28 @@
+Structure       Val             From    To      Semantics
+-------------------------------------------------------------------------------
+Word            fromInt         int     word    lowbits or sign-extend
+Word            fromIntZ        int     word    lowbits or zero-extend
+Word            fromWord        word    word    lowbits or zero-extend
+Word            fromWordX       word    word    lowbits or sign-extend
+Word            toInt           word    int     overflow check, unsigned
+Word            toIntX          word    int     overflow check, signed
+Word            toWord          word    word    lowbits or zero-extend
+Word            toWordX         word    word    lowbits or sign-extend
+
+Int             fromInt         int     int     overflow check, signed
+Int             fromWord        word    int     overflow check, unsigned
+Int             fromWordX       word    int     overflow check, signed
+Int             toInt           int     int     overflow check, signed
+Int             toWord          int     word    lowbits or zero-extend
+Int             toWordX         int     word    lowbits or sign-extend
+
+
+From:           int, word
+To:             int, word
+Semantics:      lowbits or sign-extend,
+                lowbits or zero-extend,
+                overflow check, unsigned
+                overflow check, signed
+
+
+Primitives are all: lowbits or sign-extend, lowbits or zero-extend
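
The four semantics named in semantics.txt can be sketched in C for a
32-bit source value.  This is an illustration under assumed sizes (32-bit
source, 8-bit or 64-bit target); the function names are hypothetical
and do not correspond to MLton primitives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* lowbits: a narrowing conversion keeps only the low-order bits. */
static uint8_t low_bits_8(uint32_t w) { return (uint8_t)(w & 0xFFu); }

/* zero-extend: a widening conversion fills the new high bits with 0. */
static uint64_t zero_extend_64(uint32_t w) { return (uint64_t)w; }

/* sign-extend: a widening conversion copies bit 31 into the high bits. */
static uint64_t sign_extend_64(uint32_t w) {
    uint64_t x = (uint64_t)w;
    return (w & 0x80000000u) ? (x | UINT64_C(0xFFFFFFFF00000000)) : x;
}

/* overflow check, unsigned: the source bits are read as unsigned. */
static bool to_int8_unsigned_checked(uint32_t w, int8_t *out) {
    assert(out != NULL);
    if (w > (uint32_t)INT8_MAX) return false;   /* would overflow */
    *out = (int8_t)w;
    return true;
}

/* overflow check, signed: the source bits are read as two's complement. */
static bool to_int8_signed_checked(uint32_t w, int8_t *out) {
    assert(out != NULL);
    int64_t v = (w & 0x80000000u) ? (int64_t)w - INT64_C(4294967296)
                                  : (int64_t)w;
    if (v < INT8_MIN || v > INT8_MAX) return false;  /* would overflow */
    *out = (int8_t)v;
    return true;
}
```

The all-ones 32-bit pattern shows the difference between the two
checked variants: read as unsigned it overflows an 8-bit integer, while
read as signed it is -1 and converts without overflow.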

Added: mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/status.0.txt
===================================================================
--- mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/status.0.txt	2006-10-22 02:26:17 UTC (rev 4745)
+++ mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/status.0.txt	2006-10-22 02:33:18 UTC (rev 4746)
@@ -0,0 +1,87 @@
+
+Notes on the status of the x86_64 port of MLton.
+=======================================================================
+
+Sources:
+
+Work is progressing on the x86_64 branch; interested parties may check
+out the latest revision with:
+
+svn co svn://mlton.org/mlton/branches/on-20050822-x86_64-branch mlton.x86_64
+
+and view the sources on the web at:
+
+http://mlton.org/cgi-bin/viewsvn.cgi/mlton/branches/on-20050822-x86_64-branch/
+
+
+Background:
+
+(* Representing 64-bit pointers. *)
+http://mlton.org/pipermail/mlton/2004-October/026162.html
+(* MLton GC overview *)
+http://mlton.org/pipermail/mlton/2005-July/027585.html
+
+
+Summary:
+
+Thus far, the garbage collector (and related services) has been
+rewritten to be native pointer size agnostic with configurable heap
+object pointer representation.  There are no known regressions in the
+rewritten GC relative to the present 32-bit compiler.  The next
+step will be to make the Basis Library implementation agnostic to the
+native representation of primitive C-types (e.g., int, char*, etc.).
+This will ensure that the Basis Library implementation may be shared
+among 32-bit and 64-bit systems.  Following that, I believe that it
+will be possible to push changes through the compiler proper to
+support a C-codegen in which all pointers are 64-bit.  After shaking
+out bugs there, we should be able to consider supporting smaller
+ML-pointer representations.
+
+
+Details:
+
+Thus far, code modifications have been limited to the runtime/
+directory:
+
+http://mlton.org/cgi-bin/viewsvn.cgi/mlton/branches/on-20050822-x86_64-branch/runtime/
+
+The new gc/ sub-directory breaks down the GC implementation into
+smaller pieces.  For efficiency, they are #include-ed together to form
+a single compilation unit to feed to the C compiler.
+
+A key design decision has been to implement the GC in a manner that is
+agnostic to the native pointer size and to the desired ML-pointer
+representation.  The file model.h encapsulates the key attributes that
+describe an ML-pointer representation, and the files objptr.{h,c}
+encapsulate the conversions between native pointers and ML-pointers.
+In most places, such conversions are relatively routine.  One major
+exception is that some care must be taken with threading of internal
+pointers for the Jonkers mark-compact GC, since it must compensate
+for the possibility that an ML-pointer is not the same size as an
+ML-header.
+
+Similarly, any assumptions about the native WORD_SIZE have been
+removed.  All object sizes are measured in 8-bit bytes and stored in
+size_t variables.  Statistics are gathered in uintmax_t and intmax_t
+variables.
+
+The C-side of the Basis Library implementation is entirely agnostic to
+the representation of ML-objects (pointers, headers, etc.).  That is,
+the FFI assumes that all ML-objects are passed by their native pointer
+representation.  Consequently, all functions exported by the GC to the
+Basis Library are expressed in terms of native pointers.
+
+The one, and only, exception is that basis/IntInf.c requires some
+additional information about ML-header sizes, the layout of the
+GC_state struct, etc.  It isn't clear that there is significant benefit
+to be had by making the implementation agnostic to these decisions.
+
+Some decisions need to be made about the representation and
+implementation of IntInf.int.  The salient point is that on a 64-bit
+system, a GMP limb is represented as a 64-bit object.  
+
+
+With regard to the next step, I believe it will be worthwhile to
+follow the technique used in the MLNLFFI-library implementation.
+There, we use two ML Basis path variables (TARGET_ARCH, TARGET_OS) to
+choose the correct ML representation for primitive C types.
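
The conversions between native pointers and ML-pointers mentioned in
status.0.txt might look roughly like the following C sketch.  It assumes
one possible representation, a 32-bit ML-pointer holding a byte offset
from the heap base; the actual representations are those described by
model.h, and the names objptr32_t, pointer_to_objptr, and
objptr_to_pointer are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed representation: an ML-pointer is a 32-bit byte offset from
 * the start of the heap, independent of the native pointer size. */
typedef uint32_t objptr32_t;

/* Internalize: native pointer -> ML-pointer.  The pointer must lie
 * inside the heap and within 4 GiB of its base. */
static objptr32_t pointer_to_objptr(const unsigned char *p,
                                    const unsigned char *base) {
    ptrdiff_t off = p - base;
    assert(off >= 0 && (uintmax_t)off <= UINT32_MAX);
    return (objptr32_t)off;
}

/* Externalize: ML-pointer -> native pointer. */
static unsigned char *objptr_to_pointer(objptr32_t o, unsigned char *base) {
    return base + o;
}
```

Keeping every conversion behind two such functions is what lets the
rest of the collector stay agnostic to the chosen representation.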

Added: mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/status.1.txt
===================================================================
--- mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/status.1.txt	2006-10-22 02:26:17 UTC (rev 4745)
+++ mlton/branches/on-20050822-x86_64-branch/doc/x86_64-port-notes/status.1.txt	2006-10-22 02:33:18 UTC (rev 4746)
@@ -0,0 +1,299 @@
+
+Notes on the status of the x86_64 port of MLton.
+=======================================================================
+
+Sources:
+
+Work is progressing on the x86_64 branch; interested parties may check
+out the latest revision with:
+
+svn co svn://mlton.org/mlton/branches/on-20050822-x86_64-branch mlton.x86_64
+
+and view the sources on the web at:
+
+http://mlton.org/cgi-bin/viewsvn.cgi/mlton/branches/on-20050822-x86_64-branch/
+
+
+Background:
+
+(* Representing 64-bit pointers. *)
+http://mlton.org/pipermail/mlton/2004-October/026162.html
+(* MLton GC overview *)
+http://mlton.org/pipermail/mlton/2005-July/027585.html
+(* Runtime rewrite *)
+http://mlton.org/pipermail/mlton/2005-December/028421.html
+
+
+Summary:
+
+Since the last summary, the Basis Library has been refactored so that
+it is compile-time configurable on the following axes:
+
+   OBJPTR -- size of an object pointer (32-bits or 64-bits)
+   HEADER -- size of an object header (32-bits or 64-bits)
+   SEQINDEX -- size of an array/vector length (32-bits or 64-bits)
+
+   DEFAULT_CHAR -- size of Char.char (8-bits; no choice according to spec)
+   DEFAULT_INT -- size of Int.int (32-bits, 64-bits, and IntInf.int)
+   DEFAULT_REAL -- size of Real.real (32-bits, 64-bits)
+   DEFAULT_WORD -- size of Word.word (32-bits, 64-bits)
+
+   C_TYPES -- sizes of various primitive C types
+
+The object pointer and object header are needed for the IntInf
+implementation.  Configuring the default sizes supports adopting
+64-bit integers and words as the default on 64-bit platforms, and also
+supports retaining 32-bit integers and words as the default on 64-bit
+platforms.  The sizes of primitive C types are determined by the
+target architecture and operating system.  This ensures that the Basis
+Library uses the right representation for file descriptors, etc., and
+ensures that the implementation may be shared between 32-bit and
+64-bit systems.
+
+
+MLton.IntInf changes:
+
+As noted above, the object pointer size is needed by the IntInf
+implementation, which represents an IntInf.int either as a pointer to
+a vector of GNU MP mp_limb_t objects or as the upper bits of a
+pointer.  Since the representation of mp_limb_t changes from a 32-bit
+system to a 64-bit system, and the size of an object pointer may be
+compile-time configurable, we have changed the MLTON_INTINF signature
+to have the following:
+
+      structure BigWord : WORD
+      structure SmallInt : INTEGER
+
+      datatype rep =
+         Big of BigWord.word vector
+       | Small of SmallInt.int
+      val rep: t -> rep
+
+
+Technical Details:
+
+The key techniques used in the refactoring of the Basis Library are
+aggressive use of ML Basis path variables, successive rebindings of
+structures, and special 'Choose' functors.  I'll describe each of
+these a little below.
+
+The Basis Library implementation is organized as a large ML Basis
+project.  In order to establish the appropriate mappings between C
+primitive types (int, long long int, etc.) and ML types (Int32.int,
+Int64.int, etc), we use the $(TARGET_ARCH) and $(TARGET_OS) path
+variables to elaborate a target specific c-types.sml file:
+
+      <basis>/config/c/$(TARGET_ARCH)-$(TARGET_OS)/c-types.sml
+
+The c-types.sml file is generated automatically for each target
+system, using the runtime/gen/gen-types.c program, and looks something
+like:
+
+(* C *)
+structure C_Char = struct open Int8 type t = int end
+functor C_Char_ChooseIntN (A: CHOOSE_INTN_ARG) = ChooseIntN_Int8 (A)
+structure C_SChar = struct open Int8 type t = int end
+functor C_SChar_ChooseIntN (A: CHOOSE_INTN_ARG) = ChooseIntN_Int8 (A)
+structure C_UChar = struct open Word8 type t = word end
+...
+structure C_Size = struct open Word32 type t = word end
+functor C_Size_ChooseWordN (A: CHOOSE_WORDN_ARG) = ChooseWordN_Word32 (A)
+...
+structure C_Off = struct open Int64 type t = int end
+functor C_Off_ChooseIntN (A: CHOOSE_INTN_ARG) = ChooseIntN_Int64 (A)
+...
+structure C_UId = struct open Word32 type t = word end
+functor C_UId_ChooseWordN (A: CHOOSE_WORDN_ARG) = ChooseWordN_Word32 (A)
+...
+(* from "gmp.h" *)
+structure C_MPLimb = struct open Word32 type t = word end
+functor C_MPLimb_ChooseWordN (A: CHOOSE_WORDN_ARG) = ChooseWordN_Word32 (A)
+
+Note that each C type has a corresponding structure which is bound to
+an Int<N> or Word<N> structure of the appropriate signedness and size.
+The extra binding of "type t = int" or "type t = word" ensures that
+the Basis Library may refer to C_TYPE.t, rather than C_TYPE.int or
+C_TYPE.word, for types whose signedness isn't specified by the
+standard.  (For example, uid_t and gid_t are only required to be
+integral types; in glibc, they happen to be unsigned.)
+
+When elaborating the MLB file that implements the Basis Library, we
+include
+
+      <basis>/config/c/$(TARGET_ARCH)-$(TARGET_OS)/c-types.sml
+
+multiple times, to rebind the C_TYPE structures to successively more
+complete implementations of the ML structures.  (For example, we need
+C_MPLimb to implement IntInf, but we need IntInf to implement
+Word32.toLargeInt.  Hence, we first bind C_MPLimb to a minimal,
+primitive structure, which provides enough to implement a little bit
+of IntInf, which in turn provides enough to implement
+Word32.toLargeInt, which we then rebind to C_MPLimb.)
+
+In a similar manner, we successively bind the default Int structure
+via:
+
+      <basis>/config/default/$(DEFAULT_INT)
+
+where the $(DEFAULT_INT) path variable denotes a file that looks
+something like:
+
+structure Int = Int32
+type int = Int.int
+
+functor Int_ChooseInt (A: CHOOSE_INT_ARG) :
+   sig val f : Int.int A.t end =
+   ChooseInt_Int32 (A)
+
+The 'Choose' functors are the mechanism by which we ensure that the
+majority of the Basis Library implementation may be shared, while
+remaining "parametric" in the primitive C types and the default ML
+types.  Consider, for example, the INTEGER signature:
+
+signature INTEGER =
+   sig
+     type int
+
+     val fromInt: Int.int -> int
+     val toInt: int -> Int.int
+
+     ...
+   end
+
+How may we efficiently implement the Int8, Int16, Int32, and Int64
+structures, when the bindings for Int<N>.{from,to}Int must be
+different for the different choices of Int.int?  The solution adopted
+is to ensure that each "pre-implementation" of Int<N> knows how to
+convert to and from each possible choice of Int.int.  That is, we have
+
+signature PRE_INTEGER =
+   sig
+     type int
+
+     val fromInt8: Primitive.Int8.int -> int
+     val fromInt16: Primitive.Int16.int -> int
+     val fromInt32: Primitive.Int32.int -> int
+     val fromInt64: Primitive.Int64.int -> int
+     val fromIntInf: Primitive.IntInf.int -> int
+     val toInt8: int -> Primitive.Int8.int
+     val toInt16: int -> Primitive.Int16.int
+     val toInt32: int -> Primitive.Int32.int
+     val toInt64: int -> Primitive.Int64.int
+     val toIntInf: int -> Primitive.IntInf.int
+
+     ...
+   end
+
+We use a functor to convert each PRE_INTEGER to an INTEGER; within
+this functor, we use the Int_ChooseInt functor to select the
+appropriate conversion:
+
+functor Int (structure I : PRE_INTEGER) : INTEGER =
+   struct
+     type int = I.int
+
+     local
+       structure S =
+       Int_ChooseInt
+       (type 'a t = 'a -> int
+        val fInt8 = I.fromInt8
+        val fInt16 = I.fromInt16
+        val fInt32 = I.fromInt32
+        val fInt64 = I.fromInt64
+        val fIntInf = I.fromIntInf)
+     in
+       val fromInt = S.f
+     end
+
+     local
+       structure S =
+       Int_ChooseInt
+       (type 'a t = int -> 'a
+        val fInt8 = I.toInt8
+        val fInt16 = I.toInt16
+        val fInt32 = I.toInt32
+        val fInt64 = I.toInt64
+        val fIntInf = I.toIntInf)
+     in
+       val toInt = S.f
+     end
+
+     ...
+end
+
+The implementation of the 'Choose' functors is the obvious one:
+
+signature CHOOSE_INT_ARG =
+   sig
+      type 'a t
+      val fInt8: Int8.int t
+      val fInt16: Int16.int t
+      val fInt32: Int32.int t
+      val fInt64: Int64.int t
+      val fIntInf: IntInf.int t
+   end
+
+functor ChooseInt_Int8 (A : CHOOSE_INT_ARG) : 
+   sig val f : Int8.int A.t end = 
+   struct val f = A.fInt8 end
+functor ChooseInt_Int16 (A : CHOOSE_INT_ARG) : 
+   sig val f : Int16.int A.t end = 
+   struct val f = A.fInt16 end
+functor ChooseInt_Int32 (A : CHOOSE_INT_ARG) : 
+   sig val f : Int32.int A.t end = 
+   struct val f = A.fInt32 end
+functor ChooseInt_Int64 (A : CHOOSE_INT_ARG) : 
+   sig val f : Int64.int A.t end = 
+   struct val f = A.fInt64 end
+functor ChooseInt_IntInf (A : CHOOSE_INT_ARG) : 
+   sig val f : IntInf.int A.t end = 
+   struct val f = A.fIntInf end
+
+As a convenience mechanism, the $(DEFAULT_CHAR), $(DEFAULT_INT),
+$(DEFAULT_REAL), and $(DEFAULT_WORD) path variables are set by the
+compiler, and may be controlled by a compiler flag:
+
+  -default-type type
+      Specify the default binding for a primitive type.  For example,
+      '-default-type word64' causes the top-level type word and the 
+      top-level structure Word in the Basis Library to be equal to
+      Word64.word and Word64:WORD, respectively.  Similarly, 
+      '-default-type intinf' causes the top-level type int and the 
+      top-level structure Int in the Basis Library to be equal to 
+      IntInf.int and IntInf:INTEGER, respectively.
+
+As should be evident from the above, we only support power-of-two
+sized defaults.  Also, the Basis Library specification doesn't allow
+Char.char to be larger than 8 bits, so '-default-type char8' is the
+only option allowed for char.  While '-default-type int8' is allowed,
+it probably isn't a good idea to set the default integer and word
+sizes to less than 32-bits, but it ought to be useful to set integers
+to IntInf.int.
+
+
+Platform Porters/Maintainers:
+
+Before merging the runtime and Basis Library changes in to HEAD, we
+would like to ensure that things aren't too broken on other platforms;
+I only have easy access to x86-linux and amd64-linux.
+
+It would be very helpful if individuals on other platforms (BSD and
+Darwin and Solaris particularly) could checkout the x86_64 branch and
+try to compile the runtime:
+
+  make runtime
+
+I'm specifically interested in the files c-types.h and c-types.sml
+(automatically copied to
+basis-library/config/c/$(TARGET_ARCH)-$(TARGET_OS)/), where the sizes
+and signedness of the C typedefs might be different from x86-linux.
+Second, I'm interested in any constants that aren't present on
+different platforms.  I've been following the Single UNIX
+Specification (as a superset of Posix, XOpen, and other standards).
+I'm guessing that we'll have to drop a few more things to get to the
+intersection of our platforms.
+
+Finally, the platform/* specific stuff will need to be ported.  Most
+of that should be straightforward, following what I've done to linux;
+essentially, change some naming schemes, discharge all the gcc
+warnings, etc.  Cygwin and MinGW will be the biggest challenges.
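
For porters who want a feel for how a target-specific c-types.sml could
be produced, here is a minimal C sketch of the idea: measure a C type's
size and signedness and print the matching structure binding.  It is
not the actual runtime/gen/gen-types.c; the helper format_c_type and
the IS_SIGNED macro are hypothetical, and only the output format is
modeled on the c-types.sml excerpt in status.1.txt.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Nonzero when integral type t is signed: -1 converted to an unsigned
 * type becomes a large positive value, so the comparison fails. */
#define IS_SIGNED(t) ((t)-1 < (t)0)

/* Format the SML structure binding for one C type into buf.
 * Signed types map to Int<N>, unsigned types to Word<N>, where N is
 * the size in bits.  Returns the snprintf result. */
static int format_c_type(char *buf, size_t len, const char *name,
                         size_t size_bytes, int is_signed) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "structure C_%s = struct open %s%zu type t = %s end",
                    name,
                    is_signed ? "Int" : "Word",
                    size_bytes * 8,
                    is_signed ? "int" : "word");
}
```

A generator built this way would call, for example,
format_c_type(buf, sizeof buf, "Size", sizeof(size_t), IS_SIGNED(size_t))
once per C type, so the sizes and signedness come from the C compiler
on the target rather than from a hand-maintained table.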

Deleted: mlton/branches/on-20050822-x86_64-branch/runtime/TODO
===================================================================
--- mlton/branches/on-20050822-x86_64-branch/runtime/TODO	2006-10-22 02:26:17 UTC (rev 4745)
+++ mlton/branches/on-20050822-x86_64-branch/runtime/TODO	2006-10-22 02:33:18 UTC (rev 4746)
@@ -1,60 +0,0 @@
-
-* Why does hash-table use malloc/free while generational maps use mmap/munmap?
-
-* Use C99 <assert.h> instead of util/assert.{c,h}
-
-
-(* make-pdf stuff; not really x86_64 specific *)
-http://mlton.org/pipermail/mlton/2006-May/028840.html
- + http://mlton.org/pipermail/mlton/2006-June/028866.html
-
-(* drop ML 'bool' from FFI and add C 'bool' *)
-http://mlton.org/pipermail/mlton/2006-June/028927.html
- + http://mlton.org/pipermail/mlton/2006-June/028940.html
-
-(* platform dependent c-types.h; change <build>/ layout *)
-http://mlton.org/pipermail/mlton/2006-June/028943.html
- + http://mlton.org/pipermail/mlton/2006-June/028948.html
- + Revision 4665 -- build/lib/<target>/include
-
-(* Add basis-ffi.h to SVN; create .PHONY target to regenerate. *)
-http://mlton.org/pipermail/mlton/2006-June/028946.html
- + http://mlton.org/pipermail/mlton/2006-June/028947.html
-
-(* Real/Word primitives; could delay *)
-http://mlton.org/pipermail/mlton/2006-July/028963.html
- + Rename primitives to indicate that these are not bit-wise identities
-     Real_toWord, Real_toReal, Word_toReal
-   and add primitives
-     Real_toWord, Word_toReal
-   that correspond to bit-wise identities.
- + Revision 4672 -- nextAfter
-
-(* PackWord primitives; could delay *)
-http://mlton.org/pipermail/mlton/2006-May/028833.html
- + http://mlton.org/pipermail/mlton-user/2004-November/000556.html
- + http://mlton.org/pipermail/mlton/2004-November/026246.html
-
-(* Fields in GC_state *)
-http://mlton.org/pipermail/mlton/2006-July/028965.html
-
-(* Char signedness *)
-http://mlton.org/pipermail/mlton/2006-July/028970.html
- + http://mlton.org/pipermail/mlton/2006-July/028982.html
-
-(* auto-gen GC specific runtime imports *)
-http://mlton.org/pipermail/mlton/2006-July/028975.html
-
-Another minor thing I think we should do:
- * rename arch amd64 to x86_64, to be consistent with gcc target
-
-
-
-basis/MLton/allocTooLarge.c
-
-
-
-
-Revision 4658 -- convert 'int' to 'bool' by comparison with zero
-              -- revert when dropping 'bool' from FFI; comparison
-                 with zero will happen on the ML side.

Deleted: mlton/branches/on-20050822-x86_64-branch/runtime/gc/mltongc.txt
===================================================================
--- mlton/branches/on-20050822-x86_64-branch/runtime/gc/mltongc.txt	2006-10-22 02:26:17 UTC (rev 4745)
+++ mlton/branches/on-20050822-x86_64-branch/runtime/gc/mltongc.txt	2006-10-22 02:33:18 UTC (rev 4746)
@@ -1,319 +0,0 @@
-
-Notes on the MLton garbage collection system.  Until the "Thoughts on
-64-bits" section, a word is considered to be 32-bits.
-
-Garbage Collector
-=================
-
-MLton implements a relatively simple garbage collection strategy, that
-nonetheless adapts itself readily to different scenarios of memory usage.
-
-All ML objects (including ML execution stacks) are allocated in a
-contiguous heap.  The heap has the following general layout:
-
-  ---------------------------------------------------
- |    old generation    |   to space   |   nursery   |
-  ---------------------------------------------------
- ^                       ^                ^          ^
- start                   back             frontier   limit
-
-New ML objects are allocated in the nursery at the frontier.  Upon
-exhausting the nursery (i.e., when limit - frontier is insufficient
-for the next object allocation), a garbage collection is initiated.  
-
-It should be noted that in the absence of memory pressure, the
-to-space is of zero size and the old-generation is simply the live
-data from the last garbage collection.  Hence, generational garbage
-collection is only enabled when the program displays sufficiently high
-memory usage.
-
-In the common, non-generational scenario, a garbage collection
-involves one of two major garbage collection strategies.  If there is
-sufficient memory to allocate a second heap of approximately the same
-size as the current heap, then a Cheney Copy garbage collection is
-performed.  (In practice, the second heap is already allocated and the
-two semi-spaces are swapped at each Cheney Copy.)  If there is
-insufficient memory for a second semi-space, then a Mark Compact
-garbage collection is performed.
-
-After a Mark Compact garbage collection, or if the live ratio is low
-enough, the runtime switches to a generational collection.  In this
-scenario, the current live data becomes the old-generation, while the
-remaining space is split into the to-space and the nursery.  A minor
-garbage collection copies live objects from the nursery to the
-beginning of to-space, thereby extending the old-generation and
-shrinking the space available for the to-space and the nursery.
-Eventually, the nursery becomes too small to accommodate new object
-allocations, and a major garbage collection is initiated.
-
-The MLton garbage collector additionally supports weak pointers and
-object finalizers, hash-consing (sharing) of both the entire heap and
-the heap reachable from individual objects, computing the dynamic size
-of objects, and provides some runtime support for profiling.
-
-In the sequel we will refer to pointers to objects in the ML heap as
-"heap pointers".  Note that a valid heap pointer is always bounded by
-the start pointer and the limit pointer of the current heap.  Hence,
-heap pointers admit representations other than the native pointer
-representation.  Furthermore, precise garbage collection requires
-identifying all heap pointers in ML objects.
-
-There are four kinds of ML objects: array, normal (fixed size), stack,
-and weak.  Each object has a header (currently, a 32-bit word), which
-immediately precedes the object data.  A heap pointer always denotes
-the address following the header (i.e., the first data word); there
-are no heap pointers to object interiors.
-
-
-A header word has the following bit layout:
-
-  00        : 1
-  01 - 19   : type index bits
-  20 - 30   : counter bits, used by mark compact GC
-       31   : mark bit, used by mark compact GC
-
-Normal objects have the following layout:
-
-  header word :: 
-  (non heap-pointers)* :: 
-  (heap pointers)*
-
-Note that the non heap-pointers denote a sequence of primitive data
-values.  These data values need not map directly to values of the
-native word size.  MLton's aggressive representation strategies may
-pack multiple primitive values into the same native word.  Likewise, a
-primitive value may span multiple native words (e.g., Word64.word).
-
-Array objects have the following layout:
-
-  counter word :: 
-  length word :: 
-  header word :: 
-  ( (non heap-pointers)* :: (heap pointers)* )*
-
-The counter word is used by mark compact GC.  The length word is the
-number of elements in the array.  Array elements have the same
-individual layout as normal objects, omitting the header word.
-
-Stack objects have the following layout:
-
-  header word ::
-  markTop pointer ::
-  markIndex word ::
-  reserved word ::
-  used word ::
-  ... reserved bytes ...
-
-The markTop pointer and markIndex word are used by mark compact GC.
-The reserved word gives the number of bytes for the stack (before the
-next ML object).  The used word gives the number of bytes currently
-used by the stack.  The sequence of reserved bytes corresponds to ML
-stack frames, which will be discussed in more detail below.
-
-Weak objects have the following layout:
-
-  header word ::
-  unused word ::
-  link word ::
-  heap-pointer
-  
-
-The type index of a header word is an index into an array, where each
-element describes the layout of an object.  The 19 bits available for
-the type index mean that there are only 2^19 different object layouts
-per program.  The "hello-world" program yields 37 object types in the
-array, though there are only 19 distinct object types.
-
-The type index array is declared as follows:
-
-        typedef enum { 
-                ARRAY_TAG,
-                NORMAL_TAG,
-                STACK_TAG,
-                WEAK_TAG,
-        } GC_ObjectTypeTag;
-
-        typedef struct {
-                GC_ObjectTypeTag tag;
-                Bool hasIdentity;
-                ushort numNonPointers;
-                ushort numPointers;
-        } GC_ObjectType;
-
-        GC_ObjectType *objectTypes; /* Array of object types. */
-
-The objectTypes pointer is initialized to point to a static array of
-object types that is emitted for each compiled program.  The
-hasIdentity field indicates whether or not the object has mutable
-fields, in which case it may not be hash-cons-ed.  In a normal object,
-the numNonPointers field indicates the number of 32-bit words of non
-heap-pointer data, while the numPointers field indicates the number of
-heap pointers.  In an array object, the numNonPointers field indicates
-the number of bytes of non heap-pointer data, while the numPointers
-field indicates the number of heap pointers.  In a stack object, the
-numNonPointers and numPointers fields are irrelevant.  In a weak
-object, the numNonPointers and numPointers fields are interpreted as
-in a normal object.
-
-As an example, here is a portion of the static data emitted for the
-"hello-world" program:
-
-static GC_ObjectType objectTypes[] = {
-        { 2, FALSE, 0, 0 },
-        { 0, FALSE, 1, 0 },
-        { 1, TRUE, 2, 1 },
-        { 3, FALSE, 3, 0 },
-        { 0, FALSE, 4, 0 },
-        ...
-}
-
-
-The "... reserved bytes ..." of a stack object constitute a linear
-sequence of frames.  For the purposes of garbage collection, we must
-be able to recover the size and offsets of live heap-pointers for each
-frame.  This data is declared as follows:
-
-        typedef ushort *GC_offsets;
-
-        typedef struct GC_frameLayout {
-                char isC;
-                ushort numBytes;
-                GC_offsets offsets;
-        } GC_frameLayout;
-
-        GC_frameLayout *frameLayouts;
-
-The frameLayouts pointer is initialized to point to a static array of
-frame layouts that is emitted for each compiled program.  The isC
-field identifies whether or not the frame is for a C call. (Note: The
-ML stack is distinct from the system stack.  A C call executes on the
-system stack.  The frame left on the ML stack is just a marker.)  The
-numBytes field indicates the size of the frame, including space for
-the return address.  The offsets field points to an array (the zeroth
-element recording the size of the array) whose elements record byte
-offsets from the bottom of the frame at which live heap pointers are
-located.
-
-As an example, here is a portion of the static data emitted for the
-"hello-world" program:
-
-static ushort frameOffsets0[] = {0};
-static ushort frameOffsets1[] = {2,0,4};
-static ushort frameOffsets2[] = {1,0};
-static ushort frameOffsets3[] = {2,4,16};
-static ushort frameOffsets4[] = {1,4};
-...
-static GC_frameLayout frameLayouts[] = {
-        {TRUE, 4, frameOffsets0},
-        {FALSE, 4, frameOffsets0},
-        {TRUE, 20, frameOffsets1},
-        {TRUE, 20, frameOffsets2},
-        {FALSE, 12, frameOffsets0},
-        ...
-
-
-
-Thoughts on 64-bits:
-
- * At this high level, I don't see obvious difficulties with adapting
-   the garbage collector to a 64-bit platform.  However, there are
-   certainly a number of design decisions.
-
- * What representation for heap pointers?
-   
-   There is a preliminary proposal from Stephen:
-     http://mlton.org/pipermail/mlton/2004-October/026162.html
-
-   Certainly, it would appear to be easiest to begin with a scenario
-   where heap pointers share the same representation as native
-   pointers (i.e., 64-bits).  However this means that ML objects will
-   be quite a bit bigger in the 64-bit world.  Ultimately, it would be
-   appropriate to have multiple strategies at hand.
-
-   Assuming that per-compile representation strategies are available,
-   the question arises as to how to best integrate with the runtime
-   system.  The compiler proper can handle internalizing/externalizing
-   heap pointers in the code it emits.  However, it seems likely that
-   we would want multiple libmlton.a libraries available,
-   corresponding to the different strategies.  The overhead of
-   consulting a flag in the runtime state to determine the
-   representation of heap pointers at every heap pointer dereference
-   would appear to be much too high.  The implementation may
-   certainly make use of inline functions or macros to unify the
-   different strategies, but it seems as though we will want to
-   compile different specializations of the runtime system.
-
-   Also, I think it makes sense to ensure that heap pointers passed
-   through the FFI are externalized -- that is, C code will only ever
-   see 64-bit pointers, regardless of the representation strategy.
-
-   However, there is an argument against this.  Currently, int ref ref
-   is a valid FFI type, and we currently claim that it has the
-   "natural C representation."  This claim would be broken if the
-   inner ref had a different heap pointer representation.
-
-   We could provide {extern,intern}HeapPointer functions for C, but
-   then it is not clear how to compile the C code, not knowing what
-   representation will be chosen for heap pointers.
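-
-   As a rough sketch of the "inline functions or macros" idea, the
-   runtime could fix the heap pointer representation at compile time
-   and build one libmlton.a per strategy.  The build flag
-   GC_COMPRESSED_POINTERS, the type objptr, and both conversion
-   functions are hypothetical names used only for illustration; the
-   compressed variant assumes a 32-bit byte offset from the heap base,
-   which is just one possible strategy, and the sketch ignores how a
-   null pointer would be represented.
-
```c
#include <assert.h>
#include <stdint.h>

typedef unsigned char *pointer;  /* native pointer into the ML heap */

#ifdef GC_COMPRESSED_POINTERS    /* hypothetical per-build flag */
typedef uint32_t objptr;         /* 32-bit byte offset from the heap base */
#else
typedef uintptr_t objptr;        /* same bits as a native pointer */
#endif

/* Internalize: turn a stored heap pointer into a native pointer. */
static inline pointer objptrToPointer(objptr op, pointer heapBase) {
#ifdef GC_COMPRESSED_POINTERS
        return heapBase + op;
#else
        (void)heapBase;          /* unused when representations coincide */
        return (pointer)op;
#endif
}

/* Externalize in the other direction: native pointer to stored form. */
static inline objptr pointerToObjptr(pointer p, pointer heapBase) {
#ifdef GC_COMPRESSED_POINTERS
        return (objptr)(p - heapBase);
#else
        (void)heapBase;
        return (objptr)p;
#endif
}
```
-
-   Because the choice is resolved by the preprocessor, no flag is
-   consulted at a heap pointer dereference; the cost is that each
-   strategy needs its own compiled copy of the runtime.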
-
- * How big should arrays be?
-
-   We currently allow arrays of size up to Int.maxInt, where Int.int
-   is a 32-bit integer.  It is a separate issue to decide how the
-   Basis Library should change in the presence of a 64-bit port, but
-   if we were to allow arrays of size up to Int64.maxInt, then the
-   representation of array objects would need to change, as the
-   counter word and the length word would need to be larger to
-   accommodate very large arrays.
-
- * Another big design decision concerns how best to accommodate both
-   the 32-bit garbage collector and the 64-bit garbage collector with
-   (much) the same code.  Sharing as much code as possible would be
-   desirable, as we do not wish the two systems to vary in any
-   significant way.
-
-   I think that this strongly suggests that all sizes and offsets are
-   measured in (8-bit) bytes.  I can't remember why array and normal
-   objects treat the numNonPointers field of a GC_ObjectType
-   differently.
-
-   I think that it also strongly suggests that we avoid the C types
-   int and long, and instead use more specific C99 types.
-
-   I also think that it is fairly safe to assume that the programs
-   compiled on 64-bit architectures are essentially the same as those
-   compiled on 32-bit architectures.  In particular, 2^19
-   object types should remain viable for some time to come.  Likewise,
-   the 10 counter bits in the header word (used to implement the mark
-   stack) should continue to be sufficient for the number of heap
-   pointers in a normal heap object.  Finally, 16-bits for the
-   numNonPointers and numPointers fields of a GC_ObjectType will
-   continue to suffice.  (For a truly absurd example, the currently
-   active exception handler is represented by a 32-bit offset from the
-   bottom of the stack.  If an ML execution stack were to grow to more
-   than 4GB, this representation would no longer suffice.)
-
-   On the other hand, it is not safe to assume that the parameters of
-   a 64-bit host system are essentially the same as a 32-bit host
-   system.  For example, in order to make decisions regarding garbage
-   collection strategies, the runtime must query the amount of
-   available RAM.  Likewise, garbage collection statistics, such as
-   bytesAllocated, bytesCopied, bytesLive, etc., could potentially be
-   an order of magnitude larger on 64-bit systems.  And, most
-   importantly, the actual size of the heap could be much larger on a
-   64-bit system.
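-
-   A small sketch of what "more specific C99 types" could look like:
-   program-sized quantities keep fixed 32-bit widths, while
-   host-sized statistics use uintmax_t.  Only the field widths (19
-   type-index bits, 10 counter bits) come from the discussion above;
-   the bit positions, the low tag bit, and all names are assumptions
-   made for illustration.
-
```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t GC_header;      /* header word stays 32 bits wide */

enum {
        TYPE_INDEX_BITS  = 19,   /* 2^19 object types */
        COUNTER_BITS     = 10,   /* mark-stack counter */
        TYPE_INDEX_SHIFT = 1,    /* assumed position */
        COUNTER_SHIFT    = 20    /* assumed position */
};

#define TYPE_INDEX_MASK ((UINT32_C(1) << TYPE_INDEX_BITS) - 1)
#define COUNTER_MASK    ((UINT32_C(1) << COUNTER_BITS) - 1)

static inline uint32_t headerTypeIndex(GC_header h) {
        return (h >> TYPE_INDEX_SHIFT) & TYPE_INDEX_MASK;
}

static inline uint32_t headerCounter(GC_header h) {
        return (h >> COUNTER_SHIFT) & COUNTER_MASK;
}

/* Build a header; the low bit is an assumed "valid header" tag. */
static inline GC_header makeHeader(uint32_t typeIndex, uint32_t counter) {
        assert(typeIndex <= TYPE_INDEX_MASK);
        assert(counter <= COUNTER_MASK);
        return (typeIndex << TYPE_INDEX_SHIFT)
               | (counter << COUNTER_SHIFT)
               | UINT32_C(1);
}

/* Host-sized quantities: wide enough on both 32-bit and 64-bit hosts. */
struct GC_stats {
        uintmax_t bytesAllocated;
        uintmax_t bytesCopied;
        uintmax_t bytesLive;
};
```
-
-   The same source then compiles unchanged for both collectors, and
-   none of these widths depend on what int or long happen to mean on
-   the host.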
-
- * Finally, I note that gc.c weighs in at 4826 lines, which is
-   significantly larger than almost any SML file in the compiler.
-   (The exceptions are the x86 native codegen register allocator and
-   the elaborator for the core language.)  Since we'll be going over
-   the garbage collector with a fine-tooth comb anyway, it might be
-   time to start breaking it into separate implementation files.
-
-Those are some initial thoughts, and may provide a starting point for
-some discussion.
-
-_______________________________________________
-MLton mailing list
-MLton at mlton.org
-http://mlton.org/mailman/listinfo/mlton