[MLton] Scaling issue during refFlatten when building the type dependencies graph

Matthew Fluet fluet at tti-c.org
Mon Sep 8 19:12:49 PDT 2008


On Mon, 8 Sep 2008, Nicolas Bertolotti wrote:
> I recently faced a scaling issue during the refFlatten. The issue occurs 
> during the construction of a graph which uses a very large amount of RAM 
> because the edges are stored as lists and the redundant edges are not 
> automatically filtered during the construction of the graph itself.

I believe that the DirectedGraph module is meant to support graphs with 
multiple edges between nodes.  While representing the set of edges by a 
data structure other than a list might be more efficient, I don't believe 
that it would be appropriate the automatically combine edges between the 
same nodes.

> I have been able to workaround it by modifying the ref-flatten.fun file and replace the following construct:
>                      | Datatype tc =>
>                                    (ignore o Graph.addEdge)
>                                     (graph, {from = n, to = tyconNode tc})
> with:
>                      | Datatype tc =>
>                            (* NBe: Avoid redundant edges in the graph. *)
>                            let
>                                val m = tyconNode tc
>                            in
>                                if (Graph.Node.hasEdge {from = n, to = m}) then
>                                    ()
>                                else
>                                    ((ignore o Graph.addEdge)
>                                     (graph, {from = n, to = m}))
>                            end
>
> This change drastically reduced the memory usage (from millions of edges 
> to one thousand edges) and solved the problem.

Wow!  That is quite a difference (and suggests a large number of 
datatype/variants in the input program).

> I don't expect this fix to be included as is in the compiler as there 
> are probably better ways to handle this (use an other data type to store 
> the edges, use a different algorithm to build the graph ...).

Actually, I think this is the best solution to the immediate problem.  In 
any case, even with a different datatype for edges, there is no need for 
the RefFlatten graph to include duplicate edges.  It is true that the 
Node.hasEdge requires a linear search through the list of edges; that 
could be improved to constant time with another property.  But, I'm not 
sure that you would see a difference in practice; even with your large 
problem, the list of edges is now the "short" list of non-redundant edges 
and for any given node it can grow to at most the number of datatypes in 
the IL program (and will typically be much, much shorter in practice).



More information about the MLton mailing list