good in general, but bad nested loops

Stephen Weeks MLton@sourcelight.com
Tue, 10 Jul 2001 15:02:44 -0700


I took a look at nestedloop.{gcc,mlton,ocaml}

First, here is how I compiled each of them.

gcc -O3 -fomit-frame-pointer -o nestedloop nestedloop.c
ocamlopt -S -o nestedloop -noassert -unsafe -ccopt -O3 nestedloop.ml
mlton -detect-overflow false -keep cps -keep g nestedloop.sml

Here are the running times for "nestedloop 23"

gcc	 .95
ocaml	1.41
mlton	2.65

These jibe pretty well with the shootout numbers.

Here is what gcc generates.

.L60:
	incl	%ebp
	decl	%edx
	jne	.L60
.L72:
	decl	%ecx
	jne	.L56
.L71:
	decl	%ebx
	jne	.L52
.L70:
	decl	%esi
	jne	.L48
.L69:
	decl	%edi
	jne	.L44
.L68:
	decl	8(%esp)
	jne	.L40

So, it has changed the loop to count down from n to 0 instead of up from 0 to n.
It has also kept five of the loop variables in registers, plus the counter.
Most importantly, the hot loop, L_60, is only three instructions.

Here's the hot loop (about 85%) for mlton.  This is the same as what Matthew
sent earlier.

loopF_0:
	movl (220*1)(%edi),%esp
	cmpl $0,%esp
	je L_84
	decl %esp
	movl %esp,(220*1)(%edi)
	incl (216*1)(%edi)
	jmp loopF_0

Here's the hot loop for ocaml.  The low bit in integers is a tag bit.

.L101:
	cmpl	$1, %ebx
	je	.L100
	addl	$-2, %ebx
	addl	$2, %eax
	jmp	.L101

So, ocaml, like gcc keeps both the loop counter and the main counter (x) in
registers.

So ocaml beats us because of register allocation, and gcc beats ocaml by
reordering the loop so that there is only one test, which saves two
instructions.

gcc is also noticing the loop nesting, which ocaml doesn't do -- in fact ocaml
leaves the nested loop as calls.  But, loop nesting isn't necessary to get most
of the speed on this benchmark.  If ocaml just did a little more better local
code generation, it would be very close to gcc.

We can get potentially the same as gcc, because we have the loop nest.  We
just(!) need to do the right register allocation and instruction reordering.  Oh
yeah, and figuring out the Overflows can't happen.