Tail-call Optimization

When one function ends by calling another function, the compiler can engage in tail-call optimization, in which the function being called reuses the caller's stack frame. This procedure is most commonly used in the SPARC architecture, where the compiler reuses the caller's register window in the function being called in order to minimize register window pressure.

The presence of this optimization causes the return probe of the calling function to fire before the entry probe of the called function. This ordering can lead to quite a bit of confusion. For example, if you wanted to record all functions called from a particular function and any functions that this function calls, you might use the following script:

fbt::foo:entry
{
	self->traceme = 1;
}

fbt:::entry
/self->traceme/
{
	printf("called %s", probefunc);
}

fbt::foo:return
/self->traceme/
{
	self->traceme = 0;
}

However, if foo ends in an optimized tail-call, the tail-called function, and therefore any functions that it calls, will not be captured. The kernel cannot be dynamically deoptimized on the fly, and DTrace does not wish to engage in a lie about how code is structured. Therefore, you should be aware of when tail-call optimization might be used.

Tail-call optimization is likely to be used in source code similar to the following example:

	return (bar());

Or in source code similar to the following example:

	(void) bar();
	return;

Conversely, function source code that ends like the following example cannot have its call to bar optimized, because the call to bar is not a tail-call:

	bar();
	return (rval);

You can determine whether a call has been tail-call optimized using the following technique:

Due to the instruction set architecture, tail-call optimization is far more common on SPARC systems than on x86 systems. The following example uses mdb to discover tail-call optimization in the kernel's dup function:

# dtrace -q -n fbt::dup:return'{printf("%s+0x%x", probefunc, arg0);}'

While this command is running, run a program that performs a dup ( 2 ) , such as a bash process. The above command should provide output similar to the following example:

dup+0x10
^C

Now examine the function with mdb:

# echo "dup::dis" | mdb -k
dup:                            sra       %o0, 0, %o0
dup+4:                          mov       %o7, %g1
dup+8:                          clr       %o2
dup+0xc:                        clr       %o1
dup+0x10:                       call      -0x1278       <fcntl>
dup+0x14:                       mov       %g1, %o7

The output shows that dup+0x10 is a call to the fcntl function and not a ret instruction. Therefore, the call to fcntl is an example of tail-call optimization.