When one function ends by calling another function, the compiler can perform tail-call optimization, in which the called function reuses the caller's stack frame. This optimization is most common on the SPARC architecture, where the compiler lets the called function reuse the caller's register window in order to minimize register window pressure.
The presence of this optimization causes the return probe of the calling function to fire before the entry probe of the called function. This ordering can lead to quite a bit of confusion. For example, if you wanted to record every function called from a particular function, along with any functions that those functions call in turn, you might use the following script:
fbt::foo:entry
{
	self->traceme = 1;
}

fbt:::entry
/self->traceme/
{
	printf("called %s", probefunc);
}

fbt::foo:return
/self->traceme/
{
	self->traceme = 0;
}
However, if foo ends in an optimized tail-call, the tail-called function, and therefore any functions that it calls, will not be captured: the return probe of foo fires first and clears self->traceme before the entry probe of the tail-called function fires. The kernel cannot be dynamically deoptimized on the fly, and DTrace does not misrepresent how code is structured. Therefore, you should be aware of when tail-call optimization might be used.
Tail-call optimization is likely to be used in source code similar to the following example:
return (bar());
Or in source code similar to the following example:
(void) bar();
return;
Conversely, function source code that ends like the following example cannot have its call to bar optimized, because the call to bar is not a tail-call:
bar();
return (rval);
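The three source shapes above can be placed side by side in one compilable file. This is only a minimal sketch: bar(), bar_calls, foo_tail(), foo_void_tail(), and foo_not_tail() are hypothetical names chosen for illustration, and whether a given compiler actually emits a tail call for the first two functions depends on the optimization level and the target architecture.

```c
/* Hypothetical callee; it counts its invocations so the calls are observable. */
int bar_calls;

int
bar(void)
{
	return (++bar_calls);
}

/*
 * Candidate for tail-call optimization: the call to bar() is the last
 * action, and its result is returned unchanged.
 */
int
foo_tail(void)
{
	return (bar());
}

/*
 * Also a candidate: the result is discarded and nothing follows the call.
 */
void
foo_void_tail(void)
{
	(void) bar();
	return;
}

/*
 * Not a tail call: after bar() returns, this function still has to
 * produce its own return value, so its frame must stay live.
 */
int
foo_not_tail(void)
{
	int rval = 42;	/* arbitrary value, for illustration only */

	bar();
	return (rval);
}
```

Compiling a file like this with and without optimization and comparing the disassembly of the three functions is one way to see which calls the compiler turned into tail calls.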
You can determine whether a call has been tail-call optimized using the following technique:
While running DTrace, trace arg0 of the return probe in question. arg0 contains the offset of the returning instruction in the function.
After DTrace has stopped, use mdb(1) to look at the function. If the traced offset contains a call to another function instead of an instruction to return from the function, the call has been tail-call optimized.
Due to the instruction set architecture, tail-call optimization is far more common on SPARC systems than on x86 systems. The following example uses mdb to discover tail-call optimization in the kernel's dup function:
# dtrace -q -n fbt::dup:return'{printf("%s+0x%x", probefunc, arg0);}'
While this command is running, run a program that performs a dup(2), such as a bash process. The command should produce output similar to the following example:
dup+0x10
^C
Now examine the function with mdb:
# echo "dup::dis" | mdb -k
dup: sra %o0, 0, %o0
dup+4: mov %o7, %g1
dup+8: clr %o2
dup+0xc: clr %o1
dup+0x10: call -0x1278 <fcntl>
dup+0x14: mov %g1, %o7
The output shows that dup+0x10 is a call to the fcntl function and not a ret instruction. Therefore, the call to fcntl is an example of tail-call optimization.
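The manual comparison above could also be automated. The following is a rough sketch rather than part of DTrace or mdb: offset_is_call() is a hypothetical helper that assumes SPARC disassembly in the "label: mnemonic operands" layout shown above, and it recognizes only the call mnemonic, so a tail call emitted as some other branch instruction would go unnoticed.

```c
#include <string.h>

/*
 * Hypothetical helper: scan mdb-style disassembly text for the line
 * labeled `label` (for example "dup+0x10").
 * Returns 1 if that line holds a call instruction, 0 if it holds any
 * other instruction, and -1 if no line carries the label.
 */
int
offset_is_call(const char *dis, const char *label)
{
	size_t lablen = strlen(label);
	const char *line = dis;

	while (line != NULL && *line != '\0') {
		const char *p = line;

		/* Skip leading whitespace before the label. */
		while (*p == ' ' || *p == '\t')
			p++;

		/* The label must match exactly and be followed by ':'. */
		if (strncmp(p, label, lablen) == 0 && p[lablen] == ':') {
			p += lablen + 1;
			while (*p == ' ' || *p == '\t')
				p++;
			/* Assumption: only the "call" mnemonic is checked. */
			if (strncmp(p, "call", 4) == 0 &&
			    (p[4] == ' ' || p[4] == '\t'))
				return (1);
			return (0);
		}

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return (-1);
}
```

Given the dup disassembly above as input text and the traced label dup+0x10, this helper would report a call instruction, which matches the conclusion reached by reading the mdb output by hand.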