Kernel Programming Guide: Code Profiling

Code profiling means determining how often certain pieces of code are executed. By knowing how frequently a piece of code is used, you can more accurately gauge the importance of optimizing that piece of code. There are a number of good tools for profiling user space applications. However, code profiling in the kernel is a very different beast, since it isn’t reasonable to attach to the kernel as you would to a running process. (It is possible by using a second computer, but even then, it is not a trivial task.)

This section describes two useful ways of profiling your kernel code: counters and lock profiling. Any changes you make to allow code profiling should be done only during development. These are not the sort of changes that you want to release to end users.

Using Counters for Code Profiling

The first method of code profiling is with counters. To profile a section of code with a counter, you must first create a global variable whose name describes that piece of code and initialize it to zero. You then add something like

#ifdef PROFILING
            foo_counter++;
#endif

in the appropriate piece of code. If the declaration of the counter is wrapped in the same #ifdef, then defining PROFILING creates the counter, initializes it to zero, and increments it each time the code in question is executed.
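The pieces described above might fit together as in the following minimal sketch. It is ordinary C rather than kernel code, and foo() is a hypothetical stand-in for the code being profiled. PROFILING is defined in the source here only for illustration; in practice you would more likely set it from your build settings.

```c
/* Assumption: normally supplied by the build (for example -DPROFILING). */
#define PROFILING 1

#ifdef PROFILING
/* Global counter named after the code it measures; starts at zero. */
static unsigned long foo_counter = 0;
#endif

/* Hypothetical function standing in for the code being profiled. */
static int foo(int x)
{
#ifdef PROFILING
    foo_counter++;      /* one increment per execution of this path */
#endif
    return x * 2;
}
```

Note that a plain increment like this is not atomic. If the profiled code can run on several processors at once, some increments may be lost, so treat the counts as approximate unless you use an atomic operation.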

One small snag with this sort of profiling is the problem of obtaining the data. This can be done in several ways. The simplest is probably to install a sysctl, using the address of foo_counter as an argument. Then, you could simply issue the sysctl command from the command line and read or clear the variable. Adding a sysctl is described in more detail in “BSD sysctl API”.

In addition to using sysctl, you could also obtain the data by printing its value when unloading the module (in the case of a KEXT) or by using a remote debugger to attach to the kernel and directly inspecting the variable. However, a sysctl provides the most flexibility. With a sysctl, you can sample the value at any time, not just when the module is unloaded. The ability to arbitrarily sample the value makes it easier to determine the importance of a piece of code to one particular action.
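To illustrate why arbitrary sampling is useful, the following sketch reads the counter before and after one action and reports the difference. It is a user space simulation only: read_foo_counter() and do_action() are hypothetical names, and in a real setup the read would be a sysctl query rather than a direct variable access.

```c
/* Assumption: stands in for the kernel counter exposed through a sysctl. */
static unsigned long foo_counter = 0;

/* Hypothetical reader; in practice this would be a sysctl query. */
static unsigned long read_foo_counter(void)
{
    return foo_counter;
}

/* Hypothetical action under study; runs the profiled code n times. */
static void do_action(int n)
{
    int i;

    for (i = 0; i < n; i++) {
        foo_counter++;
    }
}

/* Returns how many times the profiled code ran during one action. */
static unsigned long count_during_action(int n)
{
    unsigned long before = read_foo_counter();

    do_action(n);
    return read_foo_counter() - before;
}
```

Because only the difference matters, the counter does not need to be cleared between measurements.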

If you are developing code for use in the I/O Kit, you should probably use your driver’s setProperties call instead of a sysctl.

Lock Profiling

Lock profiling is another useful way to find the cause of code inefficiency. It can tell you how many times a lock was taken, how long the lock was held on average, and how often the lock was unavailable when requested.

Put another way, it lets you measure contention on a lock, which in turn helps you restructure code to minimize that contention.

There are many different ways to do lock profiling. The most common way is to create your own lock calls that increment a counter and then call the real locking functions. When you move from debugging into a testing cycle before release, you can then replace the functions with defines to cause the actual functions to be called directly. For example, you might write something like this:

extern struct timeval time;

/* Try to take the lock without blocking; count each failed attempt. */
boolean_t   mymutex_try(mymutex_t *lock) {
    int ret;

    ret=mutex_try(lock->mutex);
    if (!ret) {
        lock->tryfailcount++;
    }
    return ret;
}

/* Take the lock, blocking only if the initial try fails. */
void    mymutex_lock(mymutex_t *lock) {
    if (!(mymutex_try(lock))) {
        mutex_lock(lock->mutex);
    }
    lock->starttime = time.tv_sec;
}

/* Record how long the lock was held, then release it. */
void    mymutex_unlock(mymutex_t *lock) {
    lock->lockheldtime += (time.tv_sec - lock->starttime);
    lock->heldcount++;
    mutex_unlock(lock->mutex);
}

This routine is accurate only to the nearest second, which is too coarse for most locks. Ideally, you should keep track of both time.tv_sec and time.tv_usec and roll the microseconds into seconds as the microsecond count gets large.
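One way to track both fields is sketched below. The helper accumulate_held_time() is a hypothetical name, not part of any kernel interface. It adds one hold interval to a running total and carries microseconds into seconds so that tv_usec always stays below one million. It assumes the end time is not earlier than the start time.

```c
#include <sys/time.h>

/*
 * Add (end - start) to *total, keeping 0 <= total->tv_usec < 1000000.
 * Assumes *total is already normalized and end >= start.
 */
static void accumulate_held_time(struct timeval *total,
                                 const struct timeval *start,
                                 const struct timeval *end)
{
    long sec  = (long)(end->tv_sec - start->tv_sec);
    long usec = (long)(end->tv_usec - start->tv_usec);

    if (usec < 0) {                     /* borrow one second */
        sec--;
        usec += 1000000L;
    }

    total->tv_sec  += sec;
    total->tv_usec += usec;

    if (total->tv_usec >= 1000000L) {   /* roll microseconds into seconds */
        total->tv_sec++;
        total->tv_usec -= 1000000L;
    }
}
```

In the unlock wrapper, you would call a helper like this with the saved start time and the current time instead of subtracting tv_sec values directly.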

From this information, you can obtain the average time the lock was held by dividing the total time held by the number of times it was held. Comparing the try counter against heldcount also tells you how often the lock was taken immediately instead of after waiting, which is a valuable piece of data when analyzing contention.
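The arithmetic can be sketched as follows. The struct lock_profile type and both helper functions are hypothetical; the field names simply mirror the counters used in the wrapper functions. The sketch assumes that tryfailcount counts attempts that could not take the lock immediately and that every acquisition goes through mymutex_lock.

```c
/* Hypothetical snapshot of the counters kept by the lock wrappers. */
struct lock_profile {
    unsigned long heldcount;     /* completed lock/unlock cycles */
    unsigned long tryfailcount;  /* assumption: acquisitions that had to wait */
    unsigned long lockheldtime;  /* total time held, in seconds */
};

/* Average hold time in seconds; 0 if the lock was never held. */
static double average_held_time(const struct lock_profile *p)
{
    if (p->heldcount == 0) {
        return 0.0;
    }
    return (double)p->lockheldtime / (double)p->heldcount;
}

/* Fraction of acquisitions that had to wait; 0 if never held. */
static double contention_ratio(const struct lock_profile *p)
{
    if (p->heldcount == 0) {
        return 0.0;
    }
    return (double)p->tryfailcount / (double)p->heldcount;
}
```

A ratio near zero suggests the lock is rarely contended, while a ratio near one suggests most callers wait and the code may benefit from restructuring.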

As with counter-based profiling, after you have written code to record lock use and contention, you must find a way to obtain that information. A sysctl is a good way of doing this, since it is relatively easy to implement and can provide a “snapshot” view of the data structure at any point in time. For more information on adding a sysctl, see “BSD sysctl API”.

Another way to do lock profiling is to use the built-in ETAP (Event Trace Analysis Package). This package consists of additional code designed for lock profiling. However, since this requires a kernel recompile, it is generally not recommended.



