Advanced Search
Apple Developer Connection
Member Login Log In | Not a Member? Contact ADC

< Previous PageNext Page >

BSD sysctl API

The system control (sysctl) API is specifically designed for kernel parameter tuning. This functionality supersedes the syscall API, and also provides an easy way to tune simple kernel parameters without actually needing to write a handler routine in the kernel. The sysctl namespace is divided into several broad categories corresponding to the purpose of the parameters in it. Some of these areas include

Most of the time, programs use the sysctl call to retrieve the current value of a kernel parameter. For example, in Mac OS X, the hw sysctl group includes the option ncpu, which returns the number of processors in the current computer (or the maximum number of processors supported by the kernel on that particular computer, whichever is less).

The sysctl API can also be used to modify parameters (though most parameters can only be changed by the root). For example, in the net hierarchy, net.inet.ip.forwarding can be set to 1 or 0, to indicate whether the computer should forward packets between multiple interfaces (basic routing).

In this section:

General Information on Adding a sysctl
Adding a sysctl Procedure Call
Registering a New Top Level sysctl
Adding a Simple sysctl
Calling a sysctl From User Space


General Information on Adding a sysctl

When adding a sysctl, you must do all of the following first:

Adding a sysctl Procedure Call

Adding a system control (sysctl) was once a daunting task requiring changes to dozens of files. With the current implementation, a system control can be added simply by writing the appropriate handler functions and then registering the handler with the system at runtime. The old-style sysctl, which used fixed numbers for each control, is deprecated.

Note: Because this is largely a construct of the BSD subsystem, all path names in this section can be assumed to be from /path/to/xnu-version/bsd/.

Also, you may safely assume that all program code snippets should go into the main source file for your subsystem or module unless otherwise noted, and that in the case of modules, function calls should be made from your start or stop routines unless otherwise noted.

The preferred way of adding a sysctl looks something like the following:

SYSCTL_PROC(_hw, OID_AUTO, l2cr, CTLTYPE_INT|CTLFLAG_RW, 
    &L2CR, 0, &sysctl_l2cr, "I", "L2 Cache Register");

The _PROC part indicates that you are registering a procedure to provide the value (as opposed to simply reading from a static address in kernel memory). _hw is the top level category (in this case, hardware), and OID_AUTO indicates that you should be assigned the next available control ID in that category (as opposed to the old-style, fixed ID controls). l2cr is the name of your control, which will be used by applications to look up the number of your control using sysctlbyname.

Note: Not all top level categories will necessarily accept the addition of a user-specified new-style sysctl. If you run into problems, you should try a different top-level category.

CTLTYPE_INT indicates that the value being changed is an integer. Other legal values are CTLTYPE_NODE, CTLTYPE_STRING, CTLTYPE_QUAD, and CTLTYPE_OPAQUE (also known as CTLTYPE_STRUCT). CTLTYPE_NODE is the only one that isn’t somewhat obvious. It refers to a node in the sysctl hierarchy that isn’t directly usable, but instead is a parent to other entries. Two examples of nodes are hw and kern.

CTLFLAG_RW indicates that the value can be read and written. Other legal values are CTLFLAG_RD, CTLFLAG_WR, CTLFLAG_ANYBODY, and CTLFLAG_SECURE. CTLFLAG_ANYBODY means that the value should be modifiable by anybody. (The default is for variables to be changeable only by root.) CTLFLAG_SECURE means that the variable can be changed only when running at securelevel <= 0 (effectively, in single-user mode).

L2CR is the location where the sysctl will store its data. Since the address is set at compile time, however, this must be a global variable or a static local variable. In this case, L2CR is a global of type unsigned int.

The number 0 is a second argument that is passed to your function. This can be used, for example, to identify which sysctl was used to call your handler function if the same handler function is used for more than one control. In the case of strings, this is used to store the maximum allowable length for incoming values.

sysctl_l2cr is the handler function for this sysctl. The prototype for these functions is of the form

static int sysctl_l2cr SYSCTL_HANDLER_ARGS;

If the sysctl is writable, the function may either use sysctl_handle_int to obtain the value passed in from user space and store it in the default location or use the SYSCTL_IN macro to store it into an alternate buffer. This function must also use the SYSCTL_OUT macro to return a value to user space.

“I” indicates that the argument should refer to a variable of type integer (or a constant, pointer, or other piece of data of equivalent width), as opposed to “L” for a long, “A” for a string, “N” for a node (a sysctl that is the parent of a sysctl category or subcategory), or “S” for a struct. “L2 Cache Register” is a human-readable description of your sysctl.

In order for a control to be accessible from an application, it must be registered. To do this, you do the following:

sysctl_register_oid(&sysctl__hw_l2cr);

You should generally do this in an init routine for a loadable module. If your code is not part of a loadable module, you should add your sysctl to the list of built-in OIDs in the file kern/sysctl_init.c.

If you study the SYSCTL_PROC constructor macro, you will notice that sysctl__hw_l2cr is the name of a variable created by that macro. This means that the SYSCTL_PROC line must be before sysctl_register_oid in the file, and must be in the same (or broader) scope. This name is in the form of sysctl_ followed by the name of it’s parent node, followed by another underscore ( _ ) followed by the name of your sysctl.

A similar function, sysctl_unregister_oid exists to remove a sysctl from the registry. If you are writing a loadable module, you should be certain to do this when your module is unloaded.

In addition to registering your handler function, you also have to write the function. The following is a typical example

static int myhandler SYSCTL_HANDLER_ARGS
{
 
    int error, retval;
 
    error = sysctl_handle_int(oidp, oidp->oid_arg1, oidp->oid_arg2,  req);
    if (!error && req->newptr) {
        /* We have a new value stored in the standard location.*/
        /* Do with it as you see fit here. */
        printf("sysctl_test: stored %d\n", SCTEST);
    } else if (req->newptr) {
        /* Something was wrong with the write request */
        /* Do something here if you feel like it.... */
    } else {
        /* Read request. Always return 763, just for grins. */
        printf("sysctl_test: read %d\n", SCTEST);
        retval=763;
        error=SYSCTL_OUT(req, &retval, sizeof retval);
    }
    /* In any case, return success or return the reason for failure  */
    return error;
}

This demonstrates the use of SYSCTL_OUT to send an arbitrary value out to user space from the sysctl handler. The “phantom” req argument is part of the function prototype when the SYSCTL_HANDLER_ARGS macro is expanded, as is the oidp variable used elsewhere. The remaining arguments are a pointer (type indifferent) and the length of data to copy (in bytes).

This code sample also introduces a new function, sysctl_handle_int, which takes the arguments passed to the sysctl, and writes the integer into the usual storage area (L2CR in the earlier example, SCTEST in this one). If you want to see the new value without storing it (to do a sanity check, for example), you should instead use the SYSCTL_IN macro, whose arguments are the same as SYSCTL_OUT.

Registering a New Top Level sysctl

In addition to adding new sysctl options, you can also add a new category or subcategory. The macro SYSCTL_DECL can be used to declare a node that can have children. This requires modifying one additional file to create the child list. For example, if your main C file does this:

SYSCTL_DECL(_net_newcat);
SYSCTL_NODE(_net, OID_AUTO, newcat, CTLFLAG_RW, handler, "new  category");

then this is basically the same thing as declaring extern sysctl_oid_list sysctl__net_newcat_children in your program. In order for the kernel to compile, or the module to link, you must then add this line:

struct sysctl_oid_list sysctl__net_newcat_children;

If you are not writing a module, this should go in the file kern/kern_newsysctl.c. Otherwise, it should go in one of the files of your module. Once you have created this variable, you can use _net_newcat as the parent when creating a new control. As with any sysctl, the node (sysctl__net_newcat) must be registered with sysctl_register_oid and can be unregistered with sysctl_unregister_oid.

Note: When creating a top level sysctl, parent is simply left blank, for example,

SYSCTL_NODE( , OID_AUTO, _topname, flags, handler_fn, “desc”);

Adding a Simple sysctl

If your sysctl only needs to read a value out of a variable, then you do not need to write a function to provide access to that variable. Instead, you can use one of the following macros:

The first four parameters for each macro are the same as for SYSCTL_PROC (described in the previous section) as is the last parameter. The len parameter (where applicable) gives a length of the string or opaque object in bytes.

The arg parameters are pointers just like the ptr parameters. However, the parameters named ptr are explicitly described as pointers because you must explicitly use the “address of” (&) operator unless you are already working with a pointer. Parameters called arg either operate on base types that are implicitly pointers or add the & operator in the appropriate place during macro expansion. In both cases, the argument should refer to the integer, character, or other object that the sysctl will use to store the current value.

The type parameter is the name of the type minus the “struct”. For example, if you have an object of type struct scsipi, then you would use scsipi as that argument. The SYSCTL_STRUCT macro is functionally equivalent to SYSCTL_OPAQUE, except that it hides the use of sizeof.

Finally, the val parameter for SYSCTL_INT is a default value. If the value passed in ptr is NULL, this value is returned when the sysctl is used. You can use this, for example, when adding a sysctl that is specific to certain hardware or certain compile options. One possible example of this might be a special value for feature.version that means “not present.” If that feature became available (for example, if a module were loaded by some user action), it could then update that pointer. If that module were subsequently unloaded, it could set the pointer back to NULL.

Calling a sysctl From User Space

Unlike RPC, sysctl requires explicit intervention on the part of the programmer. To complicate things further, there are two different ways of calling sysctl functions, and neither one works for every control. The old-style sysctl call can only invoke a control if it is listed in a static OID table in the kernel. The new-style sysctlbyname call will work for any user-added sysctl, but not for those listed in the static table. Occasionally, you will even find a control that is registered in both ways, and thus available to both calls. In order to understand the distinction, you must first consider the functions used.

The sysctlbyname System Call

If you are calling a sysctl that was added using the new sysctl method (including any sysctl that you may have added), then your sysctl does not have a fixed number that identifies it, since it was added dynamically to the system. Since there is no approved way to get this number from user space, and since the underlying implementation is not guaranteed to remain the same in future releases, you cannot call a dynamically added control using the sysctl function. Instead, you must use sysctlbyname.

sysctlbyname(char *name, void *oldp, size_t *oldlenp,
        void *newp, u_int newlen)

The parameter name is the name of the sysctl, encoded as a standard C string.

The parameter oldp is a pointer to a buffer where the old value will be stored. The oldlenp parameter is a pointer to an integer-sized buffer that holds the current size of the oldp buffer. If the oldp buffer is not large enough to hold the returned data, the call will fail with errno set to ENOMEM, and the value pointed to by oldlenp will be changed to indicate the buffer size needed for a future call to succeed.

Here is an example for reading an integer, in this case a buffer size.

int get_debug_bufsize()
{
char *name="debug.bpf_bufsize";
int bufsize, retval;
size_t len;
len=4;
retval=sysctlbyname(name, &bufsize, &len, NULL, 0);
/* Check retval here */
return bufsize;
}

The sysctl System Call

The sysctlbyname system call is the recommended way to call system calls. However, not every built-in system control is registered in the kernel in such a way that it can be called with sysctlbyname. For this reason, you should also be aware of the sysctl system call.

Note: If you are adding a sysctl, it will be accessible using sysctlbyname. You should use this system call only if the sysctl you need cannot be retrieved using sysctlbyname. In particular, you should not assume that future versions of sysctl will be backed by traditional numeric OIDs except for the existing legacy OIDs, which will be retained for compatibility reasons.

The sysctl system call is part of the original historical BSD implementation of system controls. You should not depend on its use for any control that you might add to the system. The classic usage of sysctl looks like the following

sysctl(int *name, u_int namelen, void *oldp, size_t *oldlenp,
        void *newp, u_int newlen)

System controls, in this form, are based on the MIB, or Management Information Base architecture. A MIB is a list of objects and identifiers for those objects. Each object identifier, or OID, is a list of integers that represent a tokenization of a path through the sysctl tree. For example, if the hw class of sysctl is number 3, the first integer in the OID would be the number 3. If the l2cr option is built into the system and assigned the number 75, then the second integer in the OID would be 75. To put it another way, each number in the OID is an index into a node’s list of children.

Here is a short example of a call to get the bus speed of the current computer:

int get_bus_speed()
{
int mib[2], busspeed, retval;
unsigned int miblen;
size_t len;
mib[0]=CTL_HW;
mib[1]=HW_BUS_FREQ;
miblen=2;
len=4;
retval=sysctl(mib, miblen, &busspeed, &len, NULL, 0);
/* Check retval here */
return busspeed;
}

For more information on the sysctl system call, see the manual page sysctl(3).



< Previous PageNext Page >


Last updated: 2006-11-07




Did this document help you?
Yes: Tell us what works for you.

It’s good, but: Report typos, inaccuracies, and so forth.

It wasn’t helpful: Tell us what would have helped.
Get information on Apple products.
Visit the Apple Store online or at retail locations.
1-800-MY-APPLE

Copyright © 2007 Apple Inc.
All rights reserved. | Terms of use | Privacy Notice