Chapter 4. Using DTrace

Table of Contents

Performance Monitoring
Examining Performance Problems With The sysinfo Provider
Tracing User Processes
Using the copyin and copyinstr Subroutines
Eliminating dtrace Interference
syscall Provider
The ustack Action
The pid Provider
Anonymous Tracing
Anonymous Enablings
Claiming Anonymous State
Anonymous Tracing Examples
Speculative Tracing
Speculation Interfaces
Creating a Speculation
Using a Speculation
Committing a Speculation
Discarding a Speculation
Speculation Example

Performance Monitoring

Several DTrace providers implement probes that correspond to existing performance monitoring tools:

  • The vminfo provider implements probes that correspond to the vmstat ( 1M ) tool

  • The sysinfo provider implements probes that correspond to the mpstat ( 1M ) tool

  • The io provider implements probes that correspond to the iostat ( 1M ) tool

  • The syscall provider implements probes that correspond to the truss ( 1 ) tool

You can use the DTrace facility to extract the same information that the bundled tools provide, but with greater flexibility. The DTrace facility provides arbitrary kernel information that is available at the time that the probes fire. The DTrace facility enables you to receive information such as process identification, thread identification, and stack traces.

Examining Performance Problems With The sysinfo Provider

The sysinfo provider makes available probes that correspond to the sys kernel statistics. These statistics provide the input for system monitoring utilities such as mpstat. The sysinfo provider probes fire immediately before the sys named kstat increments. The probes that are provided by the sysinfo provider are in the following list.

bawrite

Probe that fires whenever a buffer is about to be asynchronously written out to a device.

bread

Probe that fires whenever a buffer is physically read from a device. bread fires after the buffer has been requested from the device, but before blocking pending its completion.

bwrite

Probe that fires whenever a buffer is about to be written out to a device, whether synchronously or asynchronously.

cpu_ticks_idle

Probe that fires when the periodic system clock has made the determination that a CPU is idle. Note that this probe fires in the context of the system clock and therefore fires on the CPU running the system clock. The cpu_t argument (arg2) indicates the CPU that has been deemed idle.

cpu_ticks_kernel

Probe that fires when the periodic system clock has made the determination that a CPU is executing in the kernel. This probe fires in the context of the system clock and therefore fires on the CPU running the system clock. The cpu_t argument (arg2) indicates the CPU that has been deemed to be executing in the kernel.

cpu_ticks_user

Probe that fires when the periodic system clock has made the determination that a CPU is executing in user mode. This probe fires in the context of the system clock and therefore fires on the CPU running the system clock. The cpu_t argument (arg2) indicates the CPU that has been deemed to be running in user-mode.

cpu_ticks_wait

Probe that fires when the periodic system clock has made the determination that a CPU is otherwise idle, but some threads are waiting for I/O on the CPU. This probe fires in the context of the system clock and therefore fires on the CPU running the system clock. The cpu_t argument (arg2) indicates the CPU that has been deemed waiting on I/O.

idlethread

Probe that fires whenever a CPU enters the idle loop.

intrblk

Probe that fires whenever an interrupt thread blocks.

inv_swtch

Probe that fires whenever a running thread is forced to involuntarily give up the CPU.

lread

Probe that fires whenever a buffer is logically read from a device.

lwrite

Probe that fires whenever a buffer is logically written to a device

modload

Probe that fires whenever a kernel module is loaded.

modunload

Probe that fires whenever a kernel module is unloaded.

msg

Probe that fires whenever a msgsnd ( 2 ) or msgrcv ( 2 ) system call is made, but before the message queue operations have been performed.

mutex_adenters

Probe that fires whenever an attempt is made to acquire an owned adaptive lock. If this probe fires, one of the lockstat provider's adaptive-block or adaptive-spin probes also fires.

namei

Probe that fires whenever a name lookup is attempted in the filesystem.

nthreads

Probe that fires whenever a thread is created.

phread

Probe that fires whenever a raw I/O read is about to be performed.

phwrite

Probe that fires whenever a raw I/O write is about to be performed.

procovf

Probe that fires whenever a new process cannot be created because the system is out of process table entries.

pswitch

Probe that fires whenever a CPU switches from executing one thread to executing another.

readch

Probe that fires after each successful read, but before control is returned to the thread that is performing the read. A read can occur through the read ( 2 ) , readv ( 2 ) or pread ( 2 ) system calls. arg0 contains the number of bytes that were successfully read.

rw_rdfails

Probe that fires whenever an attempt is made to read-lock a reader or writer when the lock is held by a writer or desired by a writer. If this probe fires, the lockstat provider's rw-block probe also fires.

rw_wrfails

Probe that fires whenever an attempt is made to write-lock a reader or writer lock when the lock is held by readers or by another writer. If this probe fires, the lockstat provider's rw-block probe also fires.

sema

Probe that fires whenever a semop ( 2 ) system call is made, but before any semaphore operations have been performed.

sysexec

Probe that fires whenever an exec ( 2 ) system call is made.

sysfork

Probe that fires whenever a fork ( 2 ) system call is made.

sysread

Probe that fires whenever a read, readv, or pread system call is made.

sysvfork

Probe that fires whenever a vfork ( 2 ) system call is made.

syswrite

Probe that fires whenever a write ( 2 ) , writev ( 2 ) , or pwrite ( 2 ) system call is made.

trap

Probe that fires whenever a processor trap occurs. Note that some processors, in particular UltraSPARC variants, handle some lightweight traps through a mechanism that does not cause this probe to fire.

ufsdirblk

Probe that fires whenever a directory block is read from the UFS file system. See ufs ( 7FS ) for details on UFS.

ufsiget

Probe that fires whenever an inode is retrieved. See ufs ( 7FS ) for details on UFS.

ufsinopage

Probe that fires after an in-core inode without any associated data pages has been made available for reuse. See ufs ( 7FS ) for details on UFS.

ufsipage

Probe that fires after an in-core inode with associated data pages has been made available for reuse. This probe fires after the associated data pages have been flushed to disk. See ufs ( 7FS ) for details on UFS.

wait_ticks_io

Probe that fires when the periodic system clock has made the determination that a CPU is otherwise idle but some threads are waiting for I/O on the CPU. This probe fires in the context of the system clock and therefore fires on the CPU running the system clock. The cpu_t argument (arg2) indicates the CPU that is described as waiting for I/O. No semantic difference between wait_ticks_io and cpu_ticks_wait; wait_ticks_io exists solely for historical reasons.

writech

Probe that fires after each successful write, but before control is returned to the thread performing the write. A write can occur through the write, writev, or pwrite system calls. arg0 contains the number of bytes that were successfully written.

xcalls

Probe that fires whenever a cross-call is about to be made. A cross-call is the operating system's mechanism for one CPU to request immediate work of another CPU.

Example 4.1. Using the quantize Aggregation Function With the sysinfo Probes

The quantize aggregation function displays a power-of-two frequency distribution bar graph of its argument. The following example uses the quantize function to determine the size of the read calls that are performed by all processes on the system over a period of ten seconds. The arg0 argument for the sysinfo probes states the amount by which to increment the statistic. This value is 1 for most sysinfo probes. Two exceptions are the readch and writech probes. For these probes, the arg0 argument is set to the actual number of bytes that are read or are written, respectively.

# cat -n read.d
   1  #!/usr/sbin/dtrace -s
   2  sysinfo:::readch
   3  {
   4     @[execname] = quantize(arg0);
   5  }
   6
   7  tick-10sec
   8  {
   9     exit(0);
  10  }

# dtrace -s read.d
dtrace: script 'read.d' matched 5 probes
CPU        ID                    FUNCTION:NAME
  0     36754                      :tick-10sec

  bash
         value  ---------- Distribution ---------- count
             0 |                                   0
             1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 13
             2 |                                   0

  file
         value  ---------- Distribution ---------- count
            -1 |                                   0
             0 |                                   2
             1 |                                   0
             2 |                                   0
             4 |                                   6
             8 |                                   0
            16 |                                   0
            32 |                                   6
            64 |                                   6
           128 |@@                                 16
           256 |@@@@                               30
           512 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      199
          1024 |                                   0
          2048 |                                   0
          4096 |                                   1
          8192 |                                   1
         16384 |                                   0

  grep
         value  ---------- Distribution ---------- count
            -1 |                                   0
             0 |@@@@@@@@@@@@@@@@@@@                99
             1 |                                   0
             2 |                                   0
             4 |                                   0
             8 |                                   0
            16 |                                   0
            32 |                                   0
            64 |                                   0
           128 |                                   1
           256 |@@@@                               25
           512 |@@@@                               23
          1024 |@@@@                               24
          2048 |@@@@                               22
          4096 |                                   4
          8192 |                                   3
         16384 |                                   0

Example 4.2. Finding the Source of Cross-Calls

In this example, consider the following output form the mpstat ( 1M ) command:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0 2189   0 1302    14    1  215   12   54   28    0 12995   13  14   0  73
  1 3385   0 1137   218  104  195   13   58   33    0 14486   19  15   0  66
  2 1918   0 1039    12    1  226   15   49   22    0 13251   13  12   0  75
  3 2430   0 1284   220  113  201   10   50   26    0 13926   10  15   0  75

The values in the xcal and syscl columns are atypically high, reflecting a possible drain on system performance. The system is relatively idle and is not spending an unusual amount of time waiting for I/O. The numbers in the xcal column are scaled per second and are read from the xcalls field of the sys kstat. To see which executables are responsible for the cross-calls, enter the following dtrace command:

# dtrace -n 'xcalls {@[execname] = count()}'
dtrace: description 'xcalls ' matched 3 probes
^C
  find                                                   2
  cut                                                    2
  snmpd                                                  2
  mpstat                                                22
  sendmail                                             101
  grep                                                 123
  bash                                                 175
  dtrace                                               435
  sched                                                784
  xargs                                              22308
  file                                               89889
#

This output indicates that the bulk of the cross calls are originating from file ( 1 ) and xargs ( 1 ) processes. You can find these processes with the pgrep ( 1 ) and ptree ( 1 ) commands.

# pgrep xargs
15973
# ptree 15973
204   /usr/sbin/inetd -s
  5650  in.telnetd
    5653  -sh
      5657  bash
        15970 /bin/sh ./findtxt configuration
          15971 cut -f1 -d:
            15973 xargs file
              16686 file /usr/bin/tbl /usr/bin/troff /usr/bin/ul /usr/bin/vgrind /usr/bin/catman

This output indicates that the xargs and file commands form part of a custom user shell script. To locate this script, you can perform the following commands:

# find / -name findtxt
/usrs1/james/findtxt
# cat /usrs1/james/findtxt
#!/bin/sh
find / -type f | xargs file | grep text | cut -f1 -d: > /tmp/findtxt$$
cat /tmp/findtxt$$ | xargs grep $1
rm /tmp/findtxt$$
#

This script runs many process concurrently. A large amount of interprocess communication is happening through pipes. The number of pipes makes the script resource intensive. The script attempts to find every text file on the system and then searches each file for a specific text.