Data Normalization

When aggregating data over some period of time, you might want to normalize the data with respect to some constant factor. This technique enables you to compare disjoint data more easily. For example, when aggregating system calls, you might want to output system calls as a per-second rate instead of as an absolute value over the course of the run. The DTrace normalize action enables you to normalize data in this way. The parameters to normalize are an aggregation and a normalization factor. The output of the aggregation shows each value divided by the normalization factor.

The following example shows how to aggregate data by system call:

#pragma D option quiet

BEGIN
{
	/*
	 * Get the start time, in nanoseconds.
	 */
	start = timestamp;
}

syscall:::entry
{
	@func[execname] = count();
}

END
{
	/*
	 * Normalize the aggregation based on the number of seconds we have
	 * been running.  (There are 1,000,000,000 nanoseconds in one second.)
	 */	
	normalize(@func, (timestamp - start) / 1000000000);
}

Running the above script for a brief period of time results in the following output on a desktop machine:

# dtrace -s ./normalize.d
 ^C
  syslogd                                                           0
  rpc.rusersd                                                       0
  utmpd                                                             0
  xbiff                                                             0
  in.routed                                                         1
  sendmail                                                          2
  echo                                                              2
  FvwmAuto                                                          2
  stty                                                              2
  cut                                                               2
  init                                                              2
  pt_chmod                                                          3
  picld                                                             3
  utmp_update                                                       3
  httpd                                                             4
  xclock                                                            5
  basename                                                          6
  tput                                                              6
  sh                                                                7
  tr                                                                7
  arch                                                              9
  expr                                                             10
  uname                                                            11
  mibiisa                                                          15
  dirname                                                          18
  dtrace                                                           40
  ksh                                                              48
  java                                                             58
  xterm                                                           100
  nscd                                                            120
  fvwm2                                                           154
  prstat                                                          180
  perfbar                                                         188
  Xsun                                                           1309
  .netscape.bin                                                  3005

normalize sets the normalization factor for the specified aggregation, but this action does not modify the underlying data. This behavior the data to be denormalized with the denormalize function. denormalize takes only an aggregation. Adding the denormalize action to the preceding example returns both raw system call counts and per-second rates:

#pragma D option quiet

BEGIN
{
	start = timestamp;
}

syscall:::entry
{
	@func[execname] = count();
}

END
{
	this->seconds = (timestamp - start) / 1000000000;
	printf("Ran for %d seconds.\n", this->seconds);

	printf("Per-second rate:\n");
	normalize(@func, this->seconds);
	printa(@func);

	printf("\nRaw counts:\n");
	denormalize(@func);
	printa(@func);
}

Running the above script for a brief period of time produces output similar to the following example:

# dtrace -s ./denorm.d
^C
Ran for 14 seconds.
Per-second rate:

  syslogd                                                           0
  in.routed                                                         0
  xbiff                                                             1
  sendmail                                                          2
  elm                                                               2
  picld                                                             3
  httpd                                                             4
  xclock                                                            6
  FvwmAuto                                                          7
  mibiisa                                                          22
  dtrace                                                           42
  java                                                             55
  xterm                                                            75
  adeptedit                                                       118
  nscd                                                            127
  prstat                                                          179
  perfbar                                                         184
  fvwm2                                                           296
  Xsun                                                            829

Raw counts:

  syslogd                                                           1
  in.routed                                                         4
  xbiff                                                            21
  sendmail                                                         30
  elm                                                              36
  picld                                                            43
  httpd                                                            56
  xclock                                                           91
  FvwmAuto                                                        104
  mibiisa                                                         314
  dtrace                                                          592
  java                                                            774
  xterm                                                          1062
  adeptedit                                                      1665
  nscd                                                           1781
  prstat                                                         2506
  perfbar                                                        2581
  fvwm2                                                          4156
  Xsun                                                          11616

Aggregations can also be renormalized. If normalize is called more than once for the same aggregation, the normalization factor will be the factor specified in the most recent call. The following example prints per-second rates over time:

Example 9.1.  renormalize.d: Renormalizing an Aggregation

#pragma D option quiet

BEGIN
{
	start = timestamp;
}

syscall:::entry
{
	@func[execname] = count();
}

tick-10sec
{
	normalize(@func, (timestamp - start) / 1000000000);
	printa(@func);
}