Optimizer statistics can be reviewed and processed by several utilities.
In VDBA, you use the Display Statistics dialog to view and delete statistics that have already been collected. For more information, see the online help topic Viewing Database Statistics.
You can also accomplish this task using the statdump system command. For more information, see the Command Reference Guide.
The usual output is based on statistics generated by the optimization process, as described in Database Statistics.
By default, optimizer statistics are shown for all tables and columns in the current database, but you can view statistics for specific table columns.
In VDBA, you use the Specify Tables and Specify Columns check boxes in the Display Statistics dialog. For example, specify that statistics be displayed only for the empno column of the emp table.
You can delete statistics by enabling the Delete Statistics from Syscat check box.
Using this check box in conjunction with the Specify Tables and Specify Columns check boxes, you can specify the tables and columns for which to delete statistics.
For example, to delete statistics for the empno and sex columns of the emp table and the empno column of the task table, enable the Delete Statistics from Syscat check box and specify those tables and columns.
You can specify the precision with which floating point numbers are displayed in the statistics. To do so, enable the Set Precision Level check box and enter a value in the corresponding edit control; this value determines the number of decimal digits in the text format of the floating point numbers.
For example, assume a table, t_float, is defined with a column named c_float of type float, and that the following statements are used to insert values (all of which are approximately 1.0):
insert into t_float values (0.99999998);
insert into t_float values (0.99999999);
insert into t_float values (1.0);
insert into t_float values (1.00000001);
insert into t_float values (1.00000002);
You can create statistics for this table using the optimization procedure described in Database Statistics.
With the default floating point precision, the standard output shows seven places after the decimal point, which is not enough to distinguish these values. For greater precision, enable the Set Precision Level check box and enter a larger value.
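The effect of display precision on these five values can be sketched outside the database (a plain Python illustration, not an Ingres utility):

```python
# The five values inserted into t_float above, all approximately 1.0.
values = [0.99999998, 0.99999999, 1.0, 1.00000001, 1.00000002]

# At the default seven decimal places, every value prints identically,
# so the histogram cell values would be indistinguishable in the output.
seven = {"%.7f" % v for v in values}
print(seven)

# At fourteen decimal places, the five values remain visibly distinct.
fourteen = {"%.14f" % v for v in values}
print(sorted(fourteen))
```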
For example, specifying a precision level of 14 generates output similar to the following, in which there is sufficient precision to maintain a visible difference in the values:
*** statistics for database demodb version: 00850
*** table t_float rows:5 pages:3 overflow pages:0
*** column c_float of type float (length:8, scale:0, nullable)
date:2000_02_24 15:15:30 GMT unique values:5.000
repetition factor:1.000 unique flag:Y complete flag:0
domain:0 histogram cells:10 null count:0.00000000000000 value length:8
cell: 0 count:0.00000000000000 repf:0.00000000000000 value:0.99999997999999
cell: 1 count:0.20000000298023 repf:1.00000000000000 value:0.99999998000000
cell: 2 count:0.00000000000000 repf:0.00000000000000 value:0.99999998999999
cell: 3 count:0.20000000298023 repf:1.00000000000000 value:0.99999999000000
cell: 4 count:0.00000000000000 repf:0.00000000000000 value:0.99999999999999
cell: 5 count:0.20000000298023 repf:1.00000000000000 value:1.00000000000000
cell: 6 count:0.00000000000000 repf:0.00000000000000 value:1.00000000999999
cell: 7 count:0.20000000298023 repf:1.00000000000000 value:1.00000001000000
cell: 8 count:0.00000000000000 repf:0.00000000000000 value:1.00000001999999
cell: 9 count:0.20000000298023 repf:1.00000000000000 value:1.00000002000000
This can be useful when statistics are output to a text file or input from a text file. For more information, see Statistics in Text Files. When reading statistics from a text file, the optimization process assumes that all cell values are in ascending order. You can use the Set Precision Level option to preserve sufficient precision for floating point numbers.
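Because the optimization process assumes ascending cell values, a hand-edited statistics file is worth checking before it is read back in. A minimal sketch of such a check (the helper function here is hypothetical, not part of any Ingres tool):

```python
def cells_ascending(values):
    """Return True if histogram cell values are in strictly ascending order,
    as the optimization process expects when reading statistics from text."""
    return all(a < b for a, b in zip(values, values[1:]))

# The t_float cell values from the earlier output, in order.
ok = cells_ascending([0.99999998, 0.99999999, 1.0, 1.00000001, 1.00000002])
print(ok)

# A file with out-of-order or duplicate cell values would fail the check.
print(cells_ascending([1.0, 0.5]))
```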
The optimization process can directly read a set of optimizer statistics from an external text file, rapidly updating the statistics for specific database tables. This is useful, for example, in query modeling.
When statistics are loaded this way, the actual table data is ignored. This gives you a modeling capability: the table can be empty, yet the statistics indicate the presence of data and its distribution. The query optimizer uses these artificial statistics to determine a QEP. For more information, see Query Execution Plans. This gives the DBA the ability to verify the correctness of QEPs without having to load data into the tables.
The text file read by the optimization process can be created in one of two ways: by unloading existing statistics to a file, or by constructing or editing the file manually in the same format.
To unload optimizer statistics to a text file, you use the Direct Output to Server File option in the Display Statistics dialog in VDBA. The generated file is in the appropriate format to be used as input to the optimization process.
For example, to dump all statistics from the current database into the file stats.out, enable the Direct Output to Server File check box and enter stats.out in the corresponding edit control.
To unload statistics for selected tables or columns only, use the Direct Output to Server File option in conjunction with the Specify Tables and Specify Columns check boxes in the Display Statistics dialog in VDBA.
For example, if you want the stats.out file to contain statistics for the entire arel table and the col1 column in the brel table, enable the Specify Tables check box, and choose only the arel and brel tables from the Specify Tables dialog. Enable the Specify Columns check box and choose only the col1 column for brel from the Specify Columns dialog.
A sample output file generated using the Direct Output to Server File option of the Display Statistics dialog is shown below. This same text file can be used as input to the optimization process, as described in the next section, Loading Optimizer Statistics from a Text File:
*** statistics for database demodb version: 00850
*** table brel rows:151 pages:3 overflow pages:1
*** column col1 of type integer (length:4, scale:0, nullable)
date:2000_02_24 16:04:37 GMT unique values:132.000
repetition factor:1.144 unique flag:N complete flag:0
domain:0 histogram cells:16 null count:0.0000000 value length:4
cell: 0 count:0.0000000 repf:0.0000000 value: 0
cell: 1 count:0.0728477 repf:1.3750000 value: 23
cell: 2 count:0.0728477 repf:1.8333334 value: 31
cell: 3 count:0.0728477 repf:1.3750000 value: 59
cell: 4 count:0.0728477 repf:1.1000000 value: 138
cell: 5 count:0.0728477 repf:1.0000000 value: 151
cell: 6 count:0.0728477 repf:1.0000000 value: 162
cell: 7 count:0.0728477 repf:1.0000000 value: 173
cell: 8 count:0.0662252 repf:1.2500000 value: 181
cell: 9 count:0.0662252 repf:1.1111112 value: 193
cell: 10 count:0.0662252 repf:1.2500000 value: 202
cell: 11 count:0.0662252 repf:1.0000000 value: 214
cell: 12 count:0.0662252 repf:1.0000000 value: 224
cell: 13 count:0.0662252 repf:1.0000000 value: 236
cell: 14 count:0.0662252 repf:1.2500000 value: 256
cell: 15 count:0.0264901 repf:1.0000000 value: 261
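The cell counts in this output are fractions of the table's row count. A short sketch (inferring the per-cell row counts from the fractions shown above) verifies how the figures relate:

```python
rows = 151           # from the table header line
unique_values = 132  # from the "unique values" line

# Each cell's "count" is the fraction of the table's rows falling in
# that cell: cells 1-7 each appear to hold 11 rows, cells 8-14 hold
# 10 rows, and cell 15 holds 4 rows.
assert round(11 / rows, 7) == 0.0728477
assert round(10 / rows, 7) == 0.0662252
assert round(4 / rows, 7) == 0.0264901

# The non-empty cells together account for every row exactly once.
assert 7 * 11 + 7 * 10 + 4 == rows

# The repetition factor is rows per unique value.
assert round(rows / unique_values, 3) == 1.144
print("cell fractions consistent with", rows, "rows")
```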
To load optimizer statistics from a text file, you use the Read Statistics from Server File option in the Optimize Database dialog. For example, if the file arelbrel.dat contains statistics for the arel and brel tables, these are loaded into the database by enabling the Read Statistics from Server File check box and entering arelbrel.dat in the corresponding edit control.
If the input file contains statistics for multiple tables, you can load selected tables or columns by using the Read Statistics from Server File option, in conjunction with the Specify Tables and Specify Columns check boxes in the Optimize Database dialog.
For example, if the file arelbrel.dat contains statistics for the arel and brel tables, just the statistics for arel are loaded into the database by enabling the Specify Tables check box and choosing only the arel table from the Specify Tables dialog.
To load only statistics for column col3 of the arel table, enable the Specify Columns check box and choose only the col3 column from the Specify Columns dialog.
The input file for the optimization process contains information about the number of rows, as well as primary and overflow page counts in a table. However, because these values are critical to correct operation, these input values are normally disregarded when creating statistics, leaving the catalog values untouched.
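As a sketch of where those counts live in the file, the table header line shown in the sample output earlier can be picked apart with an ordinary regular expression (this parsing code is illustrative only, not part of any Ingres tool):

```python
import re

# A table header line in the format shown in the sample output above.
line = "*** table brel rows:151 pages:3 overflow pages:1"

m = re.match(
    r"\*\*\* table (\w+) rows:(\d+) pages:(\d+) overflow pages:(\d+)", line
)
table, rows, pages, overflow = m.group(1), *map(int, m.groups()[1:])
print(table, rows, pages, overflow)
```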
You can force the values in the input file to be used when loading the statistics by enabling the Read Row and Page check box in the Optimize Database dialog.
Important! This option must be used with extreme care, because it sets critical values.
This option can be useful for certain specialized processing, such as query modeling and performance problem debugging. Bear in mind that the row count value can be modified for the table and its indexes. However, the page count is modified for the table only; the index page count values remain unchanged.
You can copy a table and its associated optimizer statistics from one database to another using copydb and statistics that have been unloaded to a text file. This is usually much faster than copying only the table and rerunning the optimization process to recreate the statistics.
Note: Doing this makes sense only if the statistics are up-to-date.
First, unload the table with copydb, and unload its statistics to a text file using the Direct Output to Server File option in the Display Statistics dialog:
copydb olddb arel
sql olddb <copy.out
Next, copy the table back into the new database:
sql newdb <copy.in
Finally, load the statistics into the new database by enabling the Read Statistics from Server File check box in the Optimize Database dialog and entering the name of the statistics file.
The optimization process allows you to create sampled optimizer statistics on database tables. For large tables, sampled statistics are usually much faster to generate than full statistics, and if the percentage of rows to be sampled is chosen appropriately, they can be nearly as accurate.
Sampled statistics are generated by only looking at a certain percentage of the rows in a table. The percentage must be chosen so that all significant variations in the data distribution are likely to be sampled.
The sampled rows are selected by randomly generating values for the tuple identifier (tid), so tid values are required to support this functionality.
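The row-selection idea can be sketched in miniature (a simplified illustration of sampling without replacement, not the actual server algorithm; the function name and row layout are hypothetical):

```python
import random

def sample_rows(table, percentage, seed=0):
    """Pick a random subset of rows by position (standing in for tids).

    Positions are drawn without replacement, so no row is selected
    twice and the uniqueness of column values is preserved, mirroring
    the behavior described above.
    """
    rng = random.Random(seed)
    k = max(1, round(len(table) * percentage / 100))
    picks = rng.sample(range(len(table)), k)  # distinct positions only
    return [table[i] for i in sorted(picks)]

# A 3% sample of a 1000-row table yields 30 distinct rows.
table = [{"id": i} for i in range(1000)]
sample = sample_rows(table, 3)
print(len(sample))
```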
To specify sampled statistics and the percentage of rows to be sampled, you use the Statistics on Sampled Data check box and the Percentage control in the Optimize Database dialog. For example, to optimize the table bigtable, sampling 3% of the rows, enable the Statistics on Sampled Data check box and enter 3 in the Percentage control. For more information, see the online help.
When sampling, the query optimizer chooses rows randomly from the base table and inserts them into a temporary table. Rows are selected in such a way that the uniqueness of column values is preserved (conceptually, when sampling, a row can be selected no more than once). Full statistics, or minmax statistics if requested, are created on the temporary table and stored in the catalogs as statistics for the base table. The temporary table is then deleted. Be sure that you have enough free disk space to store this temporary table, and that create_table permission has been granted to the user running the optimization process. For information on granting permissions, see the chapter "Ensuring Access Security."
You have control over the percentage of rows that are sampled, and it is worthwhile to experiment with it. If the percentage is too small for a good sample, the resulting statistics change each time the percentage changes. As you increase the percentage, you eventually reach a plateau where the statistics come out almost the same. The smallest percentage that produces stable statistics is the most efficient choice.
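The plateau has a simple statistical explanation: sampling error shrinks as the sample grows, steeply at first and then more and more slowly. A sketch using the textbook standard error of a sampled mean under sampling without replacement (an idealized model, not a description of the optimizer's internals):

```python
import math

def standard_error(sd, n_rows, pct):
    """Standard error of a sampled mean when pct percent of n_rows
    are drawn without replacement from a column with std. dev. sd."""
    k = max(1, n_rows * pct // 100)
    fpc = math.sqrt((n_rows - k) / (n_rows - 1))  # finite population correction
    return sd * fpc / math.sqrt(k)

N, sd = 1_000_000, 100.0
errors = {pct: standard_error(sd, N, pct) for pct in (1, 3, 10, 30, 90)}
for pct, err in errors.items():
    print(f"{pct:3d}% sample -> standard error {err:.4f}")
# The error falls steeply between small percentages, then flattens:
# past some point, a larger sample buys little extra stability.
```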