Optimizer statistics can be reviewed and processed by several utilities.
In VDBA, you use the Display Statistics dialog to view and delete statistics that have already been collected. For more information, see the online help topic Viewing Database Statistics.
You can also accomplish this task using the statdump system command. For more information, see the Command Reference Guide.
The usual output is based on statistics generated by the optimization process, as described in Database Statistics.
By default, optimizer statistics are shown for all tables and columns in the current database, but you can view statistics for specific table columns.
In VDBA, you use the Specify Tables and Specify Columns check boxes in the Display Statistics dialog. For example, specify that statistics be displayed only for the empno column of the emp table.
You can delete statistics by enabling the Delete Statistics from Syscat check box.
Using this check box in conjunction with the Specify Tables and Specify Columns check boxes, you can specify the tables and columns for which to delete statistics.
For example, to delete statistics for the empno and sex columns of the emp table and the empno column of the task table, enable the Delete Statistics from Syscat check box and specify those tables and columns.
You can specify the precision with which floating point numbers are displayed in the statistics. To do so, enable the Set Precision Level check box and enter a value in the corresponding edit control; this value determines the number of decimal digits in the text format of the floating point numbers.
For example, assume a table, t_float, is defined with a column named c_float of type float, and that the following statements are used to insert values (all of which are approximately 1.0):
insert into t_float values (0.99999998);
insert into t_float values (0.99999999);
insert into t_float values (1.0);
insert into t_float values (1.00000001);
insert into t_float values (1.00000002);
You can create statistics for this table using the optimization procedure described in Database Statistics.
With the default floating point precision, the standard output shows seven places after the decimal point, which is not enough to distinguish these values. For greater precision, enable the Set Precision Level check box and enter a larger value.
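The effect of display precision on these five values can be sketched outside the database (a plain Python illustration, not an Ingres utility):

```python
# The five values inserted into t_float above, all approximately 1.0.
values = [0.99999998, 0.99999999, 1.0, 1.00000001, 1.00000002]

# At the default seven decimal places, every value prints identically,
# so the histogram cell values would be indistinguishable in the output.
seven = {"%.7f" % v for v in values}
print(seven)

# At fourteen decimal places, the five values remain visibly distinct.
fourteen = {"%.14f" % v for v in values}
print(sorted(fourteen))
```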
For example, specifying a precision level of 14 generates output similar to the following, in which there is sufficient precision to maintain a visible difference in the values:
*** statistics for database demodb version: 00850
*** table t_float rows:5 pages:3 overflow pages:0
*** column c_float of type float (length:8, scale:0, nullable)
date:2000_02_24 15:15:30 GMT unique values:5.000
repetition factor:1.000 unique flag:Y complete flag:0
domain:0 histogram cells:10 null count:0.00000000000000 value length:8
cell: 0 count:0.00000000000000 repf:0.00000000000000 value:0.99999997999999
cell: 1 count:0.20000000298023 repf:1.00000000000000 value:0.99999998000000
cell: 2 count:0.00000000000000 repf:0.00000000000000 value:0.99999998999999
cell: 3 count:0.20000000298023 repf:1.00000000000000 value:0.99999999000000
cell: 4 count:0.00000000000000 repf:0.00000000000000 value:0.99999999999999
cell: 5 count:0.20000000298023 repf:1.00000000000000 value:1.00000000000000
cell: 6 count:0.00000000000000 repf:0.00000000000000 value:1.00000000999999
cell: 7 count:0.20000000298023 repf:1.00000000000000 value:1.00000001000000
cell: 8 count:0.00000000000000 repf:0.00000000000000 value:1.00000001999999
cell: 9 count:0.20000000298023 repf:1.00000000000000 value:1.00000002000000
This can be useful when statistics are output to a text file or input from a text file. For more information, see Statistics in Text Files. When reading statistics from a text file, the optimization process assumes that all cell values are in ascending order. You can use the Set Precision Level option to preserve sufficient precision for floating point numbers.
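Because the optimization process assumes ascending cell values, a hand-edited statistics file is worth checking before it is read back in. A minimal sketch of such a check (the helper function here is hypothetical, not part of any Ingres tool):

```python
def cells_ascending(values):
    """Return True if histogram cell values are in strictly ascending order,
    as the optimization process expects when reading statistics from text."""
    return all(a < b for a, b in zip(values, values[1:]))

# The t_float cell values from the earlier output, in order.
ok = cells_ascending([0.99999998, 0.99999999, 1.0, 1.00000001, 1.00000002])
print(ok)

# A file with out-of-order or duplicate cell values would fail the check.
print(cells_ascending([1.0, 0.5]))
```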
The optimization process can directly read a set of optimizer statistics from an external text file, rapidly updating the statistics for specific database tables. This is useful, for example, in query modeling.
When statistics are loaded this way, the actual table data is ignored. This gives you a modeling capability: the table can be empty, yet the statistics indicate the presence of data and its distribution. The query optimizer uses these artificial statistics to determine a QEP. For more information, see Query Execution Plans. This gives the DBA the ability to verify the correctness of QEPs without having to load data into the tables.
The text file read by the optimization process can be created in one of two ways: by unloading existing statistics to a file, or by constructing or editing the file manually in the same format.
To unload optimizer statistics to a text file, you use the Direct Output to Server File option in the Display Statistics dialog in VDBA. The generated file is in the appropriate format to be used as input to the optimization process.
For example, to dump all statistics from the current database into the file stats.out, enable the Direct Output to Server File check box and enter stats.out in the corresponding edit control.
To unload statistics for selected tables or columns only, use the Direct Output to Server File option in conjunction with the Specify Tables and Specify Columns check boxes in the Display Statistics dialog in VDBA.
For example, if you want the stats.out file to contain statistics for the entire arel table and the col1 column in the brel table, enable the Specify Tables check box, and choose only the arel and brel tables from the Specify Tables dialog. Enable the Specify Columns check box and choose only the col1 column for brel from the Specify Columns dialog.
A sample output file generated using the Direct Output to Server File option of the Display Statistics dialog is shown below. This same text file can be used as input to the optimization process, as described in the next section, Loading Optimizer Statistics from a Text File:
*** statistics for database demodb version: 00850
*** table brel rows:151 pages:3 overflow pages:1
*** column col1 of type integer (length:4, scale:0, nullable)
date:2000_02_24 16:04:37 GMT unique values:132.000
repetition factor:1.144 unique flag:N complete flag:0
domain:0 histogram cells:16 null count:0.0000000 value length:4
cell: 0 count:0.0000000 repf:0.0000000 value: 0
cell: 1 count:0.0728477 repf:1.3750000 value: 23
cell: 2 count:0.0728477 repf:1.8333334 value: 31
cell: 3 count:0.0728477 repf:1.3750000 value: 59
cell: 4 count:0.0728477 repf:1.1000000 value: 138
cell: 5 count:0.0728477 repf:1.0000000 value: 151
cell: 6 count:0.0728477 repf:1.0000000 value: 162
cell: 7 count:0.0728477 repf:1.0000000 value: 173
cell: 8 count:0.0662252 repf:1.2500000 value: 181
cell: 9 count:0.0662252 repf:1.1111112 value: 193
cell: 10 count:0.0662252 repf:1.2500000 value: 202
cell: 11 count:0.0662252 repf:1.0000000 value: 214
cell: 12 count:0.0662252 repf:1.0000000 value: 224
cell: 13 count:0.0662252 repf:1.0000000 value: 236
cell: 14 count:0.0662252 repf:1.2500000 value: 256
cell: 15 count:0.0264901 repf:1.0000000 value: 261
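The cell counts in this output are fractions of the table's row count. A short sketch (inferring the per-cell row counts from the fractions shown above) verifies how the figures relate:

```python
rows = 151           # from the table header line
unique_values = 132  # from the "unique values" line

# Each cell's "count" is the fraction of the table's rows falling in
# that cell: cells 1-7 each appear to hold 11 rows, cells 8-14 hold
# 10 rows, and cell 15 holds 4 rows.
assert round(11 / rows, 7) == 0.0728477
assert round(10 / rows, 7) == 0.0662252
assert round(4 / rows, 7) == 0.0264901

# The non-empty cells together account for every row exactly once.
assert 7 * 11 + 7 * 10 + 4 == rows

# The repetition factor is rows per unique value.
assert round(rows / unique_values, 3) == 1.144
print("cell fractions consistent with", rows, "rows")
```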
To load optimizer statistics from a text file, you use the Read Statistics from Server File option in the Optimize Database dialog. For example, if the file arelbrel.dat contains statistics for the arel and brel tables, these are loaded into the database by enabling the Read Statistics from Server File check box and entering arelbrel.dat in the corresponding edit control.
If the input file contains statistics for multiple tables, you can load selected tables or columns by using the Read Statistics from Server File option, in conjunction with the Specify Tables and Specify Columns check boxes in the Optimize Database dialog.
For example, if the file arelbrel.dat contains statistics for the arel and brel tables, just the statistics for arel are loaded into the database by enabling the Specify Tables check box and choosing only the arel table from the Specify Tables dialog.
To load only statistics for column col3 of the arel table, enable the Specify Columns check box and choose only the col3 column from the Specify Columns dialog.
The input file for the optimization process contains information about the number of rows, as well as primary and overflow page counts in a table. However, because these values are critical to correct operation, these input values are normally disregarded when creating statistics, leaving the catalog values untouched.
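As a sketch of where those counts live in the file, the table header line shown in the sample output earlier can be picked apart with an ordinary regular expression (this parsing code is illustrative only, not part of any Ingres tool):

```python
import re

# A table header line in the format shown in the sample output above.
line = "*** table brel rows:151 pages:3 overflow pages:1"

m = re.match(
    r"\*\*\* table (\w+) rows:(\d+) pages:(\d+) overflow pages:(\d+)", line
)
table, rows, pages, overflow = m.group(1), *map(int, m.groups()[1:])
print(table, rows, pages, overflow)
```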
You can force the values in the input file to be used when loading the statistics by enabling the Read Row and Page check box in the Optimize Database dialog.
Important! This option must be used with extreme care, because it sets critical values.
This option can be useful for certain specialized processing, such as query modeling and performance problem debugging. Bear in mind that the row count value can be modified for the table and its indexes. However, the page count is modified for the table only; the index page count values remain unchanged.
You can copy a table and its associated optimizer statistics from one database to another using copydb and statistics that have been unloaded to a text file. This is usually much faster than copying only the table and rerunning the optimization process to recreate the statistics.
Note: Doing this makes sense only if the statistics are up-to-date.
First, unload the table with copydb, and unload its statistics to a text file using the Direct Output to Server File option in the Display Statistics dialog:
copydb olddb arel
sql olddb <copy.out
Next, copy the table back into the new database:
sql newdb <copy.in
Finally, load the statistics into the new database by enabling the Read Statistics from Server File check box in the Optimize Database dialog and entering the name of the statistics file.
The optimization process allows you to create sampled optimizer statistics on database tables. For large tables, sampled statistics are usually much faster to generate than full statistics, and if the percentage of rows to be sampled is chosen appropriately, they can be nearly as accurate.
Sampled statistics are generated by only looking at a certain percentage of the rows in a table. The percentage must be chosen so that all significant variations in the data distribution are likely to be sampled.
The sampled rows are selected by randomly generating values for the tuple identifier (tid), so tid values are required to support this functionality.
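The row-selection idea can be sketched in miniature (a simplified illustration of sampling without replacement, not the actual server algorithm; the function name and row layout are hypothetical):

```python
import random

def sample_rows(table, percentage, seed=0):
    """Pick a random subset of rows by position (standing in for tids).

    Positions are drawn without replacement, so no row is selected
    twice and the uniqueness of column values is preserved, mirroring
    the behavior described above.
    """
    rng = random.Random(seed)
    k = max(1, round(len(table) * percentage / 100))
    picks = rng.sample(range(len(table)), k)  # distinct positions only
    return [table[i] for i in sorted(picks)]

# A 3% sample of a 1000-row table yields 30 distinct rows.
table = [{"id": i} for i in range(1000)]
sample = sample_rows(table, 3)
print(len(sample))
```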
To specify sampled statistics and the percentage of rows to be sampled, you use the Statistics on Sampled Data check box and the Percentage control in the Optimize Database dialog. For example, to optimize the table bigtable, sampling 3% of the rows, enable the Statistics on Sampled Data check box and enter 3 in the Percentage control. For more information, see the online help.
When sampling, the query optimizer chooses rows randomly from the base table and inserts them into a temporary table. Rows are selected in such a way that the uniqueness of column values is preserved (conceptually, when sampling, a row can be selected no more than once). Full statistics, or minmax statistics if requested, are created on the temporary table and stored in the catalogs as statistics for the base table. The temporary table is then deleted. Be sure that you have enough free disk space to store this temporary table, and that create_table permission has been granted to the user running the optimization process. For information on granting permissions, see the chapter "Ensuring Access Security."
You have control over the percentage of rows that are sampled, and it is worthwhile to experiment with it. If the percentage is too small for a good sample, the resulting statistics change each time the percentage changes. As you increase the percentage, you eventually reach a plateau where the statistics come out almost the same. The smallest percentage that produces stable statistics is the most efficient choice.
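The plateau has a simple statistical explanation: sampling error shrinks as the sample grows, steeply at first and then more and more slowly. A sketch using the textbook standard error of a sampled mean under sampling without replacement (an idealized model, not a description of the optimizer's internals):

```python
import math

def standard_error(sd, n_rows, pct):
    """Standard error of a sampled mean when pct percent of n_rows
    are drawn without replacement from a column with std. dev. sd."""
    k = max(1, n_rows * pct // 100)
    fpc = math.sqrt((n_rows - k) / (n_rows - 1))  # finite population correction
    return sd * fpc / math.sqrt(k)

N, sd = 1_000_000, 100.0
errors = {pct: standard_error(sd, N, pct) for pct in (1, 3, 10, 30, 90)}
for pct, err in errors.items():
    print(f"{pct:3d}% sample -> standard error {err:.4f}")
# The error falls steeply between small percentages, then flattens:
# past some point, a larger sample buys little extra stability.
```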