This appendix describes the features and enhancements introduced in Ingres II 2.5.
Changes were made to improve the performance of both the in-memory (QEF) sort and disk (DMF) sort of Ingres II.
The QEF sort was improved by fine-tuning the sort algorithm, resulting in fewer comparisons between sort rows (comparisons being a major consumer of CPU time in a sort). QEF was also improved with a change that causes the rows to be partially sorted as they arrive in the sort.
This change introduces two distinct benefits: in duplicates-removal sorts, duplicate rows are detected and discarded as they arrive, and the first sorted rows can be returned to the caller before the sort input is exhausted.
The first set of DMF sort enhancements also involves fine-tuning of the sort algorithms, which should result in a 5 to 10 percent reduction in the CPU time of typical sorts. As with the QEF sort, duplicate rows are detected and discarded sooner in duplicates-removal sorts. This should result in smaller disk work files and faster overall sort performance.
Prior to Release 2.5, the entire result of a DMF sort was spooled to an internal temporary table before the sorted rows were returned to the caller. In Ingres II 2.5, the temporary table has been eliminated and the rows are returned directly from the sort structures to the caller. This has the same effect as the early return of sorted rows described above for the QEF sort. That is, the first rows should be returned much sooner than they were in previous releases.
The final DMF sort enhancement is the introduction of a "parallel sort" technique. Sorts that exceed a user-configurable threshold spawn additional threads. The sort is split up and its rows delivered to the sub-threads for sorting. The sorted subsets of the rows are then delivered back to the parent thread executing the query, where they are merged to form a single sorted stream of rows.
On multi-CPU machines, this results in a significant reduction in the elapsed time required to sort (between 25 and 50 percent in testing). Even single-CPU machines benefit somewhat, because sort I/O and sort computation can be overlapped. An added benefit of the parallel sort technique is that it is encapsulated within the DMF sort. This sort is used in the execution of queries with sorting requirements (such as order by, group by, and distinct requests, or the implementation of certain join algorithms), and it is also used to sort rows for index creation or update in modify, create index, and copy operations. All users of the DMF sort therefore derive the performance benefit of the parallel sort.
The "parallel sort" technique outlined in DMF Sort Enhancements is used to sort rows for parallel index creation, greatly reducing the time taken for index creation in multi-CPU environments.
Ingres referential and unique/primary key constraints result in the creation of indexes "under-the-covers" to improve the performance of the constraint enforcement mechanisms. Prior to Release 2.5, these indexes were plain B-tree indexes stored in the default location of the database. However, B-tree is not always the best choice (for example, hash is better for many unique key applications), and use of the default location can degrade performance if many large indexes are created.
Ingres II 2.5 solves these and other problems by including a "with" clause for constraint definition. The "with" clause allows the default index options to be overridden with anything normally coded in an index creation "with" clause. For example, the index structure and location, as well as fillfactor and other index options, can be specified explicitly for each constraint. The "with" clause applies to column and table constraints defined with both the create table and alter table statements. A unique/primary key constraint can also be defined to use the base table structure for its enforcement rather than a separate secondary index.
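The following sketch shows the general form (the constraint, column, and location names are illustrative, and the exact option spellings follow whatever the index creation "with" clause accepts). It builds the index that enforces the primary key as a hash structure, 80 percent full, in an alternate location:

    create table emp (
        empno  integer not null,
        name   varchar(30) not null,
        constraint emp_pk primary key (empno)
            with structure = hash,
                 fillfactor = 80,
                 location = (ix_loc)
    );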
Ingres II 2.5 also introduces the ANSI/ISO notion of referential actions for the definition of referential (foreign key) constraints. In releases prior to Ingres II 2.5, an attempt to delete a referenced row for which matching referencing rows exist, or to update the primary key of a referenced row while matching referencing rows still exist for the old value, failed with an error and the request was aborted. Either operation had to be preceded by a delete of the matching referencing rows or an update of their foreign keys to some value that exists in another referenced row.
Ingres II 2.5 allows referential actions to be defined for each referential constraint, specifying the alternative action to be taken in the circumstances described above. A separate action can be defined for the delete case (deletion of a referenced row with matching referencing rows) and for the update case (updating the key of a referenced row with matching referencing rows). The options include cascade, in which the delete or update is cascaded to the matching referencing rows (so that the referencing rows are also deleted, or updated to the same value), and set null, in which the foreign key of the matching referencing rows is set to null. These actions permit a more complete definition of the semantics of the referential relationship and allow the application to execute more efficiently.
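As a sketch using the ANSI/ISO syntax (the emp and dept tables and their columns are illustrative), the following constraint cascades department-number updates to the referencing rows and sets their foreign keys to null when a department is deleted:

    alter table emp add constraint emp_dept_fk
        foreign key (deptno) references dept (deptno)
            on update cascade
            on delete set null;

Because of the set null action, the emp.deptno column must be nullable.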
In Ingres II 2.5, the limit on the total number of pages in all caches has been raised from an unenforced 65,536 to 2**32-1; Ingres II 2.5 now supports a 4 GB cache. (With the default 2 KB page size, the old limit corresponded to only 128 MB of cache.)
In previous releases, when the pages belonging to a specific table needed to be located, the buffer manager sequentially searched every buffer in every cache to find them. Even in installations with small caches, this was an expensive operation, especially in the frequent case in which the table had no pages in any cache. This operation occurred, for example, when a table's TCB (table control block) was about to be released, typically when no referencing transaction had an immediate need to use the table. Delays caused by this operation could show up in unexpected situations, such as the use of statement-level rules.
In Ingres II 2.5, a cross-cache table hash queue has been added to the buffer manager; pages are added to this queue as they are faulted in and removed from it when they are tossed. When the pages of a table must be located, the buffer manager hashes the table's database ID and table ID and searches only the matching list. This change results in a significant decrease in the number of cache pages visited, and the effect is most dramatic in installations configured with very large or multiple caches.
In releases prior to Ingres II 2.5, a fixed number of Write Behind threads were configured in each server in an installation and initiated when the server started. These threads served all caches and were awakened when the number of modified pages in any cache exceeded a predefined threshold. In a shared cache environment, all Write Behind threads in all servers were simultaneously activated when this threshold was reached. This led to a "thundering herd" phenomenon in which n Write Behind threads concurrently pounded through the caches, competing for modified pages to flush.
The optimum number of Write Behind threads is the minimum number required to keep pace with the demand for free pages while driving the number of modified pages back down to the write-behind end threshold.
The optimum number of Write Behind threads varies with the instantaneous demand for free pages in a particular cache; it always begins at one when the threshold is first breached and a Write Behind event is signaled. In Ingres II 2.5, if the single Write Behind thread is unable to keep up with the demand, additional Write Behind threads are created until either equilibrium is achieved or the upper limit on thread numbers is reached. If better than equilibrium is achieved (more modified pages are being freed than are in demand), the excess Write Behind threads terminate one by one, while the remaining threads continue to monitor the free buffer demand until the write-behind end threshold is achieved.
Ingres II 2.5 introduces cache-specific Write Behind threads, resulting in increased concurrency and eliminating the chance of free page starvation.
The structure of the log file in Ingres II 2.5 is changed from a single file to 1 to 16 logically striped files of equal size. Properly configured, partitioning the log in this manner improves log performance by spreading disk contention across multiple disks instead of concentrating it on a single device. Ingres system administration is also made simpler: the transaction log file can be expanded by adding another partition instead of resizing an existing file or partition. The full log file paths are now defined through Configuration Manager or CBF rather than through the symbol table.
A variety of enhancements have been made to optimizedb.
The flag -zlr causes optimizedb to retain the original repetition factor when rebuilding an existing histogram. This is useful when a histogram (and its repetition factor) is built once by reading the whole table, then updated later using sampling (which can produce inaccurate repetition factors).
A minor bug was fixed so that the "l" flag now requests an exclusive lock on the database during optimizedb processing, just as it does for other command line utilities.
The previous limit of 1000 parameters coded in a separate file using the -zf parameter has been lifted. There is now no limit on the number of such parameters. An example combining these flags appears below.
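A sketch combining the flags described above (the database name empdata and the file params.file are placeholders, and the command assumes the usual optimizedb flag placement):

    optimizedb -l -zlr -zf params.file empdata

Here -l takes an exclusive lock on the database, -zlr preserves the existing repetition factors while the histograms are rebuilt, and -zf reads any number of further parameters (such as -r and -a flags naming tables and columns) from params.file.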
An -o filename option (similar to that in statdump) has been added to optimizedb. It creates a statdump-style output file, which can later be input back to optimizedb with the -i filename option. More importantly, it does not update the catalog iistatistics and iihistogram tables. This allows a busy shop to construct histograms at any time, with no worry about update conflicts. Then, at a convenient later time, optimizedb can be run with the -i option to add the histograms to the catalog, as sketched below.
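A sketch of the resulting two-step workflow (the database, table, column, and file names are placeholders). First, build the histograms during normal operation; the catalogs are not touched:

    optimizedb -o emp.stats empdata -remp -asalary

Later, at a convenient time, load the saved histograms into iistatistics and iihistogram:

    optimizedb -i emp.stats empdata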
In addition to the flag enhancements, an enhancement has been introduced that allows more accurate histograms to be built on columns with significant skew in their value sets. Specifically, for a column with many distinct values (which would otherwise generate an inexact histogram), exact histogram cells are now produced for values that occur significantly more often than the average. This permits much more accurate estimates in the compilation of queries with restrictions on such columns and, therefore, better query plans.
The query compiler has been enhanced to compile more efficient strategies for complex queries involving aggregate views and unions.