Bulk Copy

The copy statement to reload a table whose characteristics allow a bulk copy to be performed has the following format:

copy [table] [schema.]tablename

([column_name = format [with null [(value)]]

{, column_name = format [with null[(value)]]}])

from 'input_filename'

[with [standard-with-clauses]

[, allocation = n] [, extend = n] [, row_estimate = n]

[, fillfactor=n] [, minpages=n] [, maxpages=n]

[, leaffill=n] [, nonleaffill=n]]

If the file is formatted, you must specify columns in the column list as described for reloading formatted data. If the file is binary, use an empty column list.

Bulk Copying Requirements

To perform a bulk copy when loading data from a file into a table, the table must have the following characteristics:

The table has no secondary indexes.
The table is not journaled.
The table is not partitioned.
The table is either a heap table (the data from the file is appended to the end of the heap table) or empty and less than 18 pages in size if the table is hash, B-tree, or ISAM (the table is rebuilt with the new data from the file).

If these requirements are not met, the copy is performed in incremental mode.

Transaction Logging During Bulk and Incremental Copy

The bulk copy requires only minimal transaction logging.

Note: A transaction is still entered into the log file and normal logging occurs for the associated system catalogs.

In contrast, an incremental copy can generate a large amount of transaction log records. The incremental copy requires logging for every record transfer and every structural change to the table.

An alternative method to reducing logging is described in Large Data Loads with the Set Nologging Statement.

Bulk and Incremental Copy Processing

The processing for a bulk copy is similar to a modify statement, except that the data comes from an external source rather than an existing table. For a bulk copy, the copy statement:

Reads all data from the source.
Deposits the data in the Data Manipulation Facility (DMF) sorter.
The sorter sorts all data into the required order.
Note: For a heap table, no sorting is done.
The copy extracts the data from the sorter, builds the table, and populates the table.

In contrast, for an incremental copy, the sequence is to:

Read one record from the external file.
Add to the table as an insert.
Repeat these steps until the data has been copied.

Bulk Copy With-Clauses

The with clause options on the copy statement for bulk copy operate like the corresponding clauses in the modify statement.

If these clauses are omitted, the table default values in the system catalogs are used. If any of these clauses are specified, the values become the new defaults for the table in the system catalogs.

The following clauses can be used only with a bulk copy:

Note: If these clauses are used with a copy statement with columns specified, an error message is returned and the copy is not performed.

with allocation
A bulk copy from can preallocate table space with the allocation clause. This clause specifies how many pages are preallocated to the table. This clause can be used only on tables with B-tree, hash, or ISAM storage structures.For example, preallocate 1000 pages for bulk copying from the emp.data file into the emp table:

copy emp() from 'emp.data' with allocation = 1000;

VMS: Preallocating space with the allocation clause is important particularly in VMS installations to increase loading efficiency.
with extend
A bulk copy from can extend table space with the extend clause. This clause specifies how many pages the table is extended. This clause can be used only on tables with B-tree, hash, or ISAM storage structures.

For example, extend table emp by 100 pages for bulk copying from the emp.data file:

copy emp() from 'emp.data' with extend = 100;
with row_estimate
A bulk copy can specify an estimated number of rows to be copied during the bulk copy. It can be a value from 0 to 2,147,483,647 ((231-1). This clause can be used only on tables with B-tree, hash, or ISAM storage structures.

For example, set the row estimate on the emp table to one million for bulk copy from the emp.data file:

copy emp() from 'emp.data' with row_estimate = 1000000;

Providing a row estimate can enhance the performance of the bulk copy by allowing the sorter to allocate a realistic amount of resources (such as in-memory buffers), disk block size, and whether to use multiple locations for the sort. In addition, it is used for loading hash tables in determining the number of hash buckets. If you omit this parameter, the default value is 0, in which case the sorter makes its own estimates for disk and memory requirements.

To obtain a reasonable row estimate value, use known data volumes, the help table statement, and information from the system catalogs. For more information, see the chapter "Maintaining Storage Structures." An over-estimate causes excess resources of memory and disk space to be reserved for the copy. An under-estimate (the more typical case, particularly for the default value of 0 rows) causes more sort I/O to be required.
with fillfactor
A bulk copy from can specify an alternate fillfactor. This clause specifies the percentage (from 1 to 100) of each primary data page that must be filled with rows during the copy. This clause can be used only on tables with B-tree, hash, or ISAM storage structures.

For example, set the fillfactor on the emp table to 10% for bulk copy from the emp.data file:

copy emp() from 'emp.data' with fillfactor = 10;
with leaffill
A bulk copy from can specify a leaffill value. This clause specifies the percentage (from 1 to 100) of each Btree leaf page that must be filled with rows during the copy. This clause can be used only on tables with a B-tree storage structure.

For example, set the leaffill percentage on the emp table to 10% for bulk copy from the emp.data file:

copy emp() from 'emp.data' with leaffill = 10;
with nonleaffill
A bulk copy from can specify a nonleaffill value. This clause specifies the percentage (from 1 to 100) of each B-tree non-leaf index page that must be filled with rows during the copy. This clause can be used only on tables with a B-tree storage structure.

For example, set the nonleaffill percentage on the emp table to 10% for bulk copy from the emp.data file:

copy emp() from 'emp.data' with nonleaffill = 10;
with minpages, maxpages
A bulk copy from can specify minpages and maxpages values. The minpages clause specifies the minimum number of primary pages that a hash table must have. The maxpages clause specifies the maximum number of primary pages that a hash table must have. This clause can be used only on tables with a hash storage structure.

If these clauses are not specified, the primary page count for the bulk copy is determined as follows:
- If the copy statement has a row_estimate clause, that size, along with the row width and fill factor, is used to generate the number of primary pages.
- Otherwise, the table's default in the system catalogs is used.
  The following example sets the number of primary data pages (hash buckets) for bulk copying from the emp.data file into the emp table:
  
  copy emp() from 'emp.data' with minpages = 16384, maxpages = 16384

For further details on these with clause options, see the chapter "Maintaining Storage Structures."

Example: Perform a Bulk Copy to Create a Hash Table

The following sequence of statements allows a bulk copy to be performed. This example creates a hash table:

create table tmp1

(col1 integer not null,

col2 char(25))

with nojournaling;

modify tmp1 to hash;

copy tmp1() from 'tmp1.saved'

with row_estimate = 10000,

maxpages = 1000,

allocation = 1000;

Bulk copy is chosen for the copy because the table is not journaled (it is created with nojournaling), it has no indexes (none have yet been created), and the table has under 18 pages (it is a newly created table that has not yet been populated with data).

The modify to hash operation is quick because the table is empty at this point. The row_estimate parameter allows a more efficient sort than if the default estimate (of 0 rows) is used. Additionally, for the hash table, row_estimate enables the number of hash buckets to be calculated efficiently. This calculation uses the row width (set by the create table statement), row_estimate, max_pages, and the (default) fillfactor. The copy statement includes an allocation clause to preallocate disk space for the table to grow, increasing efficiency of later row additions.

Example: Perform Bulk Copy and Create B-tree Table

The following example of a bulk copy uses a B-tree table:

create table tmp2

(col1 integer not null,

col2 char(25));. . .

Populate table. . .

Save any data needed in table

modify tmp2 to truncated;

modify tmp2 to btree;

set nojournaling on tmp2;

copy tmp2() from 'tmp2.saved'

with row_estimate = 10000,

leaffill = 70,

fillfactor = 95,

allocation = 1000;

The existing table tmp2 is truncated to assure it has fewer than 18 pages. This also removes any indexes. Journaling is disabled for the table with the set nojournaling statement. The table meets the bulk copy requirements and a bulk copy is performed.

The copy statement includes a row estimate for efficient sorting during the copy. The leaffill and fillfactor clauses allow the Btree table to be set up as specified during the copy operation. The allocation clause provides for efficient future table growth.

Example: Perform Bulk Copy into a Heap Table

When a heap table is unjournaled and has no indexes, all copies are performed using bulk copy, regardless of the size of the table. Bulk copying into a non-empty heap table is allowed by logging the last valid page before starting the copy. If an error or rollback occurs, all new pages are marked as free.

Note: The table is not returned to its original size.

For example, in the following sequence of statements, all of the copy operations into the heap table are done as bulk copies:

Create a heap table
Copy to the heap table (table is empty)
Copy to the heap table (append to the table)
Perform inserts, updates, and deletes
Copy to the heap table (append to the table)