Advanced Use of the Copy Statement

You can perform advanced functions with the copy statement using variations of the copy command.

Examples in this section use a database that looks like the following:

Table Name	Column Name	Data Type
Header	Orderno Date Suppno Status	integer2 date integer2 char(1)
Suppinfo	Suppno Suppinfo	integer2 char(35)
Detail	Orderno Invno Quan	integer2 integer2 integer2
Iteminfo	Invno Descript	integer2 char(20)
Priceinfo	Invno Suppno Catno Price	integer2 integer2 integer2 money

Populate Multiple Database Tables Using Multiple FIles

Suppose that the information for the database previously described was stored in data files outside Ingres, and that those files, "file1" and "file2," have the record formats shown below:

orderno,date,suppno,suppinfo,status

orderno,invno,catno,descript,price,quan

The copy statement can be used to load the data from these files into a five-table database. Assume that the files are entirely in ASCII character format, with fields of varying length terminated by commas, except for the last field, which is terminated by a newline.

The following copy statement loads the header table from the first file:

copy table header

(orderno = char(0)comma,

date = char(0)comma,

suppno = char(0)comma,

dummy = d0comma,

status = char(0)nl)

from 'file1';

Each column of the header table is copied from a variable-length character field in the file. All columns except the last are delimited by a comma; the last column is delimited by a newline.

Specification of the delimiter, although included in the statement, is not needed because the copy statement looks for the first comma, tab, or newline as the field delimiter by default.

The notation d0 used in place of char(0) tells the copy statement to ignore the variable-length field in that position in the file, rather than copying it. Copy ignores the column name (in this case dummy) associated with the field described as d format.

Load a Table from Multiple Files

Loading the priceinfo table presents special difficulties. The copy statement can read only one file at a time, but the data needed to load the table resides in two files.

The solution to this kind of problem varies with the file and table designs in any particular situation. In general, a good solution is to copy from the file containing most of the data into a temporary table containing as many columns of information as needed to complete the rows of the final table.

To load data from the files into the priceinfo table, do the following:

Create a temporary table named pricetemp that contains the orderno column, in addition to all the columns of the priceinfo table:
create table pricetemp (orderno integer2,

    invno integer2,

    suppno integer2,

    catno integer2,

    price money);

Adding the orderno column to the temporary table is that it enables you to join the temporary table to the header table to get the supplier number for each row.
Copy the data from file2 into the pricetemp table:
copy table pricetemp (orderno = char(0), invno =
char(0), catno = char(0), dummy = d0, price =
char(0), dummy = d0) from 'file2';
Insert into the priceinfo table all rows that result from joining the pricetemp table to the header table. (In VDBA, use an SQL Scratchpad window to execute the statement.)
insert into priceinfo (invno, suppno, catno,price)
    select p.invno, h.suppno, p.catno,
    p.price from header h, pricetemp p
    where p.orderno = h.orderno;
Destroy the temporary table pricetemp:
drop pricetemp;

Multi-line File Records

Another feature of the copy statement is that it can read multiline records from a file into a single row in a table. For instance, suppose that for viewing convenience, the detail file is formatted so that each record requires three lines. That file looks like this:

1, 5173
10179A, No.2 Rainbow Pencils
0.29
1, 5175
73122Z, 1998 Rainbow Calendars
4.90

Load these values into the pricetemp table with the following copy statement:

copy table pricetemp (orderno = char(0)comma, invno = char(0)

nl, catno = char(0)comma, descript = d0nl, price =
char(0)nl) from 'file2';

It does not matter that newlines have been substituted for commas as delimiters within each record. The only requirement is that the data fields be uniform in number and order, the same as for single-line records.

Load Fixed-Length and Binary Records

The copy statement can also load data from fixed-length records without any delimiters in or between the data. In addition, numeric items in the file can be stored in true binary format. For example, the value 256 can be stored in a 2-byte integer instead of 3 characters. The order header file has the following record layout:

orderno date suppno suppinfo status

The data type formats for each of the fields is as follows:

(2-byte int) (8 chars) (2-byte int)
(35 chars) (1 char)

In this case, you code the copy statement to load the header table as follows:

copy table header (orderno = integer2,
    date = char(8),
    suppno = integer2,
    dummy = char(35),
    status = char(1))
    from 'file1';

It is also possible to copy data from files that contain both fixed and variable-length fields.

Considerations When Loading Large Objects

Large objects are long varchar and long byte data types. Long varchar is a character data type, and long byte is a binary data type with a maximum length of 2 GB.

There are considerations when copying large objects into a table.

Considerations for Copying Formatted Large Objects

A column with large objects is specified for copying with the formats:

long varchar(0)
long byte(0)
long nvarchar(0)

Note: You cannot use a length specifier or a delimiter.

To handle the large size, copy deals with these data types in a similar manner as the data handlers: the data is broken up into segments for copying to a data file.

Each segment consists of the length, followed by a space delimiter, followed by the data. There is no space following a data segment (because copy knows how many bytes of data to read).

The basic structure of a formatted segment is:

integer = length of segment
space = delimiter
char|byte(len) = data

The last segment, or an empty object, is denoted by a zero length, followed by its space delimiter:

0 = length of segment
space = delimiter

Thus, the data is segmented as:

length1 segment1 segment1length2 segment2...lengthn segmentn0
^ ^ ^ ^ ^ ^
space space space space space space

The segments of the long nvarchar are UTF-8 transformation of Unicode values.

For formatted copies on large object data that contain nulls, the with null clause must be specified with a value.

Example: Copying Formatted Large Objects

Consider the sample table, big_table, that was created with the following create table statement:

create table big_table
object_id integer,
big_col long varchar);

This table can be copied to the big_file file with the following copy statement:

copy table big_table (object_id integer, big_col long varchar) into 'big_file';

Considerations for Binary Copying a Large Object

The data file format is slightly different when you copy a large object in a binary copy (that is, without column specifications).

The binary file has an extra character after the end of the last segment of a nullable column. (A nullable column is one that was created with null). The length is not followed by a space character. The basic structure of a binary segment is:

integer2 = length of segment
char|byte(len) = data

The last segment, or an empty object, is denoted by a zero length, followed (if the column is nullable) by a character indicating whether the column is null (=0) or not null (=1):

0 = length of segment
[char(1) = 0 column is not null
1 column is null]

Thus, a non-nullable column is segmented as:

length1 segment1 segment1 length2 segment2...lengthn segmentn0

A nullable column is segmented as:

length1 segment1 segment1 length2 segment2...lengthn segmentn0 0
^
1 character

Empty and null strings appear as follows:

A non-nullable empty string consists solely of the zero integer length.
A nullable empty string consists solely of the end zeros: the zero integer length and the zero "not null" character indicator.
A null indicator consists of "01": an end zero integer length and the null character indicator of "1."