Second, they don't modify the query tree in place. Instead they
create zero or more new query trees and can throw away the
original one.
Keep the syntax
CREATE [ OR REPLACE ] RULE name AS ON event
TO table [ WHERE condition ]
DO [ ALSO | INSTEAD ] { NOTHING | command | ( command ; command ... ) }
in mind.
In the following, update rules means rules that are defined
on INSERT, UPDATE, or DELETE.
Update rules get applied by the rule system when the result
relation and the command type of a query tree are equal to the
object and event given in the CREATE RULE command.
For update rules, the rule system creates a list of query trees.
Initially the query-tree list is empty.
There can be zero (NOTHING key word), one, or multiple actions.
To simplify, we will look at a rule with one action. This rule
can have a qualification or not and it can be INSTEAD or ALSO (default).
What is a rule qualification? It is a restriction that tells
when the actions of the rule should be done and when not. This
qualification can only reference the pseudorelations NEW and/or OLD,
which basically represent the relation that was given as object (but with a
special meaning).
So we have four cases that produce the following query trees for
a one-action rule.
- No qualification and ALSO
the query tree from the rule action with the original query
tree's qualification added
- No qualification but INSTEAD
the query tree from the rule action with the original query
tree's qualification added
- Qualification given and ALSO
the query tree from the rule action with the rule
qualification and the original query tree's qualification
added
- Qualification given and INSTEAD
the query tree from the rule action with the rule
qualification and the original query tree's qualification; and
the original query tree with the negated rule qualification
added
Finally, if the rule is ALSO, the unchanged original query tree is
added to the list. Since only qualified INSTEAD rules already add the
original query tree, we end up with either one or two output query trees
for a rule with one action.
For ON INSERT rules, the original query (if not suppressed by INSTEAD)
is done before any actions added by rules. This allows the actions to
see the inserted row(s). But for ON UPDATE and ON
DELETE rules, the original query is done after the actions added by rules.
This ensures that the actions can see the to-be-updated or to-be-deleted
rows; otherwise, the actions might do nothing because they find no rows
matching their qualifications.
The query trees generated from rule actions are thrown into the
rewrite system again, and maybe more rules get applied resulting
in more or less query trees.
So a rule's actions must have either a different
command type or a different result relation than the rule itself is
on, otherwise this recursive process will end up in an infinite loop.
(Recursive expansion of a rule will be detected and reported as an
error.)
The query trees found in the actions of the
pg_rewrite system catalog are only
templates. Since they can reference the range-table entries for
NEW and OLD, some substitutions have to be made before they can be
used. For any reference to NEW, the target list of the original
query is searched for a corresponding entry. If found, that
entry's expression replaces the reference. Otherwise, NEW means the
same as OLD (for an UPDATE) or is replaced by
a null value (for an INSERT). Any reference to OLD is
replaced by a reference to the range-table entry that is the
result relation.
After the system is done applying update rules, it applies view rules to the
produced query tree(s). Views cannot insert new update actions so
there is no need to apply update rules to the output of view rewriting.
Say we want to trace changes to the sl_avail column in the
shoelace_data relation. So we set up a log table
and a rule that conditionally writes a log entry when an
UPDATE is performed on
shoelace_data.
CREATE TABLE shoelace_log (
sl_name text, -- shoelace changed
sl_avail integer, -- new available value
log_who text, -- who did it
log_when timestamp -- when
);
CREATE RULE log_shoelace AS ON UPDATE TO shoelace_data
WHERE NEW.sl_avail <> OLD.sl_avail
DO INSERT INTO shoelace_log VALUES (
NEW.sl_name,
NEW.sl_avail,
current_user,
current_timestamp
);
Now someone does:
UPDATE shoelace_data SET sl_avail = 6 WHERE sl_name = 'sl7';
and we look at the log table:
SELECT * FROM shoelace_log;
sl_name | sl_avail | log_who | log_when
---------+----------+---------+----------------------------------
sl7 | 6 | Al | Tue Oct 20 16:14:45 1998 MET DST
(1 row)
That's what we expected. What happened in the background is the following.
The parser created the query tree
UPDATE shoelace_data SET sl_avail = 6
FROM shoelace_data shoelace_data
WHERE shoelace_data.sl_name = 'sl7';
There is a rule log_shoelace that is ON UPDATE with the rule
qualification expression
NEW.sl_avail <> OLD.sl_avail
and the action
INSERT INTO shoelace_log VALUES (
*NEW*.sl_name, *NEW*.sl_avail,
current_user, current_timestamp )
FROM shoelace_data *NEW*, shoelace_data *OLD*;
(This looks a little strange since you can't normally write
INSERT ... VALUES ... FROM. The FROM
clause here is just to indicate that there are range-table entries
in the query tree for *NEW* and *OLD*.
These are needed so that they can be referenced by variables in
the INSERT command's query tree.)
The rule is a qualified ALSO rule, so the rule system
has to return two query trees: the modified rule action and the original
query tree. In step 1, the range table of the original query is
incorporated into the rule's action query tree. This results in:
INSERT INTO shoelace_log VALUES (
*NEW*.sl_name, *NEW*.sl_avail,
current_user, current_timestamp )
FROM shoelace_data *NEW*, shoelace_data *OLD*,
shoelace_data shoelace_data;
In step 2, the rule qualification is added to it, so the result set
is restricted to rows where sl_avail changes:
INSERT INTO shoelace_log VALUES (
*NEW*.sl_name, *NEW*.sl_avail,
current_user, current_timestamp )
FROM shoelace_data *NEW*, shoelace_data *OLD*,
shoelace_data shoelace_data
WHERE *NEW*.sl_avail <> *OLD*.sl_avail;
(This looks even stranger, since INSERT ... VALUES doesn't have
a WHERE clause either, but the planner and executor will have no
difficulty with it. They need to support this same functionality
anyway for INSERT ... SELECT.)
In step 3, the original query tree's qualification is added,
restricting the result set further to only the rows that would have been touched
by the original query:
INSERT INTO shoelace_log VALUES (
*NEW*.sl_name, *NEW*.sl_avail,
current_user, current_timestamp )
FROM shoelace_data *NEW*, shoelace_data *OLD*,
shoelace_data shoelace_data
WHERE *NEW*.sl_avail <> *OLD*.sl_avail
AND shoelace_data.sl_name = 'sl7';
Step 4 replaces references to NEW by the target list entries from the
original query tree or by the matching variable references
from the result relation:
INSERT INTO shoelace_log VALUES (
shoelace_data.sl_name, 6,
current_user, current_timestamp )
FROM shoelace_data *NEW*, shoelace_data *OLD*,
shoelace_data shoelace_data
WHERE 6 <> *OLD*.sl_avail
AND shoelace_data.sl_name = 'sl7';
Step 5 changes OLD references into result relation references:
INSERT INTO shoelace_log VALUES (
shoelace_data.sl_name, 6,
current_user, current_timestamp )
FROM shoelace_data *NEW*, shoelace_data *OLD*,
shoelace_data shoelace_data
WHERE 6 <> shoelace_data.sl_avail
AND shoelace_data.sl_name = 'sl7';
That's it. Since the rule is ALSO, we also output the
original query tree. In short, the output from the rule system
is a list of two query trees that correspond to these statements:
INSERT INTO shoelace_log VALUES (
shoelace_data.sl_name, 6,
current_user, current_timestamp )
FROM shoelace_data
WHERE 6 <> shoelace_data.sl_avail
AND shoelace_data.sl_name = 'sl7';
UPDATE shoelace_data SET sl_avail = 6
WHERE sl_name = 'sl7';
These are executed in this order, and that is exactly what
the rule was meant to do.
The substitutions and the added qualifications
ensure that, if the original query would be, say,
UPDATE shoelace_data SET sl_color = 'green'
WHERE sl_name = 'sl7';
no log entry would get written. In that case, the original query
tree does not contain a target list entry for
sl_avail, so NEW.sl_avail will get
replaced by shoelace_data.sl_avail. Thus, the extra
command generated by the rule is
INSERT INTO shoelace_log VALUES (
shoelace_data.sl_name, shoelace_data.sl_avail,
current_user, current_timestamp )
FROM shoelace_data
WHERE shoelace_data.sl_avail <> shoelace_data.sl_avail
AND shoelace_data.sl_name = 'sl7';
and that qualification will never be true.
It will also work if the original query modifies multiple rows. So
if someone issued the command
UPDATE shoelace_data SET sl_avail = 0
WHERE sl_color = 'black';
four rows in fact get updated (sl1, sl2, sl3, and sl4).
But sl3 already has sl_avail = 0. In this case, the original
query trees qualification is different and that results
in the extra query tree
INSERT INTO shoelace_log
SELECT shoelace_data.sl_name, 0,
current_user, current_timestamp
FROM shoelace_data
WHERE 0 <> shoelace_data.sl_avail
AND shoelace_data.sl_color = 'black';
being generated by the rule. This query tree will surely insert
three new log entries. And that's absolutely correct.
Here we can see why it is important that the original query tree
is executed last. If the UPDATE had been
executed first, all the rows would have already been set to zero, so the
logging INSERT would not find any row where
0 <> shoelace_data.sl_avail.
A simple way to protect view relations from the mentioned
possibility that someone can try to run INSERT,
UPDATE, or DELETE on them is
to let those query trees get thrown away. So we create the rules
CREATE RULE shoe_ins_protect AS ON INSERT TO shoe
DO INSTEAD NOTHING;
CREATE RULE shoe_upd_protect AS ON UPDATE TO shoe
DO INSTEAD NOTHING;
CREATE RULE shoe_del_protect AS ON DELETE TO shoe
DO INSTEAD NOTHING;
If someone now tries to do any of these operations on the view
relation shoe, the rule system will
apply these rules. Since the rules have
no actions and are INSTEAD, the resulting list of
query trees will be empty and the whole query will become
nothing because there is nothing left to be optimized or
executed after the rule system is done with it.
A more sophisticated way to use the rule system is to
create rules that rewrite the query tree into one that
does the right operation on the real tables. To do that
on the shoelace view, we create
the following rules:
CREATE RULE shoelace_ins AS ON INSERT TO shoelace
DO INSTEAD
INSERT INTO shoelace_data VALUES (
NEW.sl_name,
NEW.sl_avail,
NEW.sl_color,
NEW.sl_len,
NEW.sl_unit
);
CREATE RULE shoelace_upd AS ON UPDATE TO shoelace
DO INSTEAD
UPDATE shoelace_data
SET sl_name = NEW.sl_name,
sl_avail = NEW.sl_avail,
sl_color = NEW.sl_color,
sl_len = NEW.sl_len,
sl_unit = NEW.sl_unit
WHERE sl_name = OLD.sl_name;
CREATE RULE shoelace_del AS ON DELETE TO shoelace
DO INSTEAD
DELETE FROM shoelace_data
WHERE sl_name = OLD.sl_name;
Now assume that once in a while, a pack of shoelaces arrives at
the shop and a big parts list along with it. But you don't want
to manually update the shoelace view every
time. Instead we setup two little tables: one where you can
insert the items from the part list, and one with a special
trick. The creation commands for these are:
CREATE TABLE shoelace_arrive (
arr_name text,
arr_quant integer
);
CREATE TABLE shoelace_ok (
ok_name text,
ok_quant integer
);
CREATE RULE shoelace_ok_ins AS ON INSERT TO shoelace_ok
DO INSTEAD
UPDATE shoelace
SET sl_avail = sl_avail + NEW.ok_quant
WHERE sl_name = NEW.ok_name;
Now you can fill the table shoelace_arrive with
the data from the parts list:
SELECT * FROM shoelace_arrive;
arr_name | arr_quant
----------+-----------
sl3 | 10
sl6 | 20
sl8 | 20
(3 rows)
Take a quick look at the current data:
SELECT * FROM shoelace;
sl_name | sl_avail | sl_color | sl_len | sl_unit | sl_len_cm
----------+----------+----------+--------+---------+-----------
sl1 | 5 | black | 80 | cm | 80
sl2 | 6 | black | 100 | cm | 100
sl7 | 6 | brown | 60 | cm | 60
sl3 | 0 | black | 35 | inch | 88.9
sl4 | 8 | black | 40 | inch | 101.6
sl8 | 1 | brown | 40 | inch | 101.6
sl5 | 4 | brown | 1 | m | 100
sl6 | 0 | brown | 0.9 | m | 90
(8 rows)
Now move the arrived shoelaces in:
INSERT INTO shoelace_ok SELECT * FROM shoelace_arrive;
and check the results:
SELECT * FROM shoelace ORDER BY sl_name;
sl_name | sl_avail | sl_color | sl_len | sl_unit | sl_len_cm
----------+----------+----------+--------+---------+-----------
sl1 | 5 | black | 80 | cm | 80
sl2 | 6 | black | 100 | cm | 100
sl7 | 6 | brown | 60 | cm | 60
sl4 | 8 | black | 40 | inch | 101.6
sl3 | 10 | black | 35 | inch | 88.9
sl8 | 21 | brown | 40 | inch | 101.6
sl5 | 4 | brown | 1 | m | 100
sl6 | 20 | brown | 0.9 | m | 90
(8 rows)
SELECT * FROM shoelace_log;
sl_name | sl_avail | log_who| log_when
---------+----------+--------+----------------------------------
sl7 | 6 | Al | Tue Oct 20 19:14:45 1998 MET DST
sl3 | 10 | Al | Tue Oct 20 19:25:16 1998 MET DST
sl6 | 20 | Al | Tue Oct 20 19:25:16 1998 MET DST
sl8 | 21 | Al | Tue Oct 20 19:25:16 1998 MET DST
(4 rows)
It's a long way from the one INSERT ... SELECT
to these results. And the description of the query-tree
transformation will be the last in this chapter. First, there is
the parser's output
INSERT INTO shoelace_ok
SELECT shoelace_arrive.arr_name, shoelace_arrive.arr_quant
FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok;
Now the first rule shoelace_ok_ins is applied and turns this
into
UPDATE shoelace
SET sl_avail = shoelace.sl_avail + shoelace_arrive.arr_quant
FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok,
shoelace_ok *OLD*, shoelace_ok *NEW*,
shoelace shoelace
WHERE shoelace.sl_name = shoelace_arrive.arr_name;
and throws away the original INSERT on
shoelace_ok. This rewritten query is passed to
the rule system again, and the second applied rule
shoelace_upd produces
UPDATE shoelace_data
SET sl_name = shoelace.sl_name,
sl_avail = shoelace.sl_avail + shoelace_arrive.arr_quant,
sl_color = shoelace.sl_color,
sl_len = shoelace.sl_len,
sl_unit = shoelace.sl_unit
FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok,
shoelace_ok *OLD*, shoelace_ok *NEW*,
shoelace shoelace, shoelace *OLD*,
shoelace *NEW*, shoelace_data shoelace_data
WHERE shoelace.sl_name = shoelace_arrive.arr_name
AND shoelace_data.sl_name = shoelace.sl_name;
Again it's an INSTEAD rule and the previous query tree is trashed.
Note that this query still uses the view shoelace.
But the rule system isn't finished with this step, so it continues
and applies the _RETURN rule on it, and we get
UPDATE shoelace_data
SET sl_name = s.sl_name,
sl_avail = s.sl_avail + shoelace_arrive.arr_quant,
sl_color = s.sl_color,
sl_len = s.sl_len,
sl_unit = s.sl_unit
FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok,
shoelace_ok *OLD*, shoelace_ok *NEW*,
shoelace shoelace, shoelace *OLD*,
shoelace *NEW*, shoelace_data shoelace_data,
shoelace *OLD*, shoelace *NEW*,
shoelace_data s, unit u
WHERE s.sl_name = shoelace_arrive.arr_name
AND shoelace_data.sl_name = s.sl_name;
Finally, the rule log_shoelace gets applied,
producing the extra query tree
INSERT INTO shoelace_log
SELECT s.sl_name,
s.sl_avail + shoelace_arrive.arr_quant,
current_user,
current_timestamp
FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok,
shoelace_ok *OLD*, shoelace_ok *NEW*,
shoelace shoelace, shoelace *OLD*,
shoelace *NEW*, shoelace_data shoelace_data,
shoelace *OLD*, shoelace *NEW*,
shoelace_data s, unit u,
shoelace_data *OLD*, shoelace_data *NEW*
shoelace_log shoelace_log
WHERE s.sl_name = shoelace_arrive.arr_name
AND shoelace_data.sl_name = s.sl_name
AND (s.sl_avail + shoelace_arrive.arr_quant) <> s.sl_avail;
After that the rule system runs out of rules and returns the
generated query trees.
So we end up with two final query trees that are equivalent to the
SQL statements
INSERT INTO shoelace_log
SELECT s.sl_name,
s.sl_avail + shoelace_arrive.arr_quant,
current_user,
current_timestamp
FROM shoelace_arrive shoelace_arrive, shoelace_data shoelace_data,
shoelace_data s
WHERE s.sl_name = shoelace_arrive.arr_name
AND shoelace_data.sl_name = s.sl_name
AND s.sl_avail + shoelace_arrive.arr_quant <> s.sl_avail;
UPDATE shoelace_data
SET sl_avail = shoelace_data.sl_avail + shoelace_arrive.arr_quant
FROM shoelace_arrive shoelace_arrive,
shoelace_data shoelace_data,
shoelace_data s
WHERE s.sl_name = shoelace_arrive.sl_name
AND shoelace_data.sl_name = s.sl_name;
The result is that data coming from one relation inserted into another,
changed into updates on a third, changed into updating
a fourth plus logging that final update in a fifth
gets reduced into two queries.
There is a little detail that's a bit ugly. Looking at the two
queries, it turns out that the shoelace_data
relation appears twice in the range table where it could
definitely be reduced to one. The planner does not handle it and
so the execution plan for the rule systems output of the
INSERT will be
Nested Loop
-> Merge Join
-> Seq Scan
-> Sort
-> Seq Scan on s
-> Seq Scan
-> Sort
-> Seq Scan on shoelace_arrive
-> Seq Scan on shoelace_data
while omitting the extra range table entry would result in a
Merge Join
-> Seq Scan
-> Sort
-> Seq Scan on s
-> Seq Scan
-> Sort
-> Seq Scan on shoelace_arrive
which produces exactly the same entries in the log table. Thus,
the rule system caused one extra scan on the table
shoelace_data that is absolutely not
necessary. And the same redundant scan is done once more in the
UPDATE. But it was a really hard job to make
that all possible at all.
Now we make a final demonstration of the
EnterpriseDB rule system and its power.
Say you add some shoelaces with extraordinary colors to your
database:
INSERT INTO shoelace VALUES ('sl9', 0, 'pink', 35.0, 'inch', 0.0);
INSERT INTO shoelace VALUES ('sl10', 1000, 'magenta', 40.0, 'inch', 0.0);
We would like to make a view to check which
shoelace entries do not fit any shoe in color.
The view for this is
CREATE VIEW shoelace_mismatch AS
SELECT * FROM shoelace WHERE NOT EXISTS
(SELECT shoename FROM shoe WHERE slcolor = sl_color);
Its output is
SELECT * FROM shoelace_mismatch;
sl_name | sl_avail | sl_color | sl_len | sl_unit | sl_len_cm
---------+----------+----------+--------+---------+-----------
sl9 | 0 | pink | 35 | inch | 88.9
sl10 | 1000 | magenta | 40 | inch | 101.6
Now we want to set it up so that mismatching shoelaces that are
not in stock are deleted from the database.
To make it a little harder for EnterpriseDB,
we don't delete it directly. Instead we create one more view
CREATE VIEW shoelace_can_delete AS
SELECT * FROM shoelace_mismatch WHERE sl_avail = 0;
and do it this way:
DELETE FROM shoelace WHERE EXISTS
(SELECT * FROM shoelace_can_delete
WHERE sl_name = shoelace.sl_name);
Voil:
SELECT * FROM shoelace;
sl_name | sl_avail | sl_color | sl_len | sl_unit | sl_len_cm
---------+----------+----------+--------+---------+-----------
sl1 | 5 | black | 80 | cm | 80
sl2 | 6 | black | 100 | cm | 100
sl7 | 6 | brown | 60 | cm | 60
sl4 | 8 | black | 40 | inch | 101.6
sl3 | 10 | black | 35 | inch | 88.9
sl8 | 21 | brown | 40 | inch | 101.6
sl10 | 1000 | magenta | 40 | inch | 101.6
sl5 | 4 | brown | 1 | m | 100
sl6 | 20 | brown | 0.9 | m | 90
(9 rows)
A DELETE on a view, with a subquery qualification that
in total uses 4 nesting/joined views, where one of them
itself has a subquery qualification containing a view
and where calculated view columns are used,
gets rewritten into
one single query tree that deletes the requested data
from a real table.
There are probably only a few situations out in the real world
where such a construct is necessary. But it makes you feel
comfortable that it works.