SQLAlchemy 0.3 Documentation

Version: 0.3.5 Last Updated: 03/18/07 18:39:07


This tutorial provides a relatively simple walking tour through the basic concepts of SQLAlchemy. You may wish to skip it and dive into the main manual which is more reference-oriented. The examples in this tutorial comprise a fully working interactive Python session, and are guaranteed to be functioning courtesy of doctest.

Installation

Installing SQLAlchemy

Installing SQLAlchemy from scratch is most easily achieved with setuptools. Just run this from the command line:

# easy_install SQLAlchemy

This command will download the latest version of SQLAlchemy from the Python Cheese Shop and install it to your system.

Otherwise, you can install from the distribution using the setup.py script:

# python setup.py install

Installing a Database API

SQLAlchemy is designed to operate with a DBAPI implementation built for a particular database, and includes support for the most popular databases. If you have one of the supported DBAPI implementations, you can proceed to the following section. Otherwise SQLite is an easy-to-use database to get started with, which works with plain files or in-memory databases.

SQLite is included with Python 2.5 and greater.

If you are working with Python 2.3 or 2.4, pysqlite (the Python API for SQLite) and the SQLite library itself must be installed separately.

Note that the SQLite library download is not required with Windows, as the Windows Pysqlite library already includes it linked in. Pysqlite and SQLite can also be installed on Linux or FreeBSD via pre-made packages or from sources.


Getting Started

Imports

To start connecting to databases and begin issuing queries, we want to import the base of SQLAlchemy's functionality, which is provided under the module name of sqlalchemy. For the purposes of this tutorial, we will import its full list of symbols into our own local namespace.

>>> from sqlalchemy import *

Note that importing with the * operator pulls all the names from sqlalchemy into the local module namespace, which in a real application can produce name conflicts. Therefore it's recommended in practice to either import the individual symbols desired (i.e. from sqlalchemy import Table, Column) or to import under a distinct namespace (i.e. import sqlalchemy as sa), as sketched below.
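
For example, either of these styles keeps the namespace clean (illustrative only; the names shown are the ones used throughout this tutorial):

# import only the names you need
from sqlalchemy import Table, Column, Integer, String, BoundMetaData

# or keep everything behind a short alias
import sqlalchemy as sa
metadata = sa.BoundMetaData('sqlite:///tutorial.db')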


Connecting to the Database

After our imports, the next thing we need is a handle to the desired database, represented by an Engine object. This object handles the business of managing connections and dealing with the specifics of a particular database. Below, we will make a SQLite connection to a file-based database called "tutorial.db".

>>> db = create_engine('sqlite:///tutorial.db')

Technically, the above statement has not yet made an actual connection to the SQLite database. As soon as we begin working with the engine, it will start creating connections. In the case of SQLite, the tutorial.db file will actually be created at the moment it is first used, if the file does not already exist.

For full information on creating database engines, including those for SQLite and others, see Database Engines.


SQLAlchemy is Two Libraries in One

Now that the basics of installing SQLAlchemy and connecting to our database are established, we can start actually doing something. But first, a little bit of explanation is required.

A central concept of SQLAlchemy is that it actually contains two distinct areas of functionality, one of which builds upon the other. One is a SQL Construction Language and the other is an Object Relational Mapper ("ORM" for short). The SQL construction language allows you to construct objects called ClauseElements which represent SQL expressions. These ClauseElements can then be executed against any database, where they are compiled into strings appropriate for the target database. Execution returns an object called a ResultProxy, which is essentially a result set that acts very much like a deluxe version of the DBAPI cursor object.

The Object Relational Mapper (ORM) is a set of tools, completely distinct from the SQL Construction Language, which serves the purpose of mapping Python object instances to database rows, providing a rich selection interface with which to retrieve instances from tables as well as a comprehensive solution for persisting changes on those instances back to the database. When working with the ORM, its underlying workings as well as its public API make extensive use of the SQL Construction Language, but the general theory of operation is slightly different. Instead of working with database rows directly, you work with your own user-defined classes and object instances. Additionally, the method of issuing queries to the database is different: the ORM handles the job of generating most of the SQL required, and instead asks for information about what kind of class instances you'd like to load and where you'd like to put them.

Where SA is somewhat unique, more powerful, and slightly more complicated is that the two areas of functionality can be mixed together in many ways. A key strategy for working with SA effectively is to have a solid awareness of these two distinct toolsets and of which concepts of SA belong to each; even some publications have confused the SQL Construction Language with the ORM. The key difference between the two is that when you're working with cursor-like result sets it's the SQL Construction Language, and when you're working with collections of your own class instances it's the Object Relational Mapper.

This tutorial will first focus on the basic configuration that is common to using both the SQL Construction Language as well as the ORM, which is to declare information about your database called table metadata. This will be followed by some constructed SQL examples, and then into usage of the ORM utilizing the same data we established in the SQL construction examples.


Working with Database Objects

Defining Metadata, Binding to Engines

Configuring SQLAlchemy for your database consists of creating objects called Tables, each of which represents an actual table in the database. A collection of Table objects resides in a MetaData object, which is essentially a table collection. We will create a handy form of MetaData that automatically connects to our Engine (connecting a schema object to an Engine is called binding):

>>> metadata = BoundMetaData(db)

An equivalent operation is to create the BoundMetaData object directly with an Engine URL, which invokes create_engine() for us:

>>> metadata = BoundMetaData('sqlite:///tutorial.db')

Now, when we tell "metadata" about the tables in our database, we can issue CREATE statements for those tables, as well as execute SQL statements derived from them, without needing to open or close any connections; all of that will be done automatically.

Note that SQLAlchemy fully supports the usage of explicit Connection objects for all SQL operations, which may be used in conjunction with plain MetaData objects that are entirely unbound to any Engine, providing a more decoupled pattern that allows finer-grained control of connections than the "bound" approach this tutorial will present. For the purposes of this tutorial, we will stick with "bound" objects, as this allows us to focus on SA's general concepts, leaving explicit connection management as a more advanced topic. A brief sketch of the decoupled pattern appears below.
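
As a quick illustration only (this tutorial will not use it further), the unbound pattern looks roughly like this; the table and variable names here are hypothetical, and the connectable argument is the same one demonstrated later in Using Connections:

# a minimal sketch of the "unbound" pattern
unbound_meta = MetaData()                      # not bound to any Engine
things = Table('things', unbound_meta,
    Column('id', Integer, primary_key=True))

engine = create_engine('sqlite:///tutorial.db')
conn = engine.connect()                        # explicit Connection
things.create(connectable=conn)                # pass the Connection explicitly
result = conn.execute(things.select())
conn.close()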


Creating a Table

With metadata as our established home for tables, let's make a Table for it:

>>> users_table = Table('users', metadata,
...     Column('user_id', Integer, primary_key=True),
...     Column('user_name', String(40)),
...     Column('password', String(10))
... )

As you might have guessed, we have just defined a table named users which has three columns: user_id (which is a primary key column), user_name and password. Currently it is just an object that doesn't necessarily correspond to an existing table in our database. To actually create the table, we use the create() method. To make it interesting, we will have SQLAlchemy echo the SQL statements it sends to the database, by setting the echo flag on the Engine associated with our BoundMetaData:

>>> metadata.engine.echo = True
>>> users_table.create() 
CREATE TABLE users (
    user_id INTEGER NOT NULL,
    user_name VARCHAR(40),
    password VARCHAR(10),
    PRIMARY KEY (user_id)
)
...

Alternatively, the users table might already exist (such as if you're running the examples from this tutorial a second time), in which case you can just skip the create() method call. You can even skip defining the individual columns in the users table and ask SQLAlchemy to load its definition from the database:

>>> users_table = Table('users', metadata, autoload=True)
>>> list(users_table.columns)[0].name
'user_id'

Loading a table's columns from the database is called reflection. Documentation on table metadata, including reflection, is available in Database Meta Data.


Inserting Rows

Inserting is achieved via the insert() method, which defines a clause object (known as a ClauseElement) representing an INSERT statement:

>>> i = users_table.insert()
>>> i 
<sqlalchemy.sql._Insert object at 0x...>
>>> # the string form of the Insert object is a generic SQL representation
>>> print i
INSERT INTO users (user_id, user_name, password) VALUES (?, ?, ?)

Since we created this insert statement object from the users table which is bound to our Engine, the statement itself is also bound to the Engine, and supports executing itself. The execute() method of the clause object will compile the object into a string according to the underlying dialect of the Engine to which the statement is bound, and will then execute the resulting statement.

>>> # insert a single row
>>> i.execute(user_name='Mary', password='secure') 
INSERT INTO users (user_name, password) VALUES (?, ?)
['Mary', 'secure']
COMMIT
<sqlalchemy.engine.base.ResultProxy object at 0x...>

>>> # insert multiple rows simultaneously
>>> i.execute({'user_name':'Tom'}, {'user_name':'Fred'}, {'user_name':'Harry'}) 
INSERT INTO users (user_name) VALUES (?)
[['Tom'], ['Fred'], ['Harry']]
COMMIT
<sqlalchemy.engine.base.ResultProxy object at 0x...>

Note that the VALUES clause of each INSERT statement was automatically adjusted to correspond to the parameters sent to the execute() method. This is because the compilation step of a ClauseElement takes into account not just the constructed SQL object and the specifics of the type of database being used, but the execution parameters sent along as well.

When constructing clause objects, SQLAlchemy will bind all literal values into bind parameters. On the construction side, bind parameters are always treated as named parameters. At compilation time, SQLAlchemy will convert them into their proper format, based on the paramstyle of the underlying DBAPI. This works equally well for all named and positional bind parameter formats described in the DBAPI specification.
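
To make this concrete, note that a literal value used in a constructed expression never appears in the compiled SQL text itself; a minimal sketch using the users table from above:

# a sketch: the literal 'Mary' becomes a bind parameter, not part of the SQL text
s = users_table.select(users_table.c.user_name == 'Mary')
print s   # renders "... WHERE users.user_name = ?" (qmark paramstyle, since we are bound to SQLite)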

Documentation on inserting: Inserts.


Selecting

Let's check that the data we have put into the users table is actually there. The procedure is analogous to the insert example above, except you now call the select() method off the users table:

>>> s = users_table.select()
>>> print s
SELECT users.user_id, users.user_name, users.password 
FROM users
>>> r = s.execute()
SELECT users.user_id, users.user_name, users.password 
FROM users
[]

This time, we won't ignore the return value of execute(). It's an instance of ResultProxy, which is a result-holding object that behaves very much like the cursor object one deals with when using a database API directly:

>>> r 
<sqlalchemy.engine.base.ResultProxy object at 0x...>
>>> r.fetchone()
(1, u'Mary', u'secure')
>>> r.fetchall()
[(2, u'Tom', None), (3, u'Fred', None), (4, u'Harry', None)]

Query criteria for the select are specified using Python expressions, using the Column objects in the Table as a base. All expressions constructed from Column objects are themselves instances of ClauseElement, just like the Select, Insert, and Table objects themselves.

>>> r = users_table.select(users_table.c.user_name=='Harry').execute()
SELECT users.user_id, users.user_name, users.password 
FROM users 
WHERE users.user_name = ?
['Harry']
>>> row = r.fetchone()
>>> print row
(4, u'Harry', None)

Pretty much the full range of standard SQL operations is supported as constructed Python expressions, including joins, ordering, grouping, functions, correlated subqueries, unions, etc.; a brief sketch of a few of these follows. Documentation on selecting: Simple Select.
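
For instance, ordering, limiting, and SQL functions can be expressed through keyword arguments and the func construct; a sketch against the users table (the exact keywords here are assumed to follow the patterns documented in Simple Select):

# a sketch of a few more select() features
s = users_table.select(users_table.c.password == None,   # compiles to IS NULL
                       order_by=[desc(users_table.c.user_name)],
                       limit=2)

# SQL functions are available via the "func" object
print select([func.count(users_table.c.user_id)])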


Working with Rows

You can see that when we print out the rows returned by an execution result, it prints the rows as tuples. These rows in fact support both the list and dictionary interfaces. The dictionary interface allows the addressing of columns by string column name, or even the original Column object:

>>> row.keys()
['user_id', 'user_name', 'password']
>>> row['user_id'], row[1], row[users_table.c.password] 
(4, u'Harry', None)

Addressing the columns in a row based on the original Column object is especially handy, as it eliminates the need to work with literal column names altogether.

Result sets also support iteration. We'll show this with a slightly different form of select that allows you to specify the specific columns to be selected:

>>> for row in select([users_table.c.user_id, users_table.c.user_name]).execute(): 
...     print row
SELECT users.user_id, users.user_name
FROM users
[]
(1, u'Mary')
(2, u'Tom')
(3, u'Fred')
(4, u'Harry')

Table Relationships

Let's create a second table, email_addresses, which references the users table. To define the relationship between the two tables, we will use the ForeignKey construct. We will also issue the CREATE statement for the table:

>>> email_addresses_table = Table('email_addresses', metadata,
...     Column('address_id', Integer, primary_key=True),
...     Column('email_address', String(100), nullable=False),
...     Column('user_id', Integer, ForeignKey('users.user_id')))
>>> email_addresses_table.create() 
CREATE TABLE email_addresses (
    address_id INTEGER NOT NULL,
    email_address VARCHAR(100) NOT NULL,
    user_id INTEGER,
    PRIMARY KEY (address_id),
    FOREIGN KEY(user_id) REFERENCES users (user_id)
)
...

Above, the email_addresses table is related to the users table via the ForeignKey('users.user_id'). The ForeignKey constructor can take a Column object or a string representing the table and column name. When using the string argument, the referenced table must exist within the same MetaData object; that's where it looks for the other table!
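
Using the Column object directly is equivalent and avoids the string lookup; a one-line sketch:

Column('user_id', Integer, ForeignKey(users_table.c.user_id))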

Next, let's put a few rows in:

>>> email_addresses_table.insert().execute(
...     {'email_address':'[email protected]', 'user_id':2},
...     {'email_address':'[email protected]', 'user_id':1}) 
INSERT INTO email_addresses (email_address, user_id) VALUES (?, ?)
[['[email protected]', 2], ['[email protected]', 1]]
COMMIT
<sqlalchemy.engine.base.ResultProxy object at 0x...>

With two related tables, we can now construct a join amongst them using the join method:

>>> r = users_table.join(email_addresses_table).select().execute()
SELECT users.user_id, users.user_name, users.password, email_addresses.address_id, email_addresses.email_address, email_addresses.user_id 
FROM users JOIN email_addresses ON users.user_id = email_addresses.user_id
[]
>>> print [row for row in r]
[(1, u'Mary', u'secure', 2, u'[email protected]', 1), (2, u'Tom', None, 1, u'[email protected]', 2)]

The join method is also a standalone function in the sqlalchemy namespace. The join condition is figured out from the foreign keys of the Table objects given. The condition (also called the "ON clause") can be specified explicitly, such as in this example where we locate all users that used their email address as their password:

>>> print join(users_table, email_addresses_table, 
...     and_(users_table.c.user_id==email_addresses_table.c.user_id, 
...     users_table.c.password==email_addresses_table.c.email_address)
...     )
users JOIN email_addresses ON users.user_id = email_addresses.user_id AND users.password = email_addresses.email_address

Working with Object Mappers

Now that we have a little bit of Table and SQL operations covered, let's look into SQLAlchemy's ORM (object relational mapper). With the ORM, you associate Tables (and other Selectable units, like queries and table aliases) with Python classes, into units called Mappers. Then you can execute queries that return lists of object instances, instead of result sets. The object instances themselves are associated with an object called a Session, which automatically tracks changes on each object and supports a "save all at once" operation called a flush.

To start, we will import the names necessary to use SQLAlchemy's ORM, again using import * for simplicity's sake, even though we all know that in real life we should be importing individual names via "from sqlalchemy.orm import symbol1, symbol2, ..." or "import sqlalchemy.orm as orm":

>>> from sqlalchemy.orm import *

It should be noted that the above step is technically not needed when working with the 0.3 series of SQLAlchemy; all symbols from the orm package are also included in the sqlalchemy package. However, a future release (most likely the 0.4 series) will make the separate orm import required in order to use the object relational mapper, so it's good practice for now.

Creating a Mapper

A Mapper is usually created once per Python class, and at its core it says: "objects of this class are to be stored as rows in this table". Let's create a class called User, which will represent a user object that is stored in our users table:

>>> class User(object):
...     def __repr__(self):
...        return "%s(%r,%r)" % (
...            self.__class__.__name__, self.user_name, self.password)

The class is a new-style class (i.e. it extends object) and does not require a constructor (although one may be provided if desired). We just have one __repr__ method on it which will display basic information about the User. Note that the __repr__ method references the instance variables user_name and password, which otherwise aren't defined. While we are free to explicitly define these attributes and treat them normally, this is optional; SQLAlchemy's Mapper construct will manage them for us, since their names correspond to the names of columns in the users table. Let's create a mapper, and observe that these attributes are now defined:

>>> mapper(User, users_table) 
<sqlalchemy.orm.mapper.Mapper object at 0x...>
>>> u1 = User()
>>> print u1.user_name
None
>>> print u1.password
None

The mapper function returns a new instance of Mapper. As it is the first Mapper we have created for the User class, it is known as the class's primary mapper. We generally don't need to hold onto the return value of the mapper function; SA can automatically locate this Mapper as needed when it deals with the User class, as sketched below.
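
For instance, the class_mapper() function from the orm package looks up the primary mapper for a class; a quick sketch:

# a sketch: locating the primary mapper for a class
from sqlalchemy.orm import class_mapper
m = class_mapper(User)   # the same Mapper object returned by mapper() above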


Obtaining a Session

After you create a Mapper, all operations with that Mapper require the usage of an important object called a Session. All objects loaded or saved by the Mapper must be attached to a Session object, which represents a kind of "workspace" of objects that are loaded into memory. A particular object instance can only be attached to one Session at a time (but of course can be moved around or detached altogether).

By default, you have to create a Session object explicitly before you can load or save objects. There are several ways to manage sessions, but the most straightforward is to just create one, which we will do by calling create_session():

>>> session = create_session()
>>> session 
<sqlalchemy.orm.session.Session object at 0x...>

The Query Object

The Session has all kinds of methods on it to manage and inspect its collection of objects. The Session also provides an easy interface which can be used to query the database, by giving you an instance of a Query object corresponding to a particular Python class:

>>> query = session.query(User)
>>> print query.select_by(user_name='Harry')
SELECT users.user_name AS users_user_name, users.password AS users_password, users.user_id AS users_user_id 
FROM users 
WHERE users.user_name = ? ORDER BY users.oid
['Harry']
[User(u'Harry',None)]

All querying for objects is performed via an instance of Query. The various select methods on an instance of Mapper also use an underlying Query object to perform the operation. A Query is always bound to a specific Session.

Let's turn off the database echoing for a moment, and try out a few methods on Query. Methods that end with the suffix _by primarily take keyword arguments which correspond to properties on the object. Other methods take ClauseElement objects, which are constructed by using Column objects inside of Python expressions, in the same way as we did with our SQL select example in the previous section of this tutorial. Using ClauseElement structures to query objects is more verbose but more flexible:

>>> metadata.engine.echo = False
>>> print query.select(User.c.user_id==3)
[User(u'Fred',None)]
>>> print query.get(2)
User(u'Tom',None)
>>> print query.get_by(user_name='Mary')
User(u'Mary',u'secure')
>>> print query.selectfirst(User.c.password==None)
User(u'Tom',None)
>>> print query.count()
4

Notice that our User class has a special attribute c attached to it. This 'c' represents the columns of the Table object associated with the User's mapper. Saying User.c.user_name is synonymous with saying users_table.c.user_name, recalling that User is the Python class and users_table is our Table object.


Making Changes

With a little experience in loading objects, let's see what it's like to make changes. First, let's create a new user, "Ed". We do this by just constructing the new object. Then, we just add it to the session:

>>> ed = User()
>>> ed.user_name = 'Ed'
>>> ed.password = 'edspassword'
>>> session.save(ed)
>>> ed in session
True

Let's also make a few changes on some of the objects in the database. We will load them with our Query object, and then change some things.

>>> mary = query.get_by(user_name='Mary')
>>> harry = query.get_by(user_name='Harry')
>>> mary.password = 'marysnewpassword'
>>> harry.password = 'harrysnewpassword'

At the moment, nothing has been saved to the database; all of our changes are in memory only. What happens if some other part of the application also tries to load 'Mary' from the database and make some changes before we have had a chance to save them? Assuming that the same Session is used, loading 'Mary' from the database a second time will issue a second query in order to locate the primary key of 'Mary', but will return the same object instance as the one already loaded. This behavior is due to an important property of the Session known as the identity map:

>>> mary2 = query.get_by(user_name='Mary')
>>> mary is mary2
True

With the identity map, a single Session can be relied upon to keep all loaded instances straight.

As far as the issue of the same object being modified in two different Sessions, that's an issue of concurrency detection; SQLAlchemy does some basic concurrency checks when saving objects, with the option for a stronger check using version ids. See advdatamapping_arguments for more details.


Saving

With a new user "ed" and some changes made on "Mary" and "Harry", let's also mark "Fred" as deleted:

>>> fred = query.get_by(user_name='Fred')
>>> session.delete(fred)

Then, to send all of our changes to the database, we flush() the Session. Let's turn echo back on to see this happen:

>>> metadata.engine.echo = True
>>> session.flush()
BEGIN
UPDATE users SET password=? WHERE users.user_id = ?
['marysnewpassword', 1]
UPDATE users SET password=? WHERE users.user_id = ?
['harrysnewpassword', 4]
INSERT INTO users (user_name, password) VALUES (?, ?)
['Ed', 'edspassword']
DELETE FROM users WHERE users.user_id = ?
[3]
COMMIT

Relationships

When our User object contains relationships to other kinds of information, such as a list of email addresses, we can indicate this by using a function called relation() when creating the Mapper. While there is a lot you can do with relations, we'll cover a simple one here. First, recall that our users table has a foreign key relationship to another table called email_addresses. A single row in email_addresses has a column user_id that references a row in the users table; since many rows in the email_addresses table can reference a single row in users, this is called a one-to-many relationship.

To illustrate this relationship, we will start with a new mapper configuration. Since our User class has a mapper assigned to it, we want to discard it and start over again. So we issue the clear_mappers() function first, which removes all mapping associations from classes:

>>> clear_mappers()

When removing mappers, it is usually best to remove all mappings at the same time, since mappers usually have relationships to each other which will become invalid if only part of the mapper collection is removed. In practice, a particular mapping setup will usually remain throughout the lifetime of an application. Clearing out the mappers and making new ones is a practice that is generally limited to writing mapper unit tests and experimenting from the console.

Next, we want to create a class/mapping that corresponds to the email_addresses table. We will create a new class Address which represents a single row in the email_addresses table, and a corresponding Mapper which will associate the Address class with the email_addresses table:

>>> class Address(object):
...     def __init__(self, email_address):
...         self.email_address = email_address
...     def __repr__(self):
...         return "%s(%r)" % (
...            self.__class__.__name__, self.email_address)    
>>> mapper(Address, email_addresses_table) 
<sqlalchemy.orm.mapper.Mapper object at 0x...>

We then create a mapper for the User class which contains a relationship to the Address class using the relation() function:

>>> mapper(User, users_table, properties={ 
...    'addresses':relation(Address)
... })
<sqlalchemy.orm.mapper.Mapper object at 0x...>

The relation() function takes either a class or a Mapper as its first argument, and has many options to further control its behavior (a brief sketch of two of them appears below). When this mapping relationship is used, each new User instance will contain an attribute called addresses. SQLAlchemy will automatically determine that this relationship is a one-to-many relationship, and will subsequently create addresses as a list. When a new User is created, this list will begin as empty.
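
For illustration only (not part of this tutorial's flow, which keeps the simpler mapping above), two commonly used relation() options are backref, which adds a reverse 'user' attribute to Address, and lazy, which controls when the related rows are loaded:

# a sketch of relation() options; in a live session you would
# call clear_mappers() first, as shown earlier
mapper(User, users_table, properties={
    'addresses': relation(Address, backref='user', lazy=True)
})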

The order in which the mapping definitions for User and Address are created is not significant. When the mapper() function is called, it creates an uncompiled mapping record corresponding to the given class/table combination. When the mappers are first used, the entire collection of mappers created up until that point will be compiled, which involves the establishment of class instrumentation as well as the resolution of all mapping relationships.

Let's try out this new mapping configuration, and see what we get for the email addresses already in the database. Since we have made a new mapping configuration, it's best that we clear out our Session, which is currently holding onto every User object we have already loaded:

>>> session.clear()

We can then treat the addresses attribute on each User object like a regular list:

>>> mary = query.get_by(user_name='Mary') 
SELECT users.user_name AS users_user_name, users.password AS users_password, users.user_id AS users_user_id 
FROM users 
WHERE users.user_name = ? ORDER BY users.oid 
LIMIT 1 OFFSET 0
['Mary']
>>> print [a for a in mary.addresses]
SELECT email_addresses.user_id AS email_addresses_user_id, email_addresses.address_id AS email_addresses_address_id, email_addresses.email_address AS email_addresses_email_address 
FROM email_addresses 
WHERE ? = email_addresses.user_id ORDER BY email_addresses.oid
[1]
[Address(u'[email protected]')]

Adding to the list is just as easy. New Address objects will be detected and saved when we flush the Session:

>>> mary.addresses.append(Address('[email protected]'))
>>> session.flush() 
BEGIN
INSERT INTO email_addresses (email_address, user_id) VALUES (?, ?)
['[email protected]', 1]
COMMIT

Main documentation for using mappers: Data Mapping


Transactions

You may have noticed from the example above that when we say session.flush(), SQLAlchemy prints BEGIN and COMMIT, indicating a transaction with the database. The flush() method, since it may execute many statements in a row, automatically uses a transaction in order to execute these instructions. But what if we want to use flush() inside of a larger transaction? This is performed via the SessionTransaction object, which we can establish using session.create_transaction(). Below, we will perform a more complicated SELECT statement, make several changes to our collection of users and email addresses, and then create a new user with two email addresses, within the context of a transaction. We will perform a flush() in the middle of it to write the changes we have so far, and then allow the remaining changes to be written when we finally commit() the transaction. We enclose our operations within a try/except block to ensure that resources are properly freed:

>>> transaction = session.create_transaction()
>>> try: 
...     (ed, harry, mary) = session.query(User).select(
...         User.c.user_name.in_('Ed', 'Harry', 'Mary'), order_by=User.c.user_name
...     )
...     del mary.addresses[1]
...     harry.addresses.append(Address('[email protected]'))
...     session.flush()
...     print "***flushed the session***"
...     fred = User()
...     fred.user_name = 'fred_again'
...     fred.addresses.append(Address('[email protected]'))
...     fred.addresses.append(Address('[email protected]'))
...     session.save(fred)
...     transaction.commit()
... except:
...     transaction.rollback()
...     raise
BEGIN
SELECT users.user_name AS users_user_name, users.password AS users_password, users.user_id AS users_user_id 
FROM users 
WHERE users.user_name IN (?, ?, ?) ORDER BY users.user_name
['Ed', 'Harry', 'Mary']
SELECT email_addresses.user_id AS email_addresses_user_id, email_addresses.address_id AS email_addresses_address_id, email_addresses.email_address AS email_addresses_email_address 
FROM email_addresses 
WHERE ? = email_addresses.user_id ORDER BY email_addresses.oid
[4]
UPDATE email_addresses SET user_id=? WHERE email_addresses.address_id = ?
[None, 3]
INSERT INTO email_addresses (email_address, user_id) VALUES (?, ?)
['[email protected]', 4]
***flushed the session***    
INSERT INTO users (user_name, password) VALUES (?, ?)
['fred_again', None]
INSERT INTO email_addresses (email_address, user_id) VALUES (?, ?)
['[email protected]', 6]
INSERT INTO email_addresses (email_address, user_id) VALUES (?, ?)
['[email protected]', 6]
COMMIT

Main documentation: Session / Unit of Work


Next Steps

That covers a quick tour through the basic idea of SQLAlchemy, in its simplest form. Beyond that, one should familiarize oneself with the basics of Sessions, the various patterns that can be used to define different kinds of Mappers and relations among them, the rudimentary SQL types that are available when constructing Tables, and the basics of Engines, SQL statements, and database Connections.

Database Engines

A database engine is a subclass of sqlalchemy.sql.Engine, and is the starting point for where SQLAlchemy provides a layer of abstraction on top of the various DBAPI2 database modules. For each database supported by SA, there is a specific "implementation" module, found in the sqlalchemy.databases package, that provides all the objects an Engine needs in order to perform its job. A typical user of SQLAlchemy never needs to deal with these modules directly. For many purposes, the only knowledge that's needed is how to create an Engine for a particular connection URL. When dealing with direct execution of SQL statements, one would also be aware of the ResultProxy, Connection, and Transaction objects. The primary public-facing objects are:

  • URL - represents the identifier for a particular database. URL objects are usually created automatically based on a given connect string passed to the create_engine() function.
  • Engine - Combines a connection-providing resource with implementation-provided objects that know how to generate, execute, and gather information about SQL statements. It also provides the primary interface by which Connections are obtained, as well as a context for constructed SQL objects and schema constructs to "implicitly execute" themselves, which is an optional feature of SQLAlchemy.

  • Connection - represents a connection to the database. The underlying connection object returned by a DBAPI's connect() method is referenced internally by the Connection object. Connection provides methods that handle the execution of SQLAlchemy's own SQL constructs, as well as literal string-based statements.

  • Transaction - represents a transaction on a single Connection. Includes begin(), commit() and rollback() methods that support basic "nestable" behavior, meaning an outermost transaction is maintained against multiple nested calls to begin/commit.
  • ResultProxy - Represents the results of an execution, and is most analogous to the cursor object in DBAPI. It primarily allows iteration over result sets, but also provides an interface to information about inserts/updates/deletes, such as the count of rows affected, last inserted IDs, etc.
  • RowProxy - Represents a single row returned by the fetchone() method on ResultProxy.

Underneath the public-facing API of Engine, several components are provided by database implementations to provide the full behavior, including:

  • Dialect - this object is provided by database implementations to describe the behavior of a particular database. It acts as a repository for metadata about a database's characteristics, and provides factory methods for other objects that deal with generating SQL strings and objects that handle some of the details of statement execution.

  • ConnectionProvider - this object knows how to return a DBAPI connection object. It typically talks to a connection pool which maintains one or more connections in memory for quick re-use.
  • ExecutionContext - this object is created for each execution of a single SQL statement, and tracks information about its execution such as primary keys inserted, the total count of rows affected, etc. It also may implement any special logic that various DBAPI implementations may require before or after a statement execution.
  • Compiled - represents a "compiled" SQL expression object. Includes a compile() method which receives SQL expression objects and assembles them into strings that are suitable for direct execution. Also collects default bind parameters into a data structure that will be converted at execution time into a dictionary or list, depending on the dialect's paramstyle.
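
As a small illustration, every ClauseElement has a compile() method which produces a Compiled object; a sketch, assuming a users_table bound to an engine as in the tutorial:

# a sketch: obtaining a Compiled object from a ClauseElement
c = users_table.select().compile()   # compiled for the bound engine's dialect
print c                              # the generated SQL string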

Supported Databases

Engines exist for SQLite, Postgres, MySQL, and Oracle, using the pysqlite, psycopg2 (psycopg1 will work to a limited degree, but it is not supported), MySQLdb, and cx_Oracle modules. There is also preliminary support for MS-SQL using adodbapi or pymssql, as well as for Firebird. For each engine, a distinct Python module exists in the sqlalchemy.databases package, which provides implementations of some of the objects mentioned in the previous section.


The SQLAlchemy Wiki contains a page of database notes, describing whatever quirks and behaviors have been observed. It's a good place to check for issues with specific databases; see the "Database Notes" page there.


Establishing a Database Engine

SQLAlchemy indicates the source of an Engine strictly via RFC-1738 style URLs, combined with optional keyword arguments to specify options for the Engine. The form of the URL is:

driver://username:password@host:port/database

Available driver names are sqlite, mysql, postgres, oracle, mssql, and firebird. For sqlite, the database name is the filename to connect to, or the special name ":memory:" which indicates an in-memory database. The URL is typically sent as a string to the create_engine() function:

# postgres
pg_db = create_engine('postgres://scott:tiger@localhost:5432/mydatabase')

# sqlite (note the four slashes for an absolute path)
sqlite_db = create_engine('sqlite:////absolute/path/to/database.txt')
sqlite_db = create_engine('sqlite:///relative/path/to/database.txt')
sqlite_db = create_engine('sqlite://')  # in-memory database

# mysql
mysql_db = create_engine('mysql://localhost/foo')

# oracle via TNS name
oracle_db = create_engine('oracle://scott:tiger@dsn')

# oracle will feed host/port/SID into cx_oracle.makedsn
oracle_db = create_engine('oracle://scott:[email protected]:1521/sidname')

The Engine will create its first connection to the database when a SQL statement is executed. As concurrent statements are executed, the underlying connection pool will grow to a default size of five connections, and will allow a default "overflow" of ten (these numbers correspond to the pool_size and max_overflow options described below). Since the Engine is essentially "home base" for the connection pool, it follows that you should keep a single Engine per database established within an application, rather than creating a new one for each connection.
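
A sketch of tuning those pool numbers at engine creation (see Database Engine Options below for the full list of options):

db = create_engine('postgres://scott:tiger@localhost/mydatabase',
                   pool_size=10, max_overflow=20)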

Custom DBAPI keyword arguments

Custom arguments can be passed to the underlying DBAPI in three ways. String-based arguments can be passed directly from the URL string as query arguments:

db = create_engine('postgres://scott:tiger@localhost/test?argument1=foo&argument2=bar')

If SQLAlchemy's database connector is aware of a particular query argument, it may convert its type from string to its proper type.

create_engine also takes an argument connect_args, an additional dictionary that will be passed to the DBAPI's connect() function. This can be used when arguments of a type other than string are required, and SQLAlchemy's database connector has no type conversion logic present for that parameter:

db = create_engine('postgres://scott:tiger@localhost/test', connect_args = {'argument1':17, 'argument2':'bar'})

The most customizable connection method of all is to pass a creator argument, which specifies a callable that returns a DBAPI connection:

def connect():
    return psycopg.connect(user='scott', host='localhost')

db = create_engine('postgres://', creator=connect)

Database Engine Options

Keyword options can also be specified to create_engine(), following the string URL as follows:

db = create_engine('postgres://...', encoding='latin1', echo=True, module=psycopg1)

A list of all standard options, as well as several that are used by particular database dialects, is as follows:

  • convert_unicode=False - if set to True, all String/character based types will convert Unicode values to raw byte values going into the database, and all raw byte values to Python Unicode coming out in result sets. This is an engine-wide method to provide unicode conversion across the board. For unicode conversion on a column-by-column level, use the Unicode column type instead, described in The Types System.
  • echo=False - if True, the Engine will log all statements as well as a repr() of their parameter lists to the engine's logger, which defaults to sys.stdout. The echo attribute of Engine can be modified at any time to turn logging on and off. If set to the string "debug", result rows will be printed to the standard output as well. This flag ultimately controls a Python logger; see Configuring Logging for information on how to configure logging directly.
  • echo_pool=False - if True, the connection pool will log all checkouts/checkins to the logging stream, which defaults to sys.stdout. This flag ultimately controls a Python logger; see Configuring Logging for information on how to configure logging directly.
  • encoding='utf-8' - the encoding to use for all Unicode translations, both by engine-wide unicode conversion as well as the Unicode type object.
  • module=None - used by database implementations which support multiple DBAPI modules, this is a reference to a DBAPI2 module to be used instead of the engine's default module. For Postgres, the default is psycopg2, or psycopg1 if 2 cannot be found. For Oracle, it's cx_Oracle.
  • pool=None - an actual pool instance. Note that an already-constructed pool should already know how to create database connections, so this option supersedes any other connect options specified. Typically, it is an instance of sqlalchemy.pool.Pool to be used as the underlying source for connections. For more on connection pooling, see Connection Pooling.

Example of a manual invocation of pool.QueuePool (the pool implementation used for all databases except sqlite):

from sqlalchemy import *
import sqlalchemy.pool as pool
import MySQLdb

def getconn():
    return MySQLdb.connect(user='ed', db='mydb')

engine = create_engine('mysql://', pool=pool.QueuePool(getconn, pool_size=20, max_overflow=40))
  • poolclass=None - a sqlalchemy.pool.Pool subclass that will be instantiated in place of the default connection pool.
  • max_overflow=10 - the number of connections to allow in connection pool "overflow", that is, connections that can be opened above and beyond the pool_size setting, which defaults to five. This is only used with QueuePool.
  • pool_size=5 - the number of connections to keep open inside the connection pool. This is used with QueuePool as well as SingletonThreadPool.
  • pool_recycle=-1 - this setting causes the pool to recycle connections after the given number of seconds has passed. It defaults to -1, or no timeout. For example, setting it to 3600 means connections will be recycled after one hour. Note that MySQL in particular will disconnect automatically if no activity is detected on a connection for eight hours (although this is configurable with the MySQLdb connection itself and the server configuration as well).
  • pool_timeout=30 - number of seconds to wait before giving up on getting a connection from the pool. This is only used with QueuePool.
  • strategy='plain' - the Strategy argument is used to select alternate implementations of the underlying Engine object, which coordinates operations between dialects, compilers, connections, and so on. Currently, the only alternate strategy besides the default value of "plain" is the "threadlocal" strategy, which selects the usage of the TLEngine class that provides a modified connection scope for implicit executions. Implicit execution as well as further detail on this setting are described in Implicit Execution.
  • threaded=True - used by cx_Oracle; sets the threaded parameter of the connection indicating thread-safe usage. cx_Oracle docs indicate setting this flag to False will speed performance by 10-15%. While this defaults to False in cx_Oracle, SQLAlchemy defaults it to True, preferring stability over early optimization.
  • use_ansi=True - used only by Oracle; when False, the Oracle driver attempts to support a particular "quirk" of Oracle versions 8 and previous, in which the LEFT OUTER JOIN SQL syntax is not supported, and the "Oracle join" syntax of column1(+)=column2 must be used in order to achieve a LEFT OUTER JOIN.
  • use_oids=False - used only by Postgres, will enable the column name "oid" as the object ID column, which is also used for the default sort order of tables. Postgres as of 8.1 has object IDs disabled by default.

Configuring Logging

As of the 0.3 series of SQLAlchemy, Python's standard logging module is used to implement informational and debug log output. This allows SQLAlchemy's logging to integrate in a standard way with other applications and libraries. The echo and echo_pool flags that are present on create_engine(), as well as the echo_uow flag used on Session, all interact with regular loggers.

This section assumes familiarity with Python's logging module. All logging performed by SQLAlchemy exists underneath the sqlalchemy namespace, as used by logging.getLogger('sqlalchemy'). When logging has been configured (i.e. such as via logging.basicConfig()), the general namespace of SA loggers that can be turned on is as follows:

  • sqlalchemy.engine - controls SQL echoing. set to logging.INFO for SQL query output, logging.DEBUG for query + result set output.
  • sqlalchemy.pool - controls connection pool logging. set to logging.INFO or lower to log connection pool checkouts/checkins.
  • sqlalchemy.orm - controls logging of various ORM functions. set to logging.INFO for configurational logging as well as unit of work dumps, logging.DEBUG for extensive logging during query and flush() operations. Subcategories of sqlalchemy.orm include:
    • sqlalchemy.orm.attributes - logs certain instrumented attribute operations, such as triggered callables
    • sqlalchemy.orm.mapper - logs Mapper configuration and operations
    • sqlalchemy.orm.unitofwork - logs flush() operations, including dependency sort graphs and other operations
    • sqlalchemy.orm.strategies - logs relation loader operations (i.e. lazy and eager loads)
    • sqlalchemy.orm.sync - logs synchronization of attributes from parent to child instances during a flush()

For example, to log SQL queries as well as unit of work debugging:

import logging

logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)
logging.getLogger('sqlalchemy.orm.unitofwork').setLevel(logging.DEBUG)

By default, the log level is set to logging.ERROR within the entire sqlalchemy namespace so that no log operations occur, even within an application that has logging enabled otherwise.

The echo flags present as keyword arguments to create_engine() and others, as well as the echo property on Engine, when set to True, will first attempt to ensure that logging is enabled. Unfortunately, the logging module provides no way of determining if output has already been configured (note we are referring to whether a logging configuration has been set up, not just whether the logging level is set). For this reason, any echo=True flag will result in a call to logging.basicConfig() using sys.stdout as the destination. It also sets up a default format using the level name, timestamp, and logger name. Note that this configuration has the effect of being applied in addition to any existing logger configurations. Therefore, when using Python logging, ensure all echo flags are set to False at all times, to avoid getting duplicate log lines; a sketch of this pattern follows.
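
In other words, if your application configures logging itself, leave echo at its default and set logger levels directly:

import logging

logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)

# echo is left at its default of False, so no duplicate handler is installed
db = create_engine('sqlite:///tutorial.db')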


Using Connections

In this section we describe the SQL execution interface available from an Engine instance. Note that when using the Object Relational Mapper (ORM) as well as when dealing with "bound" metadata objects, SQLAlchemy deals with the Engine and Connections for you and you generally don't need to know much about them; in those cases, you can skip this section and go to Database Meta Data. "Bound" metadata is described in Binding MetaData to an Engine.

The Engine provides a connect() method which returns a Connection object. Connection is a proxy object which maintains a reference to a DBAPI connection instance. This object provides methods by which literal SQL text as well as SQL clause constructs can be compiled and executed.

engine = create_engine('sqlite:///:memory:')
connection = engine.connect()
result = connection.execute("select * from mytable where col1=:col1", col1=5)
for row in result:
    print row['col1'], row['col2']
connection.close()

The close method on Connection does not actually close the underlying connection to the database, but rather indicates that the underlying resources can be returned to the connection pool. When using the connect() method, the DBAPI connection referenced by the Connection object is not referenced anywhere else.

In both execution styles above, the Connection object will also automatically return its resources to the connection pool when the object is garbage collected, i.e. when its __del__() method is called. When using the standard C implementation of Python, this usually happens as soon as the object is dereferenced. With other Python implementations such as Jython, this is not guaranteed.

The execute method on Engine and Connection can also receive SQL clause constructs as well, which are described in Constructing SQL Queries via Python Expressions:

connection = engine.connect()
result = connection.execute(select([table1], table1.c.col1==5))
for row in result:
    print row['col1'], row['col2']
connection.close()

Both Connection and Engine fulfill an interface known as Connectable which specifies common functionality between the two objects, such as getting a Connection and executing queries. Therefore, most SQLAlchemy functions which take an Engine as a parameter with which to execute SQL will also accept a Connection (the argument is typically named connectable):

Specify Engine or Connection
engine = create_engine('sqlite:///:memory:')

# specify some Table metadata
metadata = MetaData()
table = Table('sometable', metadata, Column('col1', Integer))

# create the table with the Engine
table.create(connectable=engine)

# drop the table with a Connection off the Engine
connection = engine.connect()
table.drop(connectable=connection)

Connection facts:

  • the Connection object is not threadsafe. While a Connection can be shared among threads using properly synchronized access, it is still not recommended, as many DBAPIs have issues with, if not outright disallow, sharing of connection state between threads.
  • The Connection object represents a single DBAPI connection checked out from the connection pool. In this state, the connection pool has no effect upon the connection, including its expiration or timeout state. For the connection pool to properly manage connections, connections should be returned to the connection pool (i.e. Connection.close()) whenever the connection is not in use. If your application needs to manage multiple connections or is otherwise long-running (this includes all web applications, threaded or not), don't hold a single connection open at the module level.

Transactions

The Connection object provides a begin() method which returns a Transaction object. This object is usually used within a try/except clause so that it is guaranteed to rollback() or commit():

trans = connection.begin()
try:
    r1 = connection.execute(table1.select())
    connection.execute(table1.insert(), col1=7, col2='this is some data')
    trans.commit()
except:
    trans.rollback()
    raise

The Transaction object also handles "nested" behavior by keeping track of the outermost begin/commit pair. In this example, two functions both issue a transaction on a Connection, but only the outermost Transaction object actually takes effect when it is committed.

# method_a starts a transaction and calls method_b
def method_a(connection):
    trans = connection.begin() # open a transaction
    try:
        method_b(connection)
        trans.commit()  # transaction is committed here
    except:
        trans.rollback() # this rolls back the transaction unconditionally
        raise

# method_b also starts a transaction
def method_b(connection):
    trans = connection.begin() # open a transaction - this runs in the context of method_a's transaction
    try:
        connection.execute("insert into mytable values ('bat', 'lala')")
        connection.execute(mytable.insert(), col1='bat', col2='lala')
        trans.commit()  # transaction is not committed yet
    except:
        trans.rollback() # this rolls back the transaction unconditionally
        raise

# open a Connection and call method_a
conn = engine.connect()                
method_a(conn)
conn.close()

Above, method_a is called first, which calls connection.begin(). Then it calls method_b. When method_b calls connection.begin(), it just increments a counter that is decremented when it calls commit(). If either method_a or method_b calls rollback(), the whole transaction is rolled back. The transaction is not committed until method_a calls the commit() method.

Note that SQLAlchemy's Object Relational Mapper also provides a way to control transaction scope at a higher level; this is described in SessionTransaction.

Transaction Facts:

  • the Transaction object, just like its parent Connection, is not threadsafe.

Implicit Execution

Implicit execution refers to the execution of SQL without the explicit usage of a Connection object. This occurs when you call the execute() method off of an Engine object or off of a SQL expression or table that is associated with "bound" metadata.

Implicit Execution Using Engine
engine = create_engine('sqlite:///:memory:')
result = engine.execute("select * from mytable where col1=:col1", col1=5)
for row in result:
    print row['col1'], row['col2']
result.close()

Using "bound" metadata:

Implicit Execution Using Engine-Bound SQL Construct
engine = create_engine('sqlite:///:memory:')
meta = BoundMetaData(engine)
table = Table('mytable', meta, Column('col1', Integer), Column('col2', String(20)))
r = table.insert().execute(col1=5, col2='some record')

Notice that in the above two examples, no connect() method is ever called, nor do we ever see a Connection anywhere; the Connection is created for you automatically via the execute() method, and a handle to the execution's cursor remains open in the returned result set. When the result set is closed via the close() method, or if the result set object falls out of scope and is garbage collected, the underlying cursor is closed, the Connection is discarded, and the underlying DBAPI connection is returned to the connection pool.

The purpose of the "implicit" connection is strictly one of convenience; while in SQLAlchemy 0.1 it was the only style of operation, it is now optional.

Implicit Execution Strategies

The internal behavior of the Engine during implicit execution can be affected by the strategy keyword argument to create_engine(). Generally, this setting can be left at its default value of plain. However, for the advanced user, the threadlocal option can manage connections against the current thread from which they were pulled from the connection pool, so that the same underlying DBAPI connection, as well as a single database-level transaction, can be shared by many operations without explicitly passing a Connection or Transaction object around. It also may reduce the number of connections checked out from the connection pool at a given time.

Note that this setting does not affect the fact that Connection and Transaction objects are not threadsafe. The "threadlocal" strategy affects the selection of DBAPI connections which are pulled from the connection pool when a Connection object is created, but does not synchronize method access to the Connection or Transaction instances themselves, which are only proxy objects. It is instead intended that many Connection instances would share access to a single "connection" object that is referenced in relation to the current thread.

When strategy is set to plain, each implicit execution requests a unique connection from the connection pool, which is returned to the pool when the resulting ResultProxy falls out of scope (i.e. __del__() is called) or its close() method is called. If a second implicit execution occurs while the ResultProxy from the previous execution is still open, then a second connection is pulled from the pool.

When strategy is set to threadlocal, the Engine still checks out a connection which is closeable in the same manner via the ResultProxy, except the connection it checks out will be the same connection as one which is already checked out, assuming the operation is in the same thread. When all ResultProxy objects are closed in a particular thread, the connection is returned to the pool normally.

An additional feature of the threadlocal strategy is that Transaction objects can be managed implicitly as well, by calling the begin(), commit() and rollback() methods off of the Engine, or by using Transaction objects from the thread-local connection. A sketch of the engine-level methods follows.
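
A minimal sketch of those engine-level transaction methods (assuming a mytable as in the other examples in this section):

# engine-level transactions under the "threadlocal" strategy
db = create_engine('mysql://localhost/test', strategy='threadlocal')

db.begin()
try:
    db.execute("insert into mytable values ('bat', 'lala')")
    db.commit()
except:
    db.rollback()
    raise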

It is crucial to note that the plain and threadlocal strategies do not impact the connect() method on the Engine; connect() always returns a unique connection. Implicit connections use a different method on the Engine for their operations, called contextual_connect().

By default, every call to execute pulls a dedicated DBAPI connection from the connection pool:

Plain Strategy
db = create_engine('mysql://localhost/test', strategy='plain')

# execute one statement and receive results.  r1 now references a DBAPI connection resource.
r1 = db.execute("select * from table1")

# execute a second statement and receive results.  r2 now references a *second* DBAPI connection resource.
r2 = db.execute("select * from table2")
for row in r1:
    ...
for row in r2:
    ...
# release connection 1
r1.close()

# release connection 2
r2.close()

Using the "threadlocal" strategy, all calls to execute within the same thread will be guaranteed to use the same underlying DBAPI connection, which is only returned to the connection pool when all ResultProxy instances have been closed.

Threadlocal Strategy
db = create_engine('mysql://localhost/test', strategy='threadlocal')

# execute one statement and receive results.  r1 now references a DBAPI connection resource.
r1 = db.execute("select * from table1")

# execute a second statement and receive results.  r2 now references the *same* resource as r1
r2 = db.execute("select * from table2")

for row in r1:
    ...
for row in r2:
    ...
# dereference r1.  the connection is still held by r2.
r1 = None

# dereference r2.  with no more references to the underlying connection resources, they
# are returned to the pool.
r2 = None

To get at the actual Connection object which is used by implicit executions, call the contextual_connect() method on Engine:

Contextual Connection
# threadlocal strategy
db = create_engine('mysql://localhost/test', strategy='threadlocal')

conn1 = db.contextual_connect()
conn2 = db.contextual_connect()

>>> conn1.connection is conn2.connection
True

When the plain strategy is used, the contextual_connect() method is synonymous with the connect() method; both return a distinct connection from the pool.
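Conversely, a sketch of the same check under the plain strategy, where each contextual connection is distinct:

# plain strategy
db = create_engine('mysql://localhost/test', strategy='plain')

conn1 = db.contextual_connect()
conn2 = db.contextual_connect()

>>> conn1.connection is conn2.connection
False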

One programming pattern that the threadlocal strategy supports is transparent connection and transaction sharing.

threadlocal connection sharing
db = create_engine('mysql://localhost/test', strategy='threadlocal')

def dosomethingimplicit():
    table1.execute("some sql")
    table1.execute("some other sql")

def dosomethingelse():
    table2.execute("some sql")
    conn = db.contextual_connect()
    # do stuff with conn
    conn.execute("some other sql")
    conn.close()

def dosomethingtransactional():
    conn = db.contextual_connect()
    trans = conn.begin()
    # do stuff
    trans.commit()

db.begin()
try:
    dosomethingimplicit()
    dosomethingelse()
    dosomethingtransactional()
    db.commit()
except:
    db.rollback()

In the above example, the program calls three functions dosomethingimplicit(), dosomethingelse() and dosomethingtransactional(). In all three functions, either implicit execution is used, or an explicit Connection is used via the contextual_connect() method. This means they will all share the same underlying DBAPI connection, as well as the same parent Transaction instance, which is created in the main body of the program via the call to db.begin(). So while there are several calls that return "new" Transaction or Connection objects, in reality only one "real" connection is ever used, and there is only one transaction (i.e. one begin/commit pair) executed.

back to section top

Describing Databases with MetaData

The core of SQLAlchemy's query and object mapping operations is database metadata: Python objects that describe tables and other schema-level objects. Metadata objects can be created by explicitly naming the various components and their properties, using the Table, Column, ForeignKey, Index, and Sequence objects imported from sqlalchemy.schema. There is also support for reflection, which means you specify only the names of the entities, and their definitions are loaded from the database automatically.

A collection of metadata entities is stored in an object aptly named MetaData. This object takes an optional name parameter:

from sqlalchemy import *

metadata = MetaData(name='my metadata')

Then to construct a Table, use the Table class:

users = Table('users', metadata, 
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('email_address', String(60), key='email'),
    Column('password', String(20), nullable = False)
)

user_prefs = Table('user_prefs', metadata, 
    Column('pref_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False),
    Column('pref_name', String(40), nullable=False),
    Column('pref_value', String(100))
)

The specific datatypes for each Column, such as Integer, String, etc. are described in The Types System, and exist within the module sqlalchemy.types as well as the global sqlalchemy namespace.

Foreign keys are most easily specified by the ForeignKey object within a Column object. For a composite foreign key, i.e. a foreign key that contains multiple columns referencing the columns of a composite primary key, an explicit syntax is provided which allows the correct table CREATE statements to be generated:

# a table with a composite primary key
invoices = Table('invoices', metadata, 
    Column('invoice_id', Integer, primary_key=True),
    Column('ref_num', Integer, primary_key=True),
    Column('description', String(60), nullable=False)
)

# a table with a composite foreign key referencing the parent table
invoice_items = Table('invoice_items', metadata, 
    Column('item_id', Integer, primary_key=True),
    Column('item_name', String(60), nullable=False),
    Column('invoice_id', Integer, nullable=False),
    Column('ref_num', Integer, nullable=False),
    ForeignKeyConstraint(['invoice_id', 'ref_num'], ['invoices.invoice_id', 'invoices.ref_num'])
)

Above, the invoice_items table will have ForeignKey objects automatically added to the invoice_id and ref_num Column objects as a result of the additional ForeignKeyConstraint object.
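The generated ForeignKey objects can be verified directly off the table; a quick sketch using the tables above:

# each generated foreign key references its target column in 'invoices'
for fk in invoice_items.foreign_keys:
    print fk.column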

The MetaData object supports some handy methods, such as getting a list of Tables in the order (or reverse) of their dependency:

>>> for t in metadata.table_iterator(reverse=False):
...    print t.name
users
user_prefs

And Table provides an interface to the table's properties as well as that of its columns:

employees = Table('employees', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('employee_name', String(60), nullable=False, key='name'),
    Column('employee_dept', Integer, ForeignKey("departments.department_id"))
)

# access the column "employee_id":
employees.columns.employee_id

# or just
employees.c.employee_id

# via string
employees.c['employee_id']

# iterate through all columns
for c in employees.c:
    # ...

# get the table's primary key columns
for primary_key in employees.primary_key:
    # ...

# get the table's foreign key objects:
for fkey in employees.foreign_keys:
    # ...

# access the table's MetaData:
employees.metadata

# access the table's Engine, if its MetaData is bound:
employees.engine

# access a column's name, type, nullable, primary key, foreign key
employees.c.employee_id.name
employees.c.employee_id.type
employees.c.employee_id.nullable
employees.c.employee_id.primary_key
employees.c.employee_dept.foreign_key

# get the "key" of a column, which defaults to its name, but can 
# be any user-defined string:
employees.c.name.key

# access a column's table:
>>> employees.c.employee_id.table is employees
True

# get the table related by a foreign key
ftable = employees.c.employee_dept.foreign_key.column.table

Binding MetaData to an Engine

A MetaData object can be associated with one or more Engine instances. This allows the MetaData and the elements within it to perform operations automatically, using the connection resources of that Engine. This includes being able to "reflect" the columns of tables, as well as to perform create and drop operations without needing to pass an Engine or Connection around. It also allows SQL constructs to be created which know how to execute themselves (called "implicit execution").

To bind MetaData to a single Engine, use BoundMetaData:

engine = create_engine('sqlite://', **kwargs)

# create BoundMetaData from an Engine
meta = BoundMetaData(engine)

# create the Engine and MetaData in one step
meta = BoundMetaData('postgres://db/', **kwargs)

Another form of MetaData exists which allows connecting to any number of engines, within the context of the current thread. This is DynamicMetaData:

meta = DynamicMetaData()

meta.connect(engine)    # connect to an existing Engine

meta.connect('mysql://user@host/dsn')   # create a new Engine and connect

DynamicMetaData is ideal for applications that need to use the same set of Tables for many different database connections in the same process, such as a CherryPy web application which handles multiple application instances in one process.

back to section top

Using the global MetaData object

Some users prefer to create Table objects without specifying a MetaData object, having Tables scoped on an application-wide basis. For them the default_metadata object and the global_connect() function are supplied. default_metadata is simply an instance of DynamicMetaData that exists within the sqlalchemy namespace, and global_connect() is a synonym for default_metadata.connect(). Defining a Table that has no MetaData argument will automatically use this default metadata as follows:

from sqlalchemy import *

# a Table with just a name and its Columns
mytable = Table('mytable', 
    Column('col1', Integer, primary_key=True),
    Column('col2', String(40))
    )

# connect all the "anonymous" tables to a postgres uri in the current thread    
global_connect('postgres://foo:bar@lala/test')

# create all tables in the default metadata
default_metadata.create_all()

# the table is bound
mytable.insert().execute(col1=5, col2='some value')
back to section top

Reflecting Tables

Once you have a BoundMetaData or a connected DynamicMetaData, you can create Table objects without specifying their columns, just their names, using autoload=True:

>>> messages = Table('messages', meta, autoload = True)
>>> [c.name for c in messages.columns]
['message_id', 'message_name', 'date']

At the moment the Table is constructed, it will query the database for the columns and constraints of the messages table.

Note that if a reflected table has a foreign key referencing another table, then the metadata for the related table will be loaded as well, even if it has not been defined by the application:

>>> shopping_cart_items = Table('shopping_cart_items', meta, autoload = True)
>>> print shopping_cart_items.c.cart_id.table.name
shopping_carts

To get direct access to 'shopping_carts', simply instantiate it via the Table constructor. You'll get the same instance of the shopping cart Table as the one that is attached to shopping_cart_items:

>>> shopping_carts = Table('shopping_carts', meta)
>>> shopping_carts is shopping_cart_items.c.cart_id.table
True

This works because when the Table constructor is called for a particular name and MetaData object, if the table has already been created then the instance returned will be the same as the original. This is a singleton constructor:

>>> news_articles = Table('news', meta, 
... Column('article_id', Integer, primary_key = True),
... Column('url', String(250), nullable = False)
... )
>>> othertable = Table('news', meta)
>>> othertable is news_articles
True

Overriding Reflected Columns

Individual columns can be overridden with explicit values when reflecting tables; this is handy for specifying custom datatypes, constraints such as primary keys that may not be configured within the database, etc.

>>> mytable = Table('mytable', meta,
... Column('id', Integer, primary_key=True),   # override reflected 'id' to have primary key
... Column('mydata', Unicode(50)),    # override reflected 'mydata' to be Unicode
... autoload=True)
back to section top

Specifying the Schema Name

Some databases support the concept of multiple schemas. A Table can reference this by specifying the schema keyword argument:

financial_info = Table('financial_info', meta,
    Column('id', Integer, primary_key=True),
    Column('value', String(100), nullable=False),
    schema='remote_banks'
)

Within the MetaData collection, this table will be identified by the combination of financial_info and remote_banks. If another table called financial_info is referenced without the remote_banks schema, it will refer to a different Table. ForeignKey objects can reference columns in this table using the form remote_banks.financial_info.id.
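For example, a hypothetical table in the default schema can reference the schema-qualified column this way (the refunds table here is invented purely for illustration):

refunds = Table('refunds', meta,
    Column('refund_id', Integer, primary_key=True),
    Column('financial_info_id', Integer, ForeignKey('remote_banks.financial_info.id'))
)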

back to section top

ON UPDATE and ON DELETE

ON UPDATE and ON DELETE clauses to a table create are specified within the ForeignKeyConstraint object, using the onupdate and ondelete keyword arguments:

foobar = Table('foobar', meta,
    Column('id', Integer, primary_key=True),
    Column('lala', String(40)),
    ForeignKeyConstraint(['lala'],['hoho.lala'], onupdate="CASCADE", ondelete="CASCADE"))

Note that these clauses are not supported on SQLite, and require InnoDB tables when used with MySQL. They may also not be supported on other databases.

back to section top

Enabling Table / Column Quoting

Feature Status: Alpha Implementation

Some table, schema, or column names require quoting to be enabled. Reasons for this include names that are the same as a database reserved word, or identifiers that use MixedCase, where the database would normally "fold" the case convention into lower or uppercase (such as Postgres). SQLAlchemy will attempt to determine automatically when quoting should be used. It determines a value for every identifier name called case_sensitive, which defaults to False if the identifier name uses no uppercase letters, or True otherwise. This flag may be explicitly set on any schema item as well (schema items include Table, Column, MetaData, Sequence, etc.) to override this default setting; objects will inherit the setting from an enclosing object if not explicitly overridden.

When case_sensitive is True, the dialect will do what it has to in order for the database to recognize the casing. For Postgres and Oracle, this means using quoted identifiers.

Identifiers that match known SQL reserved words (such as "asc", "union", etc.) will also be quoted according to the dialect's quoting convention regardless of the case_sensitive setting.

To force quoting for an identifier, set the "quote=True" flag on Column or Table, as well as the quote_schema=True flag for Table.

table2 = Table('WorstCase2', metadata,
    # desc is a reserved word, which will be quoted.
    Column('desc', Integer, primary_key=True),

    # if using a reserved word which SQLAlchemy doesn't know about,
    # specify quote=True
    Column('some_reserved_word', Integer, quote=True, primary_key=True),

    # MixedCase uses a mixed case convention. 
    # it will be automatically quoted since it is case sensitive
    Column('MixedCase', Integer),

    # Union is both a reserved word and mixed case
    Column('Union', Integer),

    # normal_column doesn't require quoting
    Column('normal_column', String(30)))

# to use tables where case_sensitive is False by default regardless
# of identifier casings, set "case_sensitive" to False at any level
# (or True to force case sensitivity for lowercase identifiers as well)
lowercase_metadata = MetaData(case_sensitive=False)
back to section top

Other Options

Tables may support database-specific options, such as MySQL's engine option that can specify "MyISAM", "InnoDB", and other backends for the table:

addresses = Table('engine_email_addresses', meta,
    Column('address_id', Integer, primary_key = True),
    Column('remote_user_id', Integer, ForeignKey(users.c.user_id)),
    Column('email_address', String(20)),
    mysql_engine='InnoDB'
)
back to section top

Creating and Dropping Database Tables

Creating and dropping individual tables can be done via the create() and drop() methods of Table; these methods take an optional connectable parameter which references an Engine or a Connection. If not supplied, the Engine bound to the MetaData will be used; if the MetaData is not bound to an Engine, an error is raised:

meta = BoundMetaData('sqlite:///:memory:')
employees = Table('employees', meta, 
    Column('employee_id', Integer, primary_key=True),
    Column('employee_name', String(60), nullable=False, key='name'),
    Column('employee_dept', Integer, ForeignKey("departments.department_id"))
)
employees.create()

Dropping the table is done via the drop() method:

employees.drop(connectable=e)

The create() and drop() methods also support an optional keyword argument checkfirst which will issue the database's appropriate pragma statements to check if the table exists before creating or dropping:

employees.create(connectable=e, checkfirst=True)
employees.drop(checkfirst=False)

Entire groups of Tables can be created and dropped directly from the MetaData object with create_all() and drop_all(). These methods always check for the existence of each table before creating or dropping. Each method takes an optional connectable keyword argument which can reference an Engine or a Connection. If no engine is specified, the underlying bound Engine, if any, is used:

engine = create_engine('sqlite:///:memory:')

metadata = MetaData()

users = Table('users', metadata, 
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('email_address', String(60), key='email'),
    Column('password', String(20), nullable = False)
)

user_prefs = Table('user_prefs', metadata, 
    Column('pref_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False),
    Column('pref_name', String(40), nullable=False),
    Column('pref_value', String(100))
)

metadata.create_all(connectable=engine)
back to section top

Column Defaults and OnUpdates

SQLAlchemy includes flexible constructs with which to create default values for columns upon the insertion of rows, as well as upon update. These defaults can take several forms: a constant, a Python callable to be pre-executed before the SQL is executed, a SQL expression or function to be pre-executed before the SQL is executed, a pre-executed Sequence (for databases that support sequences), or a "passive" default, which is a default function triggered by the database itself upon insert, the value of which can then be post-fetched by the engine, provided the row has a primary key with which to be located.

Pre-Executed Insert Defaults

A basic default is most easily specified by the "default" keyword argument to Column. This defines a value, function, or SQL expression that will be pre-executed to produce the new value, before the row is inserted:

# a function to create primary key ids
i = 0
def mydefault():
    global i
    i += 1
    return i

t = Table("mytable", meta, 
    # function-based default
    Column('id', Integer, primary_key=True, default=mydefault),

    # a scalar default
    Column('key', String(10), default="default")
)

The "default" keyword can also take SQL expressions, including select statements or direct function calls:

t = Table("mytable", meta, 
    Column('id', Integer, primary_key=True),

    # define 'create_date' to default to now()
    Column('create_date', DateTime, default=func.now()),

    # define 'key' to pull its default from the 'keyvalues' table
    Column('key', String(20), default=keyvalues.select(keyvalues.c.type=='type1', limit=1))
    )

The "default" keyword argument is shorthand for using a ColumnDefault object in a column definition. This syntax is optional, but is required for other types of defaults, futher described below:

Column('mycolumn', String(30), ColumnDefault(func.get_data()))
back to section top

Pre-Executed OnUpdate Defaults

Similar to an on-insert default is an on-update default, which is most easily specified by the "onupdate" keyword to Column, which also can be a constant, plain Python function or SQL expression:

t = Table("mytable", meta, 
    Column('id', Integer, primary_key=True),

    # define 'last_updated' to be populated with current_timestamp (the ANSI-SQL version of now())
    Column('last_updated', DateTime, onupdate=func.current_timestamp()),
)

To use an explicit ColumnDefault object to specify an on-update, use the "for_update" keyword argument:

Column('mycolumn', String(30), ColumnDefault(func.get_data(), for_update=True))
back to section top

Inline Default Execution: PassiveDefault

A PassiveDefault indicates a column default that is executed upon INSERT by the database. This construct is used to specify a SQL function that will be rendered as the column's "DEFAULT" when creating tables.

t = Table('test', meta, 
    Column('mycolumn', DateTime, PassiveDefault(text("sysdate")))
)

A create call for the above table will produce:

CREATE TABLE test (
    mycolumn datetime default sysdate
)

PassiveDefault also sends a message to the Engine that data is available after an insert. The object-relational mapper system uses this information to post-fetch rows after the insert, so that instances can be refreshed with the new data. Below is a simplified version:

# table with passive defaults
mytable = Table('mytable', meta, 
    Column('my_id', Integer, primary_key=True),

    # an on-insert database-side default
    Column('data1', Integer, PassiveDefault(text("d1_func()"))),
)
# insert a row
r = mytable.insert().execute(my_id=5)

# check the result: were there defaults fired off on that row ?
if r.lastrow_has_defaults():
    # postfetch the row based on primary key.
    # this only works for a table with primary key columns defined
    primary_key = r.last_inserted_ids()
    row = mytable.select(mytable.c.my_id == primary_key[0]).execute().fetchone()

When Tables are reflected from the database using autoload=True, any DEFAULT values set on the columns will be reflected in the Table object as PassiveDefault instances.

The Catch: Postgres Primary Key Defaults always Pre-Execute

Current Postgres support does not rely upon OIDs to determine the identity of a row. This is because the usage of OIDs has been deprecated with Postgres, and they are disabled by default for table creates as of PG version 8. Psycopg2's "cursor.lastrowid" function only returns OIDs. Therefore, when inserting a new row which has passive defaults set on the primary key columns, the default function is still pre-executed, since SQLAlchemy would otherwise have no way of retrieving the row just inserted.

back to section top

Defining Sequences

A table with a sequence looks like:

table = Table("cartitems", meta, 
    Column("cart_id", Integer, Sequence('cart_id_seq'), primary_key=True),
    Column("description", String(40)),
    Column("createdate", DateTime())
)

The Sequence is used with Postgres or Oracle to indicate the name of a database sequence that will be used to create default values for a column. When a table with a Sequence on a column is created in the database by SQLAlchemy, the database sequence object is also created. Similarly, the database sequence is dropped when the table is dropped. Sequences are typically used with primary key columns. When using Postgres, if an integer primary key column defines no explicit Sequence or other default method, SQLAlchemy will create the column with the SERIAL keyword, and will pre-execute a sequence named "tablename_columnname_seq" in order to retrieve new primary key values, if they were not otherwise explicitly stated. Oracle, which has no "auto-increment" keyword, requires that a Sequence be created for a table if automatic primary key generation is desired.
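For instance, if the cartitems table above were defined without the explicit Sequence, creating it on Postgres would produce DDL roughly like the following (a sketch, not exact output):

CREATE TABLE cartitems (
    cart_id SERIAL NOT NULL,
    description VARCHAR(40),
    createdate TIMESTAMP,
    PRIMARY KEY (cart_id)
)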

A Sequence object can be defined on a Table that is then used for a non-sequence-supporting database. In that case, the Sequence object is simply ignored. Note that a Sequence object is entirely optional for all databases except Oracle, as other databases offer options for auto-creating primary key values, such as AUTOINCREMENT, SERIAL, etc. SQLAlchemy will use these default methods for creating primary key values if no Sequence is present on the table metadata.

A sequence can also be specified with optional=True which indicates the Sequence should only be used on a database that requires an explicit sequence, and not those that supply some other method of providing integer values. At the moment, it essentially means "use this sequence only with Oracle and not Postgres".
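A sketch of the optional flag applied to the same table:

table = Table("cartitems", meta, 
    # use the sequence on Oracle; databases with a native auto-incrementing
    # facility, such as Postgres, will skip it
    Column("cart_id", Integer, Sequence('cart_id_seq', optional=True), primary_key=True),
    Column("description", String(40)),
    Column("createdate", DateTime())
)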

back to section top

Defining Constraints and Indexes

UNIQUE Constraint

Unique constraints can be created anonymously on a single column using the unique keyword on Column. Explicitly named unique constraints and/or those with multiple columns are created via the UniqueConstraint table-level construct.

meta = MetaData()
mytable = Table('mytable', meta,

    # per-column anonymous unique constraint
    Column('col1', Integer, unique=True),

    Column('col2', Integer),
    Column('col3', Integer),

    # explicit/composite unique constraint.  'name' is optional.
    UniqueConstraint('col2', 'col3', name='uix_1')
    )
back to section top

CHECK Constraint

Check constraints can be named or unnamed and can be created at the Column or Table level, using the CheckConstraint construct. The text of the check constraint is passed directly through to the database, so there is limited "database independent" behavior. Column level check constraints generally should only refer to the column to which they are placed, while table level constraints can refer to any columns in the table.

Note that some databases, such as MySQL and SQLite, do not actively enforce check constraints.

meta = MetaData()
mytable = Table('mytable', meta,

    # per-column CHECK constraint
    Column('col1', Integer, CheckConstraint('col1>5')),

    Column('col2', Integer),
    Column('col3', Integer),

    # table level CHECK constraint.  'name' is optional.
    CheckConstraint('col2 > col3 + 5', name='check1')
    )
back to section top

Indexes

Indexes can be created anonymously (using an auto-generated name of the form "ix_<tablename>_<column>") for a single column using the inline index keyword on Column, which also modifies the usage of unique to apply the uniqueness to the index itself, instead of adding a separate UNIQUE constraint. For indexes with specific names or which encompass more than one column, use the Index construct, which requires a name.

Note that the Index construct is created externally to the table to which it corresponds, using Column objects and not strings.

meta = MetaData()
mytable = Table('mytable', meta,
    # an indexed column, with index "ix_mytable_col1"
    Column('col1', Integer, index=True),

    # a uniquely indexed column with index "ix_mytable_col2"
    Column('col2', Integer, index=True, unique=True),

    Column('col3', Integer),
    Column('col4', Integer),

    Column('col5', Integer),
    Column('col6', Integer),
    )

# place an index on col3, col4
Index('idx_col34', mytable.c.col3, mytable.c.col4)

# place a unique index on col5, col6
Index('myindex', mytable.c.col5, mytable.c.col6, unique=True)

The Index objects will be created along with the CREATE statements for the table itself. An index can also be created on its own independently of the table:

# create a table
sometable.create()

# define an index
i = Index('someindex', sometable.c.col5)

# create the index; this uses the table's connectable, or specify the connectable keyword argument
i.create()
back to section top

Adapting Tables to Alternate Metadata

A Table object created against a specific MetaData object can be re-created against a new MetaData using the tometadata method:

# create two metadata
meta1 = BoundMetaData('sqlite:///querytest.db')
meta2 = MetaData()

# load 'users' from the sqlite engine
users_table = Table('users', meta1, autoload=True)

# create the same Table object for the plain metadata
users_table_2 = users_table.tometadata(meta2)
back to section top

Note: This section describes how to use SQLAlchemy to construct SQL queries and receive result sets. It does not cover the object relational mapping capabilities of SQLAlchemy; those are covered later on in Data Mapping. However, both areas of functionality work similarly in how selection criteria are constructed, so if you are interested just in ORM, you should probably skim through basic WHERE Clause construction before moving on.

Once you have used the sqlalchemy.schema module to construct your tables and/or reflect them from the database, performing SQL queries using those table metadata objects is done via the sqlalchemy.sql package. This package defines a large set of classes, each of which represents a particular kind of lexical construct within a SQL query; all are descendants of the common base class sqlalchemy.sql.ClauseElement. A full query is represented via a structure of ClauseElements. A set of reasonably intuitive creation functions is provided by the sqlalchemy.sql package to create these structures; these functions are described in the rest of this section.

Executing a ClauseElement structure can be performed in two general ways. You can use an Engine or a Connection object's execute() method to which you pass the query structure; this is known as explicit style. Or, if the ClauseElement structure is built upon Table metadata which is bound to an Engine directly, you can simply call execute() on the structure itself, known as implicit style. In both cases, the execution returns a cursor-like object (more on that later). The same clause structure can be executed repeatedly. The ClauseElement is compiled into a string representation by an underlying Compiler object which is associated with the Engine via its Dialect.

The examples below all include a dump of the generated SQL corresponding to the query object, as well as a dump of the statement's bind parameters. In all cases, bind parameters are shown as named parameters using the colon format (i.e. ':name'). When the statement is compiled into a database-specific version, the named-parameter statement and its bind values are converted to the proper paramstyle for that database automatically.

For this section, we will mostly use the implicit style of execution, meaning the Table objects are associated with an instance of BoundMetaData, and constructed ClauseElement objects support self-execution. Assume the following configuration:

from sqlalchemy import *
metadata = BoundMetaData('sqlite:///mydb.db', echo=True)

# a table to store users
users = Table('users', metadata,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(40)),
    Column('password', String(80))
)

# a table that stores mailing addresses associated with a specific user
addresses = Table('addresses', metadata,
    Column('address_id', Integer, primary_key = True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)

# a table that stores keywords
keywords = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key = True),
    Column('name', VARCHAR(50))
)

# a table that associates keywords with users
userkeywords = Table('userkeywords', metadata,
    Column('user_id', INT, ForeignKey("users")),
    Column('keyword_id', INT, ForeignKey("keywords"))
)

Simple Select

A select is done by constructing a Select object with the proper arguments, adding any extra arguments if desired, then calling its execute() method.

Basic Select
from sqlalchemy import *

# use the select() function defined in the sql package
s = select([users])

# or, call the select() method off of a Table object
s = users.select()

# then, call execute on the Select object:
result = s.execute()

# the SQL text of any clause object can also be viewed via the str() call:
>>> str(s)
SELECT users.user_id, users.user_name, users.password FROM users

Explicit Execution

As mentioned above, ClauseElement structures can also be executed with a Connection object explicitly:

engine = create_engine('sqlite:///myfile.db')
conn = engine.connect()

s = users.select()
result = conn.execute(s)
conn.close()
back to section top

Binding ClauseElements to Engines

For queries that don't contain any tables, ClauseElements that represent a fully executable statement support an engine keyword parameter which can bind the object to an Engine, thereby allowing implicit execution:

# select a literal
select(["current_time"], engine=myengine).execute()
# select a function
select([func.now()], engine=db).execute()
back to section top

Getting Results

The object returned by execute() is a sqlalchemy.engine.ResultProxy object, which acts much like a DBAPI cursor object in the context of a result set, except that the rows returned can address their columns by ordinal position, column name, or even column object:

Using the ResultProxy
# select rows, get resulting ResultProxy object
result = users.select().execute()
# get one row
row = result.fetchone()

# get the 'user_id' column via integer index:
user_id = row[0]

# or column name
user_name = row['user_name']

# or column object
password = row[users.c.password]

# or column accessor
password = row.password

# ResultProxy object also supports fetchall()
rows = result.fetchall()

# or get the underlying DBAPI cursor object
cursor = result.cursor

# close the result.  If the statement was implicitly executed 
# (i.e. without an explicit Connection), this will
# return the underlying connection resources back to 
# the connection pool.  de-referencing the result
# will also have the same effect.  if an explicit Connection was 
# used, then close() just closes the underlying cursor object.
result.close()
back to section top

Using Column Labels

A common need when writing statements that reference multiple tables is to create labels for columns, thereby separating columns from different tables with the same name. The Select construct supports automatic generation of column labels via the use_labels=True parameter:

use_labels Flag
c = select([users, addresses], 
    users.c.user_id==addresses.c.address_id, 
    use_labels=True).execute()

The table name part of the label is affected if you use a construct such as a table alias:

use_labels with an Alias
person = users.alias('person')
c = select([person, addresses], 
    person.c.user_id==addresses.c.address_id, 
    use_labels=True).execute()

Labels are also generated in such a way as to never go beyond 30 characters. Most databases impose a limit on the length of identifiers; Postgres is one example, and Oracle in particular has a rather short limit of 30:

use_labels Generates Abbreviated Labels
long_named_table = users.alias('this_is_the_person_table')
c = select([long_named_table], use_labels=True).execute()

You can also specify custom labels on a per-column basis using the label() function:

label() Function on Column
c = select([users.c.user_id.label('id'), 
            users.c.user_name.label('name')]).execute()
back to section top

Table/Column Specification

Calling select off a table automatically generates a column clause which includes all the table's columns, in the order they are specified in the source Table object.

But in addition to selecting all the columns off a single table, any set of columns can be specified, as well as full tables, and any combination of the two:

Specify Columns to Select
# individual columns
c = select([users.c.user_id, users.c.user_name]).execute()
# full tables
c = select([users, addresses]).execute()
# combinations
c = select([users, addresses.c.zip]).execute()
back to section top

WHERE Clause


The WHERE condition is specified via the named keyword argument whereclause, which is also the second positional argument to the select() function and the first positional argument to the select() method of Table.

WHERE conditions are constructed using column objects, literal values, and functions defined in the sqlalchemy.sql module. Column objects override the standard Python operators to provide clause compositional objects, which compile down to SQL operations:

Basic WHERE Clause
c = users.select(users.c.user_id == 7).execute()

Notice that the literal value "7" was broken out of the query and placed into a bind parameter. Databases such as Oracle must parse incoming SQL and create a "plan" when new queries are received, which is an expensive process. By using bind parameters, the same query with various literal values can have its plan compiled only once, and used repeatedly with less overhead.

More where clauses:

# another comparison operator
c = select([users], users.c.user_id>7).execute()
# OR keyword
c = users.select(or_(users.c.user_name=='jack', users.c.user_name=='ed')).execute()
# AND keyword
c = users.select(and_(users.c.user_name=='jack', users.c.password=='dog')).execute()
# NOT keyword
c = users.select(not_(
        or_(users.c.user_name=='jack', users.c.password=='dog')
    )).execute()
# IN clause
c = users.select(users.c.user_name.in_('jack', 'ed', 'fred')).execute()
# join users and addresses together
c = select([users, addresses], users.c.user_id==addresses.c.address_id).execute()
# join users and addresses together, but don't specify "addresses" in the 
# selection criterion.  The WHERE criterion adds it to the FROM list 
# automatically.
c = select([users], and_(
                users.c.user_id==addresses.c.user_id,
                users.c.user_name=='fred'
            )).execute()

Select statements can also generate a WHERE clause based on the parameters you give them. If a given parameter matches the name of a column or its "label" (the combined tablename + "_" + column name), and does not already correspond to a bind parameter in the select object, it will be added as an equality comparison against that column. This is a shortcut to creating a full WHERE clause:

# specify a match for the "user_name" column
c = users.select().execute(user_name='ed')
# specify a full where clause for the "user_name" column, as well as a
# comparison for the "user_id" column
c = users.select(users.c.user_name=='ed').execute(user_id=10)

Operators

Supported column operators so far are all the numerical comparison operators, i.e. '==', '>', '>=', etc., as well as like(), startswith(), endswith(), between(), and in_(). Boolean operators include not_(), and_() and or_(), which are also available inline via '~', '&', and '|'. Math operators are '+', '-', '*', '/'. Any custom operator can be specified via the op() function, shown below.

# "like" operator
users.select(users.c.user_name.like('%ter'))

# equality operator
users.select(users.c.user_name == 'jane')

# in_ operator
users.select(users.c.user_id.in_(1,2,3))

# and_, endswith, equality operators
users.select(and_(addresses.c.street.endswith('green street'),
                addresses.c.zip=='11234'))

# & operator substituting for 'and_'
users.select(addresses.c.street.endswith('green street') & (addresses.c.zip=='11234'))

# + concatenation operator
select([users.c.user_name + '_name'])

# NOT operator
users.select(~(addresses.c.street == 'Green Street'))

# any custom operator
select([users.c.user_name.op('||')('_category')])

# "null" comparison via == (converts to IS)
sqlusers.select(users.c.user_name==None).execute()
# or via explicit null() construct
sqlusers.select(users.c.user_name==null()).execute()
back to section top

Functions

Functions can be specified using the func keyword:

select([func.count(users.c.user_id)]).execute()
users.select(func.substr(users.c.user_name, 1) == 'J').execute()

Functions also are callable as standalone values:

# call the "now()" function
time = func.now(engine=myengine).scalar()

# call myfunc(1,2,3)
myvalue = func.myfunc(1, 2, 3, engine=db).execute()

# or call them off the engine
db.func.now().scalar()
back to section top

Literals

You can drop in a literal value anywhere there isn't a column to attach to, via the literal keyword:

select([literal('foo') + literal('bar'), users.c.user_name]).execute()
# literals have all the same comparison functions as columns
select([literal('foo') == literal('bar')], engine=myengine).scalar()

Literals also take an optional type parameter to give literals a type. This can sometimes be significant, for example when using the "+" operator with SQLite, the String type is detected and the operator is converted to "||":

select([literal('foo', type=String) + 'bar'], engine=e).execute()
back to section top

Order By

The ORDER BY clause of a select statement can be specified as a list of columns via the order_by parameter, with optional use of the asc() and desc() functions:

# straight order by
c = users.select(order_by=[users.c.user_name]).execute()
# descending/ascending order by on multiple columns
c = users.select(
    users.c.user_name>'J', 
    order_by=[desc(users.c.user_id), asc(users.c.user_name)]).execute()
back to section top

DISTINCT, LIMIT and OFFSET

These are specified as keyword arguments:

c = select([users.c.user_name], distinct=True).execute()
c = users.select(limit=10, offset=20).execute()

The Oracle driver does not support LIMIT and OFFSET directly, but instead wraps the generated query into a subquery and uses the "rownum" variable to control the rows selected (this is somewhat experimental). Similarly, the Firebird and MSSQL drivers convert LIMIT into queries using FIRST and TOP, respectively.

back to section top

Inner and Outer Joins

As some of the examples above indicated, a regular inner join can be stated implicitly, just like in a SQL expression, by specifying the tables to be joined as well as their join conditions:

addresses.select(addresses.c.user_id==users.c.user_id).execute()

There is also an explicit join constructor, which can be embedded into a select query via the from_obj parameter of the select statement:

addresses.select(from_obj=[
    addresses.join(users, addresses.c.user_id==users.c.user_id)
]).execute()

The join constructor can also be used by itself:

join(users, addresses, users.c.user_id==addresses.c.user_id).select().execute()

The join criterion in a join() call is optional. If not specified, the condition will be derived from the foreign key relationships of the two tables. If no criterion can be constructed, an exception will be raised.

join(users, addresses).select().execute()

Notice that this is the first example where the FROM criterion of the select statement is explicitly specified. In most cases, the FROM criterion is automatically determined from the columns requested as well as the WHERE clause. The from_obj keyword argument indicates a list of explicit FROM clauses to be used in the statement.

A join can be created on its own using the join or outerjoin functions, or can be created off of an existing Table or other selectable unit via the join or outerjoin methods:

outerjoin(users, addresses, 
          users.c.user_id==addresses.c.address_id).select().execute()
users.select(keywords.c.name=='running', from_obj=[
        users.join(
            userkeywords, userkeywords.c.user_id==users.c.user_id).join(
                keywords, keywords.c.keyword_id==userkeywords.c.keyword_id)
        ]).execute()

Joins also provide a keyword argument fold_equivalents on the select() function which allows the column list of the resulting select to be "folded" to the minimal list of columns, based on those columns that are known to be equivalent from the "onclause" of the join. This saves the effort of constructing column lists manually in conjunction with databases like Postgres which can be picky about "ambiguous columns". In this example, only the "users.user_id" column, but not the "addresses.user_id" column, shows up in the column clause of the resulting select:

users.join(addresses).select(fold_equivalents=True).execute()

The fold_equivalents argument will recursively apply to "chained" joins as well, i.e. a.join(b).join(c)....
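For example, a sketch chaining through the association tables defined earlier:

# the equivalent user_id and keyword_id columns each appear only once
# in the resulting column clause
j = users.join(userkeywords).join(keywords)
j.select(fold_equivalents=True).execute()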

back to section top

Table Aliases

Aliases are used primarily when you want to use the same table more than once as a FROM expression in a statement:

address_b = addresses.alias('addressb')
# select users who have an address on Green street as well as Orange street
users.select(and_(
    users.c.user_id==addresses.c.user_id,
    addresses.c.street.like('%Green%'),
    users.c.user_id==address_b.c.user_id,
    address_b.c.street.like('%Orange%')
)).execute()
back to section top

Subqueries

SQLAlchemy allows the creation of select statements from not just Table objects, but from a whole class of objects that implement the Selectable interface. This includes Tables, Aliases, Joins and Selects. Therefore, if you have a Select, you can select from the Select:

>>> s = users.select()
>>> str(s)
SELECT users.user_id, users.user_name, users.password FROM users

>>> s = s.select()
>>> str(s)
SELECT user_id, user_name, password
FROM (SELECT users.user_id, users.user_name, users.password FROM users)

Any Select, Join, or Alias object supports the same column accessors as a Table:

>>> s = users.select()
>>> [c.key for c in s.columns]
['user_id', 'user_name', 'password']

When you use use_labels=True in a Select object, the labeled versions of the column names become the keys of the accessible columns. In effect you can create your own "view objects":

s = select([users, addresses], users.c.user_id==addresses.c.user_id, use_labels=True)
select([
    s.c.users_user_name, s.c.addresses_street, s.c.addresses_zip
], s.c.addresses_city=='San Francisco').execute()

To specify a SELECT statement as one of the selectable units in a FROM clause, it usually should be given an alias.

s = users.select().alias('u')
select([addresses, s]).execute()

Select objects can be used in a WHERE condition, in operators such as IN:

# select user ids for all users whose name starts with a "p"
s = select([users.c.user_id], users.c.user_name.like('p%'))

# now select all addresses for those users
addresses.select(addresses.c.user_id.in_(s)).execute()

The sql package supports embedding select statements into other select statements as the criterion in a WHERE condition, or as one of the "selectable" objects in the FROM list of the query. It does not at the moment directly support embedding a SELECT statement as one of the column expressions in another statement, although this can be achieved via the scalar=True flag described next, or via direct text insertion, described later.
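As a preview of the textual approach, here is a sketch which drops a string subquery directly into the column clause (the string is passed through verbatim, so any correlation must be spelled out by hand):

select([users.c.user_name,
    "(select count(addresses.address_id) from addresses where addresses.user_id=users.user_id)"
]).execute()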

Scalar Column Subqueries

Subqueries can be used in the column clause of a select statement by specifying the scalar=True flag:

select([table2.c.col1, table2.c.col2, 
        select([table1.c.col1], table1.c.col2==7, scalar=True)])
back to section top

Correlated Subqueries

When a select object is embedded inside of another select object, and both objects reference the same table, SQLAlchemy makes the assumption that the table should be correlated from the child query to the parent query. To disable this behavior, specify the flag correlate=False to the Select statement.

# make an alias of a regular select.   
s = select([addresses.c.street], addresses.c.user_id==users.c.user_id).alias('s')
>>> str(s)
SELECT addresses.street FROM addresses, users 
WHERE addresses.user_id = users.user_id

# now embed that select into another one.  the "users" table is removed from
# the embedded query's FROM list and is instead correlated to the parent query
s2 = select([users, s.c.street])
>>> str(s2)
SELECT users.user_id, users.user_name, users.password, s.street
FROM users, (SELECT addresses.street FROM addresses
WHERE addresses.user_id = users.user_id) s
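To see the difference, a sketch of the same construction with correlation disabled; the embedded query keeps its own FROM list:

s = select([addresses.c.street], addresses.c.user_id==users.c.user_id,
        correlate=False).alias('s')
s2 = select([users, s.c.street])
>>> str(s2)
SELECT users.user_id, users.user_name, users.password, s.street
FROM users, (SELECT addresses.street FROM addresses, users
WHERE addresses.user_id = users.user_id) s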

EXISTS Clauses

An EXISTS clause can function as a more scalable version of an IN clause, and is usually used in a correlated fashion:

# find all users who have an address on Green street:
users.select(
    exists(
        [addresses.c.address_id], 
        and_(
            addresses.c.user_id==users.c.user_id, 
            addresses.c.street.like('%Green%')
        )
    )
)
back to section top

Unions

Unions come in two flavors, UNION and UNION ALL, which are available via module level functions or methods off a Selectable:

union(
    addresses.select(addresses.c.street=='123 Green Street'),
    addresses.select(addresses.c.street=='44 Park Ave.'),
    addresses.select(addresses.c.street=='3 Mill Road'),
    order_by=[addresses.c.street]
).execute()
users.select(
    users.c.user_id==7
  ).union_all(
      users.select(
          users.c.user_id==9
      ), 
      order_by=[users.c.user_id]   # order_by is an argument to union_all()
  ).execute()
back to section top

Custom Bind Parameters

Throughout all these examples, SQLAlchemy is busy creating bind parameters wherever literal expressions occur. You can also specify your own bind parameters with your own names, and use the same statement repeatedly. The bind parameters, shown here in the "named" format, will be converted to the appropriate named or positional style according to the database implementation being used.

Custom Bind Params
s = users.select(users.c.user_name==bindparam('username'))

# execute implicitly
s.execute(username='fred')
# execute explicitly
conn = engine.connect()
conn.execute(s, username='fred')

executemany() is also available by supplying multiple dictionary arguments instead of keyword arguments to the execute() method of ClauseElement or Connection. Examples can be found later in the sections on INSERT/UPDATE/DELETE.

Precompiling a Query

By calling the compile() method on any query object, the query can be "compiled" by the Engine into a sqlalchemy.sql.Compiled object just once, and the resulting compiled object reused, which eliminates repeated internal compilation of the SQL string:

s = users.select(users.c.user_name==bindparam('username')).compile()
s.execute(username='fred')
s.execute(username='jane')
s.execute(username='mary')
back to section top

Literal Text Blocks

The sql package tries to allow free textual placement in as many ways as possible. In the examples below, note that the from_obj parameter is used only when no other information exists within the select object with which to determine table metadata. Also note that in a query where no table metadata is used at all, the Engine to be used for the query has to be explicitly specified:

# strings as column clauses
sqlselect(["user_id", "user_name"], from_obj=[users]).execute()
# strings for full column lists
sqlselect(
        ["user_id, user_name, password, addresses.*"], 
        from_obj=[users.alias('u'), addresses]).execute()
# functions, etc.
sqlselect([users.c.user_id, "process_string(user_name)"]).execute()
# where clauses
sqlusers.select(and_(users.c.user_id==7, "process_string(user_name)=27")).execute()
# subqueries
sqlusers.select(
    "exists (select 1 from addresses where addresses.user_id=users.user_id)").execute()
# custom FROM objects
sqlselect(
        ["*"], 
        from_obj=["(select user_id, user_name from users)"], 
        engine=db).execute()
# a full query
sqltext("select user_name from users", engine=db).execute()

Using Bind Parameters in Text Blocks

Use the format ':paramname' to define bind parameters inside of a text block. They will be converted to the appropriate format upon compilation:

t = engine.text("select foo from mytable where lala=:hoho")
r = t.execute(hoho=7)

Bind parameters can also be explicit, which allows typing information to be added. Just specify them as a list with keys that match those inside the textual statement:

t = engine.text("select foo from mytable where lala=:hoho", 
                bindparams=[bindparam('hoho', type=types.String)])
r = t.execute(hoho="im hoho")

Result-row type processing can be added via the typemap argument, which is a dictionary of return columns mapped to types:

# specify DateTime type for the 'foo' column in the result set
# sqlite, for example, uses result-row post-processing to construct dates
t = engine.text("select foo from mytable where lala=:hoho", 
        bindparams=[bindparam('hoho', type=types.String)],
        typemap={'foo':types.DateTime}
        )
r = t.execute(hoho="im hoho")

# 'foo' is a datetime
year = r.fetchone()['foo'].year
back to section top

Building Select Objects

One of the primary motivations for a programmatic SQL library is to allow the piecemeal construction of a SQL statement based on program variables. All the above examples typically show Select objects being created all at once. The Select object also includes "builder" methods to allow building up an object. The below example is a "user search" function, where users can be selected based on primary key, user name, street address, keywords, or any combination:

def find_users(id=None, name=None, street=None, keyword_names=None):
    statement = users.select()
    if id is not None:
        statement.append_whereclause(users.c.user_id==id)
    if name is not None:
        statement.append_whereclause(users.c.user_name==name)
    if street is not None:
        # append_whereclause joins "WHERE" conditions together with AND
        statement.append_whereclause(users.c.user_id==addresses.c.user_id)
        statement.append_whereclause(addresses.c.street==street)
    if keyword_names is not None:
        # the parameter is named keyword_names so that it does not shadow
        # the keywords Table
        statement.append_from(
                users.join(userkeywords, users.c.user_id==userkeywords.c.user_id).join(
                        keywords, userkeywords.c.keyword_id==keywords.c.keyword_id))
        statement.append_whereclause(keywords.c.name.in_(*keyword_names))
        # to avoid multiple repeats, set query to be DISTINCT:
        statement.distinct=True
    return statement.execute()

find_users(id=7)
find_users(street='123 Green Street')
find_users(name='Jack', keyword_names=['jack','foo'])
back to section top

Inserts

An INSERT involves just one table. The Insert object is used via the insert() function, and the specified columns determine what columns show up in the generated SQL. If primary key columns are left out of the criterion, the SQL generator will try to populate them as specified by the particular database engine and sequences, i.e. relying upon an auto-incremented column or explicitly calling a sequence beforehand. Insert statements, as well as updates and deletes, can also execute multiple parameters in one pass via specifying an array of dictionaries as parameters.

The values to be populated for an INSERT or an UPDATE can be specified to the insert()/update() functions as the values named argument, or the query will be compiled based on the values of the parameters sent to the execute() method.

Using insert()
# basic insert
users.insert().execute(user_id=1, user_name='jack', password='asdfdaf')
# insert just user_name, NULL for others
# will auto-populate primary key columns if they are configured
# to do so
users.insert().execute(user_name='ed')
# INSERT with a list:
users.insert(values=(3, 'jane', 'sdfadfas')).execute()
# INSERT with user-defined bind parameters
i = users.insert(
    values={'user_name':bindparam('name'), 'password':bindparam('pw')}
    )
i.execute(name='mary', pw='adas5fs')
# INSERT many - if no explicit 'values' parameter is sent,
# the first parameter list in the list determines
# the generated SQL of the insert (i.e. what columns are present)
# executemany() is used at the DBAPI level
users.insert().execute(
    {'user_id':7, 'user_name':'jack', 'password':'asdfasdf'},
    {'user_id':8, 'user_name':'ed', 'password':'asdffcadf'},
    {'user_id':9, 'user_name':'fred', 'password':'asttf'},
)
back to section top

Updates

Updates work a lot like INSERTs, except there is an additional WHERE clause that can be specified.

Using update()
# change 'jack' to 'ed'
users.update(users.c.user_name=='jack').execute(user_name='ed')
# use bind parameters
u = users.update(users.c.user_name==bindparam('name'), 
                values={'user_name':bindparam('newname')})
u.execute(name='jack', newname='ed')
# update a column to another column
users.update(values={users.c.password:users.c.user_name}).execute()
# multi-update
users.update(users.c.user_id==bindparam('id')).execute(
        {'id':7, 'user_name':'jack', 'password':'fh5jks'},
        {'id':8, 'user_name':'ed', 'password':'fsr234ks'},
        {'id':9, 'user_name':'mary', 'password':'7h5jse'},
    )

Correlated Updates

A correlated update lets you update a table using selection from another table, or the same table:

s = select([addresses.c.city], addresses.c.user_id==users.c.user_id)
users.update(
    and_(users.c.user_id>10, users.c.user_id<20), 
    values={users.c.user_name:s}
).execute()
back to section top

Deletes

A delete is formulated like an update, except there's no values clause:

users.delete(users.c.user_id==7).execute()
users.delete(users.c.user_name.like(bindparam('name'))).execute(
        {'name':'%Jack%'},
        {'name':'%Ed%'},
        {'name':'%Jane%'},
    )
# delete rows where a correlated EXISTS condition holds (one possible
# formulation; exists() mirrors the select() argument style)
users.delete(exists([addresses.c.address_id], addresses.c.user_id==users.c.user_id)).execute()
back to section top

Basic Data Mapping

Data mapping describes the process of defining Mapper objects, which associate table metadata with user-defined classes.

The Mapper's role is to perform SQL operations upon the database, associating individual table rows with instances of those classes, and individual database columns with properties upon those instances, to transparently associate in-memory objects with a persistent database representation.

When a Mapper is created to associate a Table object with a class, all of the columns defined in the Table object are associated with the class via property accessors, which add overriding functionality to the normal process of setting and getting object attributes. These property accessors keep track of changes to object attributes; these changes will be stored to the database when the application "flushes" the current state of objects (known as a Unit of Work).

Two objects provide the primary interface for interacting with Mappers and the "unit of work", which are the Query object and the Session object. Query deals with selecting objects from the database, whereas Session provides a context for loaded objects and the ability to communicate changes on those objects back to the database.

The primary method on Query for loading objects is its select() method, which has similar arguments to a sqlalchemy.sql.Select object. But this select method executes automatically and returns results, instead of awaiting an execute() call. Instead of returning a cursor-like object, it returns a list of objects.

The three configurational elements to be defined, i.e. the Table metadata, the user-defined class, and the Mapper, are typically defined as module-level variables, and may be defined in any fashion suitable to the application, with the only requirement being that the class and table metadata are described before the mapper. For the sake of example, we will be defining these elements close together, but this should not be construed as a requirement; since SQLAlchemy is not a framework, those decisions are left to the developer or an external framework.

Also, keep in mind that the examples in this section deal with explicit Session objects mapped directly to Engine objects, which represents the most explicit style of using the ORM. Options exist for how this is configured, including binding Table objects directly to Engines (described in Binding MetaData to an Engine), as well as using the "Threadlocal" plugin which provides various code shortcuts by using an implicit Session associated to the current thread (described in threadlocal).

back to section top

Synopsis

First, the metadata/mapper configuration code:

from sqlalchemy import *

# metadata
meta = MetaData()

# table object
users_table = Table('users', meta, 
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('password', String(20))
)

# class definition 
class User(object):
    pass

# create a mapper and associate it with the User class.
# technically we don't really need the 'usermapper' variable.
usermapper = mapper(User, users_table)

Note that no database definitions are required. Next we will define an Engine and connect a Session to it, and perform a simple select:

# engine
engine = create_engine("sqlite:///mydb.db")

# session
session = create_session(bind_to=engine)

# select
user = session.query(User).select_by(user_name='fred')[0]
# modify
user.user_name = 'fred jones'

# flush - saves everything that changed
session.flush()
back to section top

The Query Object

The method session.query(class_or_mapper) returns a Query object. Below is a synopsis of things you can do with Query:

# get a query from a Session based on class:
query = session.query(User)

# get a query from a Session given a Mapper:
query = session.query(usermapper)

# select_by, which takes keyword arguments.  the
# keyword arguments represent property names and the values
# represent values which will be compared via the = operator.
# the comparisons are joined together via "AND".
result = query.select_by(name='john', street='123 green street')

# select_by can also combine ClauseElements with key/value properties.
# all ClauseElements and keyword-based criterion are combined together
# via "AND". 
result = query.select_by(users_table.c.user_name=='john', 
        addresses_table.c.zip_code=='12345', street='123 green street')

# get_by, which takes the same arguments as select_by
# returns a single scalar result or None if no results
user = query.get_by(id=12)

# "dynamic" versions of select_by and get_by - everything past the 
# "select_by_" or "get_by_" is used as the key, and the function argument
# as the value
result = query.select_by_name('fred')
u = query.get_by_name('fred')

# get an object directly from its primary key.  this will bypass the SQL
# call if the object has already been loaded
u = query.get(15)

# get an object that has a composite primary key of three columns.
# the order of the arguments matches that of the table meta data.
myobj = query.get((27, 3, 'receipts'))

# using a WHERE criterion
result = query.select(or_(users_table.c.user_name == 'john', users_table.c.user_name=='fred'))

# using a WHERE criterion to get a scalar
u = query.selectfirst(users_table.c.user_name=='john')

# selectone() is a stricter version of selectfirst() which
# will raise an exception if there is not exactly one row
u = query.selectone(users_table.c.user_name=='john')

# using a full select object
result = query.select(users_table.select(users_table.c.user_name=='john'))

Some of the examples above illustrate the usage of the mapper's Table object to provide the columns for a WHERE clause. These columns are also accessible off of the mapped class directly. When a mapper is assigned to a class, it also attaches a special property accessor c to the class itself, which can be used just like the table metadata to access the columns of the table:

userlist = session.query(User).select(User.c.user_id==12)

Full documentation for Query's API: Query.

back to section top

Saving Objects

When objects corresponding to mapped classes are created or manipulated, all changes are logged by the Session object. The changes are then written to the database when an application calls flush(). This pattern is known as a Unit of Work, and has many advantages over saving individual objects or attributes on those objects with individual method invocations. Domain models can be built with far greater complexity with no concern over the order of saves and deletes, excessive database round-trips and write operations, or deadlocking issues. The flush() operation batches its SQL statements into a transaction, and can also perform optimistic concurrency checks (using a version id column) to ensure the proper number of rows were in fact affected (not supported with the current MySQL drivers).
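
For illustration, enabling that concurrency check is a mapper option (a minimal sketch; the "version" Integer column on users_table is hypothetical and would have to exist in the table definition):

# hedged sketch: the version_id_col mapper option enables the optimistic
# concurrency check described above; "version" is a hypothetical column
mapper(User, users_table, version_id_col=users_table.c.version)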

The Unit of Work is a powerful tool, and has some important concepts that should be understood in order to use it effectively. See the Session / Unit of Work section for a full description on all its operations.

When a mapper is created, the target class has its mapped properties decorated by specialized property accessors that track changes. New objects by default must be explicitly added to the Session, however this can be made automatic by using threadlocal or SessionContext.

mapper(User, users_table)

# create a new User
myuser = User()
myuser.user_name = 'jane'
myuser.password = 'hello123'

# create another new User      
myuser2 = User()
myuser2.user_name = 'ed'
myuser2.password = 'lalalala'

# create a Session and save them
sess = create_session()
sess.save(myuser)
sess.save(myuser2)

# load a third User from the database            
myuser3 = sess.query(User).select(User.c.user_name=='fred')[0]
myuser3.user_name = 'fredjones'

# save all changes            
sess.flush()

The mapped class can also specify whatever methods and/or constructor it wants:

class User(object):
    def __init__(self, user_name, password):
        self.user_id = None
        self.user_name = user_name
        self.password = password
    def get_name(self):
        return self.user_name
    def __repr__(self):
        return "User id %s name %s password %s" % (repr(self.user_id), 
            repr(self.user_name), repr(self.password))
mapper(User, users_table)

sess = create_session()
u = User('john', 'foo')
sess.save(u)
sess.flush()
>>> u
User id 1 name 'john' password 'foo'

Note that the __init__() method is not called when the instance is loaded. This is so that classes can define operations that are specific to their initial construction which are not re-called when the object is restored from the database, and is similar in concept to how Python's pickle module calls __new__() when deserializing instances. To allow __init__() to be called at object load time, or to define any other sort of on-load operation, create a MapperExtension which supplies the create_instance() method (see Extending Mapper, as well as the example in the FAQ).
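
For illustration, a rough sketch of such an extension is below; the create_instance() argument list shown here is an assumption, so consult Extending Mapper for the definitive signature:

class InitOnLoad(MapperExtension):
    # assumed 0.3-era hook signature; see "Extending Mapper" for the
    # exact arguments in your version
    def create_instance(self, mapper, selectcontext, row, class_):
        # build the instance without invoking __init__()
        obj = class_.__new__(class_)
        obj.loaded_from_db = True   # example of on-load initialization
        return obj

mapper(User, users_table, extension=InitOnLoad())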

SQLAlchemy will only put modified object attribute columns into the UPDATE statements generated upon flush. This is to conserve database traffic and also to successfully interact with a "deferred" attribute, which is a mapped object attribute against the mapper's primary table that isn't loaded until referenced by the application.

back to section top

Defining and Using Relationships

So that covers how to map the columns in a table to an object, how to load objects, create new ones, and save changes. The next step is how to define an object's relationships to other database-persisted objects. This is done via the relation function provided by the orm module.

One to Many

So with our User class, let's also define the User as having one or more mailing addresses. First, the table metadata:

from sqlalchemy import *

metadata = MetaData()

# define user table
users_table = Table('users', metadata, 
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('password', String(20))
)

# define user address table
addresses_table = Table('addresses', metadata,
    Column('address_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)

Of importance here is the addresses table's definition of a foreign key relationship to the users table, relating the user_id column in a parent-child relationship. When a Mapper wants to indicate a relation of one object to another, ForeignKey relationships are the default method by which the relationship is determined (options also exist to describe the relationships explicitly).

So then lets define two classes, the familiar User class, as well as an Address class:

class User(object):
    def __init__(self, user_name, password):
        self.user_name = user_name
        self.password = password

class Address(object):
    def __init__(self, street, city, state, zip):
        self.street = street
        self.city = city
        self.state = state
        self.zip = zip

And then a Mapper that will define a relationship of the User and the Address classes to each other as well as their table metadata. We will add an additional mapper keyword argument properties which is a dictionary relating the names of class attributes to database relationships, in this case a relation object against a newly defined mapper for the Address class:

mapper(Address, addresses_table)
mapper(User, users_table, properties = {
        'addresses' : relation(Address)
    }
  )

Lets do some operations with these classes and see what happens:

engine = create_engine('sqlite:///mydb.db')
metadata.create_all(engine)

session = create_session(bind_to=engine)

u = User('jane', 'hihilala')
u.addresses.append(Address('123 anywhere street', 'big city', 'UT', '76543'))
u.addresses.append(Address('1 Park Place', 'some other city', 'OK', '83923'))

session.save(u)
session.flush()
INSERT INTO users (user_name, password) VALUES (:user_name, :password)
{'password': 'hihilala', 'user_name': 'jane'}
INSERT INTO addresses (user_id, street, city, state, zip) VALUES (:user_id, :street, :city, :state, :zip)
{'city': 'big city', 'state': 'UT', 'street': '123 anywhere street', 'user_id':1, 'zip': '76543'}
INSERT INTO addresses (user_id, street, city, state, zip) VALUES (:user_id, :street, :city, :state, :zip)
{'city': 'some other city', 'state': 'OK', 'street': '1 Park Place', 'user_id':1, 'zip': '83923'}

A lot just happened there! The Mapper figured out how to relate rows in the addresses table to the users table, and also upon flush had to determine the proper order in which to insert rows. After the insert, all the User and Address objects have their new primary and foreign key attributes populated.

Also notice that when we created a Mapper on the User class which defined an addresses relation, the newly created User instance magically had an "addresses" attribute which behaved like a list. This list is in reality a property function which returns an instance of sqlalchemy.util.HistoryArraySet. This object fulfills the full set of Python list accessors, but maintains a unique set of objects (based on their in-memory identity), and also tracks additions and deletions to the list:

del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))

session.flush()
UPDATE addresses SET user_id=:user_id
WHERE addresses.address_id = :addresses_address_id
[{'user_id': None, 'addresses_address_id': 2}]
INSERT INTO addresses (user_id, street, city, state, zip)
VALUES (:user_id, :street, :city, :state, :zip)
{'city': 'Houston', 'state': 'TX', 'street': '27 New Place', 'user_id': 1, 'zip': '34839'}
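
As an aside, the unique-set behavior can be seen directly (an illustrative sketch, separate from the running example):

a = Address('5 Main St', 'Anytown', 'UT', '11111')
u.addresses.append(a)
u.addresses.append(a)   # duplicate append; the collection keeps one copy
assert len([x for x in u.addresses if x is a]) == 1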

Note that when creating a relation with the relation() function, the target can either be a class, in which case the primary mapper for that class is used as the target, or a Mapper instance itself, as returned by the mapper() function.

back to section top

Lifecycle Relations

In the previous example, a single address was removed from the addresses attribute of a User object, resulting in the corresponding database row being updated to have a user_id of None. But now, there's a mailing address with no user_id floating around in the database, of no use to anyone. How can we avoid this? This is achieved by using the cascade parameter of relation:

clear_mappers()  # clear mappers from the previous example
mapper(Address, addresses_table)
mapper(User, users_table, properties = {
        'addresses' : relation(Address, cascade="all, delete-orphan")
    }
  )

del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))

session.flush()
INSERT INTO addresses (user_id, street, city, state, zip)
VALUES (:user_id, :street, :city, :state, :zip)
{'city': 'Houston', 'state': 'TX', 'street': '27 New Place', 'user_id': 1, 'zip': '34839'}
DELETE FROM addresses WHERE addresses.address_id = :address_id
[{'address_id': 2}]

In this case, with the delete-orphan cascade rule set, the element that was removed from the addresses list was also removed from the database. Specifying cascade="all, delete-orphan" means that every persistence operation performed on the parent object will be cascaded to the child object or objects handled by the relation, and additionally that each child object cannot exist without being attached to a parent. Such a relationship indicates that the lifecycle of the Address objects is bounded by that of their parent User object.

Cascading is described fully in Cascade rules.

back to section top

Backreferences

By creating relations with the backref keyword, a bi-directional relationship can be created which will keep both ends of the relationship updated automatically, independently of database operations. Below, the User mapper is created with an addresses property, and the corresponding Address mapper receives a "backreference" to the User object via the property name user:

mapper(Address, addresses_table)
mapper(User, users_table, properties = {
                'addresses' : relation(Address, backref='user')
            }
          )

u = User('fred', 'hi')
a1 = Address('123 anywhere street', 'big city', 'UT', '76543')
a2 = Address('1 Park Place', 'some other city', 'OK', '83923')

# append a1 to u
u.addresses.append(a1)

# attach u to a2
a2.user = u

# the bi-directional relation is maintained
>>> u.addresses == [a1, a2]
True
>>> a1.user is u and a2.user is u
True

The backreference feature also works with many-to-many relationships, which are described later. When creating a backreference, a corresponding property (i.e. a second relation()) is placed on the child mapper. The default arguments to this property can be overridden using the backref() function:

mapper(User, users_table)
mapper(Address, addresses_table, properties={
    'user':relation(User, backref=backref('addresses', cascade="all, delete-orphan"))
})

The backref() function is often used to set up a bi-directional one-to-one relationship. This is because the relation() function by default creates a "one-to-many" relationship when presented with a primary key/foreign key relationship, but the backref() function can redefine the uselist property to make it a scalar:

mapper(User, users_table)
mapper(Address, addresses_table, properties={
    'user' : relation(User, backref=backref('address', uselist=False))
})
back to section top

Selecting from Relationships

We've seen how the relation specifier affects the saving of an object and its child items; how does it affect selecting them? By default, the relation keyword indicates that the related property should have a lazy loader attached when instances of the parent object are loaded from the database; this is just a callable function that, when accessed, will invoke a second SQL query to load the child objects of the parent.

# define a mapper
mapper(User, users_table, properties = {
      'addresses' : relation(mapper(Address, addresses_table))
    })

# select users where username is 'jane', get the first element of the list
# this will incur a load operation for the parent table
user = session.query(User).select(User.c.user_name=='jane')[0]
# iterate through the User object's addresses.  this will incur an
# immediate load of those child items
for a in user.addresses:
    print repr(a)

Selecting With Joins

When using mappers that have relationships to other mappers, the need to specify query criterion across multiple tables arises. SQLAlchemy provides several core techniques which offer this functionality.

When specifying columns to the select method of Query, if the columns are attached to a table other than the mapped table, that table is automatically added to the "FROM" clause of the query. This is the same behavior that occurs when creating a non-ORM select object. Using this feature, joins can be created when querying:

l = session.query(User).select(and_(users.c.user_id==addresses.c.user_id, 
             addresses.c.street=='123 Green Street'))

Above, we specified selection criterion that included columns from both the users and the addresses table. Note that in this case, we had to specify not just the matching condition to the street column on addresses, but also the join condition between the users and addresses table. Even though the User mapper has a relationship to the Address mapper where the join condition is known, the select method does not assume how you want to construct joins. If we did not include the join clause, we would get:

# this is usually not what you want to do
l = session.query(User).select(addresses.c.street=='123 Green Street')

The above query will return all rows of the users table, even those that do not correspond to the addresses table, in a Cartesian product with the matching rows from the addresses table.

Another way to specify joins more explicitly is to use the from_obj parameter of select(). This allows you to explicitly place elements in the FROM clause of the query, which could include lists of tables and/or Join constructs:

l = session.query(User).select(addresses.c.street=='123 Green Street', 
                from_obj=[users.join(addresses)])

In the above example, the join function by default creates a natural join between the two tables, so we were able to avoid having to specify the join condition between users and addresses explicitly.

back to section top

Creating Joins Using select_by()

Another way that joins can be created is by using the select_by method of Query, which has the ability to create joins across relationships automatically. This method is in many circumstances more convenient than, but not as flexible as, the more SQL-level approach using the select() method described in the previous section.

To issue a join using select_by, just specify a key in the argument list which is not present in the primary mapper's list of properties or columns, but is present in the property list of some relationship down the line of objects. The Query object will recursively traverse along the mapped relationships starting with the lead class and descending into child classes, until it finds a property matching the given name. For each new mapper it encounters along the path to the located property, it constructs a join across that relationship:

l = session.query(User).select_by(street='123 Green Street')

The above example is shorthand for:

l = session.query(User).select(and_(
         Address.c.user_id==User.c.user_id, 
         Address.c.street=='123 Green Street')
   )

All keyword arguments sent to select_by are used to create query criterion. This means that familiar select keyword options like order_by and limit are not directly available. To enable these options with select_by, you can try the SelectResults extension which offers transitive methods off the result of a select or select_by such as order_by(), limit(), etc.
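
A rough sketch of SelectResults usage follows; the method names shown are as described in the SelectResults documentation of this era, so verify them against your version:

from sqlalchemy.ext.selectresults import SelectResults

results = SelectResults(session.query(User))
# each method returns a new SelectResults, so calls can be chained
some_users = results.order_by(users_table.c.user_name).limit(10).list()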

Note that the select_by method, while it primarily uses keyword arguments, can also accommodate ClauseElement objects positionally; recall that a ClauseElement is generated when producing a comparison off of a Column expression, such as users.c.name=='ed'. When using ClauseElements with select_by, these clauses are passed directly to the generated SQL and are not used to further locate join criterion. If criterion is being constructed with these kinds of expressions, consider using the select() method which is better designed to accommodate these expressions.

As of SA 0.3.4, select_by() and related functions can compare not only column-based attributes to column-based values, but also relations to object instances:

# get an instance of Address
someaddress = session.query(Address).get_by(street='123 Green Street')

# look for User instances which have the 
# "someaddress" instance in their "addresses" collection
l = session.query(User).select_by(addresses=someaddress)

Where above, the comparison denoted by addresses=someaddress is constructed by comparing all the primary key columns in the Address mapper to each corresponding primary key value in the someaddress entity. In other words, it's equivalent to saying select_by(address_id=someaddress.address_id) (Alpha API).

back to section top

Generating Join Criterion Using join_to, join_via

Feature Status: Alpha API

The join_to method of Query is a component of the select_by operation, and is given a keyname in order to return a "join path" from the Query's mapper to the mapper which is referenced by a relation() of the given name:

>>> q = session.query(User)
>>> j = q.join_to('addresses')
>>> print j
users.user_id=addresses.user_id

join_to can also be given the name of a column-based property, in which case it will locate a path to the nearest mapper which has that property as a column:

>>> q = session.query(User)
>>> j = q.join_to('street')
>>> print j
users.user_id=addresses.user_id

Also available is the join_via function, which is similar to join_to, except instead of traversing through all properties to find a path to the given key, it's given an explicit path to the target property:

>>> q = session.query(User)
>>> j = q.join_via(['orders', 'items'])
>>> print j
users.c.user_id==orders.c.user_id AND orders.c.item_id==items.c.item_id

Expressions produced by join_to and join_via can be used with select to create more complicated query criterion across multiple relations:

>>> l = q.select(
    (addresses_table.c.street=='some address') &
    (items_table.c.item_name=='item #4') &
    q.join_to('addresses') &
    q.join_via(['orders', 'items'])
    )

Note that the from_obj parameter of select(), described previously, allows finer grained control of joins, allowing any combination of inner and outer joins.

back to section top

Eager Loading

Eager Loading describes the loading of parent and child objects across a relation using a single query. The purpose of eager loading is strictly one of performance enhancement; eager loading has no impact on the results of a query, except that when traversing child objects within the results, lazy loaders will not need to issue separate queries to load those child objects.

Eager Loading is enabled on a per-relationship basis, either as the default for a particular relationship, or for a single query using query options, described later.

With just a single parameter lazy=False specified to the relation object, the parent and child SQL queries can be joined together.

mapper(Address, addresses_table)
mapper(User, users_table, properties = {
        'addresses' : relation(Address, lazy=False)
    }
  )

users = session.query(User).select(User.c.user_name=='Jane')
for u in users:
    print repr(u)
    for a in u.addresses:
        print repr(a)

Above, a pretty ambitious query is generated just by specifying that the User should be loaded with its child Addresses in one query. When the mapper processes the results, it uses an Identity Map to keep track of objects that were already loaded, based on their primary key identity. Through this method, the redundant rows produced by the join are organized into the distinct object instances they represent.

Recall that eager loading has no impact on the results of the query. What if our query included our own join criterion? The eager loading query accommodates this using aliases, and is immune to the effects of additional joins being specified in the original query. To use our select_by example above, joining against the "addresses" table to locate users with a certain street results in this behavior:

users = session.query(User).select_by(street='123 Green Street')

The join implied by passing the "street" parameter is separate from the join produced by the eager join, which is "aliasized" to prevent conflicts.

back to section top

Using Options to Change the Loading Strategy

The options method on the Query object provides an easy way to get alternate forms of a mapper query from an original one. The most common use of this feature is to change the "eager/lazy" loading behavior of a particular mapper, via the functions eagerload(), lazyload() and noload():

# user mapper with lazy addresses
mapper(User, users_table, properties = {
             'addresses' : relation(mapper(Address, addresses_table))
         }
)

# query object
query = session.query(User)

# make an eager loading query
eagerquery = query.options(eagerload('addresses'))
u = eagerquery.select()

# make another query that won't load the addresses at all
plainquery = query.options(noload('addresses'))

# multiple options can be specified
myquery = oldquery.options(lazyload('tracker'), noload('streets'), eagerload('members'))

# to specify a relation on a relation, separate the property names by a "."
myquery = oldquery.options(eagerload('orders.items'))
back to section top

One to One/Many to One

The above examples focused on the "one-to-many" relationship. Other forms of relationship are just as easy, as the relation function can usually figure out what you want:

metadata = MetaData()

# a table to store a user's preferences for a site
prefs_table = Table('user_prefs', metadata,
    Column('pref_id', Integer, primary_key = True),
    Column('stylename', String(20)),
    Column('save_password', Boolean, nullable = False),
    Column('timezone', CHAR(3), nullable = False)
)

# user table with a 'preference_id' column
users_table = Table('users', metadata, 
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('password', String(20), nullable = False),
    Column('preference_id', Integer, ForeignKey("user_prefs.pref_id"))
)

# engine and some test data
engine = create_engine('sqlite:///', echo=True)
metadata.create_all(engine)
engine.execute(prefs_table.insert(), dict(pref_id=1, stylename='green', save_password=1, timezone='EST'))
engine.execute(users_table.insert(), dict(user_name = 'fred', password='45nfss', preference_id=1))

# classes
class User(object):
    def __init__(self, user_name, password):
        self.user_name = user_name
        self.password = password

class UserPrefs(object):
    pass

mapper(UserPrefs, prefs_table)

mapper(User, users_table, properties = dict(
    preferences = relation(UserPrefs, lazy=False, cascade="all, delete-orphan"),
))

# select
session = create_session(bind_to=engine)
user = session.query(User).get_by(user_name='fred')
save_password = user.preferences.save_password

# modify
user.preferences.stylename = 'bluesteel'

# flush
session.flush()
back to section top

Many to Many

The relation function handles a basic many-to-many relationship when you specify the association table:

metadata = MetaData()

articles_table = Table('articles', metadata,
    Column('article_id', Integer, primary_key = True),
    Column('headline', String(150), key='headline'),
    Column('body', TEXT, key='body'),
)

keywords_table = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key = True),
    Column('keyword_name', String(50))
)

itemkeywords_table = Table('article_keywords', metadata,
    Column('article_id', Integer, ForeignKey("articles.article_id")),
    Column('keyword_id', Integer, ForeignKey("keywords.keyword_id"))
)

engine = create_engine('sqlite:///')
metadata.create_all(engine)

# class definitions
class Keyword(object):
    def __init__(self, name):
        self.keyword_name = name

class Article(object):
    pass

mapper(Keyword, keywords_table)

# define a mapper that does many-to-many on the 'itemkeywords' association 
# table
mapper(Article, articles_table, properties = dict(
    keywords = relation(Keyword, secondary=itemkeywords_table, lazy=False)
    )
)

session = create_session(bind_to=engine)

article = Article()
article.headline = 'a headline'
article.body = 'this is the body'
article.keywords.append(Keyword('politics'))
article.keywords.append(Keyword('entertainment'))
session.save(article)

session.flush()
# select articles based on a keyword.  select_by will handle the extra joins.
articles = session.query(Article).select_by(keyword_name='politics')
a = articles[0]

# clear out keywords with a new list
a.keywords = []
a.keywords.append(Keyword('topstories'))
a.keywords.append(Keyword('government'))

# flush
session.flush()
back to section top

Association Object

Many to Many can also be done with an association object, which adds additional information about how two items are related. In this pattern, the "secondary" option to relation() is no longer used; instead, the association object becomes a mapped entity itself, mapped to the association table. If the association table has no explicit primary key columns defined, you also have to tell the mapper what columns will compose its "primary key", which are typically the two (or more) columns involved in the association. Also, the relation between the parent and association mapping is typically set up with a cascade of all, delete-orphan. This is to ensure that when an association object is removed from its parent collection, it is deleted (otherwise, the unit of work tries to null out one of the foreign key columns, which raises an error condition since that column is also part of its "primary key").

from sqlalchemy import *
metadata = MetaData()

users_table = Table('users', metadata, 
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
)

articles_table = Table('articles', metadata,
    Column('article_id', Integer, primary_key = True),
    Column('headline', String(150), key='headline'),
    Column('body', TEXT, key='body'),
)

keywords_table = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key = True),
    Column('keyword_name', String(50))
)

# add "attached_by" column which will reference the user who attached this keyword
itemkeywords_table = Table('article_keywords', metadata,
    Column('article_id', Integer, ForeignKey("articles.article_id")),
    Column('keyword_id', Integer, ForeignKey("keywords.keyword_id")),
    Column('attached_by', Integer, ForeignKey("users.user_id"))
)

engine = create_engine('sqlite:///', echo=True)
metadata.create_all(engine)

# class definitions
class User(object):
    pass
class Keyword(object):
    def __init__(self, name):
        self.keyword_name = name
class Article(object):
    pass
class KeywordAssociation(object):
    pass

# Article mapper, relates to Keyword via KeywordAssociation
mapper(Article, articles_table, properties={
    'keywords':relation(KeywordAssociation, lazy=False, cascade="all, delete-orphan")
    }
)

# mapper for KeywordAssociation
# specify "primary key" columns manually
mapper(KeywordAssociation, itemkeywords_table,
    primary_key=[itemkeywords_table.c.article_id, itemkeywords_table.c.keyword_id],
    properties={
        'keyword' : relation(Keyword, lazy=False), 
        'user' : relation(User, lazy=False) 
    }
)

# user mapper
mapper(User, users_table)

# keyword mapper
mapper(Keyword, keywords_table)

session = create_session(bind_to=engine)
# select by keyword
alist = session.query(Article).select_by(keyword_name='jacks_stories')
# user is available
for a in alist:
    for k in a.keywords:
        if k.keyword.keyword_name == 'jacks_stories':
            print k.user.user_name

Keep in mind that the association object works a little differently from a plain many-to-many relationship. Members have to be added to the list via instances of the association object, which in turn point to the associated object:

user = User()
user.user_name = 'some user'

article = Article()

assoc = KeywordAssociation()
assoc.keyword = Keyword('blue')
assoc.user = user

assoc2 = KeywordAssociation()
assoc2.keyword = Keyword('green')
assoc2.user = user

article.keywords.append(assoc)
article.keywords.append(assoc2)

session.save(article)

session.flush()

SQLAlchemy includes an extension module which can be used in some cases to decrease the explicitness of the association object pattern; this extension is described in associationproxy.

back to section top

Overview

The concept behind Unit of Work is to track modifications to a field of objects, and then be able to flush those changes to the database in a single operation. There are many advantages to this, including that your application doesn't need to worry about individual save operations on objects, nor about the required order for those operations, nor about excessive repeated calls to save operations that would be more efficiently aggregated into one step. It also simplifies database transactions, providing a neat package with which to insert into the traditional database begin/commit phase.

SQLAlchemy's unit of work includes these functions:

  • The ability to monitor scalar and list attributes on object instances, as well as the creation of new object instances. This is handled via the attributes package.
  • The ability to maintain and process a list of modified objects, and based on the relationships set up by the mappers for those objects as well as the foreign key relationships of the underlying tables, figure out the proper order of operations so that referential integrity is maintained, and also so that on-the-fly values such as newly created primary keys can be propagated to dependent objects that need them before they are saved. The central algorithm for this is the topological sort.
  • The ability to define custom functionality that occurs within the unit-of-work flush phase, such as "before insert", "after insert", etc. This is accomplished via MapperExtension.
  • an Identity Map, which is a dictionary storing the one and only instance of an object for a particular table/primary key combination. This allows many parts of an application to get a handle to a particular object without any chance of modifications going to two different places.
  • The sole interface to the unit of work is provided via the Session object. Transactional capability, which rides on top of the transactions provided by Engine objects, is provided by the SessionTransaction object.
  • Thread-locally scoped Session behavior is available as an option, which allows new objects to be automatically added to the Session corresponding to the current thread by the default Session context. Without a default Session context, an application must explicitly create a Session manually as well as add new objects to it. The default Session context, disabled by default, can also be plugged in with other user-defined schemes, which may also take into account the specific class being dealt with for a particular operation.
  • The Session object borrows conceptually from that of Hibernate, a leading ORM for Java that was a great influence on the creation of the JSR-220 specification. SQLAlchemy, under no obligation to conform to EJB specifications, is in general very different from Hibernate, providing a different paradigm for producing queries, a SQL API that is useable independently of the ORM, and of course Pythonic configuration as opposed to XML; however, JSR-220/Hibernate makes some pretty good suggestions with regards to the mechanisms of persistence.
back to section top

Object States

When dealing with mapped instances with regards to Sessions, an instance may be attached or unattached to a particular Session. An instance also may or may not correspond to an actual row in the database. The product of these two binary conditions yields four general states a particular instance can have within the perspective of the Session (a condensed code illustration follows the list):

  • Transient - a transient instance exists within memory only and is not associated with any Session. It also has no database identity and does not have a corresponding record in the database. When a new instance of a class is constructed, and no default session context exists with which to automatically attach the new instance, it is a transient instance. The instance can then be saved to a particular session in which case it becomes a pending instance. If a default session context exists, new instances are added to that Session by default and therefore become pending instances immediately.

  • Pending - a pending instance is a Session-attached object that has not yet been assigned a database identity. When the Session is flushed (i.e. changes are persisted to the database), a pending instance becomes persistent.

  • Persistent - a persistent instance has a database identity and a corresponding record in the database, and is also associated with a particular Session. By "database identity" we mean the object is associated with a table or relational concept in the database combined with a particular primary key in that table. Objects that are loaded by SQLAlchemy in the context of a particular session are automatically considered persistent, as are formerly pending instances which have been subject to a session flush().

  • Detached - a detached instance is an instance which has a database identity and corresponding row in the database, but is not attached to any Session. This occurs when an instance has been removed from a Session, either because the session itself was cleared or closed, or the instance was explicitly removed from the Session. The object can be re-attached to a session in which case it becomes Persistent again; any un-persisted changes that exist on the instance, whether they occurred during its previous persistent state or during its detached state will be detected and maintained by the new session. Detached instances are useful when an application needs to represent a long-running operation across multiple Sessions, needs to store an object in a serialized state and then restore it later (such as within an HTTP "session" object), or in some cases where code needs to load instances locally which will later be associated with some other Session.
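
Condensing the four states into the Session methods described later in this chapter (a minimal sketch):

u = User()              # transient: no Session, no database identity
sess = create_session()
sess.save(u)            # pending: attached to sess, not yet flushed
sess.flush()            # persistent: row inserted, identity assigned
sess.expunge(u)         # detached: identity retained, no Session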

back to section top

Acquiring a Session

A new Session object is constructed via the create_session() function:

session = create_session()

A common option used with create_session() is to specify a specific Engine or Connection to be used for all operations performed by this Session:

# create an engine
e = create_engine('postgres://some/url')

# create a Session that will use this engine for all operations.
# it will open and close Connections as needed.
session = create_session(bind_to=e)

# open a Connection
conn = e.connect()

# create a Session that will use this specific Connection for all operations
session = create_session(bind_to=conn)

The session to which an object is attached can be acquired via the object_session() function, which returns the appropriate Session if the object is pending or persistent, or None if the object is transient or detached:

session = object_session(obj)

Session Facts:

  • the Session object is not threadsafe. For thread-local management of Sessions, the recommended approach is to use the SessionContext extension module.

We will now cover some of the key concepts used by the Session and its underlying Unit of Work.

back to section top

Introduction to the Identity Map

A primary concept of the Session's underlying Unit of Work is that it keeps track of all persistent instances; recall that a persistent instance has a database identity and is attached to a Session. In particular, the Unit of Work must ensure that only one copy of a particular persistent instance exists within the Session at any given time. The UOW accomplishes this task using a dictionary known as an Identity Map.

When a Query is used to issue select or get requests to the database, it will in nearly all cases result in an actual SQL execution to the database, and a corresponding traversal of rows received from that execution. However, when the underlying mapper actually creates objects corresponding to the result set rows it receives, it will check the session's identity map first before instantiating a new object, and return the same instance already present in the identity map if it already exists, essentially ignoring the object state represented by that row. There are several ways to override this behavior and truly refresh an already-loaded instance which are described later, but the main idea is that once your instance is loaded into a particular Session, it will never change its state without your explicit approval, regardless of what the database says about it.

For example: below, two separate calls to load an instance with database identity "15" are issued, and the results assigned to two separate variables. However, since the same Session was used, the two instances are the same instance:

mymapper = mapper(MyClass, mytable)

session = create_session()
obj1 = session.query(MyClass).selectfirst(mytable.c.id==15)
obj2 = session.query(MyClass).selectfirst(mytable.c.id==15)

>>> obj1 is obj2
True

The Identity Map is an instance of dict by default. (This is new as of version 0.3.2). As an option, you can specify the flag weak_identity_map=True to the create_session function so that it will use a weakref.WeakValueDictionary, so that when an in-memory object falls out of scope, it will be removed automatically, thereby providing some automatic management of memory. However, this may not be instant if there are circular references upon the object. To guarantee that an instance is removed from the identity map before removing references to it, use the expunge() method, described later, to remove it. Additionally, note that an object that has changes marked on it (i.e. "dirty") can still fall out of scope when using weak_identity_map.
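
Enabling the weak-referencing map is a single flag to create_session() (a minimal sketch):

# identity map entries are weakly referenced; clean instances that fall
# out of scope elsewhere may be garbage collected from the map
session = create_session(weak_identity_map=True)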

The Session supports an iterator interface in order to see all objects in the identity map:

for obj in session:
    print obj

As well as __contains__():

if obj in session:
    print "Object is present"

The identity map itself is accessible via the identity_map accessor:

>>> session.identity_map.values()
[<__main__.User object at 0x712630>, <__main__.Address object at 0x712a70>]

The identity of each object instance is available via the _instance_key property attached to each object instance, and is a tuple consisting of the object's class and an additional tuple of primary key values, in the order that they appear within the table definition:

>>> obj._instance_key 
(<class 'test.tables.User'>, (7,))

At the moment that an object is assigned this key within a flush() operation, it is also added to the session's identity map.

The get() method on Query, which retrieves an object based on primary key identity, also checks in the Session's identity map first to save a database round-trip if possible. In the case of an object lazy-loading a single child object, the get() method is used as well, so scalar-based lazy loads may in some cases not query the database; this is particularly important for backreference relationships as it can save a lot of queries.
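
To illustrate, assuming a mapped User class as in earlier examples:

u1 = session.query(User).get(7)   # issues SQL if identity 7 isn't loaded
u2 = session.query(User).get(7)   # satisfied from the identity map; no SQL
assert u1 is u2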

back to section top

What's Changed?

The next concept is that in addition to the Session storing a record of all objects loaded or saved, it also stores lists of all newly created (i.e. pending) objects and lists of all persistent objects that have been marked as deleted. These lists are used when a flush() call is issued to save all changes. During a flush operation, it also scans its list of persistent instances for changes which are marked as dirty.

These records are all tracked by collection functions that are also viewable off the Session as properties:

# pending objects recently added to the Session
session.new

# persistent objects which currently have changes detected
# (this collection is now created on the fly each time the property is called)
session.dirty

# persistent objects that have been marked as deleted via session.delete(obj)
session.deleted

Note that if a session is created with the weak_identity_map flag, an item which is marked as "dirty" will be silently removed from the session if the item falls out of scope in the user application. This is because the unit of work does not look for "dirty" changes except for within a flush operation (or any time the session.dirty collection is accessed).

As for objects inside of new and deleted, if you abandon all references to new or modified objects within a session, they are still present in either of those two lists, and will be saved on the next flush operation, unless they are removed from the Session explicitly (more on that later).

back to section top

The Session API

query()

The query() function takes a class or Mapper as an argument, along with an optional entity_name parameter, and returns a new Query object which will issue mapper queries within the context of this Session. If a Mapper is passed, then the Query uses that mapper. Otherwise, if a class is sent, it will locate the primary mapper for that class which is used to construct the Query.

# query from a class
session.query(User).select_by(name='ed')

# query from a mapper
query = session.query(usermapper)
x = query.get(1)

# query from a class mapped with entity name 'alt_users'
q = session.query(User, entity_name='alt_users')
y = q.options(eagerload('orders')).select()

entity_name is an optional keyword argument sent with a class object, in order to further qualify which primary mapper is to be used; this only applies if there was a Mapper created with that particular class/entity name combination, else an exception is raised. All of the methods on Session which take a class or mapper argument also take the entity_name argument, so that a given class can be properly matched to the desired primary mapper.
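
For example, a second mapper for the same class can be distinguished by an entity name; alt_users_table below is a hypothetical second Table:

# primary mapper for User
mapper(User, users_table)

# alternate mapper against a different table, qualified by entity name
mapper(User, alt_users_table, entity_name='alt_users')

q = session.query(User, entity_name='alt_users')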

All instances retrieved by the returned Query object will be stored as persistent instances within the originating Session.

back to section top

get()

Given a class or mapper, a scalar or tuple-based identity, and an optional entity_name keyword argument, creates a Query corresponding to the given mapper or class/entity_name combination, and calls the get() method with the given identity value. If the object already exists within this Session, it is simply returned, else it is queried from the database. If the instance is not found, the method returns None.

# get Employer primary key 5
employer = session.get(Employer, 5)

# get Report composite primary key 7,12, using mapper 'report_mapper_b'
report = session.get(Report, (7,12), entity_name='report_mapper_b')
back to section top

load()

load() is similar to get() except it will raise an exception if the instance does not exist in the database. It will also load the object's data from the database in all cases, and overwrite all changes on the object if it already exists in the session with the latest data from the database.

# load Employer primary key 5
employer = session.load(Employer, 5)

# load Report composite primary key 7,12, using mapper 'report_mapper_b'
report = session.load(Report, (7,12), entity_name='report_mapper_b')
back to section top

save()

save() is called with a single transient (unsaved, unattached) instance as an argument, which is then added to the Session and becomes pending. When the session is next flushed, the instance will be saved to the database, upon which it becomes persistent (saved, attached). If the given instance is not transient, meaning it is either attached to an existing Session or it has a database identity, an exception is raised.

user1 = User(name='user1')
user2 = User(name='user2')
session.save(user1)
session.save(user2)

session.flush()     # write changes to the database

save() is called automatically for new instances by the class's associated mapper, if a default Session context is in effect (such as a thread-local session), which means that newly created instances automatically become pending. If there is no default session available, then the instance remains transient (unattached) until it is explicitly added to a Session via the save() method.

A transient instance can also be automatically saved if it is associated with a parent object which specifies save-update within its cascade rules, and that parent is already attached or becomes attached to a Session. For more information on cascade, see the next section.

The save_or_update() method, covered later, is a convenience method which will call the save() or update() methods appropriately depending on whether or not the instance has a database identity (but the instance still must be unattached).
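
As a quick sketch:

# u may be brand new, or may carry a database identity from another session
sess.save_or_update(u)   # picks save() or update() appropriately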

back to section top

flush()

This is the main gateway to what the Unit of Work does best, which is to save everything! It should be clear by now what a flush looks like:

session.flush()

It also can be called with a list of objects; in this form, the flush operation will be limited only to the objects specified in the list, as well as any child objects within private relationships for a delete operation:

# saves only user1 and address2.  all other modified
# objects remain present in the session.
session.flush([user1, address2])

This second form of flush should be used carefully as it will not necessarily locate other dependent objects within the session, whose database representation may have foreign constraint relationships with the objects being operated upon.

Notes on Flush

A common misconception about the flush() operation is that once performed, the newly persisted instances will automatically have related objects attached to them, based on the values of primary key identities that have been assigned to the instances before they were persisted. An example would be, you create a new Address object, set address.user_id to 5, and then flush() the session. The erroneous assumption would be that there is now a User object of identity "5" attached to the Address object, but in fact this is not the case. If you were to refresh() the Address, invalidating its current state and re-loading, then it would have the appropriate User object present.

This misunderstanding is related to the observed behavior of backreferences (Backreferences), which automatically associates an instance "A" with another instance "B", in response to the manual association of instance "B" to instance "A" by the user. The backreference operation occurs completely externally to the flush() operation, and is pretty much the only example of a SQLAlchemy feature that manipulates the relationships of persistent objects.

The primary guideline for dealing with flush() is: the developer is responsible for maintaining in-memory objects and their relationships to each other; the unit of work is responsible for maintaining the database representation of the in-memory objects. The typical pattern is that the manipulation of objects is the way that changes get communicated to the unit of work, so that when the flush occurs, the objects are already in their correct in-memory representation and problems don't arise. The manipulation of identifier attributes like integer key values, as well as deletes in particular, are a frequent source of confusion.

back to section top

close()

This method first calls clear(), removing all objects from this Session, and then ensures that any transactional resources are closed.

back to section top

delete()

The delete method places an instance into the Unit of Work's list of objects to be marked as deleted:

# mark two objects to be deleted
session.delete(obj1)
session.delete(obj2)

# flush
session.flush()

The delete operation will have an effect on instances that are attached to the deleted instance according to the cascade style of the relationship; cascade rules are described further in the following section. By default, associated instances may need to be updated in the database to reflect that they no longer are associated with the parent object, before the parent is deleted. If the relationship specifies cascade="delete", then the associated instance will also be deleted upon flush, assuming it is still attached to the parent. If the relationship additionally includes the delete-orphan cascade style, the associated instance will be deleted if it is still attached to the parent, or is unattached to any other parent.
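
For instance, configuring the relationship so that child Addresses are deleted along with their User looks like this (a sketch reusing the mappers from earlier sections; someuser is a hypothetical persistent instance):

mapper(User, users_table, properties={
    'addresses' : relation(Address, cascade="all, delete-orphan")
})

session.delete(someuser)   # someuser's addresses are deleted at flush time
session.flush()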

The delete() operation has no relationship to the in-memory status of the instance, including usage of the del Python statement. An instance marked as deleted and flushed will still exist within memory until references to it are freed; similarly, removing an instance from memory via the del statement will have no effect, since the persistent instance will still be referenced by its Session. Obviously, if the instance is removed from the Session and then totally dereferenced, it will no longer exist in memory, but also won't exist in any Session and is therefore not deleted from the database.

Note that the "in-memory status" of an instance also refers to its presence in any other collection. SQLAlchemy does not track the collections to which an instance is a member, and will not remove an instance from its parent collections that were not directly involved in a deletion operation. The operational and memory overhead implied by this would be too great (such as, if an object belonged to hundreds of collections). This means if an object A is attached to both an object B and an object C, if you delete() A and flush, A still remains attached to both B and C in a deleted state and must be removed by the application. Similarly, if a delete on B cascades to A, this does not affect A still being present on C - again it must be manually removed.

back to section top

clear()

This method detaches all instances from the Session, sending them to the detached or transient state as applicable, and replaces the underlying UnitOfWork with a new one.

session.clear()

The clear() method is particularly useful with a "default context" session such as a thread-local session, which can stay attached to the current thread to handle a new set of objects without a new Session having to be created and attached.

back to section top

refresh() / expire()

To assist with the Unit of Work's "sticky" behavior, individual objects can have all of their attributes immediately re-loaded from the database, or marked as "expired", which will cause a re-load to occur upon the next access of any of the object's mapped attributes. This includes all relationships, so lazy-loaders will be re-initialized and eager relationships will be repopulated. Any changes marked on the object are discarded:

# immediately re-load attributes on obj1, obj2
session.refresh(obj1)
session.refresh(obj2)

# expire objects obj1, obj2, attributes will be reloaded
# on the next access:
session.expire(obj1)
session.expire(obj2)
back to section top

expunge()

Expunge removes an object from the Session, sending persistent instances to the detached state, and pending instances to the transient state:

session.expunge(obj1)

Use expunge when you'd like to remove an object from memory altogether, such as before calling del on it, which will prevent any "ghost" operations from occurring when the session is flushed.

back to section top

bind_mapper() / bind_table()

Both of these methods receive two arguments; in the case of bind_mapper(), it is a Mapper and an Engine or Connection instance; in the case of bind_table(), it is a Table instance or other Selectable (such as an Alias, Select, etc.), and an Engine or Connection instance.

engine1 = create_engine('sqlite:///file1.db')
engine2 = create_engine('mysql://localhost')

sqlite_connection = engine1.connect()

sess = create_session()

sess.bind_mapper(mymapper, sqlite_connection)  # bind mymapper operations to a single SQLite connection
sess.bind_table(email_addresses_table, engine2) # bind operations with the email_addresses_table to mysql

Normally, when a Session is created via create_session() with no arguments, the Session has no awareness of individual Engines, and when mappers use the Session to retrieve connections, the underlying MetaData each Table is associated with is expected to be "bound" to an Engine, else no engine can be located and an exception is raised. A second form of create_session() takes the argument bind_to=engine_or_connection, where all SQL operations performed by this Session use the single Engine or Connection (collectively known as a Connectable) passed to the constructor. With bind_mapper() and bind_table(), the operations of individual mapper and/or tables are bound to distinct engines or connections, thereby overriding not only the engine which may be "bound" to the underlying MetaData, but also the Engine or Connection which may have been passed to the create_session() function. Configurations which interact with multiple explicit database connections at one time must use either or both of these methods in order to associate Session operations with the appropriate connection resource.
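
For comparison, a minimal sketch of the bind_to form mentioned above:

engine = create_engine('sqlite:///file1.db')

# all SQL operations performed by this Session will use this engine
sess = create_session(bind_to=engine)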

Binding a Mapper to a resource takes precedence over a Table bind, meaning if mapper A is associated with table B, and the Session binds mapper A to connection X and table B to connection Y, an operation with mapper A will use connection X, not connection Y.

back to section top

update()

The update() method is used only with detached instances. A detached instance only exists if its Session was cleared or closed, or the instance was expunge()d from its session. update() will re-attach the detached instance to this Session, bringing it back to the persistent state and allowing any changes on the instance to be saved when the Session is next flushed. If the instance is already attached to an existing Session, an exception is raised.
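
A brief sketch, assuming myobject was loaded in a Session that has since been closed (the name attribute is illustrative):

sess = create_session()
sess.update(myobject)         # re-attach the detached instance
myobject.name = 'new name'    # changes are now tracked by this Session
sess.flush()                  # and persisted at flush time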

A detached instance can also be automatically updated if it is associated with a parent object which specifies save-update within its cascade rules, and that parent is already attached or becomes attached to a Session. For more information on cascade, see the next section.

The save_or_update() method is a convenience method which will call the save() or update() methods appropriately depending on whether or not the instance has a database identity (but the instance still must be unattached).

back to section top

save_or_update()

This method is a combination of the save() and update() methods, which will examine the given instance for a database identity (i.e. if it is transient or detached), and will call the implementation of save() or update() as appropriate. Use save_or_update() to add unattached instances to a session when you're not sure if they were newly created or not. Like save() and update(), save_or_update() cascades along the save-update cascade indicator, described in the cascade section below.
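
A minimal sketch:

# works whether obj is newly created (transient) or was detached
# from a previous Session
session.save_or_update(obj)
session.flush()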

back to section top

merge()

Feature Status: Alpha Implementation

merge() is used to return the persistent version of an instance that is not attached to this Session. When passed an instance, if an instance with its database identity already exists within this Session, it is returned. If the instance does not exist in this Session, it is loaded from the database and then returned.

A future version of merge() will also update the Session's instance with the state of the given instance (hence the name "merge").

This method is useful for bringing in objects which may have been restored from a serialization, such as those stored in an HTTP session:

# deserialize an object
myobj = pickle.loads(mystring)

# "merge" it.  if the session already had this object in the 
# identity map, then you get back the one from the current session.
myobj = session.merge(myobj)

Note that merge() does not associate the given instance with the Session; it remains detached (or attached to whatever Session it was already attached to).

back to section top

Cascade rules

Feature Status: Alpha Implementation

Mappers support the concept of configurable cascade behavior on relation()s. This behavior controls how the Session should treat the instances that have a parent-child relationship with another instance that is operated upon by the Session. Cascade is indicated as a comma-separated list of string keywords, with the possible values all, delete, save-update, refresh-expire, merge, expunge, and delete-orphan.

Cascading is configured by setting the cascade keyword argument on a relation():

mapper(Order, order_table, properties={
    'items' : relation(Item, items_table, cascade="all, delete-orphan"),
    'customer' : relation(User, users_table, user_orders_table, cascade="save-update"),
})

The above mapper specifies two relations, items and customer. The items relationship specifies "all, delete-orphan" as its cascade value, indicating that all save, update, merge, expunge, refresh, delete and expire operations performed on a parent Order instance should also be performed on the child Item instances attached to it (save and update are cascaded using the save_or_update() method, so that the database identity of the instance doesn't matter). The delete-orphan cascade value additionally indicates that if an Item instance is no longer associated with an Order, it should also be deleted. The "all, delete-orphan" cascade argument allows a so-called lifecycle relationship between an Order and an Item object.

The customer relationship specifies only the "save-update" cascade value, indicating that most operations will not be cascaded from a parent Order instance to a child User instance, except when the Order is attached to a particular session via the save(), update(), or save_or_update() method.

Additionally, when a child item is attached to a parent item that specifies the "save-update" cascade value on the relationship, the child is automatically passed to save_or_update() (and the operation is further cascaded to the child item).
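
As a sketch of this behavior, using the Order/Item mapping above:

o = Order()
session.save(o)       # the Order is attached to the session

i = Item()
o.items.append(i)     # the "save-update" cascade attaches the Item as well

session.flush()       # both rows are inserted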

Note that cascading doesn't do anything that isn't possible by manually calling Session methods on individual instances within a hierarchy, it merely automates common operations on a group of associated instances.

The default value for cascade on relation()s is save-update, and the private=True keyword argument is a synonym for cascade="all, delete-orphan".
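
That is, the following two relation() configurations are equivalent, the private=True form being the deprecated spelling:

relation(Item, cascade="all, delete-orphan")

relation(Item, private=True)   # deprecated synonym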

back to section top

SessionTransaction

SessionTransaction is a multi-engine transaction manager, which aggregates one or more Engine/Connection pairs and keeps track of a Transaction object for each one. As the Session receives requests to execute SQL statements, it uses the Connection that is referenced by the SessionTransaction. At commit time, the underlying Session is flushed, and each Transaction is then committed.

Example usage is as follows:

sess = create_session()
trans = sess.create_transaction()
try:
    item1 = sess.query(Item).get(1)
    item2 = sess.query(Item).get(2)
    item1.foo = 'bar'
    item2.bar = 'foo'
except:
    trans.rollback()
    raise
trans.commit()

The SessionTransaction object supports Python 2.5's with statement so that the example above can be written as:

sess = create_session()
with sess.create_transaction():
    item1 = sess.query(Item).get(1)
    item2 = sess.query(Item).get(2)
    item1.foo = 'bar'
    item2.bar = 'foo'

The create_transaction() method creates a new SessionTransaction object but does not declare any connection/transaction resources. At the point of the first get() call, a connection resource is opened off the engine that corresponds to the Item class's mapper and is stored within the SessionTransaction with an open Transaction. When trans.commit() is called, the flush() method is called on the Session and the corresponding update statements are issued to the database within the scope of the transaction already opened; afterwards, the underlying Transaction is committed, and connection resources are freed.

SessionTransaction, like the Transaction off of Connection, also supports "nested" behavior, and is safe to pass to other functions which then issue their own begin()/commit() pair; only the outermost begin()/commit() pair actually affects the transaction, and any call to rollback() within a particular call stack will cause the entire transaction to be rolled back.
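
A sketch of this nesting behavior; inner_operation here is a hypothetical function:

def inner_operation(sess):
    # "nests" within any transaction already in progress on this Session
    trans = sess.create_transaction()
    try:
        # ... perform work with sess ...
        trans.commit()    # a no-op unless this is the outermost transaction
    except:
        trans.rollback()  # marks the enclosing transaction for rollback
        raise

sess = create_session()
trans = sess.create_transaction()
inner_operation(sess)    # participates in the transaction begun above
trans.commit()           # the actual COMMIT happens here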

Note that while SessionTransaction is capable of tracking multiple transactions across multiple databases, it is currently in no way a fully functioning two-phase commit engine; generally, when dealing with multiple databases simultaneously, there is the distinct possibility that a transaction can succeed on the first database and fail on the second, which for some applications may be an invalid state. If this is an issue, it's best to either refrain from spanning transactions across databases, or to look into some of the available technologies in this area, such as Zope, which offers a two-phase commit engine; some users have already created their own SQLAlchemy/Zope hybrid implementations to deal with scenarios like these.

SessionTransaction Facts:

  • SessionTransaction, like its parent Session object, is not threadsafe.

Using SQL with SessionTransaction

The SessionTransaction can interact with direct SQL queries in two general ways. Either specific Connection objects can be associated with the SessionTransaction, which are then useable both for direct SQL as well as within flush() operations performed by the SessionTransaction, or via accessing the Connection object automatically referenced within the SessionTransaction.

To associate a specific Connection with the SessionTransaction, use the add() method:

Associate a Connection with the SessionTransaction
connection = engine.connect()
trans = session.create_transaction()
try:
    trans.add(connection)
    connection.execute(mytable.update(), {'col1':4, 'col2':17})
    session.flush() # flush() operation will use the same connection
except:
    trans.rollback()
    raise
trans.commit()

The add() method will key the Connection's underlying Engine to this SessionTransaction. When mapper operations are performed against this Engine, the Connection explicitly added will be used. This overrides any other Connection objects that the underlying Session was associated with, corresponding to the underlying Engine of that Connection. However, if the SessionTransaction itself is already associated with a Connection, then an exception is thrown.

The other way is just to use the Connection referenced by the SessionTransaction. This is performed via the connection() method, and requires passing in a class or Mapper which indicates which underlying Connection should be returned (recall that different Mappers may use different underlying Engines). If the class_or_mapper argument is None, then the Session must be globally bound to a specific Engine when it was constructed, else the method returns None.

Get a Connection from the SessionTransaction
trans = session.create_transaction()
try:
    connection = trans.connection(UserClass)   # get the Connection used by the UserClass' Mapper
    connection.execute(mytable.update(), {'col1':4, 'col2':17})
except:
    trans.rollback()
    raise
trans.commit()

The connection() method also exists on the Session object itself, and can be called regardless of whether or not a SessionTransaction is in progress. If a SessionTransaction is in progress, it will return the connection referenced by the transaction. If an Engine is being used with threadlocal strategy, the Connection returned will correspond to the connection resources that are bound to the current thread, if any (i.e. it is obtained by calling contextual_connection()).
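
A sketch of calling connection() directly on the Session; UserClass stands in for any mapped class:

# returns the transaction's Connection if a SessionTransaction is in
# progress, otherwise a Connection from the appropriate Engine
connection = session.connection(UserClass)
result = connection.execute(users_table.select())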

back to section top

Using Engine-level Transactions with Sessions

The transactions issued by SessionTransaction as well as internally by the Session's flush() operation use the same Transaction object off of Connection that is publicly available. Recall that this object supports "nestable" behavior, meaning any number of actors can call begin() off a particular Connection object, and they will all be managed within the scope of a single transaction. Therefore, the flush() operation can similarly take place within the scope of a regular Transaction:

Transactions with Sessions
connection = engine.connect()   # Connection
session = create_session(bind_to=connection) # Session bound to the Connection
trans = connection.begin()      # start transaction
try:
    stuff = session.query(MyClass).select()     # Session operation uses connection
    stuff[2].foo = 'bar'
    connection.execute(mytable.insert(), dict(id=12, value="bar"))    # use connection explicitly
    session.flush()     # Session flushes with "connection", using transaction "trans"
except:
    trans.rollback()    # or rollback
    raise
trans.commit()      # commit
back to section top

Analyzing Object Flushes

The session module can log an extensive display of its "flush plans", which is a graph of its internal representation of objects before they are written to the database. To turn this logging on:

# make a Session with echo_uow
session = create_session(echo_uow=True)

The flush() operation will then dump displays like the following to the standard output:

Task dump:

 UOWTask(6034768, 'User/users/None')
  |
  |- Save User(6016624)
  |       |-Process User(6016624).addresses
  |
  |- UOWTask(6034832, 'Address/email_addresses/None')
  |   |- Save Address(6034384)
  |   |- Save Address(6034256)
  |   |----
  | 
  |----

The above graph can be read straight downwards to determine the order of operations. It indicates "save User 6016624, process each element in the 'addresses' list on User 6016624, save Address 6034384, Address 6034256".

Of course, one can also get a good idea of the order of operations just by logging the actual SQL statements executed.

back to section top

This section details all the options available to Mappers, as well as advanced patterns.

To start, here are the tables we will work with again:

from sqlalchemy import *

metadata = MetaData()

# a table to store users
users_table = Table('users', metadata,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(40)),
    Column('password', String(80))
)

# a table that stores mailing addresses associated with a specific user
addresses_table = Table('addresses', metadata,
    Column('address_id', Integer, primary_key = True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)

# a table that stores keywords
keywords_table = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key = True),
    Column('name', VARCHAR(50))
)

# a table that associates keywords with users
userkeywords_table = Table('userkeywords', metadata,
    Column('user_id', INT, ForeignKey("users.user_id")),
    Column('keyword_id', INT, ForeignKey("keywords.keyword_id"))
)

More On Mapper Properties

Overriding Column Names

When mappers are constructed, by default the column names in the Table metadata are used as the names of attributes on the mapped class. This can be customized within the properties by stating the key/column combinations explicitly:

user_mapper = mapper(User, users_table, properties={
    'id' : users_table.c.user_id,
    'name' : users_table.c.user_name,
})

In the situation when column names overlap in a mapper against multiple tables, columns may be referenced together with a list:

# join users and addresses
usersaddresses = sql.join(users_table, addresses_table, users_table.c.user_id == addresses_table.c.user_id)
m = mapper(User, usersaddresses, properties={
    'id' : [users_table.c.user_id, addresses_table.c.user_id],
})
back to section top

Overriding Properties

A common request is the ability to create custom class properties that override the behavior of setting/getting an attribute. Currently, the easiest way to do this in SQLAlchemy is the same way it would be done in any Python program: define your attribute with a different name, such as "_attribute", and use a property to get/set its value. The mapper just needs to be told of the special name:

class MyClass(object):
    def _set_email(self, email):
       self._email = email
    def _get_email(self):
       return self._email
    email = property(_get_email, _set_email)

mapper(MyClass, mytable, properties = {
   # map the '_email' attribute to the "email" column
   # on the table
   '_email': mytable.c.email
})

It is also possible to route the select_by and get_by functions on Query using the new property name, by establishing a synonym:

mapper(MyClass, mytable, properties = {
    # map the '_email' attribute to the "email" column
    # on the table
    '_email': mytable.c.email,

    # make a synonym 'email'
    'email' : synonym('_email')
})

# now you can select_by(email)
result = session.query(MyClass).select_by(email='user@example.com')

A synonym can be established with the flag proxy=True, to create a class-level proxy to the actual property. This has the effect of creating a fully functional synonym on class instances:

mapper(MyClass, mytable, properties = {
    '_email': mytable.c.email,
    'email' : synonym('_email', proxy=True)
})

x = MyClass()
x.email = 'user@example.com'

>>> x._email
'user@example.com'
back to section top

Custom List Classes

Feature Status: Alpha API

A one-to-many or many-to-many relationship results in a list-holding element being attached to all instances of a class. The actual list is an "instrumented" list, which transparently maintains a relationship to a plain Python list. The implementation of the underlying plain list can be changed to be any object that implements list-style append and __iter__ methods. A common need is for a list-based relationship to actually be a dictionary; this can be achieved by subclassing dict to have list-like behavior.

In this example, a class MyClass is defined, which is associated with a parent object MyParent. The collection of MyClass objects on each MyParent object will be a dictionary, storing each MyClass instance keyed to its name attribute.

# a class to be stored in the list
class MyClass(object):
    def __init__(self, name):
        self.name = name

# create a dictionary that will act like a list, and store
# instances of MyClass
class MyDict(dict):
    def append(self, item):
        self[item.name] = item
    def __iter__(self):
        return iter(self.values())

# parent class
class MyParent(object):
    pass

# mappers, constructed normally
mapper(MyClass, myclass_table)
mapper(MyParent, myparent_table, properties={
    'myclasses' : relation(MyClass, collection_class=MyDict)
})

# elements on 'myclasses' can be accessed via string keyname
myparent = MyParent()
myparent.myclasses.append(MyClass('this is myclass'))
myclass = myparent.myclasses['this is myclass']
back to section top

Custom Join Conditions

When creating relations on a mapper, most examples so far have illustrated the mapper and relationship joining up based on the foreign keys of the tables they represent. In fact, this "automatic" inspection can be completely circumvented using the primaryjoin and secondaryjoin arguments to relation(), as in this example which creates a User object which has a relationship to all of its Addresses which are in Boston:

class User(object):
    pass
class Address(object):
    pass

mapper(Address, addresses_table)
mapper(User, users_table, properties={
    'boston_addresses' : relation(Address, primaryjoin=
                and_(users_table.c.user_id==Address.c.user_id, 
                Address.c.city=='Boston'))
})

Many to many relationships can be customized by one or both of primaryjoin and secondaryjoin, shown below with just the default many-to-many relationship explicitly set:

class User(object):
    pass
class Keyword(object):
    pass
mapper(Keyword, keywords_table)
mapper(User, users_table, properties={
    'keywords':relation(Keyword, secondary=userkeywords_table,
        primaryjoin=users_table.c.user_id==userkeywords_table.c.user_id,
        secondaryjoin=userkeywords_table.c.keyword_id==keywords_table.c.keyword_id
        )
})
back to section top

Lazy/Eager Joins Multiple Times to One Table

The previous example leads in to the idea of joining against the same table multiple times. Below is a User object that has lists of its Boston and New York addresses:

mapper(User, users_table, properties={
    'boston_addresses' : relation(Address, primaryjoin=
                and_(users_table.c.user_id==Address.c.user_id, 
                Address.c.city=='Boston')),
    'newyork_addresses' : relation(Address, primaryjoin=
                and_(users_table.c.user_id==Address.c.user_id, 
                Address.c.city=='New York')),
})

Both lazy and eager loading support multiple joins equally well.

back to section top

Deferred Column Loading

This feature allows particular columns of a table not to be loaded by default, instead being loaded later on when first referenced. It is essentially "column-level lazy loading". This feature is useful when one wants to avoid loading a large text or binary field into memory when it's not needed. Individual columns can be lazy loaded by themselves or placed into groups that lazy-load together.

book_excerpts = Table('books', metadata, 
    Column('book_id', Integer, primary_key=True),
    Column('title', String(200), nullable=False),
    Column('summary', String(2000)),
    Column('excerpt', String),
    Column('photo', Binary)
)

class Book(object):
    pass

# define a mapper that will load each of 'excerpt' and 'photo' in 
# separate, individual-row SELECT statements when each attribute
# is first referenced on the individual object instance
mapper(Book, book_excerpts, properties = {
    'excerpt' : deferred(book_excerpts.c.excerpt),
    'photo' : deferred(book_excerpts.c.photo)
})

Deferred columns can be placed into groups so that they load together:

book_excerpts = Table('books', metadata, 
    Column('book_id', Integer, primary_key=True),
    Column('title', String(200), nullable=False),
    Column('summary', String(2000)),
    Column('excerpt', String),
    Column('photo1', Binary),
    Column('photo2', Binary),
    Column('photo3', Binary)
)

class Book(object):
    pass

# define a mapper with a 'photos' deferred group.  when one photo is referenced,
# all three photos will be loaded in one SELECT statement.  The 'excerpt' will 
# be loaded separately when it is first referenced.
mapper(Book, book_excerpts, properties = {
    'excerpt' : deferred(book_excerpts.c.excerpt),
    'photo1' : deferred(book_excerpts.c.photo1, group='photos'),
    'photo2' : deferred(book_excerpts.c.photo2, group='photos'),
    'photo3' : deferred(book_excerpts.c.photo3, group='photos')
})

You can defer or undefer columns at the Query level with the options method:

query = session.query(Book)
query.options(defer('summary')).select()
query.options(undefer('excerpt')).select()
back to section top

Working with Large Collections

SQLAlchemy relations are generally simplistic; the lazy loader loads in the full list of child objects when accessed, and the eager load builds a query that loads the full list of child objects. Additionally, when you are deleting a parent object, SQLAlchemy ensures that it has loaded the full list of child objects so that it can mark them as deleted as well (or to update their parent foreign key to NULL). It does not issue an en-masse "delete from table where parent_id=?" type of statement in such a scenario. This is because the child objects themselves may also have further dependencies, and additionally may also exist in the current session in which case SA needs to know their identity so that their state can be properly updated.

So there are several techniques that can be used individually or combined together to address these issues, in the context of a large collection where you normally would not want to load the full list of relationships:

  • Use lazy=None to disable child object loading (i.e. noload)

    mapper(MyClass, table, properties={
        'children':relation(MyOtherClass, lazy=None)
    })
  • To load child objects, just use a query:

    class Organization(object):
        def __init__(self, name):
            self.name = name
        def find_members(self, criterion):
            """locate a subset of the members associated with this Organization"""
            return object_session(self).query(Member).select(
                and_(member_table.c.name.like(criterion),
                     org_table.c.org_id==self.org_id),
                from_obj=[org_table.join(member_table)])
  • Use passive_deletes=True to disable child object loading on a DELETE operation, in conjunction with "ON DELETE (CASCADE|SET NULL)" on your database to automatically cascade deletes to child objects. Note that "ON DELETE" is not supported on SQLite, and requires InnoDB tables when using MySQL:

    mytable = Table('mytable', meta,
        Column('id', Integer, primary_key=True),
        )
    
    myothertable = Table('myothertable', meta,
        Column('id', Integer, primary_key=True),
        Column('parent_id', Integer),
        ForeignKeyConstraint(['parent_id'],['mytable.id'], ondelete="CASCADE"),
        )
    
    mapper(MyOtherClass, myothertable)
    
    mapper(MyClass, mytable, properties={
        'children':relation(MyOtherClass, passive_deletes=True)
    })
  • As an alternative to using "ON DELETE CASCADE", for very simple scenarios you can create a simple MapperExtension that will issue a DELETE for child objects before the parent object is deleted:

    class DeleteMemberExt(MapperExtension):
        def before_delete(self, mapper, connection, instance):
            connection.execute(member_table.delete(member_table.c.org_id==instance.org_id))
    
    mapper(Organization, org_table, extension=DeleteMemberExt(), properties = {
        'members' : relation(Member, lazy=None, passive_deletes=True, cascade="all, delete-orphan")
    })

Note that this approach is not nearly as efficient or general-purpose as "ON DELETE CASCADE", since the database itself can cascade the operation along any number of tables.

The latest distribution includes an example examples/collection/large_collection.py which illustrates most of these techniques.

back to section top

Relation Options

Options which can be sent to the relation() function. For arguments to mapper(), see Mapper Options.

  • association - Deprecated; as of version 0.3.0 the association keyword is synonymous with applying the "all, delete-orphan" cascade to a "one-to-many" relationship. SA can now automatically reconcile a "delete" and "insert" operation of two objects with the same "identity" in a flush() operation into a single "update" statement, which is the pattern that "association" used to indicate. See the updated example of association mappings in Association Object.
  • backref - indicates the name of a property to be placed on the related mapper's class that will handle this relationship in the other direction, including synchronizing the object attributes on both sides of the relation. Can also point to a backref() construct for more configurability. See Backreferences.
  • cascade - a string list of cascade rules which determines how persistence operations should be "cascaded" from parent to child. For a description of cascade rules, see Lifecycle Relations and Cascade rules.
  • collection_class - a class or function that returns a new list-holding object, which will be used in place of a plain list for storing elements. See Custom List Classes.
  • foreign_keys - a list of columns which are to be used as "foreign key" columns. This parameter should be used in conjunction with explicit primaryjoin and secondaryjoin (if needed) arguments, and the columns within the foreign_keys list should be present within those join conditions. Normally, relation() will inspect the columns within the join conditions to determine which columns are the "foreign key" columns, based on information in the Table metadata. Use this argument when no ForeignKeys are present in the join condition, or to override the table-defined foreign keys.
  • foreignkey - deprecated. Use the foreign_keys argument for foreign key specification, or remote_side for "directional" logic.
  • lazy=True - specifies how the related items should be loaded. A value of True indicates they should be loaded lazily when the property is first accessed. A value of False indicates they should be loaded by joining against the parent object query, so parent and child are loaded in one round trip (i.e. eagerly). A value of None indicates the related items are not loaded by the mapper in any case; the application will manually insert items into the list in some other way. In all cases, items added to or removed from the parent object's collection (or scalar attribute) will cause the appropriate updates and deletes upon flush(); i.e. this option only affects load operations, not save operations.

  • order_by - indicates the ordering that should be applied when loading these items. See the section Controlling Ordering for details.
  • passive_deletes=False - when set to True, indicates that lazy-loaders should not be executed during the flush() process, which normally occurs in order to locate all existing child items when a parent item is to be deleted. Setting this flag to True is appropriate when ON DELETE CASCADE rules have been set up on the actual tables so that the database may handle cascading deletes automatically. This strategy is useful particularly for handling the deletion of objects that have very large (and/or deep) child-object collections. See the example in Working with Large Collections.
  • post_update - this indicates that the relationship should be handled by a second UPDATE statement after an INSERT or before a DELETE. Currently, it also will issue an UPDATE after the instance was UPDATEd as well, although this technically should be improved. This flag is used to handle saving bi-directional dependencies between two individual rows (i.e. each row references the other), where it would otherwise be impossible to INSERT or DELETE both rows fully since one row exists before the other. Use this flag when a particular mapping arrangement will incur two rows that are dependent on each other, such as a table that has a one-to-many relationship to a set of child rows, and also has a column that references a single child row within that list (i.e. both tables contain a foreign key to each other). If a flush() operation returns an error that a "cyclical dependency" was detected, this is a cue that you might want to use post_update to "break" the cycle.
  • primaryjoin - a ClauseElement that will be used as the primary join of this child object against the parent object, or in a many-to-many relationship the join of the primary object to the association table. By default, this value is computed based on the foreign key relationships of the parent and child tables (or association table).
  • private=False - deprecated. Setting private=True is the equivalent of setting cascade="all, delete-orphan", and indicates the lifecycle of child objects should be contained within that of the parent. See the example in Lifecycle Relations.
  • remote_side - used for self-referential relationships, indicates the column or list of columns that form the "remote side" of the relationship. See the examples in Self Referential Mappers.
  • secondary - for a many-to-many relationship, specifies the intermediary table. The secondary keyword argument should generally only be used for a table that is not otherwise expressed in any class mapping. In particular, using the Association Object Pattern is generally mutually exclusive against using the secondary keyword argument.
  • secondaryjoin - a ClauseElement that will be used as the join of an association table to the child object. By default, this value is computed based on the foreign key relationships of the association and child tables.
  • uselist=(True|False) - a boolean that indicates if this property should be loaded as a list or a scalar. In most cases, this value is determined automatically by relation(), based on the type and direction of the relationship - one to many forms a list, many to one forms a scalar, many to many is a list. If a scalar is desired where normally a list would be present, such as a bi-directional one-to-one relationship, set uselist to False.
  • viewonly=False - when set to True, the relation is used only for loading objects within the relationship, and has no effect on the unit-of-work flush process. Relations with viewonly can specify any kind of join conditions to provide additional views of related objects onto a parent object. Note that the functionality of a viewonly relationship has its limits - complicated join conditions may not compile into eager or lazy loaders properly. If this is the case, use an alternative method, such as those described in Working with Large Collections, Result-Set Mapping, or Mapping a Class against Arbitrary Selects.
back to section top

Controlling Ordering

By default, mappers will attempt to ORDER BY the "oid" column of a table, or the primary key column, when selecting rows. This can be modified in several ways.

The "order_by" parameter can be sent to a mapper, overriding the per-engine ordering if any. A value of None means that the mapper should not use any ordering. A non-None value, which can be a column, an asc or desc clause, or an array of either one, indicates the ORDER BY clause that should be added to all select queries:

# disable all ordering
mapper = mapper(User, users_table, order_by=None)

# order by a column
mapper = mapper(User, users_table, order_by=users_table.c.user_id)

# order by multiple items
mapper = mapper(User, users_table, order_by=[users_table.c.user_id, desc(users_table.c.user_name)])

"order_by" can also be specified to an individual select method, overriding all other per-engine/per-mapper orderings:

# order by a column
l = mapper.select(users_table.c.user_name=='fred', order_by=users_table.c.user_id)

# order by multiple criterion
l = mapper.select(users_table.c.user_name=='fred', order_by=[users_table.c.user_id, desc(users_table.c.user_name)])

For relations, the "order_by" property can also be specified to all forms of relation:

# order address objects by address id
mapper = mapper(User, users_table, properties = {
    'addresses' : relation(mapper(Address, addresses_table), order_by=addresses_table.c.address_id)
})

# eager load with ordering - the ORDER BY clauses of parent/child will be organized properly
mapper = mapper(User, users_table, properties = {
    'addresses' : relation(mapper(Address, addresses_table), order_by=desc(addresses_table.c.email_address), lazy=False)
}, order_by=users_table.c.user_id)
back to section top

Limiting Rows

You can limit rows in a regular SQL query by specifying limit and offset. A Mapper can handle the same concepts:

class User(object):
    pass

mapper(User, users_table)
r = session.query(User).select(limit=20, offset=10)

However, things get tricky when dealing with eager relationships, since a straight LIMIT of rows does not represent the count of items when joining against other tables to load related items as well. So here is what SQLAlchemy will do when you use limit or offset with an eager relationship:

class User(object):
    pass
class Address(object):
    pass

mapper(User, users_table, properties={
    'addresses' : relation(mapper(Address, addresses_table), lazy=False)
})

r = session.query(User).select(User.c.user_name.like('F%'), limit=20, offset=10)
SELECT users.user_id AS users_user_id, users.user_name AS users_user_name,
users.password AS users_password, addresses.address_id AS addresses_address_id,
addresses.user_id AS addresses_user_id, addresses.street AS addresses_street,
addresses.city AS addresses_city, addresses.state AS addresses_state,
addresses.zip AS addresses_zip
FROM
(SELECT users.user_id FROM users WHERE users.user_name LIKE %(users_user_name)s
ORDER BY users.oid LIMIT 20 OFFSET 10) AS rowcount,
users LEFT OUTER JOIN addresses ON users.user_id = addresses.user_id
WHERE rowcount.user_id = users.user_id ORDER BY users.oid, addresses.oid
{'users_user_name': 'F%'}

The main WHERE clause as well as the limiting clauses are coerced into a subquery; this subquery represents the desired result of objects. A containing query, which handles the eager relationships, is joined against the subquery to produce the result.

back to section top

Mapping a Class with Table Inheritance

Feature Status: Alpha Implementation

Inheritance in databases comes in three forms: single table inheritance, where several types of classes are stored in one table, concrete table inheritance, where each type of class is stored in its own table, and multiple table inheritance, where the parent/child classes are stored in their own tables that are joined together in a select.

There is also a concept of polymorphic loading, which indicates if multiple kinds of classes can be loaded in one pass.

SQLAlchemy supports all three kinds of inheritance. Additionally, true polymorphic loading is supported in a straightforward way for single table inheritance, and has some more manually-configured features that can make it happen for concrete and multiple table inheritance.

Working examples of polymorphic inheritance come with the distribution in the directory examples/polymorphic.

Here are the classes we will use to represent an inheritance relationship:

class Employee(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.__class__.__name__ + " " + self.name

class Manager(Employee):
    def __init__(self, name, manager_data):
        self.name = name
        self.manager_data = manager_data
    def __repr__(self):
        return self.__class__.__name__ + " " + self.name + " " +  self.manager_data

class Engineer(Employee):
    def __init__(self, name, engineer_info):
        self.name = name
        self.engineer_info = engineer_info
    def __repr__(self):
        return self.__class__.__name__ + " " + self.name + " " +  self.engineer_info

Each class supports a common name attribute, while the Manager class has its own attribute manager_data and the Engineer class has its own attribute engineer_info.

Single Table Inheritance

This will support polymorphic loading via the Employee mapper.

employees_table = Table('employees', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('manager_data', String(50)),
    Column('engineer_info', String(50)),
    Column('type', String(20))
)

employee_mapper = mapper(Employee, employees_table, polymorphic_on=employees_table.c.type)
manager_mapper = mapper(Manager, inherits=employee_mapper, polymorphic_identity='manager')
engineer_mapper = mapper(Engineer, inherits=employee_mapper, polymorphic_identity='engineer')
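
A brief sketch of the resulting polymorphic loading, assuming a working Session:

session.save(Manager('tom', 'knows how to manage things'))
session.save(Engineer('kurt', 'knows how to hack'))
session.flush()

# a query against the base Employee mapper returns Manager and Engineer
# instances, discriminated by the 'type' column
employees = session.query(Employee).select()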
back to section top

Concrete Table Inheritance

Without polymorphic loading, you just define a separate mapper for each class.

Concrete Inheritance, Non-polymorphic
managers_table = Table('managers', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('manager_data', String(50)),
)

engineers_table = Table('engineers', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('engineer_info', String(50)),
)

manager_mapper = mapper(Manager, managers_table)
engineer_mapper = mapper(Engineer, engineers_table)

With polymorphic loading, the SQL query to perform the actual polymorphic load must be constructed, usually as a UNION. There is a helper function, polymorphic_union, to create these UNIONs.

Concrete Inheritance, Polymorphic
pjoin = polymorphic_union({
    'manager':managers_table,
    'engineer':engineers_table
}, 'type', 'pjoin')

employee_mapper = mapper(Employee, pjoin, polymorphic_on=pjoin.c.type)
manager_mapper = mapper(Manager, managers_table, inherits=employee_mapper, concrete=True, polymorphic_identity='manager')
engineer_mapper = mapper(Engineer, engineers_table, inherits=employee_mapper, concrete=True, polymorphic_identity='engineer')

A future release of SQLAlchemy might better merge the generated UNION into the mapper construction phase.

back to section top

Multiple Table Inheritance

Like concrete table inheritance, this can be done non-polymorphically, or with a little more complexity, polymorphically:

Multiple Table Inheritance, Non-polymorphic
employees = Table('employees', metadata, 
   Column('person_id', Integer, primary_key=True),
   Column('name', String(50)),
   Column('type', String(30)))

engineers = Table('engineers', metadata, 
   Column('person_id', Integer, ForeignKey('employees.person_id'), primary_key=True),
   Column('engineer_info', String(50)),
  )

managers = Table('managers', metadata, 
   Column('person_id', Integer, ForeignKey('employees.person_id'), primary_key=True),
   Column('manager_data', String(50)),
   )

person_mapper = mapper(Employee, employees)
mapper(Engineer, engineers, inherits=person_mapper)
mapper(Manager, managers, inherits=person_mapper)

Polymorphic:

Multiple Table Inheritance, Polymorphic
person_join = polymorphic_union(
    {
        'engineer':employees.join(engineers),
        'manager':employees.join(managers),
        'person':employees.select(employees.c.type=='person'),
    }, None, 'pjoin')

person_mapper = mapper(Employee, employees, select_table=person_join, polymorphic_on=person_join.c.type, polymorphic_identity='person')
mapper(Engineer, engineers, inherits=person_mapper, polymorphic_identity='engineer')
mapper(Manager, managers, inherits=person_mapper, polymorphic_identity='manager')

The join condition in a multiple table inheritance relationship can be specified explicitly, using inherit_condition:

AddressUser.mapper = mapper(
        AddressUser,
        addresses_table, inherits=User.mapper, 
        inherit_condition=users_table.c.user_id==addresses_table.c.user_id
    )
back to section top

Mapping a Class against Multiple Tables

Mappers can be constructed against arbitrary relational units (called Selectables) as well as plain Tables. For example, the join construct from the SQL package creates a neat selectable unit comprised of multiple tables, complete with its own composite primary key, which can be passed to a mapper as the table.

# a class
class AddressUser(object):
    pass

# define a Join
j = join(users_table, addresses_table)

# map to it - the identity of an AddressUser object will be 
# based on (user_id, address_id) since those are the primary keys involved
m = mapper(AddressUser, j, properties={
    'user_id':[users_table.c.user_id, addresses_table.c.user_id]
})

A second example:

# many-to-many join on an association table
j = join(users_table, userkeywords_table, 
        users_table.c.user_id==userkeywords_table.c.user_id).join(keywords_table, 
           userkeywords_table.c.keyword_id==keywords_table.c.keyword_id)

# a class 
class KeywordUser(object):
    pass

# map to it - the identity of a KeywordUser object will be
# (user_id, keyword_id) since those are the primary keys involved
m = mapper(KeywordUser, j, properties={
    'user_id':[users_table.c.user_id, userkeywords_table.c.user_id],
    'keyword_id':[userkeywords_table.c.keyword_id, keywords_table.c.keyword_id]
})

In both examples above, "composite" columns were added as properties to the mappers; these are aggregations of multiple columns into one mapper property, which instructs the mapper to keep both of those columns set at the same value.

back to section top

Mapping a Class against Arbitrary Selects

Similar to mapping against a join, a plain select() object can be used with a mapper as well. Below, an example select which contains two aggregate functions and a group_by is mapped to a class:

s = select([customers, 
            func.count(orders.c.customer_id).label('order_count'), 
            func.max(orders.c.price).label('highest_order')],
            customers.c.customer_id==orders.c.customer_id,
            group_by=[c for c in customers.c]
            ).alias('somealias')
class Customer(object):
    pass

m = mapper(Customer, s)

Above, the "customers" table is joined against the "orders" table to produce a full row for each customer row, the total count of related rows in the "orders" table, and the highest price in the "orders" table, grouped against the full set of columns in the "customers" table. That query is then mapped against the Customer class. New instances of Customer will contain attributes for each column in the "customers" table as well as an "order_count" and "highest_order" attribute. Updates to the Customer object will only be reflected in the "customers" table and not the "orders" table. This is because the primary keys of the "orders" table are not represented in this mapper and therefore the table is not affected by save or delete operations.

back to section top

Multiple Mappers for One Class

The first mapper created for a certain class is known as that class's "primary mapper". Other mappers can be created as well; these come in two varieties.

  • secondary mapper - this is a mapper that must be constructed with the keyword argument non_primary=True, and represents a load-only mapper. Objects that are loaded with a secondary mapper will have their save operation processed by the primary mapper. It is also invalid to add new relation()s to a non-primary mapper. To use this mapper with the Session, specify it to the query method:

example:

# primary mapper
mapper(User, users_table)

# make a secondary mapper to load User against a join
othermapper = mapper(User, users_table.join(someothertable), non_primary=True)

# select
result = session.query(othermapper).select()
  • entity name mapper - this is a mapper that is a fully functioning primary mapper for a class, which is distinguished from the regular primary mapper by an entity_name parameter. Instances loaded with this mapper will be totally managed by this new mapper and have no connection to the original one. Most methods on Session include an optional entity_name parameter in order to specify this condition.

example:

# primary mapper
mapper(User, users_table)

# make an entity name mapper that stores User objects in another table
mapper(User, alternate_users_table, entity_name='alt')

# make two User objects
user1 = User()
user2 = User()

# save one in the "users" table
session.save(user1)

# save the other in the "alternate_users_table"
session.save(user2, entity_name='alt')

session.flush()

# select from the alternate mapper
session.query(User, entity_name='alt').select()
back to section top

Self Referential Mappers

A self-referential mapper is a mapper that is designed to operate with an adjacency list table. This is a table that contains one or more foreign keys back to itself, and is usually used to create hierarchical tree structures. SQLAlchemy's default model of saving items based on table dependencies is not sufficient in this case, as an adjacency list table introduces dependencies between individual rows. Fortunately, SQLAlchemy will automatically detect a self-referential mapper and do the extra lifting to make it work.

# define a self-referential table
trees = Table('treenodes', metadata,
    Column('node_id', Integer, primary_key=True),
    Column('parent_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('node_name', String(50), nullable=False),
    )

# treenode class
class TreeNode(object):
    pass

# mapper defines "children" property, pointing back to TreeNode class,
# with the mapper unspecified.  it will point back to the primary 
# mapper on the TreeNode class.
TreeNode.mapper = mapper(TreeNode, trees, properties={
        'children' : relation(
                        TreeNode, 
                        cascade="all"
                     ),
        }
    )

This kind of mapper goes through a lot of extra effort when saving and deleting items, to determine the correct dependency graph of nodes within the tree.

A self-referential mapper where there is more than one relationship on the table requires that all join conditions be explicitly spelled out. Below is a self-referring table that contains a "parent_node_id" column to reference parent/child relationships, and a "root_node_id" column which points child nodes back to the ultimate root node:

# define a self-referential table with several relations
trees = Table('treenodes', metadata,
    Column('node_id', Integer, primary_key=True),
    Column('parent_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('root_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('node_name', String(50), nullable=False),
    )

# treenode class
class TreeNode(object):
    pass

# define the "children" property as well as the "root" property
mapper(TreeNode, trees, properties={
        'children' : relation(
                        TreeNode, 
                        primaryjoin=trees.c.parent_node_id==trees.c.node_id,
                        cascade="all",
                        backref=backref("parent", remote_side=[trees.c.node_id])
                     ),
        'root' : relation(
                TreeNode,
                primaryjoin=trees.c.root_node_id==trees.c.node_id, 
                remote_side=[trees.c.node_id],
                uselist=False
            )
        }
    )

The "root" property on a TreeNode is a many-to-one relationship. By default, a self-referential mapper declares relationships as one-to-many, so the extra parameter remote_side, pointing to a column or list of columns on the remote side of a relationship, is needed to indicate a "many-to-one" self-referring relationship (note the previous keyword argument foreignkey is deprecated). Both TreeNode examples above are available in functional form in the examples/adjacencytree directory of the distribution.

back to section top

Result-Set Mapping

Take any result set and feed it into a Query to produce objects. Multiple mappers can be combined to retrieve unrelated objects from the same row in one step. The instances method on Query takes a ResultProxy object, which is the result type generated from SQLEngine, and delivers object instances. (note: this method has been moved off of Mapper, where it is deprecated).

class User(object):
    pass

mapper(User, users_table)

# select users
c = users_table.select().execute()

# get objects
userlist = session.query(User).instances(c)

# define a second class/mapper
class Address(object):
    pass

mapper(Address, addresses_table)

# select users and addresses in one query
# use_labels is so that the user_id column in both tables are distinguished
s = select([users_table, addresses_table], users_table.c.user_id==addresses_table.c.user_id, use_labels=True)

# execute it, and process the results with the User mapper, chained to the Address mapper
r = session.query(User).instances(s.execute(), class_mapper(Address))

# result rows are an array of objects, one for each mapper used
for entry in r:
    user = entry[0]
    address = entry[1]

Combining Eager Loads with Result Set Mappings

When result-set mapping is used with a particular Mapper, SQLAlchemy has no access to the query being used and therefore has no way of tacking on its own LEFT OUTER JOIN conditions that are normally used to eager load relationships. If the query being constructed is created in such a way that it returns rows not just from a parent table (or tables) but also returns rows from child tables, the result-set mapping can be notified as to which additional properties are contained within the result set. This is done using the contains_eager() query option, which specifies the name of the relationship to be eagerly loaded, and optionally a decorator function that can translate aliased column names when results are received.

# mapping is the users->addresses mapping
mapper(User, users_table, properties={
    'addresses':relation(Address, addresses_table, lazy=False)
})

# define a query on USERS with an outer join to ADDRESSES
statement = users_table.outerjoin(addresses_table).select(use_labels=True)

# construct a Query object which expects the "addresses" results 
query = session.query(User).options(contains_eager('addresses'))

# get results normally
r = query.instances(statement.execute())

It is often the case with large queries that some of the tables within the query need to be aliased in order to distinguish them from other occurrences of the same table within the query. A query that attempts to add eagerly loaded child items will often have this condition. The contains_eager() function takes a keyword argument alias which can either be the string name of an alias, or an actual Alias construct used in constructing the query, which will target the eager loading towards the columns of that alias (new in version 0.3.5):

# use an alias of the addresses table
adalias = addresses_table.alias('adalias')

# define a query on USERS with an outer join to adalias
statement = users_table.outerjoin(adalias).select(use_labels=True)

# construct a Query object which expects the "addresses" results 
query = session.query(User).options(contains_eager('addresses', alias=adalias))

# get results normally
r = query.instances(statement.execute())

In the case that the main table itself is also aliased, the contains_alias() option can be used (new in version 0.3.5):

# define an aliased UNION called 'ulist'
statement = users.select(users.c.user_id==7).union(users.select(users.c.user_id>7)).alias('ulist')

# add on an eager load of "addresses"
statement = statement.outerjoin(addresses).select(use_labels=True)

# create query, indicating "ulist" is an alias for the main table, "addresses" property should
# be eager loaded
query = create_session().query(User).options(contains_alias('ulist'), contains_eager('addresses'))

# results
r = query.instances(statement.execute())
back to section top

Mapper Options

Options which can be sent to the mapper() function. For arguments to relation(), see Relation Options.

  • allow_column_override - if True, allows the usage of a relation() which has the same name as a column in the mapped table. The table column will no longer be mapped.
  • allow_null_pks=False - indicates that composite primary keys where one or more (but not all) columns contain NULL is a valid primary key. Primary keys which contain NULL values usually indicate that a result row does not contain an entity and should be skipped.
  • always_refresh=False - if True, all query operations for this mapped class will overwrite all data within object instances that already exist within the session, erasing any in-memory changes with whatever information was loaded from the database.
  • batch=True - when False, indicates that when a mapper is persisting a list of instances, each instance will be fully saved to the database before moving onto the next instance. Normally, inserts and updates are batched together per-table, such as for an inheriting mapping that spans multiple tables. This flag is for rare circumstances where custom MapperExtension objects are used to attach logic to before_insert(), before_update(), etc., and the user-defined logic requires that the full persistence of each instance must be completed before moving onto the next (such as logic which queries the tables for the most recent ID). Note that setting this flag to False has a significant negative impact on the efficiency of a large save operation.
  • column_prefix - a string which will be prepended to the "key" name of all Columns when creating column-based properties from the given Table. Does not affect explicitly specified column-based properties. Setting column_prefix='_' is equivalent to defining all column-based properties as _columnname=table.c.columnname. See Overriding Column Names for information on overriding column-based attribute names.
  • concrete - if True, indicates this mapper should use concrete table inheritance with its parent mapper. Requires inherits to be set.
  • entity_name - associates this mapping with a particular entity name, allowing alternate persistence mappings for a single class. See Multiple Mappers for One Class.
  • extension - a MapperExtension instance or list of MapperExtension instances which will be applied to all operations by this Mapper. See Extending Mapper.
  • inherits - another Mapper or class with which this Mapper will have an inheritance relationship. See the examples in Mapping a Class with Table Inheritance.
  • inherit_condition - for joined table inheritance, a SQL expression (constructed ClauseElement) which will define how the two tables are joined; defaults to a natural join between the two tables.
  • non_primary=False - if True, construct a Mapper that will define only the selection of instances, not their persistence. It essentially creates a mapper that can be used for querying but does not define how instances of the class are stored. A non_primary mapper is always created after a regular primary mapper has already been created for the class. To use one, send it in place of the class argument when creating a query, such as session.query(somemapper). Note that it is usually invalid to define additional relationships on a non_primary mapper as they will conflict with those of the primary. See Multiple Mappers for One Class.
  • order_by - a single Column or list of Columns which selection operations should use as the default ordering for entities. Defaults to the OID/ROWID of the table if any, or the first primary key column of the table. See Controlling Ordering.
  • polymorphic_on - used with mappers in an inheritance relationship, a Column which will identify the class/mapper combination to be used with a particular row. Requires the polymorphic_identity value to be set for all mappers in the inheritance hierarchy.
  • polymorphic_identity - a value which will be stored in the Column denoted by polymorphic_on, corresponding to the "class identity" of this mapper. See Mapping a Class with Table Inheritance.
  • primary_key - a list of Column objects which define the "primary key" to be used against this mapper's selectable unit. The mapper normally determines these automatically from the given local_table of the mapper combined against any inherited tables. When this argument is specified, the primary keys of the mapped table, if any, are disregarded in favor of the columns given. This can be used to provide primary key identity to a table that has no PKs defined at the schema level, or to modify what defines "identity" for a particular table.
  • properties - a dictionary mapping the string names of object attributes to MapperProperty instances, which define the persistence behavior of that attribute. Note that the columns in the mapped table are automatically converted into ColumnProperty instances based on the "key" property of each Column (although they can be overridden using this dictionary).
  • select_table - used with polymorphic mappers, this is a Selectable which will take the place of the Mapper's main table argument when performing queries.
  • version_id_col - a Column which must have an integer type that will be used to keep a running "version id" of mapped entities in the database. This is used during save operations to ensure that no other thread or process has updated the instance during the lifetime of the entity, else a ConcurrentModificationError exception is thrown.
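
As an illustrative sketch, several of these options combined in a single mapper() call might look like the following; the row_version column here is an assumption for the example:

# column attributes are prefixed with '_', entities are ordered by
# user_name by default, and the hypothetical 'row_version' integer column
# maintains a version counter for optimistic concurrency control
mapper(User, users_table,
    column_prefix='_',
    order_by=users_table.c.user_name,
    version_id_col=users_table.c.row_version
)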
back to section top

Extending Mapper

Mappers can have functionality augmented or replaced at many points in their execution via the usage of the MapperExtension class. This class is just a series of "hooks" where various functionality takes place. An application can make its own MapperExtension objects, overriding only the methods it needs. Methods that are not overridden return the special value sqlalchemy.orm.mapper.EXT_PASS, which indicates the operation should proceed normally.

class MapperExtension(object):
    """base implementation for an object that provides overriding behavior to various
    Mapper functions.  For each method in MapperExtension, a result of EXT_PASS indicates
    the functionality is not overridden."""
    def get_session(self):
        """called to retrieve a contextual Session instance with which to
        register a new object. Note: this is not called if a session is 
        provided with the __init__ params (i.e. _sa_session)"""
        return EXT_PASS
    def select_by(self, query, *args, **kwargs):
        """overrides the select_by method of the Query object"""
        return EXT_PASS
    def select(self, query, *args, **kwargs):
        """overrides the select method of the Query object"""
        return EXT_PASS
    def create_instance(self, mapper, selectcontext, row, class_):
        """called when a new object instance is about to be created from a row.  
        the method can choose to create the instance itself, or it can return 
        None to indicate normal object creation should take place.

        mapper - the mapper doing the operation

        selectcontext - SelectionContext corresponding to the instances() call

        row - the result row from the database

        class_ - the class we are mapping.
        """
        return EXT_PASS
    def append_result(self, mapper, selectcontext, row, instance, identitykey, result, isnew):
        """called when an object instance is being appended to a result list.

        If this method returns EXT_PASS, it is assumed that the mapper should do the appending, else
        if this method returns any other value or None, it is assumed that the append was handled by this method.

        mapper - the mapper doing the operation

        selectcontext - SelectionContext corresponding to the instances() call

        row - the result row from the database

        instance - the object instance to be appended to the result

        identitykey - the identity key of the instance

        result - list to which results are being appended

        isnew - indicates if this is the first time we have seen this object instance in the current result
        set.  if you are selecting from a join, such as an eager load, you might see the same object instance
        many times in the same result set.
        """
        return EXT_PASS
    def populate_instance(self, mapper, selectcontext, row, instance, identitykey, isnew):
        """called right before the mapper, after creating an instance from a row, passes the row
        to its MapperProperty objects which are responsible for populating the object's attributes.
        If this method returns EXT_PASS, it is assumed that the mapper should populate the object normally, else
        if this method returns any other value or None, it is assumed that the population was handled by this method.

        Essentially, this method is used to have a different mapper populate the object:

            def populate_instance(self, mapper, selectcontext, row, instance, identitykey, isnew):
                othermapper.populate_instance(selectcontext, instance, row, identitykey, isnew, frommapper=mapper)
                return True
        """
        return EXT_PASS
    def before_insert(self, mapper, connection, instance):
        """called before an object instance is INSERTed into its table.

        this is a good place to set up primary key values and such that aren't handled otherwise."""
        return EXT_PASS
    def before_update(self, mapper, connection, instance):
        """called before an object instnace is UPDATED"""
        return EXT_PASS
    def after_update(self, mapper, connection, instance):
        """called after an object instnace is UPDATED"""
        return EXT_PASS
    def after_insert(self, mapper, connection, instance):
        """called after an object instance has been INSERTed"""
        return EXT_PASS
    def before_delete(self, mapper, connection, instance):
        """called before an object instance is DELETEed"""
        return EXT_PASS
    def after_delete(self, mapper, connection, instance):
        """called after an object instance is DELETEed"""
        return EXT_PASS

To use MapperExtension, make your own subclass of it and just send it off to a mapper:

m = mapper(User, users_table, extension=MyExtension())
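
As a concrete illustration, here is a sketch of an extension that overrides a single hook to stamp a creation time on each instance before INSERT; the created_date attribute and the import paths are assumptions for the example:

import datetime
from sqlalchemy.orm.mapper import MapperExtension, EXT_PASS

class TimestampExtension(MapperExtension):
    """sets a hypothetical 'created_date' attribute on each instance before
    INSERT; all other hooks are inherited and return EXT_PASS."""
    def before_insert(self, mapper, connection, instance):
        instance.created_date = datetime.datetime.now()
        # returning EXT_PASS allows the INSERT to proceed normally
        return EXT_PASS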

Multiple extensions will be chained together and processed in order; they are specified as a list:

m = mapper(User, users_table, extension=[ext1, ext2, ext3])
back to section top

The package sqlalchemy.types defines the datatype identifiers which may be used when defining metadata. This package includes a set of generic types, a set of SQL-specific subclasses of those types, and a small extension system used by specific database connectors to adapt these generic types into database-specific type objects.

Built-in Types

SQLAlchemy comes with a set of standard generic datatypes, which are defined as classes.

The standard set of generic types are:

package sqlalchemy.types
class String(TypeEngine):
    def __init__(self, length=None)

class Integer(TypeEngine)

class SmallInteger(Integer)

class Numeric(TypeEngine): 
    def __init__(self, precision=10, length=2)

class Float(Numeric):
    def __init__(self, precision=10)

# DateTime, Date and Time types deal with datetime objects from the Python datetime module
class DateTime(TypeEngine)

class Date(TypeEngine)

class Time(TypeEngine)

class Binary(TypeEngine): 
    def __init__(self, length=None)

class Boolean(TypeEngine)

# converts unicode strings to raw bytes
# as bind params, raw bytes to unicode as 
# rowset values, using the unicode encoding 
# setting on the engine (defaults to 'utf-8')
class Unicode(TypeDecorator):
    impl = String

# uses the pickle protocol to serialize data
# in/out of Binary columns
class PickleType(TypeDecorator):
    impl = Binary

More specific subclasses of these types are available, which various database engines may choose to implement specifically, allowing finer grained control over types:

class FLOAT(Numeric)
class TEXT(String)
class DECIMAL(Numeric)
class INT(Integer)
INTEGER = INT
class TIMESTAMP(DateTime)
class DATETIME(DateTime)
class CLOB(String)
class VARCHAR(String)
class CHAR(String)
class BLOB(Binary)
class BOOLEAN(Boolean)

When using a specific database engine, these types are adapted even further via a set of database-specific subclasses defined by the database engine. There may eventually be more type objects that are defined for specific databases. An example of this would be Postgres' Array type.

Type objects are specified in table metadata using either the class itself, or an instance of the class. Creating an instance of the class allows you to specify parameters for the type, such as string length, numerical precision, etc.:

mytable = Table('mytable', engine, 
    # define type using a class
    Column('my_id', Integer, primary_key=True), 

    # define type using an object instance
    Column('value', Numeric(7,4)) 
)
back to section top

Creating your Own Types

User-defined types can be created, to support either database-specific types, or customized pre-processing of query parameters as well as post-processing of result set data. You can make your own classes to perform these operations. To augment the behavior of a TypeEngine type, such as String, the TypeDecorator class is used:

import sqlalchemy.types as types

class MyType(types.TypeDecorator):
    """basic type that decorates String, prefixes values with "PREFIX:" on 
    the way in and strips it off on the way out."""
    impl = types.String
    def convert_bind_param(self, value, engine):
        return "PREFIX:" + value
    def convert_result_value(self, value, engine):
        return value[7:]

The Unicode and PickleType classes are themselves subclasses of TypeDecorator and can be subclassed directly.
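
For example, here is a sketch of a hypothetical Unicode subclass which lowercases values on their way to the database:

import sqlalchemy.types as types

class LowercaseUnicode(types.Unicode):
    """hypothetical variant of Unicode that lowercases bind parameter values."""
    def convert_bind_param(self, value, engine):
        if value is not None:
            value = value.lower()
        # let Unicode perform its normal encoding of the value
        return types.Unicode.convert_bind_param(self, value, engine)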

To build a type object from scratch, which will not have a corresponding database-specific implementation, subclass TypeEngine:

import sqlalchemy.types as types

class MyType(types.TypeEngine):
    def __init__(self, precision=8):
        self.precision = precision
    def get_col_spec(self):
        return "MYTYPE(%s)" % self.precision
    def convert_bind_param(self, value, engine):
        return value
    def convert_result_value(self, value, engine):
        return value
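
The resulting type is used within table metadata just like the built-in types; a brief sketch, assuming a MetaData object as in other examples:

mytable = Table('mytable', metadata,
    Column('my_id', Integer, primary_key=True),

    # the custom type, with a precision of 10
    Column('value', MyType(10))
)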
back to section top

This section describes the connection pool module of SQLAlchemy. The Pool object it provides is normally embedded within an Engine instance. For most cases, explicit access to the pool module is not required. However, the Pool object can be used on its own, without the rest of SA, to manage DBAPI connections; this section describes that usage. Also, this section will describe in more detail how to customize the pooling strategy used by an Engine.

At the base of any database helper library is a system for efficiently acquiring connections to the database. Since the establishment of a database connection is typically a somewhat expensive operation, an application needs a way to get at database connections repeatedly without incurring the full overhead each time. Particularly for server-side web applications, a connection pool is the standard way to maintain a "pool" of database connections which are used over and over again among many requests. Connection pools are typically configured to maintain a certain "size", which represents how many connections can be in use simultaneously before additional connections must be newly established.

Establishing a Transparent Connection Pool

Any DBAPI module can be "proxied" through the connection pool using the following technique (note that the usage of 'psycopg2' is just an example; substitute whatever DBAPI module you'd like):

import sqlalchemy.pool as pool
import psycopg2 as psycopg
psycopg = pool.manage(psycopg)

# then connect normally
connection = psycopg.connect(database='test', user='scott', password='tiger')

This produces a sqlalchemy.pool.DBProxy object which supports the same connect() function as the original DBAPI module. Upon connection, a connection proxy object is returned, which delegates its calls to a real DBAPI connection object. This connection object is stored persistently within a connection pool (an instance of sqlalchemy.pool.Pool) that corresponds to the exact connection arguments sent to the connect() function.

The connection proxy supports all of the methods on the original connection object, most of which are proxied via __getattr__(). The close() method will return the connection to the pool, and the cursor() method will return a proxied cursor object. Both the connection proxy and the cursor proxy will also return the underlying connection to the pool after they have both been garbage collected, which is detected via the __del__() method.

Additionally, when connections are returned to the pool, a rollback() is issued on the connection unconditionally. This is to release any locks still held by the connection that may have resulted from normal activity.

By default, the connect() method will return the same connection that is already checked out in the current thread. This allows a particular connection to be used in a given thread without needing to pass it around between functions. To disable this behavior, specify use_threadlocal=False to the manage() function.
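
Putting these behaviors together, here is a sketch of a typical checkout/checkin cycle using the proxied module from above:

# check out a proxied connection and work with a proxied cursor
conn = psycopg.connect(database='test', user='scott', password='tiger')
cursor = conn.cursor()
cursor.execute("select 1")
cursor.close()

# with use_threadlocal enabled (the default), a second connect() within
# this thread returns a proxy to the same underlying connection
conn2 = psycopg.connect(database='test', user='scott', password='tiger')

# rolls back and returns the underlying DBAPI connection to the pool
conn.close()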

back to section top

Connection Pool Configuration

For all types of Pool construction, which includes the "transparent proxy" described in the previous section, using an Engine via create_engine(), or constructing a pool through direct class instantiation, the options are generally the same. Additional options may be available based on the specific subclass of Pool being used.

For a description of all pool classes, see the generated documentation.

Common options, illustrated in the sketch following these two lists, include:

  • echo=False : if set to True, connections being pulled and retrieved from/to the pool will be logged to the standard output, as well as pool sizing information. Echoing can also be achieved by enabling logging for the "sqlalchemy.pool" namespace. When using create_engine(), this option is specified as echo_pool.
  • use_threadlocal=False : if set to True, repeated calls to connect() within the same application thread will be guaranteed to return the same connection object, if one has already been retrieved from the pool and has not been returned yet. This allows code to retrieve a connection from the pool, and then while still holding on to that connection, to call other functions which also ask the pool for a connection of the same arguments; those functions will act upon the same connection that the calling method is using. This option is overridden during create_engine(), corresponding to the "plain" or "threadlocal" connection strategy.
  • recycle=-1 : if set to a value other than -1, the number of seconds between connection recycling; upon checkout, if this timeout has elapsed, the connection will be closed and replaced with a newly opened connection.
  • auto_close_cursors=True : cursors, as returned by connection.cursor(), are tracked and are automatically closed when the connection is returned to the pool. Some DBAPIs, such as MySQLdb, become unstable if cursors remain open.
  • disallow_open_cursors=False : if auto_close_cursors is False and disallow_open_cursors is True, an exception is raised when an open cursor is detected upon connection checkin. If both flags are False, no cursor processing occurs upon checkin.

QueuePool options include:

  • pool_size=5 : the size of the pool to be maintained. This is the largest number of connections that will be kept persistently in the pool. Note that the pool begins with no connections; once this number of connections is requested, that number of connections will remain.
  • max_overflow=10 : the maximum overflow size of the pool. When the number of checked-out connections reaches the size set in pool_size, additional connections will be returned up to this limit. When those additional connections are returned to the pool, they are disconnected and discarded. It follows then that the total number of simultaneous connections the pool will allow is pool_size + max_overflow, and the total number of "sleeping" connections the pool will allow is pool_size. max_overflow can be set to -1 to indicate no overflow limit; no limit will be placed on the total number of concurrent connections.
  • timeout=30 : the number of seconds to wait for a connection to become available before giving up.
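
As a usage sketch (the argument values here are arbitrary), these options are passed directly to manage() or to a Pool constructor, while create_engine() accepts the pool sizing and echoing options as its own keyword arguments:

import sqlalchemy.pool as pool
import psycopg2
from sqlalchemy import create_engine

# options given to manage() are applied to the pools it creates
psycopg = pool.manage(psycopg2, use_threadlocal=False, recycle=3600, echo=True)

# with create_engine(), pool options are given as keyword arguments
db = create_engine('postgres://scott:tiger@localhost/test',
                   pool_size=10, max_overflow=20, echo_pool=True)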
back to section top

Custom Pool Construction

Besides using the transparent proxy, instances of sqlalchemy.pool.Pool can be created directly. Constructing your own pool involves passing a callable used to create a connection. Through this method, custom connection schemes can be made, such as a connection that automatically executes some initialization commands to start.

Constructing a QueuePool
import sqlalchemy.pool as pool
import psycopg2

def getconn():
    c = psycopg2.connect(user='ed', host='127.0.0.1', dbname='test')
    # execute an initialization function on the connection before returning
    c.cursor().execute("select setup_encodings()")
    return c

p = pool.QueuePool(getconn, max_overflow=10, pool_size=5, use_threadlocal=True)

Or with SingletonThreadPool:

Constructing a SingletonThreadPool
import sqlalchemy.pool as pool
import sqlite

def getconn():
    return sqlite.connect(filename='myfile.db')

# SQLite connections require the SingletonThreadPool    
p = pool.SingletonThreadPool(getconn)
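
Connections are then checked out from either pool via its connect() method, and returned by closing them:

# check out a connection from the pool
c = p.connect()
cursor = c.cursor()
cursor.execute("select 1")
cursor.close()

# return the underlying connection to the pool
c.close()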
back to section top

SQLAlchemy has a variety of extensions and "mods" available which provide extra functionality to SA, either via explicit usage or by augmenting the core behavior. Several of these extensions are designed to work together.

SessionContext

Author: Daniel Miller

This plugin is used to instantiate and manage Session objects. It is the preferred way to provide thread-local session functionality to an application. It provides several services:

  • serves as a factory to create sessions of a particular configuration. This factory may either call create_session() with a particular set of arguments, or instantiate a different implementation of Session if one is available.
  • for the Session objects it creates, provides the ability to maintain a single Session per distinct application thread. The Session returned by a SessionContext is called the contextual session. Providing at least a thread-local context to sessions is important because the Session object is not threadsafe, and is intended to be used with localized sets of data, as opposed to a single session being used application wide.
  • besides maintaining a single Session per thread, the contextual algorithm can be changed to support any kind of contextual scheme.
  • provides a MapperExtension that can enhance a Mapper, such that it can automatically save() newly instantiated objects to the current contextual session. It also allows Query objects to be created without an explicit Session. While this is very convenient functionality, having it switched on without understanding it can be very confusing. Note that this feature is optional when using SessionContext.

Using the SessionContext in its most basic form involves just instantiating a SessionContext:

import sqlalchemy
from sqlalchemy.ext.sessioncontext import SessionContext

ctx = SessionContext(sqlalchemy.create_session)

class User(object):
    pass

sqlalchemy.mapper(User, users_table)
u = User()

# the contextual session is referenced by the "current" property on SessionContext
ctx.current.save(u)
ctx.current.flush()

From this example, one might see that the SessionContext's typical scope is at the module or application level. Since the Session itself is better suited to be used in per-user-request or even per-function scope, the SessionContext provides an easy way to manage the scope of those Session objects.

The construction of each Session instance can be customized by providing a "creation function" which returns a new Session. A common customization is a Session which needs to explicitly bind to a particular Engine:

import sqlalchemy
from sqlalchemy.ext.sessioncontext import SessionContext

# create an engine
someengine = sqlalchemy.create_engine('sqlite:///')

# a function to return a Session bound to our engine
def make_session():
    return sqlalchemy.create_session(bind_to=someengine)

# SessionContext
ctx = SessionContext(make_session)

# get the session bound to engine "someengine":
session = ctx.current

The above pattern is more succinctly expressed using Python lambdas:

ctx = SessionContext(lambda: sqlalchemy.create_session(bind_to=someengine))

The default creation function is simply:

ctx = SessionContext(sqlalchemy.create_session)

The "scope" to which the session is associated, which by default is a thread-local scope, can be customized by providing a "scope callable" which returns a hashable key that represents the current scope:

import sqlalchemy
from sqlalchemy.ext.sessioncontext import SessionContext

# global declaration of "scope"
scope = "scope1"

# a function to return the current "session scope"
def global_scope_func():
    return scope

# create SessionContext with a custom "scopefunc"
ctx = SessionContext(sqlalchemy.create_session, scopefunc=global_scope_func)

# get the session corresponding to "scope1":
session = ctx.current

# switch the "scope"
scope = "scope2"

# get the session corresponding to "scope2":
session = ctx.current

Examples of customized scope can include user-specific sessions or requests, or even sub-elements of an application, such as a graphical application which maintains a single Session per application window (this was the original motivation to create SessionContext).

Using SessionContextExt

This is a MapperExtension which allows a Mapper to be automatically associated with a SessionContext. Newly constructed objects get save()d to the session automatically, and Query objects can be constructed without a session. The instance of SessionContextExt is provided by the SessionContext itself:

import sqlalchemy
from sqlalchemy.ext.sessioncontext import SessionContext

ctx = SessionContext(sqlalchemy.create_session)

class User(object):
    pass

sqlalchemy.mapper(User, users_table, extension=ctx.mapper_extension)

# 'u' is automatically added to the current session of 'ctx'
u = User()

assert u in ctx.current

# get the current session and flush
ctx.current.flush()

The MapperExtension can be configured either per-mapper as above, or on an application-wide basis using:

import sqlalchemy
from sqlalchemy.orm.mapper import global_extensions
from sqlalchemy.ext.sessioncontext import SessionContext

ctx = SessionContext(sqlalchemy.create_session)

global_extensions.append(ctx.mapper_extension)

SessionContextExt allows Query objects to be created against the mapped class without specifying a Session. Each Query will automatically make use of the current contextual session:

# create a Query from a class
query = Query(User)

# specify entity name
query = Query(User, entity_name='foo')

# create a Query from a mapper
query = Query(mapper)

# then use it
result = query.select()

When installed globally, all Mapper objects will contain a built-in association to the SessionContext. This means that once a mapped instance is created, creating a new Session and calling save() with the instance as an argument will raise an error stating that the instance is already associated with a different session. While you can always remove the object from its original session, SessionContextExt is probably convenient only for an application that does not need much explicit manipulation of sessions.

The user still has some control over which session gets used at instance construction time. An instance can be redirected at construction time to a different Session by specifying the keyword parameter _sa_session to its constructor, which is decorated by the mapper:

session = create_session()  # create a new session distinct from the contextual session
myuser = User(_sa_session=session)  # make a new User that is saved to this session

Similarly, the entity_name parameter, which specifies an alternate Mapper to be used when attaching this instance to the Session, can be specified via _sa_entity_name:

myuser = User(_sa_session=session, _sa_entity_name='altentity')

The decoration of mapped instances' __init__() method is similar to this example:

oldinit = class_.__init__   # the previous init method
def __init__(self, *args, **kwargs):
    session = kwargs.pop('_sa_session', None)
    entity_name = kwargs.pop('_sa_entity_name', None)
    if session is None:
        session = ext.get_session() # get Session from this Mapper's MapperExtension
        if session is EXT_PASS:
            session = None
    if session is not None:
        session.save(self, entity_name=entity_name)  # attach to the current session
    oldinit(self, *args, **kwargs)   # call previous init method
back to section top

SelectResults

Author: Jonas Borgström

SelectResults gives transformative behavior to the results returned from the select and select_by methods of Query.

from sqlalchemy.ext.selectresults import SelectResults

query = session.query(MyClass)
res = SelectResults(query)

res = res.filter(table.c.column == "something") # adds a WHERE clause (or appends to the existing via "and")
res = res.order_by([table.c.column]) # adds an ORDER BY clause

for x in res[:10]:  # Fetch and print the top ten instances - adds OFFSET 0 LIMIT 10 or equivalent
  print x.column2

# evaluate as a list, which executes the query
x = list(res)

# Count how many instances have column2 > 42
# and column == "something"
print res.filter(table.c.column2 > 42).count()

# select() is a synonym for filter()
session.query(MyClass).select(mytable.c.column=="something").order_by([mytable.c.column])[2:7]

An important facet of SelectResults is that the actual SQL execution does not occur until the object is used in a list or iterator context. This means you can call any number of transformative methods (including filter, order_by, list range expressions, etc) before any SQL is actually issued.
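
A brief sketch of this deferred behavior, continuing the example above:

# each call returns a new SelectResults; no SQL has been issued yet
res = SelectResults(session.query(MyClass))
res = res.filter(table.c.column == "something")
res = res.order_by([table.c.column])
res = res[2:7]   # applies OFFSET/LIMIT, still without executing

# SQL is issued once iteration begins
for obj in res:
    print obj.column2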

Configuration of SelectResults may be per-Query, per Mapper, or per application:

from sqlalchemy.ext.selectresults import SelectResults, SelectResultsExt

# construct a SelectResults for an individual Query
sel = SelectResults(session.query(MyClass))

# construct a Mapper where the Query.select()/select_by() methods will return a SelectResults:
mapper(MyClass, mytable, extension=SelectResultsExt())

# globally configure all Mappers to return SelectResults, using the "selectresults" mod
import sqlalchemy.mods.selectresults

SelectResults greatly enhances querying and is highly recommended. For example, here is a query constructed using a combination of joins and outerjoins:

mapper(User, users_table, properties={
    'orders':relation(mapper(Order, orders_table, properties={
        'items':relation(mapper(Item, items_table))
    }))
})
session = create_session()
query = SelectResults(session.query(User))

result = query.outerjoin_to('orders').outerjoin_to('items').select(or_(Order.c.order_id==None,Item.c.item_id==2))

For a full listing of methods, see the generated documentation.

back to section top

assignmapper

Author: Mike Bayer

This extension is used to decorate a mapped class with direct knowledge of its own Mapper, a contextual Session, and functions provided by the Query and Session objects. These methods automatically make use of a contextual session, with which all newly constructed objects are associated. assign_mapper operates as a MapperExtension, and requires the use of a SessionContext as well as SessionContextExt, described in SessionContext. It replaces the normal mapper function with its own version, which takes a SessionContext as the first argument:

import sqlalchemy
from sqlalchemy.ext.sessioncontext import SessionContext
from sqlalchemy.ext.assignmapper import assign_mapper

# session context
ctx = SessionContext(sqlalchemy.create_session)

# assign mapper to class MyClass using table 'sometable', getting
# Sessions from 'ctx'.
assign_mapper(ctx, MyClass, sometable, properties={...}, ...)

Above, all new instances of MyClass will be associated with the contextual session, ctx.current. Additionally, MyClass and instances of MyClass now contain a large set of methods including get, select, flush, delete. The full list is as follows:

# Query methods:
['get', 'select', 'select_by', 'selectone', 'get_by', 'join_to', 'join_via', 'count', 'count_by']

# Session methods:
['flush', 'delete', 'expire', 'refresh', 'expunge', 'merge', 'save', 'update', 'save_or_update']

To continue the MyClass example:

# create a MyClass.  it will be automatically assigned to the contextual Session.
mc = MyClass()

# save MyClass - this will call flush() on the session, specifying 'mc' as the only
# object to be affected
mc.flush()

# load an object, using Query methods attached to MyClass
result = MyClass.get_by(id=5)

# delete it
result.delete()

# commit the change
result.flush()

It should be noted that the flush() method on the instance need not be called. You're probably better off calling flush() on the actual session, so that all changes are properly written to the database simultaneously:

# create a MyClass.
mc = MyClass()

# load some MyClass objects
result = MyClass.select(MyClass.c.name=='bar')

# delete one of them
result[1].delete()

# commit all changes
ctx.current.flush()
back to section top

associationproxy

Author: Mike Bayer

Version: 0.3.1 or greater

associationproxy is used to create a transparent proxy to the associated object in an association relationship, thereby decreasing the verbosity of the pattern in cases where explicit access to the association object is not required. The association relationship pattern is a richer form of a many-to-many relationship, which is described in Association Object. It is strongly recommended to fully understand the association object pattern in its explicit form before using this extension; see the examples in the SQLAlchemy distribution under the directory examples/association/.

When dealing with association relationships, the association object refers to the object that maps to a row in the association table (i.e. the many-to-many table), while the associated object refers to the "endpoint" of the association, i.e. the ultimate object referenced by the parent. The proxy can return collections of objects attached to association objects, and can also create new association objects given only the associated object. An example using the Keyword mapping described in the data mapping documentation is as follows:

from sqlalchemy.ext.associationproxy import AssociationProxy

class User(object):
    pass

class Keyword(object):
    def __init__(self, name):
        self.keyword_name = name

class Article(object):
    # create "keywords" proxied association.
    # the collection is called 'keyword_associations', the endpoint
    # attribute of each association object is called 'keyword'.  the 
    # class itself of the association object will be figured out automatically.
    keywords = AssociationProxy('keyword_associations', 'keyword')

class KeywordAssociation(object):
    pass

# create mappers normally
# note that we set up 'keyword_associations' on Article,
# and 'keyword' on KeywordAssociation.
mapper(Article, articles_table, properties={
    'keyword_associations':relation(KeywordAssociation, lazy=False, cascade="all, delete-orphan")
    }
)
mapper(KeywordAssociation, itemkeywords_table,
    primary_key=[itemkeywords_table.c.article_id, itemkeywords_table.c.keyword_id],
    properties={
        'keyword' : relation(Keyword, lazy=False), 
        'user' : relation(User, lazy=False) 
    }
)
mapper(User, users_table)
mapper(Keyword, keywords_table)

# now, Keywords can be attached to an Article directly;
# KeywordAssociation will be created by the AssociationProxy, and have the 
# 'keyword' attribute set to the new Keyword.
# note that these KeywordAssociation objects will not have a User attached to them.
article = Article()
article.keywords.append(Keyword('blue'))
article.keywords.append(Keyword('red'))
session.save(article)
session.flush()

# the "keywords" collection also returns the underlying Keyword objects
article = session.query(Article).get_by(id=12)
for k in article.keywords:
    print "Keyword:", k.keyword_name

# the original 'keyword_associations' relation exists normally with no awareness of the proxy
article.keyword_associations.append(KeywordAssociation())
print [ka for ka in article.keyword_associations]

Note that the above operations on the keywords collection are proxying operations to and from the keyword_associations collection, which exists normally and can be accessed directly. AssociationProxy will also detect if the collection is list or scalar based and will configure the proxied property to act the same way.

For the common case where the association object's creation needs to be specified by the application, AssociationProxy takes an optional callable creator() which takes a single associated object as an argument, and returns a new association object.

def create_keyword_association(keyword):
    ka = KeywordAssociation()
    ka.keyword = keyword
    return ka

class Article(object):
    # create "keywords" proxied association
    keywords = AssociationProxy('keyword_associations', 'keyword', creator=create_keyword_association)
back to section top

threadlocal

Author: Mike Bayer and Daniel Miller

threadlocal is an extension that was created primarily to provide backwards compatibility with the older SQLAlchemy 0.1 series. It combines three features which SQLAlchemy 0.2 and above provide individually: SessionContext, assign_mapper, and the TLEngine, which is the Engine used with the threadlocal create_engine() strategy. It is strongly recommended that these three features be understood individually before using threadlocal.

In SQLAlchemy 0.1, users never dealt with explicit connections and didn't have a very explicit Session interface, instead relying upon a more magical global object called objectstore. The objectstore idea was wildly popular with about half of SA's users, and completely unpopular with the other half. The threadlocal mod basically brings back objectstore, which is in fact just a SessionContext where you can call Session methods directly off of it, instead of saying context.current. For threadlocal to faithfully produce 0.1 behavior, it is invoked as a mod which globally installs the objectstore's mapper extension, such that all Mappers will automatically assign all new instances of mapped classes to the objectstore's contextual Session. Additionally, it also changes the default engine strategy used by create_engine to be the "threadlocal" strategy, which in normal practice does not affect much.

When you import threadlocal, what you get is:

  • the "objectstore" session context object is now added to the sqlalchemy namespace.
  • a global MapperExtension is set up for all mappers which assigns "objectstore"'s session as the default session context, used by new instances as well as Query objects (see the section Using SessionContextExt).
  • a new function "assign_mapper" is added to the sqlalchemy namespace, which calls the assignmapper mapper function using the new "objectstore" context.
  • the create_engine function is modified so that "threadlocal", and not "plain", is the default engine strategy.

So an important point to understand is, don't use the threadlocal mod unless you are explicitly looking for that behavior. Unfortunately, the easy import of the "threadlocal" mod has found its way into several tutorials on external websites, which produces application-wide behavior that is in conflict with the SQLAlchemy tutorial and data mapping documentation.

While "threadlocal" is only about 10 lines of code, it is strongly advised that users instead make usage of SessionContext and assign_mapper explictly to eliminate confusion. Additionally, the "threadlocal" strategy on create_engine() also exists primarily to provide patterns used in 0.1 and is probably not worth using either, unless you specifically need those patterns.

Basic usage of threadlocal involves importing the mod before any other use of the sqlalchemy namespace, since threadlocal adds the "objectstore" and "assign_mapper" keywords to "sqlalchemy".

To use objectstore:

import sqlalchemy.mods.threadlocal
from sqlalchemy import *

metadata = BoundMetaData('sqlite:///')
user_table = Table('users', metadata,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(50), nullable=False)
)

class User(object):
    pass
mapper(User, user_table)

# "user" object is added to the session automatically
user = User()

# flush the contextual session
objectstore.flush()

The actual Session is available as:

objectstore.get_session()

To use assign_mapper:

import sqlalchemy.mods.threadlocal
from sqlalchemy import *

metadata = BoundMetaData('sqlite:///')
user_table = Table('users', metadata,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(50), nullable=False)
)

class User(object):
    pass

# note that no "context" object is needed
assign_mapper(User, user_table)

# call up a user
user = User.selectfirst(user_table.c.user_id==7)

# call 'delete' on the user
user.delete()

# flush
objectstore.flush()
back to section top

ActiveMapper

Author: Jonathan LaCour

ActiveMapper is a so-called "declarative layer" which allows the construction of a class, a Table, and a Mapper all in one step:

class Person(ActiveMapper):
    class mapping:
        id          = column(Integer, primary_key=True)
        full_name   = column(String)
        first_name  = column(String)
        middle_name = column(String)
        last_name   = column(String)
        birth_date  = column(DateTime)
        ssn         = column(String)
        gender      = column(String)
        home_phone  = column(String)
        cell_phone  = column(String)
        work_phone  = column(String)
        prefs_id    = column(Integer, foreign_key=ForeignKey('preferences.id'))
        addresses   = one_to_many('Address', colname='person_id', backref='person')
        preferences = one_to_one('Preferences', colname='prefs_id', backref='person')

    def __str__(self):
        s =  '%s\n' % self.full_name
        s += '  * birthdate: %s\n' % (self.birth_date or 'not provided')
        s += '  * fave color: %s\n' % (self.preferences.favorite_color or 'Unknown')
        s += '  * personality: %s\n' % (self.preferences.personality_type or 'Unknown')

        for address in self.addresses:
            s += '  * address: %s\n' % address.address_1
            s += '             %s, %s %s\n' % (address.city, address.state, address.postal_code)

        return s

class Preferences(ActiveMapper):
    class mapping:
        __table__        = 'preferences'
        id               = column(Integer, primary_key=True)
        favorite_color   = column(String)
        personality_type = column(String)

class Address(ActiveMapper):
    class mapping:
        id          = column(Integer, primary_key=True)
        type        = column(String)
        address_1   = column(String)
        city        = column(String)
        state       = column(String)
        postal_code = column(String)
        person_id   = column(Integer, foreign_key=ForeignKey('person.id'))

More discussion on ActiveMapper can be found at Jonathan LaCour's Blog as well as the SQLAlchemy Wiki.

back to section top

SqlSoup

Author: Jonathan Ellis

SqlSoup creates mapped classes on the fly from tables, which are automatically reflected from the database based on name. It is essentially a nicer version of the "row data gateway" pattern.

>>> from sqlalchemy.ext.sqlsoup import SqlSoup
>>> db = SqlSoup('sqlite:///')

>>> db.users.select(order_by=[db.users.c.name])
[MappedUsers(name='Bhargan Basepair',email='[email protected]',password='basepair',classname=None,admin=1),
 MappedUsers(name='Joe Student',email='[email protected]',password='student',classname=None,admin=0)]

Full SqlSoup documentation is on the SQLAlchemy Wiki.

back to section top

ProxyEngine

Author: Jason Pellerin

The ProxyEngine is used to "wrap" an Engine, and via subclassing ProxyEngine one can instrument the functionality of an arbitrary Engine instance through the decorator pattern. It also provides a connect() method which will send all Engine requests to different underlying engines. Its functionality in that regard is now largely superseded by DynamicMetaData, which is a better solution.

from sqlalchemy.ext.proxy import ProxyEngine
proxy = ProxyEngine()

proxy.connect('postgres://user:pw@host/db')
back to section top