SQLAlchemy 0.3 Documentation
A database engine is a subclass of sqlalchemy.sql.Engine
, and is the starting point for where SQLAlchemy provides a layer of abstraction on top of the various DBAPI2 database modules. For all databases supported by SA, there is a specific "implementation" module, found in the sqlalchemy.databases
package, that provides all the objects an Engine
needs in order to perform its job. A typical user of SQLAlchemy never needs to deal with these modules directly. For many purposes, the only knowledge that's needed is how to create an Engine for a particular connection URL. When dealing with direct execution of SQL statements, one would also be aware of Result, Connection, and Transaction objects. The primary public facing objects are:
-
URL - represents the identifier for a particular database. URL objects are usually created automatically based on a given connect string passed to the
create_engine()
function. -
Engine - Combines a connection-providing resource with implementation-provided objects that know how to generate, execute, and gather information about SQL statements. It also provides the primary interface by which Connections are obtained, as well as a context for constructed SQL objects and schema constructs to "implicitly execute" themselves, which is an optional feature of SQLAlchemy.
-
Connection - represents a connection to the database. The underlying connection object returned by a DBAPI's connect() method is referenced internally by the Connection object. Connection provides methods that handle the execution of SQLAlchemy's own SQL constructs, as well as literal string-based statements.
-
Transaction - represents a transaction on a single Connection. Includes
begin()
,commit()
androllback()
methods that support basic "nestable" behavior, meaning an outermost transaction is maintained against multiple nested calls to begin/commit. - ResultProxy - Represents the results of an execution, and is most analgous to the cursor object in DBAPI. It primarily allows iteration over result sets, but also provides an interface to information about inserts/updates/deletes, such as the count of rows affected, last inserted IDs, etc.
- RowProxy - Represents a single row returned by the fetchone() method on ResultProxy.
Underneath the public-facing API of Engine
, several components are provided by database implementations to provide the full behavior, including:
-
Dialect - this object is provided by database implementations to describe the behavior of a particular database. It acts as a repository for metadata about a database's characteristics, and provides factory methods for other objects that deal with generating SQL strings and objects that handle some of the details of statement execution.
- ConnectionProvider - this object knows how to return a DBAPI connection object. It typically talks to a connection pool which maintains one or more connections in memory for quick re-use.
- ExecutionContext - this object is created for each execution of a single SQL statement, and tracks information about its execution such as primary keys inserted, the total count of rows affected, etc. It also may implement any special logic that various DBAPI implementations may require before or after a statement execution.
-
Compiled - represents a "compiled" SQL expression object. Includes a
compile()
method which receives SQL expression objects and assembles them into strings that are suitable for direct execution. Also collects default bind parameters into a datastructure that will be converted at execution time into a dictionary or list, depending on the dialect's paramstyle.
Supported Databases
Engines exist for SQLite, Postgres, MySQL, and Oracle, using the Pysqlite, Psycopg2 (Psycopg1 will work to a limited degree, but it is not supported), MySQLDB, and cx_Oracle modules. There is also preliminary support for MS-SQL using adodbapi or pymssql, as well as Firebird. For each engine, a distinct Python module exists in the sqlalchemy.databases
package, which provides implementations of some of the objects mentioned in the previous section.
Downloads for each DBAPI at the time of this writing are as follows:
- Postgres: psycopg2
- SQLite: pysqlite
- MySQL: MySQLDB
- Oracle: cx_Oracle
- MS-SQL: adodbapi pymssql
- Firebird: kinterbasdb
The SQLAlchemy Wiki contains a page of database notes, describing whatever quirks and behaviors have been observed. Its a good place to check for issues with specific databases. Database Notes
back to section topEstablishing a Database Engine
SQLAlchemy indicates the source of an Engine strictly via RFC-1738 style URLs, combined with optional keyword arguments to specify options for the Engine. The form of the URL is:
$ driver://username:password@host:port/database
Available drivernames are sqlite
, mysql
, postgres
, oracle
, mssql
, and firebird
. For sqlite, the database name is the filename to connect to, or the special name ":memory:" which indicates an in-memory database. The URL is typically sent as a string to the create_engine()
function:
# postgres pg_db = create_engine('postgres://scott:tiger@localhost:5432/mydatabase') # sqlite (note the four slashes for an absolute path) sqlite_db = create_engine('sqlite:////absolute/path/to/database.txt') sqlite_db = create_engine('sqlite:///relative/path/to/database.txt') sqlite_db = create_engine('sqlite://') # in-memory database # mysql mysql_db = create_engine('mysql://localhost/foo') # oracle via TNS name oracle_db = create_engine('oracle://scott:tiger@dsn') # oracle will feed host/port/SID into cx_oracle.makedsn oracle_db = create_engine('oracle://scott:[email protected]:1521/sidname')
The Engine
will create its first connection to the database when a SQL statement is executed. As concurrent statements are executed, the underlying connection pool will grow to a default size of five connections, and will allow a default "overflow" of ten. Since the Engine
is essentially "home base" for the connection pool, it follows that you should keep a single Engine
per database established within an application, rather than creating a new one for each connection.
Custom DBAPI keyword arguments
Custom arguments can be passed to the underlying DBAPI in three ways. String-based arguments can be passed directly from the URL string as query arguments:
db = create_engine('postgres://scott:tiger@localhost/test?argument1=foo&argument2=bar')
If SQLAlchemy's database connector is aware of a particular query argument, it may convert its type from string to its proper type.
create_engine
also takes an argument connect_args
which is an additional dictionary that will be passed to connect()
. This can be used when arguments of a type other than string are required, and SQLAlchemy's database connector has no type conversion logic present for that parameter:
db = create_engine('postgres://scott:tiger@localhost/test', connect_args = {'argument1':17, 'argument2':'bar'})
The most customizable connection method of all is to pass a creator
argument, which specifies a callable that returns a DBAPI connection:
def connect(): return psycopg.connect(user='scott', host='localhost') db = create_engine('postgres://', creator=connect)
Database Engine Options
Keyword options can also be specified to create_engine()
, following the string URL as follows:
db = create_engine('postgres://...', encoding='latin1', echo=True, module=psycopg1)
A list of all standard options, as well as several that are used by particular database dialects, is as follows:
-
convert_unicode=False - if set to True, all String/character based types will convert Unicode values to raw byte values going into the database, and all raw byte values to Python Unicode coming out in result sets. This is an engine-wide method to provide unicode conversion across the board. For unicode conversion on a column-by-column level, use the
Unicode
column type instead, described in The Types System. -
echo=False - if True, the Engine will log all statements as well as a repr() of their parameter lists to the engines logger, which defaults to sys.stdout. The
echo
attribute ofEngine
can be modified at any time to turn logging on and off. If set to the string"debug"
, result rows will be printed to the standard output as well. This flag ultimately controls a Python logger; see Configuring Logging for information on how to configure logging directly. - echo_pool=False - if True, the connection pool will log all checkouts/checkins to the logging stream, which defaults to sys.stdout. This flag ultimately controls a Python logger; see Configuring Logging for information on how to configure logging directly.
-
encoding='utf-8' - the encoding to use for all Unicode translations, both by engine-wide unicode conversion as well as the
Unicode
type object. - module=None - used by database implementations which support multiple DBAPI modules, this is a reference to a DBAPI2 module to be used instead of the engine's default module. For Postgres, the default is psycopg2, or psycopg1 if 2 cannot be found. For Oracle, its cx_Oracle.
-
pool=None - an actual pool instance. Note that an already-constructed pool should already know how to create database connections, so this option supercedes any other connect options specified. Typically, it is an instance of
sqlalchemy.pool.Pool
to be used as the underlying source for connections. For more on connection pooling, see Connection Pooling.
Example of a manual invocation of pool.QueuePool
(which is the pool instance used for all databases except sqlite):
from sqlalchemy import * import sqlalchemy.pool as pool import MySQLdb def getconn(): return MySQLdb.connect(user='ed', dbname='mydb') engine = create_engine('mysql://', pool=pool.QueuePool(getconn, pool_size=20, max_overflow=40))
-
poolclass=None - a
sqlalchemy.pool.Pool
subclass that will be instantated in place of the default connection pool. -
max_overflow=10 - the number of connections to allow in connection pool "overflow", that is connections that can be opened above and beyond the pool_size setting, which defaults to five. this is only used with
QueuePool
. -
pool_size=5 - the number of connections to keep open inside the connection pool. This used with
QueuePool
as well asSingletonThreadPool
. - pool_recycle=-1 - this setting causes the pool to recycle connections after the given number of seconds has passed. It defaults to -1, or no timeout. For example, setting to 3600 means connections will be recycled after one hour. Note that MySQL in particular will disconnect automatically if no activity is detected on a connection for eight hours (although this is configurable with the MySQLDB connection itself and the server configuration as well).
-
pool_timeout=30 - number of seconds to wait before giving up on getting a connection from the pool. This is only used with
QueuePool
. -
strategy='plain' - the Strategy argument is used to select alternate implementations of the underlying Engine object, which coordinates operations between dialects, compilers, connections, and so on. Currently, the only alternate strategy besides the default value of "plain" is the "threadlocal" strategy, which selects the usage of the
TLEngine
class that provides a modified connection scope for implicit executions. Implicit execution as well as further detail on this setting are described in Implicit Execution. -
threaded=True - used by cx_Oracle; sets the
threaded
parameter of the connection indicating thread-safe usage. cx_Oracle docs indicate setting this flag toFalse
will speed performance by 10-15%. While this defaults toFalse
in cx_Oracle, SQLAlchemy defaults it toTrue
, preferring stability over early optimization. -
use_ansi=True - used only by Oracle; when False, the Oracle driver attempts to support a particular "quirk" of Oracle versions 8 and previous, that the LEFT OUTER JOIN SQL syntax is not supported, and the "Oracle join" syntax of using
<column1>(+)=<column2>
must be used in order to achieve a LEFT OUTER JOIN. - use_oids=False - used only by Postgres, will enable the column name "oid" as the object ID column, which is also used for the default sort order of tables. Postgres as of 8.1 has object IDs disabled by default.
Configuring Logging
As of the 0.3 series of SQLAlchemy, Python's standard logging module is used to implement informational and debug log output. This allows SQLAlchemy's logging to integrate in a standard way with other applications and libraries. The echo
and echo_pool
flags that are present on create_engine()
, as well as the echo_uow
flag used on Session
, all interact with regular loggers.
This section assumes familiarity with the above linked logging module. All logging performed by SQLAlchemy exists underneath the sqlalchemy
namespace, as used by logging.getLogger('sqlalchemy')
. When logging has been configured (i.e. such as via logging.basicConfig()
), the general namespace of SA loggers that can be turned on is as follows:
-
sqlalchemy.engine
- controls SQL echoing. set tologging.INFO
for SQL query output,logging.DEBUG
for query + result set output. -
sqlalchemy.pool
- controls connection pool logging. set tologging.INFO
or lower to log connection pool checkouts/checkins. -
sqlalchemy.orm
- controls logging of various ORM functions. set tologging.INFO
for configurational logging as well as unit of work dumps,logging.DEBUG
for extensive logging during query and flush() operations. Subcategories ofsqlalchemy.orm
include:-
sqlalchemy.orm.attributes
- logs certain instrumented attribute operations, such as triggered callables -
sqlalchemy.orm.mapper
- logs Mapper configuration and operations -
sqlalchemy.orm.unitofwork
- logs flush() operations, including dependency sort graphs and other operations -
sqlalchemy.orm.strategies
- logs relation loader operations (i.e. lazy and eager loads) -
sqlalchemy.orm.sync
- logs synchronization of attributes from parent to child instances during a flush()
-
For example, to log SQL queries as well as unit of work debugging:
import logging logging.basicConfig() logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO) logging.getLogger('sqlalchemy.orm.unitofwork').setLevel(logging.DEBUG)
By default, the log level is set to logging.ERROR
within the entire sqlalchemy
namespace so that no log operations occur, even within an application that has logging enabled otherwise.
The echo
flags present as keyword arguments to create_engine()
and others as well as the echo
property on Engine
, when set to True
, will first attempt to insure that logging is enabled. Unfortunately, the logging
module provides no way of determining if output has already been configured (note we are referring to if a logging configuration has been set up, not just that the logging level is set). For this reason, any echo=True
flags will result in a call to logging.basicConfig()
using sys.stdout as the destination. It also sets up a default format using the level name, timestamp, and logger name. Note that this configuration has the affect of being configured in addition to any existing logger configurations. Therefore, when using Python logging, insure all echo flags are set to False at all times, to avoid getting duplicate log lines.
Using Connections
In this section we describe the SQL execution interface available from an Engine
instance. Note that when using the Object Relational Mapper (ORM) as well as when dealing with with "bound" metadata objects, SQLAlchemy deals with the Engine and Connections for you and you generally don't need to know much about it; in those cases, you can skip this section and go to Database Meta Data. "Bound" metadata is described in Binding MetaData to an Engine.
The Engine provides a connect()
method which returns a Connection
object. Connection
is a proxy object which maintains a reference to a DBAPI connection instance. This object provides methods by which literal SQL text as well as SQL clause constructs can be compiled and executed.
engine = create_engine('sqlite:///:memory:') connection = engine.connect() result = connection.execute("select * from mytable where col1=:col1", col1=5) for row in result: print row['col1'], row['col2'] connection.close()
The close
method on Connection
does not actually remove the underlying connection to the database, but rather indicates that the underlying resources can be returned to the connection pool. When using the connect()
method, the DBAPI connection referenced by the Connection
object is not referenced anywhere else.
In both execution styles above, the Connection
object will also automatically return its resources to the connection pool when the object is garbage collected, i.e. its __del__()
method is called. When using the standard C implementation of Python, this method is usually called immediately as soon as the object is dereferenced. With other Python implementations such as Jython, this is not so guaranteed.
The execute method on Engine
and Connection
can also receive SQL clause constructs as well, which are described in Constructing SQL Queries via Python Expressions:
connection = engine.connect() result = connection.execute(select([table1], table1.c.col1==5)) for row in result: print row['col1'], row['col2'] connection.close()
Both Connection
and Engine
fulfill an interface known as Connectable
which specifies common functionality between the two objects, such as getting a Connection
and executing queries. Therefore, most SQLAlchemy functions which take an Engine
as a parameter with which to execute SQL will also accept a Connection
(and the name of the argument is typically called connectable
):
engine = create_engine('sqlite:///:memory:') # specify some Table metadata metadata = MetaData() table = Table('sometable', metadata, Column('col1', Integer)) # create the table with the Engine table.create(connectable=engine) # drop the table with a Connection off the Engine connection = engine.connect() table.drop(connectable=connection)
Connection facts:
- the Connection object is not threadsafe. While a Connection can be shared among threads using properly synchronized access, this is also not recommended as many DBAPIs have issues with, if not outright disallow, sharing of connection state between threads.
- The Connection object represents a single dbapi connection checked out from the connection pool. In this state, the connection pool has no affect upon the connection, including its expiration or timeout state. For the connection pool to properly manage connections, connections should be returned to the connection pool (i.e. Connection.close()) whenever the connection is not in use. If your application has a need for management of multiple connections or is otherwise long running (this includes all web applications, threaded or not), don't hold a single connection open at the module level.
Transactions
The Connection
object provides a begin()
method which returns a Transaction
object. This object is usually used within a try/except clause so that it is guaranteed to rollback()
or commit()
:
trans = connection.begin() try: r1 = connection.execute(table1.select()) connection.execute(table1.insert(), col1=7, col2='this is some data') trans.commit() except: trans.rollback() raise
The Transaction
object also handles "nested" behavior by keeping track of the outermost begin/commit pair. In this example, two functions both issue a transaction on a Connection, but only the outermost Transaction object actually takes effect when it is committed.
# method_a starts a transaction and calls method_b def method_a(connection): trans = connection.begin() # open a transaction try: method_b(connection) trans.commit() # transaction is committed here except: trans.rollback() # this rolls back the transaction unconditionally raise # method_b also starts a transaction def method_b(connection): trans = connection.begin() # open a transaction - this runs in the context of method_a's transaction try: connection.execute("insert into mytable values ('bat', 'lala')") connection.execute(mytable.insert(), col1='bat', col2='lala') trans.commit() # transaction is not committed yet except: trans.rollback() # this rolls back the transaction unconditionally raise # open a Connection and call method_a conn = engine.connect() method_a(conn) conn.close()
Above, method_a
is called first, which calls connection.begin()
. Then it calls method_b
. When method_b
calls connection.begin()
, it just increments a counter that is decremented when it calls commit()
. If either method_a
or method_b
calls rollback()
, the whole transaction is rolled back. The transaction is not committed until method_a
calls the commit()
method.
Note that SQLAlchemy's Object Relational Mapper also provides a way to control transaction scope at a higher level; this is described in SessionTransaction.
Transaction Facts:
- the Transaction object, just like its parent Connection, is not threadsafe.
Implicit Execution
Implicit execution refers to the execution of SQL without the explicit usage of a Connection
object. This occurs when you call the execute()
method off of an Engine
object or off of a SQL expression or table that is associated with "bound" metadata.
engine = create_engine('sqlite:///:memory:') result = engine.execute("select * from mytable where col1=:col1", col1=5) for row in result: print row['col1'], row['col2'] result.close()
Using "bound" metadata:
engine = create_engine('sqlite:///:memory:') meta = BoundMetaData(engine) table = Table('mytable', meta, Column('col1', Integer), Column('col2', String(20))) r = table.insert().execute(col1=5, col2='some record')
Notice in the above two examples, no connect()
method is ever called nor do we ever see a Connection
anywhere; the Connection
is created for you automatically via the execute()
method, and a handle to the execution's cursor remains open in the returned result set. When the result set is closed via the close()
method, or if the result set object falls out of scope and is garbage collected, the underlying cursor is closed, the Connection
is discarded and the underlying DBAPI connection is returned to the connection pool.
The purpose of the "implicit" connection is strictly one of convenience; while in SQLAlchemy 0.1 it was the only style of operation, it is now optional.
Implicit Execution Strategies
The internal behavior of engine during implicit execution can be affected by the strategy
keyword argument to create_engine()
. Generally this setting can be left at its default value of plain
. However, for the advanced user, the threadlocal
option can provide the service of managing connections against the current thread in which they were pulled from the connection pool, where the same underlying DBAPI connection as well as a single database-level transaction can then be shared by many operations without explicitly passing a Connection
or Transaction
object around. It also may reduce the number of connections checked out from the connection pool at a given time.
Note that this setting does not affect the fact that Connection and Transaction objects are not threadsafe. The "threadlocal" strategy affects the selection of DBAPI connections which are pulled from the connection pool when a Connection
object is created, but does not synchronize method access to the Connection
or Transaction
instances themselves, which are only proxy objects. It is instead intended that many Connection
instances would share access to a single "connection" object that is referenced in relation to the current thread.
When strategy
is set to plain
, each implicit execution requests a unique connection from the connection pool, which is returned to the pool when the resulting ResultProxy
falls out of scope (i.e. __del__()
is called) or its close()
method is called. If a second implicit execution occurs while the ResultProxy
from the previous execution is still open, then a second connection is pulled from the pool.
When strategy
is set to threadlocal
, the Engine
still checks out a connection which is closeable in the same manner via the ResultProxy
, except the connection it checks out will be the same connection as one which is already checked out, assuming the operation is in the same thread. When all ResultProxy
objects are closed in a particular thread, the connection is returned to the pool normally.
An additional feature of the threadlocal
selection is that Transaction
objects can be managed implicitly as well, by calling the begin()
,commit()
and rollback()
methods off of the Engine
, or by using Transaction
objects from the thread-local connection.
It is crucial to note that the plain
and threadlocal
contexts do not impact the connect() method on the Engine. connect()
always returns a unique connection. Implicit connections use a different method off of Engine
for their operations called contextual_connect()
.
By default, every call to execute
pulls a dedicated DBAPI connection from the connection pool:
db = create_engine('mysql://localhost/test', strategy='plain') # execute one statement and receive results. r1 now references a DBAPI connection resource. r1 = db.execute("select * from table1") # execute a second statement and receive results. r2 now references a *second* DBAPI connection resource. r2 = db.execute("select * from table2") for row in r1: ... for row in r2: ... # release connection 1 r1.close() # release connection 2 r2.close()
Using the "threadlocal" strategy, all calls to execute
within the same thread will be guaranteed to use the same underlying DBAPI connection, which is only returned to the connection pool when all ResultProxy
instances have been closed.
db = create_engine('mysql://localhost/test', strategy='threadlocal') # execute one statement and receive results. r1 now references a DBAPI connection resource. r1 = db.execute("select * from table1") # execute a second statement and receive results. r2 now references the *same* resource as r1 r2 = db.execute("select * from table2") for row in r1: ... for row in r2: ... # dereference r1. the connection is still held by r2. r1 = None # dereference r2. with no more references to the underlying connection resources, they # are returned to the pool. r2 = None
To get at the actual Connection
object which is used by implicit executions, call the contextual_connection()
method on Engine
:
# threadlocal strategy db = create_engine('mysql://localhost/test', strategy='threadlocal') conn1 = db.contextual_connection() conn2 = db.contextual_connection() >>> conn1.connection is conn2.connection True
When the plain
strategy is used, the contextual_connection()
method is synonymous with the connect()
method; both return a distinct connection from the pool.
One programming pattern that the threadlocal
strategy supports is transparent connection and transaction sharing.
db = create_engine('mysql://localhost/test', strategy='threadlocal') def dosomethingimplicit(): table1.execute("some sql") table1.execute("some other sql") def dosomethingelse(): table2.execute("some sql") conn = db.contextual_connection() # do stuff with conn conn.execute("some other sql") conn.close() def dosomethingtransactional(): conn = db.contextual_connection() trans = conn.begin() # do stuff trans.commit() db.create_transaction() try: dosomethingimplicit() dosomethingelse() dosomethingtransactional() db.commit() except: db.rollback()
In the above example, the program calls three functions dosomethingimplicit()
, dosomethingelse()
and dosomethingtransactional()
. In all three functions, either implicit execution is used, or an explicit Connection
is used via the contextual_connection()
method. This indicates that they all will share the same underlying dbapi connection as well as the same parent Transaction
instance, which is created in the main body of the program via the call to db.create_transaction()
. So while there are several calls that return "new" Transaction
or Connection
objects, in reality only one "real" connection is ever used, and there is only one transaction (i.e. one begin/commit pair) executed.