docs/database.txt (MediaWiki 1.24.0)

Some information about database access in MediaWiki.
By Tim Starling, January 2006.

------------------------------------------------------------------------
    Database layout
------------------------------------------------------------------------

For information about the MediaWiki database layout, such as a
description of the tables and their contents, please see:
  https://www.mediawiki.org/wiki/Manual:Database_layout
  https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=blob_plain;f=maintenance/tables.sql;hb=HEAD


------------------------------------------------------------------------
    API
------------------------------------------------------------------------

To make a read query, something like this usually suffices:

$dbr = wfGetDB( DB_SLAVE );
$res = $dbr->select( /* ...see docs... */ );
foreach ( $res as $row ) {
    // ...
}
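
As a rough sketch of what the select() arguments can look like, the
function below reads page titles from one namespace. The function name
and the choice of fields are made up for illustration, and the
connection is passed in (for example from wfGetDB( DB_SLAVE )) so the
snippet does not depend on global state.

```php
/**
 * Return up to $limit page titles from one namespace, keyed by page ID.
 *
 * @param object $dbr Read connection, e.g. from wfGetDB( DB_SLAVE )
 * @param int $namespace Namespace number
 * @param int $limit Maximum number of rows to fetch
 * @return array Map of page_id => page_title
 */
function exampleFetchTitles( $dbr, $namespace, $limit ) {
    $res = $dbr->select(
        'page',                                   // table name, without prefix
        array( 'page_id', 'page_title' ),         // fields to fetch
        array( 'page_namespace' => $namespace ),  // WHERE, escaped by the wrapper
        __METHOD__,                               // caller name, shown in logs
        array( 'ORDER BY' => 'page_title', 'LIMIT' => $limit )
    );

    $titles = array();
    foreach ( $res as $row ) {
        // Each row is an object with one property per selected field
        $titles[$row->page_id] = $row->page_title;
    }
    return $titles;
}
```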

For a write query, use something like:

$dbw = wfGetDB( DB_MASTER );
$dbw->insert( /* ...see docs... */ );
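
A minimal sketch of an insert() call follows. The example_log table
and its el_* fields are hypothetical and not part of MediaWiki; the
point is only the shape of the arguments: table name, then an array of
field => value pairs, then the caller name.

```php
/**
 * Write one row to a (hypothetical) log table.
 *
 * @param object $dbw Write connection, e.g. from wfGetDB( DB_MASTER )
 * @param int $userId
 * @param string $message
 * @return int Number of rows written
 */
function exampleLogEvent( $dbw, $userId, $message ) {
    $dbw->insert(
        'example_log', // hypothetical table, not part of MediaWiki
        array(
            'el_user' => $userId,
            'el_message' => $message,            // escaped by the wrapper
            'el_timestamp' => $dbw->timestamp(), // current time in the DB's format
        ),
        __METHOD__
    );
    return $dbw->affectedRows();
}
```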

We use the convention $dbr for read and $dbw for write to help you keep
track of whether the database object is a slave (read-only) or a master
(read/write). If you write to a slave, the world will explode. Or to be
precise, a subsequent write query which succeeded on the master may fail
when replicated to the slave due to a unique key collision. Replication
on the slave will stop and it may take hours to repair the database and
get it back online. Setting read_only in my.cnf on the slave will avoid
this scenario, but given the dire consequences, we prefer to have as
many checks as possible.

We provide a query() function for raw SQL, but the wrapper functions
like select() and insert() are usually more convenient. They take care
of things like table prefixes and escaping for you. If you really need
to make your own SQL, please read the documentation for tableName() and
addQuotes(). You will need both of them.
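
If raw SQL really is needed, the pattern might look like the sketch
below. The function name is hypothetical; tableName(), addQuotes() and
query() are the methods named above.

```php
/**
 * Look up a page ID with hand-written SQL.
 *
 * @param object $dbr Read connection, e.g. from wfGetDB( DB_SLAVE )
 * @param string $title Page title in database form
 * @param int $namespace Namespace number
 * @return mixed Whatever query() returns (a result object in MediaWiki)
 */
function exampleRawTitleQuery( $dbr, $title, $namespace ) {
    $page = $dbr->tableName( 'page' );        // adds the table prefix and quoting
    $quotedTitle = $dbr->addQuotes( $title ); // escapes the value, wraps it in quotes
    $ns = intval( $namespace );               // integers can be cast instead of quoted

    $sql = "SELECT page_id FROM $page" .
        " WHERE page_namespace = $ns AND page_title = $quotedTitle";
    return $dbr->query( $sql, __METHOD__ );
}
```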


------------------------------------------------------------------------
    Basic query optimisation
------------------------------------------------------------------------

MediaWiki developers who need to write DB queries should have some
understanding of databases and the performance issues associated with
them. Patches containing unacceptably slow features will not be
accepted. Unindexed queries are generally not welcome in MediaWiki,
except in special pages derived from QueryPage. It's a common pitfall
for new developers to submit code containing SQL queries which examine
huge numbers of rows. Remember that COUNT(*) is O(N): counting rows in
a table is like counting beans in a bucket.
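
One way to avoid an unbounded COUNT(*) is to ask a smaller question.
The sketch below checks whether a user has at least $threshold
revisions by fetching at most $threshold rows, so the cost is capped no
matter how many rows match. The function name is hypothetical, and the
query assumes an index that starts with rev_user.

```php
/**
 * Check whether a user has at least $threshold revisions without
 * counting all of them.
 *
 * @param object $dbr Read connection, e.g. from wfGetDB( DB_SLAVE )
 * @param int $userId
 * @param int $threshold
 * @return bool
 */
function exampleUserHasAtLeast( $dbr, $userId, $threshold ) {
    $res = $dbr->select(
        'revision',
        '1',                              // no column data is needed
        array( 'rev_user' => $userId ),
        __METHOD__,
        array( 'LIMIT' => $threshold )    // stop scanning once enough rows are found
    );
    return $res->numRows() >= $threshold;
}
```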


------------------------------------------------------------------------
    Replication
------------------------------------------------------------------------

The largest installation of MediaWiki, Wikimedia, uses a large set of
slave MySQL servers replicating writes made to a master MySQL server. It
is important to understand the issues associated with this setup if you
want to write code destined for Wikipedia.

It's often the case that the best algorithm to use for a given task
depends on whether or not replication is in use. Due to our unabashed
Wikipedia-centrism, we often just use the replication-friendly version,
but if you like, you can use wfGetLB()->getServerCount() > 1 to
check to see if replication is in use.
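
A minimal sketch of such a check, assuming $lb is the object returned
by wfGetLB(). The function name and the two strategy labels are
hypothetical placeholders for whatever alternatives your code has.

```php
/**
 * Pick an algorithm variant depending on whether replication is in use.
 *
 * @param object $lb Load balancer, e.g. from wfGetLB()
 * @return string Hypothetical strategy label
 */
function exampleChooseStrategy( $lb ) {
    if ( $lb->getServerCount() > 1 ) {
        // More than one server configured: slaves may lag behind the master
        return 'replication-friendly';
    }
    // Single server: reads always see the latest writes
    return 'single-server';
}
```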

=== Lag ===

Lag primarily occurs when large write queries are sent to the master.
Writes on the master are executed in parallel, but they are executed in
serial when they are replicated to the slaves. The master writes the
query to the binlog when the transaction is committed. The slaves poll
the binlog and start executing the query as soon as it appears. They can
service reads while they are performing a write query, but will not read
anything more from the binlog and thus will perform no more writes. This
means that if the write query runs for a long time, the slaves will lag
behind the master for the time it takes for the write query to complete.

Lag can be exacerbated by high read load. MediaWiki's load balancer will
stop sending reads to a slave when it is lagged by more than 30 seconds.
If the load ratios are set incorrectly, or if there is too much load
generally, this may lead to a slave permanently hovering around 30
seconds lag.

If all slaves are lagged by more than 30 seconds, MediaWiki will stop
writing to the database. All edits and other write operations will be
refused, with an error returned to the user. This gives the slaves a
chance to catch up. Before we had this mechanism, the slaves would
regularly lag by several minutes, making review of recent edits
difficult.

In addition to this, MediaWiki attempts to ensure that the user sees
events occurring on the wiki in chronological order. A few seconds of lag
can be tolerated, as long as the user sees a consistent picture from
subsequent requests. This is done by saving the master binlog position
in the session, and then at the start of each request, waiting for the
slave to catch up to that position before doing any reads from it. If
this wait times out, reads are allowed anyway, but the request is
considered to be in "lagged slave mode". Lagged slave mode can be
checked by calling wfGetLB()->getLaggedSlaveMode(). The only
practical consequence at present is a warning displayed in the page
footer.
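
Code that wants to react to lagged slave mode could do something like
the following. This is only a sketch of a hypothetical policy (do not
cache data that may be stale); MediaWiki itself, as noted above, only
shows a footer warning. $lb is assumed to come from wfGetLB().

```php
/**
 * Decide how long derived data may be cached for this request.
 *
 * @param object $lb Load balancer, e.g. from wfGetLB()
 * @param int $normalExpiry Cache lifetime in seconds when data is fresh
 * @return int Cache lifetime in seconds, 0 meaning "do not cache"
 */
function exampleCacheExpiry( $lb, $normalExpiry ) {
    if ( $lb->getLaggedSlaveMode() ) {
        // The slave did not catch up in time, so reads may be stale
        return 0;
    }
    return $normalExpiry;
}
```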

=== Lag avoidance ===

To avoid excessive lag, queries which write large numbers of rows should
be split up, generally to write one row at a time. Multi-row INSERT ...
SELECT queries are the worst offenders and should be avoided altogether.
Instead, do the select first and then the insert.
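
The "select first, then insert" advice might be sketched as follows.
The example_copy table and its ec_* fields are hypothetical, as is the
function name. Each INSERT touches one row, so no single statement
keeps the slaves busy for long when it is replicated.

```php
/**
 * Copy page IDs and titles of one namespace into a (hypothetical)
 * table, without using INSERT ... SELECT.
 *
 * @param object $dbr Read connection, e.g. from wfGetDB( DB_SLAVE )
 * @param object $dbw Write connection, e.g. from wfGetDB( DB_MASTER )
 * @param int $namespace
 * @return int Number of rows inserted
 */
function exampleCopyTitles( $dbr, $dbw, $namespace ) {
    // Step 1: a plain SELECT, which writes nothing
    $res = $dbr->select(
        'page',
        array( 'page_id', 'page_title' ),
        array( 'page_namespace' => $namespace ),
        __METHOD__
    );

    // Step 2: one short INSERT per row
    $copied = 0;
    foreach ( $res as $row ) {
        $dbw->insert(
            'example_copy', // hypothetical table
            array( 'ec_page' => $row->page_id, 'ec_title' => $row->page_title ),
            __METHOD__
        );
        $copied++;
    }
    return $copied;
}
```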

=== Working with lag ===

Despite our best efforts, it's not practical to guarantee a low-lag
environment. Lag will usually be less than one second, but may
occasionally be up to 30 seconds. For scalability, it's very important
to keep load on the master low, so simply sending all your queries to
the master is not the answer. So when you have a genuine need for
up-to-date data, the following approach is advised:

1) Do a quick query to the master for a sequence number or timestamp
2) Run the full query on the slave and check if it matches the data you
   got from the master
3) If it doesn't, run the full query on the master
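
The three steps above can be sketched as follows. This assumes that
page_latest (the ID of a page's latest revision) is a suitable
sequence number, and uses a simple page lookup as a stand-in for the
"full query"; both function names are hypothetical.

```php
/**
 * Stand-in for an expensive query: load a few fields of one page.
 */
function exampleLoadPageRow( $db, $pageId ) {
    return $db->selectRow(
        'page',
        array( 'page_id', 'page_latest', 'page_len', 'page_touched' ),
        array( 'page_id' => $pageId ),
        __METHOD__
    );
}

/**
 * Load a page row that is known to be up to date.
 *
 * @param object $dbr Read connection, e.g. from wfGetDB( DB_SLAVE )
 * @param object $dbw Master connection, e.g. from wfGetDB( DB_MASTER )
 * @param int $pageId
 * @return object|bool Row object, or false if the page does not exist
 */
function exampleLoadFreshPageRow( $dbr, $dbw, $pageId ) {
    // 1) Quick primary key lookup on the master for a sequence number
    $latest = $dbw->selectField(
        'page', 'page_latest', array( 'page_id' => $pageId ), __METHOD__
    );

    // 2) Full query on the slave, checked against the master's value
    $row = exampleLoadPageRow( $dbr, $pageId );
    if ( $row && $latest !== false
        && (int)$row->page_latest === (int)$latest
    ) {
        return $row;
    }

    // 3) The slave is behind (or disagrees): run the full query on the master
    return exampleLoadPageRow( $dbw, $pageId );
}
```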

To avoid swamping the master every time the slaves lag, use of this
approach should be kept to a minimum. In most cases you should just read
from the slave and let the user deal with the delay.


------------------------------------------------------------------------
    Lock contention
------------------------------------------------------------------------

Due to the high write rate on Wikipedia (and some other wikis),
MediaWiki developers need to be very careful to structure their writes
to avoid long-lasting locks. By default, MediaWiki opens a transaction
at the first query, and commits it before the output is sent. Locks will
be held from the time when the query is done until the commit. So you
can reduce lock time by doing as much processing as possible before you
do your write queries.

Often this approach is not good enough, and it becomes necessary to
enclose small groups of queries in their own transaction. Use the
following syntax:

$dbw = wfGetDB( DB_MASTER );
$dbw->begin( __METHOD__ );
/* Do queries */
$dbw->commit( __METHOD__ );
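
Putting both pieces of advice together, a short transaction might look
like the sketch below: the processing happens before begin(), and only
two small writes happen inside the transaction. The example_counter and
example_hit_log tables are hypothetical, and the rollback() on failure
is an addition of this sketch rather than something the text above
requires.

```php
/**
 * Bump a counter and log the hit in one short transaction.
 *
 * @param object $dbw Write connection, e.g. from wfGetDB( DB_MASTER )
 * @param string $name Counter name
 * @param string $rawDetails Unprocessed details text
 */
function exampleRecordHit( $dbw, $name, $rawDetails ) {
    // Do the processing first, while no transaction (and no lock) is open
    $details = substr( trim( $rawDetails ), 0, 255 );

    $dbw->begin( __METHOD__ );
    try {
        $dbw->update(
            'example_counter',                  // hypothetical table
            array( 'ec_hits = ec_hits + 1' ),   // raw SET expression
            array( 'ec_name' => $name ),
            __METHOD__
        );
        $dbw->insert(
            'example_hit_log',                  // hypothetical table
            array( 'ehl_name' => $name, 'ehl_details' => $details ),
            __METHOD__
        );
        $dbw->commit( __METHOD__ );
    } catch ( Exception $e ) {
        // Release the locks straight away if anything fails
        $dbw->rollback( __METHOD__ );
        throw $e;
    }
}
```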

Use of locking reads (e.g. the FOR UPDATE clause) is not advised. They
are poorly implemented in InnoDB and will cause regular deadlock errors.
It's also surprisingly easy to cripple the wiki with lock contention. If
you must use them, define a new flag for $wgAntiLockFlags which allows
them to be turned off, because we'll almost certainly need to do so on
the Wikimedia cluster.
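
A guarded locking read might be sketched like this. The constant name
and its value are hypothetical (the value is assumed not to collide
with any existing flag bit), and the caller is expected to pass in the
value of $wgAntiLockFlags.

```php
// Hypothetical flag bit for this sketch only
define( 'EXAMPLE_ALF_NO_FOO_LOCK', 16 );

/**
 * Build the select() options for a read that would like to lock rows.
 *
 * @param int $antiLockFlags Value of $wgAntiLockFlags
 * @return array Options to pass as the fifth argument of select()
 */
function exampleFooSelectOptions( $antiLockFlags ) {
    if ( $antiLockFlags & EXAMPLE_ALF_NO_FOO_LOCK ) {
        // Locking has been switched off in the site configuration
        return array();
    }
    return array( 'FOR UPDATE' );
}
```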

Instead of locking reads, combine your existence checks into your write
queries, by using an appropriate condition in the WHERE clause of an
UPDATE, or by using unique indexes in combination with INSERT IGNORE.
Then use the affected row count to see if the query succeeded.
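
Both variants can be sketched as follows. The example_job and
example_name tables are hypothetical; the second function assumes a
unique index on en_name. In each case the check and the write are a
single statement, so no separate locking read is needed.

```php
/**
 * Claim an unowned job. The existence check lives in the WHERE clause.
 *
 * @param object $dbw Write connection, e.g. from wfGetDB( DB_MASTER )
 * @return bool True if this call claimed the job
 */
function exampleClaimJob( $dbw, $jobId, $workerId ) {
    $dbw->update(
        'example_job',
        array( 'ej_owner' => $workerId ),
        array( 'ej_id' => $jobId, 'ej_owner' => 0 ), // only if still unowned
        __METHOD__
    );
    return $dbw->affectedRows() > 0;
}

/**
 * Register a name once. Relies on a unique index on en_name.
 *
 * @param object $dbw Write connection, e.g. from wfGetDB( DB_MASTER )
 * @return bool True if the name was new
 */
function exampleRegisterName( $dbw, $name ) {
    $dbw->insert(
        'example_name',
        array( 'en_name' => $name ),
        __METHOD__,
        array( 'IGNORE' ) // duplicate keys are skipped instead of failing
    );
    return $dbw->affectedRows() > 0;
}
```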

------------------------------------------------------------------------
    Supported DBMSs
------------------------------------------------------------------------

MediaWiki is written primarily for use with MySQL. Queries are optimized
for it and its schema is considered the canonical version. However,
MediaWiki does support the following other DBMSs to varying degrees.

* PostgreSQL
* SQLite
* Oracle
* MSSQL

More information can be found about each of these databases (known issues,
level of support, extra configuration) in the "databases" subdirectory in
this folder.

------------------------------------------------------------------------
    Use of GROUP BY
------------------------------------------------------------------------

MySQL supports GROUP BY without checking anything in the SELECT clause.
Other DBMSs (especially Postgres) are stricter and require that all the
non-aggregate items in the SELECT clause appear in the GROUP BY. For
this reason, using SELECT * with GROUP BY queries is strongly
discouraged.
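
A portable GROUP BY query names each non-aggregate field twice: once in
the field list and once in the GROUP BY option. The sketch below finds
the latest edit time per contributor on one page; the function name is
hypothetical.

```php
/**
 * Return the latest edit timestamp of each contributor to a page.
 *
 * @param object $dbr Read connection, e.g. from wfGetDB( DB_SLAVE )
 * @param int $pageId
 * @return array Map of user name => timestamp
 */
function exampleLastEditPerUser( $dbr, $pageId ) {
    $res = $dbr->select(
        'revision',
        // Explicit field list instead of *: two plain fields, one aggregate
        array( 'rev_user', 'rev_user_text', 'MAX(rev_timestamp) AS last_edit' ),
        array( 'rev_page' => $pageId ),
        __METHOD__,
        // Both plain fields from the list above are repeated here
        array( 'GROUP BY' => 'rev_user, rev_user_text' )
    );

    $lastEdits = array();
    foreach ( $res as $row ) {
        $lastEdits[$row->rev_user_text] = $row->last_edit;
    }
    return $lastEdits;
}
```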