PHP Cross Reference of MediaWiki-1.24.0
Some information about database access in MediaWiki.
By Tim Starling, January 2006.

------------------------------------------------------------------------
Database layout
------------------------------------------------------------------------

For information about the MediaWiki database layout, such as a
description of the tables and their contents, please see:
https://www.mediawiki.org/wiki/Manual:Database_layout
https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=blob_plain;f=maintenance/tables.sql;hb=HEAD


------------------------------------------------------------------------
API
------------------------------------------------------------------------

To make a read query, something like this usually suffices:

$dbr = wfGetDB( DB_SLAVE );
$res = $dbr->select( /* ...see docs... */ );
foreach ( $res as $row ) {
	...
}

For a write query, use something like:

$dbw = wfGetDB( DB_MASTER );
$dbw->insert( /* ...see docs... */ );

We use the convention $dbr for read and $dbw for write to help you keep
track of whether the database object is a slave (read-only) or a master
(read/write). If you write to a slave, the world will explode. Or to be
precise, a subsequent write query which succeeded on the master may fail
when replicated to the slave due to a unique key collision. Replication
on the slave will stop and it may take hours to repair the database and
get it back online. Setting read_only in my.cnf on the slave will avoid
this scenario, but given the dire consequences, we prefer to have as
many checks as possible.

We provide a query() function for raw SQL, but the wrapper functions
like select() and insert() are usually more convenient. They take care
of things like table prefixes and escaping for you. If you really need
to make your own SQL, please read the documentation for tableName() and
addQuotes(). You will need both of them.
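As a rough illustration of why raw SQL needs both helpers, here is an analogue using Python's built-in sqlite3 module. The functions table_name() and add_quotes() and the "mw_" prefix are hypothetical stand-ins written for this sketch, not MediaWiki's implementation; the real tableName() and addQuotes() are methods on the database object, and their quoting rules depend on the DBMS (MySQL, for instance, quotes identifiers with backticks).

```python
import sqlite3

# Hypothetical table prefix, standing in for a wiki's configured prefix.
TABLE_PREFIX = "mw_"


def table_name(name: str) -> str:
    """Hypothetical analogue of tableName(): add the prefix, then quote
    the identifier ANSI/SQLite style, doubling any embedded double quotes."""
    return '"' + (TABLE_PREFIX + name).replace('"', '""') + '"'


def add_quotes(value) -> str:
    """Hypothetical analogue of addQuotes(): render a value as a SQL
    literal, doubling embedded single quotes in strings."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int (bool is an int)
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


conn = sqlite3.connect(":memory:")
conn.execute(
    f"CREATE TABLE {table_name('page')} "
    "(page_id INTEGER PRIMARY KEY, page_title TEXT)"
)

# The apostrophe would break naive string concatenation.
title = "O'Reilly"
conn.execute(
    f"INSERT INTO {table_name('page')} (page_title) VALUES ({add_quotes(title)})"
)
rows = conn.execute(
    f"SELECT page_title FROM {table_name('page')} "
    f"WHERE page_title = {add_quotes(title)}"
).fetchall()
print(rows)  # [("O'Reilly",)]
```

Outside MediaWiki you would normally let the driver bind values (sqlite3's `?` placeholders) instead of quoting by hand; the manual path is shown only because it mirrors what a raw query() call requires.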


------------------------------------------------------------------------
Basic query optimisation
------------------------------------------------------------------------

MediaWiki developers who need to write DB queries should have some
understanding of databases and the performance issues associated with
them. Patches containing unacceptably slow features will not be
accepted. Unindexed queries are generally not welcome in MediaWiki,
except in special pages derived from QueryPage. It's a common pitfall
for new developers to submit code containing SQL queries which examine
huge numbers of rows. Remember that COUNT(*) is O(N): counting rows in a
table is like counting beans in a bucket.


------------------------------------------------------------------------
Replication
------------------------------------------------------------------------

The largest installation of MediaWiki, Wikimedia, uses a large set of
slave MySQL servers replicating writes made to a master MySQL server. It
is important to understand the issues associated with this setup if you
want to write code destined for Wikipedia.

It's often the case that the best algorithm to use for a given task
depends on whether or not replication is in use. Due to our unabashed
Wikipedia-centrism, we often just use the replication-friendly version,
but if you like, you can use wfGetLB()->getServerCount() > 1 to
check whether replication is in use.

=== Lag ===

Lag primarily occurs when large write queries are sent to the master.
Writes on the master are executed in parallel, but they are executed in
serial when they are replicated to the slaves. The master writes the
query to the binlog when the transaction is committed. The slaves poll
the binlog and start executing the query as soon as it appears.
They can service reads while they are performing a write query, but
will not read anything more from the binlog and thus will perform no
more writes. This means that if the write query runs for a long time,
the slaves will lag behind the master for the time it takes for the
write query to complete.

Lag can be exacerbated by high read load. MediaWiki's load balancer will
stop sending reads to a slave when it is lagged by more than 30 seconds.
If the load ratios are set incorrectly, or if there is too much load
generally, this may lead to a slave permanently hovering around 30
seconds of lag.

If all slaves are lagged by more than 30 seconds, MediaWiki will stop
writing to the database. All edits and other write operations will be
refused, with an error returned to the user. This gives the slaves a
chance to catch up. Before we had this mechanism, the slaves would
regularly lag by several minutes, making review of recent edits
difficult.

In addition to this, MediaWiki attempts to ensure that the user sees
events occurring on the wiki in chronological order. A few seconds of lag
can be tolerated, as long as the user sees a consistent picture from
subsequent requests. This is done by saving the master binlog position
in the session, and then at the start of each request, waiting for the
slave to catch up to that position before doing any reads from it. If
this wait times out, reads are allowed anyway, but the request is
considered to be in "lagged slave mode". Lagged slave mode can be
checked by calling wfGetLB()->getLaggedSlaveMode(). The only
practical consequence at present is a warning displayed in the page
footer.

=== Lag avoidance ===

To avoid excessive lag, queries which write large numbers of rows should
be split up, generally to write one row at a time. Multi-row
INSERT ... SELECT queries are the worst offenders and should be avoided
altogether.
Instead, do the SELECT first and then the INSERT.

=== Working with lag ===

Despite our best efforts, it's not practical to guarantee a low-lag
environment. Lag will usually be less than one second, but may
occasionally be up to 30 seconds. For scalability, it's very important
to keep load on the master low, so simply sending all your queries to
the master is not the answer. When you have a genuine need for
up-to-date data, the following approach is advised:

1) Do a quick query to the master for a sequence number or timestamp.
2) Run the full query on the slave and check whether it matches the data
   you got from the master.
3) If it doesn't, run the full query on the master.

To avoid swamping the master every time the slaves lag, use of this
approach should be kept to a minimum. In most cases you should just read
from the slave and let the user deal with the delay.


------------------------------------------------------------------------
Lock contention
------------------------------------------------------------------------

Due to the high write rate on Wikipedia (and some other wikis),
MediaWiki developers need to be very careful to structure their writes
to avoid long-lasting locks. By default, MediaWiki opens a transaction
at the first query, and commits it before the output is sent. Locks will
be held from the time when the query is done until the commit. So you
can reduce lock time by doing as much processing as possible before you
do your write queries.

Often this approach is not good enough, and it becomes necessary to
enclose small groups of queries in their own transaction. Use the
following syntax:

$dbw = wfGetDB( DB_MASTER );
$dbw->begin( __METHOD__ );
/* Do queries */
$dbw->commit( __METHOD__ );

Use of locking reads (e.g. the FOR UPDATE clause) is not advised.
They are poorly implemented in InnoDB and will cause regular deadlock
errors. It's also surprisingly easy to cripple the wiki with lock
contention. If you must use them, define a new flag for $wgAntiLockFlags
which allows them to be turned off, because we'll almost certainly need
to do so on the Wikimedia cluster.

Instead of locking reads, combine your existence checks into your write
queries, by using an appropriate condition in the WHERE clause of an
UPDATE, or by using unique indexes in combination with INSERT IGNORE.
Then use the affected row count to see whether the query succeeded.

------------------------------------------------------------------------
Supported DBMSs
------------------------------------------------------------------------

MediaWiki is written primarily for use with MySQL. Queries are optimized
for it and its schema is considered the canonical version. However,
MediaWiki does support the following other DBMSs to varying degrees:

* PostgreSQL
* SQLite
* Oracle
* MSSQL

More information about each of these databases (known issues, level of
support, extra configuration) can be found in the "databases"
subdirectory in this folder.

------------------------------------------------------------------------
Use of GROUP BY
------------------------------------------------------------------------

MySQL supports GROUP BY without checking the non-aggregate items in the
SELECT clause. Other DBMSs (especially Postgres) are stricter and
require that all the non-aggregate items in the SELECT clause appear in
the GROUP BY. For this reason, using SELECT * in GROUP BY queries is
strongly discouraged.
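A small sketch of the difference, using Python's built-in sqlite3 module as a stand-in. Assumptions: SQLite rather than MySQL or PostgreSQL, and two toy tables loosely modelled on page and revision rather than the real schema. SQLite is permissive in roughly the way MySQL traditionally is, so both queries run here; only the first would also run on PostgreSQL.

```python
import sqlite3

# Toy tables loosely modelled on MediaWiki's page/revision tables;
# the columns are illustrative and far simpler than the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER);
    INSERT INTO page VALUES (1, 'Main_Page'), (2, 'Sandbox');
    INSERT INTO revision (rev_page) VALUES (1), (1), (1), (2);
""")

# Portable: every non-aggregate item in the SELECT list also appears in
# the GROUP BY, so strict DBMSs such as PostgreSQL accept it too.
portable = conn.execute("""
    SELECT page_id, page_title, COUNT(*) AS revs
    FROM page JOIN revision ON rev_page = page_id
    GROUP BY page_id, page_title
    ORDER BY page_id
""").fetchall()
print(portable)  # [(1, 'Main_Page', 3), (2, 'Sandbox', 1)]

# Not portable: SELECT * drags rev_id into the result without grouping or
# aggregating it. SQLite (like MySQL without ONLY_FULL_GROUP_BY) accepts
# this and takes rev_id from an arbitrary row of each group; PostgreSQL
# rejects the query outright.
loose = conn.execute(
    "SELECT * FROM revision GROUP BY rev_page ORDER BY rev_page"
).fetchall()
print(len(loose))  # 2 groups, but the rev_id in each row is unspecified
```

The second query "works" only in the sense that it returns something; which rev_id appears for each page is not defined, which is another reason to list grouped columns explicitly.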
Generated: Fri Nov 28 14:03:12 2014 | Cross-referenced by PHPXref 0.7.1 |