PHPXRef 0.7.1 : Phabricator : /src/docs/contributor/database.diviner source

[Summary view] [Print] [Text view]
   1  @title Database Schema
   2  @group developer
   3  
   4  This document describes key components of the database schema and should answer
   5  questions like how to store new types of data.
   6  
   7  Database System
   8  ===============
   9  
  10  Phabricator uses MySQL or another MySQL-compatible database (like MariaDB
  11  or Amazon RDS).
  12  
  13  Phabricator the InnoDB table engine. The only exception is the
  14  `search_documentfield` table which uses MyISAM because MySQL doesn't support
  15  fulltext search in InnoDB (recent versions do, but we haven't added support
  16  yet).
  17  
  18  We are unlikely to ever support other incompatible databases like PostgreSQL or
  19  SQLite.
  20  
  21  PHP Drivers
  22  ===========
  23  
  24  Phabricator supports [[ http://www.php.net/book.mysql | MySQL ]] and
  25  [[ http://www.php.net/book.mysqli | MySQLi ]] PHP extensions.
  26  
  27  Databases
  28  =========
  29  
  30  Each Phabricator application has its own database. The names are prefixed by
  31  `phabricator_` (this is configurable). This design has two advantages:
  32  
  33    - Each database is easier to comprehend and to maintain.
  34    - We don't do cross-database joins so each database can live on its own
  35      machine. This gives us flexibility in sharding data later.
  36  
  37  Connections
  38  ===========
  39  
  40  Phabricator specifies if it will use any opened connection just for reading or
  41  also for writing. This allows opening write connections to a primary and read
  42  connections to a replica in primary/replica setups (which are not actually
  43  supported yet).
  44  
  45  Tables
  46  ======
  47  
  48  Most table names are prefixed by their application names. For example,
  49  Differential revisions are stored in database `phabricator_differential` and
  50  table `differential_revision`. This generally makes queries easier to recognize
  51  and understand.
  52  
  53  The exception is a few tables which share the same schema over different
  54  databases such as `edge`.
  55  
  56  We use lower-case table names with words separated by underscores.
  57  
  58  Column Names
  59  ============
  60  
  61  Phabricator uses `camelCase` names for columns. The main advantage is that they
  62  directly map to properties in PHP classes.
  63  
  64  Don't use MySQL reserved words (such as `order`) for column names.
  65  
  66  Data Types
  67  ==========
  68  
  69  Phabricator defines a set of abstract data types (like `uint32`, `epoch`, and
  70  `phid`) which map to MySQL column types. The mapping depends on the MySQL
  71  version.
  72  
  73  Phabricator uses `utf8mb4` character sets where available (MySQL 5.5 or newer),
  74  and `binary` character sets in most other cases. The primary motivation is to
  75  allow 4-byte unicode characters to be stored (the `utf8` character set, which
  76  is more widely available, does not support them). On newer MySQL, we use
  77  `utf8mb4` to take advantage of improved collation rules.
  78  
  79  Phabricator stores dates with an `epoch` abstract data type, which maps to
  80  `int unsigned`. Although this makes dates less readable when browsing the
  81  database, it makes date and time manipulation more consistent and
  82  straightforward in the application.
  83  
  84  We don't use the `enum` data type because each change to the list of possible
  85  values requires altering the table (which is slow with big tables). We use
  86  numbers (or short strings in some cases) mapped to PHP constants instead.
  87  
  88  JSON and Other Serialized Data
  89  ==============================
  90  
  91  Some data don't require structured access -- we don't need to filter or order by
  92  them. We store these data as text fields in JSON format. This approach has
  93  several advantages:
  94  
  95    - If we decide to add another unstructured field then we don't need to alter
  96      the table (which is slow for big tables in MySQL).
  97    - Table structure is not cluttered by fields which could be unused most of the
  98      time.
  99  
 100  An example of such usage can be found in column
 101  `differential_diffproperty.data`.
 102  
 103  Primary Keys
 104  ============
 105  
 106  Most tables have auto-increment column named `id`. Adding an ID column is
 107  appropriate for most tables (even tables that have another natural unique key),
 108  as it improves consistency and makes it easier to perform generic operations
 109  on objects.
 110  
 111  For example, @{class:LiskMigrationIterator} allows you to very easily apply a
 112  migration to a table using a constant amount of memory provided the table has
 113  an `id` column.
 114  
 115  Indexes
 116  ======
 117  
 118  Create all indexes necessary for fast query execution in most cases. Don't
 119  create indexes which are not used. You can analyze queries @{article:Using
 120  DarkConsole}.
 121  
 122  Older MySQL versions are not able to use indexes for tuple search:
 123  `(a, b) IN ((%s, %d), (%s, %d))`. Use `AND` and `OR` instead:
 124  `((a = %s AND b = %d) OR (a = %s AND b = %d))`.
 125  
 126  Foreign Keys
 127  ============
 128  
 129  We don't use foreign keys because they're complicated and we haven't experienced
 130  significant issues with data inconsistency that foreign keys could help prevent.
 131  Empirically, we have witnessed first hand as `ON DELETE CASCADE` relationships
 132  accidentally destroy huge amounts of data. We may pursue foreign keys
 133  eventually, but there isn't a strong case for them at the present time.
 134  
 135  PHIDs
 136  =====
 137  
 138  Each globally referencable object in Phabricator has its associated PHID
 139  ("Phabricator ID") which serves as a global identifier, similar to a GUID.
 140  We use PHIDs for referencing data in different databases.
 141  
 142  We use both autoincrementing IDs and global PHIDs because each is useful in
 143  different contexts. Autoincrementing IDs are meaningfully ordered and allow
 144  us to construct short, human-readable object names (like `D2258`) and URIs.
 145  Global PHIDs allow us to represent relationships between different types of
 146  objects in a homogeneous way.
 147  
 148  For example, infrastructure like "subscribers" can be implemented easily with
 149  PHID relationships: different types of objects (users, projects, mailing lists)
 150  are permitted to subscribe to different types of objects (revisions, tasks,
 151  etc). Without PHIDs, we would need to add a "type" column to avoid ID collision;
 152  using PHIDs makes implementing features like this simpler.
 153  
 154  Transactions
 155  ============
 156  
 157  Transactional code should be written using transactions. Example of such code is
 158  inserting multiple records where one doesn't make sense without the other or
 159  selecting data later used for update. See chapter in @{class:LiskDAO}.
 160  
 161  Advanced Features
 162  =================
 163  
 164  We don't use MySQL advanced features such as triggers, stored procedures or
 165  events because we like expressing the application logic in PHP more than in SQL.
 166  Some of these features (especially triggers) can also cause a great deal of
 167  confusion, and are generally more difficult to debug, profile, version control,
 168  update, and understand than application code.
 169  
 170  Schema Denormalization
 171  ======================
 172  
 173  Phabricator uses schema denormalization sparingly. Avoid denormalization unless
 174  there is a compelling reason (usually, performance) to denormalize.
 175  
 176  Schema Changes and Migrations
 177  =============================
 178  
 179  To create a new schema change or migration:
 180  
 181  **Create a database patch**. Database patches go in
 182  `resources/sql/autopatches/`. To change a schema, use a `.sql` file and write
 183  in SQL. To perform a migration, use a `.php` file and write in PHP. Name your
 184  file `YYYYMMDD.patchname.ext`. For example, `20141225.christmas.sql`.
 185  
 186  **Keep patches small**. Most schema change statements are not transactional. If
 187  a patch contains several SQL statements and fails partway through, it normally
 188  can not be rolled back. When a user tries to apply the patch again later, the
 189  first statement (which, for example, adds a column) may fail (because the column
 190  already exists). This can be avoided by keeping patches small (generally, one
 191  statement per patch).
 192  
 193  **Use namespace and character set variables**. When defining a `.sql` patch,
 194  you should use these variables instead of hard-coding namespaces or character
 195  set names:
 196  
 197  | Variable | Meaning | Notes |
 198  |---|---|---|
 199  | {$NAMESPACE} | Storage Namespace | Defaults to `phabricator` |
 200  | {$CHARSET} | Default Charset | Mostly used to specify table charset |
 201  | {$COLLATE_TEXT} | Text Collation | For most text (case-sensitive) |
 202  | {$COLLATE_SORT} | Sort Collation | For sortable text (case-insensitive) |
 203  | {$CHARSET_FULLTEXT} | Fulltext Charset | Specify explicitly for fulltext |
 204  | {$COLLATE_FULLTEXT} | Fulltext Collate | Specify explicitly for fulltext |
 205  
 206  
 207  **Test your patch**. Run `bin/storage upgrade` to test your patch.
 208  
 209  See Also
 210  ========
 211  
 212    - @{class:LiskDAO}
PHP Cross Reference of Phabricator

/src/docs/contributor/ -> database.diviner (source)