PHP Cross Reference of MediaWiki-1.24.0 |
/*!
\ingroup FileBackend
\page file_backend_design File backend design

Some notes on the FileBackend architecture.

\section intro Introduction

To abstract away the differences among types of storage media,
MediaWiki provides an interface known as FileBackend. Any MediaWiki
interaction with stored files should therefore go through a FileBackend object.

Different types of backing storage media are supported (ranging from a local
file system to distributed object stores). The types include:

* FSFileBackend (used for mounted file systems)
* SwiftFileBackend (used for Swift or Ceph Rados+RGW object stores)
* FileBackendMultiWrite (useful for transitioning from one backend to another)

Configuration documentation for each type of backend can be found in the
inline documentation of its __construct() method.


\section setup Setup

File backends are registered in LocalSettings.php via the global variable
$wgFileBackends. To access one of the defined backends, use
FileBackendGroup::singleton()->get( <name> ), which returns a FileBackend object
handle. The same handle is reused for any subsequent get() call with that name.
FileBackend objects cache the results of requests such as file stats and
SHA1 lookups, as well as TCP connection handles.

\par Note:
Some backends may require additional PHP extensions to be enabled or may rely on a
MediaWiki extension. This is often the case when a FileBackend subclass uses an
upstream client API to communicate with the backing store.


\section fileoperations File operations

The MediaWiki FileBackend API supports various operations on either files or
directories. See FileBackend.php for the full documentation of each function.
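Every operation in this section is invoked on a handle obtained as described under Setup. The handle reuse and stat caching described there can be roughly illustrated with the minimal Python sketch below. `BackendRegistry`, `CachingBackend`, `stat_fn` and the backend name `local-backend` are hypothetical names chosen for illustration, and the configuration dictionary only stands in for $wgFileBackends entries. This is not MediaWiki's PHP API.

```python
from typing import Any, Callable, Dict

# Hypothetical signature of a call that stats a path in the backing store.
StatFn = Callable[[str], Dict[str, Any]]


class CachingBackend:
    """Hypothetical stand-in for a FileBackend handle (not MediaWiki's API)."""

    def __init__(self, name: str, stat_fn: StatFn) -> None:
        self.name = name
        self._stat_fn = stat_fn  # the (possibly slow) call to the backing store
        self._stat_cache: Dict[str, Dict[str, Any]] = {}
        self.store_calls = 0  # how often the backing store was actually hit

    def stat(self, path: str) -> Dict[str, Any]:
        # Repeated stats of the same path are served from this handle's cache.
        if path not in self._stat_cache:
            self.store_calls += 1
            self._stat_cache[path] = self._stat_fn(path)
        return self._stat_cache[path]


class BackendRegistry:
    """Hypothetical registry: named configs in, one reusable handle per name out."""

    def __init__(self, config: Dict[str, Dict[str, Any]]) -> None:
        self._config = config  # name -> settings, standing in for $wgFileBackends
        self._handles: Dict[str, CachingBackend] = {}

    def get(self, name: str) -> CachingBackend:
        if name not in self._config:
            raise KeyError(f"no backend registered as {name!r}")
        # Build the handle on first use, then return the same object every time.
        if name not in self._handles:
            self._handles[name] = CachingBackend(name, self._config[name]["stat_fn"])
        return self._handles[name]


if __name__ == "__main__":
    registry = BackendRegistry({"local-backend": {"stat_fn": lambda p: {"size": len(p)}}})
    first = registry.get("local-backend")
    second = registry.get("local-backend")
    print(first is second)  # True: the handle is reused
    first.stat("a/b.txt")
    second.stat("a/b.txt")
    print(first.store_calls)  # 1: the second stat came from the cache
```

Keeping one handle per name is what makes the per-handle cache pay off. A second lookup of the same name sees the stats already fetched through the first.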


\subsection reading Reading

The following basic operations are supported for reading from a backend:

On files:
* stat a file for basic information (timestamp, size)
* read a file into a string or several files into a map of path names to strings
* download a file or set of files to temporary files (on a mounted file system)
* get the SHA1 hash of a file
* get various properties of a file (stat information, content time, MIME information, ...)

On directories:
* get a list of files directly under a directory
* get a recursive list of files under a directory
* get a list of directories directly under a directory
* get a recursive list of directories under a directory

\par Note:
Backend handles should return directory listings as iterators, although in some cases
they may just be simple arrays (which can still be iterated over). Iterators allow
callers to traverse a large number of file listings without consuming excessive RAM in
the process. The memory consumed is either bounded by a constant (if the iterator pages
through results) or proportional to the depth of the portion of the directory tree being
traversed (if the iterator works via recursion).
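The two memory profiles described in the note above can be sketched with Python generators. This sketch assumes a hypothetical paged listing call, `fetch_page(marker, limit)`, that returns up to `limit` names after `marker` plus the marker to resume from (or `None` when done). It uses a nested dictionary as a stand-in directory tree. Neither corresponds to an actual FileBackend method, and `make_fake_pager` exists only to drive the demonstration.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical paged listing call: (marker, limit) -> (names, next_marker or None).
PageFn = Callable[[Optional[str], int], Tuple[List[str], Optional[str]]]


def paged_listing(fetch_page: PageFn, page_size: int = 1000) -> Iterator[str]:
    """Yield names page by page; at most one page is held in memory at a time."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    marker: Optional[str] = None
    while True:
        names, marker = fetch_page(marker, page_size)
        yield from names
        if marker is None:
            return


def recursive_listing(tree: Dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Depth-first walk of a nested dict (directory -> subtree, file -> None).

    Only the generators for directories on the current path are live at once,
    so memory tracks the depth of the walk (plus each open directory's own
    entry names) rather than the total number of files.
    """
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):
            yield from recursive_listing(child, f"{prefix}{name}/")
        else:
            yield f"{prefix}{name}"


def make_fake_pager(all_names: List[str]) -> PageFn:
    """Build an in-memory pager over a sorted list, for demonstration only."""
    ordered = sorted(all_names)

    def fetch_page(marker: Optional[str], limit: int) -> Tuple[List[str], Optional[str]]:
        start = 0 if marker is None else ordered.index(marker) + 1
        page = ordered[start:start + limit]
        more = start + limit < len(ordered)
        return page, (page[-1] if more else None)

    return fetch_page


if __name__ == "__main__":
    names = [f"file{i:03d}.txt" for i in range(25)]
    print(len(list(paged_listing(make_fake_pager(names), page_size=10))))  # 25
    tree = {"a": {"b": {"file1": None}, "x.txt": None}, "top.txt": None}
    print(list(recursive_listing(tree)))  # ['a/b/file1', 'a/x.txt', 'top.txt']
```

Callers simply iterate in both cases. Whether the source pages or recurses is hidden behind the iterator, which is why returning iterators rather than arrays costs callers nothing.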


\subsection writing Writing

The following basic operations are supported for writing or changing files in the backend:

On files:
* store (copying a mounted file system file into storage)
* create (creating a file within storage from a string)
* copy (within storage)
* move (within storage)
* delete (within storage)
* lock/unlock (lock or unlock a file in storage)

The following operations are supported for managing directories in the backend:
* prepare (create parent container and directories for a path)
* secure (try to lock down access to a container)
* publish (try to reverse the effects of secure)
* clean (remove empty containers or directories)


\subsection invokingoperation Invoking an operation

Generally, callers should use doOperations() or doQuickOperations() when doing
batches of changes, rather than making a series of single-operation calls. This
lets the system tolerate high latency much better, by pipelining operations
when possible.

doOperations() should be used for working on important original data, i.e. when
consistency is important. It will only pipeline operations that do not depend on
each other, so it is best if operations that do not depend on each other occur
in consecutive groups. This function can also log file changes to a journal
(see FileJournal), which can be used to sync two backend instances. For example,
one might use this function for user file uploads.

doQuickOperations() is geared more toward ephemeral items that can easily be
regenerated from original data. It always pipelines, without checking for
dependencies within the operation batch. For example, one might use this function
for creating and purging generated thumbnails of original files.


\section consistency Consistency

Not all backing stores are sequentially consistent by default.
Various FileBackend functions offer a "latest" option that can be passed in to ensure
(or try to ensure) that the latest version of the file is read. Some backing stores are
consistent by default, but callers should always assume that without this option, stale
data may be read. This is indeed the case for stores that offer only eventual consistency.

Note that file listing functions have no "latest" flag, so some systems may
return stale data. Callers should therefore not assume that listings reflect changes
made by the current client, or by any other client, a very short time ago. For example,
creating a file under a directory and then immediately listing that directory
may return a listing that does not include that file.


\section locking Locking

Locking is effective if and only if a proper lock manager is registered and is
actually used by the backend. Lock managers can be registered in LocalSettings.php
using the $wgLockManagers global configuration variable.

For object stores, locking is not generally useful for avoiding partially
written or read objects, since most stores use Multi-Version Concurrency
Control (MVCC) to avoid this. However, locking can be important when:
* One or more operations must be done without objects changing in the meantime.
* A file read is used to determine a file write or DB change.
  For example, doOperations() first checks that there will be no "file already exists"
  or "file does not exist" type errors before attempting an operation batch. This works
  by stat-ing the files first, and is only safe if the files are locked in the meantime.

When locking, callers should use the latest available file data for reads.
Also, one should always lock the file *before* reading it, not after.
If stale data is used to determine a write, data corruption can result, and it will
persist even after reads of the original file start returning the updated data without
the "latest" option (eventual consistency). The "scoped" lock functions are preferable,
since they release their locks automatically and so avoid the problem of forgetting to
unlock after an early return or an exception.

Since acquiring locks can fail, and lock managers can be non-blocking, callers should:
* Acquire all required locks up front
* Be prepared for the case where locks fail to be acquired
* Possibly retry acquiring certain locks

MVCC is also a useful pattern to apply on top of the backend interface, because operations
are not atomic, even with doOperations(). Complex batch file changes, or changing files
and also updating a database row, can therefore result in partially written "transactions".
Thus one should avoid changing files once they have been stored, except perhaps for
ephemeral data that can tolerate some degree of inconsistency.

Callers can use their own locking (e.g. SELECT FOR UPDATE) if it is more convenient, but
then all callers that change any of the files must go through functions that acquire
these locks. For example, if a caller directly uses the file backend's store() function,
it will bypass any custom "FOR UPDATE" locks, which can cause problems.

\section objectstore Object stores

Support for object stores (like Amazon S3/Swift) drove much of the API and design
decisions of FileBackend, but any POSIX-compliant file system works fine as well.
The system essentially stores "files" in "containers". For a mounted file system
as a backing store, "files" are just files under directories. For an object store
as a backing store, the "files" are objects stored in actual containers.
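The following minimal Python sketch makes the "files in containers" model more concrete. It models a hypothetical object-store container; `FlatContainer` and its methods are illustrative names, not FileBackend's API. The container keeps a flat map from object names to contents. "Directories" exist only as "/"-separated prefixes of those names, loosely in the spirit of the prefix/delimiter listings offered by stores such as Swift.

```python
from typing import Dict, List, Set


class FlatContainer:
    """Hypothetical object-store container: a flat map of object names to bytes.

    There are no real directories; they are only implied by "/" in object names.
    """

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        # No parent "directories" have to be created before writing.
        self.objects[name] = data

    @staticmethod
    def _prefix(directory: str) -> str:
        # "" means the container root; otherwise normalize to "dir/".
        return directory.strip("/") + "/" if directory.strip("/") else ""

    def list_files(self, directory: str) -> List[str]:
        """Objects directly under `directory` (no further "/" after the prefix)."""
        prefix = self._prefix(directory)
        return sorted(
            name[len(prefix):]
            for name in self.objects
            if name.startswith(prefix) and "/" not in name[len(prefix):]
        )

    def list_dirs(self, directory: str) -> List[str]:
        """Implied subdirectories directly under `directory`."""
        prefix = self._prefix(directory)
        dirs: Set[str] = set()
        for name in self.objects:
            rest = name[len(prefix):]
            if name.startswith(prefix) and "/" in rest:
                dirs.add(rest.split("/", 1)[0])
        return sorted(dirs)


if __name__ == "__main__":
    c = FlatContainer()
    c.put("a/b/file1", b"data")
    print(c.list_dirs("a"))  # ['b']
    print(c.list_files("a/b"))  # ['file1']
```

In this model nothing prevents an object named "a/b" from coexisting with "a/b/file1". That makes "a/b" both a file and a directory, a layout that a mounted file system cannot represent.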


\section file_obj_diffs File system and Object store differences

One advantage of object stores is fewer round trips: there is no need to create
each parent directory before placing a file somewhere, a cost that grows with the
depth of the directory hierarchy. Another advantage of object stores is that object
listings tend to be backed by databases, which scale better than the linked-list
directories that file systems sometimes use. (File systems like btrfs and XFS use
tree structures, which scale better.)
For both object stores and file systems, using "/" in file names allows for the
intuitive use of directory functions. For example, creating a file in Swift
called "container/a/b/file1" means that:
- a "directory listing" of "container/a" will contain "b",
- and a "file listing" of "container/a/b" will contain "file1".

This means that switching from an object store to a file system and vice versa
using the FileBackend interface will generally be harmless. However, one must be
aware of some important differences:

* In a file system, you cannot have a file and a directory at the same path,
  whereas this is possible in an object store. Calling code should avoid any layouts
  that allow files and directories at the same path.
* Some file systems have file name length or overall path length restrictions
  that others do not. The same goes for object stores, which might have a maximum
  object name length or a limit on the number of files in a container or volume.
* Latency varies among systems; certain access patterns may not be tolerable for
  some backends but may hold up for others. Some backend subclasses use
  MediaWiki's object caching to serve stat requests, which can greatly
  reduce latency.
  Making sure that the backend has pipelining enabled (see the
  "parallelize" and "concurrency" settings) can also mask latency in
  batch operation scenarios.
* File systems may implement directories as linked lists or other structures
  with poor scalability, so calling code should use layouts that shard the data.
  Instead of storing files like "container/file.txt", one can store files like
  "container/<x>/<y>/file.txt". It is best if sharding is optional or configurable.

*/
Generated: Fri Nov 28 14:03:12 2014 | Cross-referenced by PHPXref 0.7.1 |