@title Things You Should Do Now
@group sundry

Describes things you should do now when building software, because the cost to
do them increases over time and eventually becomes prohibitive or impossible.


= Overview =

If you're building a hot new web startup, there are a lot of decisions to make
about what to focus on. Most things will take about the same amount of time to
build regardless of the order you build them in, but a few technical things
become vastly more expensive to fix later.

If you don't do these things early in development, they'll become very hard or
impossible to do later. This is basically a list of things that would have saved
Facebook huge amounts of time and effort down the road if someone had spent
a tiny amount of time on them earlier in the development process.

See also @{article:Things You Should Do Soon} for things whose cost grows less
drastically over time.


= Start IDs At a Gigantic Number =

If you're using integer IDs to identify data or objects, **don't** start your
IDs at 1. Start them at a huge number (e.g., 2^33) so that no object ID will
ever appear in any other role in your application (like a count, a natural
index, a byte size, a timestamp, etc.). This takes about 5 seconds if you do it
before you launch and rules out a huge class of nasty bugs for all time. It
becomes incredibly difficult as soon as you have production data.

The kind of bug that small IDs cause is accidental use of some other value as
an ID:

  COUNTEREXAMPLE
  // Load the user's friends, returns a map of friend_id => true
  $friend_ids = user_get_friends($user_id);

  // Get the first 8 friends.
  $first_few_friends = array_slice($friend_ids, 0, 8);

  // Render those friends.
  render_user_friends($user_id, array_keys($first_few_friends));

Because array_slice() in PHP discards integer array keys and renumbers them from
0 by default, this doesn't render the user's first 8 friends but the users with
IDs 0 through 7, e.g. Mark Zuckerberg (ID 4) and Dustin Moskovitz (ID 6). If you
have IDs in this range, sooner or later something that isn't an ID will get
treated like an ID, and the operation will succeed and cause unexpected
behavior. This is completely avoidable if you start your IDs at a gigantic
number.

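The following is a minimal sketch of the bug above and one way to avoid it,
assuming 64-bit PHP. FIRST_ID, example_friend_map(), and looks_like_id() are
hypothetical names used only for illustration; example_friend_map() stands in
for user_get_friends(). Passing true as the fourth argument to array_slice()
preserves keys, and the large ID floor makes any remaining misuse detectable:

```php
<?php
// Hypothetical ID floor: 2^33, the example value from the text.
// Assumes 64-bit PHP integers.
const FIRST_ID = 1 << 33;

// Hypothetical stand-in for user_get_friends(): a map of friend_id => true.
function example_friend_map(int $count): array {
  $map = array();
  for ($i = 0; $i < $count; $i++) {
    $map[FIRST_ID + $i] = true;
  }
  return $map;
}

// Hypothetical sanity check: with a large floor, small integers such as
// counts and list indexes can never be mistaken for IDs.
function looks_like_id($value): bool {
  return is_int($value) && $value >= FIRST_ID;
}

$friend_ids = example_friend_map(10);

// Buggy: integer keys are discarded and renumbered from 0.
$buggy = array_keys(array_slice($friend_ids, 0, 8));

// Fixed: the fourth argument ($preserve_keys) keeps the original keys.
$fixed = array_keys(array_slice($friend_ids, 0, 8, true));
```

With IDs starting at 1, the values in $buggy would silently name real users.
With a large floor, a check like looks_like_id() rejects them immediately.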

= Only Store Valid UTF-8 =

For the most part, you can ignore UTF-8 and Unicode until later. However, there
is one aspect of Unicode you should address now: store only valid UTF-8 strings.

Assuming you're storing data internally as UTF-8 (this is almost certainly the
right choice and definitely the right choice if you have no idea how Unicode
works), you just need to sanitize all the data coming into your application and
make sure it's valid UTF-8.

If your application emits invalid UTF-8, other systems (like browsers) will
break in unexpected and interesting ways. You will eventually be forced to
ensure you emit only valid UTF-8 to avoid these problems. If you haven't
sanitized your data, you'll basically have two options:

  - do a huge migration on literally all of your data to sanitize it; or
  - forever sanitize all data on its way out on the read pathways.

As of 2011, Facebook is in the second group, and spends several milliseconds of
CPU time sanitizing every display string on its way to the browser, which
multiplies out to hundreds of servers' worth of CPUs sitting in a datacenter
paying the price for the invalid UTF-8 in the databases.

You can likely learn enough about Unicode to be confident in an implementation
which addresses this problem within a few hours. You don't need to learn
everything, just the basics. Your language probably already has a function which
does the sanitizing for you.

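As a rough sketch of sanitizing on the way in, assuming PHP with the mbstring
extension available: example_is_valid_utf8() and example_sanitize_utf8() are
hypothetical helper names, not part of any library. The check relies on
preg_match() with the /u modifier, which returns false when the subject is not
valid UTF-8. The repair step uses mb_convert_encoding(), which substitutes a
replacement character for invalid byte sequences:

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8. With the /u
// modifier, preg_match() returns false (not 0) for an invalid subject.
function example_is_valid_utf8(string $s): bool {
  return preg_match('//u', $s) === 1;
}

// Hypothetical helper: return $s unchanged when it is already valid,
// otherwise return a copy with invalid byte sequences substituted.
// Assumes the mbstring extension is loaded.
function example_sanitize_utf8(string $s): string {
  if (example_is_valid_utf8($s)) {
    return $s;
  }
  return mb_convert_encoding($s, 'UTF-8', 'UTF-8');
}

// Valid UTF-8 ("e" with an acute accent encoded as two bytes).
$clean = example_sanitize_utf8("caf\xC3\xA9");

// Invalid UTF-8: a lone Latin-1 byte in the middle of the string.
$repaired = example_sanitize_utf8("caf\xE9 au lait");
```

Calling a function like this once, at the point where data enters the
application, is what avoids both the migration and the per-read cost described
above.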

= Never Design a Blacklist-Based Security System =

When you have an alternative, don't design security systems which are
default-permit, blacklist-based, or otherwise attempt to enumerate badness. When
Facebook launched Platform, it launched with a blacklist-based CSS filter, which
basically tried to enumerate all the "bad" parts of CSS and filter them out.
This was a poor design choice and led to basically infinite security holes for
all time.

It is very difficult to enumerate badness in a complex system and badness is
often a moving target. Instead of trying to do this, design whitelist-based
security systems where you list allowed things and reject anything you don't
understand. Assume things are bad until you verify that they're OK.

It's tempting to design blacklist-based systems because they're easier to write
and accept more inputs. In the case of the CSS filter, the product goal was for
users to just be able to use CSS normally and feel like this system was no
different from systems they were familiar with. A whitelist-based system would
reject some valid, safe inputs and create product friction.

But this is a much better world than the alternative, where the blacklist-based
system fails to reject some dangerous inputs and creates //security holes//. It
//also// creates product friction because when you fix those holes you break
existing uses, and that backward-compatibility friction makes it very difficult
to move the system from a blacklist to a whitelist. So you're basically in
trouble no matter what you do, and have a bunch of security holes you need to
unbreak immediately, so you won't even have time to feel sorry for yourself.

Designing blacklist-based security is one of the worst now-vs-future tradeoffs
you can make. See also "The Six Dumbest Ideas in Computer Security":

http://www.ranum.com/security/computer_security/

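The whitelist approach described in this section can be sketched roughly as
follows. This is an illustration only, not Facebook's filter: the function
example_filter_css() and its short list of allowed properties and value
patterns are hypothetical. The point is the shape of the logic, where anything
not explicitly recognized is dropped:

```php
<?php
// Hypothetical whitelist filter: keep a CSS declaration only if both the
// property and the value match an explicitly allowed pattern.
function example_filter_css(array $declarations): array {
  // Hypothetical allowed properties, each with a strict value pattern.
  $allowed = array(
    'color'       => '/\A#(?:[0-9a-f]{3}|[0-9a-f]{6})\z/i',
    'font-weight' => '/\A(?:normal|bold)\z/',
    'text-align'  => '/\A(?:left|right|center)\z/',
  );

  $safe = array();
  foreach ($declarations as $property => $value) {
    $property = strtolower(trim((string)$property));
    $value = trim((string)$value);

    // Unknown property: reject by default.
    if (!isset($allowed[$property])) {
      continue;
    }

    // Known property, unrecognized value: also reject.
    if (preg_match($allowed[$property], $value) !== 1) {
      continue;
    }

    $safe[$property] = $value;
  }

  return $safe;
}

$input = array(
  'color'       => '#ff0000',
  'text-align'  => 'center',
  'background'  => 'url(javascript:alert(1))',
  'font-weight' => 'expression(alert(1))',
  'behavior'    => 'url(evil.htc)',
);

$output = example_filter_css($input);
```

Here only the color and text-align declarations survive. The filter never
needs to know why the other three are dangerous. It rejects them because it
does not recognize them.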

= Fail Very Loudly when SQL Syntax Errors Occur in Production =

This doesn't apply if you aren't using SQL, but if you are: detect when a query
fails because of a syntax error (in MySQL, it is error 1064). If the failure
happened in production, fail in the loudest way possible. (I implemented this in
2008 at Facebook and had it just email me and a few other people directly. The
system was eventually refined.)

This basically creates a high-signal stream that tells you where you have SQL
injection holes in your application. It will have some false positives and could
theoretically have false negatives, but at Facebook it was pretty high signal
considering how important the signal is.

Of course, the real solution here is to not have SQL injection holes in your
application, ever. As far as I'm aware, this system correctly detected the one
SQL injection hole we had from mid-2008 until I left in 2011, which was in a
hackathon project on an under-isolated semi-production tier and didn't use the
query escaping system that the rest of the application uses.

Hopefully, whatever language you're writing in has good query libraries that
can handle escaping for you. If so, use them. If you're using PHP and don't have
a solution in place yet, the Phabricator implementation of qsprintf() is similar
to Facebook's system and was successful there.
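
As a rough sketch of the loud-failure check described at the start of this
section: the function example_report_query_failure(), the constant
EXAMPLE_MYSQL_SYNTAX_ERROR, and the alert callback are all hypothetical. How
the error number reaches this function depends on your database layer, and the
alert here just collects messages so the example is self-contained:

```php
<?php
// MySQL's error number for a syntax error, as given in the text.
const EXAMPLE_MYSQL_SYNTAX_ERROR = 1064;

// Hypothetical hook, called whenever a query fails. Returns true if it
// raised an alert. Only syntax errors in production are treated as loud.
function example_report_query_failure(
  int $errno,
  string $message,
  string $query,
  bool $is_production,
  callable $alert
): bool {
  if ($errno !== EXAMPLE_MYSQL_SYNTAX_ERROR || !$is_production) {
    return false;
  }

  $alert(sprintf(
    "SQL syntax error in production!\nError: %s\nQuery: %s",
    $message,
    $query));

  return true;
}

// Stand-in alert channel for the example. A real one might send email.
$alerts = array();
$collect = function (string $text) use (&$alerts) {
  $alerts[] = $text;
};

// An unescaped quote breaks the query: the kind of failure to shout about.
$loud = example_report_query_failure(
  1064,
  'You have an error in your SQL syntax',
  "SELECT * FROM user WHERE name = 'O'Brien'",
  true,
  $collect);

// Other failures (here, a duplicate-key error) do not trigger the alert.
$quiet = example_report_query_failure(
  1062,
  'Duplicate entry',
  'INSERT INTO user (id) VALUES (1)',
  true,
  $collect);
```

If you use mysqli, the error number is available from the connection's errno
property after a failed query. Including the full query text in the alert is a
design choice that makes the injection point easy to find, at the cost of
putting query contents into whatever channel carries the alert.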

