Suppose you are sitting in a coffee shop working on your book. Chris comes over and tells you about his new phone. The new phone came with a new number, so you have Chris dictate it while you update his entry in your laptop’s address book application.
Luckily, your address book is built on CouchDB, so when you get home, all you need to do to bring your home computer up to date with Chris’s number is replicate your address book from your laptop. Neat, eh? What’s more, CouchDB has a mechanism to maintain continuous replication, so you can keep a whole set of computers in sync with the same data whenever a network connection is available.
Let’s change the scenario a little bit. Chris didn’t anticipate meeting you at the coffee shop and had already sent you an email with the new number. You weren’t using the WiFi, so that you could concentrate on your work, and you didn’t read his email until you got back home. It was a long day, and by then you had long forgotten that you changed the number in the address book on your laptop. So you simply copy and paste the number from the email into the address book on your home computer. Now, and here is the twist, it turns out that you typed the number incorrectly into your laptop’s address book.
You now have the same document in both databases, but each copy holds different information. This situation is called a conflict. Conflicts occur in distributed systems. They are a natural state of your data. How does CouchDB’s replication system deal with conflicts?
When you replicate two databases in CouchDB and you have conflicting changes, CouchDB detects this and flags the affected document with the special attribute "_conflicts". When you request the document with the query parameter conflicts=true, this attribute lists the IDs of the losing revisions. Next, CouchDB determines which of the changes will be stored as the latest revision (remember, documents in CouchDB are versioned). The version that gets picked to be the latest revision is the winning revision. The losing revision is not thrown away; it is stored as well, much like a previous revision, so your application can still read it.
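To make the flag concrete, here is a minimal Python sketch of how an application might inspect a document it fetched with conflicts=true. The database name, document ID, revision IDs, and phone numbers are all made up for illustration, and the response body is a hand-written sample rather than real server output. The only CouchDB behavior it relies on is that "_conflicts", when present, is a list of losing revision IDs.

```python
import json

# Hypothetical response body for: GET /addressbook/chris?conflicts=true
# (database name, document ID, revisions, and numbers are invented for this sketch)
body = """
{
  "_id": "chris",
  "_rev": "2-de0ea16f",
  "phone": "555-0199",
  "_conflicts": ["2-7c971bb9"]
}
"""


def losing_revisions(doc):
    """Return the list of losing revision IDs, or an empty list if there is no conflict."""
    return doc.get("_conflicts", [])


doc = json.loads(body)
if losing_revisions(doc):
    # The winner is what a plain GET returns; the losers can be fetched with ?rev=<id>.
    print("winner:", doc["_rev"], "losers:", losing_revisions(doc))
```

A document without the "_conflicts" field is simply not in conflict, which is why the helper falls back to an empty list.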
CouchDB does not attempt to merge the conflicting revisions. Your application dictates how the merging should be done. From the application’s point of view, the choice of the winning revision is arbitrary: in the case of the phone number, there is no way for a computer to decide which revision is right. This is not specific to CouchDB; no other software can do this either.
Replication guarantees that conflicts are detected and that each instance of CouchDB makes the same choice regarding winners and losers, independent of all the other instances. No group decision is made; instead, a deterministic algorithm determines the order of the conflicting revisions. After replication, all instances taking part have the same data. The data set is said to be in a consistent state. If you ask any instance for a document, you will get the same answer regardless of which one you ask.
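The idea of a deterministic choice without coordination can be sketched in a few lines of Python. This is a simplified illustration, not CouchDB’s actual implementation: it assumes revision IDs of the form "N-hash", prefers the higher N, and breaks ties by comparing the hash strings. The function names and revision IDs are hypothetical. What matters is that the result depends only on the set of revisions, never on the order in which a node happened to receive them.

```python
def rev_key(rev):
    """Split a revision ID such as '2-de0ea16f' into (2, 'de0ea16f') for comparison."""
    generation, _, digest = rev.partition("-")
    return int(generation), digest


def pick_winner(revs):
    """Pick a winner using only the revision IDs themselves (simplified rule)."""
    return max(revs, key=rev_key)


# Two nodes see the same conflicting revisions, but in a different order.
laptop_view = ["2-7c971bb9", "2-de0ea16f"]
home_view = ["2-de0ea16f", "2-7c971bb9"]

# Each node decides on its own, yet both arrive at the same winner.
assert pick_winner(laptop_view) == pick_winner(home_view)
```

Because every node applies the same pure function to the same inputs, no messages need to be exchanged to agree on the winner.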
Whether or not CouchDB picked the version that your application needs, you need to go and resolve the conflict, just as you need to resolve a conflict in a version control system like Subversion. Simply create the version that you want to be the latest, by picking the winning revision, the losing revision, or a merge of both, and save it as the new latest revision. Done. Replicate again and your resolution will propagate to all other instances of CouchDB. Resolving a conflict on one node could lead to further conflicts, all of which will need to be addressed, but eventually you will end up with a conflict-free database on all nodes.
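As a rough sketch of what such a resolution could look like in application code, the Python function below builds a request body for CouchDB’s _bulk_docs endpoint: one updated document carrying the values the application decided on, plus a deletion entry for each losing revision so that the conflict is cleared. The function name, the "phone" field, and all IDs are assumptions made for this example, and actually sending the payload over HTTP is left out.

```python
def build_resolution(winner, chosen_fields):
    """Build a _bulk_docs payload that resolves a conflict.

    winner: the document as fetched with ?conflicts=true
    chosen_fields: the field values the application decided are correct
    """
    # Start from the winning revision, without the read-only _conflicts field.
    resolved = {k: v for k, v in winner.items() if k != "_conflicts"}
    resolved.update(chosen_fields)

    # Mark every losing revision as deleted so it no longer counts as a conflict.
    deletions = [
        {"_id": winner["_id"], "_rev": rev, "_deleted": True}
        for rev in winner.get("_conflicts", [])
    ]
    return {"docs": [resolved] + deletions}


# Hypothetical conflicted document: the winner holds the mistyped number.
conflicted = {
    "_id": "chris",
    "_rev": "2-de0ea16f",
    "phone": "555-0911",
    "_conflicts": ["2-7c971bb9"],
}

# The application (or the user) decides that the emailed number is correct.
payload = build_resolution(conflicted, {"phone": "555-0199"})
```

The payload would then be sent with POST to the database’s _bulk_docs URL; after the next replication, the other nodes receive both the new revision and the deletions.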