Performing a bulk set

During development or production you will probably want to add application-specific seed data into Couchbase. This may be data you use to test your application during development, or it may be application-specific content that is pre-populated, such as catalog data.

In general, you need three elements in place to do a bulk upload:

  • Set of data you want to upload. This can be cleanly structured information in a file, a JSON document, or information in a database.

  • Program in the SDK language of your choice. The program you write connects to Couchbase Server, reads the file or data into memory, and then stores it in Couchbase Server. Your program will typically loop through all the elements you want to store and store each one.

  • Any supporting classes used to represent the data you want to store. If you are storing simple data that your loader program can handle as primitive types, you do not need to create a class.
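The three elements above can be sketched in a few lines of PHP. This is a minimal illustration, assuming PHP 7.4 or later, and not the sample loader itself: the Beer class, its key() and toJson() methods, and the bulk_set() function are hypothetical names introduced here, and an in-memory array stands in for Couchbase Server so the sketch runs without a cluster.

```php
<?php

// Supporting class (hypothetical) that represents one record to store.
final class Beer {
    public string $name;
    public string $brewery;

    public function __construct(string $name, string $brewery) {
        $this->name = $name;
        $this->brewery = $brewery;
    }

    // Build a key from the beer name, replacing spaces with underscores.
    public function key(): string {
        return 'beer_' . str_replace(' ', '_', $this->name);
    }

    // Serialize the record as a JSON document.
    public function toJson(): string {
        return json_encode(['name' => $this->name, 'brewery' => $this->brewery]);
    }
}

// Loader loop (hypothetical): hand every record to $store and count them.
// $store is any callable that accepts a key and a value.
function bulk_set(iterable $records, callable $store): int {
    $count = 0;
    foreach ($records as $record) {
        $store($record->key(), $record->toJson());
        $count++;
    }
    return $count;
}

// The data set: here a single record held in memory.
$records = [new Beer('#17 Cream Ale', 'Big Ridge Brewing')];

// Assumption: an array stands in for the server in this sketch.
$memory = [];
$stored = bulk_set($records, function (string $key, string $value) use (&$memory) {
    $memory[$key] = $value;
});
```

In a real loader the callable would wrap a client call that stores the key and value in Couchbase Server instead of writing to an array.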

The following PHP example demonstrates a bulk set of sample data on beers and breweries. Sample code and data for this example are on GitHub: import.php and the beer and brewery sample data.

First, here is an example of a JSON record for a beer. This particular beer is in the beer_#17_Cream_Ale.json file in the beer-sample/beer directory.

{
    "_id":"beer_#17_Cream_Ale",
    "brewery":"Big Ridge Brewing",
    "name":"#17 Cream Ale",
    "category":"North American Lager",
    "style":"American-Style Lager",
    "updated":"2010-07-22 20:00:20"
}

We also have brewery data, with each brewery in its own JSON file located in beer-sample/breweries. Finally, we create the script that reads the directories and stores each file as a record in Couchbase Server:

<?php

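// Set up Couchbase client object.
// The COUCHBASE_* constants are assumed to be defined elsewhere, for example in a configuration file.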
try {
  $cb = new Couchbase(COUCHBASE_HOST.':'.COUCHBASE_PORT, COUCHBASE_USER, COUCHBASE_PASSWORD, COUCHBASE_BUCKET);
} catch (ErrorException $e) {
  die($e->getMessage());
}

// import a directory
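function import($cb, $dir) {
  $d = dir($dir);
  if ($d === false) die("cannot open directory $dir\n");
  while (false !== ($file = $d->read())) {
    // skip anything that is not a .json file, including . and ..
    if (substr($file, -5) != '.json') continue;
    echo "adding $file\n";
    $json = json_decode(file_get_contents($dir . $file), true);
    // skip files that do not contain a valid JSON object
    if (!is_array($json)) continue;
    // the file name serves as the key, so the _id attribute is redundant
    unset($json["_id"]);
    // store the document under the file name minus the .json extension
    echo $cb->set(substr($file, 0, -5), json_encode($json));
    echo "\n";
  }
  $d->close();
}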

// import beers and breweries
import($cb, 'beer-sample/beer/');
import($cb, 'beer-sample/breweries/');

?>

We first create a Couchbase client, then we declare an import function that reads our files and writes them to Couchbase Server. The function repeats the same set of operations for each file in the directory. If the file name does not end in .json, we skip it. Otherwise we read the file into memory and decode the JSON. We remove the ‘_id’ attribute because the file names are already unique and we use the file name itself as the key, so we do not need a second unique identifier. Then we encode the value as JSON again and store it to Couchbase Server, using the file name minus the .json extension as the key for each record.
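The per-file steps described above can also be isolated in a small function that does not touch the server, which makes them easy to check on their own. This is a sketch under stated assumptions: prepare_record() is a hypothetical helper that is not part of import.php, and it returns null for files the loader would skip.

```php
<?php

// Hypothetical helper: turn a file name and its JSON text into a
// [key, value] pair, or return null if the file should be skipped.
function prepare_record(string $file, string $contents): ?array {
    // Only files ending in .json are imported.
    if (substr($file, -5) !== '.json') {
        return null;
    }

    // Decode into an associative array; reject invalid or non-object JSON.
    $doc = json_decode($contents, true);
    if (!is_array($doc)) {
        return null;
    }

    // The file name is already unique, so drop the redundant _id attribute.
    unset($doc['_id']);

    // Key: file name without the .json extension. Value: re-encoded JSON.
    return [substr($file, 0, -5), json_encode($doc)];
}

$record = prepare_record(
    'beer_#17_Cream_Ale.json',
    '{"_id":"beer_#17_Cream_Ale","name":"#17 Cream Ale"}'
);
// $record[0] is 'beer_#17_Cream_Ale'
// $record[1] is '{"name":"#17 Cream Ale"}'

$skipped = prepare_record('README.txt', 'not json'); // null
```

Keeping the key and value preparation separate from the store call is one possible design; the example script above does both inside the same loop.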