C++ Boost

Introduction

The Boost Tokenizer package provides a flexible and easy-to-use way to break a string or other character sequence into a series of tokens. Below is a simple example that breaks a phrase into words.

// simple_example_1.cpp
#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>

int main(){
   using namespace std;
   using namespace boost;
   string s = "This is,  a test";
   tokenizer<> tok(s);
   for(tokenizer<>::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";
   }
}

You can choose how the string gets broken up by specifying the TokenizerFunction. If you do not specify one, the default TokenizerFunction is char_delimiters_separator<char>, which breaks up a string at whitespace and punctuation. Here is an example using another TokenizerFunction, escaped_list_separator, which parses a superset of comma-separated value (CSV) lines. The format looks like this:

Field 1,"putting quotes around fields, allows commas",Field 3

Below is an example that breaks the previous line into its three fields:

// simple_example_2.cpp
#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>

int main(){
   using namespace std;
   using namespace boost;
   string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3";
   tokenizer<escaped_list_separator<char> > tok(s);
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";
   }
}

Finally, some TokenizerFunctions need arguments passed to their constructor in order to do anything interesting. An example is offset_separator, which breaks a string into tokens based on offsets. For example, 12252001 parsed using offsets of 2, 2, 4 becomes 12 25 2001. Below is an example that parses this.

// simple_example_3.cpp
#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>

int main(){
   using namespace std;
   using namespace boost;
   string s = "12252001";
   int offsets[] = {2,2,4};
   offset_separator f(offsets, offsets+3);
   tokenizer<offset_separator> tok(s,f);
   for(tokenizer<offset_separator>::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";
   }
}

© Copyright John R. Bandela 2001. Permission to copy, use, modify, sell and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.