Brewing Models
brew is Caffe2's new API for building models. The CNNModelHelper filled this role in the past, but since Caffe2 has expanded well beyond excelling at CNNs, it made sense to provide a ModelHelper object that is more generic. You may notice that the new ModelHelper has much the same functionality as CNNModelHelper. brew wraps the new ModelHelper, making building models even easier than before.
Model Building and Brew’s Helper Functions
In this overview we introduce brew, a lightweight collection of helper functions for building your model. We start by explaining the key concepts of Ops versus Helper Functions. Then we show brew usage, how it acts as an interface to the ModelHelper object, and the arg_scope syntax sugar. Finally, we discuss the motivation for introducing brew.
Concepts: Ops vs Helper Functions
Before we dig into brew, we should review some conventions in Caffe2 and how layers of a neural network are represented. Deep learning networks in Caffe2 are built up with operators. Typically these operators are written in C++ for maximum performance. Caffe2 also provides a Python API that wraps these C++ operators, so you can more flexibly experiment and prototype. In Caffe2, operators are always presented in a CamelCase fashion, whereas Python helper functions with a similar name are in lowercase. Examples of this are to follow.
Ops
We often refer to operators as an "Op" or a collection of operators as "Ops". For example, the FC Op represents a Fully-Connected operator that has weighted connections to every neuron in the previous layer and to every neuron on the next layer. You can create an FC Op with:

```python
model.net.FC([blob_in, weights, bias], blob_out)
```

Or you can create a Copy Op with:

```python
model.net.Copy(blob_in, blob_out)
```
A list of Operators handled by ModelHelper is at the bottom of this document. The 29 Ops that are most commonly used are currently included. This is a subset of the 400+ Ops Caffe2 has at the time of this writing.
Note that you can also create an operator without spelling out net. For example, just as we created a Copy Op above, the following code creates a Copy operator on model.net:

```python
model.Copy(blob_in, blob_out)
```
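One way such a shorthand can work is attribute forwarding. The sketch below is a toy illustration in plain Python, not Caffe2's actual implementation: TinyNet and TinyModel are hypothetical stand-ins that only record operator calls, and the forwarding rule is an assumption made for the example.

```python
class TinyNet:
    """Hypothetical stand-in for a Caffe2 net: it only records the ops added to it."""

    def __init__(self):
        self.ops = []

    def __getattr__(self, op_type):
        # Called only when normal lookup fails, so any other name
        # is treated as an operator type in this toy.
        if op_type.startswith("_"):
            raise AttributeError(op_type)

        def add_op(inputs, outputs, **kwargs):
            self.ops.append((op_type, inputs, outputs, kwargs))
            return outputs

        return add_op


class TinyModel:
    """Hypothetical stand-in for ModelHelper: unknown attributes go to self.net."""

    def __init__(self):
        self.net = TinyNet()

    def __getattr__(self, op_type):
        if op_type.startswith("_") or op_type == "net":
            raise AttributeError(op_type)
        return getattr(self.net, op_type)


model = TinyModel()
model.net.Copy("blob_in", "blob_out")  # explicit form
model.Copy("blob_in", "blob_out")      # shorthand, forwarded to model.net
```

Both calls end up as identical entries in model.net.ops, which is the behavior the shorthand is meant to provide.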
Helper Functions
Building your model/network from single operators alone can be painstaking, since you have to handle parameter initialization and device/engine choice all by yourself (but this is also why Caffe2 is so fast!). For example, to build an FC layer you need several lines of code to prepare weight and bias, which are then fed to the Op.
This is the longer, manual way:
```python
from caffe2.python import model_helper

# blob_in, blob_out, dim_in, dim_out and kwargs are assumed to be defined by the caller.
model = model_helper.ModelHelper(name="train")
# initialize your weight
weight = model.param_init_net.XavierFill(
    [],
    blob_out + '_w',
    shape=[dim_out, dim_in],
    **kwargs  # maybe indicating weight should be on GPU here
)
# initialize your bias
bias = model.param_init_net.ConstantFill(
    [],
    blob_out + '_b',
    shape=[dim_out, ],
    **kwargs
)
# finally building FC
model.net.FC([blob_in, weight, bias], blob_out, **kwargs)
```
Luckily, Caffe2 helper functions are here to help. Helper functions are wrapper functions that create a complete layer for a model. A helper function will typically handle parameter initialization, operator definition, and engine selection. Caffe2's default helper functions follow the Python PEP8 function naming convention. For example, using python/helpers/fc.py, implementing an FC Op via the helper function fc is much simpler.
An easier way using a helper function:
```python
fcLayer = fc(model, blob_in, blob_out, **kwargs)  # returns a blob reference
```
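To make the division of labor concrete, here is a minimal plain-Python sketch of what a helper like fc bundles together. ToyModel and toy_fc are hypothetical names invented for this illustration, and they only record which operators would be created rather than calling Caffe2.

```python
class ToyModel:
    """Hypothetical stand-in for ModelHelper; it just records operator calls."""

    def __init__(self):
        self.param_init_ops = []  # plays the role of model.param_init_net
        self.net_ops = []         # plays the role of model.net


def toy_fc(model, blob_in, blob_out, dim_in, dim_out, **kwargs):
    """Lowercase helper: sets up the weight, the bias, and the FC op in one call."""
    weight = blob_out + "_w"
    bias = blob_out + "_b"
    model.param_init_ops.append(("XavierFill", weight, {"shape": [dim_out, dim_in]}))
    model.param_init_ops.append(("ConstantFill", bias, {"shape": [dim_out]}))
    model.net_ops.append(("FC", [blob_in, weight, bias], blob_out, kwargs))
    return blob_out  # the caller gets back a reference to the output blob


model = ToyModel()
out = toy_fc(model, "data", "fc1", dim_in=784, dim_out=10)
```

One call produces two initialization entries and one FC entry, mirroring the three steps of the manual version.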
Some helper functions build much more than one operator. For example, the LSTM function in python/rnn_cell.py helps you build a whole LSTM unit in your network.
Check out the repo for more cool helper functions!
brew
Now that you've been introduced to Ops and Helper Functions, let's cover how brew can make model building even easier. brew is a smart collection of helper functions. You can use all of Caffe2's helper functions with a single import of the brew module. You can now add an FC layer using:
```python
from caffe2.python import brew

brew.fc(model, blob_in, blob_out, ...)
```
That's pretty much the same as using the helper function directly; however, brew really starts to shine once your models get more complicated. The following is a LeNet model building example, extracted from the MNIST tutorial.
```python
from caffe2.python import brew

def AddLeNetModel(model, data):
    # positional arguments after the blob names: dim_in, dim_out, kernel
    conv1 = brew.conv(model, data, 'conv1', 1, 20, 5)
    pool1 = brew.max_pool(model, conv1, 'pool1', kernel=2, stride=2)
    conv2 = brew.conv(model, pool1, 'conv2', 20, 50, 5)
    pool2 = brew.max_pool(model, conv2, 'pool2', kernel=2, stride=2)
    fc3 = brew.fc(model, pool2, 'fc3', 50 * 4 * 4, 500)
    fc3 = brew.relu(model, fc3, fc3)
    pred = brew.fc(model, fc3, 'pred', 500, 10)
    softmax = brew.softmax(model, pred, 'softmax')
    return softmax
```
Each layer is created using brew, which in turn uses its operator hooks to instantiate each Op.
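The 50 * 4 * 4 input size of fc3 follows from the layer shapes. The short calculation below traces it, assuming 28x28 MNIST inputs and convolutions with stride 1 and no padding (these are assumptions of the sketch, not stated in the example). conv_out is a hypothetical helper written for this illustration.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output width of a convolution or pooling window along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1


size = 28                                  # MNIST image width and height
size = conv_out(size, kernel=5)            # conv1 -> 24
size = conv_out(size, kernel=2, stride=2)  # pool1 -> 12
size = conv_out(size, kernel=5)            # conv2 -> 8
size = conv_out(size, kernel=2, stride=2)  # pool2 -> 4

fc3_dim_in = 50 * size * size              # 50 channels come out of conv2
print(fc3_dim_in)                          # 800, i.e. 50 * 4 * 4
```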
arg_scope
arg_scope is syntactic sugar that lets you set default argument values for helper functions within its context. For example, say you want to experiment with different weight initializations in your ResNet-150 training script. You can either:
```python
# change all weight_init here
brew.conv(model, ..., weight_init=('XavierFill', {}), ...)
...
# repeat 150 times
...
brew.conv(model, ..., weight_init=('XavierFill', {}), ...)
```
Or, with the help of arg_scope, you can write:
```python
with brew.arg_scope([brew.conv], weight_init=('XavierFill', {})):
    brew.conv(model, ...)  # no weight_init needed here!
    brew.conv(model, ...)
    ...
```
Custom Helper Function
As you use brew more often, you may find a need for an Op that brew does not currently cover, and you will want to write your own helper function. You can register your helper function with brew to enjoy unified management and syntax sugar.
Simply define your new helper function, register it with brew using the .Register function, then call it with brew.new_helper_function.
```python
def my_super_layer(model, blob_in, blob_out, **kwargs):
    """
    100x faster, awesome code that you'll share one day.
    """

# register the function defined above, then call it through brew
brew.Register(my_super_layer)
brew.my_super_layer(model, blob_in, blob_out)
```
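The register-then-call pattern can be sketched as a small name-to-function registry. The toy below is plain Python and only illustrates the idea. ToyBrew is a hypothetical class, the body of my_super_layer is a placeholder, and a plain list stands in for the model.

```python
class ToyBrew:
    """Hypothetical registry that exposes registered helpers as attributes."""

    def __init__(self):
        self._registry = {}

    def Register(self, helper):
        name = helper.__name__
        if name in self._registry:
            raise AttributeError("Helper {} already registered".format(name))
        self._registry[name] = helper

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails.
        registry = self.__dict__.get("_registry", {})
        if name not in registry:
            raise AttributeError("Helper function {} not registered".format(name))
        return registry[name]


def my_super_layer(model, blob_in, blob_out, **kwargs):
    """Placeholder body: record the call and return the output blob name."""
    model.append(("MySuperLayer", blob_in, blob_out, kwargs))
    return blob_out


toy_brew = ToyBrew()
toy_brew.Register(my_super_layer)

model = []  # a plain list stands in for a ModelHelper object
result = toy_brew.my_super_layer(model, "data", "out")
```

Looking up a name that was never registered raises AttributeError in this sketch, so a misspelled helper name fails loudly at the call site.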
If you think your helper function might be helpful to the rest of the Caffe2 community, remember to share it, and create a pull request.
Caffe2 Default Helper Functions
To get more details about each of these functions, visit the Operators Catalogue.
- accuracy
- add_weight_decay
- average_pool
- concat
- conv
- conv_nd
- conv_transpose
- depth_concat
- dropout
- fc
- fc_decomp
- fc_prune
- fc_sparse
- group_conv
- group_conv_deprecated
- image_input
- instance_norm
- iter
- lrn
- max_pool
- max_pool_with_index
- packed_fc
- prelu
- softmax
- spatial_bn
- relu
- sum
- transpose
- video_input
Motivation for brew
Thanks for reading a whole overview on brew! Congratulations, you are finally here! Long story short, we want to separate the model building process from model storage. In our view, the ModelHelper class should only contain the network definition and parameter information. The brew module provides the functions to build the network and initialize its parameters.
Compared with the previous gigantic CNNModelHelper, which did both model storage and model building, the ModelHelper + brew way of building models is much more modular and easier to extend. In terms of naming, it is also much less confusing, as the Caffe2 family supports a variety of networks, including MLP, RNN and CNN. We hope this tutorial will make your model building faster and easier while also helping you get to know Caffe2 in more depth. There is a detailed example of brew usage in python/brew_test.py. If you have any questions about brew, please feel free to contact us or open an Issue on the repo. Thank you again for embracing the new brew API.