sandbox.cuda.dnn – cuDNN
cuDNN is an NVIDIA library that provides functionality used by deep neural networks, including optimized versions of operations such as convolution. cuDNN is not currently installed with CUDA 6.5; you must download and install it yourself.
To install it, decompress the downloaded file and make the *.h and *.so* files available to the compilation environment.
There are at least three possible ways of doing so:
 The easiest is to include them in your CUDA installation. Copy the *.h files to CUDA_ROOT/include and the *.so* files to CUDA_ROOT/lib64 (by default, CUDA_ROOT is /usr/local/cuda on Linux).
 Alternatively, on Linux, you can set the environment variables LD_LIBRARY_PATH, LIBRARY_PATH and CPATH to the directory extracted from the download. If needed, separate multiple directories with : as in the PATH environment variable.
 And as a third way, also on Linux, you can copy the *.h files to /usr/include and the *.so* files to /lib64.
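The environment-variable approach can be sketched in Python as follows. This is only an illustration: the directory /opt/cudnn and the helper names are hypothetical, and the resulting mapping is meant to be handed to a child process rather than applied to the running one.

```python
import os


def prepend_dir(existing, new_dir):
    """Return new_dir followed by the entries already in `existing`."""
    kept = [p for p in existing.split(os.pathsep) if p and p != new_dir]
    return os.pathsep.join([new_dir] + kept)


def cudnn_environment(cudnn_dir, base_env=None):
    """Copy an environment and add cudnn_dir to the three variables named above."""
    env = dict(os.environ if base_env is None else base_env)
    for name in ("LD_LIBRARY_PATH", "LIBRARY_PATH", "CPATH"):
        env[name] = prepend_dir(env.get(name, ""), cudnn_dir)
    return env


# Hypothetical extraction directory; replace it with the real one.
env = cudnn_environment("/opt/cudnn",
                        {"LD_LIBRARY_PATH": "/usr/local/cuda/lib64"})
print(env["LD_LIBRARY_PATH"])
```

The mapping can then be passed as the env argument of subprocess.run to launch a Theano script with cuDNN visible to the compiler and the dynamic loader.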
By default, Theano detects whether it can use cuDNN. If it can, it uses it. If not, Theano optimizations will not introduce cuDNN ops, so Theano will still work as long as the user did not introduce them manually.
To get an error if Theano cannot use cuDNN, use this Theano flag: optimizer_including=cudnn.
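One common way to pass Theano flags is the THEANO_FLAGS environment variable, a comma-separated list of name=value pairs. The sketch below only builds that string and a copy of the environment; the helper name theano_flags is hypothetical, and the variable has to be in place before Theano is imported.

```python
import os


def theano_flags(flags):
    """Join a dict of flag names and values into the THEANO_FLAGS format."""
    return ",".join("%s=%s" % (name, flags[name]) for name in sorted(flags))


flags = theano_flags({"optimizer_including": "cudnn"})

# Copy of the current environment with the flag added; pass it to a child
# process that runs the Theano script.
env = dict(os.environ, THEANO_FLAGS=flags)
print(flags)
```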
Note
CuDNN v3 has now been released. CuDNN v2 remains supported but CuDNN v3 is faster and offers many more options. We recommend that everybody update to v3.
Note
Starting in CuDNN v3, multiple convolution implementations are offered and it is possible to use heuristics to automatically choose a convolution implementation well suited to the parameters of the convolution.
The Theano flag dnn.conv.algo_fwd specifies the cuDNN convolution implementation that Theano should use for forward convolutions. Possible values include:
 small (default): use a convolution implementation with small memory usage
 none: use a slower implementation with minimal memory usage
 large: use a sometimes faster implementation with large memory usage
 fft: use the Fast Fourier Transform implementation of convolution (very high memory usage)
 guess_once: the first time a convolution is executed, the implementation to use is chosen according to cuDNN’s heuristics and reused for every subsequent execution of the convolution.
 guess_on_shape_change: like guess_once, but a new convolution implementation is selected every time the shapes of the inputs and kernels don’t match the shapes from the last execution.
 time_once: the first time a convolution is executed, every convolution implementation offered by cuDNN is executed and timed. The fastest is reused for every subsequent execution of the convolution.
 time_on_shape_change: like time_once, but a new convolution implementation is selected every time the shapes of the inputs and kernels don’t match the shapes from the last execution.
The Theano flag dnn.conv.algo_bwd specifies the cuDNN convolution implementation that Theano should use for gradient convolutions. Possible values include:
 none (default): use the default nondeterministic convolution implementation
 deterministic: use a slower but deterministic implementation
 fft: use the Fast Fourier Transform implementation of convolution (very high memory usage)
 guess_once: the first time a convolution is executed, the implementation to use is chosen according to cuDNN’s heuristics and reused for every subsequent execution of the convolution.
 guess_on_shape_change: like guess_once, but a new convolution implementation is selected every time the shapes of the inputs and kernels don’t match the shapes from the last execution.
 time_once: the first time a convolution is executed, every convolution implementation offered by cuDNN is executed and timed. The fastest is reused for every subsequent execution of the convolution.
 time_on_shape_change: like time_once, but a new convolution implementation is selected every time the shapes of the inputs and kernels don’t match the shapes from the last execution.
guess_* and time_* flag values take into account the amount of available memory when selecting an implementation. This means that slower implementations might be selected if not enough memory is available for the faster implementations.
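The difference between the *_once and *_on_shape_change values can be pictured with a toy model. This is not Theano's or cuDNN's code: AlgoChooser and toy_heuristic are hypothetical names, and the heuristic merely stands in for cuDNN's real heuristics or timing run.

```python
class AlgoChooser:
    """Toy model of the *_once and *_on_shape_change selection policies."""

    def __init__(self, select, on_shape_change):
        self.select = select                  # (img_shape, kern_shape) -> name
        self.on_shape_change = on_shape_change
        self.last_shapes = None
        self.algo = None
        self.selections = 0                   # how often `select` was called

    def choose(self, img_shape, kern_shape):
        shapes = (tuple(img_shape), tuple(kern_shape))
        first_call = self.algo is None
        changed = self.on_shape_change and shapes != self.last_shapes
        if first_call or changed:
            self.algo = self.select(*shapes)
            self.selections += 1
        self.last_shapes = shapes
        return self.algo


def toy_heuristic(img_shape, kern_shape):
    # Hypothetical rule: pretend large kernels favour the FFT implementation.
    return "fft" if kern_shape[2] >= 5 else "small"


calls = [((8, 3, 32, 32), (16, 3, 3, 3)),
         ((8, 3, 32, 32), (16, 3, 7, 7))]

once = AlgoChooser(toy_heuristic, on_shape_change=False)
per_shape = AlgoChooser(toy_heuristic, on_shape_change=True)
once_algos = [once.choose(i, k) for i, k in calls]
per_shape_algos = [per_shape.choose(i, k) for i, k in calls]
print(once_algos, per_shape_algos)
```

In this model the "once" chooser keeps its first decision even after the kernel shape changes, while the "on shape change" chooser selects again.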
Note
Normally you should not call GPU Ops directly, but the CPU interface currently does not allow all options supported by cuDNN ops. So it is possible that you will need to call them manually.
Note
The cuDNN documentation states that reproducibility is not guaranteed with the default implementation of the following two operations: cudnnConvolutionBackwardFilter and cudnnConvolutionBackwardData. They correspond to the gradient with respect to the weights and the gradient with respect to the input of the convolution. They are also sometimes used in the forward pass, when they give a speed-up.
The Theano flag dnn.conv.algo_bwd can be used to force the use of a slower but deterministic convolution implementation.
Note
There is a problem we do not understand yet when cuDNN paths are used with symbolic links, so avoid symbolic links in those paths.
Note
cudnn.so* must be readable and executable by everybody. cudnn.h must be readable by everybody.
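A quick way to verify these permissions is to inspect the mode bits of each file. The sketch below is an assumption-laden illustration: the helper names are hypothetical, "everybody" is read as owner, group and others, and the demo runs on throwaway files in a temporary directory rather than a real cuDNN installation.

```python
import os
import stat
import tempfile

READ_ALL = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
EXEC_ALL = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def has_bits(path, bits):
    """True if every permission bit in `bits` is set on `path`."""
    return (os.stat(path).st_mode & bits) == bits


def check_cudnn_files(header, libraries):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not has_bits(header, READ_ALL):
        problems.append("%s is not readable by everybody" % header)
    for lib in libraries:
        if not has_bits(lib, READ_ALL | EXEC_ALL):
            problems.append("%s is not readable and executable by everybody" % lib)
    return problems


# Demo on placeholder files (hypothetical names, not a real installation).
with tempfile.TemporaryDirectory() as tmp:
    header = os.path.join(tmp, "cudnn.h")
    lib = os.path.join(tmp, "libcudnn.so")
    for path in (header, lib):
        open(path, "w").close()
    os.chmod(header, 0o644)
    os.chmod(lib, 0o644)            # readable, but not executable
    problems = check_cudnn_files(header, [lib])
    os.chmod(lib, 0o755)            # now readable and executable
    fixed = check_cudnn_files(header, [lib])

print(problems, fixed)
```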
Functions

theano.sandbox.cuda.dnn.dnn_conv(img, kerns, border_mode='valid', subsample=(1, 1), conv_mode='conv', direction_hint=None, workmem=None, algo=None)
GPU convolution using cuDNN from NVIDIA.
The memory layout to use is ‘bc01’, that is ‘batch’, ‘channel’, ‘first dim’, ‘second dim’ in that order.
Parameters:  img – Images to do the convolution over.
 kerns – Convolution filters.
 border_mode – One of ‘valid’, ‘full’. Alternatively, the padding size can be specified directly as an integer or a pair of integers.
 subsample – Perform subsampling of the output (default: (1, 1)).
 conv_mode – Perform convolution (kernels flipped) or crosscorrelation. One of ‘conv’, ‘cross’ (default: ‘conv’).
 direction_hint – Used by graph optimizers to change algorithm choice. By default, GpuDnnConv will be used to carry out the convolution. If border_mode is ‘valid’, subsample is (1,1) and direction_hint is ‘bprop weights’, it will use GpuDnnConvGradW. If border_mode is ‘full’, subsample is (1,1) and direction_hint is ‘bprop inputs’, it will use GpuDnnConvGradI. This parameter is used internally by graph optimizers and may be removed at any time without a deprecation period. You have been warned.
 workmem – deprecated, use parameter algo instead.
 algo ({‘none’, ‘small’, ‘large’, ‘fft’, ‘guess_once’, ‘guess_on_shape_change’, ‘time_once’, ‘time_on_shape_change’}) – Convolution implementation to use. Some of its values may require certain versions of cuDNN to be installed. Default is the value of config.dnn.conv.algo_fwd.
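The two conv_mode values differ only in whether the kernel is flipped. The following pure-Python reference, written for a single 2D image and kernel over the ‘valid’ region, is a sketch of that distinction and not the cuDNN implementation; the function names are hypothetical.

```python
def correlate2d_valid(img, kern):
    """'cross' mode: slide the kernel as-is over the image (valid region)."""
    kh, kw = len(kern), len(kern[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[r + i][c + j] * kern[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def convolve2d_valid(img, kern):
    """'conv' mode: flip the kernel along both axes, then cross-correlate."""
    flipped = [row[::-1] for row in kern[::-1]]
    return correlate2d_valid(img, flipped)


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
kern = [[1, 0],
        [0, -1]]

cross = correlate2d_valid(img, kern)
conv = convolve2d_valid(img, kern)
print(cross)  # [[-4, -4], [-4, -4]]
print(conv)   # [[4, 4], [4, 4]]
```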

theano.sandbox.cuda.dnn.dnn_pool(img, ws, stride=(1, 1), mode='max', pad=(0, 0))
GPU pooling using cuDNN from NVIDIA.
The memory layout to use is ‘bc01’, that is ‘batch’, ‘channel’, ‘first dim’, ‘second dim’ in that order.
Parameters:  img – Images to do the pooling over.
 ws – Subsampling window size.
 stride – Subsampling stride (default: (1, 1)).
 mode ({‘max’, ‘average_inc_pad’, ‘average_exc_pad’}) –
 pad – (padX, padY) padding information. padX is the size of the left and right borders, padY is the size of the top and bottom borders.
 nd – Dimensions of pooling, can be 2 or 3 for 2d or 3d pooling. If set to 3, all other params (except mode) must have an extra dimension to match. 3 is only available for cuDNN v3.
Warning
This Op requires a GPU with a compute capability of 3.0 or higher. This means that older GPUs will not work with this Op.
Notes
This Op implements the ignore_border=True behavior of max_pool_2d.
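With ignore_border=True, windows that do not fit entirely inside the (padded) input are dropped. The sketch below shows the usual output-size arithmetic and a plain-Python max pooling without padding; the function names are hypothetical and the code is a reference illustration, not the Op's implementation.

```python
def pool_out_size(size, ws, stride, pad=0):
    """Output length along one axis when incomplete windows are dropped."""
    return (size + 2 * pad - ws) // stride + 1


def max_pool_ignore_border(img, ws, stride):
    """Max pooling over a 2D list of numbers, without padding."""
    rows = pool_out_size(len(img), ws[0], stride[0])
    cols = pool_out_size(len(img[0]), ws[1], stride[1])
    return [[max(img[r * stride[0] + i][c * stride[1] + j]
                 for i in range(ws[0]) for j in range(ws[1]))
             for c in range(cols)]
            for r in range(rows)]


# 5x5 image holding the values 0..24 in row-major order.
img = [[5 * r + c for c in range(5)] for r in range(5)]
pooled = max_pool_ignore_border(img, ws=(2, 2), stride=(2, 2))
print(pooled)  # [[6, 8], [16, 18]]
```

The last row and column of the 5x5 input do not fill a complete 2x2 window, so they do not contribute to the output.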
Convolution Ops

class theano.sandbox.cuda.dnn.GpuDnnConvDesc(border_mode, subsample=(1, 1), conv_mode='conv')
This Op builds a convolution descriptor for use in the other convolution operations.
See the doc of dnn_conv() for a description of the parameters.

class theano.sandbox.cuda.dnn.GpuDnnConv(workmem=None, inplace=False, algo=None)
The forward convolution.
Parameters:  image –
 kernel –
 descr – The convolution descriptor.
 workmem – deprecated, use parameter algo instead.
 algo – One of [‘none’, ‘small’, ‘large’, ‘fft’, ‘guess_once’, ‘guess_on_shape_change’, ‘time_once’, ‘time_on_shape_change’]. Default is the value of config.dnn.conv.algo_fwd.

static get_out_shape(ishape, kshape, border_mode, subsample)
This function computes the output shape for a convolution with the specified parameters. ishape and kshape can be symbolic or scalar.
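As an illustration of the arithmetic involved, the following sketch computes a 2D output shape from plain integer shapes in ‘bc01’ layout. It assumes the standard convolution size formula and is not the library's own code; conv_out_shape is a hypothetical name, and symbolic shapes are not handled.

```python
def conv_out_shape(ishape, kshape, border_mode, subsample):
    """Output shape (batch, filters, rows, cols) for integer 'bc01' shapes."""
    batch, _, in_h, in_w = ishape
    n_filters, _, k_h, k_w = kshape
    step_h, step_w = subsample

    if border_mode == "full":
        pad_h, pad_w = k_h - 1, k_w - 1
    elif border_mode == "valid":
        pad_h, pad_w = 0, 0
    elif isinstance(border_mode, int):
        pad_h = pad_w = border_mode
    else:
        pad_h, pad_w = border_mode        # explicit pair of padding sizes

    return (batch, n_filters,
            (in_h + 2 * pad_h - k_h) // step_h + 1,
            (in_w + 2 * pad_w - k_w) // step_w + 1)


ishape = (8, 3, 32, 32)     # 8 images, 3 channels, 32x32 pixels
kshape = (16, 3, 5, 5)      # 16 filters of size 5x5

print(conv_out_shape(ishape, kshape, "valid", (1, 1)))  # (8, 16, 28, 28)
print(conv_out_shape(ishape, kshape, "full", (1, 1)))   # (8, 16, 36, 36)
print(conv_out_shape(ishape, kshape, (2, 2), (2, 2)))   # (8, 16, 16, 16)
```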

class theano.sandbox.cuda.dnn.GpuDnnConv3d(workmem=None, inplace=False, algo=None)
The forward convolution.
Parameters:  image –
 kernel –
 descr – The convolution descriptor.
 workmem – deprecated, use parameter algo instead.
 algo – One of [‘none’, ‘guess_once’, ‘guess_on_shape_change’, ‘time_once’, ‘time_on_shape_change’]. Default is the value of config.dnn.conv.algo_fwd.

static get_out_shape(ishape, kshape, border_mode, subsample)
This function computes the output shape for a convolution with the specified parameters. ishape and kshape can be symbolic or scalar.

class theano.sandbox.cuda.dnn.GpuDnnConvGradW(inplace=False, workmem=None, algo=None)
The convolution gradient with respect to the weights.
Parameters:  image –
 kernel –
 descr – The convolution descriptor.
 workmem – deprecated, use parameter algo instead.
 algo – One of [‘none’, ‘deterministic’, ‘fft’, ‘guess_once’, ‘guess_on_shape_change’, ‘time_once’, ‘time_on_shape_change’]. Default is the value of config.dnn.conv.algo_bwd.

class theano.sandbox.cuda.dnn.GpuDnnConv3dGradW(inplace=False, workmem=None, algo=None)
The convolution gradient with respect to the weights.
Parameters:  image –
 kernel –
 descr – The convolution descriptor.
 workmem – deprecated, use parameter algo instead.
 algo – One of [‘none’, ‘guess_once’, ‘guess_on_shape_change’, ‘time_once’, ‘time_on_shape_change’]. Default is the value of config.dnn.conv.algo_bwd.

class theano.sandbox.cuda.dnn.GpuDnnConvGradI(inplace=False, workmem=None, algo=None)
The convolution gradient with respect to the inputs.
Parameters:  image –
 kernel –
 descr – The convolution descriptor.
 workmem – deprecated, use parameter algo instead.
 algo – One of [‘none’, ‘deterministic’, ‘fft’, ‘guess_once’, ‘guess_on_shape_change’, ‘time_once’, ‘time_on_shape_change’]. Default is the value of config.dnn.conv.algo_bwd.

class theano.sandbox.cuda.dnn.GpuDnnConv3dGradI(inplace=False, workmem=None, algo=None)
The convolution gradient with respect to the inputs.
Parameters:  image –
 kernel –
 descr – The convolution descriptor.
 workmem – deprecated, use parameter algo instead.
 algo – One of [‘none’, ‘guess_once’, ‘guess_on_shape_change’, ‘time_once’, ‘time_on_shape_change’]. Default is the value of config.dnn.conv.algo_bwd.
Pooling Ops

class theano.sandbox.cuda.dnn.GpuDnnPoolDesc(ws=(1, 1), stride=(1, 1), mode='max', pad=(0, 0))
This Op builds a pooling descriptor for use in the other pooling operations.
Parameters:  ws – Window size.
 stride – (dx, dy).
 mode ({‘max’, ‘average_inc_pad’, ‘average_exc_pad’}) – The old deprecated name ‘average’ corresponds to ‘average_inc_pad’.
 pad – (padX, padY) padding information. padX is the size of the left and right borders, padY is the size of the top and bottom borders.
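The two averaging modes differ only in how padded positions are counted. The 1D sketch below assumes the names follow cuDNN's include-padding and exclude-padding averaging modes: ‘average_inc_pad’ divides by the full window size, while ‘average_exc_pad’ divides by the number of real values in the window. The function name is hypothetical.

```python
def average_pool_1d(values, ws, stride, pad, include_pad):
    """Average pooling along one axis; padded positions hold no data."""
    assert pad < ws, "every window must contain at least one real value"
    padded = [None] * pad + list(values) + [None] * pad
    out = []
    for start in range(0, len(padded) - ws + 1, stride):
        real = [v for v in padded[start:start + ws] if v is not None]
        # inc_pad counts padded positions (as zeros); exc_pad ignores them.
        divisor = ws if include_pad else len(real)
        out.append(sum(real) / divisor)
    return out


values = [2.0, 4.0, 6.0, 8.0]
inc = average_pool_1d(values, ws=2, stride=2, pad=1, include_pad=True)
exc = average_pool_1d(values, ws=2, stride=2, pad=1, include_pad=False)
print(inc)  # [1.0, 5.0, 4.0]
print(exc)  # [2.0, 5.0, 8.0]
```

The two modes only disagree on windows that overlap the border, here the first and the last one.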

class theano.sandbox.cuda.dnn.GpuDnnPool
Pooling.
Parameters:  img – The image, a 4d or 5d tensor.
 desc – The pooling descriptor.

class theano.sandbox.cuda.dnn.GpuDnnPoolGrad
The pooling gradient.
Parameters:  inp – The input of the pooling.
 out – The output of the pooling in the forward pass.
 inp_grad – Same size as out, but is the corresponding gradient information.
 desc – The pooling descriptor.
Softmax Ops

class theano.sandbox.cuda.dnn.GpuDnnSoftmax(tensor_format, algo, mode)
Op for the cuDNN Softmax.
Parameters:  tensor_format – Whether the data format is ‘bc01’ or ‘b01c’.
 algo – ‘fast’ or ‘accurate’ indicating whether computations should be optimized for speed or accuracy respectively.
 mode – ‘instance’ or ‘channel’ indicating whether the softmax should be computed per image across ‘c01’ or per spatial location ‘01’ per image across ‘c’.
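The two mode values can be illustrated on a single image stored as nested lists indexed image[channel][row][col]. This is a plain-Python reference sketch with hypothetical function names, not the cuDNN kernel; subtracting the maximum before exponentiating is an ordinary stability trick chosen here, not a statement about the ‘fast’ or ‘accurate’ algorithms.

```python
import math


def softmax(values):
    """Softmax of a flat list, shifted by the max for numerical stability."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_instance(image):
    """'instance': one softmax over all channel/row/col values of the image."""
    flat = [v for channel in image for row in channel for v in row]
    probs = iter(softmax(flat))
    return [[[next(probs) for _ in row] for row in channel]
            for channel in image]


def softmax_channel(image):
    """'channel': a separate softmax across channels at each (row, col)."""
    n_ch, n_rows, n_cols = len(image), len(image[0]), len(image[0][0])
    out = [[[0.0] * n_cols for _ in range(n_rows)] for _ in range(n_ch)]
    for r in range(n_rows):
        for c in range(n_cols):
            probs = softmax([image[ch][r][c] for ch in range(n_ch)])
            for ch in range(n_ch):
                out[ch][r][c] = probs[ch]
    return out


image = [[[0.0, 1.0]], [[0.0, 1.0]]]   # 2 channels, 1 row, 2 columns
inst = softmax_instance(image)
chan = softmax_channel(image)
print(inst)
print(chan)
```

In ‘instance’ mode all values of the image sum to 1; in ‘channel’ mode the values at each spatial location sum to 1 across channels.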

class theano.sandbox.cuda.dnn.GpuDnnSoftmaxGrad(tensor_format, algo, mode)
Op for the cuDNN SoftmaxGrad.
Parameters:  tensor_format – Whether the data format is ‘bc01’ or ‘b01c’.
 algo – ‘fast’ or ‘accurate’ indicating whether computations should be optimized for speed or accuracy respectively.
 mode – ‘instance’ or ‘channel’ indicating whether the softmax should be computed per image across ‘c01’ or per spatial location ‘01’ per image across ‘c’.