`theano.sandbox.cuda.dnn` – cuDNN

cuDNN is an NVIDIA library with functionality used by deep neural networks. It provides optimized versions of some operations, such as convolution. cuDNN is not currently installed with CUDA; you must download and install it yourself.

To install it, decompress the downloaded file and make the *.h and *.so* files available to the compilation environment. There are at least three possible ways of doing so:

The easiest is to include them in your CUDA installation. Copy the *.h files to CUDA_ROOT/include and the *.so* files to CUDA_ROOT/lib64 (by default, CUDA_ROOT is /usr/local/cuda on Linux).
Alternatively, on Linux, you can point the environment variables LD_LIBRARY_PATH, LIBRARY_PATH and CPATH at the directories extracted from the download (the library directory for the first two, the include directory for CPATH). If needed, separate multiple directories with : as in the PATH environment variable.

Example:
```
export LD_LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LD_LIBRARY_PATH
export CPATH=/home/user/path_to_CUDNN_folder/include:$CPATH
export LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LIBRARY_PATH
```
And as a third way, also on Linux, you can copy the *.h files to /usr/include and the *.so* files to /lib64.
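Whichever of the three layouts you pick, it can help to confirm that the files ended up where you expect before starting Theano. The following is a minimal sketch of such a check. The helper name find_cudnn_files is hypothetical and not part of Theano, and it assumes the header is called cudnn.h and that the shared libraries have "cudnn" and ".so" in their file names.

```python
import glob
import os


def find_cudnn_files(include_dir, lib_dir):
    """Return (headers, libraries) found in the given directories.

    Hypothetical helper: it assumes the header is named cudnn.h and that
    the shared libraries have "cudnn" and ".so" in their file names.
    """
    header = os.path.join(include_dir, "cudnn.h")
    headers = [header] if os.path.isfile(header) else []
    libraries = sorted(glob.glob(os.path.join(lib_dir, "*cudnn*.so*")))
    return headers, libraries


if __name__ == "__main__":
    # CUDA_ROOT defaults to /usr/local/cuda on Linux, as described above.
    cuda_root = os.environ.get("CUDA_ROOT", "/usr/local/cuda")
    headers, libraries = find_cudnn_files(
        os.path.join(cuda_root, "include"), os.path.join(cuda_root, "lib64"))
    print("headers:", headers)
    print("libraries:", libraries)
```

An empty list for either entry suggests that the corresponding files were not copied to that location.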

By default, Theano detects whether it can use cuDNN. If it can, it uses it. If not, Theano optimizations will not introduce cuDNN ops, so Theano will still work as long as the user did not introduce them manually.

The recently added Theano flag dnn.enabled changes this default behavior, either forcing the use of cuDNN or disabling it. Older Theano versions do not support this flag. To get an error when cuDNN cannot be used with those versions, use the flag optimizer_including=cudnn.

Note

cuDNN v5.1 is supported in the Theano master version, which has dropped cuDNN v3 support. Theano 0.8.0 and 0.8.1 support only cuDNN v3 and v4. Theano 0.8.2 will support only v4 and v5.

Note

Starting in cuDNN v3, multiple convolution implementations are offered and it is possible to use heuristics to automatically choose a convolution implementation well suited to the parameters of the convolution.

The Theano flag dnn.conv.algo_fwd specifies the cuDNN convolution implementation that Theano should use for forward convolutions. Possible values include:

small (default) : use a convolution implementation with small memory usage
none : use a slower implementation with minimal memory usage
large : use a sometimes faster implementation with large memory usage
fft : use the Fast Fourier Transform implementation of convolution (very high memory usage)
fft_tiling : use the Fast Fourier Transform implementation of convolution with tiling (high memory usage, but less than fft)
guess_once : the first time a convolution is executed, the implementation to use is chosen according to cuDNN’s heuristics and reused for every subsequent execution of the convolution.
guess_on_shape_change : like guess_once, but a new convolution implementation is selected every time the shapes of the inputs and kernels don’t match the shapes from the last execution.
time_once : the first time a convolution is executed, every convolution implementation offered by cuDNN is executed and timed. The fastest is reused for every subsequent execution of the convolution.
time_on_shape_change : like time_once, but a new convolution implementation is selected every time the shapes of the inputs and kernels don’t match the shapes from the last execution.
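
The difference between the *_once and *_on_shape_change values can be pictured as a small caching policy. The sketch below is only an illustration of that policy, not Theano's or cuDNN's code: the class AlgoChooser and the function toy_choice are hypothetical, and toy_choice is a made-up stand-in for cuDNN's heuristics or timing.

```python
class AlgoChooser:
    """Illustrates the *_once and *_on_shape_change selection policies.

    `choose` is a hypothetical callable standing in for cuDNN's heuristics
    (guess_*) or for timing every implementation (time_*).
    """

    def __init__(self, choose, on_shape_change):
        self.choose = choose
        self.on_shape_change = on_shape_change
        self.algo = None
        self.last_shapes = None

    def select(self, input_shape, kernel_shape):
        shapes = (tuple(input_shape), tuple(kernel_shape))
        first_call = self.algo is None
        shapes_changed = shapes != self.last_shapes
        # Always choose on the first call; afterwards only re-choose when
        # the policy asks for it and the shapes differ from the last call.
        if first_call or (self.on_shape_change and shapes_changed):
            self.algo = self.choose(*shapes)
        self.last_shapes = shapes
        return self.algo


def toy_choice(input_shape, kernel_shape):
    # Made-up rule, only here so the sketch runs.
    return "fft" if kernel_shape[-1] > 5 else "small"
```

With on_shape_change=False the first choice is kept for every later call; with on_shape_change=True a call with different input or kernel shapes triggers a new choice.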

The Theano flags dnn.conv.algo_bwd_filter and dnn.conv.algo_bwd_data specify the cuDNN convolution implementation that Theano should use for gradient convolutions. Possible values include:

none (default) : use the default non-deterministic convolution implementation
deterministic : use a slower but deterministic implementation
fft : use the Fast Fourier Transform implementation of convolution (very high memory usage)
guess_once : the first time a convolution is executed, the implementation to use is chosen according to cuDNN’s heuristics and reused for every subsequent execution of the convolution.
guess_on_shape_change : like guess_once, but a new convolution implementation is selected every time the shapes of the inputs and kernels don’t match the shapes from the last execution.
time_once : the first time a convolution is executed, every convolution implementation offered by cuDNN is executed and timed. The fastest is reused for every subsequent execution of the convolution.
time_on_shape_change : like time_once, but a new convolution implementation is selected every time the shapes of the inputs and kernels don’t match the shapes from the last execution.
(algo_bwd_data only) fft_tiling : use the Fast Fourier Transform implementation of convolution with tiling (high memory usage, but less than fft)
(algo_bwd_data only) small : use a convolution implementation with small memory usage

guess_* and time_* flag values take into account the amount of available memory when selecting an implementation. This means that slower implementations might be selected if not enough memory is available for the faster implementations.
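
One common way to pass such flags is the THEANO_FLAGS environment variable, which Theano reads when it is imported. The sketch below builds that string for the three flags described above and rejects values that are not in the lists given in this section. The helper build_theano_flags and the ALLOWED table are hypothetical conveniences, not part of Theano, and the table only mirrors the lists above.

```python
import os

# Values listed above for each flag.
FWD_ALGOS = {"small", "none", "large", "fft", "fft_tiling", "guess_once",
             "guess_on_shape_change", "time_once", "time_on_shape_change"}
BWD_COMMON = {"none", "deterministic", "fft", "guess_once",
              "guess_on_shape_change", "time_once", "time_on_shape_change"}
ALLOWED = {
    "dnn.conv.algo_fwd": FWD_ALGOS,
    "dnn.conv.algo_bwd_filter": BWD_COMMON,
    "dnn.conv.algo_bwd_data": BWD_COMMON | {"fft_tiling", "small"},
}


def build_theano_flags(settings):
    """Hypothetical helper: validate settings and format them for THEANO_FLAGS."""
    for name, value in settings.items():
        if value not in ALLOWED[name]:
            raise ValueError("%r is not a listed value for %s" % (value, name))
    return ",".join("%s=%s" % item for item in sorted(settings.items()))


flags = build_theano_flags({
    "dnn.conv.algo_fwd": "time_once",
    "dnn.conv.algo_bwd_data": "deterministic",
})
# Must be set before Theano is imported for the flags to take effect.
# Note that this replaces any existing THEANO_FLAGS value.
os.environ["THEANO_FLAGS"] = flags
print(flags)
```

The same flag=value pairs can instead be given on the command line, for example by prefixing the command with THEANO_FLAGS=... in the shell.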

Note

Normally you should not call GPU Ops directly, but the CPU interface currently does not expose all the options supported by cuDNN ops, so you may need to call them manually.

Note

The cuDNN documentation states that reproducibility is not guaranteed with the default implementation of the following two operations: cudnnConvolutionBackwardFilter and cudnnConvolutionBackwardData. These correspond to the gradient with respect to the weights and the gradient with respect to the input of the convolution. They are also sometimes used in the forward pass, when they give a speed-up.

The Theano flag dnn.conv.algo_bwd can be used to force the use of a slower but deterministic convolution implementation.

Note

There is a problem, not yet understood, when the cuDNN paths go through symbolic links, so avoid using symbolic links in those paths.

Note

cudnn.so* must be readable and executable by everybody. cudnn.h must be readable by everybody.
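
The two notes above (permissions and symbolic links) can be checked from Python. This is a minimal sketch using only the standard library. The helper names world_access_ok and uses_symlink are hypothetical and not part of Theano, and the check only looks at the permission bits of the file itself.

```python
import os
import stat


def world_access_ok(path, need_execute):
    """Check that `path` is readable (and optionally executable) by everybody.

    Hypothetical helper, not part of Theano. It inspects the permission
    bits for user, group and others.
    """
    mode = os.stat(path).st_mode
    required = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
    if need_execute:
        required |= stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return (mode & required) == required


def uses_symlink(path):
    """Return True if resolving `path` goes through a symbolic link."""
    return os.path.realpath(path) != os.path.abspath(path)
```

For the library files you would call world_access_ok(path, need_execute=True), and for cudnn.h world_access_ok(path, need_execute=False).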

Convolution:
Pooling:
- theano.sandbox.cuda.dnn.dnn_pool().
Batch Normalization:
- theano.sandbox.cuda.dnn.dnn_batch_normalization_train()
- theano.sandbox.cuda.dnn.dnn_batch_normalization_test().
RNN:
- New back-end only.
Softmax:
- You can manually use the op GpuDnnSoftmax to use its extra feature.

List of Implemented Operations

class theano.sandbox.cuda.dnn.DnnBase

Creates a handle for cuDNN and pulls in the cuDNN libraries and headers.

class theano.sandbox.cuda.dnn.GpuDnnBatchNorm(mode='per-activation', epsilon=0.0001, running_average_factor=0, running_averages=False, inplace_running_mean=False, inplace_running_var=False, inplace_output=False)

Op for the cuDNN BatchNormalizationForwardTraining function. See GpuDnnBatchNormBase for parameters.

On application, takes input, scale, bias and produces:

output = (input - mean) / sqrt(variance + epsilon) * scale + bias
mean = input.mean(axis=axes, keepdims=True)
invstd = 1. / sqrt(input.var(axis=axes, keepdims=True) + epsilon)

where axes=0 if mode=’per-activation’, and axes=(0,2,3) if mode=’spatial’

Note: scale and bias must follow the same tensor layout!
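
To make the formulas above concrete, here is a pure-Python sketch of the per-activation case (axes=0). It is not how the Op is implemented: it works on a list of flattened samples rather than on 4d tensors, the function name batch_norm_per_activation is hypothetical, and it does not enforce any minimum value for epsilon.

```python
import math


def batch_norm_per_activation(inputs, scale, bias, epsilon=1e-4):
    """Normalize each activation over the batch axis.

    inputs is a list of samples, each a flat list of activations; scale
    and bias are flat lists with one entry per activation.
    """
    n = len(inputs)
    width = len(inputs[0])
    # Statistics over the batch axis, one value per activation.
    mean = [sum(x[j] for x in inputs) / n for j in range(width)]
    var = [sum((x[j] - mean[j]) ** 2 for x in inputs) / n
           for j in range(width)]
    invstd = [1.0 / math.sqrt(v + epsilon) for v in var]
    out = [[(x[j] - mean[j]) * invstd[j] * scale[j] + bias[j]
            for j in range(width)]
           for x in inputs]
    return out, mean, invstd
```

For mode='spatial' the same arithmetic applies, except that the statistics are shared across the spatial positions of each channel.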

class theano.sandbox.cuda.dnn.GpuDnnBatchNormBase(mode='per-activation', epsilon=0.0001)

Base Op for cuDNN Batch Normalization.

Parameters:

mode ({'per-activation', 'spatial'}) – Whether to normalize per activation (in this mode, bias and scale tensor dimensions are 1xCxHxW) or share normalization factors across spatial dimensions (in this mode, bias and scale tensor dimensions are 1xCx1x1).
epsilon – Epsilon value used in the batch normalization formula. Minimum allowed value is 1e-5 (imposed by cuDNN).
running_average_factor (float) – Factor for updating the values of running_mean and running_var. If the factor is close to one, the running averages will update quickly; if the factor is close to zero, they will update slowly.
running_mean (tensor or None) – Previous value of the running mean. If this is given, the new value running_mean * (1 - r_a_factor) + batch mean * r_a_factor will be returned as one of the outputs of this function. running_mean and running_var should either both be given or both be None.
running_var (tensor or None) – Previous value of the running variance. If this is given, the new value running_var * (1 - r_a_factor) + (m / (m - 1)) * batch var * r_a_factor will be returned as one of the outputs of this function, where m is the product of lengths of the averaged-over dimensions. running_mean and running_var should either both be given or both be None.
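
The two running-average formulas above can be written out for scalar statistics as follows. This is only a sketch of the arithmetic, with a hypothetical function name. In the Op itself the same update is applied elementwise to the running_mean and running_var tensors.

```python
def update_running_stats(running_mean, running_var, batch_mean, batch_var,
                         m, running_average_factor):
    """Apply the running-average formulas given above to scalar statistics.

    m is the product of the lengths of the averaged-over dimensions;
    m / (m - 1) rescales the batch variance as in the formula above.
    """
    f = running_average_factor
    new_mean = running_mean * (1 - f) + batch_mean * f
    new_var = running_var * (1 - f) + (float(m) / (m - 1)) * batch_var * f
    return new_mean, new_var
```

A factor of 0 leaves the running values unchanged, while a factor of 1 replaces them with the current batch statistics (with the variance rescaled by m / (m - 1)).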

class theano.sandbox.cuda.dnn.GpuDnnBatchNormGrad(mode='per-activation', epsilon=0.0001)

Op for the cuDNN BatchNormalizationBackward function. See GpuDnnBatchNormBase for parameters.

On application, takes input, dy, scale, mean, invstd and produces dinput, dscale and dbias. Note that it does not need the bias.

Note: scale, mean and invstd must follow the same tensor layout!

class theano.sandbox.cuda.dnn.GpuDnnBatchNormInference(mode='per-activation', epsilon=0.0001, inplace=False)

Op for the cuDNN BatchNormalizationForwardInference function. See GpuDnnBatchNormBase for parameters.

On application, takes input, scale, bias, mean and variance and produces: output = (input - mean) / sqrt(variance + epsilon) * scale + bias

where mean and variance are usually some running averages over multiple batches computed during training.

Note: scale, bias, mean and variance must follow the same tensor layout!

class theano.sandbox.cuda.dnn.GpuDnnConv(workmem=None, inplace=False, algo=None)

The forward convolution.

Parameters:

image –
kernel –
descr – The convolution descriptor.
workmem – Deprecated, use parameter algo instead.
algo ({'none', 'small', 'large', 'fft', 'fft_tiling', 'guess_once', 'winograd', 'guess_on_shape_change', 'time_once', 'time_on_shape_change'}) – Default is the value of `config.dnn.conv.algo_fwd`.

static get_out_shape(ishape, kshape, border_mode, subsample)

This function computes the output shape for a convolution with the specified parameters. ishape and kshape can be symbolic or scalar.
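
As an illustration of the kind of arithmetic involved, the sketch below computes a convolution output shape for concrete (non-symbolic) shapes. It is a stand-alone approximation under stated assumptions, not the body of get_out_shape: the function name conv_out_shape is hypothetical, and the padding assumed for each border mode is 0 for 'valid', k - 1 for 'full', k // 2 for 'half', or the explicit integers given.

```python
def conv_out_shape(ishape, kshape, border_mode, subsample):
    """Sketch of the usual output-shape arithmetic for a 'bc01' convolution.

    ishape is (batch, channels, rows, cols); kshape is
    (filters, channels, rows, cols). Assumed padding per border mode:
    'valid' -> 0, 'full' -> k - 1, 'half' -> k // 2, or explicit integers.
    """
    spatial_in, spatial_k = ishape[2:], kshape[2:]
    if border_mode == "valid":
        pads = [0] * len(spatial_k)
    elif border_mode == "full":
        pads = [k - 1 for k in spatial_k]
    elif border_mode == "half":
        pads = [k // 2 for k in spatial_k]
    elif isinstance(border_mode, int):
        pads = [border_mode] * len(spatial_k)
    else:
        pads = list(border_mode)
    # Each spatial dimension: floor((i + 2 * pad - k) / stride) + 1.
    spatial_out = tuple(
        (i + 2 * p - k) // s + 1
        for i, k, p, s in zip(spatial_in, spatial_k, pads, subsample))
    return (ishape[0], kshape[0]) + spatial_out
```

The output keeps the batch size of the images and takes its channel count from the number of filters.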

class theano.sandbox.cuda.dnn.GpuDnnConv3d(workmem=None, inplace=False, algo=None)

The forward convolution.

Parameters:

image –
kernel –
descr – The convolution descriptor.
workmem – Deprecated, use parameter algo instead.
algo ({'none', 'small', 'fft_tiling', 'winograd', 'guess_once', 'guess_on_shape_change', 'time_once', 'time_on_shape_change'}) – Default is the value of `config.dnn.conv.algo_fwd`.

static get_out_shape(ishape, kshape, border_mode, subsample)

This function computes the output shape for a convolution with the specified parameters. ishape and kshape can be symbolic or scalar.

class theano.sandbox.cuda.dnn.GpuDnnConv3dGradI(inplace=False, workmem=None, algo=None)

The convolution gradient with respect to the inputs.

Parameters:

image –
kernel –
descr – The convolution descriptor.
workmem – Deprecated, use parameter algo instead.
algo ({'none', 'deterministic', 'fft_tiling', 'winograd', 'guess_once', 'guess_on_shape_change', 'time_once', 'time_on_shape_change'}) – Default is the value of `config.dnn.conv.algo_bwd_data`.

class theano.sandbox.cuda.dnn.GpuDnnConv3dGradW(inplace=False, workmem=None, algo=None)

The convolution gradient with respect to the weights.

Parameters:

image –
kernel –
descr – The convolution descriptor.
workmem – Deprecated, use parameter algo instead.
algo ({'none', 'small', 'guess_once', 'guess_on_shape_change', 'time_once', 'time_on_shape_change'}) – Default is the value of `config.dnn.conv.algo_bwd_filter`.

class theano.sandbox.cuda.dnn.GpuDnnConvDesc(border_mode, subsample=(1, 1), conv_mode='conv', precision='float32')

This Op builds a convolution descriptor for use in the other convolution operations.

See the doc of dnn_conv() for a description of the parameters.

class theano.sandbox.cuda.dnn.GpuDnnConvGradI(inplace=False, workmem=None, algo=None)

The convolution gradient with respect to the inputs.

Parameters:

image –
kernel –
descr – The convolution descriptor.
workmem – Deprecated, use parameter algo instead.
algo ({'none', 'deterministic', 'fft', 'fft_tiling', 'winograd', 'guess_once', 'guess_on_shape_change', 'time_once', 'time_on_shape_change'}) – Default is the value of `config.dnn.conv.algo_bwd_data`.

class theano.sandbox.cuda.dnn.GpuDnnConvGradW(inplace=False, workmem=None, algo=None)

The convolution gradient with respect to the weights.

Parameters:

image –
kernel –
descr – The convolution descriptor.
workmem – Deprecated, use parameter algo instead.
algo ({'none', 'deterministic', 'fft', 'small', 'guess_once', 'guess_on_shape_change', 'time_once', 'time_on_shape_change'}) – Default is the value of `config.dnn.conv.algo_bwd_filter`.

class theano.sandbox.cuda.dnn.GpuDnnPool(mode='max')

Pooling.

Parameters:

img – The 4d or 5d image tensor.
ws – Window size.
stride – (dx, dy).
mode ({'max', 'average_inc_pad', 'average_exc_pad'}) – The old deprecated name ‘average’ corresponds to ‘average_inc_pad’.
pad – (padX, padY) padding information. padX is the size of the left and right borders, padY is the size of the top and bottom borders.

class theano.sandbox.cuda.dnn.GpuDnnPoolDesc(ws=(1, 1), stride=None, mode='max', pad=None)

This Op builds a pooling descriptor for use in the other pooling operations.

Parameters:

ws – Window size.
stride – (dx, dy).
mode ({'max', 'average_inc_pad', 'average_exc_pad'}) – The old deprecated name ‘average’ corresponds to ‘average_inc_pad’.
pad – (pad_h, pad_w) padding information. pad_h is the number of zero-valued pixels added to each of the top and bottom borders. pad_w is the number of zero-valued pixels added to each of the left and right borders.

Note

Do not use this Op anymore. It is only needed to reload old pickled files.

class theano.sandbox.cuda.dnn.GpuDnnPoolGrad(mode='max')

The pooling gradient.

Parameters:

inp – The input of the pooling.
out – The output of the pooling in the forward.
inp_grad – Same size as out, but is the corresponding gradient information.
ws – Window size.
stride – (dx, dy).
mode ({'max', 'average_inc_pad', 'average_exc_pad'}) – The old deprecated name ‘average’ corresponds to ‘average_inc_pad’.
pad – (padX, padY) padding information. padX is the size of the left and right borders, padY is the size of the top and bottom borders.

class theano.sandbox.cuda.dnn.GpuDnnSoftmax(tensor_format, algo, mode)

Op for the cuDNN Softmax.

Parameters:

tensor_format – Always set to ‘bc01’.
algo ({'fast', 'accurate'}) – Indicating whether computations should be optimized for speed or accuracy respectively.
mode ({'instance', 'channel'}) – Indicating whether the softmax should be computed per image across ‘c01’ or per spatial location ‘01’ per image across ‘c’.

class theano.sandbox.cuda.dnn.GpuDnnSoftmaxBase(tensor_format, algo, mode)

Op for the cuDNN Softmax.

Parameters:

tensor_format – Always set this to ‘bc01’.
algo ({'fast', 'accurate', 'log'}) – Indicating whether, respectively, computations should be optimized for speed, for accuracy, or if cuDNN should rather compute the log-softmax instead.
mode ({'instance', 'channel'}) – Indicating whether the softmax should be computed per image across ‘c01’ or per spatial location ‘01’ per image across ‘c’.
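
The two mode values differ only in which values are normalized together. The pure-Python sketch below shows this for a single image of a ‘bc01’ batch, stored as a nested list indexed [c][0][1]. It illustrates the semantics described above and is not the cuDNN implementation. The function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_one_image(image, mode):
    """image is a nested list indexed [c][0][1] (one item of a 'bc01' batch)."""
    channels, rows, cols = len(image), len(image[0]), len(image[0][0])
    out = [[[0.0] * cols for _ in range(rows)] for _ in range(channels)]
    if mode == "instance":
        # One softmax over every value of the image ('c01').
        flat = softmax([image[c][i][j] for c in range(channels)
                        for i in range(rows) for j in range(cols)])
        it = iter(flat)
        for c in range(channels):
            for i in range(rows):
                for j in range(cols):
                    out[c][i][j] = next(it)
    elif mode == "channel":
        # One softmax across channels for each spatial location.
        for i in range(rows):
            for j in range(cols):
                probs = softmax([image[c][i][j] for c in range(channels)])
                for c in range(channels):
                    out[c][i][j] = probs[c]
    else:
        raise ValueError("mode must be 'instance' or 'channel'")
    return out
```

In 'instance' mode all values of an image sum to one; in 'channel' mode the values at each spatial location sum to one across channels.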

class theano.sandbox.cuda.dnn.GpuDnnSoftmaxGrad(tensor_format, algo, mode)

Op for the cuDNN SoftmaxGrad.

Parameters:

tensor_format – Always set to ‘bc01’.
algo ({'fast', 'accurate'}) – Indicating whether computations should be optimized for speed or accuracy respectively.
mode ({'instance', 'channel'}) – Indicating whether the softmax should be computed per image across ‘c01’ or per spatial location ‘01’ per image across ‘c’.

theano.sandbox.cuda.dnn.dnn_batch_normalization_test(inputs, gamma, beta, mean, var, mode='per-activation', epsilon=0.0001)

Performs batch normalization of the given inputs, using the given mean and variance.

Parameters:

mode ({'per-activation', 'spatial'}) – Whether to normalize per activation or share normalization factors across spatial dimensions (i.e., all dimensions past the second).
gamma (tensor) – Scale factors. Must match the dimensionality of inputs, but have sizes of 1 for all axes normalized over (i.e., in the first dimension for mode='per-activation', and additionally in all dimensions past the second for mode='spatial').
beta (tensor) – Biases. Must match the tensor layout of gamma.
mean (tensor) – Means. Usually these are running averages computed during training. Must match the tensor layout of gamma.
var (tensor) – Variances. Usually these are running averages computed during training. Must match the tensor layout of gamma.
epsilon (float) – Epsilon value used in the batch normalization formula. Minimum allowed value is 1e-5 (imposed by cuDNN).

Returns:	out – Batch-normalized inputs.
Return type:	tensor

Notes

Requires cuDNN 5 and Theano 0.9dev2 or more recent.

For 4d tensors, the returned value is equivalent to:

axes = (0,) if mode == 'per-activation' else (0, 2, 3)
gamma, beta, mean, var = (T.addbroadcast(t, *axes)
                          for t in (gamma, beta, mean, var))
out = (inputs - mean) * gamma / T.sqrt(var + epsilon) + beta

For 5d tensors, the axes would be (0, 2, 3, 4).

theano.sandbox.cuda.dnn.dnn_batch_normalization_train(inputs, gamma, beta, mode='per-activation', epsilon=0.0001, running_average_factor=0.1, running_mean=None, running_var=None)

Performs batch normalization of the given inputs, using the mean and variance of the inputs.

Parameters:

mode ({'per-activation', 'spatial'}) – Whether to normalize per activation or share normalization factors across spatial dimensions (i.e., all dimensions past the second).
gamma (tensor) – Learnable scale factors. Must match the dimensionality of inputs, but have sizes of 1 for all axes normalized over (i.e., in the first dimension for mode='per-activation', and additionally in all dimensions past the second for mode='spatial').
beta (tensor) – Learnable biases. Must match the tensor layout of gamma.
epsilon (float) – Epsilon value used in the batch normalization formula. Minimum allowed value is 1e-5 (imposed by cuDNN).
running_average_factor (float) – Factor for updating the values of running_mean and running_var. If the factor is close to one, the running averages will update quickly; if the factor is close to zero, they will update slowly.
running_mean (tensor or None) – Previous value of the running mean. If this is given, the new value running_mean * (1 - r_a_factor) + batch mean * r_a_factor will be returned as one of the outputs of this function. running_mean and running_var should either both be given or both be None.
running_var (tensor or None) – Previous value of the running variance. If this is given, the new value running_var * (1 - r_a_factor) + (m / (m - 1)) * batch var * r_a_factor will be returned as one of the outputs of this function, where m is the product of lengths of the averaged-over dimensions. running_mean and running_var should either both be given or both be None.

Returns:

out (tensor) – Batch-normalized inputs.
mean (tensor) – Means of inputs across the normalization axes.
invstd (tensor) – Inverse standard deviations of inputs across the normalization axes.
new_running_mean (tensor) – New value of the running mean (only if both running_mean and running_var were given).
new_running_var (tensor) – New value of the running variance (only if both running_var and running_mean were given).

Notes

Requires cuDNN 5 and Theano 0.9dev2 or more recent.

For 4d tensors, returned values are equivalent to:

axes = 0 if mode == 'per-activation' else (0, 2, 3)
mean = inputs.mean(axes, keepdims=True)
var = inputs.var(axes, keepdims=True)
invstd = T.inv(T.sqrt(var + epsilon))
out = (inputs - mean) * gamma * invstd + beta

m = T.cast(T.prod(inputs.shape) / T.prod(mean.shape), 'float32')
running_mean = running_mean * (1 - running_average_factor) + \
               mean * running_average_factor
running_var = running_var * (1 - running_average_factor) + \
              (m / (m - 1)) * var * running_average_factor

For 5d tensors, the axes are (0, 2, 3, 4).

theano.sandbox.cuda.dnn.dnn_conv(img, kerns, border_mode='valid', subsample=(1, 1), conv_mode='conv', direction_hint=None, workmem=None, algo=None, precision=None)

GPU convolution using cuDNN from NVIDIA.

The memory layout to use is ‘bc01’, that is ‘batch’, ‘channel’, ‘first dim’, ‘second dim’ in that order.

Parameters:

img – Images to do the convolution over.
kerns – Convolution filters.
border_mode – One of ‘valid’, ‘full’, ‘half’; additionally, the padding size can be directly specified by an integer or a pair of integers (as a tuple), specifying the amount of zero padding added to _both_ the top and bottom (first entry) and left and right (second entry) sides of the image.
subsample – Perform subsampling of the output (default: (1, 1)).
conv_mode – Perform convolution (kernels flipped) or cross-correlation. One of ‘conv’, ‘cross’ (default: ‘conv’).
direction_hint – Used by graph optimizers to change algorithm choice. By default, GpuDnnConv will be used to carry out the convolution. If border_mode is ‘valid’, subsample is (1,1) and direction_hint is ‘bprop weights’, it will use GpuDnnConvGradW. If border_mode is ‘full’, subsample is (1,1) and direction_hint is ‘bprop inputs’, it will use GpuDnnConvGradI. This parameter is used internally by graph optimizers and may be removed at any time without a deprecation period. You have been warned.
workmem – deprecated, use parameter algo instead.
algo ({'none', 'small', 'large', 'fft', 'guess_once', 'guess_on_shape_change', 'time_once', 'time_on_shape_change'}) – Convolution implementation to use. Some of its values may require certain versions of cuDNN to be installed. Default is the value of config.dnn.conv.algo_fwd.
precision ({'as_input_f32', 'as_input', 'float16', 'float32', 'float64'}) – Description of the dtype in which the computation of the convolution should be done. Possible values are ‘as_input_f32’, ‘as_input’, ‘float16’, ‘float32’ and ‘float64’. Default is the value of config.dnn.conv.precision.
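
The difference between the two conv_mode values can be seen on a single 2D plane. The following pure-Python sketch is only an illustration of 'conv' (kernel flipped along both axes) versus 'cross' (kernel used as is) for a 'valid' border mode and no subsampling. The function names are hypothetical and this is not how dnn_conv is implemented.

```python
def correlate2d_valid(image, kernel):
    """'valid' cross-correlation of two 2D lists of numbers."""
    rows = len(image) - len(kernel) + 1
    cols = len(image[0]) - len(kernel[0]) + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(len(kernel))
                 for j in range(len(kernel[0])))
             for c in range(cols)]
            for r in range(rows)]


def convolve2d_valid(image, kernel, conv_mode="conv"):
    """Apply the kernel in 'conv' (flipped) or 'cross' (unflipped) mode."""
    if conv_mode == "conv":
        # Flip the kernel along both axes, then correlate.
        kernel = [row[::-1] for row in kernel[::-1]]
    elif conv_mode != "cross":
        raise ValueError("conv_mode must be 'conv' or 'cross'")
    return correlate2d_valid(image, kernel)
```

With a kernel that is symmetric under a 180 degree rotation, both modes give the same result.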

theano.sandbox.cuda.dnn.dnn_conv3d(img, kerns, border_mode='valid', subsample=(1, 1, 1), conv_mode='conv', direction_hint=None, workmem=None, algo=None, precision=None)

GPU convolution using cuDNN from NVIDIA.

The memory layout to use is ‘bct01’, that is ‘batch’, ‘channel’, ‘first dim’, ‘second dim’, ‘third dim’ in that order.

Parameters:

img – Images to do the convolution over.
kerns – Convolution filters.
border_mode – One of ‘valid’, ‘full’, ‘half’; additionally, the padding size can be directly specified by an integer or a triplet of integers (as a tuple), specifying the amount of zero padding added to _both_ the top and bottom (first entry) and left and right (second entry) and front and back (third entry) sides of the volume.
subsample – Perform subsampling of the output (default: (1, 1, 1)).
conv_mode – Perform convolution (kernels flipped) or cross-correlation. One of ‘conv’, ‘cross’ (default: ‘conv’).
direction_hint – Used by graph optimizers to change algorithm choice. By default, GpuDnnConv will be used to carry out the convolution. If border_mode is ‘valid’, subsample is (1,1,1) and direction_hint is ‘bprop weights’, it will use GpuDnnConvGradW. This parameter is used internally by graph optimizers and may be removed at any time without a deprecation period. You have been warned.
workmem – Deprecated, use parameter algo instead.
algo – Convolution implementation to use. Only ‘none’ is implemented for the conv3d. Default is the value of `config.dnn.conv.algo_fwd`.
precision – dtype in which the computation of the convolution should be done. Possible values are ‘as_input_f32’, ‘as_input’, ‘float16’, ‘float32’ and ‘float64’. Default is the value of `config.dnn.conv.precision`.

Warning:	The cuDNN library only works with GPUs that have a compute capability of 3.0 or higher. This means that older GPUs will not work with this Op.
Warning:	dnn_conv3d only works with cuDNN library 3.0

theano.sandbox.cuda.dnn.dnn_gradinput(kerns, topgrad, img_shp, border_mode='valid', subsample=(1, 1), conv_mode='conv')

GPU convolution gradient with respect to input using cuDNN from NVIDIA.

The memory layout to use is ‘bc01’, that is ‘batch’, ‘channel’, ‘first dim’, ‘second dim’ in that order.

FIXME parameters doc

Warning:	The cuDNN library only works with GPUs that have a compute capability of 3.0 or higher. This means that older GPUs will not work with this Op.

theano.sandbox.cuda.dnn.dnn_gradinput3d(kerns, topgrad, img_shp, border_mode='valid', subsample=(1, 1), conv_mode='conv')

GPU convolution gradient with respect to input using cuDNN from NVIDIA.

The memory layout to use is ‘bct01’, that is ‘batch’, ‘channel’, ‘first dim’, ‘second dim’, ‘third dim’ in that order.

FIXME parameters doc

Warning:	The cuDNN library only works with GPUs that have a compute capability of 3.0 or higher. This means that older GPUs will not work with this Op.

theano.sandbox.cuda.dnn.dnn_gradweight(img, topgrad, kerns_shp, border_mode='valid', subsample=(1, 1), conv_mode='conv')

GPU convolution gradient with respect to weight using cuDNN from NVIDIA.

The memory layout to use is ‘bc01’, that is ‘batch’, ‘channel’, ‘first dim’, ‘second dim’ in that order.

FIXME parameters doc

Warning:	The cuDNN library only works with GPUs that have a compute capability of 3.0 or higher. This means that older GPUs will not work with this Op.

theano.sandbox.cuda.dnn.dnn_gradweight3d(img, topgrad, kerns_shp, border_mode='valid', subsample=(1, 1, 1), conv_mode='conv')

GPU convolution gradient with respect to weight using cuDNN from NVIDIA.

The memory layout to use is ‘bct01’, that is ‘batch’, ‘channel’, ‘first dim’, ‘second dim’, ‘third dim’ in that order.

FIXME parameters doc

Warning:	The cuDNN library only works with GPUs that have a compute capability of 3.0 or higher. This means that older GPUs will not work with this Op.

theano.sandbox.cuda.dnn.dnn_pool(img, ws, stride=None, mode='max', pad=None)

GPU pooling using cuDNN from NVIDIA.

For 2D pooling, the memory layout to use is ‘bc01’, that is ‘batch’, ‘channel’, ‘first dim’, ‘second dim’ in that order.

For 3D pooling, the memory layout to use is ‘bc012’, that is ‘batch’, ‘channel’, ‘first dim’, ‘second dim’, ‘third dim’.

Parameters:

img – Images to do the pooling over.
ws – Subsampling window size. Should have 2 or 3 elements.
stride – Subsampling stride (default: (1, 1) or (1, 1, 1)).
mode ({'max', 'average_inc_pad', 'average_exc_pad', 'sum'}) –
pad – Padding: (pad_h, pad_w) for 2D or (pad_h, pad_w, pad_d) for 3D. pad_h is the number of zero-valued pixels added to each of the top and bottom borders. pad_w is the number of zero-valued pixels added to each of the left and right borders. pad_d is the number of zero-valued pixels added to each of the front and back borders (3D pooling only).

Warning

The cuDNN library only works with GPUs that have a compute capability of 3.0 or higher. This means that older GPUs will not work with this Op.

Notes

This Op implements the ignore_border=True behavior of max_pool_2d.
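
With ignore_border=True, windows that do not fit entirely inside the (padded) input are dropped, which gives the output-shape arithmetic sketched below. The function pool_out_shape is a hypothetical helper, not part of Theano. It assumes a stride of 1 per dimension when stride is None (as stated in the parameter list above) and zero padding when pad is None.

```python
def pool_out_shape(ishape, ws, stride=None, pad=None):
    """Output shape of 2D or 3D pooling when partial windows are ignored.

    ishape is the full input shape ('bc01' or 'bc012'); ws, stride and
    pad have one entry per pooled dimension.
    """
    ndim = len(ws)
    stride = (1,) * ndim if stride is None else tuple(stride)
    pad = (0,) * ndim if pad is None else tuple(pad)
    # Each pooled dimension: floor((i + 2 * pad - window) / stride) + 1.
    spatial = tuple((i + 2 * p - w) // s + 1
                    for i, w, s, p in zip(ishape[-ndim:], ws, stride, pad))
    return tuple(ishape[:-ndim]) + spatial
```

For example, a 9x9 input pooled with a 2x2 window and stride 2 gives a 4x4 output, because the last row and column do not fill a complete window.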

theano.sandbox.cuda.dnn.values_eq_approx_high_tol(a, b)

This function is needed to keep DebugMode from raising spurious errors caused by rounding error.

This happens because we reduce over the last two dimensions, which can increase the absolute error when the number of elements reduced over is large.
