.. _gpu_data_convert:

===================================
PyCUDA/CUDAMat/Gnumpy compatibility
===================================

PyCUDA
======

Currently, PyCUDA and Theano use different objects to store GPU data,
and the two implementations do not support the same set of features.
Theano's implementation is called *CudaNdarray*. It supports *strides*,
but only the *float32* dtype. PyCUDA's implementation is called
*GPUArray*. It doesn't support strides, but it can deal with all NumPy
and CUDA dtypes.

We are currently working on a common base object for both that will
also mimic NumPy. Until this is ready, here is some information on how
to use both objects in the same script.

Transfer
--------

You can use the ``theano.misc.pycuda_utils`` module to convert a
GPUArray to and from a CudaNdarray. The functions ``to_cudandarray(x)``
and ``to_gpuarray(x, copyif=False)`` return a new object that occupies
the same memory space as the original. Because GPUArrays don't support
strides, a strided CudaNdarray cannot be wrapped directly:
``to_gpuarray`` raises a *ValueError* in that case. If you pass
``copyif=True``, it returns a contiguous copy instead, but the
resulting GPUArray won't share the same memory region as the original.

Compiling with PyCUDA
---------------------

You can use PyCUDA to compile CUDA functions that work directly on
CudaNdarrays. Here is an example from the file
``theano/misc/tests/test_pycuda_theano_simple.py``:

.. code-block:: python

    import sys

    import numpy

    import theano
    import theano.sandbox.cuda as cuda_ndarray
    import theano.misc.pycuda_init
    import pycuda
    import pycuda.driver as drv
    import pycuda.gpuarray


    def test_pycuda_theano():
        """Simple example with a PyCUDA function and a Theano CudaNdarray object."""
        from pycuda.compiler import SourceModule
        mod = SourceModule("""
    __global__ void multiply_them(float *dest, float *a, float *b)
    {
        const int i = threadIdx.x;
        dest[i] = a[i] * b[i];
    }
    """)
        multiply_them = mod.get_function("multiply_them")

        a = numpy.random.randn(100).astype(numpy.float32)
        b = numpy.random.randn(100).astype(numpy.float32)

        # Test with Theano object
        ga = cuda_ndarray.CudaNdarray(a)
        gb = cuda_ndarray.CudaNdarray(b)
        dest = cuda_ndarray.CudaNdarray.zeros(a.shape)
        # The kernel has no bounds check, so the block size must match
        # the number of elements.
        multiply_them(dest, ga, gb,
                      block=(100, 1, 1), grid=(1, 1))
        assert (numpy.asarray(dest) == a * b).all()

Theano Op using a PyCUDA function
---------------------------------

You can use a GPU function compiled with PyCUDA in a Theano op:

.. code-block:: python

    import numpy

    import theano
    import theano.misc.pycuda_init
    from pycuda.compiler import SourceModule
    import theano.sandbox.cuda as cuda


    class PyCUDADoubleOp(theano.Op):

        __props__ = ()

        def make_node(self, inp):
            inp = cuda.basic_ops.gpu_contiguous(
                cuda.basic_ops.as_cuda_ndarray_variable(inp))
            assert inp.dtype == "float32"
            return theano.Apply(self, [inp], [inp.type()])

        def make_thunk(self, node, storage_map, _, _2, impl=None):
            mod = SourceModule("""
    __global__ void my_fct(float * i0, float * o0, int size) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < size) {
            o0[i] = i0[i] * 2;
        }
    }""")
            pycuda_fct = mod.get_function("my_fct")
            inputs = [storage_map[v] for v in node.inputs]
            outputs = [storage_map[v] for v in node.outputs]

            def thunk():
                z = outputs[0]
                # Allocate the output only if needed (shape change or first call).
                if z[0] is None or z[0].shape != inputs[0][0].shape:
                    z[0] = cuda.CudaNdarray.zeros(inputs[0][0].shape)
                grid = (int(numpy.ceil(inputs[0][0].size / 512.)), 1)
                pycuda_fct(inputs[0][0], z[0], numpy.intc(inputs[0][0].size),
                           block=(512, 1, 1), grid=grid)

            return thunk
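To exercise the op, you can wrap it in a regular Theano function. This is a
minimal sketch, assuming a working CUDA setup (e.g. ``device=gpu`` in your
Theano flags) and the ``PyCUDADoubleOp`` class defined above:

.. code-block:: python

    # Minimal usage sketch for the PyCUDADoubleOp defined above.
    # Assumes Theano is configured with a CUDA device.
    import numpy
    import theano

    x = theano.tensor.fmatrix("x")
    f = theano.function([x], PyCUDADoubleOp()(x))

    xv = numpy.ones((4, 5), dtype="float32")
    # The op doubles every element on the GPU.
    assert numpy.allclose(numpy.asarray(f(xv)), xv * 2)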
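Finally, here is a short round-trip sketch of the transfer functions
described in the Transfer section above. The import path and the
``copyif`` parameter follow that section; treat the exact signatures as
an assumption to check against your Theano version:

.. code-block:: python

    # Round-trip sketch between CudaNdarray and GPUArray, assuming a
    # working CUDA setup and the signatures documented above.
    import numpy
    import theano.sandbox.cuda as cuda_ndarray
    import theano.misc.pycuda_init  # initializes PyCUDA on Theano's device
    from theano.misc.pycuda_utils import to_cudandarray, to_gpuarray

    a = numpy.random.randn(4, 5).astype(numpy.float32)
    ca = cuda_ndarray.CudaNdarray(a)  # Theano's GPU object

    # Wrap the same memory as a PyCUDA GPUArray. copyif=True falls back
    # to a contiguous copy if the CudaNdarray is strided (in that case
    # the memory is no longer shared).
    ga = to_gpuarray(ca, copyif=True)

    # Wrap back into a CudaNdarray occupying the GPUArray's memory.
    cb = to_cudandarray(ga)
    assert numpy.allclose(numpy.asarray(cb), a)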