Caffe2 - C++ API
A deep learning, cross platform ML framework
caffe2::ElementwiseRTCOp Class Reference (final)

A GPU operator that can generate limited elementwise operations.

Inheritance diagram for caffe2::ElementwiseRTCOp:
caffe2::Operator< CUDAContext > caffe2::OperatorBase

Public Member Functions

 ElementwiseRTCOp (const OperatorDef &operator_def, Workspace *ws)
 
bool RunOnDevice () override
 
- Public Member Functions inherited from caffe2::Operator< CUDAContext >
 Operator (const OperatorDef &operator_def, Workspace *ws)
 
const Tensor< CUDAContext > & Input (int idx)
 
Tensor< CUDAContext > * Output (int idx)
 
bool Run (int stream_id=0) final
 
bool RunAsync (int stream_id=0) final
 
- Public Member Functions inherited from caffe2::OperatorBase
 OperatorBase (const OperatorDef &operator_def, Workspace *ws)
 
bool HasArgument (const string &name) const
 Checks if the operator has an argument of the given name.
 
template<typename T >
T GetSingleArgument (const string &name, const T &default_value) const
 
template<typename T >
bool HasSingleArgumentOfType (const string &name) const
 
template<typename T >
vector< T > GetRepeatedArgument (const string &name, const vector< T > &default_value={}) const
 
template<typename T >
const T & Input (int idx)
 
template<typename T >
T * Output (int idx)
 
const Blob & InputBlob (int idx)
 
Blob * OutputBlob (int idx)
 
template<typename T >
bool InputIsType (int idx)
 
template<typename T >
bool OutputIsType (int idx)
 
int InputSize ()
 
int OutputSize ()
 
const vector< const Blob * > & Inputs () const
 
const vector< Blob * > & Outputs ()
 
virtual void AddRelatedBlobInfo (EnforceNotMet *err)
 
const OperatorDef & def () const
 
const ArgumentHelper & arg_helper () const
 

Additional Inherited Members

- Protected Attributes inherited from caffe2::Operator< CUDAContext >
CUDAContext context_
 

Detailed Description

A GPU operator that can generate limited elementwise operations.

ElementwiseRTCOp allows one to do a simple and limited thing: it takes in multiple inputs and multiple outputs, as well as a raw string argument rtc_src. The runtime then generates the following kernel code:

    __global__ void kernel_name(const size_t nthreads, ...) {
      for(int index = blockIdx.x * blockDim.x + threadIdx.x;
          index < nthreads; index += blockDim.x * gridDim.x) {
        rtc_src
      }
    }

where the "..." part is auto generated, so one can refer to the inputs and outputs as in0, in1, ..., out0, out1, ... in the rtc_src string.
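The loop in the generated kernel is a grid-stride loop: each thread starts at its global index and advances by the total number of launched threads, so every index below nthreads is handled exactly once regardless of the launch size. The following host-side C++ sketch emulates that traversal on the CPU. CountVisits, grid_dim, and block_dim are hypothetical names introduced for illustration and are not part of Caffe2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of the generated loop (illustrative only).
// grid_dim and block_dim stand in for gridDim.x and blockDim.x; they are
// hypothetical launch parameters, not values taken from Caffe2.
std::vector<int> CountVisits(std::size_t nthreads, int grid_dim, int block_dim) {
  std::vector<int> visits(nthreads, 0);
  const std::size_t stride =
      static_cast<std::size_t>(block_dim) * static_cast<std::size_t>(grid_dim);
  for (int block = 0; block < grid_dim; ++block) {
    for (int thread = 0; thread < block_dim; ++thread) {
      // Same start and stride as the kernel template: start at the global
      // thread index, then step by the total number of threads in the grid.
      for (std::size_t index =
               static_cast<std::size_t>(block) * block_dim + thread;
           index < nthreads; index += stride) {
        ++visits[index];  // this is where rtc_src would run for `index`
      }
    }
  }
  return visits;
}
```

Calling CountVisits(10, 2, 3), for instance, simulates 6 threads covering 10 elements; each entry of the returned vector is 1, meaning no element is skipped or processed twice.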

For example, to do an elementwise vector multiplication, one can take two inputs and one output, and write rtc_src as out0[index] = in0[index] * in1[index];
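To make the substitution concrete, the sketch below assembles kernel source text from an rtc_src string following the template described above. It is a minimal illustration, not the operator's actual implementation: MakeKernelSource is a hypothetical helper, and the float element type and the exact parameter declarations are assumptions, since this page only says that the "..." part is auto generated.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper that wraps rtc_src in the grid-stride loop template.
// Assumption: inputs are `const float*` and outputs are `float*`; the page
// does not state the element type used by the real operator.
std::string MakeKernelSource(const std::string& kernel_name, int num_inputs,
                             int num_outputs, const std::string& rtc_src) {
  std::ostringstream ss;
  ss << "__global__ void " << kernel_name << "(const size_t nthreads";
  // Generated parameter list: in0, in1, ..., then out0, out1, ...
  for (int i = 0; i < num_inputs; ++i) {
    ss << ", const float* in" << i;
  }
  for (int i = 0; i < num_outputs; ++i) {
    ss << ", float* out" << i;
  }
  ss << ") {\n"
     << "  for (int index = blockIdx.x * blockDim.x + threadIdx.x;\n"
     << "       index < nthreads; index += blockDim.x * gridDim.x) {\n"
     << "    " << rtc_src << "\n"  // user-supplied body, inserted verbatim
     << "  }\n"
     << "}\n";
  return ss.str();
}
```

With two inputs, one output, and the multiplication body from the example, MakeKernelSource("mul_kernel", 2, 1, "out0[index] = in0[index] * in1[index];") yields a kernel whose signature names in0, in1, and out0, which is why those identifiers can be used directly inside rtc_src.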

This op is currently highly experimental. We do not have a gradient registered for it either.

Definition at line 72 of file elemenntwise_rtc_gpu.cc.

