CUPTI

What's New

CUPTI contains below changes as part of the CUDA Toolkit 7.5 release.
  • Device-wide sampling of the program counter (PC) is now enabled by default. This was a preview feature in the CUDA Toolkit 7.0 release and it was not enabled by default.
  • Ability to collect all events and metrics accurately in presence of multiple contexts on the GPU is extended for devices with compute capability 5.x.
  • API cuptiGetLastError is introduced to return the last error that has been produced by any of the cupti API calls or the callbacks in the same host thread.
  • Unified memory profiling is now supported with MPS (Multi-Process Service)
  • Callback is provided to collect replay information after every kernel run during kernel replay. See API cuptiKernelReplaySubscribeUpdate and callback type CUpti_KernelReplayUpdateFunc.
  • Added new attributes in enum CUpti_DeviceAttribute to query maximum shared memory size for different cache preferences for a device function.

Table of Contents