Version 3 (modified by davea, 16 years ago)

--

CUDA applications

Device selection

Your application will be run with an additional command-line argument

--device m

where m specifies which GPU to use (0..n-1). Pass this value to cudaSetDevice().

If your application uses multiple GPUs, there will be multiple --device arguments.
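The argument handling described above could be sketched as follows. This is a minimal illustration, not BOINC code: the helper name ParseDeviceArgs is hypothetical, and it assumes each --device is followed by a separate non-negative integer argument.

```cpp
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical helper: collects the value of every "--device m" pair in argv.
// Returns an empty vector if no valid --device argument is present.
std::vector<int> ParseDeviceArgs(int argc, char** argv) {
    std::vector<int> devices;
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "--device") != 0) continue;
        char* end = nullptr;
        long value = std::strtol(argv[i + 1], &end, 10);
        // Accept only a complete, non-negative integer.
        if (end != argv[i + 1] && *end == '\0' && value >= 0) {
            devices.push_back(static_cast<int>(value));
            ++i;  // skip the value that was just consumed
        }
    }
    return devices;
}
```

A single-GPU application would pass the first element of the result to cudaSetDevice(); a multi-GPU application would iterate over all of them.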

CUDA initialization

The function below uses the CUDA Driver API to set up a blocking-sync context. Call it just before cudaSetDevice() during initialization; thread yielding is then provided automatically when you launch device kernels from your host code. There is no need to poll or use a sleep loop: the driver puts your thread to sleep, and it resumes very quickly once the GPU completes the scheduled task.

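#include <cuda.h>   // CUDA Driver API

// Creates a driver-API context with blocking sync on the given device.
// Returns false if any driver call fails.
bool SetCUDABlockingSync(int device) {
    CUdevice  hcuDevice;
    CUcontext hcuContext;

    // Initialize the driver API; the flags argument must be 0.
    CUresult status = cuInit(0);
    if (status != CUDA_SUCCESS)
        return false;

    status = cuDeviceGet(&hcuDevice, device);
    if (status != CUDA_SUCCESS)
        return false;

    // 0x4 is the blocking-sync context flag (CU_CTX_BLOCKING_SYNC in cuda.h).
    status = cuCtxCreate(&hcuContext, 0x4, hcuDevice);
    if (status != CUDA_SUCCESS)
        return false;

    return true;
}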

Don't set up a CUDA context through the CUDART runtime API before calling the driver API code above. If there are ANY calls into the CUDART runtime DLLs (other than possibly cudaGetDeviceCount() and cudaGetDeviceProperties()) before the driver API blocking-sync context is set up, the driver API call will have no effect.
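The ordering requirement can be made explicit with a small wrapper. The sketch below is only an illustration: the name InitGpuInOrder is hypothetical, and the two steps are passed in as callables so that the ordering logic compiles without the CUDA headers.

```cpp
#include <functional>

// Hypothetical helper: runs the two initialization steps in the required order.
//   setBlockingSync: e.g. SetCUDABlockingSync (driver API)
//   setDevice:       e.g. a wrapper around cudaSetDevice() (runtime API)
// Returns false as soon as either step fails.
bool InitGpuInOrder(int device,
                    const std::function<bool(int)>& setBlockingSync,
                    const std::function<bool(int)>& setDevice) {
    // Step 1: create the blocking-sync context through the driver API first.
    if (!setBlockingSync(device)) return false;
    // Step 2: only now call into the runtime API.
    return setDevice(device);
}

// With the CUDA toolkit available, a call might look like:
//   InitGpuInOrder(device, SetCUDABlockingSync,
//                  [](int d) { return cudaSetDevice(d) == cudaSuccess; });
```

If the first step fails, the runtime API is never called, so the caller can decide whether to continue without blocking sync or to abort.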

Thread priority

Your application should use normal thread priority on Windows. To do this, initialize BOINC as follows:

BOINC_OPTIONS options;
boinc_options_defaults(options);
options.normal_thread_priority = 1;
boinc_init_options(&options);

Performance estimation

Information on performance counters: http://developer.nvidia.com/object/nvperfkit_counters.html