CUDA applications
Device selection
Your application will be run with an additional command-line argument
--device m
where m specifies which GPU to use (0..n-1). Pass this value to cudaSetDevice().
If your application uses multiple GPUs, there will be multiple --device arguments.
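Below is a minimal sketch of collecting these arguments. It assumes the client passes each index as a separate argv token right after --device. The helper name parse_device_args is hypothetical and is not part of the BOINC or CUDA APIs. It returns the indices in the order they appear and skips malformed values.

```cpp
#include <climits>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical helper: collect the value of every "--device m" pair in argv.
// Each returned index is meant to be passed to cudaSetDevice() later.
std::vector<int> parse_device_args(int argc, char** argv) {
    std::vector<int> devices;
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "--device") != 0) continue;
        char* end = nullptr;
        long m = std::strtol(argv[i + 1], &end, 10);
        // Accept only a complete, non-negative integer that fits in an int.
        if (end != argv[i + 1] && *end == '\0' && m >= 0 && m <= INT_MAX) {
            devices.push_back(static_cast<int>(m));
        }
        ++i;  // skip the value token we just examined
    }
    return devices;
}
```

With a single GPU, the application would call cudaSetDevice(devices[0]). With several GPUs, each host thread would call cudaSetDevice() with its own entry before making any other CUDA runtime calls.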
CUDA initialization
The function below can be called during CUDA initialization. It uses the CUDA Driver API to set up a context with blocking synchronization. Call it just before you call cudaSetDevice(). After that, your host thread yields the CPU automatically whenever it waits for the device kernels it has launched, so there is no need to poll or write a sleep loop. The driver does the sleeping, and your thread resumes very quickly when the GPU completes the scheduled task.
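#include <cuda.h>

// Create a driver API context on `device` that uses blocking synchronization.
bool SetCUDABlockingSync(int device) {
    CUdevice hcuDevice;
    CUcontext hcuContext;
    CUresult status = cuInit(0);
    if (status != CUDA_SUCCESS) return false;
    status = cuDeviceGet(&hcuDevice, device);
    if (status != CUDA_SUCCESS) return false;
    // 0x4 is the blocking-sync scheduling flag
    // (CU_CTX_SCHED_BLOCKING_SYNC; CU_CTX_BLOCKING_SYNC in older cuda.h).
    status = cuCtxCreate(&hcuContext, 0x4, hcuDevice);
    if (status != CUDA_SUCCESS) return false;
    return true;
}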
Don't set up a CUDA context through the CUDA runtime (CUDART) API before calling this driver API code. If there are ANY calls into the CUDART runtime DLL (other than possibly cudaGetDeviceCount() and cudaGetDeviceProperties()) before the blocking-sync context is set up through the driver API, the driver API call will have no effect.
Thread priority
Your application should use normal thread priority on Windows. To do this, initialize BOINC as follows:
#include "boinc_api.h"

BOINC_OPTIONS options;
boinc_options_defaults(options);
options.normal_thread_priority = 1;  // run at normal rather than idle priority on Windows
boinc_init_options(&options);
Performance estimation
Information on performance counters: http://developer.nvidia.com/object/nvperfkit_counters.html