Hi. My name is Alex Wiltschko.

This is my internet log.



17 June, 2011

PyOpenCL vs PyCUDA, a naïve comparison

I benchmarked a simple elementwise kernel, written in both PyOpenCL and PyCUDA, of the form

C(t) = a*A(t) + b*B(t)
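For concreteness, here is a minimal sketch of how a kernel of this form might be written with the `ElementwiseKernel` helper that both libraries provide. It is not the code behind the timings in this post. The function names (`reference`, `run_pyopencl`, `run_pycuda`), the kernel name `lin_comb`, and the choice of `float32` are assumptions made for illustration. The GPU libraries are imported lazily, so the CPU reference still works on a machine without them.

```python
# Kernel pieces shared by both libraries' ElementwiseKernel helpers.
# Inside the operation string, `i` is the element index the helper supplies.
ARGUMENTS = "float a, float *A, float b, float *B, float *C"
OPERATION = "C[i] = a*A[i] + b*B[i]"


def reference(a, A, b, B):
    """Pure-Python CPU reference, useful for checking GPU results."""
    return [a * x + b * y for x, y in zip(A, B)]


def run_pyopencl(a, A, b, B):
    """Evaluate the kernel with PyOpenCL (needs an OpenCL device)."""
    import numpy as np
    import pyopencl as cl
    import pyopencl.array as cl_array
    from pyopencl.elementwise import ElementwiseKernel

    # May prompt for a platform/device if several are available.
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    kernel = ElementwiseKernel(ctx, ARGUMENTS, OPERATION, "lin_comb")

    A_dev = cl_array.to_device(queue, np.asarray(A, dtype=np.float32))
    B_dev = cl_array.to_device(queue, np.asarray(B, dtype=np.float32))
    C_dev = cl_array.empty_like(A_dev)
    kernel(np.float32(a), A_dev, np.float32(b), B_dev, C_dev)
    return C_dev.get()  # copy the result back to main memory


def run_pycuda(a, A, b, B):
    """Evaluate the kernel with PyCUDA (needs a CUDA device)."""
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.gpuarray as gpuarray
    from pycuda.elementwise import ElementwiseKernel

    # Same arguments and operation; PyCUDA takes no explicit context here.
    kernel = ElementwiseKernel(ARGUMENTS, OPERATION, "lin_comb")

    A_dev = gpuarray.to_gpu(np.asarray(A, dtype=np.float32))
    B_dev = gpuarray.to_gpu(np.asarray(B, dtype=np.float32))
    C_dev = gpuarray.empty_like(A_dev)
    kernel(np.float32(a), A_dev, np.float32(b), B_dev, C_dev)
    return C_dev.get()  # copy the result back to main memory
```

In this sketch the two GPU functions differ only in how the context is set up and how arrays are moved to the device. The kernel arguments and operation are shared verbatim.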

[Figure: PyOpenCL results]

[Figure: PyCUDA results]

Here, "w/ access" and "w/out access" mean slightly different things than they did before. In both cases I pull the data off the device and into main memory; the difference is that the "w/ access" case pulls the data down at every repetition (I redo the computation 100 times for each array size). It looks like PyCUDA has some difficulty transferring data from the GPU back to main memory, at least in the particular way I've written the code. I don't think I'm unfairly penalizing PyCUDA, at least from the perspective of a user of the library, because the two libraries seem designed to be used in very similar ways. Only a few modifications are needed to convert code from one library to the other.
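To make the two timing modes concrete, here is a rough sketch of how such a benchmark loop might be structured. It is not the original benchmark code. The names `time_kernel`, `compute`, `fetch` and `fetch_every_rep` are hypothetical, and it assumes the "w/out access" case copies the result back once, after the last repetition.

```python
import time


def time_kernel(compute, fetch, repetitions=100, fetch_every_rep=True):
    """Time `repetitions` calls of `compute`, copying results back with `fetch`.

    compute: callable that launches the kernel and returns a device-side result.
    fetch:   callable that copies a device-side result into main memory.
    fetch_every_rep=True  models "w/ access": copy back after every repetition.
    fetch_every_rep=False models "w/out access": copy back once, at the end.
    """
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")

    start = time.perf_counter()
    host_result = None
    for _ in range(repetitions):
        device_result = compute()
        if fetch_every_rep:
            host_result = fetch(device_result)
    if not fetch_every_rep:
        # Single transfer after all repetitions have been issued.
        host_result = fetch(device_result)
    elapsed = time.perf_counter() - start
    return elapsed, host_result
```

With either library, `compute` would wrap the kernel call and `fetch` would wrap the device array's `.get()` method. Because the copy back to the host blocks until the kernel has finished, the final `fetch` also ensures the timing covers the completed computation rather than just the launches.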

The issue, though, is that PyCUDA contains some really serious conveniences that PyOpenCL (currently) lacks, like gradient optimization and sparse matrix multiplication. I'm still digging into this stuff, and status reports will be forthcoming when interesting things are uncovered.
