Insights from an expert

When I shamelessly asked Mr. Samir Shaker for the code accompanying his paper on implementing the SLAM algorithm with OpenCL, I received insightful and highly relevant advice based on his hard-earned experience, which I have reproduced in part here (with his permission, of course):

 [...] it seems that you are using the AMD implementation of OpenCL. I have worked with both the AMD and Nvidia implementations extensively, and it would be safe to say that Nvidia's implementation is much faster and much more complete. The biggest flaw in the AMD implementation, I would say, is the lack of support for images in OpenCL. This is a driver issue, and they plan on supporting images eventually, but after all the time that has passed since the OpenCL standard was released, they still haven't done so! My code uses images, so it will only run on an Nvidia implementation (for now).

Also, as a general remark, I would like to tell you that from experience (and a lot of reading), not all algorithms are faster on the GPU, even those that can be parallelized. Whether or not you get faster speeds depends on many factors. For example, off the top of my head, performance on the GPU depends on:

1- The speed of the GPU (this seems obvious, but): Most integrated GPUs and those in standard laptops (not high-end ones) are slower than the CPUs on board. So running an algorithm on those GPUs will prove much slower than running it on the available CPU.

2- Type of algorithm: If the algorithm requires a lot of data transfer between the CPU and the GPU, this will likely be a huge bottleneck.

3- The GPU manufacturer: For now, Nvidia's implementation is much better than AMD's or Intel's, which is natural since they got into the GPU computing game much earlier than the rest and more or less paved the way for everyone else.

4- If you are working on a mobile robot and computation is done on board the robot (as opposed to wirelessly on a desktop PC), having a fast enough GPU on board is probably not feasible, since such GPUs consume a lot of power and it would be hard to procure a battery powerful enough to handle them.

5- In practice (at least with today's technology), the best time to use GPU computation is when you have a desktop PC with a high-end Nvidia GPU (the kind that requires a larger system power supply) and an algorithm that can be easily parallelized.
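To make point 2 concrete, here is a rough back-of-the-envelope sketch in Python of when host-device transfer time wipes out a kernel speedup. All of the numbers (PCIe bandwidth, data sizes, speedup factor) are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope model of the CPU<->GPU transfer bottleneck (point 2).
# All numbers below are illustrative assumptions, not measurements.

def gpu_worthwhile(data_bytes, cpu_time_s, gpu_speedup, pcie_gbps=8.0):
    """Return True if GPU total time (transfer + compute) beats the CPU.

    pcie_gbps: assumed effective PCIe bandwidth in GB/s (roughly the
    order of PCIe 2.0 x16; real throughput varies by system).
    """
    transfer_s = 2 * data_bytes / (pcie_gbps * 1e9)  # copy in + copy out
    gpu_total = transfer_s + cpu_time_s / gpu_speedup
    return gpu_total < cpu_time_s

# A compute-heavy workload: 256 MB of data, 2 s on the CPU, 20x GPU speedup.
print(gpu_worthwhile(256 * 2**20, 2.0, 20.0))   # transfer is negligible -> True

# A transfer-heavy workload: same data, but only 50 ms of CPU work.
print(gpu_worthwhile(256 * 2**20, 0.05, 20.0))  # transfers dominate -> False
```

The point of the sketch is only that the decision hinges on the ratio of compute time to data volume, not on the GPU's raw speed alone.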


  1. Hi

    I couldn't resist sharing my thoughts on this topic. I have worked with OpenCL on NVIDIA and AMD devices for over two years now, and my experience is the opposite of what is stated above.

    First of all: NVIDIA's OpenCL 1.1 implementation was beta until very recently, and their previous implementations were riddled with bugs. I even managed to write code that crashed their assembly compiler! AMD, on the other hand, had a 1.1 implementation that worked from day one, and they have a stable 1.2 implementation available now. NVIDIA hasn't said a word about 1.2 yet.

    Images: AMD lacking image support?? Not true. Images are supported on all devices from the 5xxx to the newest 7xxx series. It is actually NVIDIA that lacks proper image support: they don't support writing to 3D images in kernels on ANY device. This is a big problem for me, since I do a lot of 3D image processing. Due to this restriction, a lot of my code runs much faster on AMD's 5xxx and 7xxx cards than on the Tesla C2070 we have in our lab, which costs 5+ times more than the AMD cards.
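    For context, the restriction Erik describes concerns device-side writes to `image3d_t`, which OpenCL 1.x gates behind the optional `cl_khr_3d_image_writes` extension. A minimal OpenCL C kernel like the following (a sketch; the kernel and argument names are made up) will only build on implementations that expose that extension, which at the time AMD did and NVIDIA did not:

    ```c
    /* OpenCL C device code (not host C). Writing to image3d_t requires
     * the optional cl_khr_3d_image_writes extension. */
    #pragma OPENCL EXTENSION cl_khr_3d_image_writes : enable

    __kernel void copy_volume(__read_only image3d_t src,
                              __write_only image3d_t dst)
    {
        int4 pos = (int4)(get_global_id(0), get_global_id(1),
                          get_global_id(2), 0);
        const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
                              CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;
        float4 v = read_imagef(src, smp, pos);
        write_imagef(dst, pos, v);  /* illegal without the extension */
    }
    ```

    Without the extension, the usual workaround is to write results into a plain `__global` buffer and copy it into the 3D image on the host, which costs an extra transfer.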

    So to sum up my experience: AMD OpenCL >> NVIDIA OpenCL :)

    - Erik Smistad

  2. Thanks for your thoughts!

    Actually, I have done some quick research myself, and I arrived at the conclusion that AMD's latest offerings when it comes to OpenCL surpass those of nVidia in many respects.

    AMD recently released their flagship graphics card, the "HD 7970", which seems to beat nVidia in all sorts of benchmarks, especially for OpenCL. Sources: