OpenCL Programming by Example
A comprehensive guide on OpenCL programming with examples
Ravishekhar Banger
Koushik Bhattacharyya
BIRMINGHAM - MUMBAI
OpenCL Programming by Example
Copyright © 2013 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, without the prior written
permission of the publisher, except in the case of brief quotations embedded in
critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy
of the information presented. However, the information contained in this book is
sold without warranty, either express or implied. Neither the authors, nor Packt
Publishing, and its dealers and distributors will be held liable for any damages
caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the
companies and products mentioned in this book by the appropriate use of capitals.
However, Packt Publishing cannot guarantee the accuracy of this information.
First published: December 2013
Production Reference: 1161213
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-84969-234-2
www.packtpub.com
Cover Image by Asher Wishkerman ([email protected])
Credits
Authors: Ravishekhar Banger, Koushik Bhattacharyya
Reviewers: Thomas Gall, Erik Rainey, Erik Smistad
Acquisition Editors: Wilson D'souza, Kartikey Pandey, Kevin Colaco
Lead Technical Editor: Arun Nadar
Technical Editors: Gauri Dasgupta, Dipika Gaonkar, Faisal Siddiqui
Project Coordinators: Wendell Palmer, Amey Sawant
Proofreader: Mario Cecere
Indexers: Rekha Nair, Priya Subramani
Graphics: Sheetal Aute, Ronak Dhruv, Yuvraj Mannari, Abhinash Sahu
Production Coordinator: Conidon Miranda
Cover Work: Conidon Miranda
About the Authors
Ravishekhar Banger calls himself a "Parallel Programming Dogsbody". He is
currently a specialist in OpenCL programming and works on library optimization
using OpenCL. After graduating from SDMCET, Dharwad, in Electrical Engineering, he
completed his Master's in Computer Technology at the Indian Institute of Technology,
Delhi. With more than eight years of industry experience, his present interests
lie in General Purpose GPU programming models, parallel programming, and
performance optimization for the GPU. Having worked for Samsung and Motorola,
he is now a Member of Technical Staff at Advanced Micro Devices, Inc. One of his
dreams is to cover most of the Himalayas on foot in various expeditions. You can
reach him at [email protected].
Koushik Bhattacharyya works at Advanced Micro Devices, Inc. as a
Member of Technical Staff and previously worked as a software developer at NVIDIA®. He
earned his M.Tech in Computer Science (Gold Medalist) from the Indian Statistical Institute,
Kolkata, and his M.Sc in pure mathematics from Burdwan University. With more than
ten years of experience in software development using a number of languages and
platforms, Koushik's present areas of interest include parallel programming and
machine learning.
We would like to take this opportunity to thank Packt Publishing
for giving us the chance to write this book.
Special thanks also go to all our family members, friends, and
colleagues, who have helped us directly or indirectly in writing
this book.
About the Reviewers
Thomas Gall had his first experience with accelerated coprocessors on the
Amiga back in 1986. After working with IBM for twenty years, he now works
as a Principal Engineer and serves as Linaro.org's technical lead for the Graphics
Working Group. He manages the Graphics and GPGPU teams. The GPGPU team
is dedicated to optimizing existing open source software to take advantage of GPGPU
technologies such as OpenCL, as well as to implementing GPGPU drivers for
ARM-based SoC systems.
Erik Rainey works at Texas Instruments, Inc. as a Senior Software Engineer on
Computer Vision software frameworks in embedded platforms in the automotive,
safety, industrial, and robotics markets. He has a young son, whom he loves playing
with when not working, and enjoys other pursuits such as music, drawing, crocheting,
painting, and occasionally a video game. He is currently involved in creating the
Khronos Group's OpenVX, the specification for computer vision acceleration.
Erik Smistad is a PhD candidate at the Norwegian University of Science and
Technology, where he uses OpenCL and GPUs to quickly locate organs and other
anatomical structures in medical images for the purpose of helping surgeons
navigate inside the body during surgery. He writes about OpenCL and his projects
on his blog, thebigblob.com, and shares his code at github.com/smistad.
www.PacktPub.com
Support files, eBooks, discount offers and more
You might want to visit www.PacktPub.com for support files and downloads related to
your book.
Did you know that Packt offers eBook versions of every book published, with PDF and
ePub files available? You can upgrade to the eBook version at www.PacktPub.com and,
as a print book customer, you are entitled to a discount on the eBook copy. Get in touch
with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up
for a range of free newsletters and receive exclusive discounts and offers on Packt books
and eBooks.
http://PacktLib.PacktPub.com
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital
book library. Here, you can access, read and search across Packt's entire library of books.
Why Subscribe?
• Fully searchable across every book published by Packt
• Copy and paste, print and bookmark content
• On demand and accessible via web browser
Free Access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access
PacktLib today and view nine entirely free books. Simply use your login credentials for
immediate access.
Table of Contents
Preface
Chapter 1: Hello OpenCL
Advances in computer architecture
Different parallel programming techniques
OpenMP
MPI
OpenACC
CUDA
CUDA or OpenCL?
Renderscripts
Hybrid parallel computing model
Introduction to OpenCL
Hardware and software vendors
Advanced Micro Devices, Inc. (AMD)
NVIDIA®
Intel®
ARM Mali™ GPUs
OpenCL components
An example of OpenCL program
Basic software requirements
Windows
Linux
Installing and setting up an OpenCL compliant computer
Installation steps
Installing OpenCL on a Linux system with an AMD graphics card
Installing OpenCL on a Linux system with an NVIDIA graphics card
Installing OpenCL on a Windows system with an AMD graphics card
Installing OpenCL on a Windows system with an NVIDIA graphics card
Apple OSX
Multiple installations
Implement the SAXPY routine in OpenCL
Summary
References
Chapter 2: OpenCL Architecture
Platform model
AMD A10 5800K APUs
AMD Radeon™ HD 7870 Graphics Processor
NVIDIA® GeForce® GTX 680 GPU
Intel® Ivy Bridge
Platform versions
Query platforms
Query devices
Execution model
NDRange
OpenCL context
OpenCL command queue
Memory model
Global memory
Constant memory
Local memory
Private memory
OpenCL ICD
What is an OpenCL ICD?
Application scaling
Summary
Chapter 3: OpenCL Buffer Objects
Memory objects
Creating subbuffer objects
Histogram calculation
Algorithm
OpenCL Kernel Code
The Host Code
Reading and writing buffers
Blocking_read and Blocking_write
Rectangular or cuboidal reads
Copying buffers
Mapping buffer objects
Querying buffer objects
Undefined behavior of the cl_mem objects
Summary
Chapter 4: OpenCL Images
Creating images
Image format descriptor cl_image_format
Image details descriptor cl_image_desc
Passing image buffers to kernels
Samplers
Reading and writing buffers
Copying and filling images
Mapping image objects
Querying image objects
Image histogram computation
Summary
Chapter 5: OpenCL Program and Kernel Objects
Creating program objects
Creating and building program objects
OpenCL program building options
Querying program objects
Creating binary files
Offline and online compilation
SAXPY using the binary file
SPIR – Standard Portable Intermediate Representation
Creating kernel objects
Setting kernel arguments
Executing the kernels
Querying kernel objects
Querying kernel argument
Releasing program and kernel objects
Built-in kernels
Summary
Chapter 6: Events and Synchronization
OpenCL events and monitoring these events
OpenCL event synchronization models
No synchronization needed
Single device in-order usage
Synchronization needed
Single device and out-of-order queue
Multiple devices and different OpenCL contexts
Multiple devices and single OpenCL context
Coarse-grained synchronization
Event-based or fine-grained synchronization
Getting information about cl_event