Mr. Aryeh J. Kuller Profile

Aryeh J. Kuller

Research Fellow

SPIE Involvement:

Author

Publications (2)

Proceedings Article | 13 June 2014 Paper

Targeting multiple heterogeneous hardware platforms with OpenCL

Paul Fox, Stephen Kozacik, John Humphrey, Aaron Paolini, Aryeh Kuller, Eric Kelmelis

Proceedings Volume 9095, 90950E (2014) https://doi.org/10.1117/12.2050643

KEYWORDS: Control systems, Computer programming, Switching, Parallel computing, Manufacturing, Standards development, Wavefronts, Detection and tracking algorithms, Algorithm development, Photonics

The OpenCL API allows for the abstract expression of parallel, heterogeneous computing, but hardware implementations differ substantially. The abstractions provided by the OpenCL API are often insufficiently high-level to conceal differences in hardware architecture. Additionally, implementations often do not take advantage of potential performance gains from certain features due to hardware limitations and other factors. These factors make it challenging to produce code that is portable in practice, resulting in much OpenCL code being duplicated for each hardware platform being targeted. This duplication of effort offsets the principal advantage of OpenCL: portability. The use of certain coding practices can mitigate this problem, allowing a common code base to be adapted to perform well across a wide range of hardware platforms. To this end, we explore some general practices for producing performant code that are effective across platforms. Additionally, we explore some ways of modularizing code to enable optional optimizations that take advantage of hardware-specific characteristics. The minimum requirement for portability implies avoiding the use of OpenCL features that are optional, not widely implemented, poorly implemented, or missing in major implementations. Exposing multiple levels of parallelism allows hardware to take advantage of the types of parallelism it supports, from the task level down to explicit vector operations. Static optimizations and branch elimination in device code help the platform compiler to effectively optimize programs. Modularization of some code is important to allow operations to be chosen for performance on target hardware. Optional subroutines exploiting explicit memory locality allow for different memory hierarchies to be exploited for maximum performance. The C preprocessor and JIT compilation using the OpenCL runtime can be used to enable some of these techniques, as well as to factor in hardware-specific optimizations as necessary.
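
The last point of the abstract, combining the C preprocessor with runtime (JIT) compilation, can be illustrated with a minimal host-side sketch. It is not taken from the paper: the macro name VEC_WIDTH, the DeviceTraits class, the build_options helper, and the kernel itself are all hypothetical, chosen only to show how one shared kernel source might be specialized per device through `-D` compiler options.

```python
from dataclasses import dataclass

# One shared OpenCL C code base. VEC_WIDTH is a hypothetical macro for this
# sketch; the runtime compiler resolves the #if statically, so no branch
# remains in the compiled device code.
KERNEL_SOURCE = r"""
#ifndef VEC_WIDTH
#define VEC_WIDTH 1
#endif

#if VEC_WIDTH == 4
typedef float4 vec_t;   /* explicit vector path */
#else
typedef float vec_t;    /* scalar fallback path */
#endif

__kernel void scale(__global vec_t *data, const float factor)
{
    size_t i = get_global_id(0);
    data[i] = data[i] * factor;
}
"""


@dataclass(frozen=True)
class DeviceTraits:
    """Hypothetical summary of values the host queried from a device."""
    # Assumed to come from a query such as CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT.
    preferred_float_vector_width: int


def build_options(traits: DeviceTraits) -> str:
    """Return a compiler options string that specializes KERNEL_SOURCE."""
    # Use the float4 path only when the device reports a preference for it.
    vec_width = 4 if traits.preferred_float_vector_width >= 4 else 1
    defines = {"VEC_WIDTH": vec_width}
    return " ".join(f"-D {name}={value}" for name, value in defines.items())
```

In a real host program, the returned string would be handed to the OpenCL runtime compiler (the `options` argument of `clBuildProgram` in the C API) together with KERNEL_SOURCE, so that each device gets a build of the same source tuned to its reported characteristics.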

Proceedings Article | 13 June 2014 Paper

Optimization techniques for OpenCL-based linear algebra routines

Stephen Kozacik, Paul Fox, John Humphrey, Aryeh Kuller, Eric Kelmelis, Dennis Prather

Proceedings Volume 9095, 90950D (2014) https://doi.org/10.1117/12.2050673

KEYWORDS: Linear algebra, Field programmable gate arrays, Matrix multiplication, Matrices, Standards development, Algorithm development, Optimization (mathematics), Graphics processing units, Computer programming, Image processing
