
ARM Guide to OpenCL Optimizing Convolution: The Reference Implementation


This chapter presents some initial implementations of convolution and establishes the reference implementation for the optimization process.

These are the simplest and most intuitive implementations of convolution algorithms.

Convolution example in C and C++, run on the application processor

Before any optimizations are applied, the algorithm runs entirely on the application processor. Any optimization that uses OpenCL and GPU compute starts from this state.

The 2D convolution operation requires two pairs of nested loops:

  • Two loops that scan each pixel of the source image.
  • Two loops that scan each coefficient of the convolution matrix.

The code is composed of the following main parts:

  1. Convolution computation with two loops.
    The two loops compute the sum of products between all convolution matrix values H(i, j) and the corresponding source image elements I(u + x, v + y).

    // Convolution computation
    for(int32_t yKernel = -1; yKernel < 2; yKernel++)
    {
        for(int32_t xKernel = -1; xKernel < 2; xKernel++)
        {
            // Compute address
            int32_t xPx = (x + xKernel)*src->pixel_size();
            int32_t scanline = (y + yKernel)*strideByte;

            // Get red, green and blue channels in source image
            srcRedCh = (int16_t)*(srcData + xPx + scanline);
            srcGreenCh = (int16_t)*(srcData + xPx + 1 + scanline);
            srcBlueCh = (int16_t)*(srcData + xPx + 2 + scanline);

            // Get kernel value
            int16_t convolution_values = *(convolution_matrix + (xKernel + 1) + (yKernel + 1)*3);

            // Accumulate partial result
            dstRedCh += srcRedCh * convolution_values;
            dstGreenCh += srcGreenCh * convolution_values;
            dstBlueCh += srcBlueCh * convolution_values;
        }
    }

  2. Normalization to ensure the resulting image has the same brightness as the original. See Integer values for the convolution matrix for an explanation of the scale value.

    // Avoid divide by zero
    if(scale != 0)
        dstRedCh /= scale;