Mali OpenCL SDK v1.1.0
Floating point FIR (Finite Impulse Response) image filter for pixelization and/or noise reduction.
An FIR filter computes each output value as a weighted average of a finite window of input values.
To illustrate FIR filtering, consider the one-dimensional signal [17 76 17 84 29], to which we will apply a 3x1 filter with the coefficients [3/15 9/15 3/15]. To keep the output values within the range of the input values, the coefficient numerators must sum to no more than the common coefficient denominator.
Each output value is the sum of the signal values under the filter, each multiplied by the corresponding filter coefficient.
[17 76 17] → [(17 * 3/15) + (76 * 9/15) + (17 * 3/15)] → [52.4] → [52]. (The result is truncated to an integer for simplicity.)
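The worked example above can be reproduced with a few lines of plain C. This is a minimal sketch rather than code from the sample: the function name `fir3_int` and its parameters are hypothetical, and it uses integer numerators with a common denominator so that the truncation step is explicit.

```c
#include <assert.h>
#include <stddef.h>

/* 3-tap FIR with integer numerators and a common denominator.
 * No padding is applied, so n inputs give n - 2 outputs.
 * Returns the number of outputs written. */
size_t fir3_int(const int *in, size_t n, const int num[3], int den, int *out)
{
    assert(n >= 3 && den > 0);
    for (size_t i = 0; i + 2 < n; ++i) {
        int acc = in[i] * num[0] + in[i + 1] * num[1] + in[i + 2] * num[2];
        out[i] = acc / den;   /* integer division truncates, e.g. 52.4 to 52 */
    }
    return n - 2;
}
```

Called with the signal [17 76 17 84 29], the numerators [3 9 3] and the denominator 15, this writes the three outputs [52 42 59].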
We have included a 512x512 input bitmap for use with this sample (to keep the size of the installer small). However, you are more likely to see performance improvements (when compared to C code running on a CPU) when larger images are used. There is some start-up overhead associated with using OpenCL. This overhead can outweigh the benefits of parallel processing when the input data sizes are small.
This sample has been coded to allow any input bitmap to be used. To try a different image, replace input.bmp in the assets directory of the sample with the input image of your choice.
It is important to note that we have not considered padding here. Because every output requires the pixels around it, it is impossible to calculate the output for the edge pixels, so the filtered region is two pixels smaller than the input in both dimensions. In this sample we simply leave the edge output pixels at the values they are initialized to.
Sometimes it is desirable for the output signal to be the same size as the input signal, in which case "padding" must be applied to the input to account for the fact that applying the filter, by its nature, reduces the size. Strategies for padding differ, but for images a common choice is to repeat the boundary values (i.e. the outermost set of pixels) on all sides or, in some cases, on just some sides.
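The sample itself does not pad, but the boundary-repeating strategy described above could be sketched as follows. The function name `pad_replicate` and the row-major float layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Copies a w x h image into a (w + 2) x (h + 2) buffer, repeating the
 * outermost pixels once on every side. Both images are row-major. */
void pad_replicate(const float *src, size_t w, size_t h, float *dst)
{
    assert(w > 0 && h > 0);
    size_t pw = w + 2;                      /* padded width */
    for (size_t y = 0; y < h + 2; ++y) {
        /* Map the padded row to the nearest valid source row. */
        size_t sy = (y == 0) ? 0 : (y > h ? h - 1 : y - 1);
        for (size_t x = 0; x < pw; ++x) {
            size_t sx = (x == 0) ? 0 : (x > w ? w - 1 : x - 1);
            dst[y * pw + x] = src[sy * w + sx];
        }
    }
}
```

Filtering the padded buffer with a 3x3 filter then yields an output of the original w x h size.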
Mali-T600 series GPU pipelines provide true IEEE-754 single-precision floating-point math in hardware. We recommend using 128-bit wide vectors. For more information about vectorization, see Vectorizing your OpenCL code.
In this sample, the calculations use 32-bit floating point numbers. One 128-bit vector holds four 32-bit floating point numbers. Therefore, using float4 vectors makes maximum use of the hardware.
We recommend the use of vectors wherever possible when using a Mali-T600 series GPU.
We implement FIR filtering on a single 8-bit channel for simplicity. To do FIR filtering on RGB images you can run the FIR filter on each channel separately and then combine the results. In this sample we take an RGB image, convert it to an 8-bit luminance image and send it to the GPU.
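The text does not say how the sample derives luminance from RGB. The sketch below assumes the widely used Rec. 601 luma weights (0.299, 0.587, 0.114), which may differ from the sample's own conversion. The function names and the interleaved RGB layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: Rec. 601 luma weights; the sample's conversion may differ. */
unsigned char rgb_to_luma(unsigned char r, unsigned char g, unsigned char b)
{
    float y = 0.299f * r + 0.587f * g + 0.114f * b;
    return (unsigned char)(y + 0.5f);       /* round to nearest */
}

/* Converts interleaved 8-bit RGB pixels into an array of float luminance
 * values in the range 0..255, ready to be passed to a float kernel. */
void rgb_image_to_luma(const unsigned char *rgb, float *luma, size_t pixel_count)
{
    assert(rgb != NULL && luma != NULL);
    for (size_t i = 0; i < pixel_count; ++i) {
        luma[i] = (float)rgb_to_luma(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
    }
}
```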
Each FIR calculation gives an output for the centre pixel of the mask. The output value of the centre pixel is the sum of the pixel values in a 3x3 grid around the pixel, multiplied by the coefficients. This can be split into three stages by doing the summations for each row of the grid separately. This sample does the calculations one row at a time. However, instead of operating on one pixel at a time, it uses vectors of 4 pixels at once.
The kernel applies a 3x3 FIR filter with constant coefficients (weights) to a 6x3 window in an input luminance image to produce 4x1 windows in the output image.
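A scalar reference version of the same calculation is useful for checking the vectorized kernel's output on the CPU. The sketch below is not taken from the sample. The name `fir3x3_reference` and the flat, row-major coefficient array `k` are assumptions. It sums one row of the 3x3 grid per stage, as described above, and leaves the edge pixels of the output untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar 3x3 FIR over the interior of a width x height float image.
 * k holds the nine normalised coefficients in row-major order.
 * Edge pixels of out keep whatever value they had on entry. */
void fir3x3_reference(const float *in, float *out, size_t width, size_t height,
                      const float k[9])
{
    assert(width >= 3 && height >= 3);
    for (size_t y = 1; y + 1 < height; ++y) {
        for (size_t x = 1; x + 1 < width; ++x) {
            float acc = 0.0f;
            /* One stage per row of the 3x3 grid around (x, y). */
            for (size_t r = 0; r < 3; ++r) {
                const float *row = in + (y - 1 + r) * width + (x - 1);
                acc += row[0] * k[3 * r] + row[1] * k[3 * r + 1] + row[2] * k[3 * r + 2];
            }
            out[y * width + x] = acc;
        }
    }
}
```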
The input and output images are represented using arrays of floating point numbers.
The FIR coefficients are defined as constant floats and organized as follows:
FW_UL FW_UM FW_UR
FW_CL FW_CM FW_CR
FW_BL FW_BM FW_BR
In a real application, these values can be derived in a number of different ways depending on the intended result. For this sample, the values are random.
Our common coefficient denominator is 256, so the output pixel fits within an 8-bit char. As explained in The Algorithm section, the sum of the coefficient numerators must not exceed the denominator, which here is 256.
The coefficients being used are:
[30  5  6]
[19 30  9] / 256
[15  5 40]
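The constraint on the numerators is easy to check in code. The sketch below stores the numerators listed above in a flat array and derives the float weights from them. The names `fir_numerators`, `FIR_DENOMINATOR`, `fir_numerator_sum` and `fir_weights` are hypothetical and do not come from fir_float.cl.

```c
#include <assert.h>
#include <stddef.h>

/* Numerators of the 3x3 coefficients, row-major (upper, centre, bottom). */
static const int fir_numerators[9] = { 30, 5, 6, 19, 30, 9, 15, 5, 40 };
enum { FIR_DENOMINATOR = 256 };

/* Sum of all numerators; must not exceed FIR_DENOMINATOR. */
int fir_numerator_sum(void)
{
    int sum = 0;
    for (int i = 0; i < 9; ++i) {
        sum += fir_numerators[i];
    }
    return sum;
}

/* Fills w with the nine float weights (numerator / 256). */
void fir_weights(float w[9])
{
    assert(w != NULL);
    for (int i = 0; i < 9; ++i) {
        w[i] = (float)fir_numerators[i] / (float)FIR_DENOMINATOR;
    }
}
```

The numerators sum to 159, so even an all-white input (every pixel 255) produces an output of about 158.4, well within the 8-bit range.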
Unless otherwise noted, all code snippets come from the OpenCL kernel found in fir_float.cl.
Choosing the size of the kernel
We use vector types in the kernel, so each kernel instance outputs 4 results. See above for more details on vectorising. We adjust the pointers into the data to reflect this:
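The kernel's own pointer arithmetic is in fir_float.cl. As a plain C illustration of the idea, the sketch below computes where one kernel instance would read and write, assuming a row-major image, four output pixels per instance, and a two-dimensional index (`gx`, `gy`). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t in_offset;    /* top-left element of the 6x3 input window */
    size_t out_offset;   /* first of the 4 output elements */
} fir_offsets;

/* Offsets for the kernel instance at (gx, gy) in an image of the given width. */
fir_offsets fir_work_item_offsets(size_t gx, size_t gy, size_t width)
{
    assert(width >= 6);
    fir_offsets o;
    size_t column = gx * 4;                        /* 4 output pixels per instance */
    o.in_offset = gy * width + column;             /* window starts one row above */
    o.out_offset = (gy + 1) * width + column + 1;  /* centre row, skip left edge */
    return o;
}
```

For a 512-pixel-wide image, the instance at (2, 3) would read from element 1544 and write from element 2057 under these assumptions.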
And when we enqueue the kernel in fir_float.cpp, we reduce the worksize accordingly:
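The exact work size used by the sample is set in fir_float.cpp. One way to derive a reduced work size is sketched below, assuming each kernel instance produces 4 horizontally adjacent outputs and that no instance may read past the end of its row. Under that assumption, any leftover columns (when width - 2 is not a multiple of 4) are simply not computed. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t x; size_t y; } work_size_2d;

/* Number of kernel instances per dimension for a width x height image. */
work_size_2d fir_global_work_size(size_t width, size_t height)
{
    assert(width >= 6 && height >= 3);
    work_size_2d g;
    g.x = (width - 2) / 4;   /* 4 outputs per instance, edges excluded */
    g.y = height - 2;        /* top and bottom rows have no output */
    return g;
}
```

For the supplied 512x512 image this gives 127 x 510 instances, roughly a quarter of the number a one-pixel-per-instance kernel would need.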
Loading the input data
Here we do vector loads from one row of the data:
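In OpenCL C a vector load of this kind is typically written with the built-in `vload4`. The plain C sketch below imitates it with a hypothetical `float4_t` struct and shows how three overlapping loads, at column offsets 0, 1 and 2, cover the six pixels of one row of the window. It is an illustration under those assumptions, not the sample's code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float s[4]; } float4_t;   /* hypothetical stand-in for float4 */

/* Loads four consecutive floats, like vload4(0, p) in OpenCL C. */
float4_t load4(const float *p)
{
    float4_t v;
    for (int i = 0; i < 4; ++i) {
        v.s[i] = p[i];
    }
    return v;
}

/* Loads the left, middle and right neighbours for 4 output pixels
 * from one 6-pixel row of the input window. */
void load_row_windows(const float *row, float4_t *left, float4_t *mid, float4_t *right)
{
    assert(row != NULL && left != NULL && mid != NULL && right != NULL);
    *left = load4(row);
    *mid = load4(row + 1);
    *right = load4(row + 2);
}
```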
Applying the filter
Then we carry out the calculation on 4 pixels. Each vector calculation can be done as a single operation on a Mali-T600 series GPU:
We apply the same pattern to the second and third rows, accumulating the result in the accumulator variable:
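The accumulation pattern can be sketched in plain C as follows. The `float4_t` struct stands in for OpenCL's `float4`, and the function name and the three-element weight array are hypothetical. On the GPU each of the three multiply-accumulate lines inside the loop would be one vector operation over all four lanes.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float s[4]; } float4_t;   /* hypothetical stand-in for float4 */

/* Adds one row's contribution to the accumulator for 4 output pixels.
 * row points at 6 consecutive pixels; w holds the left, middle and
 * right weights for that row of the 3x3 filter. */
float4_t fir_accumulate_row(float4_t acc, const float *row, const float w[3])
{
    assert(row != NULL && w != NULL);
    for (int i = 0; i < 4; ++i) {
        acc.s[i] += row[i] * w[0];       /* left neighbour */
        acc.s[i] += row[i + 1] * w[1];   /* centre */
        acc.s[i] += row[i + 2] * w[2];   /* right neighbour */
    }
    return acc;
}
```

Calling this once per row, with the upper, centre and bottom weights in turn, leaves the four filtered pixel values in the accumulator.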
Storing the result
Finally, we store the data. We use a vector store to write out the 4 results at once:
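In OpenCL C a store of this kind is typically written with the built-in `vstore4`. The plain C sketch below imitates it with a hypothetical `float4_t` struct and `store4` function, to show that exactly four consecutive output elements are written and their neighbours are left alone.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float s[4]; } float4_t;   /* hypothetical stand-in for float4 */

/* Writes four consecutive floats, like vstore4(v, 0, dst) in OpenCL C. */
void store4(float4_t v, float *dst)
{
    assert(dst != NULL);
    for (int i = 0; i < 4; ++i) {
        dst[i] = v.s[i];
    }
}
```

Because the output pointer starts one column in from the left edge, the first column of each row keeps its initial value, consistent with the unpadded behaviour described earlier.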
From a command prompt in the root of the SDK, run:
This compiles the FIR float sample code and copies all the files it needs to run to the bin folder in the root directory of the SDK.
Navigate to the folder on the board and run the FIR float binary:
You should see output similar to:
An output image called output.bmp should be created on the board.
Find solutions for Common Issues.
For more information have a look at the code in fir_float.cpp and fir_float.cl.