If I understand you correctly, the background (parallax) images are not related to the effect? If that's right, then you have the classic situation where several parts are roughly equally demanding. You could remove either your backgrounds or the shaded tiles, and in both cases you'd get a measurable performance boost.
Also, again, an effect might seem simple on a per-pixel level, but it runs more than 220 million times a second (and that's just one fullscreen draw on your S7 at 60fps), so it adds up to a lot of work.
The performance boost you get from removing the parallax images is likely because you're at the bandwidth limit of your GPU. With 5 roughly fullscreen images you're already at about 1 billion pixels a second just for the parallax effect. At 32 bits per pixel that's up to roughly 8GB/sec of memory that has to be read and written.
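To make the numbers above easy to check, here is a rough back-of-the-envelope sketch. It assumes the S7 is a Galaxy S7 with a 2560x1440 panel, 4 bytes per pixel, and that every pixel is read once and written once; the function names are made up for illustration, and real GPUs (caches, compression, tiling) will behave differently.

```python
# Assumed values: 2560x1440 panel, 60 fps, 5 fullscreen layers, 32-bit pixels.
WIDTH, HEIGHT = 2560, 1440
FPS = 60
LAYERS = 5
BYTES_PER_PIXEL = 4


def pixels_per_second(width, height, fps, layers=1):
    """Pixels touched per second when drawing `layers` fullscreen images."""
    return width * height * fps * layers


def bandwidth_gb_per_second(pixels, bytes_per_pixel, read_and_write=True):
    """Rough memory traffic in GB/s; doubles the cost if each pixel is read and written."""
    factor = 2 if read_and_write else 1
    return pixels * bytes_per_pixel * factor / 1e9


single_draw = pixels_per_second(WIDTH, HEIGHT, FPS)
parallax = pixels_per_second(WIDTH, HEIGHT, FPS, LAYERS)
parallax_gb = bandwidth_gb_per_second(parallax, BYTES_PER_PIXEL)

print(f"one fullscreen draw: {single_draw / 1e6:.0f} million pixels/s")
print(f"5 parallax layers:   {parallax / 1e9:.2f} billion pixels/s")
print(f"read + write:        {parallax_gb:.1f} GB/s")
```

Under these assumptions the results land close to the rounded figures quoted above.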
Here's another detail where 3d engines may actually be less demanding. 3d games use the z-buffer to determine whether a pixel has to be drawn/shaded or not, and for that reason they usually render whatever is visible front to back. GPUs are highly optimized to do this z-buffer rejection on their pixels, so in 3d even the same amount of overdraw (like your 5 parallax images) is usually much less demanding on GPU bandwidth than the typical 2d way of handling this, which is drawing from back to front for correct transparency. In 3d games that's only done for geometry/materials with translucency.
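A toy model can illustrate why draw order matters with a depth test. This is only a sketch: it treats the screen as a single row of pixels, assumes all layers are fully opaque, and the `shaded_pixels` helper is hypothetical rather than part of any engine.

```python
def shaded_pixels(layers, num_pixels, front_to_back):
    """Count pixels that pass the depth test and get shaded.

    layers: list of (depth, start, end) opaque spans; smaller depth = closer.
    """
    # Sort closest-first for front-to-back, farthest-first for back-to-front.
    order = sorted(layers, key=lambda layer: layer[0], reverse=not front_to_back)
    depth_buffer = [float("inf")] * num_pixels
    shaded = 0
    for depth, start, end in order:
        for x in range(start, end):
            if depth < depth_buffer[x]:  # depth test passes, pixel is shaded
                depth_buffer[x] = depth
                shaded += 1
    return shaded


# Five fullscreen layers over a 100-pixel-wide "screen".
five_layers = [(d, 0, 100) for d in range(1, 6)]
front_first = shaded_pixels(five_layers, 100, front_to_back=True)
back_first = shaded_pixels(five_layers, 100, front_to_back=False)
print(front_first, back_first)
```

Drawn front to back, only the closest layer gets shaded and the other four are rejected; drawn back to front, every layer is shaded in full. With transparent parallax images the back-to-front order is still required, which is why the 2d case can't use this shortcut.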
If you don't need the shader effect to run per frame, a preprocessed image is a great optimization. You could still do this with a shader and a snapshot, as snapshots don't have to be updated each frame.
You could also try optimizations like rendering your parallax effect into a snapshot that's half the width and height of your actual display, and then drawing only that snapshot at full resolution. This should reduce the bandwidth used for the parallax images to slightly below 50% of what's consumed at the moment. Depending on your art style, users may not even notice the difference.
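Here is a rough sketch of where the "slightly below 50%" estimate comes from. It assumes 5 fullscreen layers, a snapshot scaled by 0.5 on each axis, and one extra fullscreen draw to put the snapshot on screen; the function name is made up for illustration, and the cost is counted in pixels drawn, not measured bandwidth.

```python
def snapshot_cost_ratio(layers, scale=0.5):
    """Pixels drawn with a scaled-down snapshot, relative to drawing all layers fullscreen."""
    snapshot_cost = layers * scale * scale  # each layer covers scale^2 of the screen area
    composite_cost = 1.0                    # one fullscreen draw of the snapshot itself
    return (snapshot_cost + composite_cost) / layers


ratio = snapshot_cost_ratio(5)
print(f"{ratio:.0%} of the original pixel work")
```

With 5 layers this comes out at 45%. The saving grows with the number of layers, since the one fullscreen composite is a fixed cost.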