Working with Structural Similarity Index
This article is part of a series on autoencoders, whose purpose is to build a system that reconstructs input images as faithfully as possible.
SSIM as a Loss Function
Training a neural network requires choosing an optimizer and a loss function.
The optimizer may be Adam, with a learning rate of e.g. 0.001 or 0.0005, tuned by a learning-rate scheduler (a callback in the Keras fit function).
The loss function is often the very popular Mean Squared Error (MSE), but it turns out to be impractical for analyzing visual differences between the output and input images.
The MSE calculation focuses on individual pixel values, whereas the Structural Similarity Index (SSIM) measures structural differences between two images, analyzing them within NxN pixel windows.
Mean Squared Error is formulated as MSE = (1/n) Σᵢ (xᵢ − yᵢ)², i.e. the mean of the squared pixel-wise differences between the two images.
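As a quick sanity check of the formula, here is a minimal NumPy sketch; the tiny arrays are made up purely for illustration:

```python
import numpy as np

# Two tiny 2x2 "images" that differ in a single pixel by 10
img_ref = np.array([[0.0, 0.0], [0.0, 0.0]])
img = np.array([[10.0, 0.0], [0.0, 0.0]])

# MSE: mean of the squared pixel differences
mse = np.mean((img - img_ref) ** 2)
print(mse)  # 10^2 / 4 = 25.0
```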
The standard formula for the Structural Similarity Index is more sophisticated: SSIM(x, y) = (2μxμy + c1)(2σxy + c2) / ((μx² + μy² + c1)(σx² + σy² + c2)), where μ, σ² and σxy are the local means, variances and covariance of the two windows, and c1, c2 are small stabilizing constants.
Scikit-image has an implementation of SSIM, and so do TensorFlow (tf) and PyTorch. In this study we will not try the PyTorch implementation; we will use only the TensorFlow one.
As SSIM evaluates similarity, the result lies between -1 (no similarity) and 1 (fully similar).
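A short sketch can confirm these bounds with the TensorFlow implementation (the random test image is made up; note that tf.image.ssim requires images at least as large as its filter window, 11x11 by default):

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
img = rng.uniform(size=(32, 32, 1)).astype("float32")

# SSIM of an image with itself: fully similar -> 1.0
same = float(tf.image.ssim(img, img, max_val=1.0))

# SSIM of an image with its negative: anti-correlated -> negative score
inverted = float(tf.image.ssim(img, 1.0 - img, max_val=1.0))
print(same, inverted)
```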
Let us look at the error measurement on 4 images.
For each image, the MSE and TF SSIM were evaluated using this code as a base:
import tensorflow as tf
import numpy as np

# img and imgRef are the two images to compare (NumPy arrays)
# MSE: mean of the squared pixel differences
error_mse = np.mean((img - imgRef) ** 2)

# SSIM TF - conversion to float32 needed for use with TensorFlow
im1 = tf.image.convert_image_dtype(imgRef, tf.float32)
im2 = tf.image.convert_image_dtype(img, tf.float32)
error_tf_ssim = float(tf.image.ssim(im1, im2, max_val=1.0, filter_size=11))
This code has been tested on 4 images:
· Image 1 : original
· Image 2 : salt-and-pepper noise added using the random_noise function of scikit-image
· Image 3 : some gray overlays have been added
· Image 4 : converted to HSV format, with the Value channel increased by 30
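For reference, the transformations above could be sketched as follows. This uses plain NumPy stand-ins (the article itself uses scikit-image's random_noise for the salt-and-pepper step), and all sizes and amounts are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
img1 = rng.uniform(size=(64, 64)).astype("float32")  # stand-in for the original

# Image 2: salt-and-pepper noise (equivalent to skimage.util.random_noise(img, mode='s&p'))
img2 = img1.copy()
mask = rng.uniform(size=img1.shape)
img2[mask < 0.025] = 0.0   # pepper
img2[mask > 0.975] = 1.0   # salt

# Image 3: a gray overlay blended onto part of the image
img3 = img1.copy()
img3[16:48, 16:48] = 0.5 * img3[16:48, 16:48] + 0.5 * 0.5

# Image 4: brightness ("Value") increased by 30 on a 0-255 scale
img4 = np.clip(img1 + 30.0 / 255.0, 0.0, 1.0)
```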
In all pictures, the eye can clearly see that images 2, 3 and 4 are similar to Image 1.
Result on Image 2
It is interesting to notice that the salt-and-pepper noise gives a huge MSE error. This is expected, as many pixels have received a dramatic value change.
TF SSIM gives a reasonable score, though not as high as expected: since SSIM compares patterns and pixel gradients, the introduction of random values hinders the SSIM approach from performing well.
Result on Image 3
This picture shows the superiority of the SSIM calculation over MSE. Here the MSE is very high simply because some pixel values have changed, and not all in the same way. Thanks to the SSIM calculation, we can see that Image 3 is very similar to Image 1.
Result on Image 4
Here MSE performs better, as the pixel values have changed only slightly and in a similar way.
SSIM works as expected.
Conclusion
We can safely conclude that SSIM is a more accurate way than MSE to measure how similar two images are.
Therefore, it also makes sense to use SSIM as the Loss function during training of (convolutional) neural networks.
We have also seen that SSIM moves in the opposite direction to MSE: a poor reconstruction gives an SSIM close to 0, while a good one gives an SSIM close to 1.
So we will compute the SSIM error as 1 - SSIM, so that it is aligned with the MSE convention:
· Low error = good reconstruction of the image
· High error = bad reconstruction of the image
Implementation of SSIM as Loss Function
This is how to set it up using Keras:
def SSIMLoss(y_true, y_pred):
    return 1 - tf.reduce_mean(tf.image.ssim(y_true, y_pred, 1.0))
...
model.compile(optimizer=opt, loss=[SSIMLoss])
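Putting it together, here is a minimal sketch of a convolutional autoencoder compiled with this loss; the tiny architecture and the parameter values are illustrative assumptions, not the model used in this series:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def SSIMLoss(y_true, y_pred):
    # 1 - mean SSIM, so that lower = better reconstruction
    return 1 - tf.reduce_mean(tf.image.ssim(y_true, y_pred, 1.0))

# A deliberately tiny, made-up convolutional autoencoder for illustration
model = models.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(8, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(8, 3, activation="relu", padding="same"),
    layers.UpSampling2D(),
    layers.Conv2D(1, 3, activation="sigmoid", padding="same"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss=SSIMLoss)
```

Training then proceeds as usual for an autoencoder, with the input as its own target, e.g. model.fit(x_train, x_train, epochs=..., batch_size=...).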
A final word on SSIM
This paper http://www2.units.it/ramponi/teaching/DIP/materiale/z08_mse_bovik09.pdf explains an enhanced way to calculate SSIM, called Complex Wavelet SSIM (CW-SSIM); an implementation can be found here: https://github.com/llvll/imgcluster
Wang, Z., Bovik, A.C.: Mean squared error: love it or leave it? A new look at signal fidelity measures. IEEE Signal Processing Magazine 26(1), 98–117 (2009)
CW-SSIM appears more robust than standard SSIM.
This approach has not been tested here, although it looks promising.