In this blog post, I will implement and experiment with two machine learning techniques for image compression and image segmentation.
Published
May 3, 2023
Image Compression with Singular Value Decomposition
A single value decomposition of a real matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) is: \(\mathbf{A} = \mathbf{U}\mathbf{D}\mathbf{V}^T\;,\)
where \(\mathbf{D} \in \mathbb{R}^{m \times n}\) has nonzero entries along its diagonal and where \(\mathbf{U} \in \mathbb{R}^{m \times m}\) and \(\mathbf{V} \in \mathbb{R}^{n \times n}\) are orthogonal matrices. The singular value \(\sigma_i\) collectively give some measure of how large the matrix A is.
Here, we will access an RGB image of “Girl with a Pearl Earring” by Vermeer from the internet by its URL. Then, we will download it, and convert it to greyscale.
Via the process above, we have the original image taken from the internet and a greyscale replica of it. We can use the SVD pipeline to construct approximations of this image. This task is called image compression and it is an important problem for storing large quantities of images on computers that may have small amounts of storage.
We will now go ahead and reconstruct an image from its singular value decomposition:
Code
def svd_reconstruct(img, k): U, sigma, V = np.linalg.svd(img)# create the D matrix in the SVD D = np.zeros_like(img,dtype=float) # matrix of zeros of same shape as A D[:min(img.shape),:min(img.shape)] = np.diag(sigma) # singular values on the main diagonal U = U[:,:k] D = D[:k, :k] V = V[:k, :] img = U @ D @ V return img
The svd_reconstruct function takes 2 arguments: the image to reconstruct, and the number k of singular values to use.
The compression factor, the number of bits required to store the compressed image, divided by the number of bits required to store the original image, is log2K / 24.
We will go ahead and experiment with different K values.
Seems like a k value of 50 allows us to not be able to distinguish the reconstructed image from the original by eye. The amount of storage needed for your reconstruction as a fraction of the amount of storage needed for the original image is: log2(5) / 24.
Extra parameters added to svd_reconstruct
We will change a few things to our function to allow user to be able to specific the compression ratio and the epsilon threshold.
Code
def user_svd_reconstruct(img, comp_factor, threshold): U, sigma, V = np.linalg.svd(img) M = img.shape[0] N = img.shape[1]# k must be less than m and n k =round((N * M) / ((N + M +1) * comp_factor))# create the D matrix in the SVD D = np.zeros_like(img,dtype=float) # matrix of zeros of same shape as Afor value in sigma:if (value > threshold): D[:min(img.shape),:min(img.shape)] = np.diag(sigma) # singular values on the main diagonal U = U[:,:k] D = D[:k, :k] V = V[:k, :] img = U @ D @ V return img
The user_svd_reconstruct function have three arguments: the image to reconstruct, and the compression factor, and a desired threshold epsilon. The user can enter the desired compression factor and threshold epsilon. We will go ahead and experiment on different compression ratios and thresholds.