Assignment 2

Part 1: Image Processing Basics

Computers use tiny dots called pixels to display images. Each pixel is stored as one or more numbers that represent color intensities.

Example. In an 8-bit grayscale image, each pixel is a single number. The number represents light intensity ranging from black (0) to white (255).

Example. In a 24-bit RGB color image, each pixel is an array of 3 numbers. These numbers range from 0 to 255 and represent red, green, and blue intensity, respectively. For instance, (0, 0, 255) is bright blue and (255, 128, 0) is orange.
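
For instance, here is a minimal sketch (not part of the assignment; the array tiny is just an illustration) of how those two example colors look as NumPy pixel values:

In [ ]:
import numpy as np

# A 1 x 2 pixel RGB image: one bright blue pixel and one orange pixel.
tiny = np.array([[[0, 0, 255], [255, 128, 0]]], dtype=np.uint8)
tiny.shape  # (1, 2, 3): 1 row, 2 columns, 3 color channels per pixel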

In this assignment, you'll use Python and NumPy to manipulate 24-bit RGB color images.

You can use Image.open() from the Python Imaging Library (PIL) to open an image:

In [2]:
from PIL import Image

# Cat image from https://unsplash.com/photos/FqkBXo2Nkq0
cat_img = Image.open("cat.png")

Images display inline in Jupyter notebooks:

In [16]:
cat_img
Out[16]:

In a Python terminal, you can display the image in a new window with .show() instead.

NumPy can convert images to arrays:

In [3]:
import numpy as np

cat = np.array(cat_img)

To convert an array back to an image (for display), use the function below:

In [4]:
def as_image(x):
    """Convert an ndarray to an Image.
    
    Args:
        x (ndarray): The array of pixels.
        
    Returns:
        Image: The Image object.
    """
    return Image.fromarray(np.uint8(x))

Exercise 1.1. How many dimensions does the cat array have? What does each dimension represent?

In [5]:
cat.shape
# The array has 3 dimensions: 267 rows (height), 400 columns (width), and 3 color channels per pixel
Out[5]:
(267L, 400L, 3L)
In [36]:
cat[1,1]
Out[36]:
array([34, 38, 39], dtype=uint8)

Exercise 1.2. Use .copy() to copy the cat array to a new variable. Swap the green and blue color channels in the copy. Display the result.

In [27]:
# The original channel order is red, green, blue; we want red, blue, green.
cat_cp = cat.copy()
# For each row (shape 400 x 3), np.flip(..., axis=1) reverses the channel
# order of every pixel ([r, g, b] -> [b, g, r]), and np.roll(..., 1, axis=1)
# then shifts the channels right by one ([b, g, r] -> [r, b, g]).
# Together these swap the green and blue channels.
for i in range(cat_cp.shape[0]):
    cat_cp[i, :] = np.flip(cat_cp[i, :], axis=1)
    cat_cp[i, :] = np.roll(cat_cp[i, :], 1, axis=1)
print cat[1:3,1:3]
print cat_cp[1:3,1:3]
[[[34 38 39]
  [84 86 88]]

 [[33 38 38]
  [80 84 84]]]
[[[34 39 38]
  [84 88 86]]

 [[33 38 38]
  [80 84 84]]]
In [29]:
as_image(cat_cp)
Out[29]:
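
A loop is not actually required here; NumPy's fancy indexing can reorder the channel axis in one step. A minimal sketch (cat_cp2 is just an illustrative name; fancy indexing returns a copy, so no explicit .copy() is needed):

In [ ]:
# Reorder the last axis as [red, blue, green] for every pixel at once.
cat_cp2 = cat[:, :, [0, 2, 1]]
as_image(cat_cp2)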

Exercise 1.3. Why is .copy() necessary in exercise 1.2? What happens if you don't use .copy()?

If we use cat_cp = cat, Python does not create a new array called cat_cp; cat_cp is just another name (reference) for the same data. So whenever we use cat_cp, we are actually using cat itself, and any changes made through cat_cp change cat as well.
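
A quick illustration with a small throwaway array (a sketch; the values are arbitrary and unrelated to the cat image):

In [ ]:
# Assigning an array to a new name does not copy the data; both names refer
# to the same array, so changes through one name are visible through the other.
a = np.array([1, 2, 3])
b = a          # b is just another name for the same array
b[0] = 99
print a        # [99  2  3] -- a changed too
c = a.copy()   # c is an independent copy
c[1] = 0
print a        # [99  2  3] -- a is unaffected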

Exercise 1.4. Flip the blue color channel from left to right. Display the resulting image. Hint: see the NumPy documentation on array manipulation routines.

In [32]:
# Flip only the blue channel (index 2) from left to right; the red and green
# channels are left unchanged. np.fliplr reverses the columns of the 2-d
# blue channel slice.
cat_blue = cat.copy()
cat_blue[:, :, 2] = np.fliplr(cat[:, :, 2])
In [33]:
as_image(cat_blue)
Out[33]:
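
An equivalent one-liner uses basic slicing instead of np.fliplr; the ::-1 step reverses the column order (a sketch, with cat_blue2 as an illustrative name):

In [ ]:
cat_blue2 = cat.copy()
cat_blue2[:, :, 2] = cat[:, ::-1, 2]
as_image(cat_blue2)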

Part 2: Singular Value Decomposition

Suppose $X$ is an $n \times p$ matrix (for instance, one color channel of the cat image). The singular value decomposition (SVD) factors $X$ as $X = UD V^T$, where:

  • $U$ is an $n \times n$ orthogonal matrix
  • $D$ is an $n \times p$ matrix with zeroes everywhere except the diagonal
  • $V$ is a $p \times p$ orthogonal matrix

Note that a matrix $A$ is orthogonal when $A^T A = I$ and $AA^T = I$.

Example. We can use NumPy to compute the SVD for a matrix:

In [8]:
# NumPy arrays are written row by row: np.array([[1, 2], [3, 4]]) is a 2x2
# matrix whose rows are [1, 2] and [3, 4]. (This differs from R, where
# matrix() fills by column by default.)
x = np.array(
    [[0, 2, 3],
     [3, 2, 1]]
)
u, d, vt = np.linalg.svd(x)
# Here d is a 1-d array of length 2, because NumPy returns only the diagonal of D.
print "u is:\n", u, "\nd is:\n", d, "\nv^T is:\n", vt
u is:
[[-0.68145174 -0.73186305]
 [-0.73186305  0.68145174]] 
d is:
[ 4.52966162  2.54600974] 
v^T is:
[[-0.48471372 -0.62402665 -0.6128975 ]
 [ 0.80296442 -0.03960025 -0.59470998]
 [ 0.34684399 -0.78039897  0.52026598]]
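
To check these claims, we can rebuild the full $2 \times 3$ matrix $D$ from the returned diagonal and verify that $UDV^T$ reproduces x and that $U$ and $V$ are orthogonal (a sketch reusing the u, d, vt from the cell above):

In [ ]:
# Rebuild D as a 2 x 3 matrix with d on the diagonal and zeroes elsewhere.
D = np.zeros(x.shape)
D[:len(d), :len(d)] = np.diag(d)
print np.allclose(x, u.dot(D).dot(vt))                # True: X = U D V^T
print np.allclose(u.T.dot(u), np.eye(u.shape[0]))     # True: U is orthogonal
print np.allclose(vt.dot(vt.T), np.eye(vt.shape[0]))  # True: V is orthogonal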

If we let

  • $u_i$ denote the $i$th column of $U$
  • $d_i$ denote the $i$th diagonal element of $D$
  • $v_i$ denote the $i$th column of $V$

then we can write the SVD as $X = UDV^T = d_1 u_1 v_1^T + \ldots + d_m u_m v_m^T$, where $m = \min(n, p)$, using the rules of matrix multiplication. In other words, the SVD decomposes $X$ into a sum!

If we eliminate some of the terms in the sum, we get a simple approximation for $X$. For instance, we could eliminate all but the first 3 terms to get the approximation $X \approx d_1 u_1 v_1^T + d_2 u_2 v_2^T + d_3 u_3 v_3^T$. This is the same as if we:

  • Zero all but the first 3 diagonal elements of $D$ to get $D_3$, then compute $X \approx UD_3V^T$
  • Eliminate all but the first 3 columns of $V$ to get the $p \times 3$ matrix $V_3$, then compute $X \approx X V_3 V_3^T$

We always eliminate terms starting from the end rather than the beginning, because the singular values $d_1 \geq d_2 \geq \ldots \geq d_m$ are sorted in decreasing order, so the terms at the end contribute the least to $X$.
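
As a quick check (a sketch reusing the u, d, vt from the example above), summing the rank-1 terms reproduces x, and keeping only the first term gives a rank-1 approximation:

In [ ]:
# Sum d_i * u_i * v_i^T over all m = min(n, p) = 2 terms; this equals x
# up to floating point error.
full = sum(d[i] * np.outer(u[:, i], vt[i, :]) for i in range(len(d)))
print np.allclose(x, full)               # True
# Keeping only the first term gives a rank-1 approximation of x.
print d[0] * np.outer(u[:, 0], vt[0, :])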

Why would we want to approximate a matrix $X$?

In statistics, principal components analysis uses this approximation to reduce the dimension (number of covariates) in a centered (mean 0) data set. The vectors $d_i u_i$ are called the principal components of $X$, and the vectors $v_i^T$ are called the basis vectors. Note that both depend on $X$. The dimension is reduced by using the first $q$ principal components instead of the original $p$ covariates. In other words, the $n \times p$ data $X$ is replaced by the $n \times q$ data $UD_q = XV_q$.

In computing, this approximation is sometimes used to reduce the number of bits needed to store a matrix (or image). If $q$ terms are kept, then only $nq + pq$ values (for $XV_q$ and $V_q^T$) need to be stored instead of the uncompressed $np$ values.
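
For example, for one color channel of the cat image ($n = 267$, $p = 400$) with $q = 10$ terms kept, that is $267 \cdot 10 + 400 \cdot 10 = 6670$ stored values instead of $267 \cdot 400 = 106800$.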

Exercise 2.1. Write the functions described below.

  • A function that takes a matrix $X$ and returns its principal component matrix $XV_q$ and basis matrix $V_q^T$. This function should also take the number of terms kept $q$ as an argument.

  • A function that takes a principal component matrix $XV_q$ and basis matrix $V_q^T$ and returns an approximation $\hat{X}$ for the original matrix.

As usual, make sure to document your functions. Test your function on the red color channel of the cat image. What's the smallest number of terms where the cat is still recognizable as a cat?

In [49]:
def pca(x, q):
    """Compute the principal component matrix and basis matrix of x.

    Args:
        x (ndarray): The n x p matrix to decompose.
        q (int): The number of terms to keep.

    Returns:
        tuple: The n x q principal component matrix XV_q and the
            q x p basis matrix V_q^T.
    """
    u, d, vt = np.linalg.svd(x)
    vt = vt[0:q, :]     # keep the first q rows of V^T (basis vectors)
    pcm = x.dot(vt.T)   # principal component matrix XV_q
    bm = vt             # basis matrix V_q^T
    return pcm, bm
In [51]:
# test my function on x
x = np.array(
    [[0, 2, 3],
     [3, 2, 1]]
)
x_pcm,x_bm = pca(x,2)
print "x_pcm (n*q) is:\n", x_pcm
print "x_bm (q*p) is:\n",x_bm
x_pcm (n*q) is:
[[-3.0867458  -1.86333045]
 [-3.31509197  1.73498277]]
x_bm (q*p) is:
[[-0.48471372 -0.62402665 -0.6128975 ]
 [ 0.80296442 -0.03960025 -0.59470998]]
In [52]:
def pca_est(pcm, bm):
    """Reconstruct an approximation of the original matrix.

    Args:
        pcm (ndarray): The n x q principal component matrix XV_q.
        bm (ndarray): The q x p basis matrix V_q^T.

    Returns:
        ndarray: The n x p approximation XV_q V_q^T.
    """
    return pcm.dot(bm)
# Test with x: approximate x using the first 2 principal components.
pca_est(x_pcm, x_bm)
Out[52]:
array([[ -4.44089210e-16,   2.00000000e+00,   3.00000000e+00],
       [  3.00000000e+00,   2.00000000e+00,   1.00000000e+00]])
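
Since $q = 2 = \min(n, p)$ here, all of the terms are kept, so the approximation recovers $x$ exactly; the $-4.4 \times 10^{-16}$ in the first entry is just floating point rounding error for 0.
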
In [111]:
# Test the functions on the red color channel of the cat image.
# The channels are ordered red, green, blue, so the red channel is index 0;
# slicing the last axis extracts it without a loop.
red_pixls = cat[:, :, 0]
print red_pixls.shape
(267L, 400L)
In [114]:
# Display the image using a rank-q approximation of the red channel only.
def test_red(red_pixls, q):
    """Approximate the red channel with q terms and return it as an Image."""
    red_pcm, red_bm = pca(red_pixls, q)
    red_pixls_est = pca_est(red_pcm, red_bm)
    # Put the approximated red values back into a copy of the image; clip to
    # 0-255 so the float estimates don't wrap around when cast to uint8.
    cat_adj = cat.copy()
    cat_adj[:, :, 0] = np.clip(red_pixls_est, 0, 255)
    # Zero the green and blue channels so only the red channel is visible.
    cat_adj[:, :, 1:3] = 0
    return as_image(cat_adj)
In [115]:
test_red(red_pixls,100)
Out[115]:
In [116]:
test_red(red_pixls,50)
Out[116]:
In [117]:
test_red(red_pixls,10)
Out[117]:

With 10 terms the cat is still recognizable; with fewer than 10 terms it no longer is.

Exercise 2.2. You can check the number of bytes used by a NumPy array with the .nbytes attribute. How many bytes does the red color channel of the cat image use? How many bytes does the compressed version use when 10 terms are kept? What percentage of the original size is this?

In [134]:
# The red color channel of the original image: 267 x 400 one-byte (uint8) values.
print red_pixls.nbytes
106800
In [142]:
# The compressed version keeps the principal component matrix XV_q and the
# basis matrix V_q^T for q = 10 terms; both are stored as 8-byte floats.
red_pcm, red_bm = pca(red_pixls, 10)
print red_pcm.nbytes + red_bm.nbytes
53360
In [144]:
# Percentage of the original size.
print round(100.0 * (red_pcm.nbytes + red_bm.nbytes) / red_pixls.nbytes, 1)
50.0

The red channel uses 106,800 bytes (267 x 400 values stored as 1-byte unsigned integers). The compressed version with 10 terms kept uses 21,360 + 32,000 = 53,360 bytes, about 50% of the original size. The savings in bytes is smaller than the savings in the number of stored values (6,670 versus 106,800, about 6%) because the compressed matrices hold 8-byte floats rather than 1-byte integers.