Project 1: Colorizing the Prokudin-Gorskii photo collection


Overview

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) was quite ahead of his time! With his dream of capturing the world in color, he won the Tsar's special permission to travel across the Russian Empire and take color photographs of everything he saw. His idea was simple: record three exposures of every scene onto a glass plate using a red, a green, and a blue filter. His RGB glass plate negatives outlived the Russian Empire and were purchased in 1948 by the Library of Congress. The LoC has recently digitized the negatives and made them available online.

In this project, my aim was to take the digitized Prokudin-Gorskii glass plate images as input and automatically produce a color image with as few visual artifacts as possible. In summary, the three color channel images were extracted, aligned, and placed on top of each other to form a single RGB color image.


Approach

Exhaustive Search (aka Window Search)

First, I separated the three color channels by dividing the original image into three equal parts along its height axis. Next, I cropped each channel by 15% on each side so that the plate borders would not throw off the image similarity metrics.
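The split-and-crop step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `split_channels` and `crop_border` are hypothetical, and the top-to-bottom B, G, R ordering of the plate is an assumption about how the scans are laid out.

```python
import numpy as np

def split_channels(plate):
    """Split a vertically stacked plate scan into three equal-height channels.

    Assumes the plate is ordered B, G, R from top to bottom (an assumption,
    not something stated in the write-up).
    """
    h = plate.shape[0] // 3  # leftover rows are dropped if height % 3 != 0
    b, g, r = plate[:h], plate[h:2 * h], plate[2 * h:3 * h]
    return b, g, r

def crop_border(channel, frac=0.15):
    """Trim `frac` of the height and width from each side of a 2D channel."""
    h, w = channel.shape
    dy, dx = int(h * frac), int(w * frac)
    return channel[dy:h - dy, dx:w - dx]
```

Cropping before computing any metric means the dark plate borders and scanning debris never enter the comparison, at the cost of discarding some valid pixels near the edges.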

The exhaustive search was implemented by naively searching over a 30x30 window, shifting the image to be aligned (G or R) by every offset in [-15, 15] pixels along both width and height. At each shift, the image to be aligned was compared with the reference image (B) and scored by an image comparison metric. The metrics I tested were MSE (Mean Squared Error), NCC (Normalized Cross Correlation), and SSIM (Structural Similarity Index Measure). Of the three, I found NCC to be the most efficient while still being relatively accurate.
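As a rough sketch of the window search with NCC described above (the names `ncc` and `exhaustive_align` are hypothetical, and shifting with `np.roll`, which wraps pixels around the edges, is a simplifying assumption rather than the write-up's confirmed method):

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two same-sized images (higher is better)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def exhaustive_align(moving, reference, radius=15):
    """Try every (dy, dx) in [-radius, radius]^2 and return the best-scoring shift."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps around the border; cropping beforehand limits the damage
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = ncc(shifted, reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

The returned pair is the shift to apply to the moving channel (G or R) so that it lines up with the reference (B). Swapping in MSE would only require flipping the comparison, since lower MSE is better.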

For larger images, exhaustive search (even with the same 30x30 window size) becomes quite slow. I parallelized the exhaustive search function using ThreadPoolExecutor, which distributes the work of calculating NCC for different pixel shifts across a pool of worker threads. This allows multiple computations to run simultaneously instead of sequentially, so it's much faster! Combined with the image pyramid, this reduced runtime from 2 minutes ± 5 seconds to 30 seconds ± 2 seconds for all images. Speaking of the image pyramid, let's go to the next section. ☺
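One way the parallel version might look is sketched below. The function name `parallel_align` and its parameters are hypothetical, and how much speedup threads give is an assumption that depends on NumPy releasing the GIL during its array operations, on image size, and on hardware.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two same-sized images (higher is better)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def parallel_align(moving, reference, radius=15, max_workers=None):
    """Score every (dy, dx) shift on a thread pool and return the best one."""
    shifts = list(product(range(-radius, radius + 1), repeat=2))

    def score(shift):
        # Each task is independent: shift the moving image, then compare.
        return ncc(np.roll(moving, shift, axis=(0, 1)), reference)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(score, shifts))  # map preserves input order
    return shifts[int(np.argmax(scores))]
```

Because every shift is scored independently, the search needs no locking: the pool just maps a pure function over the list of candidate shifts and the best index is picked afterwards.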

Constructing an Image Pyramid

For high-resolution glass plate scans, exhaustive search becomes prohibitively expensive because the pixel displacements are too large for a small window to cover. An image pyramid is an alternative, faster search procedure. The pyramid represents the input image at multiple scales (I scaled by a factor of 2) and performs exhaustive search at each one, starting from the coarsest scale (smallest image) and working toward the finest, updating the displacement estimate along the way.

In short, the image pyramid progressively refines the displacement at different image scales. It first downsamples the two images to half the resolution and recursively searches for the displacement at the lower resolution. Once the best displacement is found at the lower resolution, it scales the displacement back up by 2 and refines the alignment at the current resolution using exhaustive search (parallelized version). This is way more efficient because it narrows down the search space at each level!
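The recursion described above could be sketched like this. All names here (`search`, `downsample`, `pyramid_align`) and the parameters `min_size` and `refine` are hypothetical; 2x2 block averaging for downsampling and a small ±`refine` window at the finer levels are assumptions made for the illustration, not details taken from the write-up.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two same-sized images (higher is better)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def search(moving, reference, center=(0, 0), radius=15):
    """Exhaustive search over shifts within `radius` of `center`."""
    best_score, best_shift = -np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            score = ncc(np.roll(moving, (dy, dx), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (odd edges are trimmed)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid_align(moving, reference, radius=15, min_size=64, refine=2):
    """Coarse-to-fine alignment: full search at the top, small refinement below."""
    if min(reference.shape) <= min_size:
        return search(moving, reference, (0, 0), radius)  # coarsest level
    coarse = pyramid_align(downsample(moving), downsample(reference),
                           radius, min_size, refine)
    center = (2 * coarse[0], 2 * coarse[1])  # scale the estimate back up
    return search(moving, reference, center, refine)
```

With this layout, a displacement of up to about `radius * 2**levels` pixels at full resolution can be recovered while each level only evaluates a small window, which is where the savings come from.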


Results

cathedral.jpg: R: (12, 3), G: (5, 2)
church.jpg: R: (58, -4), G: (25, 4)
emir.jpg: R: (0, -405), G: (49, 24)
harvesters.jpg: R: (127, 14), G: (59, 17)
icon.jpg: R: (89, 23), G: (41, 17)
lady.jpg: R: (117, 11), G: (52, 8)
melons.jpg: R: (191, 13), G: (81, 10)
monastery.jpg: R: (3, 2), G: (-3, 2)
onion_church.jpg: R: (113, 36), G: (51, 27)
sculpture.jpg: R: (147, -26), G: (33, -11)
self_portrait.jpg: R: (191, 38), G: (78, 29)
three_generations.jpg: R: (113, 11), G: (52, 13)
tobolsk.jpg: R: (6, 3), G: (3, 3)
train.jpg: R: (87, 32), G: (42, 6)

All images looked beautifully colored and relatively clear, except for Emir, whose red channel ended up with an implausible displacement of (0, -405). A potential issue with him is that his blue coat shows up as pixels equal or close to 1 in the blue channel image, but much less light/information is captured in the green and red channel images. This can throw off NCC, which compares raw pixel intensities and so relies on the channels being similarly bright in the same places. A metric like SSIM that depends less on this could perform better.


Additional Examples

canal.jpg: R: (47, 48), G: (28, 24)
flower_bush.jpg: R: (97, -24), G: (49, -6)
flower_field.jpg: R: (60, 13), G: (8, 8)
forest_path.jpg: R: (129, 41), G: (56, 21)
lady_thinking.jpg: R: (76, 35), G: (38, 21)
lugano.jpg: R: (93, -29), G: (41, -16)