Content GANeration

Super-resolution

Remember the typical scene in a crime series when they have a blurry image of a suspect and ask their technology expert to “zoom in and enhance”?

Although those scenes are nowhere near technically accurate, there do exist techniques that take low-resolution images as input and upscale them to higher-resolution ones. Super-resolution is one of them. For a long time the idea was considered science fiction, since the data processing inequality theorem states that post-processing cannot add information that was not already present in the data. With the advent of neural networks and GANs, however, you can add information that was learned by training these networks on large numbers of examples, allowing, for instance, actual reconstruction of faces.

Super-resolution has many interesting real-world applications that are only just starting to be explored: reducing the file sizes of images and videos, serving as a preprocessing step for other AI applications such as deepfakes, and acting as a post-processing step in fields such as medical imaging and cosmology, or simply enhancing your favorite old movies and pictures.

Figure 1. Example of upscaling a blurry image.

Although the idea is not new, the field of super-resolution was revived by the advent of GANs and has made significant progress in only a couple of years. Moreover, a big advantage of this particular field is that an essentially unlimited amount of training data is available: you can simply downscale high-resolution images and use the resulting pairs as training data. There are also many publicly available datasets.
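The "free training data" idea above is simple enough to sketch in a few lines. The snippet below creates a (low-resolution, high-resolution) pair by block-averaging; real pipelines typically use bicubic downsampling plus blur and noise, but the principle is the same. The function name and scale factor are illustrative choices, not from any particular paper.

```python
import numpy as np

def make_training_pair(hr_image: np.ndarray, factor: int = 4):
    """Create a (low-res, high-res) training pair by block-averaging
    the high-res image. A sketch: real pipelines usually use bicubic
    downsampling with added blur/noise for more realistic inputs."""
    h, w, c = hr_image.shape
    # Crop so the dimensions divide evenly by the scale factor.
    h, w = h - h % factor, w - w % factor
    hr = hr_image[:h, :w]
    # Average each factor x factor block to produce the low-res input.
    lr = hr.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))
    return lr, hr

# Any high-resolution image yields a pair; a random stand-in image here.
hr_source = np.random.rand(128, 128, 3)
lr, hr = make_training_pair(hr_source, factor=4)
print(lr.shape, hr.shape)  # (32, 32, 3) (128, 128, 3)
```

The network is then trained to map `lr` back to `hr`, and at inference time receives genuinely low-resolution images.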


While there is growing interest, super-resolution still faces major challenges in developing effective algorithms. These challenges are summarized below:

  • There is still an ongoing debate about whether it is best to pursue general super-resolution, where you try to upscale the whole image regardless of the objects present in it, or object-specific super-resolution, where the first step is to detect certain objects in the image (for example faces) and then upscale those objects with specialized architectures. An example of the former is CAR [1]; an example of the latter is DFDNet [2], a state-of-the-art face upscaling technique.
  • As with GAN evaluation in general, it’s hard to measure and compare the quality of different super-resolution techniques. Although several metrics have been introduced, there is as of yet no consensus on which one best captures the strengths and limitations of models and should be used for fair model comparison.

Example topics

Background super resolution for portrait images

A recently published paper, DFDNet [2], achieved state-of-the-art results on the upscaling of human faces. However, it can only upscale the faces themselves and keeps the surroundings as they are. In this thesis, you would investigate the possibility of also upscaling the background, either with a separate network or incorporated into the DFDNet architecture. This would open the door to video upscaling, as there are currently clearly visible artifacts when someone’s face is upscaled while the background stays blurred.


Research and create a machine learning algorithm that can upscale the resolution of an image and fill in the details realistically. Technologies that can be used are Python, TensorFlow, Keras and the Python data science and machine learning stack in general.
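A useful sanity check for any learned upscaler is the classic interpolation baseline it should beat. Below is a minimal bilinear upscaler in plain NumPy (a sketch; in practice you would use a library routine): unlike a trained network, it adds no information, which is exactly why it stays blurry.

```python
import numpy as np

def bilinear_upscale(img: np.ndarray, factor: int) -> np.ndarray:
    """Bilinear interpolation baseline for an (H, W, C) image.
    No learned information is added, only a smooth blend of pixels."""
    h, w = img.shape[:2]
    # Sample coordinates in the source image for every target pixel.
    ys = np.linspace(0, h - 1, h * factor)
    xs = np.linspace(0, w - 1, w * factor)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    # Blend the four neighbouring source pixels per target pixel.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

img = np.random.rand(16, 16, 3)
out = bilinear_upscale(img, 4)
print(out.shape)  # (64, 64, 3)
```

Comparing a model’s output against this baseline (e.g. with PSNR on held-out pairs) shows whether the network is actually adding detail.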

Image colorization

Ever wondered what the old photo albums of your family would look like in color? Interested in bringing the past to life? Then this might be the subject for you.

Image colorization is the process of converting a grayscale image to a color one, filling in the colors as realistically as possible. The idea is not new: people have been hand-coloring photos for decades, and computer-aided, reference-based techniques appeared in the early 2000s. However, there has been tremendous progress in the last five years through the use of diverse deep learning architectures, ranging from early brute-force networks [3] to more recent custom-designed generative adversarial networks [4].

Figure 2. Image colorization example.


While there is growing interest, image colorization still faces major challenges in developing effective algorithms. These challenges are summarized below:

  • As with GAN evaluation in general, it’s hard to measure and compare the quality of different image colorization techniques. The goal is not to recreate the colors of the original image, which is nearly impossible from the grayscale values alone (see figure 3); rather, the goal is to colorize the image as realistically as possible, based on the objects and textures present in it. This makes evaluation a non-trivial task.

Figure 3. Comparison between Color Image and Gray Image. [5]

  • Only a handful of research papers have been published on image colorization. DeOldify [6], for example, one of the state-of-the-art image colorization techniques, has open-source code available but no accompanying paper. Although this brings an extra challenge, it also brings the opportunity to make a valuable contribution to an emerging research field.
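The many-to-one mapping from colors to gray values mentioned in the first challenge is easy to demonstrate. The snippet below uses the standard BT.601 luminance weights; the two specific colors are arbitrary examples chosen to land on nearly the same gray value.

```python
import numpy as np

# Standard luminance weights (ITU-R BT.601) for RGB-to-gray conversion.
WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_gray(rgb):
    """Collapse an RGB triple to a single luminance value."""
    return float(np.dot(rgb, WEIGHTS))

# Two very different colours that map to (almost) the same gray value:
muted_red = [200, 60, 40]     # clearly reddish
plain_gray = [100, 100, 100]  # neutral gray
print(to_gray(muted_red), to_gray(plain_gray))  # ~99.6 vs 100.0
```

Since a colorizer sees only the gray value, it cannot tell these inputs apart, which is why evaluation has to reward plausibility rather than exact color recovery.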

Example topics

Video colorization

What if these colorization techniques could be applied to videos? Research on colorization has almost exclusively centered on still images, and video colorization currently amounts mostly to applying image colorization to the individual frames of a video. There are many opportunities to improve the state of the art for video colorization, for example by taking the temporal dimension into account when coloring the frames, or by tackling challenges specific to old footage, such as mitigating the flickering effect.


Research and create a machine learning algorithm that can colorize videos realistically, improving on the current approach of colorizing individual frames by taking temporal components into account. Technologies that can be used are Python, TensorFlow, Keras and the Python data science and machine learning stack in general.
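As a starting point, even a naive temporal smoothing of per-frame color predictions reduces flicker. The sketch below applies an exponential moving average to hypothetical per-frame color channels (e.g. the ab channels of Lab); it is a stand-in for genuinely temporally-aware models, and the array shapes and `alpha` value are illustrative assumptions.

```python
import numpy as np

def smooth_colorization(frames_ab: np.ndarray, alpha: float = 0.3) -> np.ndarray:
    """Reduce flicker by exponentially smoothing predicted colour
    channels across frames. A naive baseline, not a real
    temporally-aware colorization model."""
    smoothed = [frames_ab[0]]
    for frame in frames_ab[1:]:
        # Blend the new prediction with the previous smoothed frame.
        smoothed.append(alpha * frame + (1 - alpha) * smoothed[-1])
    return np.stack(smoothed)

# Ten frames of hypothetical noisy per-frame colour predictions.
frames = np.random.rand(10, 32, 32, 2)
out = smooth_colorization(frames)
print(out.shape)  # (10, 32, 32, 2)
```

The frame-to-frame differences of the smoothed sequence are smaller than those of the raw predictions, which is exactly the flicker reduction; the obvious drawback is that genuine color changes (scene cuts, moving objects) get smeared too, motivating smarter temporal modeling.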

Garment transfer

Ever wonder how you would look in a certain t-shirt or pair of shoes without having to try them on? Well, that’s the problem garment transfer is trying to solve. Given an image of a person and a piece of clothing as input, the goal is to produce a photo-realistic picture of that person wearing that piece of clothing.

Garment transfer existed as science fiction for a long time and only recently became tractable with the advent of GANs. Since then, it has evolved into a popular research subtopic and seen a lot of progress, as can be seen in the figure below.

Figure 4. Garment transfer example. [7]

Garment transfer comes in a variety of flavours with slight variations on the inputs (ranging from a single image of the clothing to be transferred, to a collection of images, to an image of another person wearing the clothes), but in general the problem can be divided into two subproblems. First, the algorithm should learn to separate a person’s body (pose, shape, skin color) from their clothing. Second, it should generate new images of the person wearing a new clothing item. The outputs also come in different forms, ranging from a single generated image to a full 3D clothing transfer [8] in which images from different viewpoints and poses can be generated.
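The two subproblems above can be caricatured in a few lines. The toy sketch below assumes the first subproblem (separating body from clothing) has produced a segmentation mask, and then performs the crudest possible version of the second: an alpha blend of a new garment onto the person. Real garment-transfer models additionally warp the garment to the body pose, which this sketch deliberately skips; all names and shapes are illustrative.

```python
import numpy as np

def composite_garment(person: np.ndarray,
                      new_garment: np.ndarray,
                      garment_mask: np.ndarray) -> np.ndarray:
    """Toy second-stage step: blend a new garment onto the person
    wherever the (assumed given) clothing mask is 1. No pose-aware
    warping, so this is only an illustration of the decomposition."""
    mask = garment_mask[..., None]  # broadcast mask over colour channels
    return mask * new_garment + (1 - mask) * person

person = np.random.rand(64, 64, 3)
garment = np.random.rand(64, 64, 3)
mask = np.zeros((64, 64))
mask[20:50, 15:49] = 1.0  # hypothetical shirt region from a segmenter
out = composite_garment(person, garment, mask)
print(out.shape)  # (64, 64, 3)
```

The gap between this naive compositing and a photo-realistic result (pose-dependent warping, shading, occlusions) is precisely what the GAN-based approaches try to close.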


While there is growing interest, garment transfer still faces major challenges in developing effective algorithms. These challenges are summarized below:

  • It is hard to obtain suitable datasets for both training and evaluation. As of today, there is no easy way to obtain a dataset for training garment transfer models. An ideal dataset would contain multiple images of a particular clothing item from different viewpoints, pictures of different people wearing those clothing items and pictures of the same people wearing different clothing items.
  • There is a large difference between transferring clothing with complex patterns and simple clothing such as a plain white t-shirt. With complex patterns it’s hard to both match the target person’s body shape and keep the styling intact.
  • The diversity of problem statements for garment transfer makes it hard to see the forest for the trees. Because of the slight variations in both input and output, it’s hard to unify the advancements in the field and obtain a clear view of the current state of the art.

Example topics

Garment transfer survey paper

Because garment transfer research is still in its infancy and there is little consensus on how to approach the problem, it can be hard to see the forest for the trees. Summarizing and organizing the different approaches and their advances, along with an analysis and comparison of their advantages and drawbacks, would add a lot of value to the field, lowering the threshold for new researchers to enter it and helping current researchers make connections between existing approaches.


Research, analyse and summarize the current state of the art for garment transfer techniques.

Conditional GANs

In a well-trained GAN, the generator is able to produce new, photo-realistic examples of the type of images the network was trained on. However, it’s hard to control what kind of image you want the GAN to generate, other than a random image drawn from the same distribution as the training set.

Take, for example, the StyleGAN architecture from NVIDIA [9], which is behind the well-known website that generates photo-realistic faces of people who don’t exist.

Figure 5. Examples of faces generated by StyleGAN.

Once fully trained, it’s easy to ask StyleGAN to generate a new realistic-looking face, but there is no way to ask it for, say, an image of a middle-aged Asian man with long hair, except to keep generating images until you get a face with the desired properties.

This problem significantly reduces the usability of GANs in real-world applications.

There have already been various attempts to solve this problem, the most popular being conditional GANs and controllable generation. Conditional GANs receive an additional input during the training phase: the label of the class each image belongs to. Controllable generation happens after training and consists of adjusting the latent feature vector in an attempt to control the features of the output image.
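The "additional input" of a conditional GAN is typically just the class label concatenated onto the noise vector before it enters the generator. A minimal sketch of that input construction (the dimensions and function name are illustrative, not from a specific architecture):

```python
import numpy as np

def conditional_input(z: np.ndarray, class_idx: int, num_classes: int) -> np.ndarray:
    """Build the generator input of a conditional GAN: the random
    latent vector z concatenated with a one-hot class label, so the
    generator can learn to produce images of the requested class."""
    one_hot = np.zeros(num_classes)
    one_hot[class_idx] = 1.0
    return np.concatenate([z, one_hot])

z = np.random.randn(128)  # latent noise vector
g_in = conditional_input(z, class_idx=3, num_classes=10)
print(g_in.shape)  # (138,)
```

The discriminator receives the same label alongside the image, so both networks learn the association between label and image content; at inference time you simply set the one-hot entry for the class you want.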


While there is growing interest, conditional GANs still face major challenges in developing effective algorithms. These challenges are summarized below:

  • When training conditional GANs with labels, you first need a labeled dataset, which is a major drawback for a technique that could otherwise be fully unsupervised. Furthermore, it isn’t clear how to handle continuous features or how to control multiple features at a time efficiently.
  • With controllable generation, it’s challenging to find a direction in which only a single feature of the output image (e.g. hair color) is affected. In most cases, the features in the Z-space are entangled with each other, which makes it difficult to have granular control over the features of the image that the generator produces.

Example topics

Z-space feature disentanglement

With controllable generation, you try to tweak the latent feature vector of the generator so that the output changes in the desired direction. However, when different features are highly correlated in the dataset that was used to train the GAN, it becomes difficult to control one feature without also modifying the features correlated with it. For example, adding a beard to the picture of a woman will likely also change other facial features, like the nose and jawline, to look more masculine. This is not desirable if you only want to edit a single feature. The problem even extends to features that aren’t correlated in the training set, since without special attention the learned Z-space becomes entangled.
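Mechanically, controllable generation is nothing more than a shift of the latent vector along a feature direction. The sketch below shows that shift; the `beard_direction` here is a random placeholder, whereas in practice such directions are found e.g. by training a classifier in latent space, and entanglement means the shift drags correlated features along.

```python
import numpy as np

def edit_latent(z: np.ndarray, direction: np.ndarray, strength: float) -> np.ndarray:
    """Controllable generation in a nutshell: move the latent vector
    along a (hopefully disentangled) feature direction. With an
    entangled Z-space, this shift also changes correlated features."""
    # Normalise so `strength` has a consistent meaning across directions.
    unit = direction / np.linalg.norm(direction)
    return z + strength * unit

z = np.random.randn(512)                 # latent code of some generated image
beard_direction = np.random.randn(512)   # hypothetical; found via a classifier in practice
z_edited = edit_latent(z, beard_direction, strength=2.0)
print(np.linalg.norm(z_edited - z))      # ≈ 2.0
```

Feeding `z_edited` back through the generator yields the edited image; the thesis goal would be to train the GAN such that directions like this one affect only the intended feature.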


Research and create a GAN with a disentangled Z-space in a particular subdomain such as medical imaging. The goal is to be able to influence single, relevant features of medical images, such as the size of a tumor. Technologies that can be used are Python, TensorFlow, Keras and the Python data science and machine learning stack in general.