https://snufk.in/blog/silly-compression.html

Blog

Silly Image Compression Idea

Wed 06 April 2022
3 min read

An idea that I had a few days ago on twitter:

    image compression algorithm that resizes to the smallest size an
    AI can still classify the image correctly https://twitter.com/
    sn_fk_n/status/1510562695808000001

So I made a quick demo in python.

Technical Stuff

I started by implementing a quick class that defines an 'evaluator',
or something that classifies an image. Essentially just a wrapper
around a pre-trained model from torchvision.

import torchvision.models as models
from torchvision import datasets, transforms as T


class Evaluator:
    def __init__(self):
        self.net = models.resnet152(pretrained=True)

        # transforms from torchvision docs
        self.transform = T.Compose(
            [
                T.Resize(256),
                T.CenterCrop(224),
                T.ToTensor(),
                T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            ]
        )

        # set model to eval
        self.net.eval()

    def __call__(self, image):
        """
        Returns predicted class for image
        """
        transformed_image = self.transform(image).unsqueeze(0)
        output = self.net(transformed_image)
        _, predicted = output.max(1)

        return predicted

This is fairly simple, load a pre-trained model (Resnet152, here for
no particular reason.) the transforms from the documentation and set
the mode to eval, since we aren't training. Then a quick method that
applies those transforms and takes the predicted class as the max.

This maybe isn't the smartest way to approach the predicted class as
we will see later, but this isn't a serious problem! It'll do for
now.

The other component is the 'compressor'. Used here extremely loosely
(is resizing 'compression', in some respects maybe), this implements
an iterative method that simply runs an iterative 'compression'
method and checks if the evaluator's prediction has changed.

from abc import ABC, abstractmethod

class Compressor(ABC):
    def __init__(self, evaluator):
        self.evaluator = evaluator

    @abstractmethod
    def iteration(self, image):
        pass

    def __call__(self, image):
        initial_prediction = self.evaluator(image)

        iteration = previous_iteration = image
        iter_prediction = initial_prediction

        while iter_prediction == initial_prediction:
            previous_iteration = iteration
            iteration = self.iteration(previous_iteration)
            iter_prediction = self.evaluator(iteration)

        return previous_iteration

Using compressor as an abstract class means it's easy to implement
just the iteration.

from PIL import Image
from random import randint
from io import BytesIO
from compressor import Compressor

class ResizeCompressor(Compressor):
    def iteration(self, image):
        """
        Returns a resized image that is (at most)
        10 pixels smaller in both directions
        """
        current_size = image.size
        new_size = (current_size[0] - 10, current_size[1] - 10)
        # thumbnail preserves aspect ratio
        image.thumbnail(new_size)
        return image


class JPEGCompressor(Compressor):
    def iteration(self, image):
        buffer = BytesIO()
        image.save(buffer, "JPEG", quality=randint(0, 100))
        compressed = Image.open(buffer)
        return compressed

In the first case, are we really 'compressing'? Well we are 'encoding
information using fewer bits that the original', and it certainly is
lossy.

resize example

I have also implemented a jpg compressor which maybe is more like
what compression actually is. This saves the image with a random
quality level, which whilst stochastic, does eventually accumulate
the classic overly jpegged look. This means that the final image may
look different but does produce cool effects.

jpeg example

Philosophical Questions

This opens a question around compression in general. Whilst formats
like JPEG are designed around human perception. What would image
compression look like if designed around other things perceptions?

In a world where data is processed by things that aren't human, why
should we settle for human-friendly representations of these data? In
the same way that JSON and XML might not be the greatest way to move
data between services when there's no requirement for human
intervention (why use text-based formats when a binary format might
make more sense?). Why do we compress images for humans?

Surely there are other formats that similarly get the semantics of an
image (or other piece of media as it may be) across without being
necessarily human-friendly.

Conclusion

Whilst this scheme for 'compression' is obviously silly, and
definitely more artistic than practical, we might consider it as
maybe a decent tool for thinking about other ways that we might want
to encode semantic information?

In the future, changes could be made to consider the hierarchy of the
classes, or operate on thresholds of the changing outputs of the
neural network. Currently it changes only when the max class changes
which leads to termination where it changes label to something with a
very similar semantic meaning (e.g. 'Tiger Cat' to 'Egyptian Cat').

Back to Frontpage