Python - Data Compression

Hello there, future Python wizards! Today, we're diving into the fascinating world of data compression. As your friendly neighborhood computer teacher, I'm excited to guide you through this journey, even if you've never written a single line of code before. Don't worry; we'll start from the very basics and work our way up. So, grab your virtual wands (keyboards), and let's make some data magic happen!

Introduction to Data Compression

What is Data Compression?

Imagine you're trying to fit all your clothes into a suitcase for a vacation. Data compression is like folding those clothes really neatly so you can fit more in the same space. In the digital world, it's about making files smaller without losing important information.

Why is Data Compression Important?

  1. Saves storage space
  2. Reduces transmission time
  3. Lowers bandwidth usage
  4. Can improve system performance when disk or network speed is the bottleneck

Now that we know why it's important, let's roll up our sleeves and get into some actual Python code!

Basic String Compression

Let's start with a simple example of compressing a string. We'll use a technique called run-length encoding.

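def compress_string(s):
    # An empty string has nothing to count (and s[-1] below would fail).
    if not s:
        return ""
    compressed = ""
    count = 1
    for i in range(1, len(s)):
        if s[i] == s[i-1]:
            count += 1  # same character as before: extend the current run
        else:
            # The run ended: record the character and how often it repeated.
            compressed += s[i-1] + str(count)
            count = 1
    # Record the final run, which the loop never writes out.
    compressed += s[-1] + str(count)
    return compressed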

# Let's try it out
original = "aaabbbccccddeeee"
compressed = compress_string(original)
print(f"Original: {original}")
print(f"Compressed: {compressed}")

When you run this code, you'll see:

Original: aaabbbccccddeeee
Compressed: a3b3c4d2e4

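What's happening here? We're counting consecutive characters and replacing each run with the character followed by the count. Cool, right? Keep in mind that this only pays off when characters repeat: "abc" would become "a1b1c1", which is longer than the original.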
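To check that nothing gets lost along the way, the encoding can be reversed. Here is a minimal sketch of a decoder. The name decompress_string is a hypothetical helper, not part of the code above, and it assumes the original text contains no digit characters (otherwise a count and a literal digit would be impossible to tell apart).

```python
def decompress_string(compressed):
    """Reverse run-length encoding such as 'a3b3' -> 'aaabbb'.

    Assumption: the original text contained no digits, so every run of
    digits is the repeat count for the character just before it.
    """
    pieces = []
    i = 0
    while i < len(compressed):
        char = compressed[i]
        i += 1
        # Collect every digit that follows; a count such as 12 spans two characters.
        start = i
        while i < len(compressed) and compressed[i].isdigit():
            i += 1
        count = int(compressed[start:i])
        pieces.append(char * count)
    return "".join(pieces)


restored = decompress_string("a3b3c4d2e4")
print(f"Restored: {restored}")
```

Feeding it the compressed string from the example should give back the original text. Malformed input, such as a character with no count after it, raises a ValueError in this sketch.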

File Compression with zlib

Now, let's level up and compress actual files using the zlib module. Don't worry if you don't know what a module is yet – think of it as a toolbox of pre-written code we can use.

import zlib

def compress_file(input_file, output_file):
    with open(input_file, 'rb') as file_in:
        data = file_in.read()

    compressed_data = zlib.compress(data, level=9)

    with open(output_file, 'wb') as file_out:
        file_out.write(compressed_data)

    print(f"Original size: {len(data)} bytes")
    print(f"Compressed size: {len(compressed_data)} bytes")
    if data:  # skip the ratio for an empty file to avoid dividing by zero
        print(f"Compression ratio: {len(compressed_data) / len(data):.2%}")

# Let's compress a file
compress_file('example.txt', 'example.txt.gz')

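This script reads a file, compresses its contents, and saves the compressed data to a new file. Level 9 is the highest compression level: it produces the smallest output but is also the slowest, while level 1 is the fastest. One caveat: zlib.compress writes a zlib stream, not a true gzip file, so despite the .gz name used here, tools such as gunzip will not open the result. Python's gzip module is the right tool when you need real .gz files.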
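To get a feel for what the level setting does, you can compress the same data at a few different levels without touching any files. This is only a sketch: the sample data is made up for illustration, and the sizes you see will depend on your own data.

```python
import zlib

# Hypothetical sample data: a short sentence repeated many times, so it
# contains plenty of patterns for zlib to exploit.
data = b"The quick brown fox jumps over the lazy dog. " * 200

sizes = {}
for level in (1, 6, 9):
    compressed = zlib.compress(data, level)
    sizes[level] = len(compressed)
    # Decompressing must give back exactly the bytes we started with.
    assert zlib.decompress(compressed) == data
    print(f"level {level}: {len(data)} -> {len(compressed)} bytes")
```

On highly repetitive data like this, every level shrinks the input dramatically. The differences between levels tend to matter more on large, varied files.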

Decompression: Bringing Your Data Back

Of course, compressed data isn't much use if we can't decompress it. Let's write a function to do just that:

def decompress_file(input_file, output_file):
    with open(input_file, 'rb') as file_in:
        compressed_data = file_in.read()

    decompressed_data = zlib.decompress(compressed_data)

    with open(output_file, 'wb') as file_out:
        file_out.write(decompressed_data)

    print(f"Decompressed size: {len(decompressed_data)} bytes")

# Let's decompress our file
decompress_file('example.txt.gz', 'example_decompressed.txt')

This function does the opposite of our compression function. It reads the compressed file, decompresses the data, and writes it to a new file.
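Both functions above read the whole file into memory at once, which is fine for small files but not for very large ones. One possible alternative, sketched below, processes the file in chunks using zlib.compressobj and zlib.decompressobj. The names compress_stream, decompress_stream, and CHUNK_SIZE are hypothetical, and the demo writes its own temporary sample file so that it can run anywhere.

```python
import os
import tempfile
import zlib

CHUNK_SIZE = 64 * 1024  # assumed chunk size; any reasonable value works


def compress_stream(input_file, output_file, level=9):
    """Compress input_file into output_file one chunk at a time."""
    compressor = zlib.compressobj(level)
    with open(input_file, "rb") as file_in, open(output_file, "wb") as file_out:
        while True:
            chunk = file_in.read(CHUNK_SIZE)
            if not chunk:
                break
            file_out.write(compressor.compress(chunk))
        # flush() emits whatever the compressor is still holding back.
        file_out.write(compressor.flush())


def decompress_stream(input_file, output_file):
    """Decompress input_file into output_file one chunk at a time."""
    decompressor = zlib.decompressobj()
    with open(input_file, "rb") as file_in, open(output_file, "wb") as file_out:
        while True:
            chunk = file_in.read(CHUNK_SIZE)
            if not chunk:
                break
            file_out.write(decompressor.decompress(chunk))
        file_out.write(decompressor.flush())


# Round trip on a temporary sample file to confirm nothing was lost.
with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "sample.txt")
    packed = os.path.join(folder, "sample.txt.zlib")
    restored = os.path.join(folder, "sample_restored.txt")

    with open(original, "wb") as f:
        f.write(b"hello compression " * 50000)

    compress_stream(original, packed)
    decompress_stream(packed, restored)

    with open(original, "rb") as a, open(restored, "rb") as b:
        round_trip_ok = a.read() == b.read()
    original_size = os.path.getsize(original)
    packed_size = os.path.getsize(packed)

print(f"{original_size} -> {packed_size} bytes, round trip ok: {round_trip_ok}")
```

Comparing the restored file with the original is a simple habit worth keeping: with lossless compression, the two should always match byte for byte.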

Comparing Compression Methods

Now that we've seen a couple of compression techniques, let's compare them. We'll use a table to make it easy to see the differences:

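Method              | Pros                                     | Cons                                   | Best Used For
--------------------+------------------------------------------+----------------------------------------+-----------------------------------------------
Run-length encoding | Simple to implement                      | Only effective for repeated characters | Bitmap images, simple patterns
zlib                | High compression ratio, widely supported | Slower than simpler methods            | General-purpose compression, network protocols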
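The trade-offs in the table are easy to see by running both methods on the same input. The sketch below uses run_length_encode, a compact hypothetical rewrite of the same idea as compress_string, and two made-up sample strings. Because the samples are plain ASCII, one character equals one byte, so the sizes are directly comparable.

```python
import zlib


def run_length_encode(text):
    """Run-length encode text, e.g. 'aaab' -> 'a3b1'."""
    if not text:
        return ""
    pieces = []
    count = 1
    # Walk through neighbouring pairs of characters.
    for previous, current in zip(text, text[1:]):
        if current == previous:
            count += 1
        else:
            pieces.append(f"{previous}{count}")
            count = 1
    pieces.append(f"{text[-1]}{count}")
    return "".join(pieces)


# Hypothetical samples: one with long runs, one that looks like ordinary text.
samples = {
    "long runs": "a" * 500 + "b" * 500,
    "plain English": "compression finds patterns in data " * 30,
}

results = {}
for name, text in samples.items():
    rle_size = len(run_length_encode(text))
    zlib_size = len(zlib.compress(text.encode("utf-8"), 9))
    results[name] = (len(text), rle_size, zlib_size)
    print(f"{name}: original={len(text)}, RLE={rle_size}, zlib={zlib_size}")
```

Run-length encoding shines on the long runs but makes the ordinary text bigger, because almost every character picks up a "1" after it. zlib shrinks both samples, since it also spots repeated words and phrases.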

Advanced Topic: Image Compression

For those of you feeling adventurous, let's take a quick peek at image compression using the Pillow library. Don't worry if this seems complex – it's just to give you a taste of what's possible!

from PIL import Image

def compress_image(input_file, output_file, quality):
    with Image.open(input_file) as img:
        img.save(output_file, optimize=True, quality=quality)

# Let's compress an image
compress_image('example.jpg', 'compressed_example.jpg', 50)

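This script opens an image, compresses it by reducing its quality, and saves it as a new file. Pillow is a third-party library, so install it first with pip install Pillow. For JPEG files, the quality parameter ranges from 0 (worst) to 95 (best) and defaults to 75; values above 95 are accepted but produce much larger files with almost no visible gain.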
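To see how the quality setting affects file size, you can save the same picture at several quality values and compare the results. The sketch below assumes Pillow is installed, and it builds a small noisy test image in memory (a made-up stand-in for example.jpg) so that it runs without any image file on disk.

```python
import io
import random

from PIL import Image  # third-party: pip install Pillow

# Build a small random-noise image in memory so no example.jpg is needed.
rng = random.Random(0)
width, height = 64, 64
pixels = bytes(rng.randrange(256) for _ in range(width * height * 3))
img = Image.frombytes("RGB", (width, height), pixels)

sizes = {}
for quality in (10, 50, 90):
    buffer = io.BytesIO()  # an in-memory "file" to save the JPEG into
    img.save(buffer, format="JPEG", optimize=True, quality=quality)
    sizes[quality] = len(buffer.getvalue())
    print(f"quality={quality}: {sizes[quality]} bytes")
```

Lower quality values should give smaller files here. Random noise is close to a worst case for JPEG, so real photos usually compress far better than this test image does.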

Conclusion

Congratulations! You've just taken your first steps into the world of data compression with Python. We've covered basic string compression, file compression and decompression, and even touched on image compression. Remember, compression is all about finding patterns and representing them more efficiently.

As you continue your Python journey, you'll discover even more powerful compression techniques. Who knows? Maybe you'll invent the next breakthrough in data compression! Until then, keep coding, stay curious, and don't forget to have fun along the way.

Credits: Image by storyset