Supercharge Your Embeddings: Unveiling `embzip` for Extreme Compression and Speed
Are your embedding sizes slowing you down? `embzip` is a Python package designed to drastically reduce the size of your embeddings without sacrificing too much accuracy. Learn how to leverage product quantization for efficient compression and lightning-fast performance in your NLP workflows.
What is `embzip` and Why Should You Care?
`embzip` leverages product quantization to compress embeddings: each vector is split into sub-vectors, and each sub-vector is replaced by the index of its nearest centroid in a small learned codebook. This means smaller file sizes, faster loading times, and reduced memory usage. Whether you're working with large language models or complex search algorithms, `embzip` can offer a significant performance boost.
- Solve the embedding bloat problem: Radically shrink your embedding sizes.
- Faster load times: Spend less time waiting and more time training.
- Lower memory footprint: Run larger models on limited hardware.
Getting Started: Installation
Before you can unlock the power of `embzip`, you need to install it. Make sure you have Python, PyTorch, and FAISS installed.
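Assuming `embzip` is published on PyPI under that name, installation is a standard pip call. `faiss-cpu` is the CPU build of FAISS; swap in `faiss-gpu` if you want GPU support:

```bash
pip install torch faiss-cpu embzip
```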
Compressing Your Embeddings: A Step-by-Step Guide
`embzip` makes compressing your embeddings incredibly simple. Here's how to quantize your embeddings using the `embzip.quantize()` function:
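The snippet below is a minimal sketch of that call. The exact signature isn't documented here, so treat the tensor-plus-`m` form as an assumption and adjust it to match your installed version:

```python
import torch
import embzip

# A toy batch of 1,000 embeddings with 384 dimensions
# (the shape produced by all-MiniLM-L6-v2).
embeddings = torch.randn(1000, 384)

# Quantize with 64 sub-quantizers. The call shape here (a tensor
# plus an `m` keyword) is an assumption, not the documented API;
# check help(embzip.quantize) locally.
quantized = embzip.quantize(embeddings, m=64)
```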
This example demonstrates the basic usage. The `m` parameter controls the compression level, letting you fine-tune the balance between size and accuracy. Next, let's see how to store and load the compressed data.
Saving and Loading: Persisting Your Compressed Embeddings
Once you've quantized your embeddings, you'll want to save them for later use. `embzip` provides convenient functions for saving and loading compressed embeddings:
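As a sketch, something along these lines; the names `save_embeddings` and `load_embeddings` are placeholders I'm assuming here, so substitute whatever functions your installed version actually exports:

```python
import torch
import embzip

embeddings = torch.randn(1000, 384)

# Hypothetical function names: substitute the save/load helpers
# your version of embzip actually provides.
embzip.save_embeddings("vectors.ezip", embeddings, m=64)
restored = embzip.load_embeddings("vectors.ezip")
```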
Be aware that randomly generated vectors won't compress nearly as well as real-world embeddings: product quantization exploits redundancy and structure in the data, and random noise has neither. Real embeddings are where `embzip` truly shows its utility.
Fine-Tuning for Optimal Results: Understanding the `m` Parameter
The `m` parameter is the key to controlling the compression level in `embzip`. It determines the number of sub-quantizers used in the product quantization process. A lower `m` results in higher compression but potentially lower accuracy, while a higher `m` offers lower compression and higher accuracy.
- Lower `m`: higher compression, lower accuracy.
- Higher `m`: lower compression, higher accuracy.
- Default `m`: `dimension // 16` (a good starting point for experimentation).
For example, with 768-dimensional embeddings the default `m` is 768 // 16 = 48.
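Just to make the arithmetic concrete, here's the default computation in plain Python (the `dimension // 16` rule comes from the default described above):

```python
embedding_dimension = 768              # e.g. a BERT-base sized model
default_m = embedding_dimension // 16  # default number of sub-quantizers
print(default_m)                       # 48
```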
Real-World Performance: Benchmarking `embzip`
The following benchmark results are for 1,000 text embeddings produced with `all-MiniLM-L6-v2`. They show the impact of the `m` parameter on compression ratio and cosine similarity:
| Model | n | m | Original Size (bytes) | Compressed Size (bytes) | Compression Ratio | Cosine Similarity |
|---|---|---|---|---|---|---|
| all-MiniLM-L6-v2 | 1000 | 64 | 1,536,000 | 56,372 | 27.25 | 0.7883 |
| all-MiniLM-L6-v2 | 1000 | 128 | 1,536,000 | 87,920 | 17.47 | 0.9057 |
| all-MiniLM-L6-v2 | 1000 | 384 | 1,536,000 | 216,112 | 7.11 | 0.9954 |
As you can see, adjusting the `m` parameter lets you tailor the compression to your specific needs: `m` = 64 trades noticeable fidelity for a 27x size reduction, while `m` = 384 keeps cosine similarity above 0.99 at a still-useful 7x. Choose the `m` that works best for your use case!
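If you want to reproduce numbers like these on your own data, here is a minimal sketch of one way to measure both metrics. It assumes the same hypothetical `save_embeddings`/`load_embeddings` round trip as above (with `load_embeddings` returning reconstructed vectors) plus `sentence-transformers` for real embeddings; swap in whatever API your version of `embzip` actually exposes:

```python
import os
import torch
import torch.nn.functional as F
import embzip
from sentence_transformers import SentenceTransformer

# Embed 1,000 real sentences; substitute your own corpus, since
# random or highly repetitive text won't reflect real compression.
model = SentenceTransformer("all-MiniLM-L6-v2")
texts = [f"example sentence number {i}" for i in range(1000)]
embeddings = model.encode(texts, convert_to_tensor=True).cpu()

original_bytes = embeddings.numel() * embeddings.element_size()

for m in (64, 128, 384):
    # Hypothetical round trip: save compressed, reload reconstructed.
    path = f"bench_m{m}.ezip"
    embzip.save_embeddings(path, embeddings, m=m)
    restored = embzip.load_embeddings(path)

    ratio = original_bytes / os.path.getsize(path)
    cos = F.cosine_similarity(embeddings, restored, dim=1).mean().item()
    print(f"m={m}: compression {ratio:.2f}x, cosine similarity {cos:.4f}")
```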
Conclusion: Unlock the Potential of Compressed Embeddings with `embzip`
`embzip` offers a powerful and efficient solution for compressing embeddings. By leveraging product quantization, you can dramatically reduce the size of your embeddings, leading to faster loading times, reduced memory usage, and improved overall performance. Start experimenting with `embzip` today and unlock the potential of compressed embeddings in your NLP projects; it's one of the easiest ways to quantize embeddings.