Run LLMs Inside a PDF File? Yes, It's Possible (Here's How!)
Imagine running a Large Language Model (LLM) not on a server, but directly within a PDF file. This groundbreaking concept is now a reality, opening doors to exciting new possibilities. Let's explore how this innovative project makes the impossible possible and what it means for the future of LLMs.
What is llm.pdf and Why Should You Care?
llm.pdf is a proof-of-concept project demonstrating the capability to run an entire Large Language Model within a PDF file. This is achieved by compiling llama.cpp into asm.js using Emscripten, which is then executed within the PDF using an old PDF JS injection technique. By embedding the LLM file (encoded in base64) directly into the PDF, the project achieves fully self-contained LLM inference. If you're a tech enthusiast or a developer interested in local LLM deployment, this project pushes the boundaries of what's possible.
How Does llm.pdf Actually Work?
The genius behind llm.pdf lies in its clever combination of existing technologies:
- Emscripten: Compiles llama.cpp, the C/C++ LLM inference engine, into asm.js so it can run in a JavaScript environment.
- PDF JS Injection: Leverages an older method of injecting JavaScript code into a PDF to execute the compiled LLM.
- Base64 Encoding: Embeds the complete LLM data as a base64 string, making the PDF entirely self-contained.
This ingenious method allows the PDF to function as an LLM execution environment, creating a portable and readily distributable inference framework.
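To make the mechanism concrete, here's a minimal sketch of the underlying idea (this is illustrative, not the project's actual source): a hand-assembled PDF whose /OpenAction runs embedded JavaScript, with a base64 payload baked into that script. llm.pdf applies the same trick at a much larger scale, injecting the asm.js build of llama.cpp and a base64-encoded model instead of a toy alert.

```python
import base64

def build_js_pdf(js_code: str) -> bytes:
    """Hand-assemble a minimal PDF whose /OpenAction runs `js_code` when opened."""
    # Escape characters that would terminate a PDF literal string.
    js = js_code.replace("\\", r"\\").replace("(", r"\(").replace(")", r"\)")
    objects = [
        "<< /Type /Catalog /Pages 2 0 R /OpenAction 4 0 R >>",
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
        f"<< /Type /Action /S /JavaScript /JS ({js}) >>",
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, for the xref table
        out += f"{num} 0 obj\n{body}\nendobj\n".encode("latin-1")
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # xref entries must be exactly 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_pos}\n%%EOF\n"
    ).encode()
    return bytes(out)

# Toy payload: a base64 string embedded directly inside the injected JavaScript.
# llm.pdf embeds the asm.js-compiled llama.cpp plus the model weights this way.
payload = base64.b64encode(b"hello from inside a PDF").decode("ascii")
js = f'var blob = "{payload}"; app.alert("Embedded payload: " + blob);'

with open("injected.pdf", "wb") as f:
    f.write(build_js_pdf(js))
```

Whether the injected script actually runs depends on the viewer: many desktop readers sandbox or ignore embedded JavaScript, which is why this kind of injection is typically demonstrated in browser-based PDF engines.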
Getting Started: Creating Your Own LLM-Powered PDF
Want to experiment? The `scripts/generatePDF.py` file lets you create PDFs with compatible LLMs. Here's the quickest way to get started:
- Navigate to the `scripts` directory: `cd scripts`
- Run the generation script, specifying your model and output path.
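A typical invocation looks something like the following. The flag names here are assumptions for illustration, so check the script itself (for example via `python3 generatePDF.py --help`) for the exact interface:

```bash
# Hypothetical flags -- consult the script's help output for the real interface.
python3 generatePDF.py --model path/to/model.Q8_0.gguf --output llm.pdf
```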
Choosing the Right Model for Your LLM PDF
Selecting the right model is essential for optimal performance:
- GGUF Quantized Models are a Must: Only GGUF quantized models will work.
- Q8 Quantization Recommended: Q8 quantized models typically offer the best speed.
- Parameter Size Matters: Smaller models (like 135M parameter models, which take about 5 seconds per token) are preferable. Larger models will be significantly slower and potentially unusable.
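If you still need a suitable model, a small Q8-quantized GGUF file can be pulled from the Hugging Face Hub. The snippet below is a sketch: the repository id and filename are placeholders rather than a specific recommendation, so substitute whichever small GGUF model you want to embed.

```python
# Sketch: fetch a small Q8_0-quantized GGUF model from the Hugging Face Hub.
# The repo_id and filename are placeholders -- point them at any small GGUF
# model (ideally around 135M parameters, Q8_0 quantized) you actually want to use.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="your-namespace/tiny-llm-gguf",  # placeholder repository
    filename="tiny-llm.Q8_0.gguf",           # placeholder Q8_0 GGUF file
)
print(f"Downloaded model to {model_path}")   # pass this path to generatePDF.py
```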
Inspiration and Gratitude
This project draws inspiration from earlier work in related fields, like ading2210's DoomPDF and rahuldshetty's llm.js. It also credits tiny LLMs such as EleutherAI's Pythia models, the TinyStories models by Ronen Eldan and Yuanzhi Li, and arnir0's Tiny-LLM models.
The Future of LLMs: Portable and Accessible?
llm.pdf represents a fascinating step toward making LLMs more portable and accessible. While still a proof-of-concept, it highlights the incredible potential of running sophisticated AI models in unexpected environments. As technology continues to advance, expect to see even more innovative approaches to deploying and utilizing Large Language Models in various novel applications.