RamaLama: The Easiest Way to Run AI Models Locally (Securely!)
Tired of wrestling with complex configurations just to run AI models? The RamaLama project simplifies AI model management and execution using OCI containers, making AI accessible to everyone. Whether you're a seasoned developer or just starting out, RamaLama eliminates the headaches of environment setup, ensuring a smooth and secure AI experience.
Why RamaLama? Say Goodbye to Configuration Nightmares
RamaLama automates the entire process of running AI models locally. Here's why it's a game-changer:
- Effortless Setup: RamaLama automatically detects your system's GPU support (falling back to CPU if none is found) and pulls a container image matched to your hardware.
- Containerized AI: Runs AI models inside containers, isolating them from your host system and preventing conflicts.
- Broad Model Support: RamaLama can execute various AI models from different registries thanks to its flexible transport system.
- Simplified Management: Manage models via the command line using intuitive commands.
RamaLama makes working with AI so straightforward, it's almost boring. Focus on your projects, not troubleshooting installations!
Security First: Run AI Models with Confidence
Worried about security risks when running AI models? RamaLama prioritizes your safety. By default, it runs models inside rootless containers using Podman or Docker, creating a secure sandbox. Your data remains protected.
Here's a breakdown of RamaLama's security features:
- Container Isolation: Prevents AI models from directly accessing your host system.
- Read-Only Model Access: AI models are mounted as read-only, preventing modifications to host files.
- Network Isolation: Uses --network=none to block outbound network access, ensuring no data leaks.
- Automatic Cleanup: Temporary data is wiped out after each session with the --rm option.
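To make these defaults concrete, here is a rough sketch of the kind of Podman invocation they correspond to. The model path and image name are illustrative placeholders, not RamaLama's exact command:

# Illustrative only: read-only model mount, no network, container removed on exit
podman run --rm --network=none \
  -v /path/to/model.gguf:/mnt/models/model.gguf:ro \
  quay.io/ramalama/ramalama:latest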
Run your models securely with RamaLama's robust security footprint!
Installation: Get Started in Minutes
RamaLama offers multiple installation methods to suit your preferences:
Fedora
If you are on Fedora 40 or later, simply run:
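# the Fedora package is named python3-ramalama
sudo dnf install python3-ramalama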
PyPI
Install RamaLama from PyPI with the following command:
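# installs the ramalama package from PyPI
pip install ramalama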
Installation Script (Recommended for macOS)
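The project publishes an install script; as of this writing the documented one-liner is the following (check the README for the current URL):

curl -fsSL https://ramalama.ai/install.sh | bash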
Note: For NVIDIA GPU users, see ramalama-cuda(7) to configure your host system correctly.
Running Models: A Quick Start Guide
Ready to run your first AI model? Here's how:
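For example, using the small tinyllama model from the default Ollama registry (any model name your registry provides works here):

ramalama run ollama://tinyllama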
RamaLama will automatically pull the necessary container image and start the model. This may take some time, especially on the first run.
List Models
To inspect models in local storage, use the list command:
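# shows every model currently in local storage
ramalama list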
Pull Models
Need a specific model? Use the pull command:
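# downloads the model into local storage without running it
ramalama pull ollama://tinyllama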
Serving Models (with Web UI!)
Serve multiple models simultaneously with the serve command:
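# starts an HTTP inference endpoint for the model (port configurable with --port);
# run additional serve instances, each on its own port, to host several models at once
ramalama serve ollama://tinyllama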
A web UI is enabled by default, allowing you to interact with your models in a browser. To disable:
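This sketch assumes the --webui option documented for the serve command; passing off turns the browser UI off while keeping the API endpoint running:

# --webui off disables the browser UI (assumed flag; see ramalama-serve docs)
ramalama serve --webui off ollama://tinyllama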
Supported Model Registries: Choose Your Source
RamaLama's flexible transport system works with multiple AI model registries. It defaults to the Ollama registry, but you can easily switch to others.
To use Hugging Face, for example, set the environment variable:
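# makes Hugging Face the default registry for subsequent commands
export RAMALAMA_TRANSPORT=huggingface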
You can also specify individual model transports using prefixes like huggingface://, oci://, or ollama://.
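For instance, the fully qualified Hugging Face path from the shortnames example below can be run directly:

ramalama run huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf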
Streamline Model Selection with Shortnames
RamaLama supports shortnames.conf files, allowing you to define aliases for fully specified AI models. This makes it easier to refer to models using shorter, more memorable names.
Example shortnames.conf:
[shortnames]
"tiny" = "ollama://tinyllama"
"granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
Now you can run:
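# resolves to ollama://tinyllama via the shortnames.conf alias above
ramalama run tiny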
Join the Community
RamaLama is an open-source project that is always improving. For questions, join RamaLama's Matrix channel. For bug reports or feature requests, use the project's GitHub Issues and pull requests.
Try RamaLama today and experience the easiest, most secure way to run AI models locally!