RamaLama: The Easiest Way to Run AI Models Locally (Securely!)
Tired of wrestling with complex configurations just to run AI models? The RamaLama project simplifies AI model management and execution using OCI containers, making AI accessible to everyone. Whether you're a seasoned developer or just starting out, RamaLama eliminates the headaches of environment setup, ensuring a smooth and secure AI experience.
Why RamaLama? Say Goodbye to Configuration Nightmares
RamaLama automates the entire process of running AI models locally. Here's why it's a game-changer:
- Effortless Setup: RamaLama automatically detects your system's GPU support (falling back to CPU if none is found) and pulls a container image matched to your hardware.
- Containerized AI: Runs AI models inside containers, isolating them from your host system and preventing conflicts.
- Broad Model Support: RamaLama can execute various AI models from different registries thanks to its flexible transport system.
- Simplified Management: Manage models via the command line using intuitive commands.
RamaLama makes working with AI so straightforward, it's almost boring. Focus on your projects, not troubleshooting installations!
Security First: Run AI Models with Confidence
Worried about security risks when running AI models? RamaLama prioritizes your safety. By default, it runs models inside rootless containers using Podman or Docker, creating a secure sandbox. Your data remains protected.
Here's a breakdown of RamaLama's security features:
- Container Isolation: Prevents AI models from directly accessing your host system.
- Read-Only Model Access: AI models are mounted as read-only, preventing modifications to host files.
- Network Isolation: Uses --network=none to block outbound network access, ensuring no data leaks.
- Automatic Cleanup: Temporary data is wiped out after each session with the --rm option.
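To make these defaults concrete, here is a rough sketch of the kind of Podman invocation they correspond to. The model path and image name are illustrative placeholders, not RamaLama's exact command:

# Illustrative only: read-only model mount, no network, container removed on exit
podman run --rm --network=none \
  -v /path/to/model.gguf:/mnt/models/model.gguf:ro \
  quay.io/ramalama/ramalama:latest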
Run your models securely with RamaLama's robust security footprint!
Installation: Get Started in Minutes
RamaLama offers multiple installation methods to suit your preferences:
Fedora
If you are on Fedora 40 or later, simply run:
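# the Fedora package is named python3-ramalama
sudo dnf install python3-ramalama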
PyPI
Install RamaLama from PyPI with the following command:
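# installs the ramalama package from PyPI
pip install ramalama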
Installation Script (Recommended for macOS)
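The project publishes an install script; as of this writing the documented one-liner is the following (check the README for the current URL):

curl -fsSL https://ramalama.ai/install.sh | bash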
Note: For NVIDIA GPU users, see ramalama-cuda(7) to configure your host system correctly.
Running Models: A Quick Start Guide
Ready to run your first AI model? Here's how:
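For example, using the small tinyllama model from the default Ollama registry (any model name your registry provides works here):

ramalama run ollama://tinyllama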
RamaLama will automatically pull the necessary container image and start the model. This may take some time, especially on the first run.
List Models
To inspect models in local storage, use the list command:
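# shows every model currently in local storage
ramalama list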
Pull Models
Need a specific model? Use the pull command:
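# downloads the model into local storage without running it
ramalama pull ollama://tinyllama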
Serving Models (with Web UI!)
Serve multiple models simultaneously with the serve command:
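# starts an HTTP inference endpoint for the model (port configurable with --port);
# run additional serve instances, each on its own port, to host several models at once
ramalama serve ollama://tinyllama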
A web UI is enabled by default, allowing you to interact with your models in a browser. To disable:
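This sketch assumes the --webui option documented for the serve command; passing off turns the browser UI off while keeping the API endpoint running:

# --webui off disables the browser UI (assumed flag; see ramalama-serve docs)
ramalama serve --webui off ollama://tinyllama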
Supported Model Registries: Choose Your Source
RamaLama's flexible transport system works with multiple AI model registries. It defaults to the Ollama registry, but you can easily switch to others.
To use Hugging Face, for example, set the environment variable:
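# makes Hugging Face the default registry for subsequent commands
export RAMALAMA_TRANSPORT=huggingface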
You can also specify individual model transports using prefixes like huggingface://, oci://, or ollama://.
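For instance, the fully qualified Hugging Face path from the shortnames example below can be run directly:

ramalama run huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf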
Streamline Model Selection with Shortnames
RamaLama supports shortnames.conf files, allowing you to define aliases for fully specified AI models. This makes it easier to refer to models using shorter, more memorable names.
Example shortnames.conf:
[shortnames]
"tiny" = "ollama://tinyllama"
"granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
Now you can run:
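# resolves to ollama://tinyllama via the shortnames.conf alias above
ramalama run tiny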
Join the Community
RamaLama is an open-source project that is always improving. For questions, join RamaLama's Matrix channel. For bug reports or feature requests, use the project's GitHub Issues and pull requests.
Try RamaLama today and experience the easiest, most secure way to run AI models locally!