Reinforcement Learning with LLMs: A Guide to Nous Research's Atropos Framework
Are you looking for a powerful framework for enhancing language models with reinforcement learning? Look no further than Atropos, Nous Research's LLM RL Gym. Named after the Greek Fate who cut the thread of life, Atropos guides LLMs toward their full potential through RL.
What is Atropos?
Atropos is a framework designed for Language Model Reinforcement Learning Environments. It facilitates collecting and evaluating LLM trajectories across various environments.
Key Features of the Atropos Framework
Atropos offers robust and scalable features for Reinforcement Learning Environments with LLMs, making it a go-to choice for researchers and developers.
- Multi-Turn & Asynchronous RL: Supports intricate, multi-turn interactions effectively, decoupling environment advancement from policy updates.
- Inference Agnostic: Compatible with standard inference APIs such as OpenAI, vLLM, and SGLang, so you can switch LLM providers smoothly (see the sketch at the end of this section).
- Trainer Independent: A standardized training interface promotes experimentation with diverse RL algorithms without needing extensive code alterations.
- Scalable & Decentralized: Facilitates easy scaling by deploying numerous environment instances.
- Diverse Environment Integration: Atropos manages multiple environment types concurrently for heterogeneous, multi-modal training.
The primary objective is to establish a versatile, scalable, and standardized platform, accelerating research in LLM-based RL across diverse settings.
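To make the inference-agnostic design concrete, here is a minimal sketch of the kind of OpenAI-compatible call Atropos can sit on top of. This uses the standard openai Python client rather than any Atropos-specific API; the base URL, API key, and model name are placeholders for whatever vLLM or SGLang server you happen to run.

```python
# Minimal sketch: any OpenAI-compatible server (vLLM, SGLang, OpenAI itself)
# can sit behind the same client code. URL, key, and model are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # e.g. a local vLLM/SGLang server
    api_key="not-needed-for-local",       # local servers often ignore this
)

response = client.chat.completions.create(
    model="my-local-model",  # placeholder model name
    messages=[{"role": "user", "content": "Solve: 12 * 7 = ?"}],
)
print(response.choices[0].message.content)
```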
Tool Calling & Financial Prediction: Key Experiments and Results
Atropos has already demonstrated strong results in several areas:
- Tool Calling Environment: model DeepHermes-ToolCalling-Specialist-Atropos; environment: `tool_calling_server.py`
- Financial Fundamentals Prediction Environment: model DeepHermes-Financial-Fundamentals-Prediction-Specialist-Atropos; environment: `fundamental_prediction_environment.py`
- RLAIF Experiment Artifacts: models DeepHermes Egregore v1 and v2 8B and DeepHermes Ascension Maze 8B; environment: `rlaif_server.py`
How to Install Atropos
1. Ensure you have Python 3.10 or later installed.
2. Use pip to install Atropos (a sketch follows this list). For development, examples, or all features, install the corresponding optional extras, also shown below.
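The exact install commands are not reproduced above, so the following is a minimal sketch that assumes you are installing from a local clone of the repository and that the `dev`, `examples`, and `all` extras exist as described in the project README; verify against the current repository before relying on it.

```bash
# Install Atropos from a local clone of the repository
git clone https://github.com/NousResearch/atropos.git
cd atropos
pip install -e .

# For development, examples, or all optional features
# (extras names assumed from the project README):
pip install -e .[dev]
pip install -e .[examples]
pip install -e .[all]
```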
Quick Start with Atropos
Create a New Environment
Refer to the Base Class Documentation to grasp the fundamental concepts, and explore the existing environments in the `environments/` directory for practical examples. A rough skeleton of a new environment is sketched below.
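As a rough illustration only: a new environment subclasses the framework's base environment class and implements a handful of hooks. The import path, method names, and signatures here are assumptions modeled on the environments shipped in the repository, so treat the Base Class Documentation as authoritative.

```python
# Illustrative skeleton only: the import path, method names, and signatures
# below are assumptions based on the pattern seen in the environments/
# directory; consult the Base Class Documentation for the real interface.
from atroposlib.envs.base import BaseEnv  # import path is an assumption


class MyCustomEnv(BaseEnv):
    @classmethod
    def config_init(cls):
        # Return the environment config plus the inference server config(s);
        # this is where you would point Atropos at a vLLM/SGLang endpoint.
        raise NotImplementedError

    async def setup(self):
        # Load datasets, prompts, or task definitions once at startup.
        self.items = ["What is 12 * 7?"]

    async def get_next_item(self):
        # Hand over the next prompt/task to be rolled out against the policy.
        return self.items[0]

    async def collect_trajectories(self, item):
        # Query the inference server with `item`, score the completions,
        # and return the scored rollouts for the trainer to consume.
        raise NotImplementedError

    async def evaluate(self, *args, **kwargs):
        # Optional: run a held-out evaluation pass and log metrics.
        raise NotImplementedError
```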
Quick Start - Run an Example Environment
Modify the `config_init` section of the example environment so it points at a running vLLM or SGLang inference server. Afterward, complete the following steps:
1. Start the API server and run the GSM8K environment (a sketch of the commands follows this list).
2. (Optional) Start collecting rollouts without a trainer; see the API Docs to explore the REST API interface.
3. Follow the training example guide for detailed instructions on training your model.
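As a sketch of step 1, assuming the repository's `run-api` launcher and `environments/gsm8k_server.py` script; the subcommand and flag shown are assumptions, so confirm the exact invocation in the project README.

```bash
# Terminal 1: start the Atropos API server
# (launcher name assumed from the project repository)
run-api

# Terminal 2: run the GSM8K environment against the API server
# (subcommand and flag are illustrative; check the repo README)
python environments/gsm8k_server.py serve --slurm false
```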
Useful Debugging Tools
The trajectory-handler includes debugging tools, allowing environment developers to test and understand their environments locally.
- Flexible Model Provider Support: Supports model providers compliant with the OpenAI API standard.
- View Run: Launch a Gradio UI to inspect rollouts.
- Offline Data Generation: Use `atropos-sft-gen` and `atropos-dpo-gen` to convert rollouts into SFT or DPO training data (a usage sketch follows this list).
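The converters' flags are not documented in this post; since both ship as command-line tools, their built-in help is a safe starting point (this assumes the commands are on your PATH after installation):

```bash
# Discover the available options for each rollout converter
atropos-sft-gen --help
atropos-dpo-gen --help
```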
Upcoming Atropos Hackathon
Mark your calendars! On May 18th, 2025, join an exciting hackathon in San Francisco focused on LLM RL Environments. Stay tuned for details by following @NousResearch on Twitter.
Licensing and Contribution
Atropos is released under the MIT license, and the team welcomes and appreciates community involvement. Check out the contributing guide for details on how to get started.