Paper2Code: Automatically Generate Code from Research Papers Using AI
Tired of manually implementing complex algorithms from research papers? Paper2Code is an AI system that automatically transforms scientific papers into functional code repositories. It uses a multi-agent LLM system to take over much of the implementation work, saving you time and effort.
What is Paper2Code?
Paper2Code is a system designed to convert research papers, especially those in machine learning, into working code. It uses a three-stage pipeline, handled by specialized AI agents:
- Planning: Analyzes the paper to understand the overall structure and requirements.
- Analysis: Examines the paper's specifics, identifying key algorithms and data structures.
- Code Generation: Turns the gathered information into a functional code repository.
This approach allows for quick prototyping and experimentation with state-of-the-art research. It also aims to reduce coding errors by generating the implementation directly from the paper's description.
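The three-stage flow described above can be sketched in a few lines of Python. This is only an illustration of how staged artifacts might be passed along, not Paper2Code's actual implementation: `PipelineState`, `run_pipeline`, and the three placeholder stage functions are hypothetical names, and a real system would call an LLM inside each stage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class PipelineState:
    """Holds the paper text plus the artifact produced by each stage."""
    paper_text: str
    artifacts: Dict[str, str] = field(default_factory=dict)


# A stage reads the state so far and returns its artifact as text.
Stage = Callable[[PipelineState], str]


def run_pipeline(paper_text: str, stages: List[Tuple[str, Stage]]) -> PipelineState:
    """Run the stages in order, storing each result under its stage name."""
    state = PipelineState(paper_text)
    for name, stage in stages:
        state.artifacts[name] = stage(state)
    return state


# Placeholder stages (assumptions for illustration only).
def plan(state: PipelineState) -> str:
    return f"plan for {len(state.paper_text)} chars of paper"


def analyze(state: PipelineState) -> str:
    return "analysis based on: " + state.artifacts["planning"]


def generate(state: PipelineState) -> str:
    return "# code based on: " + state.artifacts["analysis"]


STAGES = [("planning", plan), ("analysis", analyze), ("coding", generate)]

if __name__ == "__main__":
    result = run_pipeline("Attention Is All You Need ...", STAGES)
    for name, artifact in result.artifacts.items():
        print(name, "->", artifact)
```

Each stage sees only what earlier stages produced, which is why the order of planning, analysis, and code generation matters.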
Get Started with Paper2Code: A Quick Guide
Eager to try Paper2Code? Here's how to get started quickly:
Using OpenAI API
These steps will guide you through setting up Paper2Code with the OpenAI API, allowing you to leverage powerful language models for code generation:
- Install OpenAI:
pip install openai
- Set API Key:
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
- Run the Script:
cd scripts && bash run.sh
(This example uses the "Attention Is All You Need" paper).
The estimated cost when using OpenAI's o3-mini model is around $0.50–$0.70 per run.
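Before starting a paid run, it can help to confirm that the API key is actually exported. The check below is a minimal sketch and not part of Paper2Code; `get_api_key` is a hypothetical helper, and treating a value that begins with `<` as an unfilled placeholder is an assumption based on the export line above.

```python
import os


def get_api_key(env=None, var="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise if it is unusable."""
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    # An empty value, or the "<YOUR_OPENAI_API_KEY>" placeholder left in
    # place, both mean the key was never really set.
    if not key or key.startswith("<"):
        raise RuntimeError(f"{var} is not set; export it before running the script")
    return key


if __name__ == "__main__":
    get_api_key()
    print("OPENAI_API_KEY looks set")
```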
Using Open Source Models with vLLM
Here’s how to harness the power of open-source models through vLLM to create code from research papers.
- Install vLLM:
pip install vllm
(Refer to the official vLLM repository if you run into installation issues.)
- Run the Script:
cd scripts && bash run_llm.sh
(Defaults to the deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct model).
Understanding the Output Folder Structure
After running Paper2Code, the output will be organized as follows:
outputs
├── Transformer
│ ├── analyzing_artifacts
│ ├── coding_artifacts
│ └── planning_artifacts
└── Transformer_repo # Final output repository
This structure allows you to easily access intermediate steps and the final generated code.
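To check which parts of this layout exist after a run, a short script can walk the expected directories. This is a sketch that assumes the folder names shown in the tree above; `describe_outputs` is a hypothetical helper, not something Paper2Code ships.

```python
from pathlib import Path

# Intermediate artifact folders, named as in the tree above.
STAGE_DIRS = ("planning_artifacts", "analyzing_artifacts", "coding_artifacts")


def describe_outputs(output_root, paper_name):
    """Map each expected folder to (path, exists) for one paper."""
    root = Path(output_root)
    paths = {name: root / paper_name / name for name in STAGE_DIRS}
    paths["repo"] = root / f"{paper_name}_repo"  # final generated repository
    return {label: (path, path.is_dir()) for label, path in paths.items()}


if __name__ == "__main__":
    for label, (path, exists) in describe_outputs("outputs", "Transformer").items():
        print(f"{label:20s} {'ok' if exists else 'missing':8s} {path}")
```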
Detailed Setup Instructions for Paper2Code
Follow these detailed instructions to properly set up Paper2Code on your system.
Environment Setup
First, ensure you have the necessary packages installed. You can install only what you need:
- For OpenAI API:
pip install openai
- For open-source models:
pip install vllm
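Since only one of the two packages is needed, a quick check of what is already installed can save a failed run. The snippet below is a small sketch using only the standard library; `available_backends` is a hypothetical helper name.

```python
from importlib.util import find_spec


def available_backends(packages=("openai", "vllm")):
    """Report which of the given top-level packages can be imported."""
    # find_spec returns None when a top-level package is not installed.
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, installed in available_backends().items():
        print(f"{name}: {'installed' if installed else 'not installed'}")
```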
Converting PDF to JSON
Paper2Code requires the paper to be in JSON format. You can convert your PDF using the s2orc-doc2json repository:
- Clone the Repository:
git clone https://github.com/allenai/s2orc-doc2json.git
- Start the PDF Processing Service.
- Convert PDF to JSON.
Running Paper2Code on Your Own Papers
To run Paper2Code on your own research papers, modify the environment variables in the scripts to point to your converted JSON files.
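Before pointing the scripts at a converted file, it may be worth confirming that the file exists and parses as JSON. The sketch below makes no assumption about the schema the converter produces; it only lists the top-level keys when the document is a JSON object. `check_paper_json` is a hypothetical helper.

```python
import json
from pathlib import Path


def check_paper_json(path):
    """Parse a converted paper and return its sorted top-level keys.

    Returns an empty list if the top-level value is not a JSON object.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"no such file: {path}")
    with path.open(encoding="utf-8") as handle:
        data = json.load(handle)  # raises json.JSONDecodeError if malformed
    return sorted(data) if isinstance(data, dict) else []


if __name__ == "__main__":
    import sys
    print(check_paper_json(sys.argv[1]))
```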
Paper2Code Benchmark Datasets
The Paper2Code benchmark dataset, described in Section 4.1 of the paper, offers resources to further explore the system’s capabilities. You'll find the dataset details within the data/paper2code directory.
Evaluating Repositories Generated by Paper2Code
Paper2Code supports model-based evaluation of generated repositories in both reference-based and reference-free settings.
Environment Setup
pip install tiktoken
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
Reference-Free Evaluation
target_repo_dir should point to the generated repository.
Reference-Based Evaluation
target_repo_dir is the generated repository, and gold_repo_dir should point to the official repository (if available).
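The choice between the two settings comes down to whether an official repository is available. A minimal sketch of that decision follows; `choose_eval_type` is a hypothetical helper, and while the label `ref_based` matches the evaluation type shown in the sample output, `ref_free` is an assumed counterpart.

```python
from pathlib import Path
from typing import Optional


def choose_eval_type(target_repo_dir: str, gold_repo_dir: Optional[str] = None) -> str:
    """Pick an evaluation setting from the directories that are available."""
    if not Path(target_repo_dir).is_dir():
        raise FileNotFoundError(f"generated repository not found: {target_repo_dir}")
    # Reference-based evaluation needs the official repository on disk.
    if gold_repo_dir and Path(gold_repo_dir).is_dir():
        return "ref_based"
    return "ref_free"  # assumed label for the reference-free setting
```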
Example Output
========================================
🌟 Evaluation Summary 🌟
📄 Paper name: Transformer
🧪 Evaluation type: ref_based
📁 Target repo directory: ../outputs/Transformer_repo
📊 Evaluation result:
📈 Score: 4.5000
✅ Valid: 8/8
========================================
🌟 Usage Summary 🌟
[Evaluation] Transformer - ref_based
🛠️ Model: o3-mini
📥 Input tokens: 44318 (Cost: $0.04874980)
📦 Cached input tokens: 0 (Cost: $0.00000000)
📤 Output tokens: 26310 (Cost: $0.11576400)
💵 Current total cost: $0.16451380
🪙 Accumulated total cost so far: $0.16451380
============================================
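The cost lines in the usage summary follow from the token counts. Dividing each reported cost by its token count gives about $1.10 per million input tokens and $4.40 per million output tokens; the sketch below uses those implied rates as assumptions (check current pricing before relying on them). Cached input tokens are left out because the sample shows none, so no rate can be inferred for them. `usage_cost` is a hypothetical helper.

```python
# Rates implied by the sample usage summary (USD per million tokens).
# These are inferred from the example, not taken from a price list.
INPUT_USD_PER_M = 1.10
OUTPUT_USD_PER_M = 4.40


def usage_cost(input_tokens, output_tokens,
               input_rate=INPUT_USD_PER_M, output_rate=OUTPUT_USD_PER_M):
    """Return (input_cost, output_cost, total) in USD."""
    input_cost = input_tokens * input_rate / 1_000_000
    output_cost = output_tokens * output_rate / 1_000_000
    return input_cost, output_cost, input_cost + output_cost


if __name__ == "__main__":
    in_cost, out_cost, total = usage_cost(44318, 26310)
    print(f"input: ${in_cost:.8f}  output: ${out_cost:.8f}  total: ${total:.8f}")
```

With the token counts from the sample, this reproduces the reported input, output, and total costs.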
Maximize Your Research Impact Today!
Paper2Code streamlines the conversion of research papers into usable code repositories. Integrating it into your workflow can cut the time spent re-implementing published methods by hand. Start using this automated code generation tool to bring research to life.