Automate Code from Research Papers: A Guide to Paper2Code for Machine Learning

Tired of manually translating research papers into functional code? Paper2Code automates this process, saving you time and effort. This guide provides a deep dive into Paper2Code, a multi-agent LLM system designed to transform scientific papers into code repositories.

Paper2Code Overview

What is Paper2Code?

Paper2Code uses a three-stage pipeline (planning, analysis, and code generation), with a specialized agent handling each stage. The authors report that this method outperforms strong baselines and produces faithful, high-quality implementations from research papers.
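
To make the staged design concrete, here is a minimal sketch of how three stages might be chained. It is not Paper2Code's actual implementation: the function names, the prompt wording, and the idea that each stage receives the previous stage's text are all assumptions made for illustration, and `echo_llm` is a stand-in for a real model call.

```python
from typing import Callable, Dict

# Hypothetical type: any function that maps a prompt to generated text.
LLM = Callable[[str], str]


def run_pipeline(paper_text: str, llm: LLM) -> Dict[str, str]:
    """Chain three stages so each one sees the output of the previous one."""
    plan = llm(f"[planning] Outline a repository for this paper:\n{paper_text}")
    analysis = llm(f"[analysis] Detail each planned file:\n{plan}")
    code = llm(f"[coding] Write code following this analysis:\n{analysis}")
    return {"planning": plan, "analysis": analysis, "coding": code}


def echo_llm(prompt: str) -> str:
    """Stand-in model (assumption): reports which stage called it."""
    stage = prompt.split("]", 1)[0].lstrip("[")
    return f"<{stage} output>"


if __name__ == "__main__":
    artifacts = run_pipeline("Attention Is All You Need ...", echo_llm)
    for stage, text in artifacts.items():
        print(stage, "->", text)
```

Keeping each stage's output under its own key mirrors the idea of storing intermediate artifacts per stage, which makes it easier to inspect where a generated repository went wrong.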

Jump Right In: Quick Start Guide

Want to see Paper2Code in action? These quick start instructions will get you up and running fast. Note that the commands below run an example on the "Attention Is All You Need" paper.

Using OpenAI API

Running the example through the OpenAI API incurs a cost. With the o3-mini model, expect roughly $0.50 to $0.70.

Install the OpenAI package:
```
pip install openai
```

Set your OpenAI API key:

```
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
```

Run the script:
```
cd scripts
bash run.sh
```

Leveraging Open Source Models with vLLM

If you prefer open-source models, Paper2Code supports vLLM. The default model is deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct. Note: if you encounter installation issues, consult the official vLLM repository.

Install vLLM:
```
pip install vllm
```
Execute the script:
```
cd scripts
bash run_llm.sh
```

Understanding the Output Folder Structure

After running Paper2Code, the generated output will be organized as follows:

```
outputs
├── Transformer
│   ├── analyzing_artifacts
│   ├── coding_artifacts
│   └── planning_artifacts
└── Transformer_repo # Final output repository
```

This structure includes intermediate artifacts from each stage of the process, as well as the final, generated code repository.
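
A quick way to confirm that a run produced every folder in the tree above is to check for them from Python. This is a small sketch, not part of Paper2Code; the helper name `missing_outputs` is hypothetical, and it assumes the layout is exactly the one shown.

```python
from pathlib import Path

# Stage folders listed in the output tree above.
STAGE_DIRS = ("planning_artifacts", "analyzing_artifacts", "coding_artifacts")


def missing_outputs(outputs_root, paper_name):
    """Return the expected directories that do not exist under outputs_root."""
    root = Path(outputs_root)
    expected = [root / paper_name / stage for stage in STAGE_DIRS]
    expected.append(root / f"{paper_name}_repo")  # final generated repository
    return [path for path in expected if not path.is_dir()]


if __name__ == "__main__":
    # Run from the directory that contains "outputs".
    for path in missing_outputs("outputs", "Transformer"):
        print(f"missing: {path}")
```

An empty result means all three artifact folders and the final repository are present.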

Detailed Setup Instructions: Configuring Your Environment

Follow these steps to set up your environment before running Paper2Code on your own papers.

Essential Environment Setup

Make sure you install the necessary packages. For the o3-mini version, you must have the latest openai package installed. Install only the packages you need:

OpenAI API: openai
Open-source models: vllm

Install whichever package applies using pip:
```
pip install openai
pip install vllm
```

Converting PDFs to JSON Format

Paper2Code requires the input paper to be in JSON format. Use the s2orc-doc2json repository for this conversion. (Refer to the official repository for detailed configuration options.)

Clone the repository:

```
git clone https://github.com/allenai/s2orc-doc2json.git
```

Run the PDF processing service:

```
cd ./s2orc-doc2json/grobid-0.7.3
./gradlew run
```

Convert your PDF to JSON:

```
mkdir -p ./s2orc-doc2json/output_dir/paper_coder
python ./s2orc-doc2json/doc2json/grobid2json/process_pdf.py \
    -i "${PDF_PATH}" \
    -t ./s2orc-doc2json/temp_dir/ \
    -o ./s2orc-doc2json/output_dir/paper_coder
```
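
If you have several PDFs to convert, the command above can be assembled programmatically. The sketch below only builds the argument list, using the same script path and `-i`, `-t`, `-o` flags shown above; the helper name `build_convert_command` and the `papers` folder are hypothetical, and the PDF processing service is assumed to be running already.

```python
import shlex
from pathlib import Path


def build_convert_command(pdf_path, repo_dir="./s2orc-doc2json"):
    """Build the argv list for the process_pdf.py call shown above."""
    repo = Path(repo_dir)
    return [
        "python",
        str(repo / "doc2json" / "grobid2json" / "process_pdf.py"),
        "-i", str(pdf_path),
        "-t", str(repo / "temp_dir"),
        "-o", str(repo / "output_dir" / "paper_coder"),
    ]


if __name__ == "__main__":
    # "papers" is a hypothetical folder holding the PDFs to convert.
    for pdf in sorted(Path("papers").glob("*.pdf")):
        # Print each command; pass the list to subprocess.run(..., check=True)
        # to execute it instead.
        print(shlex.join(build_convert_command(pdf)))
```

Building a list rather than a single string avoids quoting problems when a PDF path contains spaces.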

Running Paper2Code with Your Own Papers

After setting up your environment, you can run Paper2Code on your own research papers. Before running the script, update the environment variables so they refer to your paper.

Paper2Code Benchmark Datasets: Evaluating Performance

See the data/paper2code directory for a description of the Paper2Code benchmark dataset. Section 4.1 of the paper, "Paper2Code Benchmark", provides further details.

Model-Based Evaluation: Assessing Repository Quality

Paper2Code uses a model-based approach to evaluate the quality of generated repositories, in both reference-based and reference-free settings. Section 4.3.1 of the paper, "Paper2Code Benchmark", elaborates on the evaluation process.

The evaluator model critiques key implementation components, assigns severity levels, and produces a correctness score from 1 to 5, averaged over 8 samples generated with o3-mini-high. To evaluate a different repository, modify the paths and arguments in the evaluation commands.
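
The averaging step can be illustrated with a few lines of Python. This is a sketch of the arithmetic only, not the logic in the evaluation script: the function name `aggregate_scores` is hypothetical, and treating a sample as valid when it is a number between 1 and 5 is an assumption.

```python
def aggregate_scores(samples, low=1, high=5):
    """Average the valid scores and report how many samples were valid.

    Assumption: a sample is valid if it is a number within [low, high].
    """
    valid = [
        s for s in samples
        if isinstance(s, (int, float))
        and not isinstance(s, bool)
        and low <= s <= high
    ]
    if not valid:
        return None, 0
    return sum(valid) / len(valid), len(valid)


if __name__ == "__main__":
    # Eight hypothetical per-sample scores, not taken from a real run.
    score, n_valid = aggregate_scores([5, 4, 5, 4, 5, 4, 5, 4])
    print(f"Score: {score:.4f}  Valid: {n_valid}/8")  # Score: 4.5000  Valid: 8/8
```

Averaging over several samples smooths out the variation between individual model judgments, at the cost of more evaluation calls.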

Setting Up the Evaluation Environment

Before running the evaluation scripts, install the tiktoken library and set your OpenAI API key (if applicable).

```
pip install tiktoken
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
```

Reference-Free Evaluation

This evaluation method assesses the generated repository without comparing it to a gold reference.

```
cd codes/
python eval.py \
    --paper_name Transformer \
    --pdf_json_path ../examples/Transformer_cleaned.json \
    --data_dir ../data \
    --output_dir ../outputs/Transformer \
    --target_repo_dir ../outputs/Transformer_repo \
    --eval_result_dir ../results \
    --eval_type ref_free \
    --generated_n 8 \
    --papercoder
```

Reference-Based Evaluation

This method compares the generated repository to an official, author-released repository.

```
cd codes/
python eval.py \
    --paper_name Transformer \
    --pdf_json_path ../examples/Transformer_cleaned.json \
    --data_dir ../data \
    --output_dir ../outputs/Transformer \
    --target_repo_dir ../outputs/Transformer_repo \
    --gold_repo_dir ../examples/Transformer_gold_repo \
    --eval_result_dir ../results \
    --eval_type ref_based \
    --generated_n 8 \
    --papercoder
```
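
The two commands differ only in `--eval_type` and the extra `--gold_repo_dir` argument. The sketch below builds either variant from the flags shown above. The helper name `build_eval_command` is hypothetical, and deriving the paths from the paper name simply generalizes the Transformer example; your own files may be laid out differently.

```python
def build_eval_command(paper_name, eval_type, gold_repo_dir=None, generated_n=8):
    """Build the eval.py argv list for a reference-free or reference-based run."""
    if eval_type not in ("ref_free", "ref_based"):
        raise ValueError(f"unknown eval_type: {eval_type}")
    if eval_type == "ref_based" and gold_repo_dir is None:
        raise ValueError("ref_based evaluation needs gold_repo_dir")

    # Assumption: paths follow the naming pattern of the Transformer example.
    cmd = [
        "python", "eval.py",
        "--paper_name", paper_name,
        "--pdf_json_path", f"../examples/{paper_name}_cleaned.json",
        "--data_dir", "../data",
        "--output_dir", f"../outputs/{paper_name}",
        "--target_repo_dir", f"../outputs/{paper_name}_repo",
    ]
    if eval_type == "ref_based":
        cmd += ["--gold_repo_dir", gold_repo_dir]
    cmd += [
        "--eval_result_dir", "../results",
        "--eval_type", eval_type,
        "--generated_n", str(generated_n),
        "--papercoder",
    ]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_eval_command("Transformer", "ref_free")))
```

Failing early when the gold repository is missing avoids starting a reference-based run that cannot complete.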

Example Evaluation Output

The evaluation script provides a summary of the results, including a correctness score and usage statistics.

```
========================================
🌟 Evaluation Summary 🌟
📄 Paper name: Transformer
🧪 Evaluation type: ref_based
📁 Target repo directory: ../outputs/Transformer_repo
📊 Evaluation result:
 📈 Score: 4.5000
 ✅ Valid: 8/8
========================================
🌟 Usage Summary 🌟
[Evaluation] Transformer - ref_based
🛠️ Model: o3-mini
📥 Input tokens: 44318 (Cost: $0.04874980)
📦 Cached input tokens: 0 (Cost: $0.00000000)
📤 Output tokens: 26310 (Cost: $0.11576400)
💵 Current total cost: $0.16451380
🪙 Accumulated total cost so far: $0.16451380
========================================
```
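
The cost figures in the usage summary follow from simple per-token arithmetic. The sketch below reproduces them. The two rates are inferred by dividing each listed cost by its token count in the sample output, so treat them as assumptions that may not match current pricing; the function name `usage_cost` is hypothetical.

```python
# USD per one million tokens, inferred from the sample output above
# (0.04874980 / 44318 and 0.11576400 / 26310); not an official price list.
INPUT_RATE_PER_M = 1.10
OUTPUT_RATE_PER_M = 4.40


def usage_cost(input_tokens, output_tokens):
    """Return (input cost, output cost, total cost) in USD."""
    input_cost = input_tokens * INPUT_RATE_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_RATE_PER_M / 1_000_000
    return input_cost, output_cost, input_cost + output_cost


if __name__ == "__main__":
    input_cost, output_cost, total = usage_cost(44318, 26310)
    print(f"Input:  ${input_cost:.8f}")   # Input:  $0.04874980
    print(f"Output: ${output_cost:.8f}")  # Output: $0.11576400
    print(f"Total:  ${total:.8f}")        # Total:  $0.16451380
```

Output tokens dominate the total here because their inferred rate is four times the input rate.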