Paper2Code: Generate Code Repositories from Machine Learning Papers via LLMs
Stop manually translating research papers into functional code. Paper2Code automates the process, turning complex scientific documents into ready-to-use code repositories. This article dives into how this innovative tool works, its benefits, and how you can start using it today. Automate your research workflow and bring cutting-edge algorithms to life faster than ever!
What is Paper2Code and How Does it Work?
Paper2Code is a multi-agent, Large Language Model (LLM) system designed to convert research papers into functional code repositories. It streamlines the process of implementing machine learning algorithms described in academic literature, saving you countless hours of manual coding and debugging. Forget struggling with cryptic descriptions. Paper2Code makes research accessible and actionable.
- Planning Stage: The system first analyzes the paper to understand the overall structure and goals.
- Analysis Stage: Specialized agents dissect the paper's sections, identifying key algorithms and data structures.
- Code Generation Stage: The system crafts a functional code repository based on the analysis.
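The three stages above can be pictured as a chain of LLM calls, each consuming the paper plus the previous stage's output. The sketch below is purely illustrative (the prompts, function names, and model choice are assumptions, not PaperCoder's actual code), but it captures the planning, analysis, and code-generation flow:

```python
# Conceptual sketch of a planning -> analysis -> code-generation pipeline.
# Prompts and function names are illustrative; they are not PaperCoder's internals.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(prompt: str) -> str:
    """Send a single prompt to a chat model and return the text reply."""
    response = client.chat.completions.create(
        model="o3-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def plan(paper: str) -> str:
    # Planning stage: overall repository structure and goals.
    return ask(f"Read this paper and outline a repository plan "
               f"(files, classes, training and evaluation scripts):\n{paper}")


def analyze(paper: str, plan_text: str) -> str:
    # Analysis stage: algorithms, data structures, hyperparameters per file.
    return ask(f"Using this plan:\n{plan_text}\n\n"
               f"Extract the algorithms, data structures, and hyperparameters "
               f"needed to implement the paper:\n{paper}")


def generate_code(paper: str, plan_text: str, analysis: str) -> str:
    # Code-generation stage: produce the actual implementation.
    return ask(f"Write the repository code.\nPlan:\n{plan_text}\n"
               f"Analysis:\n{analysis}\nPaper:\n{paper}")


if __name__ == "__main__":
    paper_text = open("paper.json", encoding="utf-8").read()  # JSON from s2orc-doc2json
    repo_plan = plan(paper_text)
    repo_analysis = analyze(paper_text, repo_plan)
    print(generate_code(paper_text, repo_plan, repo_analysis))
```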
Why Use Paper2Code? Key Benefits
- Saves Time & Resources: Automate code generation, freeing up time for research and experimentation.
- Reduces Errors: Minimize human error in the translation from paper to code.
- Improves Reproducibility: Ensure faithful implementations of research findings.
- Increases Accessibility: Makes complex algorithms readily available for practical application.
Quick Start Guide: Running Paper2Code
Ready to try Paper2Code? The steps below use the "Attention Is All You Need" paper as an example. To run with the OpenAI API:
- Install OpenAI:
pip install openai
- Set API Key:
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
- Run the Script:
cd scripts; bash run.sh
If you prefer open-source models with vLLM:
- Install vLLM:
pip install vllm
(Refer to the official vLLM repository if you face issues.)
- Run the Script:
cd scripts; bash run_llm.sh
(The default model is deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct.)
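Before launching the full pipeline, it can be worth checking that vLLM can load and run the default model on your hardware. The snippet below is a generic vLLM smoke test, not one of Paper2Code's scripts; the prompt and sampling settings are arbitrary, and the model requires a GPU with enough memory:

```python
# Quick vLLM smoke test for the default open-source model (not a Paper2Code script).
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
    trust_remote_code=True,  # needed for the DeepSeek-V2 architecture on some vLLM versions
)
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.generate(
    ["Write a Python function for scaled dot-product attention."],
    params,
)
print(outputs[0].outputs[0].text)
```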
Detailed Setup Instructions: Configure Your Environment
Follow these steps to configure your environment correctly before running Paper2Code.
Setting Up Your Environment
- Install the necessary packages:
- For OpenAI API:
pip install openai
- For open-source models:
pip install vllm
Converting PDF to JSON
Paper2Code requires the research paper to be in JSON format. Use the s2orc-doc2json repository to convert your PDF.
- Clone the Repository:
git clone https://github.com/allenai/s2orc-doc2json.git
- Run the PDF Processing Service:
cd ./s2orc-doc2json/grobid-0.7.3
./gradlew run
- Convert PDF to JSON:
mkdir -p ./s2orc-doc2json/output_dir/paper_coder
python ./s2orc-doc2json/doc2json/grobid2json/process_pdf.py -i ${PDF_PATH} -t ./s2orc-doc2json/temp_dir/ -o ./s2orc-doc2json/output_dir/paper_coder
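After the conversion finishes, a quick sanity check can confirm the JSON actually contains the parsed paper. The snippet below only inspects top-level keys, since the exact schema comes from s2orc-doc2json; the file name is an example, so adjust it to whatever process_pdf.py produced:

```python
# Inspect the JSON produced by s2orc-doc2json (file name is an example; use your own output).
import json

with open("./s2orc-doc2json/output_dir/paper_coder/attention_is_all_you_need.json",
          encoding="utf-8") as f:
    paper = json.load(f)

print("Top-level keys:", list(paper.keys()))          # e.g. title, abstract, parsed body
print("Approximate size:", len(json.dumps(paper)), "characters")
```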
Once converted, you can run PaperCoder with your own papers by modifying the environment variables in the scripts.
Exploring the Paper2Code Benchmark Datasets
Paper2Code includes benchmark datasets for evaluation. Find detailed descriptions in the data/paper2code directory, and refer to Section 4.1 of the paper for more information on the Paper2Code Benchmark.
Evaluating Generated Repositories: Model-Based Approach
Paper2Code uses a model-based approach to evaluate the quality of generated repositories, in both reference-based and reference-free settings. The evaluation model assesses key implementation components, assigns severity levels to any issues it finds, and produces a correctness score, which helps ensure the generated code meets a high quality bar.
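As a mental model, the judge can be thought of as an LLM prompted with the paper, the generated repository, and (in the reference-based setting) a reference implementation, returning a severity-annotated critique and a score. The rubric, score scale, and prompt below are illustrative assumptions, not the evaluation code shipped with Paper2Code:

```python
# Illustrative LLM-as-judge sketch (rubric, scale, and prompt are assumptions,
# not PaperCoder's actual evaluation code).
from openai import OpenAI

client = OpenAI()


def judge(paper: str, generated_repo: str, reference_repo: str | None = None) -> str:
    """Reference-free if reference_repo is None, reference-based otherwise."""
    prompt = (
        "You are reviewing a code repository generated from a research paper.\n"
        "List the key implementation components, flag issues with a severity "
        "(critical/major/minor), and end with a correctness score from 1 to 5.\n\n"
        f"Paper:\n{paper}\n\nGenerated repository:\n{generated_repo}\n"
    )
    if reference_repo is not None:
        prompt += f"\nReference implementation for comparison:\n{reference_repo}\n"
    response = client.chat.completions.create(
        model="o3-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```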
How to Evaluate
- Install tiktoken:
pip install tiktoken
- Set API Key:
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
Reference-Free and Reference-Based Evaluation
Run the evaluation in whichever setting fits your situation: reference-free when no author implementation is available, or reference-based when an official repository can serve as the reference. Be sure to adjust paths and arguments to fit your evaluation needs.
Understanding Evaluation Output
The evaluation process provides a comprehensive summary of the generated code's performance. Key metrics include the correctness score, validity, and cost analysis.
Here's an example output:
========================================
Evaluation Summary
Paper name: Transformer
Evaluation type: ref_based
Target repo directory: ../outputs/Transformer_repo
Evaluation result:
  Score: 4.5000
  Valid: 8/8
========================================
Usage Summary
[Evaluation] Transformer - ref_based
Model: o3-mini
Input tokens: 44318 (Cost: $0.04874980)
Cached input tokens: 0 (Cost: $0.00000000)
Output tokens: 26310 (Cost: $0.11576400)
Current total cost: $0.16451380
Accumulated total cost so far: $0.16451380
========================================
This output tells you the paper name, evaluation type, target repository directory, evaluation score, and cost analysis. It gives insights into the efficiency and effectiveness of the code generation process.
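The dollar amounts are simply the token counts multiplied by the per-million-token rates of the judging model (here o3-mini, at $1.10 per million input tokens and $4.40 per million output tokens at the time of writing; check current pricing before relying on these numbers). You can reproduce the figures above directly:

```python
# Reproduce the cost figures from the evaluation summary above.
INPUT_RATE = 1.10 / 1_000_000    # USD per input token (o3-mini, at the time of writing)
OUTPUT_RATE = 4.40 / 1_000_000   # USD per output token

input_cost = 44318 * INPUT_RATE      # -> 0.04874980
output_cost = 26310 * OUTPUT_RATE    # -> 0.11576400
print(f"Input:  ${input_cost:.8f}")
print(f"Output: ${output_cost:.8f}")
print(f"Total:  ${input_cost + output_cost:.8f}")   # -> 0.16451380
```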