Paper2Code: Automatically Generate Code from Research Papers Using AI

Implementing complex algorithms from research papers by hand is slow and error-prone. Paper2Code is an AI system that automatically transforms scientific papers into functional code repositories, using a multi-agent LLM pipeline to handle the planning, analysis, and coding steps.

Paper2Code Overview

What is Paper2Code?

Paper2Code is a system designed to convert research papers, especially those in machine learning, into working code. It uses a three-stage pipeline, handled by specialized AI agents:

Planning: Analyzes the paper to understand its overall structure and requirements.
Analysis: Examines the paper's specifics, identifying key algorithms and data structures.
Code Generation: Transforms the gathered information into a functional code repository.

This approach allows quick prototyping and experimentation with state-of-the-art research. It can also help reduce the coding errors that come with manual reimplementation.
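
The three stages can be pictured as functions that each enrich a shared state object. The sketch below is only an illustration of that control flow: the class, the function names, and the placeholder outputs are all hypothetical, and no LLM is called. It is not Paper2Code's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineState:
    """Artifacts accumulated as a paper moves through the three stages."""
    paper_text: str
    plan: list[str] = field(default_factory=list)
    notes: dict[str, str] = field(default_factory=dict)
    repo: dict[str, str] = field(default_factory=dict)


def run_planning(state: PipelineState) -> PipelineState:
    # Hypothetical stand-in for the planning agent: decide which files to write.
    state.plan = ["model.py", "train.py"]
    return state


def run_analysis(state: PipelineState) -> PipelineState:
    # Hypothetical stand-in for the analysis agent: one note per planned file.
    state.notes = {name: f"notes for {name}" for name in state.plan}
    return state


def run_code_generation(state: PipelineState) -> PipelineState:
    # Hypothetical stand-in for the coding agent: one source file per note.
    state.repo = {name: f"# generated from: {note}\n" for name, note in state.notes.items()}
    return state


def run_pipeline(paper_text: str) -> PipelineState:
    """Run the stages in order, passing the growing state along."""
    state = PipelineState(paper_text=paper_text)
    for stage in (run_planning, run_analysis, run_code_generation):
        state = stage(state)
    return state
```

Each stage only reads what earlier stages produced, which is why the intermediate artifacts can be saved and inspected separately.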

Get Started with Paper2Code: A Quick Guide

Eager to try Paper2Code? Here's how to get started quickly:

Using OpenAI API

These steps set up Paper2Code with the OpenAI API:

Install OpenAI: pip install openai
Set API Key: export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
Run the Script: cd scripts && bash run.sh (This example uses the "Attention Is All You Need" paper).

The estimated cost when using OpenAI's o3-mini model is around $0.50–$0.70 per run.
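
Because each run is billed, it can be worth confirming that the key is actually set before launching the script. The helper below is a minimal sketch, not part of Paper2Code; the function name `require_openai_key` is an assumption made for illustration.

```python
import os
from typing import Mapping, Optional


def require_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, or raise if it is missing or still the placeholder."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key or key == "<YOUR_OPENAI_API_KEY>":
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running run.sh")
    return key
```

Passing a plain dictionary instead of the real environment makes the check easy to try without touching your shell.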

Using Open Source Models with vLLM

To generate code with open-source models served through vLLM:

Install vLLM: pip install vllm (If you run into installation issues, refer to the official vLLM repository).
Run the Script: cd scripts && bash run_llm.sh (Defaults to the deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct model).

Understanding the Output Folder Structure

After running Paper2Code, the output will be organized as follows:

outputs
├── Transformer
│   ├── analyzing_artifacts
│   ├── coding_artifacts
│   └── planning_artifacts
└── Transformer_repo # Final output repository

This structure allows you to easily access intermediate steps and the final generated code.
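
To confirm that a run produced every expected directory, a short check like the following may help. It is a sketch under one assumption: that other papers follow the same layout as the Transformer example, with a `<paper_name>` folder and a `<paper_name>_repo` folder. The function name is hypothetical.

```python
from pathlib import Path

# Stage folders taken from the tree shown above.
STAGE_DIRS = ("planning_artifacts", "analyzing_artifacts", "coding_artifacts")


def describe_outputs(outputs_root: str, paper_name: str) -> dict[str, bool]:
    """Report which expected output directories exist for one paper."""
    root = Path(outputs_root)
    report = {stage: (root / paper_name / stage).is_dir() for stage in STAGE_DIRS}
    # Assumption: the final repository is always named "<paper_name>_repo".
    report[f"{paper_name}_repo"] = (root / f"{paper_name}_repo").is_dir()
    return report
```

Calling `describe_outputs("outputs", "Transformer")` returns a mapping from directory name to whether it exists, so a missing stage stands out immediately.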

Detailed Setup Instructions for Paper2Code

Follow these detailed instructions to properly set up Paper2Code on your system.

Environment Setup

First, ensure you have the necessary packages installed. You can install only what you need:

For OpenAI API: pip install openai
For open-source models: pip install vllm

Converting PDF to JSON

Paper2Code requires the paper to be in JSON format. You can convert your PDF using the s2orc-doc2json repository:

Clone the Repository: git clone https://github.com/allenai/s2orc-doc2json.git

Start the PDF Processing Service:

cd ./s2orc-doc2json/grobid-0.7.3
./gradlew run

Convert PDF to JSON:

mkdir -p ./s2orc-doc2json/output_dir/paper_coder
python ./s2orc-doc2json/doc2json/grobid2json/process_pdf.py \
 -i "${PDF_PATH}" \
 -t ./s2orc-doc2json/temp_dir/ \
 -o ./s2orc-doc2json/output_dir/paper_coder
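
When converting several PDFs, it may be convenient to assemble the same command in Python. The sketch below only builds the argument list using the flags shown above; the function name and the default root path are assumptions, and it does not run the conversion itself.

```python
import shlex
from pathlib import Path


def build_convert_command(pdf_path: str, doc2json_root: str = "./s2orc-doc2json") -> list[str]:
    """Build the process_pdf.py invocation for a single PDF."""
    root = Path(doc2json_root)
    return [
        "python",
        str(root / "doc2json" / "grobid2json" / "process_pdf.py"),
        "-i", pdf_path,                                  # input PDF
        "-t", str(root / "temp_dir"),                    # temporary directory
        "-o", str(root / "output_dir" / "paper_coder"),  # JSON output directory
    ]


if __name__ == "__main__":
    # shlex.join quotes arguments, so paths containing spaces stay intact.
    print(shlex.join(build_convert_command("my paper.pdf")))
```

The list can be passed directly to `subprocess.run(command, check=True)` once the PDF processing service is running.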

Running Paper2Code on Your Own Papers

To run Paper2Code on your own research papers, modify the environment variables in the scripts to point to your converted JSON files.

Paper2Code Benchmark Datasets

The Paper2Code benchmark dataset is described in Section 4.1 of the paper. The dataset details are in the data/paper2code directory.

Evaluating Repositories Generated by Paper2Code

Paper2Code supports model-based evaluation of generated repositories in both reference-based and reference-free settings:

Environment Setup

pip install tiktoken
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"

Reference-Free Evaluation

cd codes/
python eval.py \
 --paper_name Transformer \
 --pdf_json_path ../examples/Transformer_cleaned.json \
 --data_dir ../data \
 --output_dir ../outputs/Transformer \
 --target_repo_dir ../outputs/Transformer_repo \
 --eval_result_dir ../results \
 --eval_type ref_free \
 --generated_n 8 \
 --papercoder

target_repo_dir should point to the generated repository.

Reference-Based Evaluation

cd codes/
python eval.py \
 --paper_name Transformer \
 --pdf_json_path ../examples/Transformer_cleaned.json \
 --data_dir ../data \
 --output_dir ../outputs/Transformer \
 --target_repo_dir ../outputs/Transformer_repo \
 --gold_repo_dir ../examples/Transformer_gold_repo \
 --eval_result_dir ../results \
 --eval_type ref_based \
 --generated_n 8 \
 --papercoder

target_repo_dir is the generated repository, and gold_repo_dir should point to the official repository (if available).
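
The two invocations differ only in --eval_type and --gold_repo_dir, so they can be generated from one helper. This is a sketch, not part of Paper2Code: the function name is hypothetical, the path templates assume other papers follow the Transformer example's layout, and the rule that ref_based requires a gold repository is an assumption drawn from the commands above.

```python
from typing import Optional


def build_eval_command(
    paper_name: str,
    eval_type: str,
    gold_repo_dir: Optional[str] = None,
    generated_n: int = 8,
) -> list[str]:
    """Assemble the eval.py argument list for one evaluation run."""
    if eval_type not in ("ref_free", "ref_based"):
        raise ValueError(f"unknown eval_type: {eval_type!r}")
    if eval_type == "ref_based" and not gold_repo_dir:
        # Assumption: a reference-based run is only meaningful with a gold repo.
        raise ValueError("ref_based evaluation needs gold_repo_dir")

    command = [
        "python", "eval.py",
        "--paper_name", paper_name,
        # Assumption: other papers follow the Transformer example's path layout.
        "--pdf_json_path", f"../examples/{paper_name}_cleaned.json",
        "--data_dir", "../data",
        "--output_dir", f"../outputs/{paper_name}",
        "--target_repo_dir", f"../outputs/{paper_name}_repo",
        "--eval_result_dir", "../results",
        "--eval_type", eval_type,
        "--generated_n", str(generated_n),
        "--papercoder",
    ]
    if eval_type == "ref_based":
        command += ["--gold_repo_dir", gold_repo_dir]
    return command
```

Run from the codes/ directory, `build_eval_command("Transformer", "ref_free")` reproduces the reference-free command shown earlier.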

Example Output

========================================
🌟 Evaluation Summary 🌟
📄 Paper name: Transformer
🧪 Evaluation type: ref_based
📁 Target repo directory: ../outputs/Transformer_repo
📊 Evaluation result:
 📈 Score: 4.5000
 ✅ Valid: 8/8
========================================
🌟 Usage Summary 🌟
[Evaluation] Transformer - ref_based
🛠️ Model: o3-mini
📥 Input tokens: 44318 (Cost: $0.04874980)
📦 Cached input tokens: 0 (Cost: $0.00000000)
📤 Output tokens: 26310 (Cost: $0.11576400)
💵 Current total cost: $0.16451380
🪙 Accumulated total cost so far: $0.16451380
============================================
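
The cost lines in this summary are token counts multiplied by a per-token rate. The rates are not stated here, so the sketch below backs them out of the sample figures; treat the results as values implied by this one log, not as a published price list.

```python
# Figures copied from the sample "Usage Summary" above.
INPUT_TOKENS, INPUT_COST = 44_318, 0.04874980
OUTPUT_TOKENS, OUTPUT_COST = 26_310, 0.11576400


def usd_per_million(cost: float, tokens: int) -> float:
    """Back out the per-million-token rate implied by one log line."""
    return cost / tokens * 1_000_000


input_rate = usd_per_million(INPUT_COST, INPUT_TOKENS)
output_rate = usd_per_million(OUTPUT_COST, OUTPUT_TOKENS)
total = INPUT_COST + OUTPUT_COST

print(f"input:  ${input_rate:.2f} per 1M tokens")
print(f"output: ${output_rate:.2f} per 1M tokens")
print(f"total:  ${total:.8f}")
```

This works out to $1.10 per million input tokens and $4.40 per million output tokens, and the two costs sum to the reported total of $0.16451380.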

Maximize Your Research Impact Today!

Paper2Code streamlines the conversion of research papers into usable code repositories, so you can spend less time on reimplementation and more on experimentation.
