Unlock the Power of 3D Scene Reconstruction with Large Spatial Model (LSM)
Ready to turn 2D images into stunning 3D scenes? This guide dives into NVlabs' Large Spatial Model (LSM), a groundbreaking approach to end-to-end reconstruction of semantic 3D scenes from unposed images. We'll break down installation, data preparation, training, and inference, making this powerful tool accessible to all. Prepare to be amazed by the possibilities of LSM.
Dive into LSM: Key Features and Capabilities
- End-to-End Reconstruction: Transform unposed 2D images directly into semantic 3D models.
- Feature Visualization: Gain insights into scene understanding through feature visualization.
- RGB Color Rendering: Generate realistic and visually appealing 3D scene renderings.
- Supports ScanNet and ScanNet++: Train and test your models with industry-standard datasets.
Get Started: A Step-by-Step Guide
1. Installation: Setting Up Your Environment
Let's get LSM up and running! Follow these steps carefully; a consolidated command sketch follows the list.
- Clone the Repository: This command downloads the LSM codebase along with all necessary submodules.
- Create a Conda Environment: This creates an isolated environment to avoid dependency conflicts. Activating it ensures all subsequent installations apply to LSM.
- Install PyTorch and Related Packages: This installs the core deep learning framework, PyTorch, along with essential CUDA-enabled libraries for GPU acceleration.
- Install Other Python Dependencies: This installs all other required Python packages from the requirements.txt file and installs the flash-attn package.
- Install PointTransformerV3: Compile and install PointTransformerV3, a crucial component for point cloud processing.
- Install 3D Gaussian Splatting Modules: Integrate 3D Gaussian Splatting modules for high-quality rendering.
- Install OpenAI CLIP: Incorporate OpenAI's CLIP model for semantic understanding.
- Build croco Model: Compile the croco model, essential for depth estimation.
- Download Pre-trained Models: Download pre-trained weights for optimal performance (DUSt3R, LSEG demo model, and LSM final checkpoint).
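The exact commands for each step are listed in the repository README. As a rough consolidated sketch (the repository URL, environment name, and package versions below are assumptions, not the project's pinned values):

```bash
# Clone the codebase with its submodules (URL assumed from the NVlabs project name)
git clone --recursive https://github.com/NVlabs/LSM.git
cd LSM

# Create and activate an isolated conda environment (name and Python version are placeholders)
conda create -n lsm python=3.10 -y
conda activate lsm

# Install PyTorch with CUDA support (choose the build matching your CUDA toolkit),
# then the remaining Python dependencies and flash-attn
pip install torch torchvision
pip install -r requirements.txt
pip install flash-attn --no-build-isolation

# PointTransformerV3, the 3D Gaussian Splatting modules, OpenAI CLIP, and the croco
# model each have their own build/install steps -- follow the per-component
# instructions in the repository README, then download the pre-trained weights it
# links (DUSt3R, the LSeg demo model, and the LSM final checkpoint).
```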
2. Data Preparation: Readying Your Input
LSM thrives on well-prepared data. Here's what you need to know:
- Datasets: ScanNet and ScanNet++ are supported for training the Large Spatial Model.
- Agreements: Accessing these datasets requires signing agreements.
- Detailed Instructions: Find comprehensive data preparation guidelines in data_process/data.md.
For testing, refer to data_process/data.md for specific details on the test dataset.
3. Training: Building Your 3D Reconstruction Powerhouse
Once your data is ready, initiate training with:
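A minimal sketch, assuming the repository's launcher script is used as-is (see scripts/train.sh for the actual arguments):

```bash
# Launch training; --output_dir and the other hyperparameters are set inside scripts/train.sh
bash scripts/train.sh
```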
Training results will be saved to the directory specified by --output_dir in scripts/train.sh. The default is checkpoints/output.
4. Inference: Bringing Your Images to Life in 3D
Time to reconstruct 3D scenes from your own images!
- Prepare Your Images: Place two indoor scene images into a chosen directory. Example:

```
demo_images/
└── indoor/
    ├── scene1/
    │   ├── image1.jpg
    │   └── image2.jpg
    └── scene2/
        ├── room1.png
        └── room2.png
```
- Run Inference: Adjust parameters like --file_list, --output_path, and --resolution in scripts/infer.sh; the default settings are generally recommended. A sketch of the invocation appears after this list.
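A minimal sketch of the inference step, assuming the provided script is used unchanged; the parameters named above are edited inside the script rather than passed on the command line here:

```bash
# Run inference; edit scripts/infer.sh to point --file_list at your image directory,
# --output_path at a results directory, and --resolution at the desired rendering size
bash scripts/infer.sh
```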
Inspiration and Acknowledgements
This project builds upon the shoulders of giants. The creators of LSM extend heartfelt gratitude to the authors of Gaussian-Splatting, DUSt3R, LSeg, Point Transformer V3, pixelSplat, Feature 3DGS, ScanNet, and ScanNet++.
Citation
If you find LSM useful, please cite the following paper:
@misc{fan2024largespatialmodelendtoend,
      title={Large Spatial Model: End-to-end Unposed Images to Semantic 3D},
      author={Zhiwen Fan and Jian Zhang and Wenyan Cong and Peihao Wang and Renjie Li and Kairun Wen and Shijie Zhou and Achuta Kadambi and Zhangyang Wang and Danfei Xu and Boris Ivanovic and Marco Pavone and Yue Wang},
      year={2024},
      eprint={2410.18956},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.18956},
}