Unlock the Power of 3D Scene Reconstruction with LSM: A Step-by-Step Guide
Want to create stunning 3D models from simple 2D images? Dive into Large Spatial Models (LSM), an end-to-end method that reconstructs semantic 3D indoor scenes from unposed images. This guide walks you through installation, data preparation, training, and inference, and shows how to explore feature visualization and RGB color rendering along the way.
Table of Contents: Navigate Your 3D Journey
- Updates: Stay current with the latest LSM enhancements.
- Installation: Set up your environment for 3D magic.
- Data Preparation: Get your images ready for the transformation.
- Training: Customize LSM for optimal performance.
- Inference: Generate your 3D scenes.
Stay Ahead: The Latest LSM Updates
Keep up-to-date with the newest features and improvements to LSM:
- [2025-04-12]: Test dataset download and testing instructions added (see data_process/data.md).
- [2025-03-09]: ScanNet++ data preprocessing pipeline implemented.
- [2025-03-06]: ScanNet data preprocessing pipeline upgraded.
These updates ensure you're working with the most efficient and powerful version of LSM.
Get Started: Installation for Immersive 3D
Ready to unleash the power of LSM? Follow these steps to set up your environment:
1. Clone the Repository: Download the LSM code.
2. Create a Conda Environment: Isolate your project dependencies.
3. Install PyTorch: Essential for deep learning tasks.
4. Install Dependencies: Load the required Python packages.
5. Install PointTransformerV3: Enhance point cloud processing.
6. Install 3D Gaussian Splatting Modules: Enable realistic rendering.
7. Install OpenAI CLIP: Integrate powerful image understanding.
8. Build Croco Model: Refine feature matching.
9. Download Pre-trained Models: Accelerate your projects.
   - Create a directory for checkpoints.
   - Download DUSt3R model weights.
   - Download LSeg demo model weights.
   - Download LSM final checkpoint.
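As a rough sketch, the steps above might translate into shell commands like the following. This is only an illustration: the repository URL, environment name, Python/package versions, and submodule paths are placeholders and assumptions, not confirmed values from the project, so check the official README for the authoritative commands.

```shell
# 1. Clone the repository (placeholder URL and directory name)
git clone <repository-url>
cd LSM

# 2. Create and activate an isolated conda environment (name is arbitrary)
conda create -n lsm python=3.10 -y
conda activate lsm

# 3. Install PyTorch; pick the build matching your CUDA version
pip install torch torchvision

# 4. Install the remaining Python dependencies
pip install -r requirements.txt

# 5-7. Install point-cloud, rasterization, and CLIP components
#      (submodule paths here are illustrative)
pip install ./submodules/PointTransformerV3
pip install ./submodules/diff-gaussian-rasterization
pip install git+https://github.com/openai/CLIP.git

# 8. Build the Croco components (DUSt3R-style repos compile RoPE CUDA
#    kernels in-place; this path is an assumption)
cd croco/models/curope && python setup.py build_ext --inplace && cd -

# 9. Create a checkpoints directory for the pre-trained weights
mkdir -p checkpoints
# Download the DUSt3R, LSeg demo, and LSM final weights into checkpoints/
# using the URLs listed in the project documentation (omitted here).
```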
Data Preparation: Fueling Your 3D Engine
To get the most out of LSM, you first need to prepare the datasets it is trained and evaluated on.
- Training Data: LSM supports ScanNet and ScanNet++. Access requires signing the datasets' usage agreements. See data_process/data.md for detailed instructions.
- Testing Data: Refer to data_process/data.md for test data instructions.
Training LSM: Fine-Tuning for Excellence
After preparing your datasets, launch training with the training script (scripts/train.sh).
- Training results are saved in SAVE_DIR (default: checkpoints/output).
- Optional parameters in scripts/train.sh:
  - --output_dir: Specifies the directory for your training outputs.
Customizing these options lets you tailor Large Spatial Model training to your data and hardware for the best results.
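As a concrete example, a training run might be launched as follows. Only scripts/train.sh and the --output_dir flag come from this guide; the output directory value is an arbitrary example.

```shell
# Launch LSM training; --output_dir overrides the default save location
# (checkpoints/output). The directory name "my_run" is just an example.
bash scripts/train.sh --output_dir checkpoints/my_run
```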
Inference: Bringing Your Scenes to Life
Ready to generate compelling 3D reconstructions using the power of the Large Spatial Model?
1. Prepare Images: Choose two images of the same indoor scene and store them in a directory:

   demo_images/
   └── indoor/
       ├── scene1/
       │   ├── image1.jpg
       │   └── image2.jpg
       └── scene2/
           ├── room1.png
           └── room2.png
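The demo layout above can be created with standard shell commands; the scene and file names are just examples.

```shell
# Create the example directory layout for two demo scenes
mkdir -p demo_images/indoor/scene1 demo_images/indoor/scene2

# Then copy two images of the same indoor scene into each folder, e.g.:
# cp /path/to/image1.jpg /path/to/image2.jpg demo_images/indoor/scene1/
```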
2. Run Inference: Execute the inference script (scripts/infer.sh). Its parameters:
   - --file_list: Paths to your input images.
   - --output_path: Output directory for the Gaussian points and rendered video.
   - --resolution: Image resolution (the default is recommended).
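Putting it together, a single inference run might look like the following. The exact flag syntax (for example, whether --file_list accepts space-separated paths) is an assumption here; consult scripts/infer.sh for the authoritative usage.

```shell
# Run inference on one demo scene; paths follow the example layout above,
# and the output directory name is arbitrary. --resolution is left at the
# script's default, as recommended.
bash scripts/infer.sh \
    --file_list demo_images/indoor/scene1/image1.jpg \
                demo_images/indoor/scene1/image2.jpg \
    --output_path output/scene1
```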
Acknowledgement: Standing on the Shoulders of Giants
LSM is built upon the work of many researchers and open-source projects:
- Gaussian-Splatting and diff-gaussian-rasterization
- DUSt3R
- Language-Driven Semantic Segmentation (LSeg)
- Point Transformer V3
- pixelSplat
- Feature 3DGS
- ScanNet
- ScanNet++
Citation: Give Credit Where It's Due
If you use LSM in your research, please cite the following paper:
@misc{fan2024largespatialmodelendtoend,
  title={Large Spatial Model: End-to-end Unposed Images to Semantic 3D},
  author={Zhiwen Fan and Jian Zhang and Wenyan Cong and Peihao Wang and Renjie Li and Kairun Wen and Shijie Zhou and Achuta Kadambi and Zhangyang Wang and Danfei Xu and Boris Ivanovic and Marco Pavone and Yue Wang},
  year={2024},
  eprint={2410.18956},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2410.18956},
}
Star the project to show your support! With this guide, you're ready to unlock the potential of LSM and push your 3D reconstruction work forward.