Unlock the Power of 3D Scene Reconstruction with Large Spatial Model (LSM)
Ready to turn 2D images into stunning 3D scenes? This guide dives into NVlabs' Large Spatial Model (LSM), a groundbreaking approach to end-to-end reconstruction of semantic 3D scenes from unposed images. We'll break down installation, data preparation, training, and inference, making this powerful tool accessible to all. Prepare to be amazed by the possibilities of LSM.
Dive into LSM: Key Features and Capabilities
- End-to-End Reconstruction: Transform unposed 2D images directly into semantic 3D models.
- Feature Visualization: Gain insights into scene understanding through feature visualization.
- RGB Color Rendering: Generate realistic and visually appealing 3D scene renderings.
- Supports ScanNet and ScanNet++: Train and test your models with industry-standard datasets.
Get Started: A Step-by-Step Guide
1. Installation: Setting Up Your Environment
Let's get LSM up and running! Follow these steps carefully; a consolidated command sketch follows the list.
- Clone the Repository: This command downloads the LSM codebase along with all necessary submodules.
- Create a Conda Environment: This creates an isolated environment to avoid dependency conflicts. Activating it ensures all subsequent installations apply to LSM.
- Install PyTorch and Related Packages: This installs the core deep learning framework, PyTorch, along with essential CUDA-enabled libraries for GPU acceleration.
- Install Other Python Dependencies: This installs all other required Python packages from the requirements.txt file and installs the flash-attn package.
- Install PointTransformerV3: Compile and install PointTransformerV3, a crucial component for point cloud processing.
- Install 3D Gaussian Splatting Modules: Integrate 3D Gaussian Splatting modules for high-quality rendering.
- Install OpenAI CLIP: Incorporate OpenAI's CLIP model for semantic understanding.
- Build croco Model: Compile the croco model, essential for depth estimation.
- Download Pre-trained Models: Download pre-trained weights for optimal performance (DUSt3R, LSEG demo model, and LSM final checkpoint).
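The exact commands for each step are listed in the repository README. As a rough consolidated sketch (the repository URL, environment name, and package versions below are assumptions, not the project's pinned values):

```bash
# Clone the codebase with its submodules (URL assumed from the NVlabs project name)
git clone --recursive https://github.com/NVlabs/LSM.git
cd LSM

# Create and activate an isolated conda environment (name and Python version are placeholders)
conda create -n lsm python=3.10 -y
conda activate lsm

# Install PyTorch with CUDA support (choose the build matching your CUDA toolkit),
# then the remaining Python dependencies and flash-attn
pip install torch torchvision
pip install -r requirements.txt
pip install flash-attn --no-build-isolation

# PointTransformerV3, the 3D Gaussian Splatting modules, OpenAI CLIP, and the croco
# model each have their own build/install steps -- follow the per-component
# instructions in the repository README, then download the pre-trained weights it
# links (DUSt3R, the LSeg demo model, and the LSM final checkpoint).
```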
2. Data Preparation: Readying Your Input
LSM thrives on well-prepared data. Here's what you need to know:
- Datasets: ScanNet and ScanNet++ are supported for training the Large Spatial Model.
- Agreements: Accessing these datasets requires signing agreements.
- Detailed Instructions: Find comprehensive data preparation guidelines in data_process/data.md.
For testing, refer to data_process/data.md for specific details on the test dataset.
3. Training: Building Your 3D Reconstruction Powerhouse
Once your data is ready, initiate training with:
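A minimal sketch, assuming the repository's launcher script is used as-is (see scripts/train.sh for the actual arguments):

```bash
# Launch training; --output_dir and the other hyperparameters are set inside scripts/train.sh
bash scripts/train.sh
```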
Training results will be saved to the directory specified by --output_dir in scripts/train.sh. The default is checkpoints/output.
4. Inference: Bringing Your Images to Life in 3D
Time to reconstruct 3D scenes from your own images!
- Prepare Your Images: Place two indoor scene images into a chosen directory. Example:

```
demo_images/
└── indoor/
    ├── scene1/
    │   ├── image1.jpg
    │   └── image2.jpg
    └── scene2/
        ├── room1.png
        └── room2.png
```
- Run Inference: Adjust parameters like --file_list, --output_path, and --resolution in scripts/infer.sh; the default settings are generally recommended. A sketch of the invocation appears after this list.
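A minimal sketch of the inference step, assuming the provided script is used unchanged; the parameters named above are edited inside the script rather than passed on the command line here:

```bash
# Run inference; edit scripts/infer.sh to point --file_list at your image directory,
# --output_path at a results directory, and --resolution at the desired rendering size
bash scripts/infer.sh
```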
Inspiration and Acknowledgements
This project builds upon the shoulders of giants. The creators of LSM extend heartfelt gratitude to the authors of Gaussian-Splatting, DUSt3R, LSeg, Point Transformer V3, pixelSplat, Feature 3DGS, ScanNet, and ScanNet++.
Citation
If you find LSM useful, please cite the following paper:
@misc{fan2024largespatialmodelendtoend,
      title={Large Spatial Model: End-to-end Unposed Images to Semantic 3D},
      author={Zhiwen Fan and Jian Zhang and Wenyan Cong and Peihao Wang and Renjie Li and Kairun Wen and Shijie Zhou and Achuta Kadambi and Zhangyang Wang and Danfei Xu and Boris Ivanovic and Marco Pavone and Yue Wang},
      year={2024},
      eprint={2410.18956},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.18956},
}