Generate Your Own 3D Dataset with SynCD: A Step-by-Step Guide
Want to create your own synthetic 3D dataset tailored to your specific needs? SynCD (Synthetic Correspondence Dataset) offers the tools and guidance to do just that. This guide breaks down the process, covering everything from setup to dataset generation. Let's dive in!
Download the Pre-Generated SynCD Dataset
If you want to skip the generation process and get straight to working with a dataset, a filtered, pre-generated option is available for download.
Getting Started: Setting Up Your Environment
Generating your own dataset requires a beefy GPU with at least 48GB of VRAM. The base environment setup involves a few key steps, which are detailed in the SynCD documentation.
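Before starting a long generation run, it can help to confirm the GPU meets the memory requirement. The following is a minimal sketch, not part of SynCD: the helper names are hypothetical, and treating "48GB" as 48 x 10^9 bytes is an assumption (cards marketed as 48GB typically report slightly less than 48 GiB).

```python
REQUIRED_VRAM_GB = 48  # requirement stated above


def meets_vram_requirement(total_bytes, required_gb=REQUIRED_VRAM_GB):
    """Return True if total_bytes is at least required_gb decimal gigabytes."""
    return total_bytes >= required_gb * 10**9


def detect_gpu_vram_bytes(device_index=0):
    """Return total memory of a CUDA device in bytes, or None if unavailable."""
    try:
        import torch  # imported lazily so the helper above works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device_index).total_memory
```

Calling detect_gpu_vram_bytes() and passing the result to meets_vram_requirement() gives a quick yes/no answer; a None result means no CUDA device (or no PyTorch install) was found.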
Generating a Deformable Dataset
Creating a deformable dataset with SynCD is straightforward. Just follow these steps once your environment is ready:
- Navigate to the "dataset" directory.
- Run the provided Python script:
This command generates a deformable dataset and saves the attention masks to the specified output directory.
Generating a Rigid Dataset: A Step-by-Step Approach
Creating a rigid dataset requires a bit more setup and involves generating prompts and handling Objaverse assets. Here's the breakdown:
- Download Prompts: Download pre-generated prompts for Objaverse assets:
- Unzip Rendering Data: Extract the Objaverse rendering data:
- Run the Generation Script: Execute the gen_rigid.py script using torchrun. Here's an example command:
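The extraction step in the list above can also be scripted. Here is a minimal Python sketch using the standard library; the function name and any paths passed to it are hypothetical, and the actual archive name comes from the SynCD documentation.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract a zip archive into target_dir and return the sorted member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        # testzip() returns the first corrupt member name, or None if all are fine
        corrupt = archive.testzip()
        if corrupt is not None:
            raise ValueError(f"corrupt member in archive: {corrupt}")
        archive.extractall(target)
        return sorted(archive.namelist())
```

Checking the archive before extraction is a design choice for large downloads, where a partially transferred file is a common failure.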
Objaverse-Guided Rigid Dataset Generation: Leveraging Pre-Rendered Assets
SynCD offers an Objaverse-guided rigid dataset generation approach that uses pre-rendered assets and multi-view correspondence. Note that this differs slightly from the original paper: it uses FLUX.1-Depth-dev instead of xflux for depth conditioning.
We use about 75,000 assets from Objaverse and follow Cap3D for re-rendering the assets.
Steps to generate the dataset using SynCD:
- Install PyTorch3D and Dependencies: Install the necessary libraries:
- Download and Unzip Renderings: Download a subset of Objaverse renderings:
- Generate Correspondence Data: Run the gen_corresp.py script to calculate multi-view correspondence:
- Generate the Dataset: Finally, generate the dataset using gen_rigid.py:
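For intuition about the multi-view correspondence step in the list above: a point on an object's surface projects to one pixel in each rendered view, and the resulting pixel pair is a correspondence. The sketch below illustrates that idea with a simple pinhole camera. It is not the gen_corresp.py implementation; the camera model, the function names, and all parameter values are assumptions, and occlusion is ignored.

```python
import math


def rotate_y(point, angle_rad):
    """Rotate a 3D point about the vertical (y) axis."""
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)


def project(point_cam, focal, cx, cy):
    """Pinhole projection of a camera-frame point; None if behind the camera."""
    x, y, z = point_cam
    if z <= 0:
        return None
    return (focal * x / z + cx, focal * y / z + cy)


def correspondences(points, yaw_a, yaw_b, distance=3.0,
                    focal=500.0, cx=256.0, cy=256.0):
    """Pair up the pixel locations of each 3D point in two views.

    Each view is modeled as the object rotated by a yaw angle and placed
    `distance` units in front of a fixed camera looking along +z.
    """
    pairs = []
    for point in points:
        pixels = []
        for yaw in (yaw_a, yaw_b):
            x, y, z = rotate_y(point, yaw)
            pixels.append(project((x, y, z + distance), focal, cx, cy))
        # Keep the pair only if the point lies in front of the camera in both views
        if pixels[0] is not None and pixels[1] is not None:
            pairs.append((pixels[0], pixels[1]))
    return pairs
```

A real pipeline would additionally need a visibility test (for example, comparing against rendered depth) so that points hidden in one view are not matched.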
Generating Prompts from LLMs for Custom Categories: Tailoring Your Dataset
To create a dataset focused on specific object categories, consider generating prompts using Large Language Models (LLMs).
SynCD includes tools for generating object and image background descriptions based on categories defined in assets/categories.txt. For Objaverse assets, you can use the following resource:
This script leverages the Cap3D dataset to generate prompts, allowing you to build a 3D dataset specifically tailored to your research or application needs.
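To illustrate the category-driven idea, here is a minimal sketch of turning a category list into LLM requests. It assumes assets/categories.txt holds one category per line, which the source does not confirm; the function names and the request wording are hypothetical and are not SynCD's own prompts.

```python
from pathlib import Path


def load_categories(path):
    """Read category names from a text file, assuming one category per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def build_llm_request(category, num_descriptions=5):
    """Build a plain-text request asking an LLM for object and background descriptions."""
    return (
        f"Write {num_descriptions} short, distinct descriptions of a {category}. "
        "For each one, also describe a plausible image background."
    )


def build_requests(path, num_descriptions=5):
    """Map each category in the file to its LLM request string."""
    return {
        category: build_llm_request(category, num_descriptions)
        for category in load_categories(path)
    }
```

The returned strings can then be sent to whichever LLM you use; the responses become the object and background prompts for generation.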
By following these steps, you can effectively leverage SynCD to generate high-quality synthetic datasets for various 3D research and development tasks.