Semantic3D
The Semantic3D dataset is a collection of point clouds containing over four billion points in total. The dataset comes in two different versions, semantic-8 and reduced-8 (a downsampled variant); we only use the former.
Description
The original dataset has the following structure.
data/semantic3d/original/
├── testing
│   ├── neugasse_1.laz
│   └── untermaederbrunnen_3.laz
├── training
│   ├── bildstein_1.laz
│   ├── bildstein_3.laz
│   ├── domfountain_1.laz
│   ├── domfountain_2.laz
│   ├── sg27_5.laz
│   └── untermaederbrunnen_1.laz
└── validation
    ├── bildstein_5.laz
    ├── domfountain_3.laz
    └── sg27_9.laz
Each file is a large point cloud representing a church, a street, etc. Every point cloud contains the usual attributes x, y, z, intensity, red, green and blue. A classification label is only present in the point cloud files meant for training a segmentation model, i.e. those inside the data/semantic3d/original/training directory.
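If you want to inspect these attributes yourself, the files can be opened with the laspy library. Below is a minimal sketch, assuming laspy is installed together with a LAZ backend such as lazrs; the file name is just an example taken from the layout above.

import laspy

# read one of the training clouds; the path follows the layout shown above
las = laspy.read('data/semantic3d/original/training/bildstein_1.laz')

# per-point attributes described in this section
xyz = (las.x, las.y, las.z)
intensity = las.intensity
rgb = (las.red, las.green, las.blue)

# the classification label is only meaningful for the training files
labels = las.classification
print(len(las.points), 'points, first label:', labels[0])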
Note
To better process the dataset in batches when training and validating a deep learning model, we already provide a split version of it inside the data/semantic3d/split/ directory of the project. Each big point cloud file, for instance data/semantic3d/original/training/sg27_5.laz, is split into many smaller clouds of roughly 102400 points each, located under data/semantic3d/split/training/. If you are curious how the splitting is handled, take a look at Partition a large dataset.
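How the project actually performs the split is documented in Partition a large dataset; the snippet below is only a rough, index-based illustration of the idea. The split_cloud name and the use of laspy/numpy are assumptions made for this sketch, not the project's implementation.

import numpy as np
import laspy

CHUNK_SIZE = 102_400

def split_cloud(path: str) -> list[np.ndarray]:
    # cut the cloud into consecutive chunks of roughly CHUNK_SIZE points;
    # the project's splitter also writes each chunk back to its own .laz file
    las = laspy.read(path)
    points = np.stack([las.x, las.y, las.z], axis=1)
    return [points[start:start + CHUNK_SIZE]
            for start in range(0, len(points), CHUNK_SIZE)]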
The points belong to 9 different classes, where label 0 corresponds to unlabeled.
@staticmethod
def classes() -> list[str]:
    return [
        'unlabeled',
        'man-made terrain',
        'natural terrain',
        'high vegetation',
        'low vegetation',
        'buildings',
        'hard scape',
        'scanning artefacts',
        'cars'
    ]
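The list above can be used to translate integer labels into human-readable names, and label 0 is commonly excluded from the loss during training. A minimal sketch, assuming classes() is the staticmethod of Semantic3D shown above; the ignore_index choice is a common convention, not necessarily what deepoints does internally.

import torch
from deepoints.datasets.semantic3d import Semantic3D

classes = Semantic3D.classes()
num_classes = len(classes)  # 9, including 'unlabeled'

# map integer labels back to human-readable names
labels = torch.tensor([5, 1, 0, 8])
print([classes[i] for i in labels.tolist()])  # ['buildings', 'man-made terrain', 'unlabeled', 'cars']

# a common choice: exclude the 'unlabeled' class from the loss
criterion = torch.nn.CrossEntropyLoss(ignore_index=0)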
Usage
Using the dataset is dead simple with the lightning.LightningDataModule-based interface.
from deepoints.datasets.semantic3d import Semantic3D
import lightning
# define a trainer
trainer = lightning.Trainer(...)
# any segmentation model you like
model = ...
# declare a Semantic3D datamodule that randomly samples
# 8 point cloud files per batch and 4096 points per file
datamodule = Semantic3D(sample_size=4096, batch_size=8)
# train your model on the dataset
trainer.fit(model, datamodule=datamodule)
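The same datamodule can then be reused for evaluation, or queried directly through the standard LightningDataModule hooks. A sketch, assuming Semantic3D implements the usual setup and dataloader hooks:

# evaluate on the validation split with the same datamodule
trainer.validate(model, datamodule=datamodule)

# peek at a single training batch outside a Trainer
datamodule.setup('fit')
batch = next(iter(datamodule.train_dataloader()))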
References
Here you can find a reference to the original paper that introduced the dataset.

Hackel, T., Savinov, N., Ladicky, L., Wegner, J. D., Schindler, K. and Pollefeys, M.: Semantic3D.net: A new large-scale point cloud classification benchmark. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2017.