An easier way to train a model

Let's see how to train a deep learning architecture on some point cloud dataset.

Info

Up to now, only the segmentation task is supported; classification will be implemented in the near future.

Segmenting some data

The script located at scripts/segmentation.py is exactly what we need to train and validate a segmentation model on point cloud data.

Note

By default, datasets from DeePoints are downloaded into the data directory of the current project. Therefore, when running the script for the first time, you will see the following tree added to your project directory.

data
└── semantic3d
    ├── original
    │   ├── testing
    │   │   ├── ...
    │   │   └── untermaederbrunnen_3.laz
    │   ├── training
    │   │   ├── ...
    │   │   └── untermaederbrunnen_1.laz
    │   └── validation
    │       ├── ...
    │       └── sg27_9.laz
    └── split
        ├── testing
        │   ├── ...
        │   └── untermaederbrunnen_3_275.laz
        ├── training
        │   ├── ...
        │   └── untermaederbrunnen_1_274.laz
        └── validation
            ├── ...
            └── sg27_9_2177.laz

This is an example of the layout created when downloading the Semantic3D dataset with the semantic3d argument.
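
As a quick sanity check once the download has finished, you can count the files in each partition from Python. A minimal sketch, assuming the default data directory shown above:

from pathlib import Path

# Count the point cloud files in each partition of the split directory.
for partition in ("training", "validation", "testing"):
    files = [p for p in Path("data/semantic3d/split", partition).iterdir() if p.is_file()]
    print(f"{partition}: {len(files)} files")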

The flexibility of this command lies in the choice of the deep learning architecture and the point cloud dataset. However, we can also tweak other parameters, such as the batch size and the number of training epochs, or enable CUDA acceleration.

Example

Let us see a brief example where we train a PointNet model on the Semantic3D dataset to perform segmentation.

python scripts/segmentation.py train pointnet semantic3d --batchsize 64 --samplesize 4096 --epochs 100
py .\scripts\segmentation.py train pointnet semantic3d --batchsize 64 --samplesize 4096 --epochs 100

If you add the --testing option to the previous command, the trained model will also be validated on every testing file within the data/semantic3d/split/testing directory.

Tracking the simulations

We may now wonder where the data is saved after running some instances of the script. Luckily, the script sets this up for us, using MLflow to track and monitor our running simulations.

Tip

All the data is automatically saved to the mlruns directory within your project. You can inspect the metrics logged by your simulations, in real time or afterwards, using MLflow's local server at localhost:8000.

mlflow ui --port 8000
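
You can also query the tracked runs programmatically instead of going through the web UI. A minimal sketch, assuming MLflow 2.x and the default local mlruns store:

import mlflow

# Point MLflow at the local tracking store created by the script.
mlflow.set_tracking_uri("file:./mlruns")

# Fetch every run across all experiments as a pandas DataFrame.
runs = mlflow.search_runs(search_all_experiments=True)
if runs.empty:
    print("No runs tracked yet.")
else:
    print(runs[["run_id", "status", "start_time"]])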

Getting some predictions

The command line interface also lets you easily label a testing partition of a dataset with the predictions of a trained model. The output is saved to a predictions directory: for each input file {dataset}/split/testing/{file}.parquet, an output file predictions/{file}.parquet is written with the original spatial coordinates plus the inferred labels.
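
To inspect a labelled file afterwards, you can open it with any Parquet reader. A minimal sketch using pandas, where the file name is illustrative:

import pandas as pd

# Load one of the generated prediction files (file name is illustrative).
predictions = pd.read_parquet("predictions/untermaederbrunnen_3_275.parquet")

# Each row carries the original spatial coordinates plus the inferred label.
print(predictions.head())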

Example

After executing a simulation with python scripts/segmentation.py train in non-silent mode, we are provided with the URI to reload the trained model as output. It will look something like file:///path/to/mlruns/264873137792741683/349e3c04cc2645ed878ab0cf399feadf/artifacts/experiment.pointnet.semantic3d.bs:4.ss:1024.

python scripts/segmentation.py predict \
    file:///path/to/mlruns/264873137792741683/349e3c04cc2645ed878ab0cf399feadf/artifacts/experiment.pointnet.semantic3d.bs:4.ss:1024 \
    --destination predictions
py .\scripts\segmentation.py predict `
    file:///path/to/mlruns/264873137792741683/349e3c04cc2645ed878ab0cf399feadf/artifacts/experiment.pointnet.semantic3d.bs:4.ss:1024 `
    --destination predictions
