Preprocessing point cloud data

In this section, we show how to perform some basic operations on point cloud datasets with the scripts/preprocessing.py script.

Note

The script provides the commands convert, partition and check. The first two accept a directory of input point cloud files and write the results of the selected operation to another directory, while the last one operates on an input directory only. The script is therefore best suited for batch operations on multiple files.

Convert many files at once

To convert a set of point cloud files from one format to another, it is sufficient to invoke scripts/preprocessing.py convert with the appropriate arguments. Append the --help option to the command for a full description of the accepted arguments.
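
For instance, to print the usage of the convert command:

python scripts/preprocessing.py convert --help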

Example

In this example, we convert a set of .laz point cloud files into the .parquet format. The input files are read from data/input/ and the converted files are written to data/output/.

python scripts/preprocessing.py convert data/input/ data/output/ --input laz --output parquet
py .\scripts\preprocessing.py convert .\data\input\ .\data\output\ --input laz --output parquet

Partition a large dataset

Partitioning a set of large point cloud files into smaller chunks is performed with the scripts/preprocessing.py partition command. The usage is similar to conversion: we select an input directory, from which files are read, and an output directory, to which the resulting chunks are written.

Example

In this example, we split some large .laz point cloud files into smaller point clouds of roughly 102400 points each, grouped by spatial proximity. As before, the input directory is data/input/ and the output directory is data/output/.

python scripts/preprocessing.py partition data/input/ data/output/ --capacity 102400 --input laz --output laz
py .\scripts\preprocessing.py partition .\data\input\ .\data\output\ --capacity 102400 --input laz --output laz

The command uses pdal under the hood to split each point cloud into square tiles of the requested capacity. The capacity is approximate, however, and the generated files tend to contain slightly fewer points than specified with --capacity.
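
For reference, below is a minimal sketch of this kind of mechanism through the PDAL Python bindings. It is an illustration under stated assumptions rather than the script's actual pipeline: the file names are placeholders, and filters.chipper is one PDAL stage that groups points into spatially contiguous chunks of roughly the requested capacity, with the # placeholder in the writer filename expanding to one output file per chunk.

import json

import pdal  # assumption: the PDAL Python bindings are installed

# Illustrative pipeline: chip a single cloud into chunks of ~102400 points.
# PDAL replaces the '#' in the writer filename with the chunk index,
# producing one compressed .laz file per chunk.
pipeline = pdal.Pipeline(json.dumps([
    "data/input/cloud.laz",
    {"type": "filters.chipper", "capacity": 102400},
    {"type": "writers.las", "filename": "data/output/cloud_#.laz"},
]))
pipeline.execute()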

Sanity check your dataset

Checking the health of your point cloud files is dead easy with the check command, which supports both the .laz and .parquet formats.

Example

Here, we check some .laz point cloud files located inside data/input/. A file is considered valid if it can be safely decoded according to its format and contains at least 4096 points, as shown below.

python scripts/preprocessing.py check data/input/ --capacity 4096 --input laz --purge
py .\scripts\preprocessing.py check .\data\input\ --capacity 4096 --input laz --purge

Be aware that the --purge option permanently deletes corrupted point cloud files.
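
To make the notion of validity concrete, here is a minimal sketch of a comparable check for .laz files, assuming the laspy library with a LAZ backend (e.g. lazrs) is installed; the script's actual implementation may differ.

from pathlib import Path

import laspy  # assumption: laspy with a LAZ backend (e.g. lazrs) is installed

def is_valid(path: Path, capacity: int = 4096) -> bool:
    # A file is valid if it decodes cleanly according to its format
    # and holds at least `capacity` points.
    try:
        las = laspy.read(str(path))
    except Exception:
        return False  # decoding failed: the file is corrupted
    return las.header.point_count >= capacity

for path in sorted(Path("data/input/").glob("*.laz")):
    print(path, "valid" if is_valid(path) else "invalid")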

A more intricate use case

Imagine we want to split large .laz point cloud files into smaller .parquet point cloud files. Unfortunately, only .laz to .laz partitioning is supported for now. To achieve the desired result, we can chain the previous commands as shown below.

# partition big .laz files into smaller .laz files
python scripts/preprocessing.py partition data/input/ data/intermediate/ --capacity 102400 --input laz --output laz
# convert small .laz files into .parquet files
python scripts/preprocessing.py convert data/intermediate/ data/output/ --input laz --output parquet
# partition big .laz files into smaller .laz files
py .\scripts\preprocessing.py partition .\data\input\ .\data\intermediate\ --capacity 102400 --input laz --output laz
# convert small .laz files into .parquet files
py .\scripts\preprocessing.py convert .\data\intermediate\ .\data\output\ --input laz --output parquet

It can also be convenient to add sanity checks between the commands using the scripts/preprocessing.py check utility.
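
For instance, the intermediate chunks can be verified, and corrupted ones purged, before the conversion step; the POSIX form is shown below, and the Windows py variant is analogous.

# partition big .laz files into smaller .laz files
python scripts/preprocessing.py partition data/input/ data/intermediate/ --capacity 102400 --input laz --output laz
# purge any chunk that fails to decode or holds fewer than 4096 points
python scripts/preprocessing.py check data/intermediate/ --capacity 4096 --input laz --purge
# convert the surviving .laz chunks into .parquet files
python scripts/preprocessing.py convert data/intermediate/ data/output/ --input laz --output parquet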