Improve OSM data quality with Deep Learning
@o_courtin
@fosdem 2019
Detect inconsistencies between two DataSets
Goal
Neural Network
Imagery
Labels
Loss Function
Trained Model
Prediction
Alternate DataSet
Compare
RoboSat.pink
@RobosatP Semantic Segmentation ecosystem for GeoSpatial Imagery
DataSet Quality Analysis
Change Detection highlighter
Features extraction
RoboSat.pink Spirit
State-of-the-art SemSeg
Industrial code robustness
Code minimalism as a code aesthetic
Modular and extensible, by design
OSM and MapBox ecosystem friendly
GeoSpatial standards compliant
MIT Licence
Download WMS
TMS
XYZ
Rasterize GeoJSON
Extract OSM pbf
Cover
Image
Tile
Raster
Label
Subset
Training DataSet
Bbox
XYZ dir
Data Preparation
https://arxiv.org/pdf/1806.00844.pdf
PreTrained Encoder
Image Label Cross Entropy mIoU Lovasz
http://www.cs.toronto.edu/~wenjie/papers/iccv17/mattyus_etal_iccv17.pdf
http://www.cs.umanitoba.ca/~ywang/papers/isvc16.pdf
https://arxiv.org/abs/1705.08790
Semantic Loss
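The loss names on this slide can be made concrete with a small sketch. What follows is a minimal pure-Python illustration, assuming a binary task (building / background) and one probability per pixel. The function names `cross_entropy` and `soft_iou_loss` are hypothetical, and this is not RoboSat.pink's own implementation. The Lovasz loss from the third link is not shown.

```python
import math

def cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross entropy over pixels (probs in [0, 1], labels in {0, 1})."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def soft_iou_loss(probs, labels, eps=1e-7):
    """1 - soft IoU, with intersection and union computed on probabilities."""
    inter = sum(p * y for p, y in zip(probs, labels))
    union = sum(p + y - p * y for p, y in zip(probs, labels))
    return 1.0 - inter / (union + eps)

# Hypothetical 4-pixel tile, flattened to a list.
labels = [1, 1, 0, 0]
good = [0.9, 0.8, 0.1, 0.2]  # prediction close to the labels
bad = [0.2, 0.1, 0.9, 0.8]   # prediction far from the labels

print(cross_entropy(good, labels), cross_entropy(bad, labels))
print(soft_iou_loss(good, labels), soft_iou_loss(bad, labels))
```

Both losses decrease as the prediction approaches the labels. The IoU-based one is driven by the overlap of the foreground class rather than by every pixel equally, which matters when buildings cover only a small share of a tile.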
From OpenData to OpenDataSet
https://github.com/datapink/robosat.pink/blob/master/docs/from_opendata_to_opendataset.md
rsp cover --zoom 18 --type bbox 4.795,45.628,4.935,45.853 ~/rsp_dataset/cover
rsp download --type WMS 'https://download.data.grandlyon.com/wms/grandlyon?SERVICE=WMS&REQUEST=GetMap&VERSION=1.3.0&LAYERS=Ortho2015_vue_ensemble_16cm_CC46&WIDTH=512&HEIGHT=512&CRS=EPSG:3857&BBOX={xmin},{ymin},{xmax},{ymax}&FORMAT=image/jpeg' --web_ui --ext jpeg ~/rsp_dataset/cover ~/rsp_dataset/images
Imagery
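To illustrate what a bbox cover at zoom 18 amounts to, here is a minimal sketch of the standard XYZ (slippy map) tile arithmetic. It assumes the bbox is given as west,south,east,north in WGS84, as the coordinates in the `rsp cover` command suggest. The helpers `tile_index` and `bbox_tiles` are hypothetical and only approximate what the tool does internally.

```python
import math

def tile_index(lon, lat, zoom):
    """XYZ tile containing a WGS84 lon/lat point (Web Mercator tiling)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def bbox_tiles(west, south, east, north, zoom):
    """All (x, y, z) tiles intersecting the bbox; y grows southward."""
    x_min, y_min = tile_index(west, north, zoom)
    x_max, y_max = tile_index(east, south, zoom)
    return [(x, y, zoom)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]

# Bbox taken from the rsp cover command above.
tiles = bbox_tiles(4.795, 45.628, 4.935, 45.853, 18)
print(len(tiles), tiles[0])
```

The sketch does not clamp points on the antimeridian or beyond the Web Mercator latitude limits, which is irrelevant for a city-sized bbox such as this one.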
wget -O ~/rsp_dataset/lyon_roofprint.json 'https://download.data.grandlyon.com/wfs/grandlyon?SERVICE=WFS&REQUEST=GetFeature&TYPENAME=ms:fpc_fond_plan_communaut.fpctoit&VERSION=1.1.0&srsName=EPSG:4326&outputFormat=application/json; subtype=geojson'
rsp rasterize --config config.toml --zoom 18 --web_ui ~/rsp_dataset/lyon_roofprint.json ~/rsp_dataset/cover ~/rsp_dataset/labels
Labels
mkdir ~/rsp_dataset/training ~/rsp_dataset/validation
cat ~/rsp_dataset/cover | sort -R > ~/rsp_dataset/cover.shuffled
head -n 16384 ~/rsp_dataset/cover.shuffled > ~/rsp_dataset/training/cover
tail -n 7924 ~/rsp_dataset/cover.shuffled > ~/rsp_dataset/validation/cover
rsp subset --web_ui --dir ~/rsp_dataset/images --cover ~/rsp_dataset/training/cover --out ~/rsp_dataset/training/images
rsp subset --web_ui --dir ~/rsp_dataset/labels --cover ~/rsp_dataset/training/cover --out ~/rsp_dataset/training/labels
rsp subset --web_ui --dir ~/rsp_dataset/images --cover ~/rsp_dataset/validation/cover --out ~/rsp_dataset/validation/images
rsp subset --web_ui --dir ~/rsp_dataset/labels --cover ~/rsp_dataset/validation/cover --out ~/rsp_dataset/validation/labels
rsp train --config config.toml ~/rsp_dataset/pth
Split DataSet and first Training
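The shuffle / head / tail split only yields disjoint training and validation sets when the cover holds at least 16384 + 7924 = 24308 lines. The same idea can be sketched in Python with an explicit check. `split_cover` and the toy tile list are hypothetical names used for illustration only.

```python
import random

def split_cover(tiles, n_train, n_val, seed=0):
    """Shuffle tiles, then take the head for training and the tail for validation."""
    if n_train + n_val > len(tiles):
        raise ValueError("not enough tiles for a disjoint split")
    shuffled = list(tiles)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    train = shuffled[:n_train]
    val = shuffled[len(shuffled) - n_val:]
    return train, val

# Hypothetical toy cover: 20 tiles written as "x,y,z" lines.
cover = [f"{x},{y},18" for x in range(4) for y in range(5)]
train, val = split_cover(cover, n_train=14, n_val=6)
print(len(train), len(val), set(train) & set(val))
```

A fixed seed keeps the split stable between runs, which `sort -R` does not guarantee.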
Buildings IoU metric on validation dataset, after 10 epochs: 0.82
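For readers unfamiliar with the metric, IoU (intersection over union) on a single tile can be sketched as below. This is an illustration on small hand-made binary masks. How the tool aggregates the value over the whole validation set is not stated in the slides, and the function name `iou` is hypothetical.

```python
def iou(pred, label):
    """IoU between two binary masks given as equally sized 2D lists of 0/1."""
    inter = union = 0
    for pred_row, label_row in zip(pred, label):
        for p, y in zip(pred_row, label_row):
            if p and y:
                inter += 1
            if p or y:
                union += 1
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0

label = [[1, 1, 0],
         [1, 1, 0],
         [0, 0, 0]]
pred = [[1, 1, 1],
        [1, 0, 0],
        [0, 0, 0]]
print(iou(pred, label))  # 3 shared pixels out of 5 covered pixels
```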
rsp predict --config config.toml --checkpoint ~/rsp_dataset/pth/checkpoint-00010-of-00010.pth --web_ui ~/rsp_dataset/images ~/rsp_dataset/masks
Predict
Detect wrong labels (zoom out)
rsp compare --images ~/rsp_dataset/images ~/rsp_dataset/labels ~/rsp_dataset/masks --mode stack --labels ~/rsp_dataset/labels --masks ~/rsp_dataset/masks --config config.toml --ext jpeg --web_ui ~/rsp_dataset/compare
rsp compare --mode list --labels ~/rsp_dataset/labels --maximum_qod 80 --minimum_fg 5 --masks ~/rsp_dataset/masks --config config.toml --geojson ~/rsp_dataset/compare/tiles.json
Detect wrong labels (zoom in)
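The list mode above filters tiles with `--maximum_qod 80` and `--minimum_fg 5`. The slides do not give the QoD formula, so the sketch below swaps in a plain IoU percentage as a stand-in agreement score, and assumes from the flag names that `minimum_fg` is a minimum foreground percentage. All function names are hypothetical.

```python
def make_mask(size, cells):
    """Square binary mask with 1 at the given (row, col) cells."""
    return [[1 if (r, c) in cells else 0 for c in range(size)] for r in range(size)]

def foreground_percent(mask):
    pixels = [v for row in mask for v in row]
    return 100.0 * sum(1 for v in pixels if v) / len(pixels)

def agreement_percent(label, mask):
    """IoU as a percentage: a stand-in for QoD, NOT the tool's formula."""
    inter = union = 0
    for label_row, mask_row in zip(label, mask):
        for y, p in zip(label_row, mask_row):
            inter += 1 if (y and p) else 0
            union += 1 if (y or p) else 0
    return 100.0 * inter / union if union else 100.0

def suspicious_tiles(tiles, maximum_qod=80.0, minimum_fg=5.0):
    """tiles maps a tile id to a (label, mask) pair; returns ids worth reviewing."""
    selected = []
    for tile_id, (label, mask) in tiles.items():
        fg = max(foreground_percent(label), foreground_percent(mask))
        if fg >= minimum_fg and agreement_percent(label, mask) <= maximum_qod:
            selected.append(tile_id)
    return selected

block = {(r, c) for r in range(2) for c in range(3)}  # 6 of 25 pixels
empty = make_mask(5, set())
tiles = {
    "ok": (make_mask(5, block), make_mask(5, block)),  # label and mask agree
    "bad": (make_mask(5, block), empty),               # labelled, not predicted
    "tiny": (make_mask(5, {(0, 0)}), empty),           # 4% foreground, ignored
}
print(suspicious_tiles(tiles))
```

The foreground threshold keeps nearly empty tiles out of the review list, since a one-pixel disagreement there would otherwise score as a total mismatch.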
GIGO
Semi-manually select wrong labels
rsp compare --mode side --images ~/rsp_dataset/images ~/rsp_dataset/compare --labels ~/rsp_dataset/labels --maximum_qod 80 --minimum_fg 5 --masks ~/rsp_dataset/masks --config config.toml --ext jpeg --web_ui ~/rsp_dataset/compare_side
rsp subset --mode delete --dir ~/rsp_dataset/training/images --cover ~/rsp_dataset/cover.to_remove > /dev/null
rsp subset --mode delete --dir ~/rsp_dataset/training/labels --cover ~/rsp_dataset/cover.to_remove > /dev/null
rsp subset --mode delete --dir ~/rsp_dataset/validation/images --cover ~/rsp_dataset/cover.to_remove > /dev/null
rsp subset --mode delete --dir ~/rsp_dataset/validation/labels --cover ~/rsp_dataset/cover.to_remove > /dev/null
rsp train --config config.toml --epochs 100 ~/rsp_dataset/pth_clean
Buildings IoU metric on validation dataset
after 10 epochs: 0.84
after 100 epochs: 0.87
Remove selected wrong labels and Train again
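As an illustration of what deleting a cover from an XYZ directory involves, here is a minimal sketch. It assumes the cover file holds one `x,y,z` line per tile and that tiles are stored as `<dir>/<z>/<x>/<y>.<ext>`. Both are assumptions about the layout, and `delete_tiles` is a hypothetical helper, not the tool's code.

```python
import csv
import tempfile
from pathlib import Path

def delete_tiles(xyz_dir, cover_path, ext):
    """Delete <xyz_dir>/<z>/<x>/<y>.<ext> for every 'x,y,z' row of the cover."""
    removed = 0
    with open(cover_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) != 3:
                continue  # skip blank or malformed lines
            x, y, z = (field.strip() for field in row)
            tile = Path(xyz_dir) / z / x / f"{y}.{ext}"
            if tile.is_file():
                tile.unlink()
                removed += 1
    return removed

# Demo on a throwaway directory holding two fake tiles.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for y in ("10", "11"):
        tile = root / "images" / "18" / "5" / f"{y}.jpeg"
        tile.parent.mkdir(parents=True, exist_ok=True)
        tile.write_bytes(b"")
    cover = root / "cover.to_remove"
    cover.write_text("5,10,18\n")
    removed = delete_tiles(root / "images", cover, "jpeg")
    remaining = sorted(p.name for p in (root / "images").rglob("*.jpeg"))

print(removed, remaining)
```

The same cover has to be applied to both images and labels, in training and in validation, so that the four directories stay aligned.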
Both Prediction and DataSet are quite consistent
Change Detection
Prediction False Negative
Compare Predicts against OSM
wget -O /tmp/ra_osm.pbf http://download.geofabrik.de/europe/france/rhone-alpes-latest.osm.pbf
osmosis --read-pbf file="/tmp/ra_osm.pbf" --bounding-box left=4.795 bottom=45.628 right=4.935 top=45.853 completeWays=yes completeRelations=yes cascadingRelations=yes --write-pbf file="/tmp/osm_lyon.pbf"
rsp extract --type building /tmp/osm_lyon.pbf ~/rsp_dataset/osm.json
rsp rasterize --config ~/robosat.pink/config.toml --zoom 18 ~/rsp_dataset/osm.json ~/rsp_dataset/cover ~/rsp_dataset/osm
rsp compare --images ~/rsp_dataset/images ~/rsp_dataset/osm ~/rsp_dataset/masks_clean --mode stack --labels ~/rsp_dataset/osm --masks ~/rsp_dataset/masks_clean --config config.toml --web_ui ~/rsp_dataset/compare_osm
rsp vectorize --type building --config config.toml ~/rsp_dataset/masks_clean /tmp/building.json
Compare Predicts against OSM
In OSM but nothing related on the imagery:
- building was built after the imagery was taken
- building was destroyed but is still in OSM
Predicted from imagery but not in OSM:
- OSM polygon is OK but lacks the building attribute (most frequent)
- building is really missing in OSM
- building was destroyed after the imagery was taken
- model prediction artefact
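The two directions of disagreement listed above can be separated per tile with a simple pixel count. The sketch below is a hypothetical illustration on hand-made binary masks. It tells which direction dominates, while deciding which of the listed causes applies still needs a human look at the imagery.

```python
def disagreement(osm, pred):
    """Count pixels per agreement class between two equally sized binary masks."""
    counts = {"both": 0, "osm_only": 0, "pred_only": 0}
    for osm_row, pred_row in zip(osm, pred):
        for o, p in zip(osm_row, pred_row):
            if o and p:
                counts["both"] += 1
            elif o:
                counts["osm_only"] += 1   # in OSM, nothing predicted on imagery
            elif p:
                counts["pred_only"] += 1  # predicted on imagery, not in OSM
    return counts

osm = [[1, 1, 0, 0],
       [1, 1, 0, 0]]
pred = [[1, 1, 0, 1],
        [0, 0, 0, 1]]
print(disagreement(osm, pred))
```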
OSM and Training DataSet classification divergence
Performances
Whole Data Preparation : about an hour and a half (downloads included)
Manual Filtering : about two hours
Training : ~20 min per epoch (i.e. ~30 hours for 100 epochs)
Prediction : ~3 MegaPixels per second
On a single GTX 1080 Ti
Training can scale across multiple GPUs.
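The figures above can be cross-checked with a few lines of arithmetic. The sketch assumes 512 x 512 pixel tiles (the WIDTH and HEIGHT of the WMS request) and a cover of 16384 + 7924 tiles (the sizes used in the split). The tile count in particular is an assumption, since the slides never state the cover size.

```python
MINUTES_PER_EPOCH = 20        # from the slide
EPOCHS = 100
training_hours = MINUTES_PER_EPOCH * EPOCHS / 60

TILE_SIZE = 512               # assumed from WIDTH=512&HEIGHT=512
N_TILES = 16384 + 7924        # assumed: training + validation tiles
MEGAPIXELS_PER_SECOND = 3     # from the slide

megapixels = N_TILES * TILE_SIZE * TILE_SIZE / 1e6
predict_minutes = megapixels / MEGAPIXELS_PER_SECOND / 60

print(f"training: {training_hours:.1f} h")
print(f"prediction: {megapixels:.0f} MP in {predict_minutes:.1f} min")
```

Under these assumptions, 100 epochs take a little over 33 hours, in line with the rounded figure on the slide, and predicting the whole area takes roughly 35 minutes.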
Stacks
Proj 4
GEOS GDAL
Rasterio
CUDA cuDNN
PyTorch
NumPy
OpenCV
RoboSat.pink
Pillow Shapelib Osmium
Mercantile
SuperMercado
August: RoboSat 1.0, MapBox RoboSat initial release (daniel-j-h, bkowshik)
September: RoboSat 1.1, training performance increase (Jesse-jApps, ocourtin)
October: RoboSat master, OSM roads extraction (DragonEmperorG); mIoU and Lovasz losses (ocourtin)
November: RoboSat PR 138, MultiBands support and tools refactor (ocourtin)
November: RoboSat.pink 0.1, QoD support and whole refactor (ocourtin)
February: RoboSat.pink 0.2, feature extraction (ocourtin)
From RoboSat to RoboSat.pink
Next ?
- SemSeg on lower-resolution imagery, such as Sentinel-2
- Prediction performance improvements
- OSM OpenDataSet and Pre Trained models
Take Away
- An industrial-grade, state-of-the-art aerial SemSeg ecosystem is available, and playful
- Plain OpenData can be used to train a model
- Prediction speed still needs to improve to scale to large areas