CV Model Improvement Journey - Pothole Detection

Domain Shift Problem in Computer Vision

A systematic experiment to improve pothole detection in overhead aerial imagery. It addresses domain shift: models trained on street-level data fail when the camera perspective changes.

Use Case: Detect potholes from overhead view on dirt roads with little traffic

95x

Improvement in mAP50

1,444

Manually Annotated Potholes

23

Benchmark Frames

0.429

Final mAP50 Score

Model Performance Progression (mAP50)

Baseline (1 Epoch)

0.0045

20 Epochs (Control)

0.0042

New Dataset (1 Epoch)

0.102

New Dataset (20 Epochs)

0.429

Key Finding

Domain-specific data is the primary driver of model performance improvement. Extending training on the original dataset from 1 to 20 epochs did not help (mAP50 0.0045 → 0.0042), whereas training on domain-specific aerial imagery gave a roughly 23x improvement after just 1 epoch (0.0045 → 0.102). Combining domain-specific data with extended training reached 0.429, a 95x improvement over baseline.
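The improvement factors follow directly from the reported mAP50 scores. A minimal sketch of the arithmetic, where the dictionary keys and the helper name are illustrative and not taken from the project:

```python
# mAP50 scores as reported in the performance progression
scores = {
    "baseline_1_epoch": 0.0045,
    "control_20_epochs": 0.0042,
    "new_data_1_epoch": 0.102,
    "new_data_20_epochs": 0.429,
}


def improvement_factor(score: float, baseline: float) -> float:
    """Return how many times larger `score` is than `baseline`."""
    return score / baseline


baseline = scores["baseline_1_epoch"]
factors = {
    name: improvement_factor(value, baseline) for name, value in scores.items()
}

for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x")
```

The control run comes out slightly below 1x, the 1-epoch run on the new dataset at about 22.7x, and the 20-epoch run at about 95.3x.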

Experimental Methodology

A structured 8-step approach to systematically improve model performance

Define the Problem

Identify a specific, narrow use case to focus the experiment.

Recognized domain shift issue from street-level to overhead imagery
Defined use case: Overhead pothole detection on dirt roads
Scope limited to low-traffic environments

Hypothesize Data Requirements

Determine what data characteristics are needed to solve the problem.

Diverse overhead images of dirt roads
Various weather conditions and angles
Manual annotation requirement identified

Define Success Metrics

Establish measurable criteria to validate the hypothesis.

Created benchmark set: 23 frames, 1,444 annotated potholes
Target: Achieve mAP50 > 0.5 (baseline was 0.0045)
Implemented model.val() for validation
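mAP50 is mean average precision where a prediction counts as a match only if its intersection-over-union (IoU) with a ground-truth object is at least 0.5. The sketch below illustrates that matching rule and the target check. It assumes axis-aligned boxes for simplicity (the benchmark uses polygons, where the overlap would be computed on the shapes themselves), and all function names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred, truth):
    """A prediction matches a ground-truth box when IoU >= 0.5."""
    return box_iou(pred, truth) >= 0.5


def meets_target(map50, target=0.5):
    """Success criterion from the experiment: mAP50 strictly above 0.5."""
    return map50 > target
```

For example, two 10x10 boxes offset by half their width overlap with an IoU of 1/3, so they would not count as a match at the 0.5 threshold.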

Source Training Data

Acquire or create the necessary dataset for training.

3 stock videos from Pexels (free)
1 synthetic video from Nano Bananas
Explored synthetic data generation options

Prepare & Annotate Data

Process and label the data for training.

Used Roboflow with Meta's SAM3 for initial labeling
Manual labeling for challenging videos
Applied augmentation: brightness & exposure
Converted bounding boxes to polygons for better precision
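The box-to-polygon step can be sketched at the label-file level. This assumes YOLO-style text labels (normalized `class xc yc w h` for boxes, normalized `class x1 y1 x2 y2 ...` for polygons). The source does not say how the conversion was actually done, so the function names here are hypothetical.

```python
def bbox_to_polygon(xc, yc, w, h):
    """Convert a normalized box (center x, center y, width, height)
    into its four corner points, clockwise from the top-left."""
    x1, y1 = xc - w / 2, yc - h / 2
    x2, y2 = xc + w / 2, yc + h / 2
    return [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]


def detection_line_to_segmentation_line(line):
    """Rewrite one box label line as a four-point polygon label line."""
    cls, xc, yc, w, h = line.split()
    points = bbox_to_polygon(float(xc), float(yc), float(w), float(h))
    coords = " ".join(f"{x:.6f} {y:.6f}" for x, y in points)
    return f"{cls} {coords}"
```

A rectangle expressed as a polygon is no tighter than the original box. Any precision gain would come from refining the outline afterwards, for example with model-assisted or manual labeling.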

Train Models

Train multiple model variants to test hypotheses.

Baseline: yolo11n.pt pretrained model
Control: 1 & 20 epochs on original dataset
Experimental: 1 & 20 epochs on new dataset
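The control and experimental runs form a small 2x2 grid (dataset x epochs). A sketch of how that grid could be driven with the Ultralytics Python API follows. The dataset config paths and run names are placeholders, not the project's actual files, and the `ultralytics` import is deferred so the planning code runs without the package installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    name: str
    data_yaml: str  # placeholder path to a dataset config
    epochs: int


def build_runs(original_yaml="original.yaml", new_yaml="aerial.yaml"):
    """Enumerate control and experimental runs at 1 and 20 epochs."""
    runs = []
    for label, data_yaml in (("control", original_yaml), ("experimental", new_yaml)):
        for epochs in (1, 20):
            runs.append(Run(f"{label}_{epochs}ep", data_yaml, epochs))
    return runs


def train(run, weights="yolo11n.pt"):
    """Fine-tune the pretrained checkpoint for one run (needs `ultralytics`)."""
    from ultralytics import YOLO  # deferred: only needed when training

    model = YOLO(weights)
    return model.train(data=run.data_yaml, epochs=run.epochs, name=run.name)
```

Calling `train(run)` for each entry of `build_runs()` would produce the four trained variants. Starting every run from the same pretrained weights keeps the dataset and the epoch count as the only variables.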

Test Hypotheses

Validate results against the benchmark dataset.

Ran validation on all model variants
Compared mAP50 and inference times
Created "proof of life" side-by-side comparisons
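Validating each variant against the same benchmark can be sketched as below. This assumes the Ultralytics `model.val()` API, whose returned metrics expose `box.map50` and a `speed` dictionary in milliseconds per image. The weights and benchmark paths are placeholders, and the import is deferred so the ranking helper works on its own.

```python
def validate(weights_path, data_yaml):
    """Run validation for one model variant (needs `ultralytics`)."""
    from ultralytics import YOLO  # deferred: only needed when validating

    model = YOLO(weights_path)
    metrics = model.val(data=data_yaml)
    return {
        "map50": metrics.box.map50,
        "inference_ms": metrics.speed["inference"],
    }


def rank_by_map50(results):
    """Sort {variant_name: result_dict} from best to worst mAP50."""
    return sorted(results.items(), key=lambda kv: kv[1]["map50"], reverse=True)
```

Collecting `validate(...)` results into a dictionary keyed by variant name and passing it to `rank_by_map50` gives a single ordered comparison, with inference time alongside each score.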

Draw Conclusions

Analyze results and determine next steps.

Hypothesis validated: Domain-specific data is critical
95x improvement achieved with new data + extended training (final mAP50 of 0.429, still below the 0.5 target)
Contributed validation scripts to OSS repo

Data Strategy

Understanding the importance of domain-specific training data

Benchmark Dataset

Total Frames: 23
Annotated Potholes: 1,444
Annotation Type: Polygons
Source: Target Video

Training Dataset

Video Sources: 4
Stock Videos: 3 (Pexels)
Synthetic Videos: 1
Split Ratio: 90/10 Train/Val
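A 90/10 train/validation split can be sketched as a seeded shuffle. The source does not state how the split was actually produced, so the function name, the seed, and the per-frame splitting are assumptions.

```python
import random


def split_train_val(items, val_fraction=0.1, seed=0):
    """Shuffle deterministically, then hold out `val_fraction` for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one validation item whenever there is any data
    n_val = max(1, round(len(shuffled) * val_fraction)) if shuffled else 0
    return shuffled[n_val:], shuffled[:n_val]
```

With 100 frame names this returns 90 training and 10 validation items. Frames taken from the same video are strongly correlated, so splitting per video rather than per frame may give a more honest validation score.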

Augmentation

Brightness