Domain Shift Problem in Computer Vision

A systematic experiment to improve pothole detection in overhead aerial imagery, addressing the failure of models trained on street-level data when they are applied to a different camera perspective.

Use Case: Detect potholes from an overhead view on dirt roads with little traffic
Improvement in mAP50: 95x
Manually Annotated Potholes: 1,444
Benchmark Frames: 23
Final mAP50 Score: 0.429

Model Performance Progression (mAP50)

Baseline (1 Epoch): 0.0045
20 Epochs (Control): 0.0042
New Dataset (1 Epoch): 0.102
New Dataset (20 Epochs): 0.429

Key Finding

Domain-specific data is the primary driver of model performance improvement. Increasing training epochs on the original dataset produced no improvement (mAP50 moved from 0.0045 to 0.0042), while training on domain-specific aerial imagery produced a more than 20x improvement (0.0045 → 0.102) after just 1 epoch. Combining domain-specific data with extended training (20 epochs) achieved a 95x improvement over baseline (0.0045 → 0.429).
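
The improvement factors quoted above can be reproduced from the reported mAP50 scores. The snippet below is a minimal arithmetic check; the dictionary keys simply reuse the labels from the progression, and `improvement_factor` is a helper name introduced here for illustration.

```python
# mAP50 scores as reported in the performance progression.
scores = {
    "Baseline (1 Epoch)": 0.0045,
    "20 Epochs (Control)": 0.0042,
    "New Dataset (1 Epoch)": 0.102,
    "New Dataset (20 Epochs)": 0.429,
}

baseline = scores["Baseline (1 Epoch)"]


def improvement_factor(score: float, reference: float = baseline) -> float:
    """Return how many times larger `score` is than `reference`."""
    return score / reference


for name, score in scores.items():
    print(f"{name}: {score:.4f} ({improvement_factor(score):.1f}x baseline)")
```

A factor below 1 for the control run means that extra epochs on the original data left the score slightly lower than the baseline.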

Experimental Methodology

A structured 8-step approach to systematically improve model performance

1. Define the Problem

Identify a specific, narrow use case to focus the experiment.

  • Recognized domain shift issue from street-level to overhead imagery
  • Defined use case: Overhead pothole detection on dirt roads
  • Scope limited to low-traffic environments

2. Hypothesize Data Requirements

Determine what data characteristics are needed to solve the problem.

  • Diverse overhead images of dirt roads
  • Various weather conditions and angles
  • Manual annotation requirement identified

3. Define Success Metrics

Establish measurable criteria to validate the hypothesis.

  • Created benchmark set: 23 frames, 1,444 annotated potholes
  • Target: Achieve mAP50 > 0.5 (baseline was 0.0045)
  • Used Ultralytics model.val() to run validation against the benchmark set
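
A validation run of this kind might look like the sketch below. It assumes the Ultralytics Python package; the dataset config name `benchmark.yaml` is hypothetical, as are the helper names `meets_target` and `benchmark_map50`. The Ultralytics import is done lazily so the target check can be used on its own.

```python
TARGET_MAP50 = 0.5  # target stated in the success metrics


def meets_target(map50: float, target: float = TARGET_MAP50) -> bool:
    """Return True when a measured mAP50 exceeds the target."""
    return map50 > target


def benchmark_map50(weights: str, data_yaml: str) -> float:
    """Validate `weights` on the dataset described by `data_yaml` and return box mAP50."""
    # Imported here so meets_target() works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    metrics = model.val(data=data_yaml)
    return float(metrics.box.map50)


# Example usage (requires `pip install ultralytics` and a real dataset config;
# "benchmark.yaml" is a hypothetical file pointing at the 23 benchmark frames):
# score = benchmark_map50("yolo11n.pt", "benchmark.yaml")
# print(score, meets_target(score))
```

With the scores reported in this experiment, `meets_target(0.429)` is False: the final model improved substantially but stayed below the 0.5 target.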

4. Source Training Data

Acquire or create the necessary dataset for training.

  • 3 stock videos from Pexels (free)
  • 1 synthetic video from Nano Bananas
  • Explored synthetic data generation options

5. Prepare & Annotate Data

Process and label the data for training.

  • Used Roboflow with Meta's SAM3 for initial labeling
  • Manual labeling for challenging videos
  • Applied augmentation: brightness & exposure
  • Converted bounding boxes to polygons for better precision
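
As an illustration of the box-to-polygon step, the sketch below converts a normalized YOLO-style box (center x, center y, width, height) into a four-point polygon and writes it as a segmentation label line. This is an assumption about the label format, not the tooling used in the experiment, and the function names are hypothetical. A rectangle expressed as a polygon is only a starting point; tighter outlines would come from a segmentation model or manual editing.

```python
def yolo_box_to_polygon(xc: float, yc: float, w: float, h: float):
    """Convert a normalized box to 4 corner points, clockwise from top-left."""
    def clamp(v: float) -> float:
        # Keep coordinates inside the normalized image range [0, 1].
        return min(max(v, 0.0), 1.0)

    x1, y1 = clamp(xc - w / 2), clamp(yc - h / 2)
    x2, y2 = clamp(xc + w / 2), clamp(yc + h / 2)
    return [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]


def to_segmentation_line(class_id: int, polygon) -> str:
    """Format a polygon as 'class x1 y1 x2 y2 ...' with normalized coordinates."""
    coords = " ".join(f"{x:.6f} {y:.6f}" for x, y in polygon)
    return f"{class_id} {coords}"


polygon = yolo_box_to_polygon(0.5, 0.5, 0.2, 0.4)
print(to_segmentation_line(0, polygon))
```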

6. Train Models

Train multiple model variants to test hypotheses.

  • Baseline: yolo11n.pt pretrained model
  • Control: 1 & 20 epochs on original dataset
  • Experimental: 1 & 20 epochs on new dataset
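
The control and experimental runs can be written down as a small experiment matrix. The sketch below assumes the Ultralytics training API and that every run starts from the `yolo11n.pt` weights; the dataset config names (`original.yaml`, `aerial.yaml`) and run names are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    name: str       # label for the run (hypothetical naming scheme)
    data_yaml: str  # dataset config file (hypothetical file names)
    epochs: int     # number of training epochs


RUNS = [
    Run("control_1ep", "original.yaml", 1),
    Run("control_20ep", "original.yaml", 20),
    Run("experimental_1ep", "aerial.yaml", 1),
    Run("experimental_20ep", "aerial.yaml", 20),
]


def train(run: Run, weights: str = "yolo11n.pt"):
    """Fine-tune the pretrained weights for one run and return the model."""
    # Imported lazily so the run matrix can be inspected without ultralytics.
    from ultralytics import YOLO

    model = YOLO(weights)
    model.train(data=run.data_yaml, epochs=run.epochs, name=run.name)
    return model


# Example usage (requires ultralytics and real dataset configs):
# models = {run.name: train(run) for run in RUNS}
```

Keeping the epoch counts identical across the two datasets is what lets the comparison isolate the effect of the data from the effect of training length.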

7. Test Hypotheses

Validate results against the benchmark dataset.

  • Ran validation on all model variants
  • Compared mAP50 and inference times
  • Created "proof of life" side-by-side comparisons
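
One way to line up the variants for comparison is a small summary helper like the sketch below. The function name `summarize` is hypothetical. The mAP50 values are the ones reported in this experiment; inference times are not reported here, so they are left as None rather than filled in. With Ultralytics, a per-image inference time in milliseconds should be available as `metrics.speed["inference"]` on the object returned by `model.val()`.

```python
def summarize(results):
    """Format (name, map50, inference_ms or None) rows, best mAP50 first."""
    lines = []
    for name, map50, ms in sorted(results, key=lambda r: r[1], reverse=True):
        speed = f"{ms:.1f} ms" if ms is not None else "n/a"
        lines.append(f"{name:<26} mAP50={map50:.4f}  inference={speed}")
    return lines


results = [
    ("Baseline (1 Epoch)", 0.0045, None),
    ("20 Epochs (Control)", 0.0042, None),
    ("New Dataset (1 Epoch)", 0.102, None),
    ("New Dataset (20 Epochs)", 0.429, None),
]

for line in summarize(results):
    print(line)
```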

8. Draw Conclusions

Analyze results and determine next steps.

  • Hypothesis validated: Domain-specific data is critical
  • 95x improvement achieved with new data + training
  • Contributed validation scripts to OSS repo

Data Strategy

Understanding the importance of domain-specific training data

Benchmark Dataset

Total Frames: 23
Annotated Potholes: 1,444
Annotation Type: Polygons
Source: Target Video

Training Dataset

Video Sources: 4
Stock Videos: 3 (Pexels)
Synthetic Videos: 1
Split Ratio: 90/10 Train/Val
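
A 90/10 split can be produced with a seeded shuffle, as in the sketch below. The function name, the seed, and the frame file names are illustrative assumptions; the experiment's actual split was not necessarily produced this way.

```python
import random


def split_train_val(items, val_fraction: float = 0.10, seed: int = 0):
    """Shuffle items deterministically and return (train, val) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Always keep at least one item for validation.
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical frame names, for illustration only.
frames = [f"frame_{i:04d}.jpg" for i in range(200)]
train_frames, val_frames = split_train_val(frames)
print(len(train_frames), len(val_frames))
```

Because consecutive frames from the same video look nearly identical, a per-frame random split can make validation scores look better than they would on unseen footage; splitting by video or by clip is a stricter alternative.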

Augmentation

Brightness