Issue 58

Detecting Roads from Aerial Images using Deep Learning

Bogdan Gliga
Associate Software Engineer @ Telenav


In recent years, Deep Neural Networks have been used to generate state-of-the-art results in numerous sub-fields of computer vision. This category of algorithms can be used to extract a substantial amount of information from many types of imagery. Our work focuses on training a neural network to accurately detect roads by analyzing tens of thousands of satellite images.

In the mapping industry, detailed and accurate maps are crucial for building high-quality, precise routing applications. The first step towards achieving this goal is detecting all the roads in a certain area. Every other piece of traffic-relevant map information, such as turn restrictions or speed limits, depends on knowing the underlying road network.

Recent developments in aerial imagery technology make it possible for satellites to provide high-resolution images in which navigable roads are easily distinguishable to a human. The task at hand is to build an intelligent system which, given a satellite image, can accurately extract the existing road topology. In technical terms, for each pixel of the input image we need to generate a prediction: the probability that the pixel is part of a road. In academia, this problem is called Semantic Segmentation, as we have to segment the image into objects of interest. Finally, once we have road-labeled pixels, constructing the road network for subsequent use in other applications is straightforward.
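To make the per-pixel prediction concrete, here is a minimal sketch in plain Python. It assumes the network output is a grid of probabilities, one per pixel, and that a pixel is labeled as road when its probability reaches a cutoff. The function name `threshold_mask`, the 0.5 cutoff and the sample values are illustrative assumptions, not details from our system.

```python
def threshold_mask(prob_map, threshold=0.5):
    """Turn per-pixel road probabilities into a binary mask (1 = road, 0 = background)."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


# Hypothetical 3x4 network output: one road probability per pixel.
prob_map = [
    [0.05, 0.10, 0.92, 0.08],
    [0.07, 0.88, 0.95, 0.12],
    [0.81, 0.90, 0.20, 0.03],
]

road_mask = threshold_mask(prob_map)
for row in road_mask:
    print(row)
```

The cutoff trades missed roads against false detections, so in practice it would be tuned on held-out images rather than fixed at 0.5.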

The Solution

The first prerequisite for building such a system is a comprehensive dataset of pictures together with the expected road topology contained in each picture. This dataset is used for training the deep neural network, to 'teach' it to recognize roads in a satellite image. A dataset for this specific problem is already available: the "Massachusetts Roads Dataset". An example of the pictures and road topology representations contained in the dataset is:

The second prerequisite is choosing the optimal architecture for the problem at hand. As this is inherently an image-processing task, our solution is based on a modified version of a Convolutional Neural Network, which is designed specifically for analyzing images.

The Architecture

Our architecture resembles the standard Deep Learning architecture for the task at hand, namely Semantic Segmentation, which is detailed in the paper by Long et al., "Fully Convolutional Networks for Semantic Segmentation". A similar approach is described by Ronneberger et al. in "U-Net: Convolutional Networks for Biomedical Image Segmentation".

The pipeline involves two stages: the Convolutional Network and the Deconvolutional Network. The main purpose of the Convolutional Network stage is to output a reduced representation of the image. This is achieved by sequentially analyzing every sub-region of the image and applying mathematical operations, such as convolutions and pooling, to the pixel values. At the end, that condensed output represents the probability of a road appearing in each of the analyzed sub-regions.
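The idea of reducing an image by scanning its sub-regions can be sketched in a few lines of plain Python. The example assumes a single 3x3 filter, a ReLU non-linearity and 2x2 max pooling. All function names, the hand-picked filter and the toy image are illustrative assumptions: a real network learns many filters from data and stacks several such layers.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over every sub-region (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped,
    so strictly speaking this is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Zero out negative responses."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap):
    """Keep the strongest response in each 2x2 block, halving both dimensions."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


# Toy 6x6 "image" with a one-pixel-wide vertical road in column 2.
image = [[1 if col == 2 else 0 for col in range(6)] for _ in range(6)]

# Hand-picked filter that responds to thin vertical lines (an assumption for the demo).
kernel = [[-1, 2, -1],
          [-1, 2, -1],
          [-1, 2, -1]]

features = max_pool_2x2(relu(conv2d_valid(image, kernel)))
for row in features:
    print(row)
```

The 6x6 input shrinks to a 2x2 grid in which the left-hand cells, the ones covering the road, carry a strong response and the right-hand cells carry none.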

The next task is to take this probability vector and transform it into the final road-topology output. More specifically, we take the probability of a road appearing in each sub-region and transform it into a pixel representation of that particular road. This operation is performed by the second stage of the pipeline, the Deconvolutional Network. To achieve this, the operations of the first stage are mirrored in reverse order: instead of taking an image and producing a probability vector, we take a probability vector and output an image.
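One common way to realize such a "reverse" step is a transposed convolution, in which every value of the coarse grid stamps a scaled copy of a kernel onto a larger output grid. The sketch below is an illustration under that assumption, not a description of our exact layers. The name `transposed_conv2d`, the all-ones kernel and the sample probabilities are hypothetical.

```python
def transposed_conv2d(fmap, kernel, stride=2):
    """Upsample by letting each input value add a scaled kernel copy to the output."""
    kh, kw = len(kernel), len(kernel[0])
    in_h, in_w = len(fmap), len(fmap[0])
    out_h = (in_h - 1) * stride + kh
    out_w = (in_w - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(in_h):
        for j in range(in_w):
            for di in range(kh):
                for dj in range(kw):
                    out[i * stride + di][j * stride + dj] += fmap[i][j] * kernel[di][dj]
    return out


# Hypothetical coarse output: road probability per 2x2 sub-region.
coarse = [[0.9, 0.1],
          [0.8, 0.2]]

# With a 2x2 kernel of ones and stride 2 the copies do not overlap,
# so this reduces to nearest-neighbour upsampling.
kernel = [[1, 1],
          [1, 1]]

upsampled = transposed_conv2d(coarse, kernel, stride=2)
for row in upsampled:
    print(row)
```

In a trained network the kernel weights are learned, which lets the upsampling stage sharpen road edges instead of merely repeating coarse values.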

The Results

Even after a short training time of only a few minutes on an NVIDIA GTX 1080, the results are promising: the network has started to recognize the road topology accurately. In the pictures below, the satellite image is on the left, the ground truth is in the middle and the neural network output is on the right.
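A visual comparison like this can be complemented with pixel-level scores. The following sketch shows how precision, recall and intersection-over-union could be computed from a predicted mask and a ground-truth mask. The function name and the toy masks are assumptions made for illustration, and the numbers they produce are not results from our network.

```python
def pixel_scores(predicted, truth):
    """Return (precision, recall, IoU) for two binary masks of equal size."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1          # road predicted, road present
            elif p and not t:
                fp += 1          # road predicted, none present
            elif t and not p:
                fn += 1          # road present, but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    return precision, recall, iou


# Toy masks (1 = road), purely illustrative.
truth = [[0, 1, 0, 0],
         [0, 1, 0, 0],
         [0, 1, 1, 1]]
predicted = [[0, 1, 0, 0],
             [0, 1, 1, 0],
             [0, 0, 1, 1]]

precision, recall, iou = pixel_scores(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f} iou={iou:.2f}")
```

Because roads cover only a small share of an aerial image, scores of this kind are usually more informative than plain pixel accuracy.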

In the future, we plan to extract the road topology from the network output and compare it to the existing roads in OpenStreetMap (OSM) in order to add the missing ones. If a significant number of high-quality satellite images become available, this could mean thousands of kilometers of missing roads being added to OSM, which would greatly improve the map quality.
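As a rough illustration of that comparison, the sketch below works on raster masks rather than on a full road graph. It assumes the existing OSM roads have already been rasterized onto the same pixel grid as the prediction, and flags predicted road pixels that have no existing road within a small tolerance. The function name, the tolerance and the toy masks are hypothetical.

```python
def missing_road_pixels(predicted, existing, tolerance=1):
    """List (row, col) of predicted road pixels with no existing road nearby."""
    h, w = len(predicted), len(predicted[0])
    missing = []
    for i in range(h):
        for j in range(w):
            if not predicted[i][j]:
                continue
            # Look for an existing road pixel in the surrounding window.
            near_existing = any(
                existing[y][x]
                for y in range(max(0, i - tolerance), min(h, i + tolerance + 1))
                for x in range(max(0, j - tolerance), min(w, j + tolerance + 1))
            )
            if not near_existing:
                missing.append((i, j))
    return missing


# Existing map (assumed rasterized elsewhere): one horizontal road in row 0.
existing = [[1, 1, 1, 1, 1],
            [0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0]]

# Prediction: the same road shifted by one pixel, plus a new side road in column 2.
predicted = [[0, 0, 0, 0, 0],
             [1, 1, 1, 1, 1],
             [0, 0, 1, 0, 0],
             [0, 0, 1, 0, 0],
             [0, 0, 1, 0, 0]]

print(missing_road_pixels(predicted, existing))
```

The tolerance absorbs small misalignments between the imagery and the map, so only the side road is reported as a candidate for addition.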


