Scaling Deep Learning-Based Analysis of High-Resolution Satellite Imagery with Distributed Processing

High-resolution satellite imagery is a rich source of data for a variety of domains, ranging from demographics and land use to agriculture and hazard assessment. We have developed an end-to-end analysis pipeline that uses deep learning and unsupervised learning to process high-resolution satellite imagery, and have applied it to several problems in previous work. Because high-resolution satellite imagery is large in volume, scalability is essential for analyzing data from large geographical areas. Our original pipeline was implemented using the Caffe deep learning library and the Python machine learning library Scikit-Learn. To make it scalable, we converted it to platforms that support distributed computation: we use Keras for deep learning and evaluate two distributed platforms, Spark and Dask, for unsupervised learning. We report results from scaling up our satellite analysis pipeline.
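As a rough illustration of why the unsupervised learning stage lends itself to distributed platforms, the sketch below runs k-means in a map-and-reduce style over partitions of feature vectors. It is not the pipeline's implementation: the choice of k-means, the function names (`partial_stats`, `kmeans_step`, `distributed_kmeans`), the toy data, and the use of a local thread pool as a stand-in for Spark or Dask workers are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(partition, centroids):
    """Map step (hypothetical helper): per-partition sums and counts per centroid."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for vec in partition:
        # Index of the centroid with the smallest squared Euclidean distance.
        nearest = min(
            range(len(centroids)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, centroids[k])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += vec[d]
    return sums, counts


def kmeans_step(partitions, centroids, max_workers=4):
    """One k-means iteration: map over partitions in parallel, then reduce."""
    # Assumption: a thread pool stands in for distributed workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda p: partial_stats(p, centroids), partitions))
    dim = len(centroids[0])
    new_centroids = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in results)
        if total == 0:
            # No points assigned: keep the previous centroid unchanged.
            new_centroids.append(list(old))
            continue
        new_centroids.append(
            [sum(sums[k][d] for sums, _ in results) / total for d in range(dim)]
        )
    return new_centroids


def distributed_kmeans(partitions, centroids, iterations=10):
    """Repeat the map-and-reduce step for a fixed number of iterations."""
    for _ in range(iterations):
        centroids = kmeans_step(partitions, centroids)
    return centroids


# Toy 2-D vectors standing in for per-tile deep features, split into two partitions.
partitions = [
    [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]],
    [[1.0, 0.0], [1.0, 1.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]],
]
centroids = distributed_kmeans(partitions, [[0.0, 0.0], [10.0, 10.0]], iterations=5)
```

Only the small per-partition sums and counts need to be combined centrally, while the feature vectors stay where they are stored. That property is what makes this kind of clustering step a natural candidate for a distributed platform.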