International Conference "Mathematical and Information Technologies, MIT-2016"

August 28 – September 5, 2016

Vrnjačka Banja, Serbia - Budva, Montenegro

Smagin S.I.   Lupyan E.A.   Sorokin A.A.   Burtsev M.A.   Korolev S.P.   Proshin A.A.   Kramareva L.S.

Analysis of possibilities of cloud technologies for distributed storage and processing of remote sensing data for environmental monitoring

Speaker: Smagin S.I.

Remote sensing systems are currently a major source of objective information for environmental assessment, monitoring, and management in various regions of the planet. The growing number of specialized satellites and their capabilities are driving almost explosive growth in the volume of received instrumental data.
Technologies and methods for analyzing and processing remote sensing data have developed rapidly in recent years, driven by large-scale scientific problems (e.g., studies of volcanic activity and of changes in vegetation cover) [1-5]. For example, the geographically distributed system of FGBI SRC "Planeta" receives over 1 terabyte of instrumental data daily from 16 foreign and Russian Earth observation satellites and produces about 350 types of information products using IKI technologies and application systems. The total volume of its operational and long-term data archives currently exceeds 0.6 petabytes.
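To put these figures in perspective, the following minimal sketch projects archive growth from the two numbers quoted above (0.6 petabytes stored, 1 terabyte received per day). It assumes decimal units and a constant daily intake; since the quoted rate is a lower bound ("over 1 terabyte"), the result is a lower bound as well. The function names are illustrative only.

```python
# Back-of-the-envelope archive growth from the figures quoted in the text.
# Assumptions: decimal units (1 PB = 1000 TB) and a constant daily intake.
TB_PER_PB = 1000


def projected_archive_pb(current_pb: float, daily_tb: float, days: int) -> float:
    """Archive size in PB after `days` of constant daily intake."""
    return current_pb + daily_tb * days / TB_PER_PB


def days_to_reach(target_pb: float, current_pb: float, daily_tb: float) -> float:
    """Days until the archive reaches `target_pb` at a constant daily intake."""
    return (target_pb - current_pb) * TB_PER_PB / daily_tb


if __name__ == "__main__":
    # Size after one year, starting from 0.6 PB at 1 TB per day.
    print(projected_archive_pb(0.6, 1.0, 365))
    # Days needed for the 0.6 PB archive to double.
    print(days_to_reach(1.2, 0.6, 1.0))
```

Under these assumptions the archive gains about 0.365 petabytes per year and doubles in roughly 600 days.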
For efficient use of such information, a specialized system is needed that addresses the technological problems of data collection and unification and of forming structured data archives and very large data sets, and that provides researchers with software tools for data analysis.
Such a problem cannot be solved in the traditional way, in which a separate set of data archives is formed for each scientific task and filled from all available data sources, and specific tools are designed to work with them. This approach has significant drawbacks: large-scale technological services must be developed and maintained for every research project, even a small one. It also requires significant computational resources for data storage and processing and for running a wide variety of software. Therefore, more and more designers are turning to cloud computing [6-8], moving these laborious tasks into a virtual computing environment and connecting it to their applications and information systems through flexible RESTful web services.
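As an illustration of how an application might interact with such a service, the sketch below builds a query URL for a scene catalog and parses a JSON reply. Everything specific here is an assumption made for the example: the endpoint `archive.example.org`, the parameter names, the satellite name `example-sat`, and the response shape are hypothetical and do not describe any of the systems mentioned in this abstract. No network request is made.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://archive.example.org/api/v1/scenes"


def build_scene_query(satellite, start, end, bbox):
    """Build a GET URL for a hypothetical scene-catalog resource."""
    params = {
        "satellite": satellite,
        "start": start,
        "end": end,
        # Bounding box as west,south,east,north (assumed convention).
        "bbox": ",".join(str(v) for v in bbox),
    }
    return f"{BASE_URL}?{urlencode(params)}"


def parse_scene_list(body):
    """Extract scene identifiers from a JSON body of the assumed shape."""
    return [scene["id"] for scene in json.loads(body)["scenes"]]


if __name__ == "__main__":
    url = build_scene_query(
        "example-sat", "2016-08-01", "2016-08-02", (130.0, 42.0, 145.0, 55.0)
    )
    print(url)
    # A made-up response body standing in for the service reply.
    sample = '{"scenes": [{"id": "scene-001"}, {"id": "scene-002"}]}'
    print(parse_scene_list(sample))
```

Because the client depends only on URLs and a documented response format, the storage and processing behind the service can change without changes to the applications that use it.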
This trend is also present in Russia, where the creation of a modern unified geographically distributed system for working with remote sensing data (ERS ETRIS) began a few years ago with the support of the Russian Space Agency. The system is intended, in particular, to provide efficient access to distributed archives of satellite data, but the work is still far from complete.
With the support of RFBR, the authors are conducting research and developing modern methods and algorithms for the storage, processing, and analysis of extra-large volumes of remote sensing data using the capabilities of cloud computing environments. The work involves creating software tools to build and maintain distributed archives of remote sensing data and processing results, and to provide users (including a variety of specialized information systems) with convenient services for distributed data access and analysis.
As part of the research, the efficiency of modern information technologies for secure data storage and access is evaluated for the project tasks. These technologies include distributed parallel file systems (Lustre, GlusterFS, Ceph, etc.), distributed databases (RasDaMan, Apache HBase, Apache Cassandra, etc.), and cloud storage systems (OpenStack, etc.).
The developed algorithms and technologies will be implemented for the existing, constantly updated distributed archives of CC FEB RAS, IKI, and SRC "Planeta", whose total volume already exceeds 1 petabyte and grows by over 1 terabyte daily. This project will allow the following:
• facilitating interdisciplinary integration and ensuring a scalable and secure storage of scientific data with a common data model and query language;
• speeding up data recording and reading;
• conducting distributed search over metadata and traceability data (data provenance);
• facilitating interactive analysis, visual search for patterns, and data reuse.
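The distributed metadata search and provenance tracing named in this list can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the in-memory `CATALOGS` dictionary stands in for remote archive nodes, and the record fields, archive names, and product identifiers are all hypothetical. Provenance links are assumed to be acyclic.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    product_id: str
    archive: str              # archive that holds the product
    product_type: str
    derived_from: tuple = ()  # provenance: ids of the input products


# Hypothetical in-memory catalogs standing in for remote archive nodes.
CATALOGS = {
    "archive-a": [Record("raw-1", "archive-a", "raw")],
    "archive-b": [
        Record("l1-1", "archive-b", "calibrated", ("raw-1",)),
        Record("ndvi-1", "archive-b", "ndvi", ("l1-1",)),
    ],
}


def search_archive(name, product_type):
    """Query one archive's catalog (a local list here, a remote call in practice)."""
    return [r for r in CATALOGS[name] if r.product_type == product_type]


def distributed_search(product_type):
    """Send the query to every archive in parallel and merge the results."""
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda n: search_archive(n, product_type), CATALOGS))
    return [record for part in parts for record in part]


def trace_provenance(product_id):
    """List product ids from `product_id` back to its raw inputs."""
    index = {r.product_id: r for recs in CATALOGS.values() for r in recs}
    chain, stack = [], [product_id]
    while stack:
        pid = stack.pop()
        chain.append(pid)
        stack.extend(index[pid].derived_from)
    return chain


if __name__ == "__main__":
    print(distributed_search("ndvi"))
    print(trace_provenance("ndvi-1"))
```

The provenance chain crosses archive boundaries: the derived product in one catalog is traced back to raw data held in another, which is why the lookup index is built over all catalogs.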
In addition, the results will be used to organize work with the extra-large distributed archives of satellite data obtained by SRI RAS and SRC "Planeta" in the information system "Vega - Far East" [2]. This system is designed to provide access to the distributed information system for the collective use of space-based remote sensing data (ERS CLAIMS) for scientific, educational, and innovation activities in environmental research and monitoring in the Russian Far East. This will expand the information resources available to users of the system by more than 0.5 petabytes of data and enable remote sensing data processing and analysis in a cloud computing environment formed from the resources available at CC FEB RAS, the Space Research Institute, and the regional centers of SRC "Planeta".
The studies were supported by the Program of Fundamental Studies of the Far Eastern Branch of the RAS “The Far East” (No. 15-I-4-071, 15-I-4-072) and by the Russian Foundation for Basic Research (No. 15-29-07953).


© 1996-2019, Institute of Computational Technologies SB RAS, Novosibirsk