A gearbox model for processing large volumes of data by using pipeline systems encapsulated into virtual containers

authors

Santiago-Duran, Miguel
Gonzalez-Compean, J. L.
Brinkmann, Andre
Reyes-Anastacio, Hugo G.
Carretero Perez, Jesus
Montella, Raffaele
Toscano Pulido, Gregorio

published in

Future Generation Computer Systems: The International Journal of eScience

publication date

May 2020

start page

304

end page

319

volume

106

Digital Object Identifier (DOI)

https://doi.org/10.1016/j.future.2020.01.014

full text

http://hdl.handle.net/10016/33801

International Standard Serial Number (ISSN)

0167-739X

Electronic International Standard Serial Number (EISSN)

1872-7115

abstract

Software pipelines enable organizations to chain applications that add value to content (e.g., confidentiality, reliability, and integrity) before either sharing it with partners or sending it to the cloud. However, pipeline components add overhead when processing large volumes of data, which can become critical in real-world scenarios. This paper presents a gearbox model for processing large volumes of data by using pipeline systems encapsulated into virtual containers. In this model, gears represent applications, whereas gearboxes represent software pipelines. The model was implemented as a collaborative system that automatically performs Gear up (by using parallel patterns) and/or Gear down (by using in-memory storage) until all gears produce uniform data processing velocities. This reduces the delays and bottlenecks caused by the heterogeneous performance of the applications included in software pipelines. A new container tool was designed to encapsulate both the collaborative system and the software pipelines into a virtual container and to deploy it on IT infrastructures. We conducted case studies to evaluate the performance of this approach when processing medical images and PDF repositories. The incorporation of a capsule into a cloud storage service for pre-processing medical imagery was also studied. The experimental evaluation revealed the feasibility of applying the gearbox model to the deployment of software pipelines in real-world scenarios, as it can significantly improve the end-user service experience when pre-processing large-scale data in comparison with state-of-the-art solutions such as Sacbe and Parsl.
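The "Gear up until all gears produce uniform velocities" idea can be illustrated with a small sketch. This is not the authors' implementation: it is a simple greedy throughput-balancing heuristic, written under stated assumptions. The names `Gear`, `gear_up`, and `pipeline_velocity`, the per-worker throughput figures, the assumption that throughput scales linearly with the number of parallel workers, and the `tolerance` threshold are all hypothetical. Gear down (in-memory storage) is not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gear:
    """Hypothetical pipeline stage (a 'gear'), i.e. one application in the pipeline."""
    name: str
    items_per_sec: float  # assumed throughput of a single worker instance
    workers: int = 1      # parallel instances; "gear up" adds more

    @property
    def velocity(self) -> float:
        # Assumption: throughput scales linearly with worker count (idealized).
        return self.items_per_sec * self.workers


def gear_up(gears: List[Gear], max_workers: int, tolerance: float = 0.2) -> List[Gear]:
    """Greedily add a worker to the slowest gear until the relative spread of
    velocities is within `tolerance` or the worker budget is exhausted."""
    while True:
        velocities = [g.velocity for g in gears]
        slowest, fastest = min(velocities), max(velocities)
        if (fastest - slowest) / fastest <= tolerance:
            break  # velocities are roughly uniform
        if sum(g.workers for g in gears) >= max_workers:
            break  # no more workers available to assign
        bottleneck = min(gears, key=lambda g: g.velocity)
        bottleneck.workers += 1  # "gear up" the current bottleneck
    return gears


def pipeline_velocity(gears: List[Gear]) -> float:
    # A chained pipeline can run no faster than its slowest stage.
    return min(g.velocity for g in gears)


if __name__ == "__main__":
    stages = [Gear("encrypt", 40.0), Gear("compress", 10.0), Gear("upload", 20.0)]
    gear_up(stages, max_workers=8)
    for g in stages:
        print(g.name, g.workers, g.velocity)
    print("pipeline velocity:", pipeline_velocity(stages))
```

With the hypothetical figures above, the slow `compress` stage ends up with four workers and `upload` with two, so every stage reaches 40 items/s and the pipeline is no longer limited by a single bottleneck. The greedy choice keeps the sketch short; a real system would also account for the measured (not assumed) scaling of each application.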
