This section of my web site lists the main tools I have developed over the years with my team and students.
D-SPACE4Cloud
Recent years have seen a steep rise in data generation worldwide, along with the development and widespread adoption of several software projects targeting the Big Data paradigm. Many companies now engage in Big Data analytics as part of their core business activities; nonetheless, there are no tools or techniques to support the design of the infrastructure configuration backing such systems.
D-SPACE4Cloud is a novel optimization tool implementing a design-time exploration process able to identify the Spark cluster of minimum cost with a priori performance guarantees. In a nutshell, the rationale of D-SPACE4Cloud is to support you in identifying the most cost-effective public or private cluster configuration that fulfils some desired performance requirements, i.e., deadlines, for a set of Spark applications.
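The idea of a design-time search for the cheapest cluster that still meets every deadline can be illustrated with a minimal sketch. This is not the D-SPACE4Cloud algorithm: the `VMType` class, the `predicted_runtime` formula (a toy serial-plus-parallel estimate standing in for the tool's machine learning models and simulators), and the assumption that each application is evaluated alone on the whole cluster are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMType:
    name: str
    cores: int
    hourly_cost: float


def predicted_runtime(work_core_seconds, serial_seconds, total_cores):
    """Toy estimate (assumption): a serial part plus a perfectly parallel part."""
    return serial_seconds + work_core_seconds / total_cores


def cheapest_cluster(vm_types, apps, max_nodes=64):
    """Return (vm_name, nodes, hourly_cost) of the cheapest feasible cluster, or None.

    Each app is a dict with 'work', 'serial' and 'deadline' (seconds) and is
    assumed to run alone on the whole cluster.
    """
    best = None
    for vm in vm_types:
        for nodes in range(1, max_nodes + 1):
            cores = nodes * vm.cores
            feasible = all(
                predicted_runtime(a["work"], a["serial"], cores) <= a["deadline"]
                for a in apps
            )
            if feasible:
                cost = nodes * vm.hourly_cost
                if best is None or cost < best[2]:
                    best = (vm.name, nodes, cost)
                break  # adding nodes of this type can only raise the cost
    return best
```

For example, with a hypothetical catalogue of a 4-core VM at 0.2 per hour and a 16-core VM at 0.7 per hour, the function compares the smallest feasible cluster of each type and keeps the cheaper one. A real tool would replace the toy runtime formula with a validated performance model.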
D-SPACE4Cloud is a distributed software system able to exploit multi-core architectures to execute the optimization in parallel. In particular, it features:
- A presentation layer (an Eclipse plug-in). All of the D-SPACE4Cloud functionalities are available through this Eclipse plug-in, which can be downloaded from https://github.com/dice-project/DICE-Platform/releases. To install D-SPACE4Cloud you need to download the latest release of the DICE IDE for your operating system, decompress it, and then launch the DICE executable.
- A log pre-processor service. This service is fed with a collection of logs of Spark executions. It analyses these logs and builds a machine learning model for each candidate VM using Octave. Finally, it produces the input files used in the optimization service by dagSim, an ad-hoc fast simulator for applications based on Directed Acyclic Graphs.
- An orchestration service (referred to as front-end), which queues and dispatches the optimization problems to the optimization service. It can be downloaded from https://github.com/deib-polimi/diceH2020-space4cloud-webGUI/releases.
- A horizontally scalable optimization service (referred to as back-end), which implements a strategy aimed at identifying the minimum cost deployment and interfaces with the supported simulators: dagSim (https://github.com/eubr-bigsea/dagSim), JMT (http://jmt.sourceforge.net/), and GreatSPN (http://www.di.unito.it/~greatspn/index.html). The service can be downloaded from https://github.com/deib-polimi/diceH2020-space4cloudsWS.
D-SPACE4Cloud has been developed within the framework of the DICE H2020 research project.
- License: Apache 2.0
- Reference: Michele Ciavotta, Eugenio Gianniti, Danilo Ardagna. D-SPACE4Cloud: A Design Tool for Big Data Applications. ICA3PP 2016: 614-629.
OPT_IC & OPT_JR
The big data paradigm is consolidating its central position in industry, as well as in society at large. Many applications, across disparate domains, operate on huge amounts of data and offer great advantages for both business and research.
According to analysts, cloud computing adoption is steadily increasing to support big data analyses and Spark will probably take a prominent market position for the next decade.
As big data applications gain more and more importance over time and given the dynamic nature of cloud resources, it is fundamental to develop intelligent resource management systems to provide Quality of Service guarantees to application end-users.
OPT_IC and OPT_JR are a set of run-time optimization-based resource management tools for advanced big data analytics. In our framework, users submit Spark applications characterized by a priority, and by a hard or soft deadline.
OPT_IC identifies the minimum capacity needed to run a Spark application within its deadline, while OPT_JR re-balances the cloud resources under heavy load, minimising the weighted tardiness of the applications. Spark application execution times are estimated by relying on a gamut of techniques, including machine learning, approximate analyses, and simulation.
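The two problems described above can be sketched in a few lines. This is only an illustration, not the OPT_IC or OPT_JR implementation: `estimate_runtime` is a hypothetical stand-in for the machine learning and simulation-based predictors, the capacity search assumes run time never increases with more cores, and the re-balancing step is a simple greedy heuristic picked for brevity.

```python
def estimate_runtime(app, cores):
    """Hypothetical predictor (assumption): serial part plus parallel work."""
    return app["serial"] + app["work"] / cores


def min_cores_for_deadline(app, max_cores):
    """Smallest core count meeting the deadline, or None if even max_cores fails.

    Binary search is valid because the estimate is non-increasing in cores.
    """
    if estimate_runtime(app, max_cores) > app["deadline"]:
        return None
    lo, hi = 1, max_cores
    while lo < hi:
        mid = (lo + hi) // 2
        if estimate_runtime(app, mid) <= app["deadline"]:
            hi = mid
        else:
            lo = mid + 1
    return lo


def tardiness(app, cores):
    """Time by which the app is expected to miss its deadline (0 if on time)."""
    return max(0.0, estimate_runtime(app, cores) - app["deadline"])


def weighted_tardiness(apps, allocation):
    return sum(a["weight"] * tardiness(a, c) for a, c in zip(apps, allocation))


def rebalance(apps, total_cores):
    """Greedy sketch: give each app one core, then hand out the remaining cores
    one at a time to the app whose weighted tardiness drops the most."""
    allocation = [1] * len(apps)
    for _ in range(total_cores - len(apps)):
        gains = [
            a["weight"] * (tardiness(a, c) - tardiness(a, c + 1))
            for a, c in zip(apps, allocation)
        ]
        best = max(range(len(apps)), key=gains.__getitem__)
        if gains[best] <= 0:
            break  # every app already meets its deadline
        allocation[best] += 1
    return allocation
```

The first function mirrors the question "how much capacity does this application need?", while the last one mirrors the heavy-load case, where capacity is fixed and the applications with higher weight are favoured.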
OPT_IC & OPT_JR have been developed within the framework of the EUBra-BIGSEA H2020 research project.
- License: Apache 2.0
- Reference: Danilo Ardagna, Enrico Barbierato, Eugenio Gianniti, Marco Lattuada. Optimal Resource Allocation of Cloud-Based Spark Applications. Submitted.
SPACE4Cloud
Cloud Computing is assuming a relevant role in the ICT world, changing the way applications are designed, developed, and operated. The cloud offers many useful services that application developers and operators can rely upon, but the adoption of such services requires specific expertise. In fact, such services often offer proprietary APIs and show very different Quality of Service (QoS) characteristics. Thus, an approach and tools that guide designers, developers and operators through the adoption of specific cloud solutions are certainly required.
SPACE4Cloud supports the design-time analysis of cloud-based applications and the identification of the optimal strategy for allocating application components onto the services offered by cloud providers. In particular, the tool determines the application cloud configuration that minimizes the execution costs while fulfilling both QoS and service allocation constraints. SPACE4Cloud embodies an effective meta-heuristic, implementing both Tabu Search and GRASP paradigms for design-time exploration.
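To give a flavour of how a Tabu Search can explore allocations, here is a minimal sketch on a toy problem. It is not the SPACE4Cloud meta-heuristic and it leaves GRASP out entirely: the catalogue format, the single QoS constraint (the sum of per-component latencies must stay under a threshold), and all function names are assumptions made for illustration.

```python
def total(solution, catalog, field):
    """Sum one attribute ('cost' or 'latency') over the chosen offerings."""
    return sum(catalog[choice][field] for choice in solution)


def tabu_search(catalog, n_components, max_latency, iterations=100, tenure=3):
    """Pick one offering per component, minimising cost under a latency cap.

    Returns the best feasible assignment found, or None if even the fastest
    offering everywhere violates the cap.
    """
    names = sorted(catalog)
    fastest = min(names, key=lambda n: catalog[n]["latency"])
    current = [fastest] * n_components
    if total(current, catalog, "latency") > max_latency:
        return None
    best = list(current)
    tabu = {}  # (component, offering) -> last iteration in which it is forbidden
    for it in range(iterations):
        candidates = []
        for i in range(n_components):
            for name in names:
                if name == current[i] or tabu.get((i, name), -1) >= it:
                    continue
                neighbour = current[:i] + [name] + current[i + 1:]
                if total(neighbour, catalog, "latency") <= max_latency:
                    cost = total(neighbour, catalog, "cost")
                    candidates.append((cost, i, neighbour))
        if not candidates:
            break
        # Take the cheapest admissible move, even if it is worse than the
        # current solution: this is what lets the search leave local optima.
        cost, i, neighbour = min(candidates, key=lambda c: c[0])
        tabu[(i, current[i])] = it + tenure  # forbid moving straight back
        current = neighbour
        if cost < total(best, catalog, "cost"):
            best = list(current)
    return best
```

The tabu list stores recently abandoned choices so that the search does not immediately undo a move, which allows it to accept a temporarily more expensive configuration on the way to a cheaper one.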
SPACE4Cloud has been developed within the framework of the MODAClouds FP7 research project.
- Download: https://github.com/deib-polimi/modaclouds-space4cloud/blob/master/installation/install-docker.sh
- License: Apache 2.0
- Reference: Michele Ciavotta, Giovanni Paolo Gibilisco, Danilo Ardagna, Elisabetta Di Nitto, Marcos Aurelio Almeida da Silva. Architectural Design of Cloud Applications: a QoS-aware Cost Minimization Approach. Under preparation.