Theses

This section of the website lists thesis and project proposals for students. The main topics are Cloud computing, Big Data cluster management, Deep Learning applications, and performance evaluation of GPGPU systems.

If you are interested in one of the proposals, contact me by e-mail.

Bayesian Optimization for Sizing Big Data and Deep Learning Applications Cloud Clusters

Advisors: Prof. Alessandra Guglielmi, Prof. Danilo Ardagna

Today, data mining and, more generally, big data analytics techniques are profoundly changing our society, e.g., in the financial sector or in healthcare. Companies are increasingly aware of the benefits of data processing technologies; across almost every sector, most industries use or plan to use machine learning techniques.

In particular, deep learning methods are gaining momentum across various domains for tackling different problems, ranging from image recognition and classification to text processing and speech recognition.

Picking the right cloud cluster configuration for recurring big data/deep learning analytics is hard, because there can be tens of possible virtual machine/GPU instance types and even more cluster sizes to pick from. Choosing poorly can degrade performance and increase the cost of running an application, yet testing every alternative in such a broad spectrum of cloud configurations is prohibitively expensive.

The goal of this thesis is to identify novel Bayesian Optimization methods to build performance models for various big data and deep learning applications based on Spark, the most promising big data framework, which is likely to dominate the big data market over the next 5-10 years.

The aim of this research work is to build accurate machine learning models that estimate the performance of Spark applications (possibly running on GPU clusters) from only a few test runs on reference systems, and to identify optimal or near-optimal configurations. Bayesian methods will be combined with traditional performance modelling techniques, such as computer system simulation or bounding techniques.
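A minimal sketch of the basic loop, in the spirit of CherryPick (reference 4 in the list below), is given here: a Gaussian-process surrogate with a squared-exponential kernel and an expected-improvement acquisition function chooses the next cluster size to test. It assumes a single search dimension (the number of VMs of one hypothetical instance type) and a synthetic objective, run_job, that stands in for real test runs; the kernel length scale, the noise term, and all function names are illustrative assumptions, not the method this thesis will develop.

```python
import math

# Hypothetical objective: execution time (hours) of a recurring job on n VMs.
# This synthetic curve stands in for real test runs; it is NOT measured data.
def run_job(n):
    return 0.2 + 8.0 / n + 0.05 * n

def rbf(a, b, length=4.0):
    # Squared-exponential kernel over cluster size, unit signal variance.
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))

def cholesky(m):
    # Lower-triangular L with L @ L.T == m (m symmetric positive definite).
    size = len(m)
    low = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low

def forward(low, b):
    # Solve low @ x = b by forward substitution.
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x

def backward(low, b):
    # Solve low.T @ x = b by back substitution.
    size = len(low)
    x = [0.0] * size
    for i in reversed(range(size)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, size))
        x[i] = (b[i] - s) / low[i][i]
    return x

def expected_improvement(mu, sigma, best):
    # EI for minimisation: E[max(best - f, 0)] with f ~ N(mu, sigma^2).
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

def bayes_opt(candidates, objective, initial, iterations, noise=1e-4):
    xs = list(initial)
    ys = [objective(x) for x in xs]
    for _ in range(iterations):
        remaining = [c for c in candidates if c not in xs]
        if not remaining:
            break
        # Standardise observations so the unit-variance prior is sensible.
        mean = sum(ys) / len(ys)
        std = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys)) or 1.0
        zs = [(y - mean) / std for y in ys]
        gram = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
                for i, a in enumerate(xs)]
        low = cholesky(gram)
        alpha = backward(low, forward(low, zs))  # gram^-1 @ zs
        best = min(zs)

        def score(c):
            # GP posterior mean and variance at c, then expected improvement.
            ks = [rbf(c, x) for x in xs]
            mu = sum(k * a for k, a in zip(ks, alpha))
            v = forward(low, ks)
            var = max(1.0 - sum(t * t for t in v), 1e-12)
            return expected_improvement(mu, math.sqrt(var), best)

        nxt = max(remaining, key=score)  # most promising untested size
        xs.append(nxt)
        ys.append(objective(nxt))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best], list(zip(xs, ys))

if __name__ == "__main__":
    n, t, history = bayes_opt(list(range(1, 33)), run_job, [1, 16, 32], 8)
    print(f"best size: {n} VMs, {t:.3f} h after {len(history)} runs")
```

With three initial runs and eight guided ones, the search evaluates only 11 of the 32 candidate sizes. A realistic study would likely add dimensions such as the instance type and optimize cost under a deadline rather than raw execution time.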

References

  1. E. Brochu, V. M. Cora, N. de Freitas. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. arXiv:1012.2599, 2010.
  2. J. Snoek, H. Larochelle, R. P. Adams. Practical Bayesian Optimization of Machine Learning Algorithms. NIPS 2012.
  3. S. Venkataraman, Z. Yang, M. Franklin, B. Recht, I. Stoica. Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics. NSDI 2016 Proceedings.
  4. O. Alipourfard, H. H. Liu, J. Chen, S. Venkataraman, M. Yu, M. Zhang. CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics. NSDI 2017 Proceedings.

Machine Learning Techniques to Model the Performance of Data-Intensive and Deep Learning Applications

Big Data is becoming increasingly important, and many sectors of our economy are now guided by data-driven decision processes. Spark is becoming the reference framework, while at the infrastructure layer, cloud computing provides flexible and cost-effective solutions for allocating large clusters on demand, often based on GPGPUs. Using such resources efficiently requires performance models of these systems that are both accurate and efficient to evaluate.

A common way to model the performance of ICT systems relies on analytical models such as queueing networks or Petri nets. Although these models predict performance with high accuracy, their significant computational complexity limits their use. Machine learning techniques can address this problem by yielding models that are both accurate and scalable.
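As an illustration of the analytical approach, the sketch below implements exact Mean Value Analysis (MVA) for a closed, single-class, product-form queueing network. It is a textbook algorithm, not a model taken from this proposal, and the function name and station demands are hypothetical. The single-class recursion is cheap, but exact multi-class MVA must visit every combination of per-class populations, so its cost grows with the product of (N_c + 1) over the classes; this is one concrete source of the complexity that limits analytical models.

```python
def mva(demands, population, think_time=0.0):
    """Exact MVA for a closed, single-class network of load-independent
    queueing stations. demands[k] is the service demand D_k (seconds) of
    one job at station k; returns (throughput, response time)."""
    queue = [0.0] * len(demands)
    throughput = 0.0
    response = 0.0
    for n in range(1, population + 1):
        # Arrival theorem: an arriving job sees the mean queue lengths
        # of the same network with n - 1 jobs.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        # Little's law applied to each station.
        queue = [throughput * r for r in residence]
    return throughput, response
```

For example, mva([1.0, 2.0], 2) gives a throughput of 3/7 jobs per second and a response time of 14/3 seconds; as the population grows, the throughput approaches the bottleneck bound 1/D_max = 0.5.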

This thesis involves developing and validating performance models for Big Data clusters based on Spark, or for GPGPU-based clusters supporting deep learning training. The research work will compare multiple machine learning algorithms, such as Support Vector Regression, linear regression, Random Forests, and neural networks, and will develop feature engineering solutions to identify compact and, possibly, interpretable models that predict the performance of large clusters.
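As an example of feature engineering, the Ernest model (reference 1 in the list below) predicts execution time as a linear combination of a few features derived from the input scale and the number of machines: a constant, scale/machines, log(machines), and machines. The minimal pure-Python sketch below fits such a model; note that Ernest uses non-negative least squares, whereas this sketch swaps in ordinary least squares for brevity, and the function names and data are hypothetical.

```python
import math

def ernest_features(scale, machines):
    # Ernest-style features: fixed cost, parallelisable work,
    # tree-like communication (log), and per-machine overhead.
    return [1.0, scale / machines, math.log(machines), float(machines)]

def solve(a, b):
    # Gaussian elimination with partial pivoting for a small square system.
    n = len(a)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def fit_least_squares(rows, times):
    # Ordinary least squares via the normal equations (X^T X) w = X^T y.
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(k)]
    return solve(xtx, xty)

def predict(weights, scale, machines):
    return sum(w * f for w, f in zip(weights, ernest_features(scale, machines)))
```

The fitted weights can then extrapolate to cluster sizes that were never profiled; replacing the linear regressor with SVR, Random Forests, or a neural network on the same features is one way to compare algorithms on equal terms.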

References

  1. S. Venkataraman, Z. Yang, M. Franklin, B. Recht, I. Stoica. Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics. NSDI 2016 Proceedings.
  2. N. Yigitbasi, T. L. Willke, G. Liao, D. Epema. Towards Machine Learning-Based Auto-tuning of MapReduce. MASCOTS 2013, 11-20.
  3. A. D. Popescu, A. Balmin, V. Ercegovac, A. Ailamaki. PREDIcT: Towards Predicting the Runtime of Large Scale Iterative Analytics. VLDB 2013, 1678–1689.
  4. E. Ataie, E. Gianniti, D. Ardagna, A. Movaghar. A Combined Analytical Modeling Machine Learning Approach for Performance Prediction of MapReduce Jobs in Cloud Environment. SYNASC 2016: 431-439.

Robust Games for the Run-time Management of Cloud Systems

Cloud Computing aims at streamlining the on-demand provisioning of software, hardware, and data as services, providing end users with flexible and scalable services accessible through the Internet. As the Cloud offering becomes wider and more attractive to business owners, developing efficient resource provisioning policies for Cloud-based services is increasingly challenging. Indeed, modern Cloud services operate in an open and dynamic world characterized by continuous change, where different economic agents interact strategically.

This thesis aims to study the run-time service provisioning and capacity allocation problem by formulating a mathematical model based on a noncooperative game-theoretic approach. We take the perspective of Software as a Service (SaaS) providers that want to minimize the costs of the virtual machine/container instances allocated in a multi-IaaS (Infrastructure as a Service) scenario, while avoiding penalties for request execution failures and providing quality of service guarantees. SaaS providers compete and bid for the use of infrastructural resources, while the IaaS providers want to maximize the revenue obtained by providing the underlying resources. The thesis will also address the uncertainty in workload prediction and in the estimation of resource demands, which leads to a “robust” game.
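One way to make such constraints “robust” is the budget-of-uncertainty approach of Bertsimas and Sim (reference 1 in the list below): each uncertain coefficient a_j lies in [ā_j − â_j, ā_j + â_j], and a constraint is protected against at most Γ coefficients deviating at the same time (one of them possibly only by the fraction Γ − ⌊Γ⌋). For a fixed allocation, the worst case reduces to sorting the deviations, as in this minimal sketch. The capacity-check framing, the function names, and the parameters are hypothetical illustrations, not the model the thesis will develop.

```python
import math

def protection(deviations, gamma):
    """Bertsimas-Sim protection term for one constraint, given a fixed x.

    deviations[j] = hat_a_j * |x_j| is the extra demand if coefficient j
    deviates fully. At most floor(gamma) coefficients deviate fully and one
    more deviates by the fraction gamma - floor(gamma); the worst case is
    obtained greedily by taking the largest deviations first."""
    ordered = sorted(deviations, reverse=True)
    full = min(int(math.floor(gamma)), len(ordered))
    extra = sum(ordered[:full])
    if full < len(ordered):
        extra += (gamma - math.floor(gamma)) * ordered[full]
    return extra

def robust_capacity_ok(nominal, deviations, x, gamma, capacity):
    # Hypothetical SaaS capacity check: nominal[j] is the estimated demand
    # per request of class j, x[j] the request rate of class j. The
    # constraint sum_j a_j * x_j <= capacity must hold for every
    # realisation allowed by the uncertainty budget gamma.
    load = sum(a * xi for a, xi in zip(nominal, x))
    devs = [d * abs(xi) for d, xi in zip(deviations, x)]
    return load + protection(devs, gamma) <= capacity
```

For instance, with deviations 5, 3, and 2 and Γ = 1.5, the protection term is 5 + 0.5 · 3 = 6.5. Γ = 0 recovers the nominal constraint, while Γ equal to the number of uncertain coefficients recovers the fully worst-case one; intermediate values trade robustness against cost, which is the “price of robustness”.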

References

  1. D. Bertsimas, M. Sim. The price of robustness. Operations Research, 52(1):35–53, 2004.
  2. D. Ardagna, B. Panicucci, M. Passacantando. A Game Theoretic Formulation of the Service Provisioning Problem in Cloud Systems. WWW 2011, 177-186.
  3. D. Ardagna, M. Ciavotta, M. Passacantando. Generalized Nash Equilibria for the Service Provisioning Problem in Multi-Cloud Systems. IEEE Trans. Services Computing 10(3): 381-395, 2017.

Hierarchical Resource Management of Very Large Cloud Platforms

Worldwide interest in the delivery of computing and storage capacity as a service continues to grow at a rapid pace. Thanks to the development of virtualized and container-based systems and microservice architectures, cloud platforms are becoming more and more flexible, but their complexity requires advanced resource management solutions capable of dynamically adapting the underlying infrastructure while providing continuous service and performance guarantees.

Cloud systems keep growing in size: today, cloud service centers include up to 10,000 servers, and each server hosts several VMs and possibly even more containers. In this context, centralized solutions suffer from critical design limitations, including lack of scalability and the high communication cost of monitoring, and cannot provide fast and effective control.

The goal of this thesis is to devise resource allocation policies for virtualized and container-based environments that satisfy performance and availability guarantees while minimizing the operating costs (e.g., energy) of very large cloud service centers. The work will develop a scalable, distributed, hierarchical framework based on mixed-integer nonlinear optimization acting at multiple timescales.

References

  1. T. Nowicki, M. S. Squillante, C. W. Wu. Fundamentals of Dynamic Decentralized Optimization in Autonomic Computing Systems. Self-Star Properties in Complex Information Systems, 204-218, Springer-Verlag, 2005.
  2. B. Addis, D. Ardagna, B. Panicucci, M. S. Squillante, L. Zhang. A Hierarchical Approach for the Resource Management of Very Large Cloud Platforms. IEEE Trans. Dependable Sec. Comput. 10(5): 253-272, 2013.
  3. M. Sedaghat, F. Hernandez-Rodriguez, E. Elmroth. Decentralized Cloud Datacenter Reconsolidation Through Emergent and Topology-Aware Behavior. Future Generation Computer Systems, 56: 51–63, 2016.
  4. F. Farahnakian, T. Pahikkala, P. Liljeberg, J. Plosila, H. Tenhunen. Hierarchical VM Management Architecture for Cloud Data Centers. CloudCom 2014, 306–311.

Some Previous Thesis Works