This section of the Web site provides thesis and project proposals for students. The main topics are Cloud computing, Edge computing, Deep Learning applications, GPGPU systems performance evaluation, and Big Data cluster management.

If you are interested in one of the proposals, contact me by e-mail.

Optimal Component Placement and Runtime Management of Artificial Intelligence Applications in Edge Systems

Edge computing is a distributed computing paradigm that brings computation and data storage closer to the location where they are needed, improving execution times and saving bandwidth. Nowadays, edge computing systems include remote cloud servers, edge servers, and sensors (which recently also provide some limited computing capabilities).

Artificial Intelligence (AI) is becoming pervasive today, with the worldwide AI software platform market forecast to grow significantly through 2023, approaching USD 11.8 billion in revenue at a compound annual growth rate of 35.3%. Many of the benefits of this evolution will come from using edge computing resources. Many companies are evaluating the use of edge computing for data collection, processing, and online analytics to reduce application latency and data transfers. A growing number of use cases, e.g., predictive maintenance, machine vision, and healthcare to name a few, can benefit from AI applications spanning edge-to-cloud infrastructures. Edge intelligence, i.e., edge-based inferencing, will become the foundation of all industrial AI applications, while most new applications will involve some AI components at various levels of the edge-to-cloud infrastructure.

While the main advantage of edge systems is improved application performance thanks to reduced latency, edge resources usually have less computing capacity than the cloud and can become a bottleneck in the computation. Moreover, the workload can fluctuate at runtime, because a different number of users can connect to the system or different data volumes can be generated at different times. Therefore, the assignment of application components to resources should change so as to guarantee Quality of Service (QoS) constraints. QoS constraints usually include response time constraints predicating on application component execution times (e.g., a frame of a complex image-processing application needs to be processed in less than 100 ms) or application throughput (e.g., 40 frames per second need to be processed to identify security violations in a video surveillance system).

The goal of this project is to develop a fast heuristic method for optimizing the component placement of AI applications running in edge-to-cloud infrastructures and managing the edge system at runtime to cope with workload variations.
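To give a flavour of the kind of heuristic the project targets, the sketch below greedily assigns each component to the lowest-latency resource with enough spare capacity. All names, demands, capacities, and latencies are illustrative, not taken from a real system.

```python
# Hypothetical sketch of a greedy placement heuristic; numbers are illustrative.

def greedy_placement(components, resources):
    """Assign each component to the lowest-latency resource that
    still has enough spare capacity (first-fit by latency)."""
    # Try candidate resources in order of increasing latency.
    ordered = sorted(resources, key=lambda r: r["latency_ms"])
    spare = {r["name"]: r["capacity"] for r in resources}
    placement = {}
    for comp in components:
        for res in ordered:
            if spare[res["name"]] >= comp["demand"]:
                spare[res["name"]] -= comp["demand"]
                placement[comp["name"]] = res["name"]
                break
        else:
            raise ValueError(f"no feasible resource for {comp['name']}")
    return placement

components = [{"name": "detector", "demand": 4},
              {"name": "tracker", "demand": 2},
              {"name": "aggregator", "demand": 6}]
resources = [{"name": "edge", "capacity": 5, "latency_ms": 10},
             {"name": "cloud", "capacity": 100, "latency_ms": 80}]

print(greedy_placement(components, resources))
```

A real solution would also have to re-run (or incrementally repair) the placement when the workload fluctuates, and check the resulting response times against the QoS constraints.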


  1. Y.-D. Lin, Y.-C. Lai, J.-X. Huang, H.-T. Chien, “Three-Tier Capacity and Traffic Allocation for Core, Edges, and Devices for Mobile Edge Computing,” IEEE Transactions on Network and Service Management, vol. 15, no. 3, pp. 923-933, 2018.
  2. J. Du, L. Zhao, J. Feng, X. Chu, “Computation Offloading and Resource Allocation in Mixed Fog/Cloud Computing Systems With Min-Max Fairness Guarantee,” IEEE Transactions on Communications, vol. 66, no. 4, pp. 1594-1608, 2018.
  3. E. Balevi, R. D. Gitlin, “Optimizing the Number of Fog Nodes for Cloud-Fog-Thing Networks,” IEEE Access, vol. 6, pp. 11173-11183, 2018.

Optimal Partitioning of Deep Neural Networks and Artificial Intelligence Models

Advisors: Prof. Matteo Matteucci, Prof. Danilo Ardagna

Deep Learning is a subset of Machine Learning methods based on artificial neural networks that attains great power and flexibility and frequently achieves human-level accuracy in many tasks (e.g., object recognition, natural language processing, medical diagnosis, etc.). However, deep learning models are usually complex networks including many layers, which incur heavy computation even on very powerful resources (e.g., servers with many cores, multi-GPU systems, or specialized hardware accelerators).

The computing requirements of deep learning models have historically been satisfied by cloud systems but, depending on the target application, edge computing technology can be used to comply with low latency and bandwidth requirements by bringing computation and data storage closer to where they are needed, on resources ranging from remote cloud servers to edge servers and sensors with limited computing capabilities.

However, edge resources usually have less computing capacity than the cloud. Therefore, running a deep learning model on edge devices (which usually have limited memory and processing power) is a big challenge. Recent studies [1]-[5] have shown that partitioning the deep learning layers between edge and cloud can be a good solution to this problem, reducing latency and increasing energy efficiency and data privacy for end users.

The goal of this research is to develop a heuristic method for optimally splitting the layers of a deep learning model across edge-to-cloud infrastructures, fulfilling constraints such as bandwidth requirements or resource capacity (e.g., memory limits of the edge devices/sensors) while also providing guarantees on the inference latency of the deep model.
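For a single-split partitioning in the style of Neurosurgeon [5], the search space is small enough to enumerate: run the first layers on the edge, ship the activation at the split point, and run the remaining layers in the cloud. The per-layer timings and transfer costs below are made-up numbers for illustration only.

```python
# Illustrative single-split search; layer timings and transfer sizes are fictional.

def best_split(edge_ms, cloud_ms, transfer_ms):
    """Return (split, latency): layers [0, split) run on the edge,
    layers [split, n) in the cloud; transfer_ms[s] is the cost of
    shipping the activation at split s (transfer_ms[0] = raw input
    upload, transfer_ms[n] = final result, typically tiny)."""
    n = len(edge_ms)
    best = None
    for s in range(n + 1):
        latency = sum(edge_ms[:s]) + transfer_ms[s] + sum(cloud_ms[s:])
        if best is None or latency < best[1]:
            best = (s, latency)
    return best

edge_ms = [5.0, 8.0, 12.0, 20.0]           # per-layer time on the edge device
cloud_ms = [1.0, 1.5, 2.0, 3.0]            # per-layer time on the cloud server
transfer_ms = [50.0, 30.0, 6.0, 4.0, 1.0]  # activation upload cost per split point

split, latency = best_split(edge_ms, cloud_ms, transfer_ms)
```

With these numbers the optimum keeps the first two layers at the edge (split = 2): convolutional layers shrink the activation enough that shipping it mid-network beats uploading the raw input. A full solution would add memory and bandwidth constraints on top of this latency objective.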


  1. D. Liu, X. Chen, Z. Zhou, Q. Ling, “HierTrain: Fast Hierarchical Edge AI Learning with Hybrid Parallelism in Mobile-Edge-Cloud Computing,” IEEE Open Journal of the Communications Society, vol. 1, pp. 634-645, 2020.
  2. E. Li, L. Zeng, Z. Zhou, X. Chen, “Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing,” IEEE Transactions on Wireless Communications, vol. 19, no. 1, pp. 447-457, 2020.
  3. A. E. Eshratifar, M. S. Abrishami, M. Pedram, “JointDNN: An Efficient Training and Inference Engine for Intelligent Mobile Cloud Computing Services,” IEEE Transactions on Mobile Computing, vol. 20, no. 2, pp. 565-576, 2021.
  4. S. Teerapittayanon, B. McDanel, H. T. Kung, “Distributed Deep Neural Networks Over the Cloud, the Edge and End Devices,” in IEEE 37th ICDCS, 2017.
  5. Y. Kang, J. Hauswald, C. Gao, A. Rovinski, T. Mudge, J. Mars, L. Tang, “Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge,” in ACM ASPLOS ’17, 2017.

Bayesian Optimization for Sizing Big Data and Deep Learning Applications Cloud Clusters

Advisors: Prof. Alessandra Guglielmi, Prof. Danilo Ardagna

Today data mining, along with big data analytics in general, is deeply changing our society, e.g., in the financial sector or in healthcare. Companies are becoming more and more aware of the benefits of data processing technologies; across almost every sector, most industries use or plan to use machine learning techniques.

In particular, deep learning methods are gaining momentum across various domains for tackling different problems, ranging from image recognition and classification to text processing and speech recognition.

Picking the right cloud cluster configuration for recurring big data/deep learning analytics is hard: there can be tens of possible virtual machine/GPU instance types and even more cluster sizes to pick from, and choosing poorly can lead to performance degradation and higher costs to run an application. Identifying the best configuration from such a broad spectrum of cloud alternatives is therefore challenging.

The goal of this thesis is to identify novel Bayesian Optimization methods to build performance models for various big data and deep learning applications based on Spark, currently the reference big data framework and the one expected to dominate the big data market in the next 5-10 years.

The aim of this research work is to build accurate machine learning models that estimate the performance of Spark applications (possibly running on GPU clusters) from only a few test runs on reference systems, and to identify optimal or close-to-optimal configurations. Bayesian methods will be combined with traditional performance modelling techniques, such as computer system simulations or bounding techniques.
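As a baseline for the "few test runs" idea, an Ernest-style model [3] fits a small parametric formula for the runtime as a function of cluster size and then extrapolates to larger clusters. The coefficients and timings below are synthetic and noise-free, purely to illustrate the mechanics.

```python
import numpy as np

# Ernest-style sketch: t(n) ~ b0 + b1/n + b2*log(n), n = number of machines.
# All runtimes below are synthetic; a real study would use measured runs.

def features(n):
    n = np.asarray(n, dtype=float)
    return np.column_stack([np.ones_like(n), 1.0 / n, np.log(n)])

# A few "test runs" on small clusters.
machines = np.array([2, 4, 8, 16])
runtimes = 10.0 + 400.0 / machines + 3.0 * np.log(machines)

# Fit the three coefficients by least squares.
theta, *_ = np.linalg.lstsq(features(machines), runtimes, rcond=None)

# Extrapolate to a larger cluster before paying for it.
predicted = features([32]) @ theta
```

A Bayesian Optimization layer would then use such a surrogate (with uncertainty estimates, e.g. from a Gaussian process) to decide which configuration to test next, trading off exploration against cost.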


  1. E. Brochu, V. M. Cora, N. de Freitas. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning.
  2. J. Snoek, H. Larochelle, R. P. Adams. Practical Bayesian Optimization of Machine Learning Algorithms.
  3. S. Venkataraman, Z. Yang, M. Franklin, B. Recht, I. Stoica. Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics. NSDI 2016 Proceedings.
  4. O. Alipourfard, H. H. Liu, J. Chen, S. Venkataraman, M. Yu, M. Zhang. CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics. NSDI 2017 Proceedings.

Machine Learning Techniques to Model Data-Intensive and Deep Learning Applications Performance

Nowadays, Big Data is becoming more and more important: many sectors of our economy are now guided by data-driven decision processes. Spark is becoming the reference framework, while at the infrastructure layer cloud computing provides flexible and cost-effective solutions for allocating large clusters on demand, often based on GPGPUs. To use such resources efficiently, a performance model of these systems that is at the same time accurate and efficient to use is required.

One common way to model the performance of ICT systems makes use of analytical models such as queueing networks or Petri nets. However, despite their great accuracy in performance prediction, their significant computational complexity limits their usage. Machine learning techniques can solve this problem, yielding models that are accurate and scalable at the same time.

This thesis involves the development and validation of performance models for Big Data clusters based on Spark, or based on GPGPUs to support the training of deep learning applications. The research work will compare multiple machine learning algorithms, such as Support Vector Regression, Linear Regression, Random Forests, Neural Networks, and XGBoost, and will develop feature engineering solutions to identify compact and, possibly, interpretable models to predict the performance of large clusters.
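The impact of feature engineering can be seen even on a toy example: a linear model in the raw number of cores extrapolates poorly, while a model with the engineered 1/cores feature captures the parallel fraction of the job. The data below is synthetic (an Amdahl-like curve), so the comparison is illustrative only.

```python
import numpy as np

# Toy comparison of raw vs engineered features for runtime prediction.
# Runtimes are synthetic: runtime = serial part + parallel part / cores.

def fit_predict(X_train, y_train, X_test):
    theta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return X_test @ theta

cores = np.array([4, 8, 16, 32], dtype=float)
runtime = 20.0 + 960.0 / cores               # training measurements

plain = np.column_stack([np.ones_like(cores), cores])        # [1, cores]
engineered = np.column_stack([np.ones_like(cores), 1.0 / cores])  # [1, 1/cores]

test_cores = np.array([64.0])                # extrapolate to a bigger cluster
plain_pred = fit_predict(plain, runtime, np.column_stack([[1.0], test_cores]))
eng_pred = fit_predict(engineered, runtime, np.column_stack([[1.0], 1.0 / test_cores]))

true_runtime = 20.0 + 960.0 / 64.0           # = 35.0
```

The engineered model recovers the true runtime exactly on this noise-free data, while the raw linear model extrapolates to a nonsensical (negative) runtime, which is why interpretable, domain-informed features matter for these gray-box models.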


  1. A. Maros, F. Murai, A. P. Couto da Silva, J. M. Almeida, M. Lattuada, E. Gianniti, M. Hosseini, D. Ardagna. Machine Learning for Performance Prediction of Spark Cloud Applications. IEEE Cloud 2019 Proceedings. 99-106. Milan, Italy.
  2. E. Gianniti, L. Zhang, D. Ardagna. Performance Prediction of GPU-based Deep Learning Applications. Closer 2019  Proceedings.
  3. M. Lattuada, E. Gianniti, M. Hosseini, D. Ardagna, A. Maros, F. Murai, A. P. Couto da Silva, J. M. Almeida. Gray-Box Models for Performance Assessment of Spark Applications. Closer 2019 Proceedings.

Job Scheduling and Optimal Capacity Allocation Problems for Deep Learning Training Jobs with Stochastic Execution Times

The Deep Learning (DL) paradigm has gained remarkable popularity in the last few years. DL models are often used to tackle complex problems in fields such as image recognition and healthcare; however, the training of such models requires very large computational power. The recent adoption of GPUs as general-purpose parallel processors has partially fulfilled this need, but the high costs of this technology, even in the Cloud, dictate the need for efficient capacity planning and job scheduling algorithms that reduce operational costs via resource sharing. Starting from an already developed heuristic approach, based on random greedy and path relinking, for the joint capacity planning and job scheduling problem, the aim of this work is to extend the existing framework to a more general setting, exploring the stochastic nature of the jobs’ expected execution times, which is due to the variability in the number of iterations needed to reach a target accuracy during training.
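When durations are stochastic, a natural first step is to schedule on expectations: summarize each job by its expected training time (mean iterations to target accuracy times seconds per iteration) and apply a classic list-scheduling rule. The sketch below uses longest-expected-processing-time-first on two GPUs; the job names and times are made up.

```python
import heapq

# Illustrative list scheduler under stochastic durations: each job is
# summarized by its *expected* execution time. All numbers are fictional.

def expected_makespan_schedule(jobs, n_gpus):
    """Longest-expected-processing-time-first: repeatedly give the
    longest remaining job to the GPU that frees up earliest.
    Returns (assignment, expected makespan)."""
    heap = [(0.0, g) for g in range(n_gpus)]   # (expected finish time, gpu id)
    heapq.heapify(heap)
    assignment = {}
    for name, exp_time in sorted(jobs.items(), key=lambda kv: -kv[1]):
        finish, gpu = heapq.heappop(heap)
        assignment[name] = gpu
        heapq.heappush(heap, (finish + exp_time, gpu))
    return assignment, max(t for t, _ in heap)

# Expected training hours (mean iterations x time per iteration).
jobs = {"resnet": 9.0, "bert": 6.0, "gan": 4.0, "lstm": 3.0}
assignment, makespan = expected_makespan_schedule(jobs, n_gpus=2)
```

The thesis would go beyond this expectation-based baseline, e.g. modelling the full distribution of iteration counts and hedging against long tails in the due-date and cost objectives.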


  1. Arezoo Jahani, Marco Lattuada, Michele Ciavotta, Danilo Ardagna, Edoardo Amaldi, Li Zhang: Optimizing on-demand GPUs in the Cloud for Deep Learning Applications Training. ICCCS 2019: 1-8
  2. Federica Filippini, Marco Lattuada, Michele Ciavotta, Arezoo Jahani, Danilo Ardagna, Edoardo Amaldi: Hierarchical Scheduling in on-demand GPU-as-a-Service Systems. SYNASC 2020: 125-132
  3. NVIDIA. The Challenge of Scaling to Meet the Demands of Modern AI and Deep learning.

AutoML++ Optimization of Deep Networks

Advisors: Prof. Matteo Matteucci, Prof. Danilo Ardagna

Cloud AutoML is a suite of machine learning products that enables developers with limited machine learning expertise to train high-quality models specific to their business needs, by leveraging Google’s state-of-the-art transfer learning, and Neural Architecture Search technology.

Deep neural networks form a powerful framework for machine learning and have achieved a remarkable performance in several areas in recent years. However, despite the compelling arguments for using neural networks as a general template for solving machine learning problems, training these models and designing the right network for a given task has been filled with many theoretical gaps and practical concerns.

To train a neural network, one needs to specify the parameters of a typically large network architecture with several layers and units, and then solve a difficult non-convex optimization problem. Moreover, if a network architecture is specified a priori and trained using back-propagation, the model will always have as many layers as specified a priori. Since not all machine learning problems admit the same level of difficulty, and different tasks naturally require varying levels of complexity, models trained with an insufficient number of layers can provide unsatisfactory accuracy. AutoML helps by automatically changing the network architecture and its parameters. The goal of this thesis is to: (i) compare and analyse available open-source AutoML toolkits, (ii) integrate one of these toolkits with the performance analysis tools developed at Politecnico di Milano, and (iii) provide a Bayesian optimization framework that extends AutoML toolkits to drive the search for the best deep (convolutional) neural network architecture while also providing execution time/budget guarantees (e.g., run 100,000 epochs in <8 h, cost <1K$).
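The budget-guarantee idea in goal (iii) can be sketched with a simple random search that discards candidate architectures whose predicted training time violates the budget. Both estimator functions below are toy stand-ins for a real AutoML toolkit's accuracy predictor and for the performance models mentioned above; a Bayesian optimizer would replace the random sampling.

```python
import random

# Hypothetical budget-aware architecture search. The accuracy and cost
# estimators are toy stand-ins, not models from any real toolkit.

def predicted_train_hours(arch):
    # Toy cost model: deeper/wider networks train longer.
    return 0.5 * arch["layers"] + 0.01 * arch["width"]

def predicted_accuracy(arch):
    # Toy accuracy model with diminishing returns in depth and width.
    return 1.0 - 1.0 / (arch["layers"] * arch["width"] ** 0.5)

def search(budget_hours, trials=200, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        arch = {"layers": rng.randint(2, 20),
                "width": rng.choice([64, 128, 256, 512])}
        if predicted_train_hours(arch) > budget_hours:
            continue  # discard architectures that violate the time budget
        score = predicted_accuracy(arch)
        if best is None or score > best[0]:
            best = (score, arch)
    return best

score, arch = search(budget_hours=8.0)
```

The interesting trade-off appears exactly here: the unconstrained optimum (deepest, widest network) is infeasible under the budget, so the search must balance predicted accuracy against predicted cost.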


  1. Google. Cloud AutoML (beta).
  2. AdaNet.
  3. Corinna Cortes, Xavi Gonzalvo, Vitaly Kuznetsov, Mehryar Mohri, Scott Yang. AdaNet: Adaptive Structural Learning of Artificial Neural Networks.
  4. Eugenio Gianniti, Li Zhang, Danilo Ardagna. Performance Prediction of GPU-based Deep Learning Applications. SBAC-PAD 2018 Proceedings. Lyon, France.

Robust Games for the Run-time Management of Cloud Systems

Cloud Computing aims at streamlining the on-demand provisioning of software, hardware, and data as services, providing end users with flexible and scalable services accessible through the Internet. Since the Cloud offer is currently becoming wider and more attractive to business owners, the development of efficient resource provisioning policies for Cloud-based services becomes increasingly challenging. Indeed, modern Cloud services operate in an open and dynamic world characterized by continuous changes, where strategic interaction among different economic agents takes place.

This thesis aims to study the run-time service provisioning and capacity allocation problem through the formulation of a mathematical model based on a noncooperative game-theoretic approach. We take the perspective of Software as a Service (SaaS) providers who want to minimize the costs associated with the virtual machine/container instances allocated in a multi-IaaS (Infrastructure as a Service) scenario, while avoiding penalties for request execution failures and providing quality of service guarantees. SaaS providers compete and bid for the use of infrastructural resources, while the IaaS providers want to maximize the revenues obtained by providing the underlying resources. The thesis will also focus on the uncertainty related to workload prediction and to the estimation of resource demands, leading to a “robust” game.
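The strategic interaction can be illustrated with a deliberately small toy game: two SaaS providers choose how many VMs to buy, the unit price grows with the total demand they put on the IaaS (congestion), and unserved requests incur a penalty. Iterating best responses until a fixed point gives a Nash equilibrium of this discrete game. All parameters are illustrative, and the real model (and its robust counterpart under demand uncertainty) is far richer.

```python
# Toy best-response dynamics for a two-player provisioning game.
# All prices, capacities, and demands are made-up numbers.

def cost(n_own, n_other, demand, capacity=10.0, base_price=1.0,
         congestion=0.02, penalty=5.0):
    """SaaS cost: VM price (growing with total load) plus a penalty
    for each unit of demand left unserved."""
    price = base_price + congestion * (n_own + n_other)
    unserved = max(0.0, demand - capacity * n_own)
    return price * n_own + penalty * unserved

def best_response(n_other, demand, choices=range(0, 31)):
    return min(choices, key=lambda n: cost(n, n_other, demand))

def best_response_dynamics(demands, iters=50):
    n = [0, 0]
    for _ in range(iters):
        new = [best_response(n[1], demands[0]),
               best_response(n[0], demands[1])]
        if new == n:
            break  # fixed point: a Nash equilibrium of the discrete game
        n = new
    return n

eq = best_response_dynamics(demands=[80.0, 120.0])
```

With these parameters each provider buys just enough VMs to serve its own demand, since the penalty dominates the congestion-inflated price; a robust variant would instead size against the worst case of an uncertain demand set.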


  1. D. Bertsimas, M. Sim. The price of robustness. Operations Research, 52(1):35–53, 2004.
  2. D. Ardagna, B. Panicucci, M. Passacantando. A Game Theoretic Formulation of the Service Provisioning Problem in Cloud Systems. WWW 2011, 177-186.
  3. D. Ardagna, M. Ciavotta, M. Passacantando. Generalized Nash Equilibria for the Service Provisioning Problem in Multi-Cloud Systems. IEEE Trans. Services Computing 10(3): 381-395, 2017.

Some Previous Thesis Works