Thesis

This section of the Web site provides theses and projects proposals for students. The main topics include Cloud computing, Edge computing, Deep Learning & AI based augmented reality applications, and GPU-based clusters scheduling.

If you are interested in one of the proposals, contact me by e-mail.

Runtime management and optimization of AI agentic workflows

Artificial intelligence chatbots, powered by generative AI, have transformed user interactions by generating responses based on single exchanges. These systems rely on natural language processing (NLP) to deliver accurate replies to user queries. However, the next evolution in AI—Agentic AI—promises to go beyond simple query-response mechanisms, employing advanced reasoning and iterative planning to autonomously solve complex problems and revolutionize productivity across industries.

Agentic AI is poised to be a transformative force in the global economy. According to market research, the Agentic AI market is projected to grow from $7.28 billion in 2025 to $41.32 billion by 2030, with a compound annual growth rate (CAGR) of 41.48% during this period. This exponential growth reflects increasing adoption across industries such as automotive, healthcare, and enterprise software. Gartner has identified Agentic AI as the top tech trend for 2025, predicting that by 2028, 33% of enterprise software applications will include Agentic AI capabilities, enabling autonomous decision-making for 15% of day-to-day work tasks.

The rapid expansion of Agentic AI underscores its potential to streamline operations, enhance customer experiences, and drive innovation.

The proposed thesis aims to develop a runtime management framework for Agentic AI systems using reinforcement learning (RL). The framework will dynamically select agents and tools while leveraging the principle of “test of time” to optimize application accuracy and ensure efficient resource consumption.

Technical Objectives

Dynamic Agent Selection: The framework will use RL algorithms to autonomously choose the most appropriate agents and tools for specific tasks based on real-time data inputs.
Resource-Conscious Optimization: By balancing computational demands with operational efficiency, the system will ensure sustainable resource utilization while maintaining high accuracy levels.
Feedback-Driven Adaptation: Incorporating continuous feedback loops will enable iterative improvement in decision-making processes and model performance.

With the Agentic AI market expanding rapidly and its applications becoming increasingly diverse, this thesis represents a timely opportunity to contribute cutting-edge innovations that address both technical challenges and industry needs.

References

NVIDIA. What is Agentic AI? https://blogs.nvidia.com/blog/what-is-agentic-ai/
M. Purdy. What Is Agentic AI, and How Will It Change Work? Harward Business Review.

A Reinforcement Learning framework for the runtime management of next generation Smart Glasses (in Collaboration with Luxottica)

Virtual reality and augmented reality have gained popularity during the last decades.
Companies are pushing efforts towards making these technologies profitable for everyday life.
This is the reason why some companies are designing smart glasses that can be used in healthcare, industry, and entertainment. Those smart glasses are meant to run Artificial Intelligence (AI) applications that rely on Deep Neural Networks. Most of these applications have real-time constraints that must be considered while designing the smart glasses. Unfortunately, despite the advancement of chip design technology, many devices still have limited computational and energy capacity when running AI applications. One of the adopted solutions is the offloading of some computations to an edge device or to the cloud to alleviate the workload on the smart glasses, hence reduce the power consumption. The offloading comes at a price of data transfer latency, i.e. when the smart glasses choose to offload parts of its computation it should send data to the other device using a network channel which introduces a latency that depends on the channel throughput.
As there are real-time constraints, we should ensure that the end-to-end execution time (computing time, data transfer time) on all the devices will remain below a specific value. In this case there should be a mechanism to switch among different DNN configurations considering the state of the networks, the current battery capacity of the smart glasses to ensure that the constraints are met.

This project aims at designing a Federated Reinforcement Learning (RL) framework for runtime management of the next generation smart glasses. A RL agent will choose at runtime which DNN configuration to run to minimize the energy consumption and the 5G connection cost while considering the real-time constraint (End-to-end execution time below the maximum admissible value). The framework should be DNN independent which means that it should support different DNNs with different configurations and partitioning points.

References

A. W. Kambale, H. Sedghani, F. Filippini, G. Verticale, D. Ardagna. Runtime Management of Artificial Intelligence Applications for Smart Eyewears. UCC 2023. 31-39, 2023.
Y. Kang et al., Neurosurgeon: Collaborative intelligence between the cloud and mobile edge.
Proc. 22nd Int. Conf. Archit. Support Program. Lang. Oper. Syst. (ASPLOS), pp. 615-629, 2017.
N. M. Kumar, N.K. Singh, and V. Peddiny. Wearable smart glass: Features, applications, current progress and challenges. In 2018 Second International Conference on Green Computing and Internet of Things (ICGCIoT), pages 577–582. IEEE, 2018.
H.-N. Wang et al., “Deep reinforcement learning: A survey,” Front. Inf. Technol. Electron. Eng.,vol. 21, no. 12, pp. 1726–1744, 2020.

Job Scheduling and Optimal Capacity Allocation Problems for Distributed Deep Learning Training Jobs on Multiple GPU-accelerated Nodes

The Deep Learning (DL) paradigm has gained remarkable popularity in the last few years. DL models are often used to tackle complex problems in the fields of, e.g., image recognition and healthcare; however, the training of such models requires a very large computational power. The recent adoption of GPUs as general-purpose parallel processors has partially fulfilled this need, but the high costs related to this technology, even in the Cloud, dictate the necessity of devising efficient capacity planning and job scheduling algorithms to reduce operational costs via resource sharing. Starting from an already developed heuristic approach, that allows to execute jobs on one or multiple GPUs located in a single node on premises or in the cloud, the aim of this work is to extend the existing framework to a more general setting, where DL applications are initially allocated locally and remote cloud execution is explored only when convenient to support cloud burst.

References

F. Filippini, J. Anselmi, D. Ardagna, B. Gaujal. A Stochastic Approach for Scheduling AI Training Jobs in GPU-based Systems. IEEE Transactions on Cloud Computing. 12(1): 53-69, 2024.
Filippini, M. Lattuada, M. Ciavotta, A. Jahani, D. Ardagna, E. Amaldi. A Path Relinking Method for the Joint Online Scheduling and Capacity Allocation of DL Training Workloads in GPU as a Service Systems. IEEE Transactions on Services Computing. 16(3). 1630-1646. 2023.

Bee Colony for Mixed-Variable Optimization in Edge-Cloud

Optimization challenges in edge-cloud computing systems often involve both continuous and discrete variables, requiring efficient and scalable solution strategies. Identifying the optimal configuration for such a complex infrastructure is challenging but necessary, as incorrect configurations can lead to increased computational, energy, and monetary costs for both providers and users.

In this thesis, we propose a Swarm Intelligence framework based on the Artificial Bee Colony (ABC) algorithm for mixed-variable optimization. ABC, inspired by the foraging behaviour of honeybee swarms, is a population-based metaheuristic used to solve numerical and combinatorial optimization problems.

Building upon our existing framework, which effectively handles both continuous and discrete variables, we aim to further enhance its capabilities for improved resource allocation, task scheduling, and performance optimization in edge-cloud systems.

References

Karaboga, D., & Akay, B. (2009). A comparative study of Artificial Bee Colony algorithm. Applied Mathematics and Computation, 214(1), 108-132. https://doi.org/10.1016/j.amc.2009.03.090

Some Previous Thesis Works

Benedetta Presicci. Training Reinforcement Learning agents for the computing continuum: the FIGARO framework. 2024
Andrea De Bettin, Fabio Lavezzo. FIGARO: reinForcement learnInG mAnagement acRoss computing cOntinua. Politecnico di Milano. 2023.
Roberto Sala. A Stackelberg Game approach for managing an AI application in Mobile Edge Cloud systems. Politecnico di Milano. 2023.
Luca Crippa. Profiling and partitioning of Deep Neural Networks on multiple devices. Politecnico di Milano. 2023.
Randeep Singh. SPACE4AI-R: Runtime Management Tool for AI Applications Com- ponent Placement and Resource Selection in Computing Continua. Politecnico di Milano. 2022.
Lorenzo Marchi. A stochastic approach for scheduling AI training jobs in GPU-based systems. Politecnico di Milano. 2022.
Enrico Galimberti. OSCAR-P: a framework for automating application profiling in the computing continuum. Politecnico di Milano. 2022.
Federica Filippini. Job Scheduling and Optimal Capacity Allocation Problems for Deep Learning Training Jobs. Politecnico di Milano. 2020.
Giacomo Bossi. Predictive Analysis of Deep Neural Networks in Cloud-Edge Computing Continuum. Politecnico di Milano. 2020.
Matteo Vantadori. Pareto-Optimal Progressive Neural Architecture Search. Politecnico di Milano. 2020.
Eugenio Gianniti. Performance Models, Design and Run Time Management of Big Data Applications. Politecnico di Milano. 2018.
Vahid Heidari. A Hybrid Machine Learning Approach for Big Data Performance Evaluation. Politecnico di Milano. 2018.
Jacopo Rigoli. A Design-time Optimization Framework for Private Cloud Big Data Systems. Politecnico di Milano. 2017.
Giovanni Paolo Gibilisco. A Methodology and a Tool for QoS-Oriented Design of Multi-Cloud Applications. Politecnico di Milano. 2016.
Eugenio Gianniti. Game Theory Models for MapReduce: Joint Admission Control and Capacity Allocation. Politecnico di Milano. 2015.

Some Previous Thesis Works

Current Projects

Previous Projects