Vacancy Edu

15 PhD Degree-Fully Funded at Inria, France

Inria, France invites online applications for a number of fully funded PhD positions in various departments. Below is a list of fully funded PhD programs available at the National Institute for Research in Computer Science and Automation (Inria), France.

Eligible candidates are encouraged to apply as soon as possible.

 

(01) PhD Degree – Fully Funded

PhD position summary/title: PhD Position F/M Towards discovering information from very heterogeneous data sources in a “data lake” environment

Context: Heterogeneous Data Lakes

Exploiting datasets requires identifying what each dataset contains and what it is about. Users with Information Technology skills may do this using dataset schemata, documentation, or by querying the data. In contrast, non-technical users (NTUs, for short) are limited in their capacity to discover interesting datasets. This hinders their ability to develop useful or even critical applications. The problem is compounded by the large number of datasets which NTUs may be facing, in particular when value lies in exploiting many datasets together, as opposed to one or a few at a time, and when datasets are of different data models, e.g., tables or relational databases, CSV files, hierarchical formats such as XML and JSON, PDF or text documents, etc. Following our team’s experience in collaborating with French media journalists [1, 4] and an ongoing collaboration with the International Consortium of Investigative Journalists (ICIJ), we will primarily draw inspiration from journalist NTU applications. These include several high-profile journalistic investigations based on massive, heterogeneous digital data, e.g., the Paradise Papers or Forever Pollutants. The setup we consider is: how to help NTUs identify useful parts of very large sets of heterogeneous datasets, and assemble and discover the information in these datasets. For example, faced with a corpus of thousands or tens of thousands of files (text, spreadsheets, etc.), a journalist may want to know: what subventions did the Region grant, and where geographically? Or: what shipping companies have shipped on routes towards Yemen, and who contracted with them? The tools we aim to develop also generalize beyond journalism, for instance to enterprise data lakes containing documents and various internal datasets, scientific repositories with reports and experimental results, etc.

State of the art

Many techniques and systems target one dataset (or database), of one data model.
NTUs are accustomed to working with documents, such as texts, PDFs, or structured documents in Office formats, over which Information Retrieval (IR) enables efficient keyword search. Large Language Models (LLMs, for short), and tools built on top of them, such as chatbots or Google’s NotebookLM, add unprecedented capacities to summarize and answer questions over documents provided as input. However, because of possible hallucinations [9, 15], LLM answers still require manual verification before use in a setting with real-world consequences. In particular, a recent study has shown a high error rate on the task of identifying the source of a news item, across 8 major chatbots [8]. LLMs are also not reliable information sources (i) for real-world facts that happened after their latest training input, or (ii) for little-known entities not in the training set, e.g., a small French company active in a given region. Finally, LLMs hosted outside of the user’s premises are not acceptable for users such as the ICIJ, for whom dataset confidentiality during an investigation is crucial; locally deployed models are preferable for confidentiality, and smaller (frugal) ones also reduce the computational footprint. While we consider that language models should not be taken as reliable sources of knowledge, they are crucial ingredients for matching (bridging) user questions with answer components from various datasets, thanks to the semantic embeddings we can compute for the questions and the data. Database systems allow users to inspect and use the data via queries. NTUs find these unwieldy, especially if multiple systems must be used for multiple data models. Natural language querying leverages trained language models to formulate structured database queries, typically SQL ones [10]. However, errors still persist in the translation, and SQL is not applicable beyond relational data.
Keyword search in databases returns sub-tree or sub-graph answers from a large data graph, which may model a relational database, an XML document, an RDF graph, etc., e.g., [2]. However, these techniques have not been scaled up to large sets of datasets.

Challenges

Dataset summarization and schema inference have been used to extract, from a given dataset, e.g., an XML, JSON, or Property Graph (PG) one, a suitable schema [3, 6, 11, 14], i.e., a technical specification that experts or systems can use to learn how the data is structured; each technique is specific to one data model only. Dataset abstraction [5] identifies, in a (semi-)structured dataset, nested entities and binary relationships (only). Generalizing it to large numbers of datasets, and to text-rich documents, is also a challenge. More recent data lakes hold very large sets (e.g., tens or hundreds of thousands) of datasets, each of which may be large [7]. In a data lake, one may search for a dataset using keywords or a question, or for a dataset which can be joined with another [13]. However, support for modeling, understanding, and exploring large, highly heterogeneous collections of datasets (other than tables) is still limited in data lakes.
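To make the embedding-based matching idea above concrete, here is a deliberately simplified sketch in Python: a toy bag-of-words vector stands in for a trained semantic embedding model, and the file names and snippets are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a semantic embedding: a bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_datasets(question, snippets):
    """Rank dataset snippets by similarity to the user's question."""
    q = embed(question)
    scores = [(name, cosine(q, embed(text))) for name, text in snippets.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical lake content: one textual snippet per dataset.
snippets = {
    "subventions.csv": "region subventions grants amounts city year",
    "shipping.json": "shipping companies routes yemen contracts vessels",
}
ranking = rank_datasets("what subventions did the region grant", snippets)
```

With a real embedding model the same ranking loop applies; only `embed` would change.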

Deadline : 2025-12-31

View details & Apply

 

(02) PhD Degree – Fully Funded

PhD position summary/title: PhD Position F/M LLM-Powered Continuous Evolution of Scientific Computing Software

The mission of this thesis revolves mainly around conducting research of excellence, in line with what the DiverSE team strives to achieve.
A state-of-the-art review will be one of the first activities, in order to better prepare the ground for the implementation of solutions and prototypes, as well as for conducting empirical experiments for a rigorous evaluation of the contributions.

Deadline : 2025-12-19

View details & Apply

 


 

(03) PhD Degree – Fully Funded

PhD position summary/title: PhD Position F/M Topology Design for Decentralized Federated Learning

The goal of this PhD is to propose algorithms for designing the communication topology of decentralized federated learning so as to minimize the total training duration, taking into account how connectivity affects both the number of rounds required and the duration of a single round.
Several settings will be considered: in particular, one may construct the topology in a pre-processing step (prior to learning), or dynamically while learning. Dynamic topology design can be a way to tackle online decentralized learning [asadi22,marfoq23], where the topology is adjusted and refined as clients collect more data.
The candidate will also investigate how to practically quantify the similarity of local data distributions during training in order to exploit the advantage of having a neighborhood representative of the average population distribution [lebars23,dandi22].
Finally, the candidate will also study to what extent the existing results can be extended to asymmetric communication links and to other distributed optimization algorithms, such as push-sum ones [kempe03,benezit10].
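As a back-of-the-envelope illustration of the trade-off described above (our own toy model, with purely hypothetical constants), total training time can be written as rounds times per-round duration: denser topologies mix information faster and need fewer rounds, but each round costs more communication.

```python
def total_training_time(num_rounds, degree, compute_s=1.0, per_link_s=0.2):
    """Toy cost model: each round costs local compute plus one exchange per neighbor."""
    return num_rounds * (compute_s + degree * per_link_s)

# Hypothetical illustration for n = 16 clients.
n = 16
ring = total_training_time(num_rounds=200, degree=2)         # slow mixing, cheap rounds
complete = total_training_time(num_rounds=50, degree=n - 1)  # fast mixing, costly rounds
```

Under these (made-up) constants the complete graph wins, but flipping the link cost or the round counts flips the conclusion, which is exactly why topology design is posed as an optimization problem.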

Deadline : 2025-12-13

View details & Apply

 

(04) PhD Degree – Fully Funded

PhD position summary/title: PhD Position F/M Trustworthy AI-driven interpretation of malware attack behaviours

This PhD thesis will be funded by the ANR PEPR project DefMal. The thesis will focus on three missions.

Mission 1: Self-Supervised Learning for Unsupervised Grouping of Malware Behaviors

Traditional malware clustering and family attribution heavily rely on labeled datasets, which are costly to produce, quickly become outdated, and often fail to capture the full spectrum of evolving threats. This mission focuses on leveraging self-supervised learning (SSL) to automatically discover meaningful behavioral patterns in unlabeled malware data—such as network traffic flows from botnets or dynamic execution traces from sandboxed malware—without requiring human-annotated labels.

From a Trustworthy AI perspective, this approach enhances reliability and scalability by reducing dependence on potentially biased or incomplete ground truth. By learning representations directly from raw or minimally processed behavioral data (e.g., system call sequences, API logs, or packet timing features), SSL models can uncover latent structures that reflect real-world attack tactics, such as lateral movement, persistence mechanisms, or command-and-control (C2) communication patterns. Crucially, the learned clusters must be interpretable—not just statistically coherent—so that analysts can understand why certain samples are grouped together. To achieve this, we will integrate contrastive learning frameworks with behavioral feature engineering and post-hoc explanation techniques, enabling human analysts to validate and refine the groupings. This mission thus lays the foundation for trustworthy, label-free malware intelligence that supports proactive threat hunting and early detection of emerging campaigns.
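As a highly simplified, label-free stand-in for the SSL clustering described above (not the project's actual method), one can already group sandbox traces by the similarity of their system-call bigrams; the trace names and calls below are hypothetical.

```python
def bigrams(trace):
    """System-call bigrams as a crude behavioral signature."""
    return {(a, b) for a, b in zip(trace, trace[1:])}

def jaccard(s1, s2):
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0

def group_traces(traces, threshold=0.5):
    """Greedy label-free grouping: attach each trace to the first similar group."""
    groups = []
    for name, trace in traces.items():
        sig = bigrams(trace)
        for group in groups:
            if jaccard(sig, group["sig"]) >= threshold:
                group["members"].append(name)
                group["sig"] |= sig
                break
        else:
            groups.append({"sig": set(sig), "members": [name]})
    return [g["members"] for g in groups]

# Hypothetical sandbox traces (sequences of system calls).
traces = {
    "sample_a": ["open", "read", "connect", "send"],
    "sample_b": ["open", "read", "connect", "send", "close"],
    "sample_c": ["fork", "ptrace", "write"],
}
clusters = group_traces(traces)
```

A learned contrastive representation would replace the hand-crafted bigram signature, but the grouping logic, and the need to explain why two samples land together, stays the same.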

Mission 2: Interpreting and Classifying Malware Sandbox Traces Using AI Models (e.g., Large Language Models)

Sandbox-generated execution traces—comprising sequences of system calls, file operations, registry changes, and network activity—are rich sources of behavioral insight. However, their high dimensionality, noise, and variability make automated analysis challenging. This mission explores the adaptation of Large Language Models (LLMs) and other sequence-based AI architectures (e.g., Transformers, LSTMs) to interpret, summarize, and classify these traces, with a focus on detecting zero-day or novel attack behaviors.

Rather than treating traces as mere input sequences, we will frame malware behavior interpretation as a semantic understanding task, where LLMs are fine-tuned to recognize patterns analogous to “attack narratives” (e.g., privilege escalation → credential dumping → C2 beaconing). By pre-training on vast corpora of benign and malicious execution logs and incorporating domain-specific knowledge (e.g., MITRE ATT&CK tactics), these models can generate natural language explanations of observed behaviors, significantly improving transparency and analyst trust.

A key challenge is ensuring that the model’s predictions are faithful and actionable. We will therefore integrate explainability-by-design mechanisms—such as attention visualization, salient path extraction, and counterfactual reasoning—to allow analysts to interrogate the model’s logic. For instance, if the model flags a trace as “likely ransomware,” it should highlight the specific file encryption loops and mutex creation patterns that led to this conclusion. This mission advances Trustworthy AI by bridging the gap between complex model outputs and human-understandable cybersecurity decision-making, enabling faster response to previously unseen threats.

Mission 3: Enhancing AI Model Robustness Against Intentionally Modified Malware Behaviors

Malware authors routinely employ evasion techniques—such as code obfuscation, control-flow manipulation, and polymorphism—to bypass signature-based and even AI-driven detection systems. This mission addresses the robustness pillar of Trustworthy AI by developing defense mechanisms that maintain high detection accuracy even when adversaries attempt to perturb static features (e.g., packed binaries) or dynamic behaviors (e.g., delayed payload execution, randomized API calls).

We will investigate adversarial training, behavioral invariant learning, and anomaly detection under distribution shift to build models that focus on core malicious intent rather than superficial, easily manipulated features. For example, while a malware sample may alter its entry point or API call order, its underlying objective—such as injecting code into a remote process—may leave consistent semantic footprints in sandbox traces. By modeling these high-level attack semantics, we aim to create AI systems that are resilient to both known and unknown evasion strategies.

Additionally, we will explore uncertainty quantification and confidence-aware prediction, so that the system can flag inputs where its decision is uncertain—potentially indicating adversarial tampering—thereby supporting human-in-the-loop validation. This mission ensures that the AI models are not only accurate under normal conditions but also reliable and defensible in adversarial settings, a critical requirement for deploying AI in real-world cybersecurity operations.
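A minimal sketch of the confidence-aware flagging idea, assuming a classifier that outputs logits: predictions whose normalized entropy is high are routed to a human analyst. The threshold and logits below are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def flag_for_review(logits, max_entropy_frac=0.5):
    """Flag a prediction when its normalized entropy exceeds the threshold."""
    probs = softmax(logits)
    h = entropy(probs) / math.log(len(probs))  # normalize to [0, 1]
    return h > max_entropy_frac

confident = flag_for_review([8.0, 0.5, 0.2])  # peaked distribution: not flagged
uncertain = flag_for_review([1.0, 0.9, 1.1])  # near-uniform: flagged for a human
```

More principled uncertainty estimates (ensembles, Bayesian approximations) would replace raw softmax entropy, but the human-in-the-loop routing decision has this shape.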

Deadline : 2025-12-09

View details & Apply

 

(05) PhD Degree – Fully Funded

PhD position summary/title: PhD Position F/M Filippov Solutions for Discontinuous Differential-Algebraic Equations (DAEs): Control and Simulation

Differential-algebraic equations (DAEs) arise naturally when modeling dynamical systems from first principles. In many cases, physical laws are expressed as combinations of differential and algebraic equations. This modeling approach is common in constrained mechanics, chemical and biological processes, power systems, and especially analog circuit design—where idealized components (e.g., resistors, capacitors, inductors) and Kirchhoff’s laws define the system dynamics. When these systems experience abrupt changes—such as switching in electric circuits, mechanical contacts, or discontinuous control inputs—discontinuous DAEs emerge. However, there is currently no comprehensive theoretical foundation for studying such systems. Challenges include:

  • Their hybrid behaviors, which differ significantly from ODE counterparts,
  • The inconsistent initialization problem caused by switching and algebraic constraints,
  • The occurrence of Dirac impulses due to state jumps.
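As a minimal illustration (our own example, not taken from the project description), consider a semi-explicit DAE whose algebraic equation switches with the sign of the state; Filippov's convexification replaces the discontinuity by a set-valued map:

```latex
% Semi-explicit DAE with a discontinuous algebraic constraint:
\dot{x} = -z, \qquad 0 = z - \operatorname{sign}(x).
% At x = 0 the constraint is discontinuous. Filippov's approach replaces
% sign(x) by the set-valued map
\operatorname{Sign}(x) =
\begin{cases}
\{1\}, & x > 0, \\
[-1,\, 1], & x = 0, \\
\{-1\}, & x < 0,
\end{cases}
% so that solutions are absolutely continuous functions of the resulting
% differential inclusion \dot{x} \in -\operatorname{Sign}(x): trajectories
% reach x = 0 in finite time and then slide, with z selected in [-1, 1].
```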

Deadline : 2025-12-06

View details & Apply

 


 

(06) PhD Degree – Fully Funded

PhD position summary/title: PhD Position F/M From stealthy AI audits to stealthy AI attacks: missing links

We will take classic black-box audit metrics and try to combine/expand them into more intrusive ones, in particular targeting other critical features of the attacked model. Here are two related scenarios: 1) a single run of a fingerprinting-oriented audit can assess whether the observed model is compliant; combining multiple such observations along the time dimension may lead to interesting attacks, as one might track the evolution of a model. 2) The efficient identification of remote models through stealthy fingerprinting will lead to more query-efficient strategies for attackers, perfectly tailored to the target model (e.g., accurately targeted adversarial examples), increasing the severity of attacks.
Means: building on the information hierarchy in T1 and this compositional attack effort, skills in AI security will be leveraged to come up with original attack configurations. These are expected to borrow stealthiness and economy-related aspects from our AI audit background.
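Scenario 1 above can be sketched as follows: fingerprint a remote model by hashing its outputs on a fixed probe set, then compare fingerprints over time to detect that the model evolved. The models and probes are hypothetical stand-ins for black-box query access.

```python
import hashlib

def fingerprint(model, probes):
    """Hash a model's outputs on a fixed probe set (black-box access only)."""
    outputs = "|".join(str(model(p)) for p in probes)
    return hashlib.sha256(outputs.encode()).hexdigest()

# Hypothetical remote models, observable only through their predictions.
model_v1 = lambda x: round(0.30 * x, 6)
model_v2 = lambda x: round(0.31 * x, 6)  # silently updated weights

probes = [0.0, 1.0, 2.5, -3.0]
fp_before = fingerprint(model_v1, probes)
fp_after = fingerprint(model_v2, probes)
changed = fp_before != fp_after  # the audit detects that the model evolved
```

Real fingerprinting audits use far subtler, stealthier probes; the point here is only that repeated observations over time turn a compliance check into a tracking primitive.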

Deadline : 2025-11-16

View details & Apply

 

(07) PhD Degree – Fully Funded

PhD position summary/title: PhD Position F/M Cost and Performance-Efficient Caching for Massively Distributed Systems

The ever-growing number of services and Internet of Things (IoT) devices has resulted in data being distributed across different locations (regions and countries). Additionally, data exhibits different usage patterns, including cold data (written once and never read), stream data (produced once and consumed by many), and hot data (written once and consumed by many). Furthermore, these data types have different performance and dependability requirements (e.g., low latency for data streams). 

Data caching is a widely used technique that improves application performance by storing data on high-speed devices close to end users. Most research on data caching has focused on the benefits of different data placement strategies (i.e., which data to place in the cache), data movement, cache partitioning, cache eviction [1, 2, 3, 4, 5, 6, 7, 8], and on realizing cost-efficient data redundancy techniques in caching systems [9]. However, few efforts have studied data management when caches are distributed across different platforms (Edge-to-Cloud), utilize heterogeneous storage devices (in terms of performance and cost), and serve multiple, diverse applications, including traditional data services, serverless workflows and data streaming. 
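Among the eviction policies mentioned above, least-recently-used (LRU) is the classic baseline; here is a minimal sketch using Python's OrderedDict (the keys and values are hypothetical).

```python
from collections import OrderedDict

class LRUCache:
    """Minimal least-recently-used cache: evicts the coldest entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used
cache.put("c", 3)  # capacity exceeded: "b" is evicted
```

The research questions of this thesis start where this sketch ends: what changes when the "cache" spans heterogeneous devices across the Edge-to-Cloud continuum and serves workloads with very different access patterns.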

Deadline : 2025-11-16

View details & Apply

 

(08) PhD Degree – Fully Funded

PhD position summary/title: PhD Position F/M Development of a Personalized Anatomical and Biomechanical Eye Model

Your mission will be to contribute to the development of advanced anatomical and biomechanical models of the human eye and its interaction with the head and neck, as part of the broader goal of improving personalized care for children affected by myopia. These models will form the foundation of a digital twin framework designed to support diagnosis, treatment planning, and the development of custom therapeutic lenses.

You will focus on building accurate 3D representations of the eye using clinical imaging data, such as MRI and OCT scans, and developing average eye models that can be customized based on patient-specific clinical measurements. These personalized models will capture both observable clinical features (e.g. axial length, refraction) and internal anatomical characteristics that are typically not accessible in routine practice. The integration of statistical shape modeling and biomechanical simulation will ensure anatomical realism and predictive capacity, enabling a detailed understanding of how the eye evolves under growth and various treatment options.

In parallel, you will participate in the development of a coupled biomechanical model of the eye-head-neck system. This model will simulate the ocular and postural behavior of children during everyday visual tasks, offering insights into how head and eye movements influence optical performance and the evolution of myopia. It will also serve to evaluate the optical and ergonomic effects of different lens designs during real-life activities, providing essential data to optimize the fit, comfort, and efficacy of therapeutic eyewear.

Ultimately, your work will support the generation of individualized eye models that guide the design of personalized myopia control lenses, based on a combination of anatomical, biomechanical, and behavioral inputs. These models will be integrated into an e-health decision-support tool facilitating real-world application of research outcomes in clinical and commercial settings.

Deadline : 2025-11-10

View details & Apply

 


 

(09) PhD Degree – Fully Funded

PhD position summary/title: PhD Position F/M Optimizing serverless computing in the edge-cloud continuum

Serverless computing, also known as function-as-a-service, improves upon cloud computing by enabling programmers to develop and scale their applications without worrying about infrastructure management [1, 2]. It involves breaking an application into small functions that can be executed and scaled automatically, offering applications high elasticity, cost efficiency, and easy deployment [3, 4].

Serverless computing is a key platform for building next-generation web services, which are typically realized by running distributed machine learning (ML) and deep learning (DL) applications. Indeed, 50% of AWS customers are now using serverless computing [5]. Significant efforts have focused on deploying and optimizing ML applications on homogeneous clouds: by enabling fast storage services to share data between stages [6], by solving the cold-start problem (launching an appropriate container to perform a given function) when scaling resources [7], by proposing lightweight runtimes to efficiently execute serverless workflows on GPUs [8], and by building simulations to evaluate resource allocation and task scheduling policies [9]. However, few efforts have focused on deploying serverless computing in the Edge-Cloud Continuum, where resources are heterogeneous and have limited compute and storage capacity [10], or have addressed the simultaneous deployment of multiple applications.
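The cold-start problem mentioned above can be illustrated with a toy latency model (the timings are purely hypothetical): the first invocation of a function pays container startup, after which the warm container serves subsequent calls.

```python
def request_latency(warm_pool, fn, cold_start_s=2.0, exec_s=0.1):
    """Toy model: a cold invocation pays container startup before executing."""
    if fn in warm_pool:
        return exec_s
    warm_pool.add(fn)  # the container stays warm for subsequent calls
    return cold_start_s + exec_s

pool = set()
first = request_latency(pool, "resize_image")   # cold: startup plus execution
second = request_latency(pool, "resize_image")  # warm: execution only
```

On constrained edge nodes, keeping containers warm consumes scarce memory, which is one reason cold-start mitigation is harder in the continuum than in a homogeneous cloud.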

Deadline : 2025-11-09

View details & Apply

 

(10) PhD Degree – Fully Funded

PhD position summary/title: Doctorant F/H Composability from numerical algorithms to programming model

NumPEx is a French priority research programme (PEPR) dedicated to exascale computing. Within this framework, the Exa-Soft: HPC software and tools project aims to consolidate the exascale software ecosystem by providing a coherent, exascale-ready software stack featuring breakthrough research advances enabled by multidisciplinary collaborations between researchers.

The proposed thesis is part of this project and will be carried out in collaboration between the Inria Avalon and Concace project-teams. Its objective will be to contribute to the productivity and performance portability offered by the software stack resulting from Exa-Soft: HPC software and tools.

Regular travel is planned, in particular between the Inria Lyon and Bordeaux centres. Travel expenses will of course be covered, within the limits of the applicable rates.

Deadline : 2025-11-09

View details & Apply

 


(11) PhD Degree – Fully Funded

PhD position summary/title: PhD Position F/M Multimodal prediction for Cardioembolic stroke: etiology and risk

Cardioembolic stroke is a particularly severe type of stroke, with high rates of early and long-term recurrence, and an incidence that increases with age. In developed countries, the absolute number of cardioembolic strokes has tripled during the past three decades, and projections anticipate it will triple again by 2050. Despite available pharmacological treatments (oral anticoagulation), their preventive administration remains limited due to a largely underestimated risk. Indeed, diagnostics currently rely almost exclusively on the detection of Atrial Fibrillation (AF), which critically lacks efficiency since AF is not the exclusive cause of cardioembolic stroke and it exhibits considerable diversity in associated risks. Recently, cardiac biomarkers, such as left atrial (LA) appendage shape and morphology, or Left Ventricular (LV) function, have emerged as valuable predictors of stroke risk. Such biomarkers can, for instance, be extracted by analyzing thoracic Computed Tomography scans (CT) and Electrocardiograms (ECG), for which advanced tools based on machine/deep learning have now increased accessibility.

In the context of the France 2030 RHU Talent project (Digital prediction and management of cardioembolic stroke risk), extensive multicentric retrospective data, including CT, ECG, and blood-derived biomarkers, have been collected. The clinical goal of this project is to develop a stroke risk and stroke etiology prediction model informed by such heterogeneous sources of information. To achieve this goal, the project will focus on developing innovative AI-based models tailored to the joint analysis of multimodal data. We propose to leverage recent advances on multimodal Bayesian learning and the emerging field of causal discovery, which has the great advantages of 1) allowing seamless inclusion of prior knowledge, 2) supporting the identification of actionable variables to go beyond correlation-informed predictions, and 3) improving model interpretability and explainability.

Deadline : 2025-11-09

View details & Apply

 


 

(12) PhD Degree – Fully Funded

PhD position summary/title: PhD Position F/M Cifre PhD – Mathematical and statistical modeling of cfDNA size fragments for diagnosis and prediction in oncology

Liquid biopsy has established itself as a powerful tool for the early detection of cancer and for diagnosis, prognosis and treatment monitoring in a wide range of cancer types [1]. The BIAbooster technology from Adelis allows precise quantification of the size of cell-free DNA fragments from plasma samples, establishing size distributions over a wide range of base pairs (part of the fragmentome). In the SChISM (Size Cfdna Immunotherapies Signature Monitoring, N = 334 patients) study led by Pr S. Salas (APHM and COMPO), we have demonstrated that features derived from these distributions were predictive of response to immunotherapy, as univariable biomarkers [2, 3] or integrated within multivariable machine learning predictive models.

One of the main interests of such non-invasive biomarkers lies in monitoring the response during treatment. We have developed a first mechanistic model of the longitudinal data collected during SChISM that demonstrated added value for the prediction of progression [4].

Deadline : Open until filled

View details & Apply

 

(13) PhD Degree – Fully Funded

PhD position summary/title: Doctorant F/H Study of the validity conditions of innovative new methodologies in oncology clinical trials: synthetic control arms

With the guidance of Agathe Guilloux and Sandrine Katsahian, the recruited candidate will:

  • conduct a review of the scientific literature and a synthesis of regulatory guidelines on synthetic control arms (SCAs)
  • implement existing algorithms and highlight their limitations on simulations and real-world datasets
  • propose new methodologies for the cases identified in the preceding experiments

Deadline : 2025-11-01

View details & Apply

 

(14) PhD Degree – Fully Funded

PhD position summary/title: PhD Position F/M Foundation of an HPC Composition Model

The particular issue that this PhD shall tackle is to provide a model to compose code developed according to various HPC programming paradigms. Software composition is an old [NATO] but still key technique [SZY] for developing complex applications by dividing them into manageable pieces of code. Though it is common in traditional sequential programming, composing parallel codes is still challenging. Previous work demonstrated that composition within a parallel programming model is possible and does not impact performance [CCA, HLCM, Comet]. However, we lack a model to compose distinct paradigms, not to mention the specific problem of runtime cohabitation in an HPC context.

A particular target of this work could be the Multi-Level Intermediate Representation (MLIR) framework [MLIR], aiming to harness MLIR not only as a unifying compilation target for heterogeneous input Domain-Specific Languages (DSLs) but also as a semantic backbone for expressing data flow and reasoning about scheduling.

Deadline : 2025-10-22

View details & Apply

 

(15) PhD Degree – Fully Funded

PhD position summary/title: PhD Position F/M PhD in numerical optimization for statistical machine learning

The successful candidate will

  • implement and compare different distributed numerical optimization paradigms, including distributed optimization with noisy communication channels,
  • accurately study the selected algorithms,
  • participate in the development and maintenance of the PEPit (https://pepit.readthedocs.io/) software package, including by adding functionalities related to distributed optimization.
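The noisy-communication setting of the first bullet can be illustrated with a toy building block of distributed optimization: gossip averaging on a ring, where every exchanged value is corrupted by Gaussian noise (all constants below are hypothetical).

```python
import random

def noisy_gossip_average(values, rounds=50, noise_std=0.01, seed=0):
    """Toy ring gossip: each node repeatedly averages with its neighbor
    over a noisy link, driving all nodes toward the global mean."""
    rng = random.Random(seed)
    x = list(values)
    n = len(x)
    for _ in range(rounds):
        # Each node receives its right neighbor's value through a noisy channel.
        received = [x[(i + 1) % n] + rng.gauss(0, noise_std) for i in range(n)]
        x = [(xi + ri) / 2 for xi, ri in zip(x, received)]
    return x

values = [0.0, 4.0, 8.0, 12.0]
result = noisy_gossip_average(values)
true_mean = sum(values) / len(values)  # the noiseless consensus target
```

Analyzing how channel noise perturbs the limit, and designing algorithms that remain accurate despite it, is the kind of question the first bullet points at.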

Deadline : 2025-10-18

View details & Apply

 

 

About The National Institute for Research in Computer Science and Automation (Inria), France – Official Website

The National Institute for Research in Computer Science and Automation (Inria) is a French national research institution focusing on computer science and applied mathematics. It was created under the name Institut de recherche en informatique et en automatique (IRIA) in 1967 at Rocquencourt near Paris, as part of Plan Calcul. Its first site was the historical premises of SHAPE (the central command of NATO military forces), which is still used as Inria’s main headquarters. In 1980, IRIA became INRIA. Since 2011, it has been styled Inria.

Inria is a Public Scientific and Technical Research Establishment (EPST) under the joint supervision of the French Ministry of National Education, Higher Education and Research and the Ministry of Economy, Finance and Industry.

 

 

Disclaimer: We try to ensure that the information we post on VacancyEdu.com is accurate. However, despite our best efforts, some of the content may contain errors. You can trust us, but please conduct your own checks too.

 
