Context

Title: Analyzing big data with Temporal Graphs and machine learning: application to urban traffic analysis and protein function annotation

Type: CNRS-Inria/FAPs,

Period: Mar. 2018 – Feb. 2021

In the TempoGraphs project, we will investigate and propose solutions for both urban traffic-related problems and protein annotation problems. In the case of urban traffic analysis, problems such as traffic speed prediction, travel time prediction, traffic congestion identification and nearest neighbors identification will be tackled. We intend to model the traffic network with a time- dependent graph. Such structure will be processed by machine learning algorithms, including regression, classification and clustering methods, to generate useful information. The usage of machine methods is due to its inherent capacity to deal with uncertainty. In the case of protein annotation problem, we aim to model the protein graph and/or protein–protein interaction (PPI) networks with time-dependent and dynamic graph representation. The resulting dynamic/temporal graph will be used as input for machine learning and label propagation methods to provide valuable annotation for the un-reviewed proteins.

The problems addressed in this project can be divided into two categories: data management and information extraction.

The first research topic is related to the challenges that arise given the amount and the nature of the data. We envisage the problem of storing, processing and accessing large scale graphs in an efficient way. In this context, many research questions arise when dealing with such large time-dependent graphs such as:

  • How to build a graph using spatiotemporal data or temporal traces in general.
  • How to interlink and enrich time-dependent graphs with semantic resources.
  • How to allow scalable query processing over a time-dependent graph.
  • How to organize and maintain a time-dependent graph in distributed architecture?

The second category deals with data processing methods for information extraction over time- dependent graphs. Given the particular data structure and the inherent uncertainty of the data, machine learning methods will be used to answer the following questions:

  • How to efficiently build a graph of proteins using several similarity measures between proteins
  • How to annotate protein sequences using dynamic/temporal graph representation
  • How to predict traffic flow under uncertainty
  • How to predict travel times under uncertainty
  • How to identify an incipient traffic congestion