Anomaly Detection
Anomaly detection is the process of finding outliers and unexpected structures in data. Strong outliers differ significantly from the rest of the data, whereas more subtle inconsistencies are often harder to identify. Such inconsistencies may stem from measurement errors, or they may indicate rare phenomena of interest. We investigate how topological data analysis can be used to identify anomalies in data.
One particular problem we are trying to solve is identifying emission sources of methane, one of the most important greenhouse gases. Despite the near-global availability of satellite observation data, identifying emission sources remains challenging, for two reasons. First, the spatial resolution of the current data is still very coarse. Second, the satellites measure only the concentration of the gases, not the emission rates. Our primary goal is therefore to infer emission rates from local concentration levels. To achieve this, we look for structures in the concentration field that can be used to locate the sources of pollution. The main challenges are the low resolution of the data, the high noise levels, and the strong influence of changing weather conditions on the concentration field.
To overcome these problems, we used a neural network that leverages topological information extracted from the data and its time-delay reconstruction. This network improved the accuracy of predictions on simulated data, but the method is not yet sufficiently stable with respect to the noise present in satellite measurements. The current pipeline incorporates the topological information via a convolutional neural network trained on persistence images. This approach requires some ad hoc choices and is rather crude. The next step is to fuse topological data analysis and machine learning with more refined methods, such as multi-parameter persistence or even more general techniques.
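For illustration, the two ingredients named above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the pipeline used in the project: the function names `delay_embed` and `persistence_image`, the parameters `dim`, `tau`, `resolution`, `sigma`, and `extent`, the synthetic signal, and the example diagram are all hypothetical. Computing the persistence diagram itself is omitted, since it would normally be delegated to a TDA library. The parameters of `persistence_image` are the kind of ad hoc choices mentioned above.

```python
import numpy as np

def delay_embed(series, dim, tau):
    """Map a 1-D series to points (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    x = np.asarray(series, dtype=float)
    n_points = x.size - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("series too short for this dim and tau")
    # Column j holds the series shifted by j * tau samples.
    return np.stack([x[j * tau : j * tau + n_points] for j in range(dim)], axis=1)

def persistence_image(diagram, resolution=20, sigma=0.1, extent=(0.0, 1.0)):
    """Rasterise (birth, death) pairs into a resolution x resolution image.

    Assumptions of this sketch: unnormalised Gaussian bumps, a weight equal
    to the persistence, and the same grid range on both axes.
    Rows index persistence, columns index birth.
    """
    d = np.asarray(diagram, dtype=float).reshape(-1, 2)
    birth = d[:, 0]
    pers = d[:, 1] - d[:, 0]
    centers = np.linspace(extent[0], extent[1], resolution)
    # One 1-D Gaussian per diagram point along each axis.
    g_birth = np.exp(-((centers[None, :] - birth[:, None]) ** 2) / (2 * sigma**2))
    g_pers = np.exp(-((centers[None, :] - pers[:, None]) ** 2) / (2 * sigma**2))
    # Sum over points of weight * (outer product of the two Gaussians).
    return np.einsum("n,ni,nj->ij", pers, g_pers, g_birth)

# Hypothetical stand-in for a concentration time series at one location.
rng = np.random.default_rng(0)
t = np.linspace(0, 8 * np.pi, 400)
signal = np.sin(t) + 0.05 * rng.normal(size=t.size)
cloud = delay_embed(signal, dim=3, tau=12)

# Hypothetical persistence diagram; in practice it would be computed
# from `cloud` or from the concentration field with a TDA library.
diagram = [(0.1, 0.9), (0.2, 0.3), (0.4, 0.45)]
image = persistence_image(diagram)
print(cloud.shape, image.shape)
```

The resulting array has a fixed size regardless of the number of points in the diagram, which is what makes it usable as input to a convolutional network. It also shows where the arbitrariness enters: the grid resolution, the bandwidth `sigma`, and the weighting function all have to be chosen by hand.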