Christopher Terence John Dodson
University of Manchester, UK
Title: Information geometry for data characterization and representation
Abstract
Information geometry provides a framework for treating families of probability density functions with Riemannian geometry, using a metric distance structure founded in information theory. It relies on the same concept of entropy that plays such an important role in thermodynamics and in signal transfer processes. Information geometry therefore has wide applications in physical and biological processes that exhibit stochastic properties. Moreover, the geometric structure can provide a background for representing datasets with statistically distributed features, and known analytic results have given us model structures that can represent departures from uniformity, randomness and independence. In many real situations, large datasets arise that contain features of the processes that generated the data, and these features can be used to characterize the underlying statistical processes. In such cases, information geometry can supply the requisite distance structure on the spaces involved, which may be of high dimension, thereby enabling proximity comparisons and the definition of neighbourhoods for sets of features of interest in data mining. A typical real situation is one in which the features of interest yield mixtures of multivariate Gaussian distributions, and we describe a method to handle such cases.
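The abstract does not say how the distance structure is computed in practice. As a minimal sketch of the idea of proximity comparison, assuming each feature is summarised by a univariate Gaussian fitted by maximum likelihood, the closed-form Fisher-Rao geodesic distance for the univariate normal family can rank features and pick out a neighbourhood. The function names `fisher_rao_normal`, `fit_normal` and `neighbourhood` and the radius parameter are hypothetical. The multivariate Gaussians and their mixtures mentioned in the abstract generally have no such closed form, so this is not the author's method.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def fisher_rao_normal(mu1: float, sigma1: float, mu2: float, sigma2: float) -> float:
    """Fisher-Rao geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    The Fisher metric on the univariate normal family is
    ds^2 = (dmu^2 + 2 dsigma^2) / sigma^2, which equals 2 times the Poincare
    half-plane metric in coordinates (mu / sqrt(2), sigma). The distance is
    therefore sqrt(2) times the hyperbolic distance in those coordinates.
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    num = (mu1 - mu2) ** 2 / 2.0 + (sigma1 - sigma2) ** 2
    arg = 1.0 + num / (2.0 * sigma1 * sigma2)
    # Clamp guards against tiny floating-point dips below 1 for near-identical inputs.
    return math.sqrt(2.0) * math.acosh(max(1.0, arg))


def fit_normal(samples: Sequence[float]) -> Tuple[float, float]:
    """Maximum-likelihood (mean, standard deviation) of a univariate sample.

    Hypothetical helper: pstdev is the population (MLE) standard deviation.
    """
    return statistics.fmean(samples), statistics.pstdev(samples)


def neighbourhood(
    reference: Tuple[float, float],
    features: Dict[str, Tuple[float, float]],
    radius: float,
) -> List[str]:
    """Names of features whose fitted Gaussian lies within `radius` of `reference`."""
    mu0, s0 = reference
    return sorted(
        name
        for name, (mu, s) in features.items()
        if fisher_rao_normal(mu0, s0, mu, s) <= radius
    )


if __name__ == "__main__":
    features = {"a": (0.0, 1.0), "b": (0.1, 1.1), "c": (5.0, 0.5)}
    print(neighbourhood((0.0, 1.0), features, 0.5))  # ['a', 'b']
```

Working on the parameter space rather than on raw samples means that features with very different sample sizes can be compared directly. Distances also grow logarithmically in the spread parameter, so a change from sigma = 1 to sigma = e costs the same as a change from e to e squared.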