Abstract: Much of the data currently of great interest is very high dimensional, in the sense that it comes equipped with a very large number of features or coordinates. In genomics, for example, the variables correspond to a large set of genes, often numbering in the thousands. When the number of variables is this high, our intuition about the data often fails us. One particular failure takes the form of false discoveries, in which spurious correlations appear. The good news is that, although the data resides in a high-dimensional space, we typically believe it is concentrated around a lower-dimensional subset. This subset is often non-linear, however, and in this talk we will discuss methods from topology that allow one to identify such lower-dimensional subsets, with examples and applications.