Statistical Theory & Methodology

Department faculty and students work on many topics in statistical theory and methodology. Consult individual faculty web pages for descriptions of their interests. The major categories of research done here at Carnegie Mellon include:

  • Algebraic statistics
  • Bayesian inference
  • Clustering
  • Functional data analysis
  • Graphical models
  • High-dimensional data analysis
  • Machine learning
  • Network models
  • Nonparametric methods
  • Point processes
  • Spectral methods and manifold learning
  • State-space models
  • Time series

Cross-disciplinary Research

Our Department of Statistics began in a Carnegie Mellon environment that stressed interdisciplinary research. As a result, we have established many long-term collaborations in a wide variety of areas. Each area is described below.

Astrophysics

Astrophysics is the study of stars, galaxies, and the large-scale structure of the universe. Using data from telescopes and satellites, astrophysicists study questions about the origin, evolution, and fate of the universe. In the last decade, there has been a deluge of valuable data, and statisticians play an important role in analyzing these data. Genovese and Wasserman are founding members of the International Computational Astrostatistics (INCA) group, a cross-disciplinary research team consisting of astrophysicists, statisticians, and computer scientists. Within the department, several faculty, post-docs, graduate students, and undergraduates are members of the group; other active members are drawn from other departments at Carnegie Mellon, the University of Pittsburgh, and several international institutions. The statistics department works closely with the McWilliams Center for Cosmology at Carnegie Mellon as well as with the Physics department at the University of Pittsburgh. Recent projects include analyzing the cosmic microwave background, estimating the dark energy equation of state, analyzing galaxy spectra, detecting galaxy clusters, identifying filaments, and estimating density functions with truncated data. A common theme in this work is the goal of detecting subtle, nonlinear signals in noisy, high-dimensional data. Our primary focus is on using state-of-the-art data and analytical methods to advance cosmology.

Bioinformatics

Bioinformatics is the name given to statistical and computational approaches used to glean understanding from large data sets in molecular biology. Recent developments in genomic and molecular research technologies, combined with developments in information technologies, have produced a tremendous amount of excitement in the research community. Major research efforts include genome-wide association studies, the study of copy number variation, gene finding, protein structure prediction, prediction of gene expression and protein-protein interactions, and the modeling of evolution. Statistics faculty interested in computational biology often work in collaboration with the Lane Center for Computational Biology or with faculty at the University of Pittsburgh Medical School.

Biostatistics

Faculty and students are involved in a number of collaborative research projects in the biomedical and behavioral sciences, including bioinformatics (see above) of inherited diseases, neuroscience (see below), and the design and analysis of clinical studies. An underlying theme in our work is the use of statistical methods to better understand heterogeneity in study populations. Traditional statistical models attempt to explain the mean response of each individual in the study. Frequently, however, this is not possible because key variables may not be directly observable. For example, genetic differences among patients may lead them to respond differently to a drug therapy. We are also interested in methods for generating and evaluating evidence about which treatments work for whom and under what conditions by combining information generated from different study designs, such as randomized controlled trials, epidemiological studies, and administrative databases. While we sometimes lead the fundamental scientific work, we often collaborate with investigators at the University of Pittsburgh Medical Center and with others worldwide.

Education, policy & social sciences

Statistical methods are now primary tools for the collection and analysis of data to inform education, policy, and the social sciences. From questionnaire development to the selection of probability samples to the design of social experiments, statisticians at Carnegie Mellon collaborate in the collection of social science data. Faculty and students regularly work with others to develop new methods for analyzing these data, and they apply up-to-date methods for drawing inferences from diverse social science data sources, ranging from large-scale sample surveys to social networks to educational experiments. A number of statistics graduate students work directly in joint programs bringing statistics to bear on problems in education and public policy.

Neuroscience

Cognitive neuroscience attempts to understand the great mystery of how the mind is created by the brain. The field is relatively young, yet it is among the fastest-growing of all intellectual disciplines, in large part due to enormous technological advances in data acquisition. Together with colleagues from the Center for the Neural Basis of Cognition, faculty and students have developed analytical techniques for neuroimaging, including functional magnetic resonance imaging (fMRI), which produces high-dimensional spatial time series data with complex structure. Another major thrust of our research concerns the firing patterns of individual and multiple neurons recorded from the brains of animals while they perform a task. One application is to brain-machine interfaces, where neural signals are used to guide a prosthetic robot arm.

Privacy & Cyber-security

Data privacy is a fundamental problem of the modern information infrastructure. Increasing volumes of personal and sensitive data are collected and archived by health networks, government agencies, search engines, social networking websites, and other organizations. The social benefits of analyzing these databases are significant. At the same time, the release of information from sensitive data repositories can be devastating to the privacy of individuals and organizations. The challenge is to discover and release analytically useful extracts of these databases without compromising the privacy of the individuals and organizations they describe. Together with colleagues and students in CyLab, the Heinz College, and the School of Computer Science, our research focuses on the trade-off between disclosure risk and utility associated with the release of statistical databases. We are also working towards understanding the practical potential of the developed techniques by applying them to social science data sets.