Fall 2011
Organizer: Srikesh Arunajadai
Department of Biostatistics, Columbia University
Location: 722 W 168 Street, 6th Floor, Room 656
Time: 4:00-5:00pm

September 22
Dr. Wenbin Lu
Associate Professor, Department of Statistics, North Carolina State University
Host: Dr. Yuanjia Wang

Variance Component Selection in Linear Mixed Models for Longitudinal Data



Abstract: The selection of random effects in linear mixed models is an important yet challenging problem in practice. We propose a robust and unified framework for automatically selecting random effects and estimating covariance components in linear mixed models. A moment-based loss function is first constructed for estimating the covariance matrix of random effects. Two types of shrinkage penalties, a hard-thresholding operator and a new sandwich-type soft-thresholding penalty, are then imposed for sparse estimation and random effects selection. Compared with existing approaches, the new procedure does not require any distributional assumption on the random effects and error terms. We establish the asymptotic properties of the resulting estimator in terms of its consistency for both random effects selection and variance component estimation. Optimization strategies are suggested to tackle the computational challenges involved in estimating the sparse variance-covariance matrix. Furthermore, we extend the procedure to incorporate the selection of fixed effects as well. Numerical studies show promising performance of the new approach in selecting both random and fixed effects and estimating model parameters. Finally, we apply the approach to a data set from the Amsterdam Growth and Health study.
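To make the selection step concrete, here is a minimal Python sketch of a hard-thresholding operator applied to a moment-based estimate of the random-effects covariance matrix. The function and the numbers are illustrative only; the sandwich-type soft-thresholding penalty and the actual loss function from the talk are not reproduced here.

```python
import numpy as np

def hard_threshold(D_hat, lam):
    """Hard-threshold a symmetric covariance estimate: entries with
    absolute value below lam are set to zero. A zeroed diagonal entry
    corresponds to excluding that random effect, so its whole row and
    column are dropped as well."""
    D = np.where(np.abs(D_hat) < lam, 0.0, D_hat)
    dropped = np.diag(D) == 0.0
    D[dropped, :] = 0.0
    D[:, dropped] = 0.0
    return D

# Hypothetical moment-based estimate: the third random effect has
# near-zero variance and should be selected out.
D_hat = np.array([[2.00, 0.30, 0.02],
                  [0.30, 1.50, 0.01],
                  [0.02, 0.01, 0.04]])
D_sel = hard_threshold(D_hat, lam=0.1)
selected = np.flatnonzero(np.diag(D_sel) > 0)  # retained random effects
```

In this toy example the thresholded estimate keeps the first two random effects and drops the third, which is the sense in which sparse covariance estimation performs random-effects selection.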

Bio: Dr. Wenbin Lu received his doctoral degree in Statistics from Columbia University in 2003. He then joined the Department of Statistics at NCSU. His research interests include survival analysis, semiparametric methods, high-dimensional data analysis, statistical methods for personalized treatments, and statistical genetics. His current research is partly supported by several NIH grants. He is an associate editor for Biostatistics and the Journal of Statistical Theory and Practice.



October 13
Dr. Rafael Irizarry
Professor, Department of Biostatistics, Johns Hopkins University
Host: Dr. Srikesh Arunajadai
Bump hunting in the Epigenome

Abstract: In this talk I will first give a brief introduction sharing my views on the potential leadership role that statisticians can play in the current measurement revolution. Then I will give a short tutorial on epigenetics and DNA methylation. Finally, I will describe new challenges for statisticians related to the existing field of bump hunting. I will describe the important role modern statistical techniques play in finding regions of the genome that are consistently different between disease and normal groups using both microarray and next-generation sequencing data. I will also describe the batch effect and how we deal with it.

Bio: Dr. Irizarry received his bachelor’s in mathematics in 1993 from the University of Puerto Rico and went on to receive a Ph.D. in statistics in 1998 from the University of California, Berkeley. His thesis work was on Statistical Models for Music Sound Signals. He joined the faculty of the Department of Biostatistics in the Bloomberg School of Public Health in 1998 and was promoted to Professor in 2007. For the past ten years, Dr. Irizarry’s work has focused on Genomics and Computational Biology problems. In particular, he has worked on the analysis and pre-processing of microarray, second-generation sequencing, and genomic data. He is currently interested in leveraging his knowledge in translational work, e.g. developing diagnostic tools and discovering biomarkers.



October 20
Dr. David Murray
Chair and Professor, Department of Epidemiology, Ohio State University
Host: Dr. Roger Vaughan
NOTE: This talk will be held at Hess Commons, 722 W 168 Street, 10th Floor

Design and Analysis of Group-Randomized and Individually Randomized Group-Treatment Trials


Abstract: Interventions often involve planned interactions among participants post randomization. Where such interaction occurs in pre-existing groups randomized to study conditions, we have a group-randomized trial. Where such interaction occurs in groups created for the study and following individual randomization, we have an individually randomized group-treatment trial. Such trials face design and analytic challenges not found in the more familiar randomized clinical trial. This presentation will review the design and analysis of both group-randomized and individually randomized group-treatment trials. It will also compare these designs to other designs that have been suggested as less expensive alternatives including fractional factorial designs, multiple baseline designs, time series designs, quasi-experimental designs, dynamic wait list or stepped wedge designs, and regression discontinuity designs. While there are limited circumstances in which one of these alternatives may be preferred, in general the group-randomized and individually randomized group-treatment trials remain the most efficient and rigorous comparative designs available to evaluate these interventions.
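The extra variance that motivates these designs comes from intraclass correlation within randomized groups. As background, the standard textbook design-effect calculation (not specific to this talk) can be sketched as:

```python
def design_effect(m, icc):
    """Variance inflation from randomizing intact groups of size m with
    intraclass correlation icc: DEFF = 1 + (m - 1) * icc."""
    return 1.0 + (m - 1) * icc

def effective_n(n_total, m, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_total / design_effect(m, icc)

# Even a small ICC inflates variance substantially when groups are large:
deff = design_effect(m=100, icc=0.01)            # 1.99: variance nearly doubles
n_eff = effective_n(n_total=2000, m=100, icc=0.01)
```

With 20 groups of 100 members and an ICC of only 0.01, the 2000 observations carry the information of roughly 1005 independent ones, which is why analyses that ignore the group randomization are anti-conservative.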

Bio: Dr. Murray has spent his career evaluating intervention programs designed to improve the public health. He has worked with all age groups, in a variety of settings, and with a variety of health behaviors and disease outcomes. In particular, Dr. Murray has focused on the design and analysis of group-randomized trials in which identifiable social groups are randomized to conditions and members of those groups are observed to assess the effect of an intervention. Dr. Murray wrote the first textbook on that material, published by Oxford University Press in 1998. He is actively involved in many of these trials, collaborating with colleagues around the country on their design, implementation and evaluation. He also conducts research to develop and test new methods for their analysis. Dr. Murray recently completed a two-year term as Chair of the Community-Level Health Promotion study section at NIH.





November 3
Dr. Jean Opsomer
Chair and Professor, Department of Statistics, Colorado State University
Host: Dr. Qixuan Chen

Analytic Inference for Data from Complex Surveys

Abstract: Surveys are an important source of data in many social, health and environmental sciences. When analyzing datasets that were obtained through surveys, failure to take the design aspects into account can lead to invalid estimation and inference, even when the mode of inference is model-based. We begin by describing some of the issues associated with informativeness in analytic inference, and examine the differences between model-based and design-based modes of inference. We then describe a number of new results and approaches in testing and adjusting for design effects, using a combination of parametric and nonparametric tools.
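The informativeness problem can be seen in a toy example: when inclusion probabilities are related to the outcome, the unweighted sample mean is biased, while the design-weighted (inverse-probability) estimator is not. A minimal sketch with made-up numbers:

```python
import numpy as np

def weighted_mean(y, w):
    """Design-based (Hajek) estimator: weight each respondent by the
    inverse of its inclusion probability."""
    y, w = np.asarray(y, float), np.asarray(w, float)
    return np.sum(w * y) / np.sum(w)

# Informative design: the high-outcome units were oversampled, so they
# carry small survey weights (inverse inclusion probabilities).
y = [10.0, 12.0, 30.0, 32.0]
w = [4.0, 4.0, 1.0, 1.0]

naive = float(np.mean(y))      # ignores the design: 21.0
design = weighted_mean(y, w)   # accounts for oversampling: 15.0
```

The naive mean (21.0) overstates the population mean implied by the weights (15.0), illustrating why design aspects matter even for seemingly simple analyses.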

Bio: Dr. Opsomer has been involved in the design and estimation for the National Resources Inventory (NRI) survey, as well as several other surveys conducted by the Center for Survey Statistics and Methodology at Iowa State University. On the methodological side, he has collaborated on the development of nonparametric model-assisted estimation techniques. More recently, he has been applying nonparametric methods in survey estimation problems including small area estimation, imputation, and variance estimation.

His current interest is in the area of nonparametric regression, which includes development of penalized spline regression methodology, and on the application of nonparametric regression techniques in survey statistics.
Dr. Opsomer’s research involves the development of advanced statistical tools to increase our understanding of environmental processes as well as the human impact on the environment. The primary areas of application of his research are environmental economics, environmental toxicology and natural resource surveys. He has collaborated with the faculty in the Department of Agronomy at Iowa State University on building a daily erosion prediction model for Iowa, and was involved in a watershed-level agro-ecological experiment to assess the feasibility and impacts of combining native prairie and intensive agriculture in Iowa.




November 10
Dr. Guoqing Diao
Assistant Professor, Department of Statistics, George Mason University
Host: Dr. Ying Wei


Semiparametric Hazard Rate Model for Modeling Short-term and Long-term Effects


Abstract: The Cox proportional hazards (PH) model is widely used in survival analysis; it assumes that relative risks remain constant across time. This assumption is violated in several applications in biosciences and genetics. To address this issue, we propose an extension of the PH model that allows for time-varying relative risks. We develop statistical properties and novel data analysis methods using the nonparametric likelihood methodology. Extensive simulations and data from a clinical trial and a genetic study are used to demonstrate the usefulness of our methods. We also provide an extension of the proposed model to the competing risks setting.
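As background for why a constant relative risk can be too restrictive, here is one simple parametric way to encode distinct short-term and long-term effects: a hazard ratio that starts at one value and decays toward another. This functional form is purely illustrative and is not the semiparametric model from the talk.

```python
import math

def hazard_ratio(t, short_hr, long_hr, rate=1.0):
    """A hazard ratio that decays smoothly from its short-term value at
    t = 0 toward its long-term value as t grows, illustrating a
    treatment effect that wanes over time (illustrative form only)."""
    return long_hr + (short_hr - long_hr) * math.exp(-rate * t)

hr_early = hazard_ratio(0.0, short_hr=3.0, long_hr=1.0)   # strong early effect
hr_late = hazard_ratio(10.0, short_hr=3.0, long_hr=1.0)   # effect essentially gone
```

A Cox PH fit to data generated under such a mechanism would average the early and late effects into a single misleading constant, which is the failure mode the proposed extension addresses.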


Bio: Dr. Guoqing Diao is an assistant professor in the Department of Statistics at George Mason University. He received his Ph.D. degree in Biostatistics from the University of North Carolina at Chapel Hill in December 2005. Dr. Diao's research interests include statistical genetics, survival analysis, semiparametric models, and rare event simulation problems.






November 17
Dr. Pei Wang
Associate Professor, Department of Biostatistics, Fred Hutchinson Cancer Research Center
Host: Dr. Shuang Wang
Regularization for Multivariate Missing Data in Proteomic Studies

Abstract: Recent proteomic studies have identified proteins related to specific phenotypes. In addition to marginal association analysis for individual proteins, analyzing pathways (functionally related sets of proteins) may yield additional valuable insights. Identifying pathways that differ between phenotypes can be conceptualized as a multivariate hypothesis testing problem: whether the mean vector of a p-dimensional random vector X is mu0. This problem is complicated by the facts that the sample sizes are often small and there are substantial missing data in proteomic studies. To tackle these challenges, we first propose a regularized Hotelling's T2 (RHT) statistic together with a non-parametric testing procedure, which effectively controls the type I error rate and maintains good power in the presence of complex correlation structures and missing data patterns. We investigate asymptotic properties of the RHT statistic under pertinent assumptions and compare the test performance with other existing methods through simulations and real data examples. In the second part of this talk, we further propose to employ regularization in EM algorithm to more accurately estimate the mean vector and covariance matrix when data are missing at random and when data are missing not at random.
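The regularized statistic described above has the generic form T² = n(x̄ − μ₀)ᵀ(S + λI)⁻¹(x̄ − μ₀), where the ridge term λI keeps the covariance estimate invertible when p exceeds n. The sketch below computes this generic form on complete data; it is an illustration only and omits the non-parametric testing procedure and the missing-data handling that the talk addresses.

```python
import numpy as np

def regularized_hotelling_t2(X, mu0, lam):
    """Generic regularized Hotelling's T^2: the sample covariance S is
    singular when p >= n, so test against S + lam * I instead."""
    X = np.asarray(X, float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)          # p x p sample covariance, rank < n
    S_reg = S + lam * np.eye(p)          # ridge term restores invertibility
    diff = xbar - np.asarray(mu0, float)
    return float(n * diff @ np.linalg.solve(S_reg, diff))

# Small n, large p: the classical T^2 would be undefined here.
rng = np.random.default_rng(0)
X = rng.normal(loc=0.5, scale=1.0, size=(10, 20))
t2 = regularized_hotelling_t2(X, mu0=np.zeros(20), lam=1.0)
```

Because S + λI is positive definite, the statistic is always well-defined; its null distribution is then calibrated by resampling rather than by the classical F reference.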

Biographical Notes: Dr. Pei Wang received her Ph.D. degree in Statistics from Stanford University in 2004. She then joined the Fred Hutchinson Cancer Research Center, Seattle, WA, and is now an associate faculty member in the Program of Biostatistics. She is also an Affiliate Associate Professor in the Department of Biostatistics, University of Washington. Dr. Wang's research interests have focused on high-dimensional genomics/proteomics data analysis, network inference, and multivariate analysis. She has also been collaborating with scientists at Fred Hutchinson on a range of biomarker and epidemiology studies.




December 1
Dr. Yongtao Guan
Professor, Management Science & Biostatistics, University of Miami
Host: Dr. Zhezhen Jin

Estimating Individual-Level Risk in Spatial Epidemiology Using Spatially Aggregated Information on the Population at Risk

Abstract: We propose a novel alternative to case-control sampling for the estimation of individual-level risk in spatial epidemiology. Our approach uses weighted estimating equations to estimate regression parameters in the intensity function of an inhomogeneous spatial point process, when information on risk factors is available at the individual level for cases, but only at a spatially aggregated level for the population at risk. We develop data-driven methods to select the weights used in the estimating equations and show through simulation that the choice of weights can have a major impact on efficiency of estimation. We develop a formal test to detect non-Poisson behavior in the underlying point process and assess the performance of the test using simulations of Poisson and Poisson cluster point processes. We apply our methods to data on the spatial distribution of childhood meningococcal disease cases in Merseyside, U.K. between 1981 and 2007.
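A much simplified version of the aggregated-population setting can be sketched as cell-level Poisson regression: case counts in grid cells follow a log-linear intensity built from the known population at risk and a cell-level risk factor. This toy score-equation solver is an illustration only; the talk's weighted estimating equations use individual-level case locations and are not reproduced here.

```python
import math

def profiled_score(beta, counts, pop, z):
    """Score for beta in lam_j = pop_j * exp(alpha + beta * z_j), with
    the baseline level alpha profiled out analytically (toy example)."""
    total = sum(counts)
    denom = sum(pj * math.exp(beta * zj) for pj, zj in zip(pop, z))
    fitted1 = total * sum(zj * pj * math.exp(beta * zj)
                          for pj, zj in zip(pop, z)) / denom
    return sum(zj * nj for nj, zj in zip(counts, z)) - fitted1

def solve_beta(counts, pop, z, lo=-5.0, hi=5.0):
    # The profiled score is decreasing in beta, so plain bisection works.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if profiled_score(mid, counts, pop, z) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Four cells with known population at risk and a binary risk factor;
# exposed cells (z = 1) have twice the per-capita case rate.
pop    = [100.0, 100.0, 100.0, 100.0]
z      = [0, 0, 1, 1]
counts = [10, 10, 20, 20]
beta_hat = solve_beta(counts, pop, z)
relative_risk = math.exp(beta_hat)   # recovers the rate ratio of 2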

Biographical Notes: Yongtao Guan is a Professor of Management Science and Professor of Biostatistics (secondary appointment) at the University of Miami (UM). Prior to joining UM, Guan was Assistant and Associate Professor of Biostatistics at Yale University from 2006-2011 and was Assistant Professor of Management Science at UM from 2003-2006. Guan, who earned his PhD from Texas A&M University in 2003, specializes in modeling data with correlation, such as spatial, recurrent event and longitudinal data. Guan's research has been widely published in leading academic journals such as the Journal of the American Statistical Association, Journal of the Royal Statistical Society, Series B, Biometrika and Biometrics. Guan was a recipient of the CAREER Award from the National Science Foundation in 2009 and an R01 award from the National Institutes of Health in 2011. He has also served as Guest Editor of Statistics and Its Interface and Associate Editor of the Annals of Applied Statistics.




December 8
Dr. Jay Bartroff
Assistant Professor, Department of Mathematics, University of Southern California
Host: Dr. Ken Cheung

Efficient Phase I-II Designs Using Sequential Generalized Likelihood Ratio Statistics



Abstract: We focus on the following scenario in early-phase cancer trials. Following a Phase I trial in which the maximum tolerated dose (MTD) η of a treatment is estimated, a Phase II trial tests the hypothesis H0: p ≤ p0, where p is the probability of efficacy at the estimated MTD η̂ from Phase I and p0 is the baseline efficacy rate. Standard practice for Phase II remains to treat p = p(η̂) as a fixed, unknown parameter and to use Simon's (1989) 2-stage design with all patients dosed at η̂. In this talk we propose an alternative approach utilizing sequential generalized likelihood theory which accounts for the uncertainty in η̂, uses both efficacy and toxicity data from both phases, does not require that all patients be dosed at η̂, and allows updating of η̂ both during and after Phase II. Efficient group sequential sampling, or even adaptive sampling, can be used within this framework, which allows for early stopping to show treatment effect or for futility. The results of simulation studies will be shown comparing this proposed design to current practice. This is joint work with Tze Lai and Balasubramanian Narasimhan at Stanford.
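For context, Simon's (1989) two-stage design mentioned above stops for futility when stage-1 responses are too few, and otherwise declares activity when total responses are sufficiently many. Its operating characteristics are simple binomial sums; the parameter values below are illustrative choices, not a recommendation.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def simon_oc(r1, n1, r, n, p):
    """Operating characteristics of Simon's two-stage design: stop for
    futility after stage 1 if responses X1 <= r1; otherwise enroll
    n - n1 more patients and reject H0 (declare activity) if total
    responses exceed r. Returns (prob. of early termination,
    prob. of rejecting H0) at true response rate p."""
    pet = sum(binom_pmf(x1, n1, p) for x1 in range(r1 + 1))
    reject = sum(
        binom_pmf(x1, n1, p) * binom_pmf(x2, n - n1, p)
        for x1 in range(r1 + 1, n1 + 1)
        for x2 in range(n - n1 + 1)
        if x1 + x2 > r
    )
    return pet, reject

# Illustrative design (r1=1/n1=10, r=5/n=29) for p0 = 0.10 vs p1 = 0.30:
pet0, alpha = simon_oc(r1=1, n1=10, r=5, n=29, p=0.10)  # under H0
pet1, power = simon_oc(r1=1, n1=10, r=5, n=29, p=0.30)  # under alternative
```

Under the null the trial usually terminates early (high pet0), which is the design's appeal; the proposed sequential GLR approach keeps this early-stopping behavior while also propagating the uncertainty in η̂.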

Biographical Notes: Dr. Bartroff received his Ph.D. in mathematics from Caltech in 2004 and is currently Assistant Professor in the Department of Mathematics at USC. From 2004-2007 he was a postdoctoral fellow in Statistics at Stanford University where he was a member of the Stanford Medical School's Cancer Center Biostatistics Core, through which he was involved in the design and analysis of cancer clinical trials. Dr. Bartroff's methodological research concerns sequential estimation, analysis, and design, particularly with applications to the design and analysis of complex clinical trial data. He is the author of numerous articles on statistics and probability and has a forthcoming book from Springer titled "Sequential Experimentation in Clinical Trials: Design and Analysis."