Professor Jun Liu, Harvard University
Title: On Methods for Controlled Variable Selection in Linear, Generalized Linear, and Index Models
Time and Location: 8:30am-9:30am, July 1 (Beijing Time), LONGFENG HALL
Host: Yichuan Zhao
Abstract: A classical statistical idea is to introduce data perturbations and examine their impact on a statistical procedure. By the same token, the knockoff methods carefully create “matching” fake variables in order to measure how real signals stand out. I will discuss some of our recent methodological and theoretical investigations of several related methods for controlling the false discovery rate (FDR), including the knockoff filter, data splitting (DS), and the Gaussian mirror (GM), which apply to a wide class of regression models: linear, generalized linear, and index models. Under the weak-and-rare signal framework for linear models, we theoretically compare these methods with the oracle OLS method. We then focus on the DS procedure and its variant, Multiple Data Splitting (MDS), which stabilizes the selection result and boosts power. DS and MDS are conceptually straightforward, algorithmically easy to implement, and applicable to a wide class of linear and nonlinear models. Interestingly, their specializations in GLMs result in scale-free procedures that circumvent difficulties caused by the non-traditional asymptotic behaviors of MLEs in moderate dimensions and of debiased Lasso estimates in high dimensions. For index models, our earlier LassoSIR algorithm (Lin, Zhao and Liu 2019) fits the DS framework quite well. I will also discuss some applications and open questions. The presentation is based on joint work with Chenguang Dai, Buyu Lin, Xin Xing, Tracy Ke, Yucong Ma, and Zhigen Zhao.
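The data-splitting idea sketched in the abstract can be illustrated with a minimal simulation for the linear model. This is not the authors' implementation: for simplicity it fits OLS on both halves (the actual DS procedure typically uses a penalized fit such as the Lasso on the first half), and the simulated design, signal strength, and FDR level q are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model: n samples, p features, the first k carry signal.
n, p, k, q = 1000, 50, 10, 0.1
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 1.0
y = X @ beta + rng.standard_normal(n)

# Step 1: split the data into two independent halves.
idx = rng.permutation(n)
h1, h2 = idx[:n // 2], idx[n // 2:]

def ols(Xs, ys):
    """Least-squares coefficient estimates."""
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]

b1, b2 = ols(X[h1], y[h1]), ols(X[h2], y[h2])

# Step 2: mirror statistics -- large and positive when both halves
# agree on a signal, roughly symmetric about zero for null features.
M = np.sign(b1 * b2) * (np.abs(b1) + np.abs(b2))

# Step 3: smallest threshold whose estimated false discovery
# proportion  #{M_j <= -t} / #{M_j >= t}  is at most q.
tau = next(t for t in np.sort(np.abs(M))
           if np.sum(M <= -t) / max(np.sum(M >= t), 1) <= q)
selected = np.where(M >= tau)[0]
print(selected)
```

The key design point is the symmetry of the mirror statistic under the null, which lets the negative tail estimate the number of false positives in the positive tail; MDS would repeat the split many times and aggregate the selections.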
Bio: Dr. Jun Liu is a Professor of Statistics at Harvard University, with a courtesy appointment at the Harvard School of Public Health. Dr. Liu received his BS degree in mathematics in 1985 from Peking University and his Ph.D. in statistics in 1991 from the University of Chicago. He held Assistant, Associate, and Full Professor positions at Stanford University from 1994 to 2003. Dr. Liu received the NSF CAREER Award in 1995 and the Mitchell Award in 2000. In 2002, he won the prestigious COPSS Presidents' Award (given annually to one individual under age 40). He was selected as a Medallion Lecturer in 2002, a Bernoulli Lecturer in 2004, and a Kuwait Lecturer of Cambridge University in 2008; and he was elected a Fellow of the Institute of Mathematical Statistics in 2004, a Fellow of the American Statistical Association in 2005, and a Fellow of the International Society for Computational Biology in 2022. He was awarded the Morningside Gold Medal in Applied Mathematics in 2010 (given once every three years to an individual of Chinese descent under age 45). He was honored with the Outstanding Achievement Award and the Pao-Lu Hsu Award (given once every three years) by the International Chinese Statistical Association in 2012 and 2016, respectively. In 2017, he was recognized with the Jerome Sacks Award for Outstanding Cross-Disciplinary Research.
Professor Fang Yao, Peking University
Title: Theory of FPCA for discretized functional data
Time and Location: 8:30am-9:30am, July 2 (Beijing Time), LONGFENG HALL
Host: Zhezhen Jin
Abstract: Functional data analysis is an important research field in statistics that treats data as random functions drawn from some infinite-dimensional functional space, and functional principal component analysis (FPCA) plays a central role in data reduction and representation. After nearly three decades of research, a key problem remains unsolved, namely, the perturbation analysis of the covariance operator for a diverging number of eigencomponents obtained from noisy and discretely observed data. This problem is fundamental for studying models and methods based on FPCA, yet there has been little progress since the result obtained by Hall et al. (2006) for a fixed number of eigenfunction estimates. In this work, we establish a unified theory for this problem, deriving moment bounds for eigenfunctions and asymptotic distributions of eigenvalues under a wide range of sampling schemes. We also exploit double truncation to derive the uniform convergence of the estimated eigenfunctions. The technical arguments in this work are useful for handling the perturbation series of discretely observed functional data and can be applied to models and methods involving inverse problems that use FPCA as regularization, such as functional linear regression.
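As a concrete, much simplified illustration of FPCA from discretized data, the sketch below simulates curves observed noisily on a common dense grid, estimates the covariance, and recovers eigenvalues and eigenfunctions by eigendecomposition. The two-component model, noise level, and grid are illustrative assumptions; the setting of the talk (irregular or sparse designs, a diverging number of components) requires covariance smoothing and the perturbation theory described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate n noisy curves on a common grid of m points, generated
# from two Fourier eigenfunctions with eigenvalues 4 and 1.
n, m, sigma = 200, 51, 0.2
t = np.linspace(0, 1, m)
phi1 = np.sqrt(2) * np.sin(2 * np.pi * t)
phi2 = np.sqrt(2) * np.cos(2 * np.pi * t)
scores = rng.standard_normal((n, 2)) * np.sqrt([4.0, 1.0])
Xobs = scores @ np.vstack([phi1, phi2]) + sigma * rng.standard_normal((n, m))

# Empirical covariance of the centered observations; measurement
# noise only inflates the diagonal, so the matrix still estimates
# the true covariance surface up to a small diagonal ridge.
Xc = Xobs - Xobs.mean(axis=0)
G = (Xc.T @ Xc) / n

# Eigendecomposition of the discretized covariance operator; the
# grid spacing dt converts matrix eigenpairs into operator
# eigenvalues and L2-normalized eigenfunctions.
dt = t[1] - t[0]
vals, vecs = np.linalg.eigh(G)
vals, vecs = vals[::-1] * dt, vecs[:, ::-1] / np.sqrt(dt)

print(vals[:3])  # leading eigenvalue estimates
```

On a dense grid the discretization error is minor; the hard regimes analyzed in the talk arise when each curve is observed at few, irregular points, so that the covariance must first be smoothed before the eigendecomposition.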
Bio: Dr. Fang Yao is Chair Professor in the School of Mathematical Sciences and Director of the Center for Statistical Science at Peking University. He is a Fellow of the IMS and the ASA, and an elected member of the ISI. He received his B.S. degree in 2000 from the University of Science and Technology of China. By the age of 24, he had completed both his MSc and Ph.D. degrees in Statistics at UC Davis. In 2003, he started his academic career at Colorado State University. In 2006, he moved to the Department of Statistical Sciences at the University of Toronto, where he was tenured at the age of 29 in 2008.
Dr. Yao’s research primarily focuses on functional and longitudinal data, complex data structures such as high dimensions, manifolds, and dynamics, and their applications in various disciplines. In 2014, he was awarded the CRM-SSC Prize in Canada, which recognizes a statistical scientist’s professional accomplishments in research, primarily conducted in Canada, during the first 15 years after earning a doctorate. He has served as Editor of the Canadian Journal of Statistics, and as an Associate Editor for a number of statistical journals, including the Annals of Statistics and the Journal of the American Statistical Association.
Banquet Speaker
Zongben Xu is a professor of mathematics and computer science at Xi’an Jiaotong University. He received his Ph.D. degree in mathematics from Xi’an Jiaotong University, China, in 1987. His current research interests include applied mathematics and mathematical methods for big data and artificial intelligence. He established the L(1/2) regularization theory for sparse information processing. He also discovered and verified the Xu-Roach Theorem in machine learning and established the visual-cognition-based data modelling principle, both of which have been widely applied in scientific and engineering fields. He initiated several mathematical theories, including the non-logarithmic-transform-based CT model and ultrafast MRI imaging, which provide principles and technologies for the development of a new generation of intelligent medical imaging equipment. He received the Hua Loo-keng Prize of Mathematics in 2022, the Tan Kah Kee Science Award in Science and Technology in 2018, the National Natural Science Award of China in 2007, and the CSIAM Su Buchin Applied Mathematics Prize in 2008. He delivered a 45-minute talk at the International Congress of Mathematicians 2010. He was elected a member of the Chinese Academy of Sciences in 2011.
Zongben Xu was the vice-president of Xi’an Jiaotong University. He currently serves in several important roles for the government and professional societies, including Director of Pazhou Lab (Huangpu), Director of the National Engineering Laboratory for Big Data Analytics, member of the National Big Data Expert Advisory Committee, and member of the Strategic Advisory Committee of the National Open Innovation Platform for New Generation Artificial Intelligence.
Viewing Statistics from a Data Science Perspective
Zongben Xu
(Xi’an Jiaotong University, Pengcheng Lab/Pazhou Lab(Huangpu))
In the era of big data, characterized by digitization, networking, and intelligence, data serves as both a production factor and a powerful tool for scientific discovery. Statistics has long been considered the scientific foundation and methodology guiding and leading the generation, analysis, and utilization of data. Can this guiding and leading role continue in the age of big data? The statistics community needs to ponder and answer this question, and this talk will discuss the issue.
Big data nurtures data science, and data science carries the future of big data. First, we rigorously define data science, elaborating on its multidisciplinary attributes, advanced nature, "three transformations" connotation, and unique disciplinary methodology of "modeling, analysis, computation, and learning fusion." Second, we compare the similarities and differences between data science and statistics, pointing out the unique contributions of statistics to data science and its "core" role in the development of data science. Based on this, we outline the limitations of statistics in terms of research objects, methods, and values. We provide examples to demonstrate that the integration of data science and statistics can inspire new research questions and methods in various fields, thereby promoting the novel development of data science. Consequently, we assert that statistics can continue to guide and lead the data science disciplines in the big data era if it proactively embraces and strides towards data science.