
Archived Talks and Seminars

Reverse Chronological Order

Statistics Seminar
Wednesday, March 20, 2024; 11:00am
Speaker Xin Zhang, Assistant Professor, SDSU Computer Science Department
Title Enabling Urban Intelligence by Harnessing Human-Generated Spatial-Temporal Data
Abstract

Technological advances in mobile sensing and communication have enabled massive amounts of mobility data to be generated by human decision-makers, which we call human-generated spatial-temporal data (HSTD). Using HSTD to extract the unique decision-making strategies of human agents and to design human-centered urban intelligent systems (e.g., self-driving ride services) has transformative potential. Such systems can not only promote the individual well-being of gig workers and improve the service quality and revenue of transportation service providers, but also enable downstream applications in smart transit planning, efficient gig-work dispatching, safe autonomous vehicle (AV) routing, and more.

However, analyzing human decision strategies from HSTD is challenging. Human behaviors are complex and vary across geographical locations (the spatial challenge), and the quality of the learned strategies also depends on the expressiveness of the model (the theoretical challenge). In addition, there are practical gaps in leveraging human decisions to build human-centered smart cities.

This talk presents an overview of my work on analyzing human behavior from HSTD with imitation learning, and on its downstream applications. The work tackles the challenges above by addressing two research questions: (1) How can unique human decision-making strategies be captured from HSTD? (2) How can human-centered smart city services be designed around human decisions? To answer these questions, a series of works, including cGAIL, f-GAIL, and CAC, is introduced, with novel designs in problem formulation, model architecture, and algorithms. Extensive experiments support the effectiveness of the proposed models for human behavior analysis and self-driving decision-making from HSTD, and show that they outperform state-of-the-art methods.

 

Statistics Seminar
Wednesday, March 6, 2024; 11:00am
Speaker Hossein Shirazi, Assistant Professor, SDSU Management and Information Systems
Title Seeing Should Probably Not Be Believing: The Role of Deceptive Support in COVID-19 Misinformation on Twitter
Abstract

With the spread of SARS-CoV-2, enormous amounts of information about the pandemic have been disseminated through social media platforms such as Twitter. Social media posts often leverage the trust readers have in prestigious news agencies and cite news articles as a way of gaining credibility. However, the cited article does not always support the claim made in the post. We present a cross-genre ad hoc pipeline to identify whether the information in a Twitter post (i.e., a "Tweet") is indeed supported by the cited news article. Our approach is empirically grounded in a corpus of over 46.86 million Tweets and is divided into two tasks: (i) developing models to detect Tweets that contain claims worth fact-checking, and (ii) verifying whether the claims made in a Tweet are supported by the newswire article it cites. Unlike previous studies that detect unsubstantiated information through post hoc analysis of propagation patterns, we seek to identify reliable support (or the lack of it) before the misinformation begins to spread. We find that nearly half of the Tweets (43.4%) are not factual and hence not worth checking, a significant filter given the sheer volume of posts on a platform such as Twitter. Moreover, among the Tweets that contain a seemingly factual claim while citing a news article as supporting evidence, at least 1% are not supported by the cited news and are hence misleading.

 

Statistics Seminar
Wednesday, February 28, 2024; 11:00am
Speaker Mary Meyer, Ph.D., Professor of Statistics, Colorado State University
Title Applications of constrained spline density estimation
Abstract

Density estimation methods often involve kernels, but splines have several advantages. When the density is known to be decreasing, unimodal, or bimodal, or when the shape of the density is itself the research question, splines allow the shape assumptions to be readily implemented. In addition, spline estimators enjoy a faster convergence rate than kernel density estimators. Applications include testing unimodal versus multimodal densities, estimating a deconvolution density, robust regression, and testing for sampling bias.

 

Statistics Seminar
Tuesday, February 13, 2024; 11:00am
Speaker Veronica Berrocal, Ph.D., Statistics, UC Irvine
Title Bayesian non-stationary spatial modeling using shrinkage priors
Abstract

A spatial statistical analysis often starts with a decision about how to model the spatial dependence structure: can the spatial process be thought of as stationary or non-stationary? While most parametric covariance functions assume stationarity, one modeling choice in the non-stationary case is to envision the process as globally non-stationary but locally stationary. A drawback of this choice is that identifying regions of non-stationarity remains challenging, at least computationally. In this talk, we present two approaches that identify regions of local stationarity by redefining and repurposing the Multi-Resolution Approximation (MRA) of Katzfuss (2017), which was introduced to lessen the computational burden of analyzing massive datasets. Both methods represent the spatial process as a linear combination of appropriate basis functions (the MRA basis functions), but they differ in the shrinkage prior adopted for the basis function weights. Inference on the basis function weights and on the spatial variability in the number of resolution levels needed indicates whether the process can be considered stationary. We showcase the ability of these methods to correctly capture regions of local stationarity through simulation experiments. We also apply them to identify regions with different strengths of spatial dependence for two soil-related variables that are very important in climate science: soil organic carbon and soil moisture.

 

Statistics Seminar
Wednesday, November 29, 2023; 11:00am
Speaker Professor Qingyun (Serena) Zhu, Management Information Systems, SDSU
Title How Loud is Consumer Voice in Product Deletion Decisions? Retail Analytic Insights
Abstract

This study examines the role of online consumer reviews in product deletion decisions. Building on product portfolio management theory, we integrate consumer voice, represented by online consumer review behavior, into organizational voice (strategic product deletion decision-making). The study also informs demand management: the findings suggest that products with lower attribute ratings, and with comments less relevant to lower-ranked attributes, are more likely to be deleted. The linguistic characteristics of online reviews also provide retail analytic insights for product deletion decisions: products whose reviews have higher subjectivity, shorter length, and lower readability are more likely to be deleted. Pre-purchase consumer voice, i.e., the perceived helpfulness or unhelpfulness of online reviews, also informs product deletion decisions. A general conclusion is that online reviews can provide important retail analytics for smarter retail operations planning at the strategic and tactical levels when it comes to product planning and portfolio management through product deletion.

 

Statistics Seminar
Wednesday, November 1, 2023; 11:00am
Speaker Dr. Johanna Hardin, Mathematics and Statistics Department, Pomona College
Title Technical Conditions in Normalizing ChIP-Seq Data
Abstract

ChIP-Seq (chromatin immunoprecipitation followed by sequencing) data are widely used to study genome-wide protein-DNA interactions. One important biological question addressed by ChIP-Seq experiments is whether the amount of bound protein at a particular region of the genome changes across experimental conditions, i.e., whether the region is differentially bound. Standard statistical methods for finding differentially bound regions derive from well-known two-sample tests (think: t-test), but these analyses require the samples to be normalized beforehand. In this talk, I will discuss the challenge of normalization, methods for normalizing, and the technical conditions required for a normalization method to work. Simulation studies support our derivation of technical conditions from the form of the normalization methods.

 

Statistics Seminar
Wednesday, October 11, 2023; 11:00am
Speaker Dr. Ronghui (Lily) Xu, Department of Mathematics and School of Public Health, UC San Diego
Title Doubly robust estimation for time-to-event outcomes
Abstract

In this talk, we review our work on doubly robust estimation for time-to-event outcomes, including the popular marginal structural Cox model and dependently left-truncated data. A common theme in this work is the well-known semiparametric theory, and a notable feature is rate double robustness, which allows machine learning or nonparametric approaches to be used to estimate the nuisance parameters or functions. The latter circumvents compatibility issues surrounding nonlinear models such as the proportional hazards model. Our main estimand of interest is a treatment effect, with or without randomization.

 

Statistics Seminar
Wednesday, September 27, 2023; 11:00am
Speaker Dr. Zhe Fei, Department of Statistics, UC Riverside
Title U-learning for Prediction Inference: With Applications to LASSO and Deep Neural Networks
Abstract

Epigenetic aging clocks play a pivotal role in estimating an individual's biological age by examining DNA methylation patterns at numerous CpG (cytosine-phosphate-guanine) sites in the genome. However, making valid inferences on predicted epigenetic ages, or more broadly, on predictions derived from high-dimensional predictors, is challenging. We introduce a new U-learning approach for making ensemble predictions and constructing prediction intervals for continuous outcomes when traditional asymptotic methods are not applicable. More specifically, our approach casts the ensemble estimators in the framework of generalized U-statistics and invokes the Hájek projection to derive their asymptotic properties and obtain consistent variance estimates. We apply our approach to two commonly used predictive algorithms, the Lasso and deep neural networks (DNNs), and illustrate valid prediction inference with extensive numerical examples. We also apply the methods to predict the DNA methylation age of patients from different tissue samples, which may help characterize the aging process and lead to anti-aging interventions.

 

Statistics Seminar
Wednesday, September 13, 2023; 11:00am
Speaker Dr. Xin Wang, Assistant Professor, Department of Mathematics and Statistics, SDSU
Title Clustered coefficient regression models for Poisson process with an application to seasonal warranty claim data
Abstract

Motivated by a product warranty claims data set, we propose clustered coefficient regression models in a non-homogeneous Poisson process for recurrent event data. The proposed method, referred to as CLUPP, estimates the group structure and the parameters simultaneously, using a penalized regression approach to identify the group structure. Numerical studies show that the proposed approach identifies the group structure well and outperforms traditional methods such as hierarchical clustering and K-means. We also establish theoretical properties showing that the proposed estimators converge to the true parameters with high probability. Finally, we apply the proposed method to the product warranty claims data set, where it achieves better prediction than state-of-the-art methods.

 

Statistics Seminar
Wednesday, April 26, 2023; 11:00am
Speaker Dr. Vivian Li, Assistant Professor, University of California Riverside
Title Statistical methods for analyzing and comparing single-cell gene expression data
Abstract

Single-cell RNA sequencing (scRNA-seq) experiments enable gene expression measurement at single-cell resolution and provide an opportunity to characterize the molecular signatures of diverse cell types and states in tissue development and disease progression. However, constructing a comprehensive view of single-cell transcriptomes in health and disease remains a challenge, owing to the knowledge gap in properly modeling high-dimensional, sparse, and noisy scRNA-seq data. In this talk, I will introduce two statistical methods we have developed for analyzing and comparing single-cell gene expression data. The first is an integration method that enables joint analysis of single-cell samples from different biological conditions. This method can learn coordinated gene expression patterns that are common to, or specific to, different biological conditions, and can identify cell types across single-cell samples. I will also discuss the applicability of this method to diverse biomedical problems. The second is a computational method for identifying, quantifying, and comparing RNA transcripts from scRNA-seq data. Accurate and sensitive profiling of RNA transcripts is of great importance for understanding the mechanisms and consequences of gene expression regulation and can have diagnostic value in clinical settings. We propose a method to address the computational questions arising from this biological problem.

 

Mathematics Colloquium Distinguished Lecture Series
Monday, April 17, 2023; 4:00pm
Speaker Dr. Catherine (Kate) Calder, Professor and Chair, Department of Statistics and Data Sciences, University of Texas at Austin
Title Statistical and Ethical Considerations in the Analysis of Human Activity Pattern Data
Abstract

In the biomedical and social sciences, mobile phone tracking (MPT) data, collected using location-sensing technologies readily available on smartphones, have become an increasingly common component of cohort studies, where they have been used for digital phenotyping or for estimating personal exposure to the ambient environment or to particular social contexts. Notwithstanding meaningful progress in interpolating movement and summarizing activity, there are numerous statistical challenges associated with using these data for research. For example, there is no formal statistical infrastructure for parameter inference and trajectory imputation under the various forms of missing data that are ubiquitous in practice. In this talk, I will introduce a foundational statistical model for studying individual human mobility using MPT data, which formalizes the so-called flight-pause paradigm for human movement as a likelihood for a random object, called a motion, made up of increments of change in space and time. Under this model, it is possible to illuminate the consequences of different MPT data collection mechanisms, including the surprising result that common assumptions about the missing data mechanism for MPT are not valid for the mechanism governing the random motions of the flight-pause model. The consequences of missing data and the proposed adjustments will be illustrated using both simulations and real data, showing how the statistical formulation pursued here can serve as a foundation for continued statistical research on MPT data collection, design, and analysis. Finally, I will briefly discuss some ethical considerations related to the use of MPT data for research purposes. This is joint work with Marcin Jurek, Cory Zigler, and Chris Browning.

Bio

Catherine (Kate) Calder is a professor in the Department of Statistics and Data Sciences (SDS) and currently serves as the department chair. She holds a B.A. in mathematics from Northwestern University and an M.S. and Ph.D. in Statistics and Decision Sciences from Duke University. Before moving to the University of Texas in 2019, she spent 16 years on the faculty of The Ohio State University. She served as an associate director (2015–2018) and co-director (2018–2019) of the Mathematical Biosciences Institute, an NSF Division of Mathematical Sciences Research Institute located on the Ohio State campus. Dr. Calder’s methodological research interests are in spatio-temporal statistics, Bayesian methods, and network analysis. She has made contributions in the areas of convolution-based nonstationary spatial modeling, parameter estimation in spatial regression models, community detection in bi-partite networks, and non-Euclidean latent space models for network data. Most of her applied work focuses on problems that broadly fall under the umbrella of exposure/contextual effects analysis. She has developed statistical methods for quantifying individual- and population-level exposures that account for human mobility and network dependence and for examining both the causes and consequences of social and environmental exposures. For the past twelve years, she has collaborated with a team of interdisciplinary researchers on the Adolescent Health and Development in Context (AHDC) Study, a longitudinal study of a representative sample of adolescents and their caregivers that captures high-resolution geo-referenced activity pattern data, ecological momentary assessments, biomeasures, and other measures.
She currently leads an NIH NICHD-funded project that uses AHDC data to construct ecological networks (bipartite networks of youth and the places they spend time) and examines the consequences of ecological network structure and indirect exposures to social contexts on health and developmental outcomes. Dr. Calder has held numerous editorial appointments and has served the profession through various elected roles in sections of the American Statistical Association (ASA) and in the International Society for Bayesian Analysis (ISBA). Currently, she chairs the ASA Committee on Funded Research and is a member of the ASA Board of Directors. Her research has been funded by the NIH, NSF, NASA, and other agencies and foundations. She received the ASA Section on Statistics and the Environment’s 2013 Young Investigator Award and is a Fellow of the ASA and the American Association for the Advancement of Science (AAAS).

Resources Poster

 

Mathematics Colloquium Distinguished Lecture Series
Monday, March 6, 2023; 4:00pm
Speaker Dr. Robin Wilson, Professor, Department of Mathematics, Loyola Marymount University
Title "Well, it’s obvious": College students’ experiences with mathematical microaggressions
Abstract

In a 2015 article in MAA Focus, Francis Su introduced the term mathematical microaggression, which refers to the subtle ways in which mathematical authorities use language, behavior, and assumptions that communicate negative messages to students that they do not belong in mathematics. This talk will share the results of an analysis of the reflections of 173 undergraduate mathematics students who were asked to read and reflect on Su’s article. Findings from a preliminary analysis by our research team show that students experienced different types of mathematical microaggressions, including microslights, microinsults, and environmental microaggressions. Students reported perceiving them from teachers, peers, and textbooks. Our analysis revealed that women were more likely than men to report experiences with math microaggressions. Future analysis will look more closely at how the three types of math microaggressions are experienced across racial and ethnic groups and gender. This study supports the need to investigate this phenomenon further and to develop practices that create more inclusive spaces in mathematics classrooms.

Bio

Dr. Robin Wilson is a Professor in the Department of Mathematics at Loyola Marymount University. He earned his PhD at UC Davis and, after an appointment as a UC President’s Postdoctoral Scholar in the Department of Mathematics at UC Santa Barbara, joined the faculty at Cal Poly Pomona in 2007. He has also been a Visiting Professor at Georgetown University and Pomona College. Dr. Wilson is currently Co-Director of the NSF-funded Bolstering the Advancement of Masters in Mathematics (BAMM!) Program. His current research interests include both low-dimensional topology and mathematics education.

Resources Poster

 

Statistics Seminar
Wednesday, November 16, 2022; 11:00am
Speaker Dr. Mingan Yang, Assistant Professor, SDSU School of Public Health
Title Bayesian Variable Selection for Mixed Effects Models
Abstract

In the analysis of a linear model, one of the main objectives is to identify significant predictors of the outcome. However, this is quite challenging for linear mixed effects models because of the additional predictors among the random effects. In this work, we address the problem of jointly selecting both fixed effects and random effects in mixed models. We use a stochastic search Gibbs sampler to implement a fully Bayesian approach to variable selection. The approach is illustrated using simulated data and a real example.

 

Statistics Seminar
Wednesday, November 9, 2022; 11:00am
Speaker Dr. Faming Liang, Distinguished Professor, Department of Statistics, Purdue University
Title Adapting Deep Learning for Statistical Inference
Abstract

Deep learning has powered the recent developments of modern data science. However, from the perspective of statistical modeling, deep neural network (DNN) models suffer from many fundamental issues, such as over-parameterization, local traps, and unquantifiable prediction uncertainty, which make them hard to use for statistical inference. To address these issues, we introduce two techniques in this talk: sparse deep learning and stochastic neural networks. For the former, we propose a frequentist-like method for learning sparse DNNs and justify its consistency under the Bayesian framework. The proposed method can learn a sparse DNN with at most $O(n/\log(n))$ connections and comes with theoretical guarantees such as posterior consistency, nonlinear variable selection consistency, and quantifiable prediction uncertainty. For the latter, we show that a kernel-expanded stochastic neural network can avoid local traps and lead to accurate predictions with quantifiable uncertainty. We will also show that the stochastic neural network leads to a general nonlinear sufficient dimension reduction method for high-dimensional data.

 

Mathematics Colloquium Distinguished Lecture Series
Monday, November 7, 2022; 4:00pm
Speaker Dr. Mohamed Omar, Associate Professor, Department of Mathematics, Harvey Mudd College
Title How Many Cards Can Avoid a SET?
Abstract

SET is a popular real-time card game in which players search for special triples of cards among a table of face-up cards. A common issue when playing the game is that the face-up cards contain no SET. What is the maximum number of cards that can be face up while avoiding a SET? Surprisingly, this question is at the heart of a decades-old central problem in extremal combinatorics and additive number theory that saw a major breakthrough in 2017. In this talk, we describe the breakthrough and how the presenter used ideas from its development to make headway on a range of disparate problems in combinatorics.
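
The SET rule itself can be made concrete. The sketch below assumes the standard 81-card deck and encodes each card as a 4-tuple over {0, 1, 2}, one coordinate per attribute (number, color, shading, shape); this encoding and the function names are illustrative choices, not taken from the talk. Three cards form a SET exactly when, for every attribute, the values are all equal or all different, which over the integers mod 3 is the same as each coordinate summing to 0. In that language a SET is a line in (Z/3)^4, and a collection of cards with no SET is a cap set.

```python
from itertools import combinations, product


def is_set(a, b, c):
    """Return True if cards a, b, c form a SET.

    'All equal or all different' in each attribute is equivalent to
    x + y + z == 0 (mod 3) for every coordinate.
    """
    return all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c))


def third_card(a, b):
    """Return the unique card that completes a SET with distinct cards a and b."""
    return tuple((-x - y) % 3 for x, y in zip(a, b))


def avoids_set(cards):
    """Return True if no three of the given cards form a SET."""
    return not any(is_set(a, b, c) for a, b, c in combinations(cards, 3))


# All 81 cards of the (assumed) standard deck.
deck = list(product(range(3), repeat=4))

# Count every SET in the full deck by brute force over all 3-card subsets.
num_sets = sum(1 for a, b, c in combinations(deck, 3) if is_set(a, b, c))
```

Because any two distinct cards determine the third card of their SET, the count equals 81 · 80 / 6 = 1080, which the brute-force loop reproduces.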

Bio

Dr. Mohamed Omar is an associate professor of mathematics and the Joseph B. Platt Chair in Effective Teaching at Harvey Mudd College. He has received national awards for his research, including being the inaugural recipient of the American Mathematical Society's Claytor-Gilmer Fellowship and an inaugural recipient of the Karen EDGE Fellowship, both celebrating mid-career research. He has also earned the Henry L. Alder Award, the preeminent junior faculty national prize given by the Mathematical Association of America. He is the author of over 30 peer-reviewed articles in internationally recognized journals, studying the interaction between algebra and combinatorics.

Resources Poster

 

Statistics Seminar
Wednesday, October 19, 2022; 11:00am
Speaker Dr. Jianwei Chen, Professor, SDSU Department of Mathematics and Statistics
Title Identifiability of compartment model for infectious diseases under both perfect and flawed data
Abstract

Compartment modeling has been used extensively in epidemiology to understand and predict infectious diseases. With increasing data availability, mathematical models fit to incidence data are used to estimate key disease transmission parameters. In this process, an important question arises regarding model identifiability, that is, whether parameters can be correctly and accurately recovered from the available data. In this talk, I will demonstrate problems with incidence data accuracy using COVID-19 cases in Imperial Valley. Then, I will use both a simple SEIR model and a complex eight-compartment model to demonstrate the impact of data type, data resolution, and the optimization tools used in parameter estimation on assessing model identifiability.

 

Statistics Seminar
Wednesday, October 5, 2022; 11:00am
Speaker Dr. Wenxin Zhou, Associate Professor, UCSD Department of Mathematics
Title Joint quantile and expected shortfall regression: a robust approach
Abstract

Expected Shortfall (ES), also known as superquantile or conditional Value-at-Risk, has been recognized as an important measure in risk analysis and stochastic optimization, and also finds applications beyond these areas. In finance, it refers to the conditional expected return of an asset given that the return is below some quantile of its distribution. In this work, we consider a recently proposed joint regression framework that simultaneously models the quantile and the ES of a response variable given a set of covariates, for which the state-of-the-art approach is based on minimizing a joint loss function that is non-differentiable and non-convex. This inevitably raises numerical challenges, and thus limits its applicability for analyzing large-scale data. Motivated by the idea of using Neyman-orthogonal scores to reduce sensitivity with respect to nuisance parameters, we propose a statistically robust (to highly skewed and/or heavy-tailed data) and computationally efficient two-step procedure for fitting joint quantile and ES regression models. Under increasing-dimensional settings, we establish explicit non-asymptotic bounds on estimation and Gaussian approximation errors, which lay the foundation for statistical inference of ES regression.
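
The finance definition above (the expected return given that the return falls below its quantile) has a simple unconditional sample version, sketched below. This is only a plug-in estimate for a single sample of returns; it is not the joint quantile/ES regression procedure from the talk, and the function name and the k = ceil(alpha · n) convention for the empirical quantile are illustrative assumptions.

```python
import math


def quantile_and_es(returns, alpha=0.05):
    """Empirical lower-tail alpha-quantile and Expected Shortfall.

    Uses a simple 'historical' estimator: the quantile is the k-th smallest
    return with k = ceil(alpha * n), and ES is the mean of the k smallest
    returns, a sample analogue of E[Y | Y <= q_alpha].
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    r = sorted(returns)
    if not r:
        raise ValueError("returns must be non-empty")
    k = max(1, math.ceil(alpha * len(r)))
    q = r[k - 1]           # empirical alpha-quantile (lower tail)
    es = sum(r[:k]) / k    # average of the returns at or below the quantile
    return q, es


# Usage: ten hypothetical returns, 20% lower tail.
q, es = quantile_and_es([12, -5, 0, 8, -3, 4, 10, -1, 6, 2], alpha=0.2)
```

Here the two smallest returns are -5 and -3, so the 20% quantile is -3 and the ES is their mean, -4. ES always lies at or below the quantile, which is why it is read as a measure of how bad losses are once the tail is reached.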

 

Statistics Seminar
Wednesday, September 21, 2022; 11:00am
Speaker Jianwei Chen, Associate Professor, San Diego State University Department of Mathematics and Statistics
Title Incorporating order restrictions in survey domain mean estimation and inference
Abstract

Recent work in survey domain estimation has shown that incorporating a priori assumptions about orderings of population domain means reduces the variance of the estimators, hence providing smaller confidence intervals with good coverage. The R package csurvey allows users to implement order and shape constraints using a design specified in the well-known survey package. A test for constant versus increasing domain means is implemented, with generalizations to other one-sided tests. A novel method for estimating means in domains for which the sample size is zero is proposed, with a conservative variance estimate and confidence interval, and the method is extended to estimation and inference in domains with sample size of ten or smaller. Several examples with well-known survey data sets show the utility of the methods.
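
The simplest form of "incorporating orderings of domain means" is weighted isotonic regression: replace the observed domain means with the closest non-decreasing sequence in weighted least squares. The classical pool-adjacent-violators algorithm (PAVA) solves this exactly. The sketch below is a hypothetical illustration only: it assumes plain positive weights (for example, domain sample sizes) and ignores the survey design, so it is not the csurvey package's implementation or its design-based variance estimation.

```python
def pava_increasing(means, weights):
    """Weighted pool-adjacent-violators algorithm.

    Returns the non-decreasing sequence theta that minimizes
    sum_j weights[j] * (means[j] - theta[j]) ** 2.
    """
    if len(means) != len(weights):
        raise ValueError("means and weights must have the same length")
    # Each block holds [pooled mean, total weight, number of domains pooled].
    blocks = []
    for m, w in zip(means, weights):
        blocks.append([float(m), float(w), 1])
        # Pool adjacent blocks while the ordering constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / total, total, c1 + c2])
    # Expand each pooled block back to one fitted value per domain.
    fitted = []
    for m, _, c in blocks:
        fitted.extend([m] * c)
    return fitted


# Usage: four domains whose observed means violate the assumed ordering.
constrained = pava_increasing([1.0, 3.0, 2.0, 4.0], [1, 1, 1, 1])
```

The out-of-order pair (3.0, 2.0) is pooled into its average 2.5, giving [1.0, 2.5, 2.5, 4.0]. Pooling neighboring domains in this way is one intuition for why order constraints can reduce the variance of domain estimates.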

 

Statistics Seminar
Wednesday, September 14, 2022; 11:00am
Speaker Dr. Xialu Liu, Associate Professor, SDSU Management Information Systems Department
Title Factor models for matrix-valued high-dimensional time series
Abstract

In finance, economics, and many other fields, observations in matrix form are often collected over time. For example, many economic indicators are obtained in different countries over time, and various financial characteristics of many companies are reported over time. Although it is natural to turn a matrix observation into a long vector and then use standard vector time series models or factor analysis, the columns and rows of a matrix often represent different sets of information that are closely interrelated in a very structural way. We propose a novel factor model that maintains and utilizes the matrix structure to achieve greater dimension reduction and to find clearer and more interpretable factor structures. The estimation procedure and its theoretical properties are investigated and demonstrated with simulated and real examples.

 

Statistics Seminar
Wednesday, September 7, 2022; 11:00am
Speaker Weining Shen, Associate Professor, UC Irvine Department of Statistics
Title Covariance estimation for matrix data analysis
Abstract

Matrix-valued data have received increasing interest in applications such as neuroscience, environmental studies, and sports analytics. In this talk, I will discuss a recent project on estimating the covariance of matrix data. Unlike previous works that rely heavily on the matrix normal distribution assumption and require a fixed matrix size, I will introduce a class of distribution-free regularized covariance estimation methods for high-dimensional matrix data under a separability condition and a bandable covariance structure. Computational algorithms, theoretical results, and applications will be discussed.