Statistics Seminar | |

Wednesday, November 9, 2022; 11:00am | |

Speaker | Dr. Faming Liang, Distinguished Professor, Department of Statistics, Purdue University |

Title | Adapting Deep Learning for Statistical Inference |

Abstract |
Deep learning has powered the recent developments of modern data science. However, from the perspective of statistical modeling, deep neural network (DNN) models suffer from many fundamental issues such as over-parameterization, local traps, and unquantifiable prediction uncertainty, making them hard to use for statistical inference. To address these issues, we introduce two techniques, sparse deep learning and stochastic neural networks, in this talk. For the former, we propose a frequentist-like method for learning sparse DNNs and justify its consistency under the Bayesian framework. The proposed method can learn a sparse DNN with at most $O(n/\log(n))$ connections and nice theoretical guarantees such as posterior consistency, nonlinear variable selection consistency, and quantifiable prediction uncertainty. For the latter, we show that a kernel-expanded stochastic neural network can avoid local traps and lead to accurate predictions with quantifiable uncertainty. We will also show that the stochastic neural network leads to a general nonlinear sufficient dimension reduction method for high-dimensional data. |

Mathematics Colloquium Distinguished Lecture Series | |

Monday, November 7, 2022; 4:00pm | |

Speaker | Dr. Mohamed Omar, Associate Professor, Department of Mathematics, Harvey Mudd College |

Title | How Many Cards Can Avoid a SET? |

Abstract |
SET is a popular real-time card game where players search for special triples of cards among a table of face-up cards. A common issue when playing the game is not having a SET among the face-up cards. What is the maximum number of cards that can be face-up while avoiding a SET? Surprisingly, this question is at the heart of a decades-old central problem in extremal combinatorics and additive number theory that saw a major breakthrough in 2017. In this talk, we describe the breakthrough and how the presenter used ideas from its development to make headway on a range of disparate problems in combinatorics. |
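To make the question concrete, here is a minimal sketch of the SET condition, assuming the standard 81-card deck in which each card has four attributes with three values each. Three cards form a SET when every attribute is either all the same or all different across the three cards. The encoding of cards as tuples over {0, 1, 2} and the names `DECK`, `is_set`, and `contains_set` are illustrative choices, not taken from the talk.

```python
from itertools import combinations, product

# Each card is a 4-tuple over {0, 1, 2}, one coordinate per attribute.
DECK = list(product(range(3), repeat=4))


def is_set(a, b, c):
    """Return True if every attribute is all-same or all-different.

    Over {0, 1, 2} this is equivalent to each coordinate summing to 0 mod 3.
    """
    return all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c))


def contains_set(cards):
    """Return True if any three of the given cards form a SET."""
    return any(is_set(a, b, c) for a, b, c in combinations(cards, 3))


# The 16 cards that use only the values 0 and 1 contain no SET.
binary_cards = [card for card in DECK if max(card) <= 1]
print(len(binary_cards), contains_set(binary_cards))
```

The 16-card example shows that a SET can be avoided with a fairly large table. For the standard deck the known maximum is 20 cards, and the general problem asks how this maximum grows as the number of attributes increases.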

Bio |
Dr. Mohamed Omar is an associate professor of mathematics and the Joseph B. Platt Chair in Effective Teaching at Harvey Mudd College. He has received national awards for his research, including being the inaugural recipient of the American Mathematical Society's Claytor-Gilmer Fellowship and an inaugural recipient of the Karen EDGE Fellowship, both celebrating mid-career research. He has also earned the Henry L. Alder Award, the preeminent junior faculty national prize given by the Mathematical Association of America. He is the author of over 30 peer-reviewed articles in internationally recognized journals, studying the interaction between algebra and combinatorics. |

Resources | Poster |

Statistics Seminar | |

Wednesday, October 19, 2022; 11:00am | |

Speaker | Dr. Jianwei Chen, Professor, SDSU Department of Mathematics and Statistics |

Title | Identifiability of compartment model for infectious diseases under both perfect and flawed data |

Abstract |
Compartment modeling has been used extensively in epidemiology to understand and predict infectious diseases. With increasing data availability, mathematical models fitted to incidence data are used to estimate key disease transmission parameters. During this process, an important question arises regarding model identifiability: whether parameters can be correctly and accurately recovered from the available data. In this talk, I will demonstrate the problems in incidence data accuracy using COVID-19 cases in Imperial Valley. Then, I use both a simple SEIR model and a complex eight-compartment model to demonstrate the impact of data type, data resolution, and the optimization tools used in parameter estimation on assessing model identifiability. |
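As a rough illustration of the kind of incidence data such models produce, the sketch below integrates the textbook SEIR equations with a forward-Euler scheme and records daily new cases. The function name `simulate_seir`, the parameter names `beta`, `sigma`, and `gamma`, and all numeric values are assumptions for illustration only. They are not the speaker's model, data, or estimation procedure.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Forward-Euler integration of the standard SEIR equations.

    Returns the daily incidence (flow from E to I accumulated over each
    day) and the final (S, E, I, R) state.
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r                      # total population, held constant
    steps_per_day = round(1 / dt)
    incidence = []
    for _ in range(days):
        new_cases = 0.0
        for _ in range(steps_per_day):
            infections = beta * s * i / n * dt   # S -> E
            onsets = sigma * e * dt              # E -> I
            recoveries = gamma * i * dt          # I -> R
            s -= infections
            e += infections - onsets
            i += onsets - recoveries
            r += recoveries
            new_cases += onsets
        incidence.append(new_cases)
    return incidence, (s, e, i, r)


# Hypothetical parameter values, chosen only to produce an outbreak curve.
incidence, final_state = simulate_seir(
    beta=0.5, sigma=0.2, gamma=0.1, s0=9990, e0=0, i0=10, r0=0, days=30
)
```

An identifiability study would then ask whether `beta`, `sigma`, and `gamma` can be recovered from a series like `incidence`, for example when it is aggregated weekly instead of daily or contaminated with reporting errors.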

Statistics Seminar | |

Wednesday, October 5, 2022; 11:00am | |

Speaker | Dr. Wenxin Zhou, Associate Professor, UCSD Department of Mathematics |

Title | Joint quantile and expected shortfall regression: a robust approach |

Abstract |
Expected Shortfall (ES), also known as superquantile or conditional Value-at-Risk, has been recognized as an important measure in risk analysis and stochastic optimization, and also finds applications beyond these areas. In finance, it refers to the conditional expected return of an asset given that the return is below some quantile of its distribution. In this work, we consider a recently proposed joint regression framework that simultaneously models the quantile and the ES of a response variable given a set of covariates, for which the state-of-the-art approach is based on minimizing a joint loss function that is non-differentiable and non-convex. This inevitably raises numerical challenges, and thus limits its applicability for analyzing large-scale data. Motivated by the idea of using Neyman-orthogonal scores to reduce sensitivity with respect to nuisance parameters, we propose a statistically robust (to highly skewed and/or heavy-tailed data) and computationally efficient two-step procedure for fitting joint quantile and ES regression models. Under increasing-dimensional settings, we establish explicit non-asymptotic bounds on estimation and Gaussian approximation errors, which lay the foundation for statistical inference of ES regression. |
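The finance definition above can be sketched for a plain sample of returns, with no covariates. This is only the unconditional empirical version, not the joint regression or the two-step procedure proposed in the talk. The function name `empirical_quantile_and_es` and the convention of averaging the k smallest observations, with k the ceiling of alpha times the sample size, are assumptions made for illustration.

```python
import math


def empirical_quantile_and_es(returns, alpha):
    """Empirical lower alpha-quantile and expected shortfall of a sample.

    The quantile is taken as the k-th smallest observation, with
    k = ceil(alpha * n), and ES is the mean of the k smallest observations.
    """
    ordered = sorted(returns)
    k = max(1, math.ceil(alpha * len(ordered)))
    tail = ordered[:k]
    return tail[-1], sum(tail) / k


# Toy sample of 20 returns: -10, -9, ..., 9.
quantile, es = empirical_quantile_and_es(range(-10, 10), alpha=0.25)
print(quantile, es)  # -6 -8.0
```

Because ES averages everything in the tail below the quantile, it is never larger than the quantile itself and is more sensitive to extreme losses, which is one reason robustness to heavy-tailed data matters for ES regression.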

Statistics Seminar | |

Wednesday, September 21, 2022; 11:00am | |

Speaker | Jianwei Chen, Associate Professor, San Diego State University Department of Mathematics and Statistics |

Title | Incorporating order restrictions in survey domain mean estimation and inference |

Abstract |
Recent work in survey domain estimation has shown that incorporating a priori assumptions about orderings of population domain means reduces the variance of the estimators, hence providing smaller confidence intervals with good coverage. The R package csurvey allows users to implement order and shape constraints using a design specified in the well-known survey package. A test for constant versus increasing domain means is implemented, with generalizations to other one-sided tests. A novel method for estimating means in domains for which the sample size is zero is proposed, with a conservative variance estimate and confidence interval, and the method is extended to estimation and inference in domains with sample size of ten or smaller. Several examples with well-known survey data sets show the utility of the methods. |

Statistics Seminar | |

Wednesday, September 14, 2022; 11:00am | |

Speaker | Dr. Xialu Liu, Associate Professor, SDSU Management Information Systems Department |

Title | Factor models for matrix-valued high-dimensional time series |

Abstract |
In finance, economics and many other fields, observations in a matrix form are often observed over time. For example, many economic indicators are obtained in different countries over time. Various financial characteristics of many companies are reported over time. Although it is natural to turn a matrix observation into a long vector and then use standard vector time series models or factor analysis, it is often the case that the columns and rows of a matrix represent different sets of information that are closely interrelated in a very structural way. We propose a novel factor model that maintains and utilizes the matrix structure to achieve greater dimensional reduction as well as finding clearer and more interpretable factor structures. The estimation procedure and its theoretical properties are investigated and demonstrated with simulated and real examples. |

Statistics Seminar | |

Wednesday, September 7, 2022; 11:00am | |

Speaker | Weining Shen, Associate Professor, UC Irvine Department of Statistics |

Title | Covariance estimation for matrix data analysis |

Abstract |
Matrix-valued data has received increasing interest in applications such as neuroscience, environmental studies and sports analytics. In this talk, I will discuss a recent project on estimating the covariance of matrix data. Unlike previous works that rely heavily on the matrix normal distribution assumption and the requirement of fixed matrix size, I will introduce a class of distribution-free regularized covariance estimation methods for high-dimensional matrix data under a separability condition and a bandable covariance structure. Computational algorithms, theoretical results, and applications will be discussed. |

Statistics Seminar | |

Wednesday, April 27, 2022; 11:00am | |

Speaker | Juanjuan Fan, Professor, SDSU Department of Mathematics and Statistics |

Title | Repeated measures random forests: identifying factors associated with nocturnal hypoglycemia |

Abstract |
Nocturnal hypoglycemia is a common phenomenon among patients with diabetes and can lead to a broad range of adverse events and complications. Identifying factors associated with hypoglycemia can improve glucose control and patient care. We propose a repeated measures random forest (RMRF) method that can handle nonlinear relationships and interactions, and the correlated responses from patients evaluated over several nights. RMRF is an extension of the acceptance-rejection (AR) trees algorithm (Calhoun et al., 2019), a random forest variation. As an aside, two other random forest (RF) variations will also be introduced, including extremely randomized (ER) trees (Geurts et al., 2006) and smooth sigmoid surrogate (SSS) trees (Su et al., 2018), and the four RF algorithms (RF, AR, ER, and SSS) will be compared in terms of algorithm differences, prediction accuracy, variable selection bias, and computation time. We apply the RMRF algorithm to analyze a diabetes study with 2524 nights from 127 patients with type 1 diabetes. We find that nocturnal hypoglycemia is associated with HbA1c, bedtime blood glucose (BG), insulin on board, time system activated, exercise intensity, and daytime hypoglycemia. The RMRF can accurately classify nights at high risk of nocturnal hypoglycemia. |

Statistics Seminar | |

Wednesday, April 6, 2022; 11:00am | |

Speaker | David Goldberg, Assistant Professor, SDSU Management and Information Systems |

Title | Leveraging online reviews for product safety surveillance |

Abstract |
Product safety concerns pose enormous risks to consumers across the world. As discussions of consumer experiences have spread online, this research proposes the use of text mining to rapidly screen online media for mentions of safety hazards. The text mining approach in this research identifies unique words and phrases, or “smoke terms,” in online posts that indicate safety hazard-related discussions. In addition, this research shows that text mining-based risk can be analyzed on the product level, ranking products from most risky to least risky. The research has implications for monitoring the potential safety hazards in products already on the market and reacting to potential issues as quickly as possible. |

Statistics Seminar | |

Wednesday, March 16, 2022; 11:00am | |

Speaker | Mark Huber, Professor, Claremont McKenna College |

Title | Robust estimators for Monte Carlo data |

Abstract |
Data coming from Monte Carlo experiments is often analyzed in the same way as data from more traditional sources. The unique nature of Monte Carlo data, where it is easy to take a random number of samples, allows for estimators where the user can control the relative error of the estimate much more precisely than with classical approaches. In this talk I will discuss three such estimators useful in different problems. The first is a user-specified-relative-error (USRE) estimate for the mean of a Bernoulli random variable. This allows us to obtain exact error results while using slightly fewer samples than the CLT approximation. The second is more general, applying to any random variable where a bound on the relative error is known. For this problem we give exact error bounds using a number of samples that is the same (to first order) as the CLT approximation requires. In other words, the new algorithm is the equivalent of always actually having normal data. Finally, we look at the problem of data with unknown variance and develop an algorithm that runs very close to the minimum number of samples established using results of Wald. |

Mathematics Colloquium Distinguished Lecture Series | |

Monday, March 7, 2022; 4:00pm | |

Speaker | Mark Alber, Department of Mathematics, UC Riverside |

Title | Combined multiscale mathematical modeling and experimental analysis suggests possible mechanism of shoot meristem maintenance in plants |

Abstract | Stem cell maintenance in multilayered shoot apical meristems (SAMs) of plants requires strict regulation of cell growth and division. Exactly how the complex milieu of chemical (WUSCHEL and cytokinin) and mechanical signals interact to determine cell division plane orientation and the shape of the SAM is not well understood. By using a newly developed mathematical model, combined with experiments, three hypothesized mechanisms have been tested for the regulation of cell division plane orientation as well as of cell expansion in the deeper SAM cell layers. Simulations predict that in the apical cell layers, WUSCHEL and cytokinin regulate the direction of anisotropic cell expansion, and cells divide according to tensile stress. In the basal cell layers, simulations also show dual roles for WUSCHEL and cytokinin in regulating both cell division plane orientation and the direction of anisotropic expansion. This layer-specific mechanism maintains the experimentally observed shape and structure of the SAM as well as the distribution of WUSCHEL in the tissue [1]. Moreover, by using a dynamical signaling model, an additional mechanism underlying the robust maintenance of the WUSCHEL gradient through its negative regulator has been identified. A sensitivity analysis and a perturbation study were performed to show the validity of the mechanism across different parameter ranges [2]. Currently, a coupled computational framework is being developed by integrating submodels representing a dynamical signaling network and cell mechanics to explore how the WUSCHEL expression domain and the tissue structure are maintained throughout growth. |

Bio | Professor Mark Alber earned his Ph.D. in mathematics at the University of Pennsylvania under the direction of J. E. Marsden (UC Berkeley and Caltech). He held several positions at the University of Notre Dame, including most recently the Vincent J. Duncan Family Chair in Applied Mathematics. He is currently Distinguished Professor in the Department of Mathematics and Director of the Center for Quantitative Modeling in Biology, UC Riverside. Dr. Alber was elected a Fellow of the American Association for the Advancement of Science (AAAS) in 2011. He is currently a deputy editor of PLoS Computational Biology and a member of the editorial boards of Bulletin of Mathematical Biology and Biophysical Journal. His research interests include mathematical and computational multiscale modeling of blood clot formation, plant development and growth, and epithelial tissue growth. |

Resources | Poster |

Statistics Seminar | |

Friday, January 28, 2022; 11:00am | |

Speaker | Hajar Homayouni, Assistant Professor, Computer Science, San Diego State University |

Title | Anomaly detection and explanation in big data |

Abstract |
Data quality tests are used to validate the data stored in databases and data warehouses, and to detect violations of syntactic and semantic constraints. Domain experts grapple with the issues related to the capturing of all the important constraints and checking that they are satisfied. The constraints are often identified in an ad hoc manner based on the knowledge of the application domain and the needs of the stakeholders. Constraints can exist over single or multiple attributes as well as records involving time series and sequences. The constraints involving multiple attributes can involve both linear and non-linear relationships among the attributes. We propose ADQuaTe as a data quality test framework that automatically (1) discovers different types of constraints from the data, (2) marks records that violate the constraints as suspicious, and (3) explains the violations. Domain knowledge is required to determine whether or not the suspicious records are actually faulty. The framework can incorporate feedback from domain experts to improve the accuracy of constraint discovery and anomaly detection. We instantiate ADQuaTe in two ways to detect anomalies in non-sequence and sequence data. The first instantiation (ADQuaTe2) uses an unsupervised autoencoder for constraint discovery in non-sequence data. ADQuaTe2 is based on analyzing records in isolation to discover constraints among the attributes. We evaluate the effectiveness of ADQuaTe2 using real-world non-sequence datasets from the human health and plant diagnosis domains. We demonstrate that ADQuaTe2 can discover new constraints that were previously unspecified in existing data quality tests and can report both previously detected and new faults in the data. We also use non-sequence datasets from the UCI repository to evaluate the improvement in the accuracy of ADQuaTe2 after incorporating ground truth knowledge and retraining the autoencoder model.
The second instantiation (IDEAL) uses an unsupervised LSTM-autoencoder for constraint discovery in sequence data. IDEAL analyzes the correlations and dependencies among data records to discover constraints. We evaluate the effectiveness of IDEAL using datasets from Yahoo servers, NASA Shuttle, and the Colorado State University Energy Institute. We demonstrate that IDEAL can detect previously known anomalies from these datasets. Using mutation analysis, we show that IDEAL can detect different types of injected faults. We also demonstrate that the accuracy of the approach improves after incorporating ground truth knowledge about the injected faults and retraining the LSTM-autoencoder model. The novelty of this research lies in the development of a domain-independent framework that effectively and efficiently discovers different types of constraints from the data, detects and explains anomalous data, and minimizes false alarms through an interactive learning process. |