Current Projects

Reliability for Natural Language Processing

Overview : Developed reliability methodologies for assessing natural language processing methods.

  • Developed a suite of three metrics for evaluating topic model reliability that outperform the standard-practice method. Identified an optimal metric and demonstrated its importance in downstream applications.
  • Currently developing reliability metrics for large language models.

Causal Inference for Natural Language Processing

Overview : Developed a causally sufficient dimension reduction methodology for text data.

  • Currently, when topic models are used to understand text, the generated topics are interpreted manually by the user, and any outcomes predicted by the text are modeled with a regression. This new methodology instead supports interpretable causal conclusions about topics and their predicted outcomes through a dimension reduction process. No comparable methodology currently exists.
  • In addition, this methodology is useful when text is the treatment, a complex and multifaceted scenario that existing methodologies do not handle well.

Past Projects

Experimental Design

Overview : Created a design-comparable effect size for the single-case alternating treatment design.

  • Developed estimators of the effect size parameters and analytically derived approximate small-sample properties of the estimators using quadratic forms.
  • Evaluated properties of the developed estimators using multilevel simulation models.
  • Created visualizations using trellis plots to communicate the results to a non-technical audience.

Modeling Brain Imaging Data

Overview : Developed the modeling and analysis components of a statistical analysis package for software that largely automated the processing of human brain MRI data.

  • Created mixed modeling capabilities for tensor-based morphometry (TBM), diffusion parameter map, and region-of-interest (ROI) analyses.
  • Optimized mixed modeling processes to reduce computational intensity.
  • Implemented a variety of other statistical analyses and tests for TBM, diffusion parameter maps, and ROIs.
  • Created outputs in R Shiny to make the results easier for users to interpret.

Modeling Atmospheric Data Models

Overview : Determined which existing models for predicting hazardous release movement perform best, and uncovered patterns among outlying model types.

  • Clustered the outliers using k-means to determine which categories of models produced the most and least effective predictions.
  • Uncovered outliers using Local Outlier Factor methodology.
  • Determined existing patterns among the outlying model types.