QML Website: AERA/NCME 2019

Evaluation of Covariate Balance as a Quality Indicator for Propensity Score Matching Analysis
- Kamata, A., Gallegos, E., Patarapichayatham, C., & Kara, Y.
  - Propensity score analysis is becoming increasingly popular in educational research. Examining covariate balance is considered crucial for justifying the quality of propensity score analysis results. However, it has been pointed out that considering only how well covariates balance after matching may not be enough to justify the quality of the results: suitable covariate balance may still yield biased estimates of treatment effects. The current study aims to demonstrate this problem systematically through a series of simulation studies that investigate the effects of the number of covariates, the types of covariates, and the degree of association between covariates and treatment and/or outcome in the propensity score model.
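As a rough illustration of the kind of balance check the abstract above refers to, the sketch below computes an absolute standardized mean difference (SMD) for one covariate in a matched sample. The SMD is a commonly used balance diagnostic; the abstract does not say which diagnostic the authors used, so this choice, the function name, the data values, and the 0.1 cutoff are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Absolute standardized mean difference for a single covariate.

    Uses the square root of the average of the two sample variances
    as the standardizer.
    """
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return abs(mean(treated) - mean(control)) / pooled_sd


# Hypothetical matched-sample values of one covariate (illustration only).
treated = [52.0, 48.0, 50.0, 55.0, 45.0]
control = [51.0, 47.0, 50.0, 54.0, 46.0]

smd = standardized_mean_difference(treated, control)

# 0.1 is a common rule-of-thumb cutoff, not a value taken from the abstract.
balanced = smd < 0.1
```

A small SMD only describes the covariates that were measured and checked. As the abstract argues, it does not by itself guarantee an unbiased treatment effect estimate.
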
Hierarchical Bayesian Approach to Estimate Interventional Effect in Randomized Control Trials
- Liang, X., Kamata, A., Sirganci, G., & Li, J.
  - We propose a framework to construct hyper priors for both the mean and variance hyperparameters for estimating interventional effects in a two-group randomized control trial. One important issue in Bayesian estimation is the determination of an effective informative prior, especially with small sample sizes. With hierarchical Bayesian modeling, the uncertainty in the hyperparameters of a prior can be further modeled via their own priors, namely, hyper priors. The performance of hierarchical Bayesian models was compared with that of empirical Bayesian models in which hyperparameters were constants. In a preliminary analysis, the hierarchical Bayesian approach showed promising improvement in posterior inferences compared to the empirical Bayesian approach.
Evaluation of the Utility of Informative Priors in SEM with Small Samples
- Ma, H., Kamata, A., & Kara, Y.
  - This study evaluates the performance of different estimators of factor loadings and structural coefficients in terms of bias, RMSE, and SE for factor analysis and SEM models under the ML and Bayesian frameworks in small-sample settings. Simulation conditions varied in sample size, mean factor loading, priors, and estimators.
Power Analysis for Moderated Mediation Models with Continuous Moderator Variable
- Kamata, A., Kara, Y., Liang, X., & Le, N.
  - Although a number of studies have guided practitioners in performing power analyses for mediation models, few have focused on moderated mediation models. Those that demonstrated power analysis for moderated mediation effects considered only categorical moderator variables with very few categories. This study aims to fill these gaps by demonstrating power analysis for moderated mediation models with continuous moderator variables so that practitioners can implement such analyses.
Quantifying Reliability for Oral Reading Fluency Assessment using the Grubbs Model
- Potgieter, C., & Kamata, A.
  - This study proposes applying the Grubbs model to estimate oral reading fluency more accurately. We demonstrate that the Grubbs model makes it possible to quantify measurement error variance, which in turn determines optimal weights for combining passage-level data and improving the fluency score for each student.
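To make the idea of weights determined by measurement error variance concrete, here is a minimal sketch that combines passage-level scores with inverse-variance weights. Inverse-variance weighting is the standard minimum-variance way to combine independent, unbiased measurements; the abstract does not state that this is the exact weighting the authors derive, so treat it as an assumption. The error variances and scores are hypothetical, and estimating the variances with the Grubbs model is not shown.

```python
def inverse_variance_weights(error_variances):
    """Return weights proportional to 1 / variance, normalized to sum to 1."""
    precisions = [1.0 / v for v in error_variances]
    total = sum(precisions)
    return [p / total for p in precisions]


def combine_scores(scores, weights):
    """Weighted average of passage-level scores for one student."""
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical measurement error variances for three passages
# (in practice these would come from a fitted model; not shown here).
error_variances = [4.0, 8.0, 8.0]

# Hypothetical passage-level fluency scores for one student.
passage_scores = [100.0, 110.0, 90.0]

weights = inverse_variance_weights(error_variances)
combined = combine_scores(passage_scores, weights)
```

The passage with the smallest error variance receives the largest weight, so noisier passages contribute less to the combined score.
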
Classification Predictors for Reading Fluency in Oral Reading Fluency Assessment
- Kamata, A., Kara, Y., Patarapichayatham, C., & Le, N.
  - This study investigates potential predictors of reading fluency in student oral reading fluency assessment passages. We derive a number of variables from word-level reading time, silence time, and accuracy. Then, we fit machine learning algorithms to explore how the derived variables classify readers into groups of different levels of fluency.
DIF Analysis for Immigrant Status for the 2015 PISA Science Items
- Usta, G., & Kamata, A.
  - The purpose of this study is to use Differential Item Functioning (DIF) analysis to investigate differences in the performance of immigrant and non-immigrant students on 48 cognitive science items for the U.S. sample in the 2015 PISA. According to the results, all items displayed level-A DIF, indicating a negligible DIF effect.
The Use of Automated Scoring to Evaluate the In-Depth Vocabulary Knowledge of English Learners
- Baker, D. L., Sano, M., Le, N., Collazo, M., & Kamata, A.
  - The purpose of this paper is to (a) test a new automated system developed to score the hand-transcribed speech data of second-grade Hispanic students, and (b) compare the accuracy of two machine learning techniques. Speech from 217 Hispanic English Learners who were part of a larger study was analyzed using support vector machine (SVM) and tree-based regression (TBR). Findings indicate that the reliability of the automated scoring systems was comparable to that of human scoring, and that, when comparing SVM and TBR, the latter appeared to yield a higher Quadratic Weighted Kappa. Implications of this study are discussed in the context of finding new ways to analyze student natural speech.
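The Quadratic Weighted Kappa mentioned in the abstract above measures agreement between two sets of ordinal ratings while penalizing larger disagreements more heavily. The sketch below is a plain-Python version of the usual definition; the function name and the example ratings are hypothetical and are not taken from the study.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_categories):
    """Quadratic weighted kappa for ratings coded 0 .. num_categories - 1."""
    k = num_categories
    n = len(rater_a)

    # Observed agreement matrix: observed[i][j] counts items rated i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two raters.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n

    return 1.0 - weighted_observed / weighted_expected


# Hypothetical human and automated scores on a 0-2 scale (illustration only).
human_scores = [0, 1, 2, 2, 1, 0]
machine_scores = [0, 1, 2, 1, 1, 0]

kappa = quadratic_weighted_kappa(human_scores, machine_scores, num_categories=3)
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance. The function assumes at least two different categories appear in the ratings; otherwise the expected disagreement is zero and the ratio is undefined.
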
School Leader Instructional Support and Change in Novice Teachers’ Efficacy
- Wilhelm, A. G., Woods, D. M., & Kara, Y.
  - Novice teachers’ self-efficacy provides a lens through which we might understand how their self-perceptions of competence relate to teaching practices and retention. A number of cross-sectional studies have reported relations between school leader support, especially instructional leadership, and teacher self-efficacy. In this study, we examine change in novice teachers’ self-efficacy using growth models within the structural equation modeling framework and relate change in self-efficacy to change in teachers’ perceptions of school leader instructional support over time. Our results add support to the notion that school leaders’ instructional leadership is related to teachers’ self-efficacy and that instructional leadership can actually support growth in self-efficacy, at least in the case of first-year teachers.
Impacts of Measurement Error in Pretest on Treatment Effect Estimate in Pretest-and-Post Design Analysis
- Miyazaki, Y., Kamata, A., & Uekawa, K.
  - In program evaluation, accurately estimating the treatment effect is of utmost importance. To achieve this goal, random assignment is desirable, but in reality randomization can be imperfect, in which case preexisting differences will exist between treatment conditions. Furthermore, there will always be measurement error in the pretest. These two factors can create serious challenges for evaluators trying to gauge the treatment effect accurately, and their impacts are not well known. Thus, in this paper, we address how much bias occurs in estimating the treatment effect when random assignment is imperfect and the pretest contains measurement error in evaluation studies that use the pretest-and-posttest design.
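For intuition about how the two factors in the abstract above interact, the sketch below encodes a standard large-sample result for a pretest-adjusted (ANCOVA-style) analysis under classical measurement error: if the outcome depends on the true pretest with slope β, the groups differ on the true pretest by Δ, and the within-group reliability of the observed pretest is λ, the treatment effect estimate is off by approximately β(1 − λ)Δ. This is a textbook approximation offered as an assumption, not the authors' derivation, and all numeric values are hypothetical.

```python
def pretest_adjustment_bias(true_slope, group_difference, within_group_reliability):
    """Approximate large-sample bias of a pretest-adjusted treatment effect.

    true_slope: effect of the true (error-free) pretest on the posttest.
    group_difference: treatment-minus-control difference in true pretest means.
    within_group_reliability: share of within-group observed pretest variance
        that is true-score variance (between 0 and 1).
    """
    return true_slope * (1.0 - within_group_reliability) * group_difference


# Hypothetical values for illustration only.
bias = pretest_adjustment_bias(
    true_slope=0.8,
    group_difference=2.0,
    within_group_reliability=0.75,
)
```

Under this approximation the bias vanishes when either the pretest is perfectly reliable or the groups are equivalent at pretest, which is why the combination of imperfect randomization and pretest measurement error is the problematic case.
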
Evaluation of the Education for Sustainability in Galapagos Program
- Rahim, H., Luna, H., & Kamata, A.
  - Any program that seeks to improve itself as well as prove its outcomes benefits from a holistic evaluation approach. As the primary research and evaluation arm of the Education for Sustainability in Galapagos project, we ask and answer questions about how the program is going and try to understand what changes, if any, occur in educators’ instructional practice and attitudes on specific constructs (e.g., grit, growth mindset, peer collaboration). In doing so, we also examine fidelity of implementation and dosage of intervention to identify the key implementation factors that might lead to these potential outcome changes. To address the needs of myriad stakeholders and implementers, the evaluation of this program uses a blend of developmental, empowerment, formative, process, and outcome evaluation approaches, yielding a comprehensive evaluation that serves different needs and purposes. For the purposes of this presentation, the focus will be on formative, process, and outcome data.