  • Confidence Interval Width Contours: Sample Size Planning for Linear Mixed-Effects Models

    Subjects: Psychology >> Statistics in Psychology submitted time 2023-10-07

    Abstract: Hierarchical data, observed frequently in psychological experiments, are usually analyzed with linear mixed-effects models (LMEMs), which can simultaneously account for multiple sources of random effects due to participants, items, and/or predictors. However, it remains unclear how to determine the sample size and the number of trials in LMEMs. Historically, sample size planning was conducted purely on the basis of power analysis. Later, the influential article of Maxwell et al. (2008) made clear that sample size planning should consider statistical power and accuracy in parameter estimation (AIPE) simultaneously. In this paper, we derive a confidence interval width contour plot, together with the code to generate it, that provides power and AIPE information simultaneously. With this plot, sample size requirements in LMEMs can be determined based on both power and AIPE criteria. We also demonstrate how to run sensitivity analyses to assess the impact of the magnitude of the experimental effect size and of the random slope variance on statistical power, AIPE, and the results of sample size planning.
    There were two sets of sensitivity analyses based on different LMEMs. Sensitivity analysis I investigated how the experimental effect size influenced power, AIPE, and the sample size requirement for a within-subject experimental design, while sensitivity analysis II investigated the impact of the random slope variance on the optimal sample size, based on power and AIPE analysis, for the cross-level interaction effect. The results for binary and continuous between-subject variables were compared. In these sensitivity analyses, two sample size factors were varied: the number of subjects (I = 10, 30, 50, 70, 100, 200, 400, 600, 800) and the number of trials (J = 10, 20, 30, 50, 70, 100, 150, 200, 250, 300). The additional manipulated factor was the experimental effect size (standardized coefficient of the experimental condition = 0.2, 0.5, 0.8, in sensitivity analysis I) or the magnitude of the random slope variance (0.01, 0.09, and 0.25, in sensitivity analysis II). A random slope model was used in sensitivity analysis I, while a random slope model with a level-2 independent variable was used in sensitivity analysis II. The data-generating model and the fitted model were the same. Estimation performance was evaluated in terms of convergence rate, power, AIPE for the fixed effect, AIPE for the standard error of the fixed effect, and AIPE for the random effect.
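    As a concrete illustration, the following is a minimal R sketch (not the authors' code) of the data-generating model for sensitivity analysis I: a random-intercept, random-slope LMEM with a dummy-coded within-subject condition. The function name and the variance defaults are illustrative assumptions; only beta1 and the random slope variance correspond to the manipulated factors described above.

```r
## Minimal sketch of the sensitivity-analysis-I data-generating model.
## simulate_lmem() and the variance defaults are illustrative assumptions.
library(lme4)

simulate_lmem <- function(I = 30, J = 20, beta1 = 0.5,
                          var_int = 0.2, var_slope = 0.09, var_res = 1) {
  subj <- rep(seq_len(I), each = J)
  cond <- rep(rep(c(0, 1), length.out = J), times = I)  # within-subject condition
  u0   <- rnorm(I, 0, sqrt(var_int))                    # random intercepts
  u1   <- rnorm(I, 0, sqrt(var_slope))                  # random slopes
  y    <- u0[subj] + (beta1 + u1[subj]) * cond + rnorm(I * J, 0, sqrt(var_res))
  data.frame(subj = factor(subj), cond = cond, y = y)
}

d   <- simulate_lmem()
fit <- lmer(y ~ cond + (1 + cond | subj), data = d)  # fitted model = data-generating model
confint(fit, parm = "beta_", method = "Wald")        # Wald 95% CI for the fixed effects
```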
    The results are as follows. First, there were no convergence problems under any condition, except when the variance of the random slope was small and a maximal model was used to fit the data. Second, power increased as sample size, number of trials, or effect size increased. However, the number of trials played a key role for the power of the within-subject effect, while sample size was more important for the power of the cross-level effect. Power was larger for the continuous between-subject variable than for the binary between-subject variable. Third, although the fixed effect was accurately estimated under all simulation conditions, the width of the 95% confidence interval (95% width) was extremely large under some conditions. Lastly, AIPE for the random effects increased as sample size and/or number of trials increased. The residual variance was estimated accurately. As the random slope variance increased, the accuracy of the estimates of the random intercept variance decreased, while the accuracy of the estimates of the random slope variance increased.
    In conclusion, if sample size planning were conducted solely on the basis of power analysis, the chosen sample size might not be large enough to obtain accurate estimates of effect sizes. Therefore, the rationale of considering statistical power and AIPE together during sample size planning was adopted. To shed light on this issue, this article provides a standard procedure, based on a confidence interval width contour plot, for recommending the sample size and number of trials when using LMEMs. This plot visualizes the combined effect of sample size and number of trials per participant on the 95% width, power, and AIPE for the random effects. Based on this tool and other empirical considerations, practitioners can make informed choices about how many participants to test, and how many trials to administer to each.
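    A rough R sketch of how such a contour plot could be computed, reusing the hypothetical simulate_lmem() helper from the sketch above; the grid and the number of replications are kept tiny here, and power is approximated by whether the Wald 95% CI excludes zero:

```r
## For each (I, J) cell, simulate a few data sets, record the mean 95% CI
## width and the rejection rate for the fixed effect, then draw contours.
Is <- c(10, 30, 50, 100); Js <- c(10, 30, 50, 100); nrep <- 20
width <- matrix(NA, length(Is), length(Js))
power <- matrix(NA, length(Is), length(Js))

for (a in seq_along(Is)) for (b in seq_along(Js)) {
  res <- replicate(nrep, {
    fit <- lmer(y ~ cond + (1 + cond | subj),
                data = simulate_lmem(I = Is[a], J = Js[b]))
    ci <- confint(fit, parm = "beta_", method = "Wald")["cond", ]
    c(diff(ci), ci[1] > 0 || ci[2] < 0)  # CI width; CI excludes 0 -> "significant"
  })
  width[a, b] <- mean(res[1, ]); power[a, b] <- mean(res[2, ])
}

contour(Is, Js, width, xlab = "Number of subjects (I)",
        ylab = "Number of trials (J)", main = "95% CI width contours")
contour(Is, Js, power, add = TRUE, col = "blue", lty = 2)  # overlaid power contours
```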
     

  • Comparison of standardized-residual-based methods and the mixture hierarchical model for handling non-effortful responses

    Subjects: Psychology >> Social Psychology submitted time 2023-03-27 Cooperative journals: 《心理学报》

    Abstract: Assessment datasets contaminated by non-effortful responses may lead to serious consequences if not handled appropriately. Previous research has proposed two different strategies: down-weighting and accommodating. Down-weighting tries to limit the influence of aberrant responses on parameter estimation by reducing their weight. The extreme form of down-weighting is the detection and removal of irregular responses and response times (RTs). The standard-residual-based methods, including the recently developed residual method using an iterative purification process, can be used to detect non-effortful responses within the down-weighting framework. In accommodating, on the other hand, one tries to extend a model so as to account for the contamination directly. This boils down to a mixture hierarchical model (MHM) for responses and RTs. However, to the authors' knowledge, few studies have compared the standard residual methods and MHM under different simulation conditions, so it is unknown which method should be applied in which situation. Meanwhile, MHM makes strong assumptions about the different types of responses, and it is valuable to examine its performance when those assumptions are violated. The purpose of this study is to compare the standard residual methods and MHM under a fully crossed simulation design, and to provide specific recommendations for their application. The simulation study included two scenarios. In simulation scenario I, data were generated under the assumptions of MHM. In simulation scenario II, the assumptions of MHM concerning non-effortful responses and RTs were both violated. Simulation scenario I had three manipulated factors. (1) Non-effort prevalence ($\pi$), the proportion of individuals with non-effortful responses, with three levels: 0%, 20%, and 40%. (2) Non-effort severity ($\pi_i^{non}$), the proportion of non-effortful responses for each non-effortful individual, with two levels: low and high. When $\pi_i^{non}$ was low, it was generated from $U(0, 0.25)$; when $\pi_i^{non}$ was high, it was generated from $U(0.5, 0.75)$, where $U$ denotes a uniform distribution. (3) The difference between the RTs of non-effortful and effortful responses ($d_{RT}$), with two levels: small and large. The log-RTs of non-effortful responses were generated from a normal distribution $N(\mu, 0.5^2)$, where $\mu = -1$ when $d_{RT}$ was small and $\mu = -2$ when $d_{RT}$ was large. For generating the non-effortful responses, we followed Wang, Xu, and Shang (2018), with the probability of a correct response $g_j$ set at 0.25 for all non-effortful responses. In simulation scenario II, only the first two factors were considered. Non-effortful RTs were generated from a uniform distribution with a lower bound of $\exp(-5)$ and an upper bound equal to the 5th percentile of the RT on item $j$ with $\tau = 0$. The probability of a correct response for non-effortful responses depended on the ability level of each examinee. In all conditions, the sample size was fixed at I = 2,000 and the test length was fixed at J = 30. For each condition, 30 replications were generated.
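    A minimal R sketch of this contamination step under scenario I (the function name and return format are illustrative assumptions; effortful cells are left NA here, to be filled in from the hierarchical model described below):

```r
## Scenario-I contamination: draw non-effortful persons with probability
## `prevalence` (pi), per-person rates pi_i^non from U(0, .25) or U(.5, .75),
## log-RTs from N(mu, 0.5^2), and correct responses with g_j = 0.25.
simulate_noneffort <- function(I = 2000, J = 30, prevalence = 0.20,
                               severity = c("low", "high"), mu = -1) {
  severity <- match.arg(severity)
  lohi  <- if (severity == "low") c(0, 0.25) else c(0.5, 0.75)
  nonef <- runif(I) < prevalence                         # non-effortful persons
  p_non <- ifelse(nonef, runif(I, lohi[1], lohi[2]), 0)  # pi_i^non per person
  flag  <- matrix(runif(I * J), I, J) < p_non            # non-effortful responses
  logRT <- matrix(NA_real_, I, J)
  logRT[flag] <- rnorm(sum(flag), mean = mu, sd = 0.5)   # contaminated log-RTs
  resp  <- matrix(NA_integer_, I, J)
  resp[flag] <- rbinom(sum(flag), 1, 0.25)               # random guessing
  list(flag = flag, logRT = logRT, resp = resp)
}
```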
For effortful responses, responses and RTs were simulated from van der Linden's (2007) hierarchical model. Item parameters were generated with $a_j \sim U(1, 2.5)$, $b_j \sim N(0, 1)$, $\alpha_j \sim U(1.5, 2.5)$, and $\beta_j \sim U(-0.2, 0.2)$. For simulees, the person parameters $(\theta_i, \tau_i)$ were generated from a bivariate normal distribution with mean vector $\boldsymbol{\mu} = (0, 0)'$ and covariance matrix $\boldsymbol{\Sigma} = \begin{bmatrix} 1 & 0.25 \\ 0.25 & 0.25 \end{bmatrix}$. Four methods were compared under each condition: the original standard residual method (OSR), the conditional-estimate standard residual method (CSR), the conditional-estimate standard residual method with fixed item parameters and an iterative purification procedure (CSRI), and MHM. These methods were implemented in R and JAGS, using Bayesian MCMC sampling for parameter calibration, and were evaluated in terms of convergence rate, detection accuracy, and parameter recovery. The results are as follows. First, MHM suffered from convergence issues, especially for the latent variable indicating non-effortful responses, whereas all the standard residual methods converged successfully; the convergence issues were more serious in simulation scenario II. Second, when all items were assumed to have effortful responses, the false positive rate (FPR) of MHM was 0. Although the standard residual methods had an FPR of around 5% (the nominal level), the accuracy of the parameter estimates was similar across all methods. Third, when the data were contaminated by non-effortful responses, CSRI had the higher true positive rate (TPR) in almost all conditions. MHM showed a lower TPR but also a lower false discovery rate (FDR), and its TPR was even lower in simulation scenario II. When $\pi_i^{non}$ was high, CSRI and MHM showed greater advantages over the other methods in terms of parameter recovery; however, when $\pi_i^{non}$ was high and $d_{RT}$ was small, MHM generally had a higher RMSE than CSRI. Compared with simulation scenario I, MHM performed worse in simulation scenario II. The only problem CSRI had to deal with was its overestimation of the time discrimination parameter, which occurred in all conditions except when $\pi$ = 40% and $d_{RT}$ was large. In a real data example, all the methods were applied to a dataset collected for program assessment and accountability purposes from undergraduates at a mid-sized southeastern university in the USA. Evidence from convergent validity showed that CSRI and MHM might detect non-effortful responses more accurately and obtain more precise parameter estimates for these data. In conclusion, CSRI generally performed better than the other methods across all conditions. It is highly recommended for use in practice because: (1) it showed an acceptable FPR and fairly accurate parameter estimates even when all responses were effortful; (2) it is free of strong assumptions, which means it should be robust in various situations; and (3) it showed the greatest advantages when $\pi_i^{non}$ was high, both in detecting non-effortful responses and in improving parameter estimation.
To improve the estimation of the time discrimination parameter in CSRI, robust estimation methods that down-weight flagged response patterns can be used as an alternative to directly removing non-effortful responses (the approach taken in the current study). MHM can perform well when all of its assumptions are met, $\pi_i^{non}$ is high, and $d_{RT}$ is large. However, some parameters have difficulty converging under MHM, which limits its application in practice.
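A hedged R sketch of the standardized-residual idea that OSR, CSR, and CSRI share (the parameter estimates would come from the Bayesian fit; here they are taken as given, and the cutoff is one conventional one-sided choice, not necessarily the authors'):

```r
## Under van der Linden's (2007) lognormal model, log t_ij ~ N(beta_j - tau_i,
## 1/alpha_j^2), so a strongly negative standardized residual marks a
## suspiciously fast (potentially non-effortful) response. CSRI would iterate:
## refit after removing flagged responses, then re-flag (purification).
flag_fast <- function(logT, alpha, beta, tau, cut = -1.645) {
  mu <- outer(-tau, beta, "+")            # expected log-RT: beta_j - tau_i
  z  <- sweep(logT - mu, 2, alpha, "*")   # alpha_j * (log t_ij - mu_ij)
  z < cut                                 # TRUE = flagged as non-effortful
}
```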

  • Handling missing data in cognitive diagnostic assessment: a random forest threshold imputation method

    Subjects: Psychology >> Social Psychology submitted time 2023-03-27 Cooperative journals: 《心理学报》

    Abstract: As a new form of testing, cognitive diagnostic assessment has attracted wide attention from researchers at home and abroad. At the same time, missing data caused by characteristics of the test design are a rather common issue in cognitive diagnostic tests. It is therefore very important to develop an effective solution for dealing with missing data in cognitive diagnostic assessment, ensuring that the diagnostic feedback provided to both students and teachers is more accurate and reliable. In recent years, machine learning has been applied to the imputation of missing data. Among machine learning algorithms, the random forest has proved to be a state-of-the-art learner: it handles classification and regression tasks effectively and efficiently, solves multi-class classification problems in an efficient manner, and has a distinct advantage in coping with noise. Furthermore, the random forest imputation method, an improved algorithm for dealing with missing data based on the random forest, makes full use of the available response information and the characteristics of participants' response patterns to impute missing data, instead of assuming the missingness mechanism in advance. By combining the advantages of the random forest in classification and prediction with the assumption-free nature of random forest imputation, we attempt to improve the existing random forest imputation algorithm so that it can properly handle missing data in cognitive diagnostic assessment. On the basis of the DINA (Deterministic Inputs, Noisy "And" gate) model, which is widely used in cognitive diagnostic assessment, we introduce the Response Conformity Index (RCI) into missing data imputation to identify the threshold for the imputation type, and hence propose a new method for handling missing responses under the DINA model: the random forest threshold imputation (RFTI) approach. Two simulation studies were conducted to validate the effectiveness of RFTI and to explore its advantages over traditional techniques for handling missing data. First, the theoretical basis and algorithmic implementation of RFTI were described in detail. Then, two Monte Carlo simulations were employed to validate the effectiveness of RFTI in terms of imputation rate and accuracy, as well as the accuracy of DINA model parameter estimation. Moreover, the applicability of RFTI was investigated under different missingness mechanisms (MNAR, MIXED, MAR, and MCAR) and different proportions of missing values (10%, 20%, 30%, 40%, and 50%). The main results were as follows. (1) The imputation accuracy of RFTI was significantly higher than that of the random forest imputation (RFI) method, and the rate of missing data left untreated by RFTI was about 10% under all conditions. (2) RFTI yielded the highest attribute pattern match ratio and attribute marginal match ratio under all conditions, compared with the EM algorithm and RFI. This advantage depended on the proportion and mechanism of missing data: it became more pronounced when the missingness mechanism was MNAR or MIXED and the proportion of missing responses was more than 30%. However, the new algorithm failed to show superiority in estimating the DINA model parameters.
Based on these results, we conclude the article with an overall summary and recommendations, as well as directions for future research.
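A rough R sketch of the random-forest imputation backbone that RFTI builds on (the paper's RCI-based rule for choosing the imputation threshold is not reproduced here; rf_threshold_impute() and thr = 0.9 are placeholders, and entries whose predicted probability is ambiguous are simply left missing, consistent with the partial-imputation behavior described above):

```r
## Each item with missing entries is predicted from the other items' responses;
## a prediction is only written back when it clears the confidence threshold.
library(randomForest)

rf_threshold_impute <- function(X, thr = 0.9) {
  X    <- as.data.frame(X)                                   # I x J 0/1 responses
  Xfix <- na.roughfix(as.data.frame(lapply(X, factor, levels = 0:1)))
  for (j in seq_along(X)) {
    miss <- which(is.na(X[[j]]))
    if (length(miss) == 0) next
    fit <- randomForest(x = Xfix[-miss, -j, drop = FALSE],
                        y = factor(X[[j]][-miss], levels = 0:1))
    p <- predict(fit, Xfix[miss, -j, drop = FALSE], type = "prob")[, "1"]
    X[[j]][miss[p >= thr]]     <- 1     # impute only confident predictions
    X[[j]][miss[p <= 1 - thr]] <- 0
  }                                     # ambiguous entries stay missing
  X
}
```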
