Your conditions: Psychological Measurement
  • Development and Validation of the Susceptibility to PUA Personality Traits Scale and the Characteristics Manifestation Scale of PUA Relationships

    Subjects: Psychology >> Psychological Measurement submitted time 2024-03-25

    Abstract: Objective: To explore the relationship between personal characteristics and the likelihood of being subjected to PUA in the context of Chinese culture, to develop a scale of personality traits associated with susceptibility to PUA and a scale of the characteristic manifestations of PUA relationships suited to that context, and to test their reliability and validity. Methods: The initial questionnaires were developed through literature review, theoretical model construction, and questionnaire survey. For the susceptibility-to-PUA personality traits scale, 1,188 adults were selected as participants; for the PUA relationship characteristics manifestation scale, 1,188 adults who had experienced or were experiencing intimate relationships were selected. Item analysis and exploratory factor analysis were carried out on the trial questionnaires; confirmatory factor analysis and reliability testing were carried out on both questionnaires. Results: Scale I, the susceptibility-to-PUA personality traits scale, contains 4 dimensions with a total of 20 items, and the fit indices of its factor structure model are good: RMSEA=0.060, CFI=0.937, IFI=0.937, TLI=0.924, SRMR=0.042. Scale II, the PUA relationship characteristics manifestation scale, contains 6 dimensions with a total of 29 items: RMSEA=0.053, CFI=0.925, TLI=0.919, GFI=0.913, SRMR=0.059. The internal consistency coefficients of Scale I and its dimensions range from 0.779 to 0.909, and those of Scale II and its dimensions range from 0.897 to 0.970. Conclusion: Both scales show good reliability and validity and can serve as measurement tools for studying personal characteristics and the likelihood of PUA in the context of Chinese culture.
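    The internal consistency coefficients reported above are typically Cronbach's alpha values, although the abstract does not name the coefficient. A minimal sketch of that computation, assuming a complete respondents-by-items score matrix (the function name and toy data are illustrative, not taken from the study):

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the total (sum) score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert responses: 5 respondents x 4 items.
toy_scores = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 3, 3, 4],
]
print(round(cronbach_alpha(toy_scores), 3))
```

    Alpha rises as items covary more strongly relative to their individual variances, which is why it is reported both per dimension and for the total scale.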

  • Core Items Selection and Psychometric Properties of the Adult Attention-Deficit Hyperactivity Disorder Self-Report Scale-Chinese Short Version (ASRS-CSV)

    Subjects: Psychology >> Psychological Measurement Subjects: Psychology >> Clinical and Counseling Psychology Subjects: Medicine, Pharmacy >> Clinical Medicine submitted time 2024-03-12

    Abstract: Objective: This study aimed to develop and validate the Chinese Short Version of the Adult ADHD Self-Report Scale (ASRS-CSV), addressing the need for culturally appropriate diagnostic tools for Attention-Deficit Hyperactivity Disorder (ADHD) in the Chinese adult population.
    Methods: Utilizing a combination of intergroup difference analysis, factor analysis, and network analysis, we identified core ADHD symptoms pertinent to the Chinese cultural context. The study involved two samples: a vocational and technical school sample (N=1144) and an internet sample (N=1654), comprising adults aged 16-25 years. Reliability, validity, and diagnostic efficacy of the ASRS-CSV were assessed through psychometric testing.
    Results: The ASRS-CSV demonstrated high internal consistency (Cronbach’s alpha > 0.9) and robust convergent validity (AVE > 0.7). The scale’s diagnostic cutoff points were optimized, revealing high sensitivity and specificity for ADHD screening. Cross-cultural analysis highlighted differences in core ADHD symptoms between Chinese and Western populations, underscoring the scale’s cultural sensitivity.
    Conclusion: The ASRS-CSV is a reliable, valid, and efficient tool for screening ADHD in Chinese adults, reflecting the socio-cultural nuances of ADHD symptomatology. Its development marks a significant advancement in the field of psychiatry, offering a tailored approach for ADHD assessment in China and contributing to the global discourse on cross-cultural psychiatric diagnosis.
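    Optimizing a screening cutoff amounts to comparing sensitivity and specificity at each candidate threshold. A minimal sketch, assuming that a score at or above the cutoff counts as a positive screen; the scores and diagnoses below are made-up illustrations, not ASRS-CSV data:

```python
def sensitivity_specificity(scores, has_condition, cutoff):
    """Return (sensitivity, specificity) when score >= cutoff is a positive screen."""
    tp = fn = tn = fp = 0
    for score, diagnosed in zip(scores, has_condition):
        positive = score >= cutoff
        if diagnosed and positive:
            tp += 1
        elif diagnosed:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scale totals and clinical diagnoses.
toy_scores = [5, 9, 12, 14, 17, 20]
toy_diagnosis = [False, False, False, True, True, True]
for cutoff in (10, 13, 16):
    print(cutoff, sensitivity_specificity(toy_scores, toy_diagnosis, cutoff))
```

    Lowering the cutoff raises sensitivity at the cost of specificity, so the chosen value reflects how the two error types are weighed in screening.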

  • Statistical power analysis of event-related potential studies: methods and influencing factors

    Subjects: Psychology >> Experimental Psychology Subjects: Psychology >> Psychological Measurement submitted time 2024-03-04

    Abstract: Statistical power is one of the key indicators for assessing the robustness and replicability of research results. However, the standardization and completeness of calculating and reporting statistical power in event-related potential studies still need improvement. This paper aims to provide researchers with references for calculating and reporting statistical power during the design or preregistration of research protocols at various stages of event-related potential studies by summarizing the influencing factors, methods, and application examples of statistical power in such studies.
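    As a generic illustration of a power calculation (not the ERP-specific procedures the paper reviews), the sketch below uses the normal approximation for a two-sided, two-group comparison of means. The effect size `d`, the per-group sample size, and the function names are assumptions chosen for the example:

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for standardized effect d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected value of the test statistic
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def n_for_power(d, target=0.80, alpha=0.05):
    """Smallest per-group n whose approximate power reaches the target."""
    n = 2
    while power_two_sample(d, n, alpha) < target:
        n += 1
    return n


print(power_two_sample(0.5, 30))
print(n_for_power(0.5))
```

    Within-subject ERP designs usually need fewer participants than this between-group approximation suggests, because the relevant variability also depends on trial counts and the correlation between conditions.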

  • Exploration of Computerized Adaptive Item Bank Development for Emotional Stability Based on ChatGPT

    Subjects: Psychology >> Psychological Measurement submitted time 2024-02-01

    Abstract: The extensive manpower and resources that traditional item development requires to build a high-quality, large-scale item bank have constrained the development and application of computerized adaptive testing (CAT). However, automatic item generation based on the latest natural language processing technology holds promise in addressing this challenge. With the advancements in generative pre-trained models based on the Transformer architecture, the generation of items tailored to specific measurement objectives (especially non-cognitive tasks) has become feasible. This study aimed to use ChatGPT to generate a large number of Chinese-language personality items measuring emotional stability and to establish a computerized adaptive item bank on that basis.
    We used ChatGPT based on GPT-4 Turbo to generate 114 items measuring emotional stability. Following expert review, 75 items were retained to form the GPT item bank, while 42 widely used items were selected to form the classic item bank. These items were administered, yielding 479 valid respondents. Additionally, sample data from two separately administered measures, CBF-PI-B and BFI-2, were used for subsequent cross-sample reliability comparisons. Item bank construction procedures, including unidimensionality testing, IRT model selection, item analysis, and item bank quality analysis, as well as simulated computerized adaptive testing, were employed to assess the quality and CAT performance of the item banks.
    These analyses showed that all items in the classic item bank and the GPT item bank passed the unidimensionality test, showed no differential item functioning, and had good discrimination parameters and a reasonable difficulty distribution. Both item banks provided high test information and marginal reliability for most trait levels of the examinees, with low measurement error. The overall item bank formed by combining all items remained of good quality. Simulation results of computerized adaptive testing showed that all three item banks achieved high validity with fewer items than traditional tests at the same level of precision. At the same test length, the GPT item bank exhibited higher reliability and demonstrated stability across samples. Additionally, the CAT performance of the GPT item bank exceeded that of the classic item bank, while the overall item bank performed slightly better than the GPT item bank.
    This study explores the development of a computerized adaptive item bank using the latest version of ChatGPT, validating the feasibility of this user-friendly item generation tool. Through comparison with previous research results, it reconfirms the excellent quality of items generated by GPT-4. The study showcases the potential of large language models in item development, particularly in the creation of large-scale item banks, while also pointing to a shift in the responsibilities of psychologists in future item development.
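    The simulated adaptive testing described above rests on choosing, at each step, the unused item that is most informative at the current trait estimate. The sketch below illustrates that selection rule under a dichotomous two-parameter logistic (2PL) model. This is a simplifying assumption: the abstract does not state which IRT model was selected, and Likert-type personality items are usually fitted with a polytomous model. The item bank and its parameters are hypothetical.

```python
from math import exp


def p_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta_hat, bank, administered):
    """Pick the not-yet-administered item with maximum information at theta_hat."""
    candidates = [item for item in bank if item not in administered]
    return max(candidates, key=lambda item: item_information(theta_hat, *bank[item]))


# Hypothetical bank: item id -> (discrimination a, location b).
bank = {"item1": (1.2, -1.0), "item2": (1.2, 0.0), "item3": (1.2, 1.0)}
print(select_next_item(0.0, bank, administered=set()))
print(select_next_item(0.4, bank, administered={"item2"}))
```

    Because each selected item is targeted at the current estimate, an adaptive test can reach a given precision with fewer items than a fixed-form test.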

  • Psychometric Properties of Multidimensional State Anxiety Scale for College Students (MSAS-CS): Based on Factor Analysis and Network Analysis

    Subjects: Psychology >> Psychological Measurement Subjects: Psychology >> Clinical and Counseling Psychology submitted time 2024-01-14

    Abstract: Based on the State-Trait Anxiety Theory and the Psychopathological Network Theory, we developed the Multidimensional Anxiety Experience Scale for college students. This study conducted item analysis, factor analysis, network analysis, validity and reliability testing, as well as gender invariance testing. The results indicate that: (1) The Multidimensional Anxiety Experience Scale for college students consists of 27 items, organized into seven dimensions: Social Communication Anxiety (SCA), Learning Anxiety (LA), Family Relationship Anxiety (FRA), Future Anxiety (FA), Gender Norms Anxiety (GNA), Appearance Anxiety (AA), and Economic Anxiety (EA). (2) The scale demonstrates a reasonable factor network structure, good validity and reliability, and gender invariance, thus effectively measuring the level of state anxiety in Chinese college students.

  • Automated Scoring of Open-ended Situational Judgment Tests

    Subjects: Psychology >> Psychological Measurement submitted time 2023-12-21

    Abstract: Situational Judgment Tests (SJTs) have gained popularity for their unique testing content and high face validity. However, traditional SJT formats, particularly those employing multiple-choice (MC) options, have come under scrutiny because of their susceptibility to test-taking strategies. In contrast, open-ended and constructed response (CR) formats offer a promising means of addressing this issue. Nevertheless, their widespread adoption is hindered mainly by the cost of manual scoring. In response to this challenge, we propose an open-ended SJT employing a written constructed-response format for the assessment of teacher competency. This study established a scoring framework that uses natural language processing (NLP) technology to automate the assessment of response texts, and then evaluated the validity of the system. The study constructed a teacher competency model encompassing four dimensions: student-oriented, problem-solving, emotional intelligence, and achievement motivation. Additionally, an open-ended situational judgment test was developed to gauge teachers' ability to address typical teaching dilemmas. A dataset comprising responses from 627 primary and secondary school teachers was collected, with manual scoring based on predefined criteria applied to 6,000 response texts from 300 participants. To expedite the scoring process, supervised learning strategies were employed to categorize responses at both the document and sentence levels. Various deep learning models, including the convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory (LSTM), C-LSTM, RNN+attention, and LSTM+attention, were implemented and compared to assess the concordance between human and machine scoring. The validity of automatic scoring was also verified.
    The open-ended situational judgment test had a Cronbach's alpha coefficient of 0.91 and showed a good fit in the confirmatory factor analysis conducted in Mplus. Criterion-related validity was assessed, revealing significant correlations between test results and various educational facets, including instructional design, classroom evaluation, homework design, job satisfaction, and teaching philosophy. Among the machine scoring models evaluated, the CNN was the top-performing model, with a scoring accuracy ranging from 70% to 88% and a high degree of consistency with expert scores (r=0.95, QWK=0.82). The correlation coefficients between human and computer ratings for the four dimensions (student-oriented, problem-solving, emotional intelligence, and achievement motivation) were approximately 0.9. Furthermore, the model showed high predictive accuracy when applied to new text datasets, which is evidence of its generalization capability.
    This study explored automated scoring for open-ended situational judgment tests using rigorous psychometric methods. To establish validity, the study concentrated on a specific application: the evaluation of teacher competency traits. Fine-grained scoring guidelines were formulated, and state-of-the-art NLP techniques were used for text feature recognition and classification. The primary findings can be summarized as follows: (1) Open-ended SJTs can establish precise scoring criteria grounded in crucial behavioral response elements; (2) Sentence-level text classification outperforms document-level classification, with CNNs exhibiting high accuracy in response categorization; and (3) The scoring model delivers consistently robust performance and a high degree of alignment with human scoring, suggesting its potential to partially replace manual scoring procedures.
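    The QWK figure reported above is the quadratic weighted kappa, a chance-corrected agreement index that penalizes larger human-machine score discrepancies more heavily. A minimal sketch, assuming scores are integer categories from 0 to n_categories - 1; the ratings are invented for illustration:

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Quadratic weighted kappa for two lists of integer ratings in 0..n_categories-1."""
    k = n_categories
    n = len(rater_a)
    # Observed agreement matrix: rows index rater A, columns index rater B.
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(rater_a, rater_b):
        observed[x][y] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count if the two raters scored independently.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical human and machine scores on a 0-3 rubric.
human = [0, 1, 2, 3, 2, 1, 3, 0]
machine = [0, 1, 2, 2, 2, 1, 3, 1]
print(round(quadratic_weighted_kappa(human, machine, 4), 3))
```

    A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so QWK complements the raw correlation between human and machine scores.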
     

  • Estimating test reliability of intensive longitudinal studies: Perspectives on multilevel structure and dynamic nature

    Subjects: Psychology >> Psychological Measurement Subjects: Psychology >> Statistics in Psychology submitted time 2023-11-28

    Abstract: With the widespread use of intensive longitudinal studies in psychology and other social sciences, reliability estimation of tests in intensive longitudinal studies has received increasing attention. Earlier reliability estimation methods drawn from cross-sectional studies or based on generalizability theory have many limitations and are not applicable to intensive longitudinal studies. Considering the two main characteristics of intensive longitudinal data, multilevel structure and dynamic nature, the reliability of tests in intensive longitudinal studies can be estimated based on multilevel confirmatory factor analysis, dynamic factor analysis, and dynamic structural equation models. The main features and applicable contexts of these three reliability estimation methods are demonstrated with empirical data. Future research could explore the reliability estimation methods based on other models, and should also pay more attention to the testing and reporting of test reliability in intensive longitudinal studies.

  • Development of Online Calibration Method Based on SCAD Penalty and EM Perspective in CD-CAT: a study based on the G-DINA model

    Subjects: Psychology >> Psychological Measurement submitted time 2023-11-22

    Abstract: Cognitive diagnostic computerized adaptive testing (CD-CAT) provides a timely, accurate, and detailed diagnosis of an examinee’s strengths and weaknesses in the content measured, which can be used as a reference for further study or remediation planning, thus meeting the practical need for efficient and detailed test results. The successful implementation of CD-CAT depends on an item bank, but maintaining one is very challenging. A psychometrically popular choice for maintaining an item bank is online calibration. Currently, research on online calibration methods in CD-CAT that can calibrate the Q-matrix and item parameters simultaneously is scarce, and the existing methods were mostly developed for the deterministic inputs, noisy “and” gate (DINA) model. Compared with the DINA model, the generalized DINA (G-DINA) model has been more widely applied because it is less restrictive and can accommodate a wide range of test data in psychological and educational assessment. An online calibration method that jointly calibrates the Q-matrix and item parameters for less constrained models such as G-DINA would therefore be of clear practical value.
    In the current study, a new online calibration method suitable for the G-DINA model, SCADOCM, was proposed. SCADOCM is built on the smoothly clipped absolute deviation (SCAD) penalty and the marginal maximum likelihood estimation via expectation-maximization (MMLE/EM) algorithm. For a new item j, the log-likelihood function with the SCAD penalty is formulated from the examinees’ responses to this item and the examinees’ marginal attribute mastery probabilities, and the q-vector of the new item is estimated by the SCAD-based q-vector estimator. Then, the EM algorithm is used to estimate the item parameters of the new item j based on the posterior distributions of examinees’ attribute patterns, the examinees’ responses to new item j, and the estimated q-vector.
    To examine the performance of the proposed SCADOCM and compare it with the SIE method, two simulation studies (Study 1 and Study 2) were conducted. Study 1 was based on a simulated item bank, while Study 2 was based on a real item bank (Internet addiction item bank; Shi, 2017). In these simulation studies, four factors were manipulated: the calibration sample size (nj = 50 vs. 100 vs. 500 vs. 1000 vs. 2000), the distribution of the attribute pattern (uniform vs. higher-order vs. normal), the item quality (U(0.05, 0.15) vs. U(0.1, 0.3)), and the online calibration method (SCADOCM vs. SIE). The results showed that (1) SCADOCM has satisfactory calibration accuracy and calibration efficiency and is superior to the SIE method. In addition, the traditional SIE method is not applicable to the G-DINA model, and its Q-matrix estimation accuracy is low under all experimental conditions. (2) The item calibration accuracy of SCADOCM and SIE increases with calibration sample size and item quality under most conditions, and is greater under the uniform and higher-order distributions than under the normal distribution. (3) The calibration efficiency of SCADOCM decreases as the calibration sample grows but is less affected by item quality and the attribute pattern distribution; the calibration efficiency of SIE also decreases as the calibration sample grows but is less affected by item quality. Moreover, the calibration efficiency of the SIE method under the normal distribution is slightly lower than under the uniform and higher-order distributions.
    In summary, this study demonstrated that SCADOCM has higher item calibration accuracy and calibration efficiency than the SIE method, and that the traditional SIE method is not suitable for the G-DINA model. The study provides an efficient and accurate method for item calibration in CD-CAT and supports the further application of CD-CAT in practice.
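    For readers unfamiliar with the penalty named above, the sketch below evaluates the standard SCAD penalty of Fan and Li (2001) for a single coefficient. It shows only the penalty function, not the SCADOCM estimator; the tuning values (lam, and the conventional default a = 3.7) are illustrative assumptions rather than settings reported in the abstract.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty for one coefficient; requires lam > 0 and a > 2."""
    if lam <= 0 or a <= 2:
        raise ValueError("SCAD requires lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Small coefficients: same linear penalty as the lasso.
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic transition with shrinking slope.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so they are not shrunk further.
    return lam * lam * (a + 1) / 2


for value in (0.05, 0.2, 1.0):
    print(value, scad_penalty(value, lam=0.1))
```

    The flat tail is what distinguishes SCAD from the lasso: small coefficients are pushed to zero, which helps decide which attributes enter a q-vector, while large coefficients are left nearly unbiased.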

  • Cognitive Diagnostic Assessment Based on Signal Detection Theory: Modeling and Application

    Subjects: Psychology >> Psychological Measurement submitted time 2023-11-13

    Abstract: Cognitive diagnostic assessment (CDA) aims to diagnose which skills or attributes examinees have or have not mastered. This technique provides more useful feedback to examinees than a single overall score obtained from classical test theory or item response theory. In CDA, multiple-choice (MC) is one of the most popular item types, offering high test reliability, ease of review, and quick, objective scoring. Traditionally, several cognitive diagnostic models (CDMs) have been developed to analyze MC data by including the potential diagnostic information contained in the distractors.
    However, the response to MC items can be viewed as the process of extracting signals (correct options) from noise (distractors). Examinees are assumed to perceive the plausibility of each option and to choose the most plausible one. Meanwhile, an examinee responding to an item is in one of two states: knowing or not knowing the item. Thus, signal detection theory can be integrated into CDMs to deal with MC data in CDA. The cognitive diagnostic model based on signal detection theory (SDT-CDM) proposed in this paper has several advantages over traditional CDMs. First, it does not require the coding of a q-vector for each option. Second, it provides discrimination and difficulty parameters that traditional CDMs cannot provide. Third, it directly expresses the relative differences between options through plausibility parameters, providing a more comprehensive characterization of item quality.
    The results of two simulation studies showed that (1) the marginal maximum likelihood estimation approach via the Expectation Maximization (MMLE/EM) algorithm could effectively estimate the model parameters of the SDT-CDM; (2) the SDT-CDM had high classification accuracy and parameter estimation precision, and could provide option-level information for item quality diagnosis; (3) independent variables such as the number of attributes, item quality, and sample size affected the performance of the SDT-CDM, but the overall results were promising; and (4) compared with the nominal response diagnostic model (NRDM), the SDT-CDM was more accurate in classifying examinees under all data conditions.
    Further, an empirical study on the TIMSS 2011 mathematics assessment was conducted using both the SDT-CDM and the NRDM to examine the ecological validity of the new model. The results showed that the SDT-CDM had better fit and fewer model parameters than the NRDM. The difficulty parameters of the SDT-CDM were significantly correlated with those of the two- (three-) parameter logistic models, and the same was true of the discrimination parameters of the SDT-CDM. However, the correlation between the discrimination parameters of the NRDM and those of the two- (three-) parameter logistic models was low and not significant. In addition, the classification accuracy and classification consistency of the SDT-CDM were higher than those of the NRDM. Overall, the results indicate that the SDT-CDM is worth promoting.

  • Reliability and validity of the Chinese version of the mobile Agnew Relationship Measure (mARM-C)

    Subjects: Psychology >> Applied Psychology Subjects: Psychology >> Psychological Measurement submitted time 2023-05-27

    Abstract: In order to assess the reliability and validity of the Chinese version of the mobile Agnew Relationship Measure (mARM-C), 574 university students who had recently used meditation apps were recruited to complete both the mARM-C and criterion measures. After two weeks, a subset of 102 of these participants were retested. The exploratory factor analysis and network analysis results revealed that the mARM-C comprised 19 items across five factors. Further confirmatory factor analysis demonstrated that the five-factor model was a good fit, and the questionnaire exhibited satisfactory criterion-related validity, convergent validity, discriminant validity, and good internal consistency reliability, which met the criteria for psychological measurement standards. These results indicate that the mARM-C is a reliable and valid instrument, capable of measuring the digital therapeutic alliance between users and programs in internet-based self-help interventions.

  • On the reliability of point estimation of model parameter: taking the CDMs as an example

    Subjects: Psychology >> Psychological Measurement Subjects: Psychology >> Statistics in Psychology submitted time 2023-05-11

    Abstract: Cognitive diagnostic models (CDMs) are psychometric models that have received increasing attention in psychology, education, the social and biological sciences, and many other disciplines. It has been argued that an inappropriate convergence criterion for the MLE-EM (maximum likelihood estimation using the expectation maximization) algorithm could result in unpredictably distorted model parameter estimates, and thus may yield unstable and misleading conclusions drawn from the fitted CDMs. Although several convergence criteria have been developed, how to specify an appropriate convergence criterion for fitted CDMs remains an unexplored question.
    A comprehensive method for assessing convergence is proposed in this study. To minimize the impact of the model parameter estimation framework, a new framework, mCDM, which adopts a multiple-starting-values strategy, is introduced. To examine the performance of the convergence criteria for MLE-EM in CDMs, a simulation study under various conditions was conducted. Five convergence assessment methods were examined: the maximum absolute change in model parameters, the maximum absolute change in item endorsement probabilities and structural parameters, the absolute change in log-likelihood, the relative log-likelihood, and the comprehensive method. The data-generating models were the saturated CDM and the hierarchical CDM. The number of items was set to J = 16 and 32. Three sample sizes were considered: 500, 1000, and 4000. Three convergence tolerance values were considered: 10⁻⁴, 10⁻⁶, and 10⁻⁸. The simulated response data were fitted by the saturated CDM using the mCDM and the R package GDINA, with the maximum number of iterations set to 50000.
    Simulation results suggest that:
    (1) The saturated CDM converged under all conditions. However, the actual number of iterations exceeded 30000 under some conditions, which implies that when the predefined maximum number of iterations is less than 30000, the MLE-EM algorithm might stop prematurely.
    (2) The model parameter estimation framework affected the performance of the convergence criteria. The performance of the convergence criteria under the mCDM framework was comparable or superior to that under the GDINA framework.
    (3) Among the convergence tolerance values considered in this study, and judged by the maximized log-likelihood value, 10⁻⁸ consistently performed best and 10⁻⁴ performed worst. Compared with all other convergence assessment methods, the comprehensive method generally performed best, especially under the mCDM framework. The maximum absolute change in model parameters performed similarly to the comprehensive method, but its good performance was not guaranteed. By contrast, the relative log-likelihood performed worst under both the mCDM and GDINA frameworks.
    The simulation results showed that the most appropriate convergence criterion for MLE-EM in CDMs was the comprehensive method with tolerance 10⁻⁸ under the mCDM framework. Results from the real data analysis also demonstrated the good performance of the proposed comprehensive method and the mCDM framework.
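    To make the interplay between tolerance and iteration cap concrete, the sketch below implements three of the simple criteria named above and a generic iteration loop. It is not the mCDM framework or the paper's comprehensive method, neither of which is specified in the abstract. The relative log-likelihood is assumed here to be the absolute change divided by the previous absolute log-likelihood, and `toy_update` is a made-up contraction standing in for an EM update.

```python
def max_abs_change(old_params, new_params):
    """Maximum absolute change across all parameters between two iterations."""
    return max(abs(new - old) for old, new in zip(old_params, new_params))


def abs_loglik_change(ll_old, ll_new):
    """Absolute change in log-likelihood between two iterations."""
    return abs(ll_new - ll_old)


def rel_loglik_change(ll_old, ll_new):
    """Relative change in log-likelihood (assumed definition, see lead-in)."""
    return abs(ll_new - ll_old) / abs(ll_old)


def run_until_converged(update, params, tol, max_iter):
    """Iterate `update` until max_abs_change < tol or max_iter is reached.

    Returns (params, iterations_used, converged_flag).
    """
    for iteration in range(1, max_iter + 1):
        new_params = update(params)
        if max_abs_change(params, new_params) < tol:
            return new_params, iteration, True
        params = new_params
    return params, max_iter, False


def toy_update(params):
    """Hypothetical stand-in for one EM step: each value moves halfway toward 1.0."""
    return [0.5 * x + 0.5 for x in params]


for tol in (1e-4, 1e-6, 1e-8):
    _, iterations, converged = run_until_converged(toy_update, [0.0], tol, 50000)
    print(tol, iterations, converged)
```

    A looser tolerance stops the loop earlier and further from the fixed point, while an iteration cap set too low ends the run with the converged flag false, which mirrors the premature-stopping risk noted in result (1).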
     

  • The Measurement and Influence of Colleges’ Academic Involution

    Subjects: Psychology >> Social Psychology Subjects: Psychology >> Psychological Measurement submitted time 2023-05-04

    Abstract: Academic involution may harm the cultivation and development of college students, but there has not been a reliable measurement tool to assess it. This paper developed the Colleges’ Academic Involution Scale (CAIS) and examined its reliability and validity in three studies. Study 1 generated a 31-item pool based on a literature review, daily cases, and interviews, and filtered items based on a sample of 338 undergraduates. Study 2 confirmed a 16-item final version of the CAIS, which consisted of three dimensions (unwilling hardworking, excessive competition, and surface learning), based on a large sample (N = 3000) and an independent sample (N = 571). In the sample of 3000 undergraduates, more than 60% of college students were involved in academic involution. Specifically, individuals with high CAIS scores showed stronger zero-sum belief, higher trait anxiety, lower life satisfaction, and poorer sleep quality, but not greater creative potential. Study 3 revealed that the test-retest reliability of the final version of the scale reached 0.83 in a new sample (N = 99). The CAIS could be a reliable and effective tool for future research exploring the harms and causes of academic involution and ways to mitigate it.

     

  • Test mode effect: Sources, detection, and applications

    Subjects: Psychology >> Psychological Measurement submitted time 2023-04-22

    Abstract: Test mode effect (TME) refers to the difference in test function caused by administering the same test in different test modes. The existence of TME affects test fairness, selection criteria, and test equating, so it is important to detect and interpret TME accurately. By systematically reviewing the sources, detection (including experimental design and detection methods), and research results of TME, this paper gives a comprehensive account of the methodology of TME research. Further interpretation of the TME model, expansion of the test modes covered in TME research, and application of TME research results to large-scale educational assessment programs in China are important future directions in the field.

  • CCTE-A database of Chinese COVID-19 Terms

    Subjects: Psychology >> Cognitive Psychology Subjects: Psychology >> Experimental Psychology Subjects: Psychology >> Psychological Measurement Subjects: Psychology >> Statistics in Psychology Subjects: Psychology >> Other Disciplines of Psychology Subjects: Linguistics and Applied Linguistics >> Linguistics and Applied Linguistics Subjects: Other Disciplines >> Synthetic discipline submitted time 2023-02-08

    Abstract: Objective: To establish a multi-dimensional and standardized lexical database of COVID-19-related terms and words. The database may facilitate COVID-19-related research in domains such as psychology, psychiatry, and neuroscience. Methods: The database was built following the established methods of emotional lexical databases in China and abroad. To test the validity of the database, a dot-detection task with words from the database as experimental materials was used to measure attentional bias in subjects suspected of having COVID-19 phobia. Results: 196 COVID-19-related words and 99 neutral words were included in the word database. We then classified and evaluated the words on six dimensions and established a standardized database of Chinese COVID-19-related terms. The words have good reliability and internal consistency. In addition, the validity was tested through the dot-detection task. Subjects with COVID-19 fear and those without COVID-19 fear showed a significant attentional bias toward COVID-19-related words. Limitations: The initial sample size is small and the database application needs further development. Conclusions: The database of Chinese COVID-19 terms has good reliability, internal consistency, and validity, and can be used as material for COVID-19-related research in the future.

  • Using word embeddings to investigate human psychology: Methods and applications

    Subjects: Psychology >> Social Psychology Subjects: Psychology >> Cognitive Psychology Subjects: Psychology >> Psychological Measurement Subjects: Computer Science >> Natural Language Understanding and Machine Translation submitted time 2023-01-30

    Abstract: As a basic technique in natural language processing (NLP), word embedding represents a word with a low-dimensional, dense, and continuous numeric vector (i.e., word vector). Word embeddings can be obtained by using neural network algorithms to predict words from the surrounding words or vice versa (Word2Vec and FastText) or words’ probability of co-occurrence (GloVe) in large-scale text corpora. In this case, the values of dimensions of a word vector denote the pattern of how a word can be predicted in a context, substantially connoting its semantic information. Therefore, word embeddings can be utilized for semantic analyses of text. In recent years, word embeddings have been rapidly employed to study human psychology, including human semantic processing, cognitive judgment, individual divergent thinking (creativity), group-level social cognition, sociocultural changes, and so forth. We have developed the R package “PsychWordVec” to help researchers utilize and analyze word embeddings in a tidy approach. Future research using word embeddings should (1) distinguish between implicit and explicit components of social cognition, (2) train fine-grained word vectors in terms of time and region to facilitate cross-temporal and cross-cultural research, and (3) deepen and expand the application of contextualized word embeddings and large pre-trained language models such as GPT and BERT.
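    Semantic analyses with word vectors usually reduce to comparing vectors by cosine similarity. The sketch below shows that comparison in plain Python rather than through the PsychWordVec package; the three-dimensional vectors are invented toy values, far smaller than real embeddings.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings; real word vectors typically have hundreds of dimensions.
toy_vectors = {
    "happy": [0.9, 0.1, 0.3],
    "joyful": [0.8, 0.2, 0.4],
    "table": [0.1, 0.9, 0.2],
}
print(cosine_similarity(toy_vectors["happy"], toy_vectors["joyful"]))
print(cosine_similarity(toy_vectors["happy"], toy_vectors["table"]))
```

    Association measures used in psychological research, such as comparing a target word's similarity to two attribute word sets, are built from differences between such cosine values.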

  • Binary Modeling of Action Sequences in Problem-solving Tasks: One- and Two-parameter Action Sequence Model

    Subjects: Psychology >> Psychological Measurement submitted time 2023-01-05

    Abstract: Process data refers to the human-computer or human-human interaction data recorded in computerized learning and assessment systems that reflect respondents’ problem-solving processes. Among the process data, action sequences are the most typical because they reflect how respondents solve the problem step by step. However, the non-standardized format of action sequences (i.e., different data lengths for different participants) also poses difficulties for the direct application of traditional psychometric models. Han et al. (2021) proposed the SRM by combining dynamic Bayesian networks with the nominal response model (NRM) to address the shortcomings of existing methods. Similar to the NRM, the SRM uses multinomial logistic modeling, which in turn assigns different parameters to each possible action sequence in the task, leading to high model complexity. Given that action sequences in problem-solving tasks have correct and incorrect outcomes rather than equivalence relations without quantitative order, this paper proposes two action sequence models based on binary logistic modeling with relatively low model complexity: the one- and two-parameter action sequence models (1P-ASM and 2P-ASM). Unlike the SRM, which migrates the NRM to action sequence analysis, the 1P-ASM and 2P-ASM migrate the simpler one- and two-parameter IRT models to action sequence analysis, respectively. An illustrative example was provided to compare the performance of the SRM and the two ASMs on a real-world interactive assessment item, “Tickets,” from PISA 2012.
    The results mainly showed that: (1) the latent ability estimates of the two ASMs and the SRM were highly correlated; (2) the ASMs took less computing time than the SRM; (3) participants who were solving the problem correctly tended to continue to present the correct action sequences, and vice versa; and (4) compared with the fixed discrimination parameter of the SRM, the freely estimated discrimination parameter of the 2P-ASM helped us to better understand the task. A simulation study was further designed to explore the psychometric performance of the proposed models in different test scenarios. Two factors were manipulated: sample size (100, 200, and 500) and average problem state transition sequence length (short and long). The SRM was used to generate the state transition sequences in the simulation study. The problem-solving task structure from the empirical study was used. The results showed that: (1) the two ASMs could provide accurate parameter estimates even though they were not the data-generating model; (2) the computation time of both ASMs was lower than that of the SRM, especially with a small sample size; (3) the problem-solving ability estimates of both ASMs were in high agreement with those of the SRM, and the agreement between the 2P-ASM and the SRM was relatively higher; and (4) the longer the problem state transition sequence, the better the recovery of the problem-solving ability parameter for both the ASMs and the SRM. Overall, the two ASMs proposed in this paper based on binary logistic modeling can achieve effective analysis of action sequences and provide almost identical estimates of participants' problem-solving ability to the SRM while significantly reducing the computational time.
    Meanwhile, combining the results of the simulation and empirical studies, we believe that the 2P-ASM has better overall performance than the 1P-ASM; however, the more parsimonious 1P-ASM is recommended when the sample size is small (e.g., 100 participants) or the task is simple (fewer operations are required to solve the problem).
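
For readers unfamiliar with binary logistic IRT modeling, the sketch below shows the standard one- and two-parameter logistic response function. It is not the authors' 1P-ASM or 2P-ASM parameterization, which the abstract does not spell out; the function name, argument names, and example values are assumptions for illustration.

```python
import math


def irt_probability(theta, difficulty, discrimination=1.0):
    """Probability of a correct outcome under a binary logistic IRT model.

    With discrimination fixed at 1.0 this is the one-parameter (1PL) form;
    letting discrimination vary gives the two-parameter (2PL) form.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Hypothetical values: ability 1.0 on a step of difficulty 0.0.
p_one_param = irt_probability(theta=1.0, difficulty=0.0)
p_two_param = irt_probability(theta=1.0, difficulty=0.0, discrimination=2.0)
```

A higher discrimination value makes the probability rise more steeply as ability exceeds difficulty, which is the extra flexibility a two-parameter model has over a one-parameter model.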

  • Development of a Short Version of the Health Literacy Scale Short-Form: Based on Classical Test Theory and Item Response Theory

    Subjects: Psychology >> Psychological Measurement Subjects: Psychology >> Applied Psychology Subjects: Medicine, Pharmacy >> Preventive Medicine and Hygienics submitted time 2022-12-06

    Abstract:

    Objective: Simplify health literacy scales and conduct psychometric tests in Chinese cohorts. Methods: A total of 7449 residents were included in the evaluation of the scale, and the data were randomly generated into 2 data sets for de Results: A 9-item version of the scale (HLS-SF9) and a 4-item version of the scale (HLS-SF4) were derived using CTT and the Mokken model, respectively. The Cronbach’s α coefficients of HLS-SF9 and HLS-SF4 were 0.913 and 0.842. Exploratory factor analysis (EFA) of HLS-SF4 supported one common factor, and confirmatory factor analysis (CFA) of HLS-SF9 showed that all fit indices of its three-dimension model were excellent. There were also significant positive correlations with the Perceived Social Support Scale and the Family Health Scale Short-Form, used as criterion scales (r=0.367, p<0.001; r=0.292, p<0.001) (r=0.340, p<0.001; r=0.266, p<0.001), indicating good empirical validity. At the same time, HLS-SF9 and HLS-SF4 are highly consistent with the content measured by the original version of the scale. Conclusions: The simplified Health Literacy Scales have good reliability and validity, and are reliable and effective tools for quickly assessing the health literacy of Chinese people.
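
As background on the reliability coefficient reported above, the sketch below computes Cronbach's α from a respondents-by-items score table using the textbook formula α = k/(k-1) · (1 - Σ item variances / total-score variance). The function name and the toy scores are assumptions for illustration and are unrelated to the study's data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Needs at least two respondents, at least two items, and non-zero
    variance in the total scores.
    """
    k = len(item_scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Toy data (made up): three respondents answer three items identically,
# so the items are perfectly consistent.
toy_scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy_scores)
```

With perfectly consistent toy items the coefficient reaches its upper bound of 1; real scales fall below that.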

  • A meta-analysis of the relationship between semantic distance and creative thinking

    Subjects: Psychology >> Developmental Psychology Subjects: Psychology >> Psychological Measurement submitted time 2022-11-20

    Abstract:

    The development of natural language processing has offered reliable and valid research methods for exploring the relationship between semantic distance and creative thinking. The number of studies in this direction has grown in recent years, but their findings are inconsistent. Based on the Associative Theory of Creativity and the Spreading-Activation Model, the present study investigated the relationship between semantic distance and creative thinking by using a meta-analysis method. The reasons for the inconsistency of previous studies in this line of research were also analyzed. The current research involved 14 studies and extracted 53 effect sizes from 4729 subjects. A random-effects model was used for the meta-analysis. The results showed that there was a moderate positive correlation between semantic distance and creative thinking (r = 0.379, 95% CI [0.300, 0.452]). The meta-regression analysis found that the correlation was moderated by the age of participants and dimensions of creative thinking. Specifically, the results suggested that the correlation between semantic distance and creative thinking decreased as the age of participants increased. In addition, flexibility had a higher correlation coefficient with semantic distance than originality and fluency. However, elaboration had a negative correlation with semantic distance. This is a potential reason for the inconsistent results in previous studies. The current study provides new perspectives and explanations for exploring cognitive and neural mechanisms of creativity. It contributes to the exploration of the relationship between semantic distance and creative thinking. The current study offers better scientific evidence and important implications for interpreting, predicting, and improving creativity.
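
To make the pooling step concrete, the sketch below combines correlations under a random-effects model using Fisher's z transform and the DerSimonian-Laird between-study variance estimator. The abstract does not say which estimator the authors used, so that choice is an assumption, as are the function name and the toy inputs.

```python
import math


def pool_correlations(rs, ns):
    """Random-effects pooled correlation with a 95% confidence interval.

    rs: correlations from at least two studies; ns: their sample sizes (> 3).
    Uses Fisher's z and the DerSimonian-Laird estimate of tau^2 (assumed).
    """
    zs = [math.atanh(r) for r in rs]          # Fisher z transform
    vs = [1.0 / (n - 3) for n in ns]          # sampling variance of each z
    w = [1.0 / v for v in vs]                 # fixed-effect weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    # Cochran's Q heterogeneity statistic and the DL tau^2 estimate.
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(rs) - 1)) / c)
    # Random-effects weights add tau^2 to each study's variance.
    w_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    ci = (math.tanh(z_re - 1.96 * se), math.tanh(z_re + 1.96 * se))
    return math.tanh(z_re), ci


# Toy inputs (made up, not the 53 effect sizes from the meta-analysis).
pooled_r, (ci_low, ci_high) = pool_correlations([0.30, 0.30, 0.30], [50, 80, 120])
```

When every study reports the same correlation, the heterogeneity estimate is zero and the pooled value equals that correlation; heterogeneous inputs widen the interval.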

  • Longitudinal Hamming Distance Discrimination: Developmental Tracking of Latent Attributes

    Subjects: Psychology >> Psychological Measurement submitted time 2022-10-06

    Abstract: Longitudinal cognitive diagnostics can assess students' strengths and weaknesses over time, profile students' developmental trajectories, and can be used to evaluate the effectiveness of teaching methods and optimize the teaching process. Researchers have proposed different longitudinal diagnostic classification models (DCMs), which provide methodological support for the analysis of longitudinal cognitive diagnostic data. Although these parametric longitudinal cognitive diagnostic models can effectively assess students' growth trajectories, their requirements for coding ability and sample size hinder their application among frontline educators, and they are time-consuming and not conducive to providing timely feedback. The nonparametric approach, by contrast, is easy to calculate, efficient to apply, and provides timely feedback; it is also free from dependence on sample size and is particularly suitable for analyzing assessment data at the classroom or school level. Therefore, this paper proposed a longitudinal nonparametric approach to track changes in student attribute mastery. This study developed the longitudinal Hamming distance discriminant (Long-HDD) as an extension of the Hamming distance discriminant (HDD); it uses the Hamming distance to represent the dependence between attribute mastery patterns of the same student at adjacent time points. To explore the performance of Long-HDD on longitudinal cognitive diagnostic data, we conducted a simulation study and an empirical study and compared the classification accuracy of the HDD, Long-HDD, and Long-DINA models. In the simulation study, five independent variables were manipulated: (1) sample size N = 25, 50, 100, and 300; (2) number of items I = 25 and 50; (3) number of time points T = 2 and 3; (4) number of attributes measured at each time point K = 3 and 5; and (5) data analysis method M = HDD, Long-HDD, and Long-DINA.
    Students' true attribute mastery patterns were randomly selected with equal probability from all possible attribute patterns, and the transition probabilities between adjacent time points were set to be equal across attributes (e.g., p(0→0) = 0.8, p(0→1) = 0.2, p(1→0) = 0.05, p(1→1) = 0.95). The first K items at each time point, which form an identity matrix in the Q-matrix, were set as anchor items, and the item parameters were generated from a bivariate normal distribution with a moderate negative correlation. For the empirical study, data came from 90 seventh graders who took three parallel tests of 18 questions each, measuring six attributes. The Q-matrix was the same for each test. The results of the simulation study showed that (1) Long-HDD had higher classification accuracy in longitudinal diagnostic data analysis; (2) Long-HDD performed almost independently of sample size and performed better than Long-DINA with a smaller sample size; and (3) Long-HDD consumed much less computational time than Long-DINA. In addition, the empirical results showed good consistency between the Long-HDD and the Long-DINA model in tracking changes in attribute development. The percentage of mastery of each attribute increased across time points. In summary, the Long-HDD proposed in this study extends the application of nonparametric methods to longitudinal cognitive diagnostic data and can provide high classification accuracy. Compared with parametric longitudinal DCMs (e.g., Long-DINA), it can provide timely diagnostic feedback because it is unaffected by sample size, simple to calculate, and less time-consuming. It is more suitable for small-scale longitudinal assessments, such as those at the class and school level.
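
The sketch below illustrates the general idea of nonparametric classification by Hamming distance at a single time point: a student is assigned the attribute pattern whose ideal responses are closest to the observed responses. It does not implement the longitudinal part of Long-HDD, whose details the abstract does not give. The conjunctive (DINA-like) ideal-response rule, the function names, and the toy Q-matrix are assumptions for illustration.

```python
from itertools import product


def hamming(u, v):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(u, v))


def ideal_response(pattern, q_matrix):
    """Ideal item responses under a conjunctive rule (assumed here):
    an item is answered correctly only if every attribute it requires
    is mastered."""
    return [int(all(p >= q for p, q in zip(pattern, row))) for row in q_matrix]


def classify(responses, q_matrix):
    """Return the attribute pattern whose ideal responses have the smallest
    Hamming distance to the observed responses (first match wins on ties)."""
    k = len(q_matrix[0])
    candidates = product([0, 1], repeat=k)
    return min(candidates,
               key=lambda pat: hamming(responses, ideal_response(pat, q_matrix)))


# Toy Q-matrix (made up): 3 items measuring 2 attributes.
toy_q = [[1, 0], [0, 1], [1, 1]]
estimated = classify([1, 0, 0], toy_q)
```

Because classification only compares binary vectors, no item parameters are estimated, which is why approaches of this kind need little data or computing time.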

  • The Status, Approach and Challenges of Artificial Intelligence-Empowered Psychological Research

    Subjects: Psychology >> Psychological Measurement Subjects: Psychology >> Statistics in Psychology Subjects: Psychology >> Other Disciplines of Psychology submitted time 2022-09-19

    Abstract: Human beings have entered the era of artificial intelligence (AI), and there is an urgent need for innovative data collection and processing methods to carry out increasingly complex psychological research. AI and related technology can help collect more ecologically valid, dynamic, diverse, and accurate data, and analyze massive and multi-modal data, which makes up for the deficiencies of traditional methods. Therefore, incorporating AI is a major direction for the future development of psychological research. At the same time, it is important not to rely too heavily on an AI-based data-driven approach. The integration of top-down theory-driven and bottom-up data-driven approaches is also crucial in intelligent psychological research.