METHODOLOGICAL STUDIES
These are studies whose focus is to develop and/or evaluate the clinimetric properties and characteristics of assessment instruments.
Methodological study designs
- Souza AC et al. Propriedades psicométricas na avaliação de instrumentos: avaliação da confiabilidade e da validade. Epidemiol Serv Saúde. 2017;26(3). <https://www.scielo.br/j/ress/a/v5hs6c54VrhmjvN7yGcYb7b/>
- Echevarría-Guanilo ME et al. Propriedades psicométricas de instrumentos de medidas: bases conceituais e métodos de avaliação – parte I. Texto Contexto Enferm. 2017;26(4). <https://www.scielo.br/j/tce/a/prwykQN6gV84gBph8Y795QJ/?lang=pt>
- Echevarría-Guanilo ME et al. Propriedades psicométricas de instrumentos de medidas: bases conceituais e métodos de avaliação – parte II. Texto Contexto Enferm. 2019;28. <https://www.scielo.br/j/tce/a/CpZB9gSc3SrDcQY8nj9yRHD/?lang=pt>
- Miot HA. Análise de concordância em estudos clínicos e experimentais. J Vasc Bras. 2016;15:89-92. <https://www.scielo.br/j/jvb/a/DVPVnQPdt8qGj8Ryhx7j8yk/?lang=pt>
- de Oliveira GM et al. Revisão sistemática da acurácia dos testes diagnósticos: uma revisão narrativa. Rev Col Bras Cir. 2010;37(2):153-6. <https://pubmed.ncbi.nlm.nih.gov/20549106/>
- Oliveira MR et al. QUADAS and STARD: evaluating the quality of diagnostic accuracy studies. Rev Saude Publica. 2011;45(2):416-22. <https://pubmed.ncbi.nlm.nih.gov/21412577/>
- Kottner J, Streiner DL. The difference between reliability and agreement. J Clin Epidemiol. 2011;64(6):701-2. <https://pubmed.ncbi.nlm.nih.gov/21411278/>
- de Vet HC et al. When to use agreement versus reliability measures. J Clin Epidemiol. 2006;59(10):1033-9. <https://pubmed.ncbi.nlm.nih.gov/16980142/>
- Kottner J et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol. 2011;64(1):96-106. <https://pubmed.ncbi.nlm.nih.gov/21130355/>
- McHugh ML. Interrater reliability: the kappa statistic. Biochem Med (Zagreb). 2012;22(3):276-82. <https://pubmed.ncbi.nlm.nih.gov/23092060/>
Interpretation of results
- Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977 Mar;33(1):159-74. <https://pubmed.ncbi.nlm.nih.gov/843571/>.
- Test–retest reliability was tested using the intraclass correlation coefficient (ICC2,1). Values of 0.69 or lower indicated poor reliability; values between 0.70 and 0.79 were considered acceptable; values between 0.80 and 0.89 indicated good reliability; and values from 0.90 to 1.0 indicated excellent reliability (Cohen, 1977). (See the ICC/SEM/SDC sketch after this list.)
- Measures of agreement:
  - Standard Error of Measurement (SEM) and Smallest Detectable Change (SDC): the SEM was calculated by multiplying the standard deviation of the differences between the two measurements by the square root of 1 minus the ICC (SD of differences × √(1 − ICC)), and the SDC was calculated as SDC = 1.96 × √2 × SEM. The SEM reflects the absolute error of the instrument; the SDC reflects the smallest within-person change in score that can be interpreted as a "real" change, above measurement error, in one individual (de Vet et al., 2006). The ratio between the SEM and the total score of the instrument was used to grade agreement: 5% or less, very good; more than 5% up to 10%, good; more than 10% up to 20%, doubtful; and more than 20%, negative (Silva et al., 2018). (See the ICC/SEM/SDC sketch after this list.)
  - Bland-Altman plots, together with the SEM and SDC defined above (Francq, Govaerts, 2016; de Vet et al., 2006).
- Internal consistency: measured using Cronbach's alpha. An alpha value between 0.70 and 0.90 was considered good, and a value greater than 0.90 excellent (Terwee et al., 2007). (See the internal-consistency sketch after this list.)
- Construct validity: determined using Spearman correlation. A correlation coefficient greater than 0.90 was considered excellent; between 0.71 and 0.90, good; between 0.51 and 0.70, reasonable; between 0.31 and 0.50, weak; and 0.30 or less, low (Fermanian et al., 1984; Silva et al., 2018).
- Convergent validity: determined by testing predefined hypotheses about correlations between test X and measures of similar constructs. Convergent validity was analyzed with Pearson's correlation, interpreted as follows: 0–0.19, no correlation; 0.20–0.39, poor; 0.40–0.69, moderate to good; 0.70–0.89, high; and 0.90–1.0, very high. Convergent validity was defined as good when more than 74% of the hypotheses were confirmed (Terwee et al., 2007). (See the correlation sketch after this list.)
- Divergent validity: assessed by comparing the scores of the symptomatic side and the asymptomatic (or less symptomatic) side using the independent-samples t test. (See the internal-consistency sketch after this list.)
- Ceiling and floor effects: these relate to content validity, and their presence indicates that extreme items are missing from the scales. The percentage of respondents scoring the lowest or highest possible score on each separate subscale was documented. Ceiling and floor effects for an entire questionnaire are considered present when more than 15% of respondents achieve the lowest or highest possible score (Terwee et al., 2007). (See the internal-consistency sketch after this list.)
- Intraexaminer and interexaminer reproducibility: A reliability study was conducted with a test–retest design.
- Reliability of continuous data: use the intraclass correlation coefficient (ICC2,1). ICC values below 0.4 are classified as poor, between 0.4 and 0.7 as satisfactory, and above 0.7 as excellent (de Vet et al., 2006).
- Reliability of categorical data: use the kappa coefficient (κ). κ values below 0 indicate no reliability; 0 to 0.19, poor reliability; 0.20 to 0.39, fair; 0.40 to 0.59, moderate; 0.60 to 0.79, good; and 0.80 to 1.0, excellent reliability (Altman, 1990). (See the kappa sketch after this list.)
- Responsiveness: refers to an instrument's ability to detect important changes over time in the construct measured. Our hypotheses were: (1) the correlations between the test 1 change score, the test 2 change score, the test 3 change score, and the subscale 1 and 2 change scores should range from moderate to good; and (2) the correlations between the test 1 change score and the performance-based test change scores (test 2 and test 3) would be higher than those with the subscale 1 change scores. We defined the responsiveness of test 1 as good when more than 74% of the hypotheses were confirmed.
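A minimal sketch of the ICC, SEM, SDC, and Bland-Altman computations described above, assuming a test–retest design with scores held in NumPy arrays. The function names and simulated data are illustrative, not taken from any of the cited papers; the ICC follows the standard Shrout and Fleiss ICC(2,1) formulation, and the SEM, SDC, and limits of agreement follow the formulas quoted in the list.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random-effects, absolute-agreement, single-measure ICC
    (Shrout & Fleiss). `scores` is an (n_subjects, k_measurements) array."""
    n, k = scores.shape
    grand = scores.mean()
    ss_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum()    # between subjects
    ss_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum()    # between measurements
    ss_err = ((scores - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

def sem_sdc(test, retest, icc):
    """SEM = SD of the differences * sqrt(1 - ICC); SDC = 1.96 * sqrt(2) * SEM."""
    sem = np.std(test - retest, ddof=1) * np.sqrt(1.0 - icc)
    return sem, 1.96 * np.sqrt(2.0) * sem

def bland_altman_limits(test, retest):
    """Bland-Altman bias and 95% limits of agreement: mean difference +/- 1.96 SD."""
    diff = test - retest
    bias, sd = diff.mean(), diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Simulated test-retest scores for 60 subjects (illustrative only)
rng = np.random.default_rng(0)
true_score = rng.normal(50, 10, size=60)
test = true_score + rng.normal(0, 3, size=60)
retest = true_score + rng.normal(0, 3, size=60)

icc = icc_2_1(np.column_stack([test, retest]))
sem, sdc = sem_sdc(test, retest, icc)
print(f"ICC(2,1) = {icc:.2f}, SEM = {sem:.2f}, SDC = {sdc:.2f}")
```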
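In the same spirit, a sketch of the internal-consistency, ceiling/floor, and divergent-validity checks. The alpha formula is the standard one, the 15% rule is the one cited above (Terwee et al., 2007), and the data are again illustrative.

```python
import numpy as np
from scipy.stats import ttest_ind

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the
    total score). `items` is an (n_respondents, k_items) array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def ceiling_floor(total_scores, min_score, max_score, threshold=0.15):
    """An effect is present when more than 15% of respondents obtain the lowest
    (floor) or highest (ceiling) possible score (Terwee et al., 2007)."""
    total_scores = np.asarray(total_scores)
    return {"floor": np.mean(total_scores == min_score) > threshold,
            "ceiling": np.mean(total_scores == max_score) > threshold}

# Divergent validity: symptomatic vs. less symptomatic side, compared with an
# independent-samples t test (illustrative scores).
symptomatic = np.array([42.0, 38.0, 45.0, 40.0, 36.0, 44.0])
asymptomatic = np.array([55.0, 58.0, 52.0, 60.0, 57.0, 54.0])
t, p = ttest_ind(symptomatic, asymptomatic)
print(f"t = {t:.2f}, p = {p:.4f}")
```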
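For the correlation-based analyses (construct validity, convergent validity, and the responsiveness hypotheses), a sketch of the band classification and the "more than 74% of hypotheses confirmed" rule. The hypothesis structure is hypothetical; the bands restate the ones quoted above.

```python
from scipy.stats import pearsonr

def correlation_band(r):
    """Bands quoted above: 0-0.19 no correlation, 0.20-0.39 poor, 0.40-0.69
    moderate to good, 0.70-0.89 high, 0.90-1.0 very high."""
    r = abs(r)
    if r < 0.20:
        return "no correlation"
    if r < 0.40:
        return "poor"
    if r < 0.70:
        return "moderate to good"
    if r < 0.90:
        return "high"
    return "very high"

def proportion_confirmed(test_scores, hypotheses):
    """`hypotheses` is a list of (comparator_scores, expected_bands) pairs; a
    hypothesis is confirmed when the observed band is among those expected.
    Validity is rated good when this proportion exceeds 0.74."""
    hits = sum(correlation_band(pearsonr(test_scores, comparator)[0]) in expected
               for comparator, expected in hypotheses)
    return hits / len(hypotheses)
```

For the Spearman correlations used in construct validity, scipy.stats.spearmanr can be swapped in for pearsonr with the same classification.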
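Finally, a sketch of Cohen's kappa for two raters, labeled with the Landis and Koch (1977) bands from the reference above; the ratings are illustrative.

```python
import numpy as np

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    rater1, rater2 = np.asarray(rater1), np.asarray(rater2)
    p_o = np.mean(rater1 == rater2)  # observed agreement
    cats = np.union1d(rater1, rater2)
    p_e = sum(np.mean(rater1 == c) * np.mean(rater2 == c) for c in cats)  # by chance
    return (p_o - p_e) / (1.0 - p_e)

def landis_koch_label(kappa):
    """Landis & Koch (1977) strength-of-agreement bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    return "almost perfect"

ratings_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes"]
ratings_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes"]
k = cohens_kappa(ratings_a, ratings_b)
print(f"kappa = {k:.2f} ({landis_koch_label(k)})")  # kappa = 0.75 (substantial)
```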
Sample size calculation
- Construct validity: At least 50 patients are required for an appropriate analysis (Terwee et al., 2007; Mokkink et al., 2016).
- Reproducibility and ceiling and floor effects: At least 50 patients are required for an appropriate analysis (Terwee et al., 2007; Mokkink et al., 2016).
- Internal consistency: a minimum of 100 patients is required for the analysis (Terwee et al., 2007; Mokkink et al., 2016).
- Concordance: the minimum sample size to estimate good to high agreement (kappa > 0.6) while reducing the risk of type II error was 110 participants, assuming equal absolute agreement for adequate and inadequate literacy and a disagreement rate nine times lower; a non-probabilistic sample was used (Cangussu et al., 2021). For the kappa statistic, see McHugh (2012) and Tang et al. (2015). (A planning check follows this list.)
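A trivial planning check that simply encodes the minimums listed above; the property names are illustrative, and the figures restate the cited sources, nothing more.

```python
# Minimum sample sizes quoted above (Terwee et al., 2007; Mokkink et al., 2016;
# Cangussu et al., 2021). This is a restatement of the list, not a power analysis.
MINIMUM_N = {
    "construct_validity": 50,
    "reproducibility": 50,
    "ceiling_floor_effects": 50,
    "internal_consistency": 100,
    "concordance_kappa": 110,
}

def underpowered_properties(planned_n):
    """Returns the properties whose quoted minimum exceeds the planned sample."""
    return [prop for prop, n_min in MINIMUM_N.items() if planned_n < n_min]

print(underpowered_properties(80))  # ['internal_consistency', 'concordance_kappa']
```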
Assessment of methodological quality
- QUADAS (Quality Assessment of Diagnostic Accuracy Studies) <https://www.bristol.ac.uk/population-health-sciences/projects/quadas/resources/>.