Page 77 - The diagnostic work-up of women with postmenopausal bleeding

External validation of a prediction model
For univariate analysis, the t-test or the Mann–Whitney U-test was used to compare means or medians, respectively. Analyses were performed with the Statistical Package for the Social Sciences (IBM Corp, Armonk, NY, USA), version 20.0. Statistical significance was set at p < 0.05.
Imputation of missing values
In our validation study, we performed multiple imputation for missing data elements, with separate imputation rounds for each of the two databases. In multiple imputation, each missing value is imputed several times. The variation among the imputations reflects the uncertainty with which the missing values can be predicted from the observed ones. After combining the results, the pooled estimates and standard errors reflect missing data uncertainty.17-19
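The pooling step described above can be sketched in a few lines. The example below assumes Rubin's rules, the usual way to combine results across imputed datasets; the text does not name the pooling method, and the function name `pool_rubin` and the input values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets (Rubin's rules, assumed).

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)            # average of the point estimates
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # sample variance between imputations
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical coefficient estimated in three imputed datasets.
estimate, se = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.04, 0.04])
print(round(estimate, 3), round(se, 3))
```

The pooled standard error is larger than the standard error from any single imputed dataset here, because the between-imputation variance adds the uncertainty due to the missing values.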
External validation
We retrospectively applied the two models developed by Opmeer (one using patient characteristics only, the other using patient characteristics and TVS) to the women in the Dutch and Swedish databases. We assessed the performance of the models by examining calibration (agreement between predicted risks and observed frequencies of endometrial cancer) and discriminative performance (the ability of the models to distinguish between women with and without endometrial cancer). To assess calibration for the two models, we plotted the predicted probabilities of endometrial cancer against the observed proportions of endometrial cancer, by deciles of the predicted probabilities, in a calibration plot.20 Calibration is considered perfect if the intercept is 0 and the calibration slope is 1.21,22 Calibration is relevant for evaluating the accuracy of the risk estimates provided by the models (do patients with a predicted risk of 25% indeed have a 1 in 4 risk of having endometrial cancer?), but clinical practice also requires high performance in terms of identified and missed cases at a given threshold. Calibration analyses were performed using R version 15.2.1.
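As an illustration of the decile-based comparison behind a calibration plot, the sketch below groups patients by predicted probability and reports the mean predicted risk and the observed proportion of events in each group. It is not the R code used in the study; the function name `calibration_by_groups` and the toy data are hypothetical, and the calibration intercept and slope (which require fitting a logistic model) are not computed here.

```python
def calibration_by_groups(predicted, observed, groups=10):
    """Return (mean predicted risk, observed proportion, n) per risk group.

    predicted: predicted probabilities of the outcome
    observed:  0/1 outcomes in the same order
    groups:    number of equally sized risk groups (10 gives deciles)
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    pairs = sorted(zip(predicted, observed))   # order patients by predicted risk
    n = len(pairs)
    if n < groups:
        raise ValueError("fewer observations than groups")
    rows = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, obs_rate, len(chunk)))
    return rows


# Toy data: two risk groups in which observed and predicted risk agree.
predicted = [0.1] * 10 + [0.5] * 10
observed = [1] + [0] * 9 + [1] * 5 + [0] * 5
for mean_pred, obs_rate, size in calibration_by_groups(predicted, observed, groups=2):
    print(f"predicted {mean_pred:.2f}  observed {obs_rate:.2f}  n={size}")
```

Plotting the observed proportion against the mean predicted risk for each group gives the calibration plot; points on the diagonal indicate good agreement.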
Discriminative performance of the two models was assessed by calculating the area under the receiver operating characteristic curve (AUC). AUCs reflect the overall discriminative ability, taking into account the full spectrum of predicted probabilities. As such, they are informative from a statistical perspective, but a model with a lower AUC may show superior clinical performance at a particular threshold compared with a model with a higher AUC.
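The AUC can be read as the probability that a randomly chosen woman with endometrial cancer receives a higher predicted risk than a randomly chosen woman without it. The sketch below computes it directly from that definition by comparing all case/non-case pairs, counting ties as one half. It is only an illustration, not the software used in the study; the function name `auc` and the toy data are hypothetical.

```python
def auc(predicted, observed):
    """Area under the ROC curve from pairwise comparisons.

    predicted: predicted probabilities of the outcome
    observed:  0/1 outcomes in the same order
    """
    cases = [p for p, y in zip(predicted, observed) if y == 1]
    controls = [p for p, y in zip(predicted, observed) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0      # case ranked above non-case
            elif case_risk == control_risk:
                concordant += 0.5      # tie counts as half
    return concordant / (len(cases) * len(controls))


# Toy data: one of the four case/non-case pairs is ranked the wrong way round.
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

Because every pair is weighted equally, this summary says nothing about how many cases are identified or missed at one specific risk threshold, which is the limitation noted above.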