This question is about estimating cut-off scores on a multi-dimensional screening questionnaire to predict a binary endpoint, in the presence of correlated scales.

I was asked about the value of controlling for associated subscores when devising cut-off scores on each dimension of a measurement scale (personality traits) that might be used for alcoholism screening. In this particular case, the person was not interested in adjusting for external covariates (predictors), which leads to the (partial) area under a covariate-adjusted ROC curve, e.g. (1-2), but essentially for other scores from the same questionnaire, because they correlate with each other (e.g. "impulsivity" with "sensation seeking"). It amounts to building a GLM that includes, on the right-hand side, the score of interest (for which we seek a cut-off) and another score computed from the same questionnaire, while the left-hand side is the outcome, e.g. drinking status.

To clarify (per @robin's request), suppose we have four scores $x_j$, $j=1,\dots,4$ (e.g., anxiety, impulsivity, neuroticism, sensation seeking), and we want to find a cut-off value $t_j$ (i.e. "positive case" if $x_j>t_j$, "negative case" otherwise) for each of them. We usually adjust for other risk factors like gender or age when devising such cut-offs (using ROC curve analysis). Now, what about adjusting impulsivity (IMP) for gender, age, and sensation seeking (SS), since SS is known to correlate with IMP? In other words, we would have a cut-off value for IMP from which the effects of age, gender and sensation seeking level have been removed.

Apart from saying that a cut-off must remain as simple as possible, my response was

> About covariates, I would recommend estimating the AUCs with and without adjustment, just to see if the predictive performance increases. Here, your covariates are merely other subscores defined from the same measurement instrument and I never faced such a situation (usually, I adjust on known risk factors, like Age or Gender). […] Also, since you are interested in prognostic issues (i.e. screening efficacy of the questionnaire), you may also be interested in estimating the positive predictive value (PPV, probability of patients with positive test results who are correctly classified) provided you are able to classify subjects as "positive" or "negative" depending on their subscores on your questionnaire. Note, however, that it is necessary to know the prevalence of this disorder to correctly interpret the PPV in turn…
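
To make the point about prevalence concrete, here is a minimal sketch of how the PPV follows from Bayes' theorem. The sensitivity (0.80), specificity (0.90) and prevalence values are hypothetical, chosen for illustration only, and do not come from any study of this questionnaire.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value, P(disease | positive test), via Bayes' theorem."""
    true_pos = sensitivity * prevalence                    # P(test+ and disease)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(test+ and no disease)
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point of a cut-off: sensitivity 0.80, specificity 0.90.
# Only the assumed prevalence changes between the three lines printed.
for prev in (0.01, 0.10, 0.30):
    print(f"prevalence={prev:.2f}  PPV={ppv(0.80, 0.90, prev):.3f}")
```

With the same cut-off, the PPV under these assumed numbers goes from roughly 7% at 1% prevalence to roughly 77% at 30% prevalence, which is why a PPV reported without the prevalence is hard to interpret.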

Do you have a more thorough understanding of this particular situation, with links to relevant papers when possible?

**References**

1. Janes, H and Pepe, MS (2008). Adjusting for Covariates in Studies of Diagnostic, Screening, or Prognostic Markers: An Old Concept in a New Setting. *American Journal of Epidemiology*, 168(1): 89-97.
2. Janes, H and Pepe, MS (2008). Accommodating Covariates in ROC Analysis. *UW Biostatistics Working Paper Series*, Paper 322.


#### Best Answer

The way that you've envisioned the analysis is really not the way I would suggest you start out thinking about it. First of all it is easy to show that if cutoffs *must* be used, cutoffs are not applied on individual features but on the overall predicted probability. The optimal cutoff for a single covariate depends on all the levels of the other covariates; it cannot be constant. Secondly, ROC curves play no role in meeting the goal of making optimum decisions for an *individual* subject.
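
To illustrate why a cutoff on one score cannot be constant, here is a minimal sketch assuming a logistic model with two scores, impulsivity (IMP) and sensation seeking (SS). The coefficients `B0`, `B_IMP`, `B_SS` and the risk threshold of 0.30 are hypothetical values picked for illustration, not estimates from any data. A fixed threshold on the predicted probability translates into an IMP cutoff that shifts with the subject's SS score.

```python
import math

# Hypothetical coefficients of a logistic model (illustration only):
#   logit P(drinker) = B0 + B_IMP * imp + B_SS * ss
B0, B_IMP, B_SS = -4.0, 0.08, 0.05


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def predicted_risk(imp: float, ss: float) -> float:
    """Predicted probability of the outcome under the assumed model."""
    eta = B0 + B_IMP * imp + B_SS * ss
    return 1.0 / (1.0 + math.exp(-eta))


def implied_imp_cutoff(ss: float, risk_threshold: float) -> float:
    """IMP value at which the predicted risk equals risk_threshold, given ss."""
    return (logit(risk_threshold) - B0 - B_SS * ss) / B_IMP


# One threshold on predicted risk, three different SS levels.
for ss in (10, 30, 50):
    t = implied_imp_cutoff(ss, risk_threshold=0.30)
    print(f"SS={ss:2d}  implied IMP cutoff={t:6.2f}")
```

Under these assumed coefficients, the IMP value that corresponds to a 30% predicted risk drops as SS rises, so any single fixed cutoff on IMP would be too strict for some subjects and too lenient for others.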

To handle correlated scales there are many data reduction techniques that can help. One of them is a formal redundancy analysis where each predictor is nonlinearly predicted from all the other predictors, in turn. This is implemented in the `redun` function in the R `Hmisc` package. Variable clustering, principal component analysis, and factor analysis are other possibilities. But the main part of the analysis, in my view, should be building a good probability model (e.g., binary logistic model).
