# Solved – Finding optimal cutoff point with two linear regression models

I am trying to find an optimal cut-off value of x that divides the data into two groups so that the sum of squared residuals (observed y minus estimated y) is minimized. The model is as follows.

In group 1: `y = a1*x + b1*z + c1*v + …`

In group 2: `y = a2*x + b1*z + c1*v + …`

I have data for y, x, z, v, … The problem is that groups 1 and 2 are not yet defined; the purpose of the analysis is to find the optimal cut-off point of x using regression models.

I searched repeatedly but could not find a way to build models in which 'a' varies between the groups while b1 and c1 are shared, and then fit them to the data.

I asked a similar question on Stack Exchange, and somebody pointed out the problems with this kind of approach. However, I need this approach, because this is clinical research that aims to 'find' an optimal (not perfect) cut-off point of x.

The article I read describes the method as quoted below. The authors mention that they used R, but I cannot find any reference or example for this kind of analysis.

To determine the relationship between serum 25(OH)D and iPTH
concentrations while adjusting for confounders that could affect serum
25(OH)D concentrations (i.e., age, gender, body weight, calcium
intake, physical activity, and season of year), we considered two
linear regression models, one for subjects below a certain
concentration of serum 25(OH)D and the other for subjects above that
concentration. To determine the specific cutoffs, we fitted the two
linear regression models described above and calculated the sums of
squares of residuals (=observed PTH – estimated PTH) from the two
models for each concentration of serum 25(OH)D. The models with the
lowest residual sums of squares were our best models, and the
corresponding concentrations of serum 25(OH)D were defined as the
optimal cutoff values.

Somebody said that this question is already answered in "regression model fitting for define cut-off", but I don't think so. This is not a regression discontinuity design, because there is no a priori cut-off; finding the cut-off is the purpose of the analysis.
Thanks.
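
The procedure quoted from the article can be sketched as a plain grid search in base R: for each candidate cutoff, split on x, fit the model, and record the residual sum of squares. This is only a minimal sketch on simulated data, not the article's actual code. The column names, the simulated values, the candidate grid (1% quantile steps, keeping at least 10% of the observations on each side), and the choice to let the intercept differ between groups are all assumptions. It follows the model in the question, where z and v share one coefficient across groups; interacting `grp` with every covariate would correspond to two fully separate regressions.

```r
# Simulated data; all names and values here are illustrative assumptions.
set.seed(1)
n <- 300
x <- runif(n, 0, 100)
z <- rnorm(n)
v <- rnorm(n)
# The slope of x changes at x = 40; z and v each have one shared coefficient.
y <- 5 + 0.8 * x - 0.7 * pmax(x - 40, 0) + 1.5 * z - 2 * v + rnorm(n)
dat <- data.frame(y, x, z, v)

# Residual sum of squares for one candidate cutoff.
# grp * x gives each group its own intercept and slope in x,
# while z and v keep a single coefficient across both groups.
rss_at <- function(cutoff, data) {
  data$grp <- factor(data$x >= cutoff, levels = c(FALSE, TRUE))
  fit <- lm(y ~ grp * x + z + v, data = data)
  sum(resid(fit)^2)
}

# Candidate cutoffs: quantiles of x, so neither group becomes too small.
candidates <- quantile(dat$x, probs = seq(0.10, 0.90, by = 0.01))
rss <- vapply(candidates, rss_at, numeric(1), data = dat)

# The cutoff with the smallest residual sum of squares.
best_cutoff <- unname(candidates[which.min(rss)])
best_cutoff
```

Plotting `rss` against `candidates` shows how sharply the minimum is defined, which is worth checking before reporting a single cutoff.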

Package `segmented` could help you:

Given a linear regression model (of class "lm" or "glm"), segmented tries to estimate a new model having broken-line relationships with the variables specified in seg.Z. A segmented (or broken-line) relationship is defined by the slope parameters and the break-points where the linear relation changes. The number of breakpoints of each segmented relationship is fixed via the psi argument, where initial values for the break-points must be specified. The model is estimated simultaneously yielding point estimates and relevant approximate standard errors of all the model parameters, including the break-points.

[…] segmented implements the bootstrap restarting algorithm described in Wood (2001). The bootstrap restarting is expected to escape the local optima of the objective function when the segmented relationship is flat and the log likelihood can have multiple local optima.

Here is an example (simplified from the documentation):

```r
library(segmented)

set.seed(12)
xx <- 1:100
yy <- 2 + 1.5 * pmax(xx - 35, 0) - 1.5 * pmax(xx - 70, 0) + rnorm(100, 0, 2)
dati <- data.frame(x = xx, y = yy)

plot(y ~ x, data = dati)

out.lm <- lm(y ~ x, data = dati)
o <- segmented(out.lm, seg.Z = ~x, psi = list(x = c(30, 60)),
               control = seg.control(display = FALSE))

summary(o)
# ***Regression Model with Segmented Relationship(s)***
#
# Call:
# segmented.lm(obj = out.lm, seg.Z = ~x, psi = list(x = c(30, 60)),
#              control = seg.control(display = FALSE))
#
# Estimated Break-Point(s):
#         Est. St.Err
# psi1.x 36.00 0.5469
# psi2.x 69.18 0.5455
#
# t value for the gap-variable(s) V:  0 0
#
# Meaningful coefficients of the linear terms:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  0.81994    0.58906   1.392   0.1672
# x            0.05358    0.02854   1.877   0.0636 .
# U1.x         1.50166    0.04127  36.387       NA
# U2.x        -1.55553    0.04540 -34.263       NA
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 1.705 on 94 degrees of freedom
# Multiple R-Squared: 0.9949,  Adjusted R-squared: 0.9946
#
# Convergence attained in 4 iterations with relative change 5.311563e-05

lines(predict(o))
```
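
The model in the question also has covariates (z, v) whose coefficients are shared on both sides of the cutoff. A minimal sketch of how that might look with `segmented`, assuming simulated data with hypothetical column names and a single breakpoint: the covariates go into the initial `lm`, and only x is listed in `seg.Z`, so only the slope of x changes at the estimated breakpoint. Note that `segmented` fits a continuous broken line, so unlike two separately fitted regressions it does not allow a jump at the cutoff.

```r
library(segmented)

# Simulated data; all names and values here are illustrative assumptions.
set.seed(1)
n <- 300
x <- runif(n, 0, 100)
z <- rnorm(n)
v <- rnorm(n)
# The slope of x changes at x = 40; z and v each have one shared coefficient.
y <- 5 + 0.8 * x - 0.7 * pmax(x - 40, 0) + 1.5 * z - 2 * v + rnorm(n)
dat <- data.frame(y, x, z, v)

# Ordinary linear model with all covariates, then a breakpoint in x only.
fit_lm <- lm(y ~ x + z + v, data = dat)
fit_seg <- segmented(fit_lm, seg.Z = ~x, psi = 50)  # psi = starting value

fit_seg$psi     # starting value, estimated breakpoint, and its standard error
slope(fit_seg)  # slope of x below and above the breakpoint
coef(fit_seg)   # includes the shared coefficients of z and v
```

The starting value in `psi` is a guess; it may be worth trying a few different values to see whether the estimated breakpoint is stable.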
