How do I formulate quantile regression as a Linear Programming problem?

When looking at the median quantile problem I know it is

$$\begin{align}
\text{minimize } & \sum_{i=1}^n |\beta_0 + X_i \beta_1 - Y_i| \\
\text{transforms into } & \\
\text{minimize } & \sum_{i=1}^n e_i \\
\text{s.t. } & \\
& e_i \geq \beta_0 + X_i\beta_{1} - Y_i \\
& e_i \geq -(\beta_0 + X_i\beta_{1} - Y_i)
\end{align}$$

but how do I transform the minimization for other quantiles?

#### Best Answer

You use the quantile regression estimator

$$\hat \beta(\tau) := \arg \min_{\theta \in \mathbb{R}^K} \sum_{i=1}^N \rho_\tau(y_i - \mathbf{x}_i^\top \theta),$$

where $\tau \in (0,1)$ is a constant chosen according to which quantile is to be estimated, and the function $\rho_\tau(\cdot)$ is defined as

$$\rho_\tau(r) = r(\tau - I(r<0)).$$
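As a quick numerical illustration (a Python sketch, not part of the original answer), the check function can be implemented directly from this definition:

```python
def rho(tau, r):
    """Check function: rho_tau(r) = r * (tau - I(r < 0))."""
    return r * (tau - (1 if r < 0 else 0))

# Positive residuals are weighted by tau, negative ones by (1 - tau):
print(rho(0.9, 1.0))   # prints 0.9
print(rho(0.9, -1.0))  # approximately 0.1, i.e. (1 - 0.9) * |r|
print(rho(0.5, 2.0))   # prints 1.0
```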

To see the purpose of $\rho_\tau(\cdot)$, note first that it takes the residuals as arguments, where these are defined as $\epsilon_i = y_i - \mathbf{x}_i^\top \theta$. The sum in the minimization problem can therefore be rewritten as

$$\sum_{i=1}^N \rho_\tau(\epsilon_i) = \sum_{i=1}^N \tau \lvert \epsilon_i \rvert I[\epsilon_i \geq 0] + (1-\tau) \lvert \epsilon_i \rvert I[\epsilon_i < 0],$$

such that positive residuals, associated with observations $y_i$ above the suggested quantile regression line $\mathbf{x}_i^\top \theta$, are given the weight $\tau$, while negative residuals, associated with observations $y_i$ below the line, are weighted with $(1-\tau)$.

Intuitively:

With $\tau = 0.5$, positive and negative residuals are "punished" with the same weight, and at the optimum an equal number of observations lie above and below the "line", so the line $\mathbf{x}_i^\top \hat\beta$ is the median regression "line".

When $\tau = 0.9$, each positive residual is weighted 9 times as heavily as a negative residual, which carries weight $1-\tau = 0.1$, and so at the optimum, for every observation above the "line" $\mathbf{x}_i^\top \hat\beta$, approximately 9 lie below it. Hence the "line" represents the 0.9-quantile. (For an exact statement of this, see Thm. 2.2 and Corollary 2.1 in Koenker (2005), "Quantile Regression".)

The two cases were illustrated in plots (not reproduced here): left panel $\tau = 0.5$, right panel $\tau = 0.9$.
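This quantile property is easy to verify numerically in the intercept-only case, where minimizing the check loss over a scalar $\theta$ recovers the sample $\tau$-quantile (a Python sketch using numpy, added for illustration; not part of the original answer):

```python
import numpy as np

def check_loss(theta, y, tau):
    """Sum of rho_tau over the residuals y_i - theta."""
    r = y - theta
    return np.sum(r * (tau - (r < 0)))

y = np.arange(1.0, 11.0)              # sample: 1, 2, ..., 10
grid = np.linspace(0.0, 12.0, 1201)   # brute-force grid search over theta
tau = 0.9

losses = [check_loss(t, y, tau) for t in grid]
theta_hat = grid[int(np.argmin(losses))]

# The minimizer lies in [9, 10]: roughly 90% of the observations sit below it.
print(theta_hat)
```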

Linear programs are predominantly analyzed and solved using the standard form

$$\min_z \; c^\top z \quad \text{subject to } A z = b,\ z \geq 0 \tag{1}$$

To arrive at a linear program in standard form, the first problem is that in such a program (1) all variables over which the minimization is performed, $z$, must be nonnegative. To achieve this, the residuals are decomposed into positive and negative parts using slack variables:

$$\epsilon_i = u_i - v_i,$$

where the positive part is $u_i = \max(0,\epsilon_i) = \lvert \epsilon_i \rvert I[\epsilon_i \geq 0]$ and the negative part is $v_i = \max(0,-\epsilon_i) = \lvert \epsilon_i \rvert I[\epsilon_i < 0]$. The sum of residuals weighted by the check function is then seen to be

$$\sum_{i=1}^N \rho_\tau(\epsilon_i) = \sum_{i=1}^N \tau u_i + (1-\tau) v_i = \tau \mathbf{1}_N^\top u + (1-\tau)\mathbf{1}_N^\top v,$$

where $u = (u_1,\dots,u_N)^\top$, $v = (v_1,\dots,v_N)^\top$, and $\mathbf{1}_N$ is the $N \times 1$ vector with all coordinates equal to $1$.
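The decomposition and the reweighted objective can be verified numerically (again a Python sketch with numpy, added for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
eps = rng.normal(size=1000)       # arbitrary residuals
tau = 0.3

u = np.maximum(0.0, eps)          # positive parts u_i = max(0, eps_i)
v = np.maximum(0.0, -eps)         # negative parts v_i = max(0, -eps_i)

# u - v reassembles the residuals exactly
assert np.allclose(u - v, eps)

# tau * 1'u + (1 - tau) * 1'v equals the sum of check-function values
lhs = tau * u.sum() + (1 - tau) * v.sum()
rhs = np.sum(eps * (tau - (eps < 0)))
assert np.isclose(lhs, rhs)
```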

The residuals must satisfy the $N$ constraints that

$$y_i - \mathbf{x}_i^\top\theta = \epsilon_i = u_i - v_i.$$

This results in the formulation as a linear program

$$\min_{\theta \in \mathbb{R}^K,\, u \in \mathbb{R}_+^N,\, v \in \mathbb{R}_+^N} \left\{ \tau \mathbf{1}_N^\top u + (1-\tau)\mathbf{1}_N^\top v \;\middle|\; y_i = \mathbf{x}_i^\top\theta + u_i - v_i,\ i=1,\dots,N \right\},$$

as stated in Koenker (2005) "Quantile Regression" page 10 equation (1.20).

However, note that $\theta \in \mathbb{R}^K$ is still not restricted to be nonnegative as required in the standard-form linear program (1). Hence a decomposition into positive and negative parts is used again:

$$\theta = \theta^+ - \theta^-,$$

where again $\theta^+ = \max(0,\theta)$ is the positive part and $\theta^- = \max(0,-\theta)$ is the negative part (taken componentwise). The $N$ constraints can then be written as

$$\mathbf{y} := \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} \mathbf{x}_1^\top \\ \vdots \\ \mathbf{x}_N^\top \end{bmatrix}(\theta^+ - \theta^-) + \mathbf{I}_N u - \mathbf{I}_N v,$$

where $\mathbf{I}_N = \mathrm{diag}(\mathbf{1}_N)$ is the $N \times N$ identity matrix.

Next define $b := \mathbf{y}$ and the design matrix $\mathbf{X}$, storing the data on the independent variables, as

$$\mathbf{X} := \begin{bmatrix} \mathbf{x}_1^\top \\ \vdots \\ \mathbf{x}_N^\top \end{bmatrix}.$$

The constraint can then be rewritten as

$$b = \mathbf{X}(\theta^+ - \theta^-) + \mathbf{I}_N u - \mathbf{I}_N v = [\mathbf{X},\, -\mathbf{X},\, \mathbf{I}_N,\, -\mathbf{I}_N] \begin{bmatrix} \theta^+ \\ \theta^- \\ u \\ v \end{bmatrix}.$$

Define the $N \times (2K + 2N)$ matrix

$$A := [\mathbf{X},\, -\mathbf{X},\, \mathbf{I}_N,\, -\mathbf{I}_N]$$

and introduce $\theta^+$ and $\theta^-$ as variables over which to minimize, so that they are part of $z$, to get

$$b = A \begin{bmatrix} \theta^+ \\ \theta^- \\ u \\ v \end{bmatrix} = Az.$$

Because $\theta^+$ and $\theta^-$ only affect the minimization problem through the constraint, a $\mathbf{0}$ of dimension $2K \times 1$ must be introduced as part of the coefficient vector $c$, which can then appropriately be defined as

$$c = \begin{bmatrix} \mathbf{0} \\ \tau \mathbf{1}_N \\ (1-\tau) \mathbf{1}_N \end{bmatrix},$$

thus ensuring that $c^\top z = \underbrace{\mathbf{0}^\top \begin{bmatrix} \theta^+ \\ \theta^- \end{bmatrix}}_{=0} + \tau \mathbf{1}_N^\top u + (1-\tau)\mathbf{1}_N^\top v = \sum_{i=1}^N \rho_\tau(\epsilon_i).$

Hence $c$, $A$, and $b$ are defined, and the program given in $(1)$ is completely specified.

This is probably best digested using an example. To solve this in R, use the package quantreg by Roger Koenker. Here is also an illustration of how to set up the linear program and solve it with a solver for linear programs:

```r
base <- read.table("http://freakonometrics.free.fr/rent98_00.txt", header = TRUE)
library(quantreg)
library(lpSolve)

tau <- 0.3

# Problem (1): only one covariate
X <- cbind(1, base$area)
K <- ncol(X)
N <- nrow(X)
A <- cbind(X, -X, diag(N), -diag(N))  # A = [X, -X, I_N, -I_N]
c <- c(rep(0, 2 * K), tau * rep(1, N), (1 - tau) * rep(1, N))
b <- base$rent_euro
const_type <- rep("=", N)
linprog <- lp("min", c, A, const_type, b)
beta <- linprog$sol[1:K] - linprog$sol[(1:K) + K]  # theta = theta_plus - theta_minus
beta
rq(rent_euro ~ area, tau = tau, data = base)  # check against quantreg

# Problem (2): two covariates
X <- cbind(1, base$area, base$yearc)
K <- ncol(X)
N <- nrow(X)
A <- cbind(X, -X, diag(N), -diag(N))
c <- c(rep(0, 2 * K), tau * rep(1, N), (1 - tau) * rep(1, N))
b <- base$rent_euro
const_type <- rep("=", N)
linprog <- lp("min", c, A, const_type, b)
beta <- linprog$sol[1:K] - linprog$sol[(1:K) + K]
beta
rq(rent_euro ~ area + yearc, tau = tau, data = base)
```
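For readers without R, the same standard-form program can be assembled in Python with `scipy.optimize.linprog` (a sketch on synthetic data, not from the original answer; scipy's default variable bounds already enforce $z \geq 0$):

```python
import numpy as np
from scipy.optimize import linprog

tau = 0.5
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
X = np.ones((len(y), 1))          # intercept-only design matrix
N, K = X.shape

# z = (theta_plus, theta_minus, u, v), all nonnegative by default bounds
A_eq = np.hstack([X, -X, np.eye(N), -np.eye(N)])
c = np.concatenate([np.zeros(2 * K), tau * np.ones(N), (1 - tau) * np.ones(N)])

res = linprog(c, A_eq=A_eq, b_eq=y, method="highs")
beta = res.x[:K] - res.x[K:2 * K]

# For tau = 0.5 and an intercept-only model, beta is the sample median (3 here)
print(beta)
```

Note that the outlier 100 does not pull the estimate away from the median, in contrast to least squares.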
