I have data that describe the number of days a car should be tested before it is used.
There are several predictors that measure the complexity of the car (the more complex the car, the more test days are needed).
The main goal of my multiple regression model is to predict the number of days for new data.
I used a robust regression model, but I think there is one thing I forgot to consider:
The data were sampled over the last 3 years. I suspect there is a time trend in efficiency: every year there is usually an effort to reduce testing costs, so a car of the same complexity might get only 90 percent as many test days in 2015 as it would have in 2014.
How can I capture this time trend?
You could simply include a time variable indicating the year of the observation (1, 2 or 3). That gives you a linear time trend. If efficiency is increasing, the associated regression coefficient should be negative: the further into the future, the fewer days are used for testing. Since you only have three years, going beyond linearity would be risky and likely prone to overfitting. However, if you assume the testing time decreases exponentially (for example by 10% every year), a linear trend would not be what you need. Still, I am not sure how to obtain a good estimate of an exponential trend from only three years of data.
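A minimal sketch of the linear-trend suggestion, assuming the complexity predictors can be summarized by a single numeric `complexity` column and the year is coded 1, 2, 3. The question does not say which robust method was used, so the sketch assumes a Huber M-estimator fitted by iteratively reweighted least squares, written in plain Python so it runs without extra packages. The data are simulated (true year effect of -2 days), not real, and the helpers `wls` and `huber_fit` are hypothetical names; in practice a library routine such as statsmodels' `RLM` with a Huber norm would do the same job.

```python
import random
import statistics

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def wls(X, y, w):
    """Weighted least squares: solve (X'WX) beta = X'Wy."""
    p = len(X[0])
    xtwx = [[sum(wi * row[j] * row[k] for row, wi in zip(X, w)) for k in range(p)]
            for j in range(p)]
    xtwy = [sum(wi * row[j] * yi for row, yi, wi in zip(X, y, w)) for j in range(p)]
    return solve(xtwx, xtwy)

def huber_fit(X, y, k=1.345, iters=50):
    """Huber M-estimator fitted by iteratively reweighted least squares."""
    beta = wls(X, y, [1.0] * len(y))  # start from ordinary least squares
    for _ in range(iters):
        resid = [yi - sum(b * xj for b, xj in zip(beta, row)) for row, yi in zip(X, y)]
        med = statistics.median(resid)
        scale = statistics.median(abs(r - med) for r in resid) / 0.6745  # MAD scale
        if scale == 0:
            break
        # full weight for small standardized residuals, k/|u| for large ones
        w = [1.0 if abs(r / scale) <= k else k / abs(r / scale) for r in resid]
        beta = wls(X, y, w)
    return beta

# Simulated data: 20 cars per year, true year effect of -2 test days.
random.seed(1)
X, y = [], []
for year in (1, 2, 3):  # year index: 1 = first year of data
    for _ in range(20):
        complexity = random.uniform(1, 10)
        days = 5 + 3 * complexity - 2 * year + random.gauss(0, 1)
        X.append([1.0, complexity, year])  # intercept, complexity, linear time trend
        y.append(days)
for i in (3, 25, 47):  # a few gross outliers, which the robust fit downweights
    y[i] += 40

beta = huber_fit(X, y)
print("intercept, complexity, year:", [round(b, 2) for b in beta])

# Predicting a new car of complexity 6 in the following year (year = 4).
pred = beta[0] + beta[1] * 6 + beta[2] * 4
print("predicted test days:", round(pred, 1))
```

Plugging `year = 4` into the fitted equation extrapolates the linear trend one step beyond the observed years, which is exactly the step that three years of data support only weakly.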
If you have a finer-than-yearly time scale, you would use the same idea. Regarding nonlinearity, you would technically be less prone to overfitting if you tried to estimate a nonlinear time trend; in practice, however, this is still a concern. I doubt the progress is fast enough to be measurable as a nonlinear function of time in a data set spanning only three years.
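With exact test dates instead of year labels, the time variable can simply be the elapsed time in years since some origin, which then enters the regression exactly like the 1, 2, 3 year index (its coefficient is still the change in test days per year). A minimal sketch, assuming each observation has a test date; the origin and the dates below are hypothetical, and `years_since` is an illustrative helper using 365.25-day years.

```python
from datetime import date

def years_since(d, origin):
    """Fractional years elapsed between origin and d (365.25-day years)."""
    return (d - origin).days / 365.25

origin = date(2013, 1, 1)  # hypothetical start of data collection
test_dates = [date(2013, 3, 15), date(2014, 7, 2), date(2015, 11, 30)]  # hypothetical

# Continuous time column to use in place of the 1/2/3 year index.
time_col = [years_since(d, origin) for d in test_dates]
print([round(t, 3) for t in time_col])
```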