Model formulae in R such as
y ~ x + a*b + c:d
are based on the so-called Wilkinson notation: Wilkinson and Rogers (1973), Symbolic Description of Factorial Models for Analysis of Variance.
This paper did not discuss notation for mixed models (which might not have existed back then). So where did the mixed model formulae used in lme4 and related R packages, such as
y ~ x + a*b + c:d + (1|school) + (a*b||town)
come from? Who introduced them for the first time, and when? Is there an agreed-upon term for them, such as "Wilkinson notation"? I am specifically referring to terms like
(model formula | grouping variable)
(model formula || grouping variable)
Best Answer
The | notation has been around in the nlme docs since version 3.1-1, which probably dates to late 1999; we can easily check that in the CRAN nlme code archive. nlme does use this notation: for example, try library(nlme); formula(Orthodont) and the | comes up. So the 2000s are off. Let's dig... "Graphical Methods for Data with Multiple Levels of Nesting" by Pinheiro & Bates (1997) is where the groupedData constructor is introduced. There they say: "The formula in a grouped data object has the same pattern as the formula used in a call to a Trellis graphics function in S-PLUS, such as xyplot". This makes sense, as P&B were working at Bell Labs (RIP), which developed the Trellis graphics system, and Trellis already used the | operator to indicate groups. Which probably means "The Visual Design and Control of Trellis Display" by Becker et al. (1996) has something to do with this. The notation is not introduced in that paper, but it is the first electronic Trellis display reference I can find.
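To see the nlme point concretely, here is a minimal sketch, assuming the nlme package is installed (it is normally bundled with R as a recommended package). It only inspects the formula stored with the Orthodont data set; the variable names f and f_text are just for illustration.

```r
library(nlme)  # assumption: nlme is available in this R installation

# Orthodont is a groupedData object shipped with nlme; its stored
# formula uses | to separate the model part from the grouping factor.
f <- formula(Orthodont)
f_text <- deparse(f)
print(f_text)

# nlme also provides a helper that extracts the part after the | operator.
print(getGroupsFormula(Orthodont))
```

The printed formula text should contain the | operator, with the grouping factor on its right-hand side.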
Essentially we need to dig up the visualisation literature at this point. I would probably check Cleveland's book Visualizing Data (1993) and the early works of Deepayan Sarkar (who developed lattice). Notice that the operators | and || are true primitive operators, as they are the logical OR operators, so it was just a matter of time until someone overloaded them. While not a full answer, I strongly suspect P&B looked at their colleagues' cool visualisation system (the plots in that 1996 paper are quite good even by late 2010s standards), realised that someone (Becker, Cleveland and Shyu) had already done some work on this (maybe they even discussed it with them at the time), and just followed what was already there. I.e. the | operator originates in graphics notation. Trellis almost certainly used it; potential predecessors of Trellis may have done so too, but their e-footprint is very hard to track.
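The point about | and || being ordinary operators can be checked in base R. This is a rough sketch with made-up variable names (y, x, school, a, b, town): a formula is parsed but not evaluated, so the bar simply sits in the parse tree as a call, and each package is free to give it its own meaning.

```r
# A formula is stored unevaluated, so | never acts as logical OR here.
f <- y ~ x + (1 | school)

rhs      <- f[[3]]         # the call  x + (1 | school)
paren    <- rhs[[3]]       # the call  (1 | school), i.e. a call to `(`
bar_term <- paren[[2]]     # the call  1 | school
op       <- as.character(bar_term[[1]])
print(op)

# The same holds for the double bar.
g   <- y ~ (a * b || town)
op2 <- as.character(g[[3]][[2]][[1]])
print(op2)

# Both are primitive functions in base R.
print(is.primitive(`|`))
print(is.primitive(`||`))

# The grouping variable is just another variable in the formula.
vars <- all.vars(f)
print(vars)
```

Because nothing is evaluated, the logical meaning of the operators never comes into play; a modelling or plotting function only walks this tree and decides what a bar term means.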
In general, I think you might want this page on NLME: Software for mixed-effects models by Bell Labs for more historical information on nlme.