I am very new to statistics and probability theory:
I am looking at a scatter plot of two random variables, both uniformly distributed on $(0,1)$. The plot is approximately a straight line, and the book says this is an indicator of independence.
I don't think I really understand the scatter plot at all. It is a 2-D plot of points where one variable gives one coordinate and the other variable gives the other coordinate. So if it's almost a straight line, that means that when RV 1 gives the value $0.25$, RV 2 gives something similar. But why is this a sign of independence? Surely that's as dependent as you can get (I know it's not, but that's the first thought that popped into my head).
A scatter plot shows the values of one variable plotted against the values of a second variable for paired observations. Below you can see four examples:
- $X$ versus itself (obviously dependent),
- $X$ vs $Y$, where the two are independent of each other,
- $X$ vs $Z$, where $Z$ is a function of $X$ plus noise: $Z = aX + \varepsilon$,
- $Y$ vs $Z$, which are independent of each other.
As you can see, a scatter plot in the form of a straight line shows a linear relation between the variables. If the plot looks "random", this suggests independence. If the randomness is uniformly distributed, as in (2), the points are scattered uniformly over the area of the plot; with a different distribution the picture can look different, e.g. in (4) the added noise is Gaussian, so $Z$ is more concentrated around its mean.
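As a rough numerical counterpart to the visual impression, you can look at the sample correlations. A minimal sketch, assuming the same simulation setup as the code at the end of this answer (keep in mind that correlation only measures *linear* dependence, so a near-zero value is consistent with independence but does not prove it):

```r
set.seed(123)
n <- 1e3
x <- runif(n)           # X ~ Uniform(0,1)
y <- runif(n)           # Y ~ Uniform(0,1), drawn independently of X
z <- x * 5 + rnorm(n)   # Z = aX + epsilon, with a = 5 and Gaussian noise

cor(x, x)   # exactly 1: a perfect straight line, as in (1)
cor(x, y)   # near 0: a "random" cloud, consistent with independence
cor(x, z)   # fairly close to 1: a noisy straight line, as in (3)
```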
See also the "What is covariance in plain language?" thread, which describes the idea of covariance in simple terms. Understanding scatter plots will also help you later with residual diagnostics in regression modeling, where scatter plots of fitted values vs residuals are commonly used, e.g. to spot heteroscedasticity.
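For a taste of that, here is a minimal residual-check sketch on a hypothetical model (the data and coefficients are made up purely for illustration):

```r
set.seed(123)
x <- runif(1e3)
z <- 5 * x + rnorm(1e3)       # linear-plus-noise data, as in example (3)
fit <- lm(z ~ x)              # simple linear regression

plot(fitted(fit), resid(fit), # residuals vs fitted values
     xlab = "Fitted values", ylab = "Residuals")
# A structureless, even band around 0 is what you hope to see;
# a funnel shape widening to one side would suggest heteroscedasticity.
```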
```r
set.seed(123)            # for reproducibility
n <- 1e3
x <- runif(n)            # X ~ Uniform(0,1)
y <- runif(n)            # Y ~ Uniform(0,1), independent of X
z <- x * 5 + rnorm(n)    # Z = aX + epsilon, with a = 5 and Gaussian noise

par(mfrow = c(2, 2))     # draw the four example plots
plot(x, x, main = "(1) X vs X")
plot(x, y, main = "(2) X vs Y")
plot(x, z, main = "(3) X vs Z")
plot(y, z, main = "(4) Y vs Z")
```