I'm new to Bayesian statistics and PyMC3, and I am trying to predict a chance of failure, very similar to this example (the Challenger Space Shuttle Disaster part):
In this example the chance of failure is modeled with respect to the outside temperature.
As a probability function, the logistic function is used.
The pymc3 model is as follows:
```python
import pymc3 as pm
import theano.tensor as tt

temperature = challenger_data[:, 0]
D = challenger_data[:, 1]  # defect or not?

# notice the `testval` here. We explain why below.
with pm.Model() as model:
    beta = pm.Normal("beta", mu=0, tau=0.001, testval=0)
    alpha = pm.Normal("alpha", mu=0, tau=0.001, testval=0)
    p = pm.Deterministic("p", 1.0 / (1. + tt.exp(beta * temperature + alpha)))
    observed = pm.Bernoulli("bernoulli_obs", p, observed=D)

    # Mysterious code to be explained in Chapter 3
    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(120000, step=step, start=start)

burned_trace = trace[100000::2]
```
Now I want to make the probability function depend on two stochastic variables, for instance outside temperature and O-ring size.
I believe I should draw from two beta distributions and an alpha distribution, but I cannot see which probability function I should use.
Best Answer
At its core, this is a logistic regression model. First, there are many improvements in PyMC3 3.2 that should make this model more efficient, like using the (default) NUTS sampler:
```python
import pymc3 as pm
import theano.tensor as tt

temperature = challenger_data[:, 0]
D = challenger_data[:, 1]  # defect or not?

with pm.Model() as model:
    beta = pm.Normal("beta", mu=0, tau=0.001)
    alpha = pm.Normal("alpha", mu=0, tau=0.001)
    p = pm.Deterministic("p", tt.nnet.sigmoid(beta * temperature + alpha))
    observed = pm.Bernoulli("bernoulli_obs", p, observed=D)
    trace = pm.sample()
```
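One subtlety worth flagging: `tt.nnet.sigmoid(x)` computes 1/(1 + exp(-x)), whereas the question's model used 1/(1 + exp(+x)). The two parameterizations are equivalent, but the signs of `beta` and `alpha` are flipped between them. A quick numpy check of the identity:

```python
import numpy as np

def sigmoid(x):
    # same function as tt.nnet.sigmoid, written with numpy
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 11)
# sigmoid(x) is the question's 1/(1 + exp(+x)) evaluated at -x:
assert np.allclose(sigmoid(-x), 1.0 / (1.0 + np.exp(x)))
```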
Essentially, then, you want to introduce more covariates. You do this just as in a linear regression; e.g., if you have an orings array:
```python
import pymc3 as pm
import theano.tensor as tt

temperature = challenger_data[:, 0]
D = challenger_data[:, 1]  # defect or not?
# orings is assumed to be an array of O-ring measurements, one per launch

with pm.Model() as model:
    # each random variable needs a unique name
    beta_temp = pm.Normal("beta_temp", mu=0, sd=1)
    beta_orings = pm.Normal("beta_orings", mu=0, sd=1)
    alpha = pm.Normal("alpha", mu=0, sd=1)
    p = pm.Deterministic(
        "p", tt.nnet.sigmoid(beta_temp * temperature + beta_orings * orings + alpha)
    )
    observed = pm.Bernoulli("bernoulli_obs", p, observed=D)
    trace = pm.sample()
```
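Once you have a trace, predicting a chance of failure at new covariate values just means pushing the posterior samples of the coefficients through the same sigmoid and summarizing. A sketch with numpy, using made-up arrays in place of `trace["beta_temp"]`, `trace["beta_orings"]`, and `trace["alpha"]` (all names and values here are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# stand-ins for posterior samples such as trace["beta_temp"]
rng = np.random.default_rng(0)
beta_temp_samples = rng.normal(-0.2, 0.05, size=2000)
beta_orings_samples = rng.normal(0.1, 0.05, size=2000)
alpha_samples = rng.normal(1.0, 0.5, size=2000)

new_temperature, new_orings = 31.0, 6.0  # hypothetical new launch conditions
p_samples = sigmoid(
    beta_temp_samples * new_temperature
    + beta_orings_samples * new_orings
    + alpha_samples
)

# posterior mean and a 95% credible interval for the failure probability
p_mean = p_samples.mean()
p_low, p_high = np.percentile(p_samples, [2.5, 97.5])
```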
Note that I also changed the priors to be more informative, which is always a good idea (but it assumes your covariates are z-scored, so it might not be appropriate here). There is also more convenient syntax in the glm submodule; see here: http://docs.pymc.io/notebooks/GLM-logistic.html
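Z-scoring the covariates, as assumed by the tighter priors above, is a one-liner per column; a minimal numpy sketch (the example temperature values are made up):

```python
import numpy as np

def z_score(x):
    # center to mean 0 and scale to standard deviation 1
    return (x - x.mean()) / x.std()

temperature = np.array([66.0, 70.0, 69.0, 68.0, 67.0, 72.0, 73.0, 70.0])
temperature_z = z_score(temperature)
# the standardized covariate now has mean ~0 and standard deviation ~1
```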
Finally, you get a better hit-rate for your questions on http://discourse.pymc.io/.