# Solved – Probability function from two random variables in PyMC3

I'm new to Bayesian modeling and PyMC3, and I am trying to predict a chance of failure, very similar to this example (the section "Example: Challenger Space Shuttle Disaster"):

http://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC3.ipynb

In this example, the chance of failure is modeled as a function of the outside temperature, using the logistic function as the probability function.
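For intuition, the logistic (sigmoid) function squashes any real-valued linear predictor into a valid probability in (0, 1). A minimal NumPy sketch; the coefficient values here are made up purely for illustration, not fitted to the Challenger data:

```python
import numpy as np

def logistic(x):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical coefficients, for illustration only:
# a negative beta makes failure less likely as temperature rises.
alpha, beta = 15.0, -0.23
temperature = np.array([53.0, 65.0, 81.0])

p_failure = logistic(beta * temperature + alpha)
```

Whatever the linear predictor `beta*temperature + alpha` evaluates to, the result is always a valid Bernoulli probability, which is exactly why this link function is used in the model below.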

The pymc3 model is as follows:

```python
import pymc3 as pm
import theano.tensor as tt

temperature = challenger_data[:, 0]
D = challenger_data[:, 1]  # defect or not?

with pm.Model() as model:
    beta = pm.Normal("beta", mu=0, tau=0.001, testval=0)
    alpha = pm.Normal("alpha", mu=0, tau=0.001, testval=0)
    p = pm.Deterministic("p", 1.0/(1. + tt.exp(beta*temperature + alpha)))
    observed = pm.Bernoulli("bernoulli_obs", p, observed=D)

    # Mysterious code to be explained in Chapter 3
    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(120000, step=step, start=start)
    burned_trace = trace[100000::2]
```

Now I want to make the probability function dependent on two stochastic variables, for instance outside temperature and O-ring size.

I believe I should draw from two beta distributions plus an alpha, but I cannot see which probability function I should use.


At its core, this is a logistic regression model. First, there are many improvements in PyMC3 3.2 that should make this model more efficient, like using the (default) NUTS sampler:

```python
import pymc3 as pm
import theano.tensor as tt

temperature = challenger_data[:, 0]
D = challenger_data[:, 1]  # defect or not?

with pm.Model() as model:
    beta = pm.Normal("beta", mu=0, tau=0.001)
    alpha = pm.Normal("alpha", mu=0, tau=0.001)
    p = pm.Deterministic("p", tt.nnet.sigmoid(beta*temperature + alpha))
    observed = pm.Bernoulli("bernoulli_obs", p, observed=D)

    trace = pm.sample()
```

Essentially, then, you want to introduce more covariates. You do this just like in a linear regression; for example, if you have `orings`:

```python
import pymc3 as pm
import theano.tensor as tt

temperature = challenger_data[:, 0]
D = challenger_data[:, 1]  # defect or not?
# orings: array of O-ring sizes, same length as temperature

with pm.Model() as model:
    # each random variable needs its own unique name
    beta_temp = pm.Normal("beta_temp", mu=0, sd=1)
    beta_orings = pm.Normal("beta_orings", mu=0, sd=1)
    alpha = pm.Normal("alpha", mu=0, sd=1)
    p = pm.Deterministic("p", tt.nnet.sigmoid(beta_temp*temperature + beta_orings*orings + alpha))
    observed = pm.Bernoulli("bernoulli_obs", p, observed=D)

    trace = pm.sample()
```

Note that I also changed the priors to be more informative, which is usually a good idea (but it assumes your covariates are z-scored, so it might not be appropriate here). There is also more convenient syntax in the `glm` submodule; see http://docs.pymc.io/notebooks/GLM-logistic.html
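For reference, z-scoring just means centering each covariate and rescaling it to unit standard deviation, which is what makes a `Normal(mu=0, sd=1)` prior on the coefficients a sensible default. A short NumPy sketch (the temperature values here are made up for illustration):

```python
import numpy as np

# Illustrative raw covariate values (not the actual Challenger data)
temperature = np.array([66., 70., 69., 68., 67., 72., 73., 70.])

# z-score: subtract the mean, divide by the standard deviation
temp_z = (temperature - temperature.mean()) / temperature.std()
```

After this transform, a coefficient of 1 means "one standard deviation of temperature shifts the log-odds by 1", so a standard normal prior covers the plausible effect sizes.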

Finally, you will get a better hit rate for questions like this on http://discourse.pymc.io/.
