Bernoulli model: bulge size x red spirals in Python using Stan

From: Bayesian Models for Astrophysical Data, Cambridge Univ. Press

You are kindly asked to include the complete citation if you use this material in a publication.

Code 10.13 Bernoulli model in Python using Stan, for assessing the relationship between bulge size and the fraction of red spirals

================================================================================

import numpy as np
import pandas as pd
import pystan
import statsmodels.api as sm

# Data
path_to_data = 'https://raw.githubusercontent.com/astrobayes/BMAD/master/data/Section_10p6/Red_spirals.csv'

# read data
data_frame = dict(pd.read_csv(path_to_data))
x = np.array(data_frame['fracdeV'])

# prepare data for Stan
data = {}
data['X'] = sm.add_constant(x)          # design matrix: intercept column + fracdeV
data['Y'] = np.array(data_frame['type'])
data['nobs'] = data['X'].shape[0]
data['K'] = data['X'].shape[1]

# Fit
stan_code = """
data{
    int<lower=0> nobs;          // number of data points
    int<lower=0> K;             // number of coefficients
    matrix[nobs, K] X;          // design matrix: intercept and bulge size
    int Y[nobs];                // galaxy type: 1 - red, 0 - blue
}
parameters{
    vector[K] beta;             // linear predictor coefficients
}
model{
    // priors and likelihood
    for (i in 1:K) beta[i] ~ normal(0, 100);

    Y ~ bernoulli_logit(X * beta);
}
"""

# Run MCMC (pystan.stan is the PyStan 2.x interface; it is not available in PyStan 3)
fit = pystan.stan(model_code=stan_code, data=data, iter=6000, chains=3,
                  warmup=3000, thin=1, n_jobs=3)

# Output
print(fit)

================================================================================
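
As a rough illustration of what the sampling statement Y ~ bernoulli_logit(X * beta) adds to the log posterior, the sketch below evaluates the Bernoulli log-likelihood with a logit link in plain NumPy. The function name bernoulli_logit_loglik and the toy arrays are hypothetical and are not part of the book's code; the sketch omits the normal(0, 100) priors, so its value is not expected to match lp__ in the Stan output.

```python
import numpy as np

def bernoulli_logit_loglik(beta, X, y):
    """Sum of Bernoulli log-probabilities with a logit link (hypothetical helper)."""
    eta = X @ beta                                # linear predictor, one value per row of X
    log_p1 = -np.logaddexp(0.0, -eta)             # log P(y = 1) = log inv_logit(eta)
    log_p0 = -np.logaddexp(0.0, eta)              # log P(y = 0) = log(1 - inv_logit(eta))
    return np.sum(y * log_p1 + (1 - y) * log_p0)

# Toy data (made up for illustration): intercept column plus one predictor
X_toy = np.column_stack([np.ones(4), np.array([0.0, 0.2, 0.7, 1.0])])
y_toy = np.array([0, 0, 1, 1])
beta_toy = np.zeros(2)                            # all coefficients zero, so every p = 0.5

loglik_toy = bernoulli_logit_loglik(beta_toy, X_toy, y_toy)
print(loglik_toy)
```

With all coefficients set to zero every observation has probability 0.5, so the result equals 4 * log(0.5). Using np.logaddexp keeps the computation stable for large values of the linear predictor.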

Output on screen:

Inference for Stan model: anon_model_6aebece066ad577d127968121539fd7a.
3 chains, each with iter=6000; warmup=3000; thin=1;
post-warmup draws per chain=3000, total post-warmup draws=9000.

           mean se_mean      sd    2.5%     25%     50%     75%   97.5%   n_eff    Rhat
beta[0]   -4.91  3.3e-3    0.16   -5.24   -5.02   -4.91   -4.81    -4.6    2318     1.0
beta[1]    8.17  9.6e-3    0.46    7.26    7.86    8.17    8.47    9.07    2282     1.0
lp__     -954.4    0.02    0.95  -957.0  -954.8  -954.1  -953.7  -953.5    2953     1.0

Samples were drawn using NUTS at Wed May 3 19:10:45 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
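
The coefficients above are on the logit scale. To read them as probabilities, one option is to pass the linear predictor through the inverse logit. The sketch below does this with the posterior means from the table (beta[0] = -4.91, beta[1] = 8.17). It assumes fracdeV lies between 0 and 1, and the names inv_logit, fracdeV_grid and p_red are introduced here for illustration only. Plugging in posterior means is a quick approximation; averaging the inverse logit over the posterior draws would propagate the uncertainty properly.

```python
import numpy as np

def inv_logit(eta):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-eta))

# Posterior means copied from the printed Stan summary
beta0_mean, beta1_mean = -4.91, 8.17

# Assumed range of the bulge-size proxy (fracdeV between 0 and 1)
fracdeV_grid = np.array([0.0, 0.5, 1.0])

# Approximate probability that a spiral is red at each grid value
p_red = inv_logit(beta0_mean + beta1_mean * fracdeV_grid)

for f, p in zip(fracdeV_grid, p_red):
    print("fracdeV = %.1f -> P(red) ~ %.3f" % (f, p))
```

Under these plug-in values the estimated probability of a red spiral rises from below 1% at fracdeV = 0 to above 95% at fracdeV = 1.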
