Pakistan Journal of Social Sciences

Year: 2011
Volume: 8
Issue: 3
Page No. 125 - 131

Bayesian Regression Analysis Using S-PLUS and R Softwares

Authors : S.P. Ahmad, A.A. Khan and A. Ahmed

Abstract: This study is devoted to Bayesian regression analysis. It contains an extended version of the normal-theory regression model which includes extreme-value, logistic and normal regression models as particular cases. Methods proposed in this study are illustrated numerically: the regression coefficient of pH on EC (electrical conductivity) for soil data is analyzed in the S-PLUS and R softwares.

How to cite this article:

S.P. Ahmad, A.A. Khan and A. Ahmed, 2011. Bayesian Regression Analysis Using S-PLUS and R Softwares. Pakistan Journal of Social Sciences, 8: 125-131.

INTRODUCTION

Regression analysis is a statistical technique for modeling the relationship between variables. In practice, many situations involve heterogeneous populations, and it is important that the dependence of the response variable y on the concomitant variable x be explicitly recognized. One way to examine the relationship of a concomitant (or regressor) variable to the response variable y is through a regression model in which y has a distribution that depends upon the regressor variables. This involves specifying a model for the distribution of y given x, where:

x = (x_1, x_2, \ldots, x_p) is a 1×p vector of regressor variables for an individual. Let the distribution of y given x be of the form:

p(y \mid x, \beta, \sigma) = \frac{1}{\sigma} f\!\left(\frac{y - x\beta}{\sigma}\right) \qquad (1)

where β is a p×1 vector of regression coefficients and σ > 0 is a scale parameter. The alternative form of Eq. 1 is:

y = x\beta + \sigma z \qquad (2)

where:

z = \frac{y - x\beta}{\sigma}

has the standardized distribution with density function f(z). The family of models in which f(z) is the standard normal density is common in the statistical literature (Searle, 1971; Rao, 1973; Seber, 1977; Draper and Smith, 1981; Weisberg, 1985), but models in which z has some other distribution belonging to the location-scale family (Eq. 2) are also important.
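As a minimal R illustration (not part of the original study), the standardized densities f(z) of the three members named in this study can be compared directly; dnorm and dlogis are built into R, while the smallest-extreme-value density f(z) = exp(z − e^z) is coded explicitly:

## Standardized densities f(z) for three members of the location-scale family.
## dnorm() and dlogis() are base R; the smallest-extreme-value (Gumbel minimum)
## density is written out, as base R has no built-in for it.
z <- seq(-4, 4, length.out = 200)

f.normal <- dnorm(z)              # standard normal
f.logis  <- dlogis(z)             # standard logistic
f.sev    <- exp(z - exp(z))       # smallest extreme value

matplot(z, cbind(f.normal, f.logis, f.sev), type = "l", lty = 1:3,
        ylab = "f(z)", main = "Standardized location-scale densities")
legend("topleft", c("normal", "logistic", "extreme value"), lty = 1:3)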

For example, extreme-value regression models are employed in applications ranging from accelerated life testing (Lawless, 2003; Zelen, 1959) to the analysis of survival data on patients suffering from chronic diseases (Prentice, 1973; Feigl and Zelen, 1965; Krall et al., 1975).

Furthermore, if the data are contaminated with outliers, the normal distribution can be replaced with Student's t distribution (with small degrees of freedom) to obtain a better fit (Lange et al., 1989). Model (Eq. 2) can accommodate linear as well as non-linear models through the various functional forms of xβ.

None of these earlier references discusses the Bayesian approach. Box and Tiao (1973) and Gelman et al. (1995) discuss the Bayesian approach to regression analysis for normal linear as well as non-linear, non-normal models. Zellner (1971) deals with Bayesian inference in reference to econometrics and is mostly confined to normal linear models. The general framework used for causal inference is presented by Rubin (1974, 1978). Bayesian approaches to analyzing regression residuals appear in Zellner (1976), Chaloner and Brant (1988) and Chaloner (1991).

Bayesian joint inference for β and σ: Suppose that associated with the ith individual observation is a response y_i and a regression vector:

x_i = (x_{i1}, x_{i2}, \ldots, x_{ip}), \quad i = 1, 2, \ldots, n
Table 1: A summary of derivatives of log-likelihoods (where z_i = (y_i − x_iβ)/σ, i = 1, 2, ..., n)

Thus (x_i, y_i) for i = 1, 2, ..., n is assumed to be a random sample from the location-scale family of models in Eq. 1, which can be represented as:

y_i = x_i\beta + \sigma z_i

Consequently, in terms of general notation:

\theta = (\beta_1, \ldots, \beta_p, \sigma)^T

a vector of length (p+1), and the likelihood is (Table 1):

L(\beta, \sigma \mid x, y) = \prod_{i=1}^{n} \frac{1}{\sigma} f(z_i)

This implies that:

\log L(\beta, \sigma \mid x, y) = -n \log \sigma + \sum_{i=1}^{n} \log f(z_i) \qquad (3)

where:

z_i = \frac{y_i - x_i\beta}{\sigma}

It may be noted that the same notation z_i is used for the standard variate in both the univariate and the regression scenario. Assuming prior independence of β and σ (Jeffreys, 1961; Box and Tiao, 1973), researchers have:

p(\beta, \sigma) \propto \frac{1}{\sigma} \qquad (4)

As a result, the joint posterior density of β and σ given the data vector y^T = (y_1, y_2, \ldots, y_n) is:

p(\beta, \sigma \mid x, y) \propto \sigma^{-(n+1)} \prod_{i=1}^{n} f(z_i) \qquad (5)

where:

x = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}

is an n×p matrix of covariates (or regressors) corresponding to the response vector y. Now, joint inference for β and σ can be made from the posterior (Eq. 5). The posterior mode of p(β, σ|x, y) serves as a point estimate of β and σ. Its calculation requires the partial derivatives of the log posterior:

\log p(\beta, \sigma \mid x, y) = \text{const} - (n+1)\log \sigma + \sum_{i=1}^{n} \log f(z_i) \qquad (6)
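As a minimal R sketch of Eq. 6 (assuming a hypothetical design matrix X and response vector y, with f taken as the standard normal density), the log posterior can be coded directly:

## Log posterior of Eq. 6 under the prior p(beta, sigma) ~ 1/sigma,
## with f taken as the standard normal density. theta = c(beta, sigma).
log.post <- function(theta, X, y) {
  p     <- ncol(X)
  beta  <- theta[1:p]
  sigma <- theta[p + 1]
  if (sigma <= 0) return(-1e10)       # sigma must be positive
  z <- (y - X %*% beta) / sigma       # standardized residuals
  -(length(y) + 1) * log(sigma) + sum(dnorm(z, log = TRUE))
}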

Defining partial derivatives:

g_1(z) = -\frac{\partial \log f(z)}{\partial z}

and:

g_2(z) = -\frac{\partial^2 \log f(z)}{\partial z^2}

these derivatives can be defined more explicitly as:

\frac{\partial \log p}{\partial \beta_j} = \frac{1}{\sigma}\sum_{i=1}^{n} g_1(z_i)\, x_{ij}, \quad j = 1, 2, \ldots, p

and:

\frac{\partial \log p}{\partial \sigma} = -\frac{(n+1)}{\sigma} + \frac{1}{\sigma}\sum_{i=1}^{n} g_1(z_i)\, z_i

Consequently, the score vector U(β, σ) and Hessian matrix H(β, σ) are:

U(\beta, \sigma) = \left(\frac{\partial \log p}{\partial \beta_1}, \ldots, \frac{\partial \log p}{\partial \beta_p}, \frac{\partial \log p}{\partial \sigma}\right)^T

and:

H(\beta, \sigma) = \begin{pmatrix} \dfrac{\partial^2 \log p}{\partial \beta\, \partial \beta^T} & \dfrac{\partial^2 \log p}{\partial \beta\, \partial \sigma} \\[1ex] \dfrac{\partial^2 \log p}{\partial \sigma\, \partial \beta^T} & \dfrac{\partial^2 \log p}{\partial \sigma^2} \end{pmatrix}

Therefore, making use of the Newton-Raphson iteration scheme, one can obtain the posterior mode vector as:

\theta^{(k+1)} = \theta^{(k)} - H^{-1}\!\left(\theta^{(k)}\right) U\!\left(\theta^{(k)}\right), \quad \theta = (\beta, \sigma)^T \qquad (7)

The asymptotic posterior covariance matrix for (Eq. 5) can be obtained as:

\Sigma = -H^{-1}\!\left(\hat{\beta}, \hat{\sigma}\right)
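A hedged R sketch of this step follows: instead of the hand-coded Newton-Raphson scheme of Eq. 7, optim() with a numerically differenced Hessian is used to locate the posterior mode and evaluate Σ = −H⁻¹. The data below are simulated stand-ins for the soil pH/EC example, not the paper's values:

## Log posterior of Eq. 6 (normal case), as in the previous sketch
log.post <- function(theta, X, y) {
  p <- ncol(X); beta <- theta[1:p]; sigma <- theta[p + 1]
  if (sigma <= 0) return(-1e10)         # effectively -Inf, keeps optim() stable
  z <- (y - X %*% beta) / sigma
  -(length(y) + 1) * log(sigma) + sum(dnorm(z, log = TRUE))
}

## Simulated stand-in for the soil pH/EC data (hypothetical values)
set.seed(1)
X <- cbind(1, runif(30, 0.2, 1.5))      # column of 1s and EC values
y <- drop(X %*% c(7, 0.5)) + rnorm(30, sd = 0.3)

## Posterior mode (Eq. 7) and asymptotic covariance Sigma = -H^{-1}
fit <- optim(c(0, 0, 1), log.post, X = X, y = y,
             control = list(fnscale = -1),   # maximize rather than minimize
             hessian = TRUE)
theta.hat <- fit$par                    # (beta0.hat, beta1.hat, sigma.hat)
Sigma     <- solve(-fit$hessian)        # asymptotic posterior covariance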

More clearly, the posterior density is approximated by:

(\beta, \sigma) \mid x, y \sim N_{p+1}\!\left((\hat{\beta}, \hat{\sigma}), \Sigma\right) \qquad (8)

where N_r(a, b) is the r-variate normal distribution with mean vector a and covariance matrix b. This is a first-order approximation of the posterior density (Berger, 1985). An equivalent version of this approximation is the Chi-square approximation, i.e.:

2\left[\log p(\hat{\beta}, \hat{\sigma} \mid x, y) - \log p(\beta, \sigma \mid x, y)\right] \sim \chi^2_{p+1}

A more accurate approximation, Laplace's approximation (Tierney and Kadane, 1986; Reid, 1988), can also be used, i.e.:

p(\beta, \sigma \mid x, y) \approx \frac{p^*(\beta, \sigma)}{(2\pi)^{(p+1)/2}\, |\Sigma|^{1/2}\, p^*(\hat{\beta}, \hat{\sigma})} \qquad (9)

where p*(β, σ) denotes the unnormalized posterior in Eq. 5.

Any of the approximations can be used both for hypothesis testing and construction of credible regions.
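For instance, under the normal approximation of Eq. 8, approximate 95% credible intervals and the Chi-square region check can be computed as below (a sketch assuming theta.hat, Sigma, X, y and log.post from the previous snippet):

## Approximate 95% credible intervals from the normal approximation (Eq. 8)
se <- sqrt(diag(Sigma))
ci <- cbind(lower = theta.hat - qnorm(0.975) * se,
            upper = theta.hat + qnorm(0.975) * se)

## Chi-square criterion: a point theta0 lies inside the 95% credible
## region when W(theta0) < qchisq(0.95, length(theta.hat))
W <- function(theta0) 2 * (log.post(theta.hat, X, y) - log.post(theta0, X, y))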

The marginal inference for β and σ: In regression analysis, interest often centers on the regression coefficient vector β (Weisberg, 1985; Zellner, 1971), and σ is treated as a nuisance parameter (Box and Tiao, 1973; Gianola and Fernando, 1986; Gelman et al., 1995) (Table 2).

Therefore, the nuisance parameter must be integrated out of the joint posterior density (Eq. 5) to obtain the marginal posterior density of β, denoted by p(β|x, y).

However, if interest centers on the scale parameter σ, its posterior p(σ|x, y) can also be obtained by integrating the joint posterior with respect to β. To be precise, the marginal posterior density of β is:

p(\beta \mid x, y) = \int_0^{\infty} p(\beta, \sigma \mid x, y)\, d\sigma \qquad (10)

Table 2: A summary of prior densities for the location parameter β

Similarly, the marginal posterior of σ can be obtained, i.e.:

p(\sigma \mid x, y) = \int p(\beta, \sigma \mid x, y)\, d\beta \qquad (11)

Bayesian analysis is to be based on these two posteriors. For the normal model, p(β|x, y) and p(σ|x, y) can be obtained in closed form (Zellner, 1971). However, for non-normal members of the location-scale family these marginals can be obtained only through numerical integration (Naylor and Smith, 1982). An alternative is the asymptotic-theory approach (Tierney et al., 1989; Leonard et al., 1989). Normal and Laplace approximations can be written directly for the posterior densities p(β|x, y) and p(σ|x, y) as follows.
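A minimal R sketch of this numerical integration, for the simplest single-coefficient normal model (hypothetical x and y), integrates σ out of the joint posterior of Eq. 5 with integrate():

## Marginal posterior p(beta | x, y) by 1-D numerical integration over sigma
## (Eq. 10), for a single-coefficient model y = beta * x + sigma * z with
## standard normal f. The data are hypothetical.
post.joint <- function(beta, sigma, x, y)
  sigma^(-(length(y) + 1)) * prod(dnorm((y - beta * x) / sigma))

marg.beta <- function(beta, x, y)
  integrate(function(s) sapply(s, post.joint, beta = beta, x = x, y = y),
            lower = 0, upper = Inf)$value

## Unnormalized marginal over a grid of beta values, then normalized
set.seed(1)
x <- runif(20); y <- 0.8 * x + rnorm(20, sd = 0.2)
beta.grid <- seq(0, 1.6, length.out = 101)
dens <- sapply(beta.grid, marg.beta, x = x, y = y)
dens <- dens / (sum(dens) * diff(beta.grid)[1])   # Riemann normalization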

Normal approximation: The marginal posterior density of β can be approximated by a normal distribution as (Table 3):

\beta \mid x, y \sim N_p\!\left(\hat{\beta}, \Sigma_\beta\right) \qquad (12)

where:

\hat{\beta} = the posterior mode
\Sigma_\beta = a p×p matrix

defined as:

\Sigma_\beta = \left(I_{11} - I_{12}\, I_{22}^{-1}\, I_{21}\right)^{-1}

where I = −H(β̂, σ̂) is the observed information matrix, partitioned so that the suffixes 1 and 2 to I stand for β and σ, respectively. This approximation is equivalent to the Chi-square approximation defined as:

2\left[\log p(\hat{\beta} \mid x, y) - \log p(\beta \mid x, y)\right] \sim \chi^2_p
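In R, the Schur-complement calculation for Σ_β can be sketched as follows, assuming I.mat holds the (p+1)×(p+1) observed information −H(β̂, σ̂) (e.g., -fit$hessian from the earlier optim() sketch):

## Sigma_beta from the partitioned information matrix I = -H(beta.hat, sigma.hat),
## block 1 = beta (first p rows/columns), block 2 = sigma (last row/column)
p   <- nrow(I.mat) - 1
I11 <- I.mat[1:p, 1:p, drop = FALSE]
I12 <- I.mat[1:p, p + 1, drop = FALSE]
I21 <- I.mat[p + 1, 1:p, drop = FALSE]
I22 <- I.mat[p + 1, p + 1]

Sigma.beta <- solve(I11 - I12 %*% I21 / I22)   # inverse of the Schur complement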

The corresponding approximations for p(σ|x, y) can be written as:

\sigma \mid x, y \sim N\!\left(\hat{\sigma}, \Sigma_\sigma\right), \quad \Sigma_\sigma = \left(I_{22} - I_{21}\, I_{11}^{-1}\, I_{12}\right)^{-1} \qquad (13)

This is equivalent to the Chi-square approximation, i.e.:

2\left[\log p(\hat{\sigma} \mid x, y) - \log p(\sigma \mid x, y)\right] \sim \chi^2_1


Table 3: Regression coefficient of pH on EC for various models

Laplace's approximation: Laplace's approximation can also be used to approximate the marginal density of β, i.e.:

p(\beta \mid x, y) \propto \left|I_{22}(\beta, \hat{\sigma}_\beta)\right|^{-1/2} p(\beta, \hat{\sigma}_\beta \mid x, y) \qquad (14)

where σ̂_β is the posterior mode of σ for a fixed β. The corresponding approximation for p(σ|x, y) can also be written as:

p(\sigma \mid x, y) \propto \left|I_{11}(\hat{\beta}_\sigma, \sigma)\right|^{-1/2} p(\hat{\beta}_\sigma, \sigma \mid x, y) \qquad (15)

where β̂_σ is the posterior mode of β for a fixed σ.
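A hedged R sketch of Eq. 14 for the single-coefficient normal model of the earlier sketch (assuming its x, y and beta.grid): for each fixed β, σ is profiled out with optimize(), and the numerical curvature in σ supplies the |I₂₂|^{-1/2} factor. The interval bounds and step size h are ad hoc choices:

## Laplace approximation to p(beta | x, y) (Eq. 14), unnormalized
log.joint <- function(beta, sigma, x, y)
  -(length(y) + 1) * log(sigma) + sum(dnorm((y - beta * x) / sigma, log = TRUE))

laplace.beta <- function(beta, x, y) {
  opt <- optimize(function(s) log.joint(beta, s, x, y),
                  interval = c(1e-6, 10), maximum = TRUE)
  s.hat <- opt$maximum
  ## numerical second derivative of the log joint in sigma at s.hat
  h  <- 1e-3
  d2 <- (log.joint(beta, s.hat + h, x, y) - 2 * opt$objective +
         log.joint(beta, s.hat - h, x, y)) / h^2
  exp(opt$objective) / sqrt(-d2)    # |I22|^{-1/2} * p(beta, sigma.hat | x, y)
}

dens.lap <- sapply(beta.grid, laplace.beta, x = x, y = y)   # unnormalized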

BAYESIAN REGRESSION ANALYSIS OF LOGISTIC MODEL

Let y be the response vector and x_i be the regressor vector for the ith observation. Assume that:

y_i = x_i\beta + \sigma z_i, \quad z_i = \frac{y_i - x_i\beta}{\sigma} \qquad (16)

for some f (here, the logistic distribution with density f(z) = e^z/(1 + e^z)^2). Consequently, in terms of general notation:

\theta = (\beta_1, \ldots, \beta_p, \sigma)^T

a vector of length (p+1), and the likelihood is given by:

L(\beta, \sigma \mid x, y) = \prod_{i=1}^{n} \frac{1}{\sigma} \frac{e^{z_i}}{\left(1 + e^{z_i}\right)^2}

This implies that:

\log L(\beta, \sigma \mid x, y) = -n \log \sigma + \sum_{i=1}^{n}\left[z_i - 2\log\left(1 + e^{z_i}\right)\right] \qquad (17)

where z_i is defined in Eq. 16. Researchers take partial derivatives with respect to β and σ:

\frac{\partial \log L}{\partial \beta_j} = \frac{1}{\sigma}\sum_{i=1}^{n} \frac{e^{z_i} - 1}{e^{z_i} + 1}\, x_{ij}, \qquad \frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma}\sum_{i=1}^{n} \frac{e^{z_i} - 1}{e^{z_i} + 1}\, z_i

Researchers follow the standard approach of Box and Tiao (1973) and Gelman et al. (1995), assuming the prior:

p(\beta, \sigma) \propto p(\beta)\, p(\sigma) \qquad (18)

where p(β) and p(σ) are the priors for β and σ. By using Bayes' theorem, researchers obtain the posterior density p(β, σ|y) as:

p(\beta, \sigma \mid x, y) \propto p(\beta, \sigma)\, L(\beta, \sigma \mid x, y) \qquad (19)

The log-posterior is given by:

\log p(\beta, \sigma \mid x, y) = \text{const} + \log p(\beta, \sigma) + \log L(\beta, \sigma \mid x, y)

or:

\log p(\beta, \sigma \mid x, y) = \text{const} + \log p(\beta, \sigma) - n \log \sigma + \sum_{i=1}^{n}\left[z_i - 2\log\left(1 + e^{z_i}\right)\right] \qquad (20)

For a prior:

p(\beta, \sigma) \propto \frac{1}{\sigma}

researchers have:

\frac{\partial \log p}{\partial \beta_j} = \frac{1}{\sigma}\sum_{i=1}^{n} \frac{e^{z_i} - 1}{e^{z_i} + 1}\, x_{ij}

and:

\frac{\partial \log p}{\partial \sigma} = -\frac{(n+1)}{\sigma} + \frac{1}{\sigma}\sum_{i=1}^{n} \frac{e^{z_i} - 1}{e^{z_i} + 1}\, z_i

The posterior mode is obtained by maximizing Eq. 20 with respect to β and σ. The score vector of the log posterior is given by:

U(\beta, \sigma) = \left(\frac{\partial \log p}{\partial \beta_1}, \ldots, \frac{\partial \log p}{\partial \beta_p}, \frac{\partial \log p}{\partial \sigma}\right)^T

and the Hessian matrix of the log posterior is:

H(\beta, \sigma) = \frac{\partial^2 \log p(\beta, \sigma \mid x, y)}{\partial \theta\, \partial \theta^T}, \quad \theta = (\beta, \sigma)^T

The posterior mode can be obtained from the iteration scheme:

\theta^{(k+1)} = \theta^{(k)} - H^{-1}\!\left(\theta^{(k)}\right) U\!\left(\theta^{(k)}\right) \qquad (21)

Consequently, the modal variance Σ can be obtained as:

\Sigma = -H^{-1}\!\left(\hat{\beta}, \hat{\sigma}\right)
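The same optim() device sketched earlier applies to the logistic model; dlogis() supplies f(z), and the simulated data are again hypothetical stand-ins, not the paper's soil data:

## Posterior mode and modal variance for the logistic model (Eqs. 20-21),
## with optim() standing in for the Newton-Raphson scheme
log.post.logis <- function(theta, X, y) {
  p <- ncol(X); beta <- theta[1:p]; sigma <- theta[p + 1]
  if (sigma <= 0) return(-1e10)
  z <- (y - X %*% beta) / sigma
  -(length(y) + 1) * log(sigma) + sum(dlogis(z, log = TRUE))
}

set.seed(2)
X <- cbind(1, runif(30, 0.2, 1.5))
y <- drop(X %*% c(7, 0.5)) + rlogis(30, scale = 0.2)

fit <- optim(c(0, 0, 1), log.post.logis, X = X, y = y,
             control = list(fnscale = -1), hessian = TRUE)
theta.hat <- fit$par
Sigma     <- solve(-fit$hessian)       # modal variance (Eq. 22)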

Using the normal approximation, researchers can directly write a (p+1)-variate normal approximation of p(β, σ|x, y) as:

(\beta, \sigma) \mid x, y \sim N_{p+1}\!\left((\hat{\beta}, \hat{\sigma}), \Sigma\right) \qquad (22)

Similarly, the Bayesian analog of the likelihood-ratio criterion can be written as:

2\left[\log p(\hat{\beta}, \hat{\sigma} \mid x, y) - \log p(\beta, \sigma \mid x, y)\right] \sim \chi^2_{p+1} \qquad (23)

Using Laplace's approximation, p(β, σ|x, y) can be written as:

p(\beta, \sigma \mid x, y) \approx \frac{p^*(\beta, \sigma)}{(2\pi)^{(p+1)/2}\, |\Sigma|^{1/2}\, p^*(\hat{\beta}, \hat{\sigma})} \qquad (24)

where p*(β, σ) denotes the unnormalized posterior in Eq. 19.

The marginal Bayesian inference about β and σ is to be based on the marginal posterior densities of these parameters. The marginal posterior for β can be obtained by integrating p(β, σ|x, y) with respect to σ, i.e.:

p(\beta \mid x, y) = \int_0^{\infty} p(\beta, \sigma \mid x, y)\, d\sigma \qquad (25)

Similarly, the marginal posterior of σ can be obtained, i.e.:

p(\sigma \mid x, y) = \int p(\beta, \sigma \mid x, y)\, d\beta \qquad (26)

The normal and Laplace approximations of p(β|x, y) and p(σ|x, y) are given as follows. The normal approximation of the marginal posterior p(β|x, y) can be written as:

\beta \mid x, y \sim N_p\!\left(\hat{\beta}, \Sigma_\beta\right) \qquad (27)

where β̂ is the posterior mode and Σ_β is a (p×p) matrix defined as:

\Sigma_\beta = \left(I_{11} - I_{12}\, I_{22}^{-1}\, I_{21}\right)^{-1}

The Bayesian analog of the likelihood-ratio criterion can also be defined as a test criterion:

2\left[\log p(\hat{\beta} \mid x, y) - \log p(\beta \mid x, y)\right] \sim \chi^2_p \qquad (28)

Laplace's approximation of the marginal posterior density p(β|x, y) is given by:

p(\beta \mid x, y) \propto \left|I_{22}(\beta, \hat{\sigma}_\beta)\right|^{-1/2} p(\beta, \hat{\sigma}_\beta \mid x, y) \qquad (29)

where σ̂_β is the posterior mode of σ for a fixed β.


Table 4: Approximate normal posterior quantiles for the regression coefficient of various models

Similarly, p(σ|x, y) can be approximated, and the results corresponding to the normal and Laplace approximations can be written as:

\sigma \mid x, y \sim N\!\left(\hat{\sigma}, \Sigma_\sigma\right), \quad \Sigma_\sigma = \left(I_{22} - I_{21}\, I_{11}^{-1}\, I_{12}\right)^{-1} \qquad (30)

or equivalently:

2\left[\log p(\hat{\sigma} \mid x, y) - \log p(\sigma \mid x, y)\right] \sim \chi^2_1 \qquad (31)

p(\sigma \mid x, y) \propto \left|I_{11}(\hat{\beta}_\sigma, \sigma)\right|^{-1/2} p(\hat{\beta}_\sigma, \sigma \mid x, y) \qquad (32)

where β̂_σ is the posterior mode of β for a fixed σ.

Numerical illustrations: Numerical illustrations are implemented in S-PLUS software for Bayesian regression analysis. These illustrations are meant to show the strength of Bayesian methods in practical situations. Researchers have used the survReg and censorReg functions for Bayesian analysis of various regression models with a non-informative prior. S-PLUS has a function censorReg for regression analysis. It has a very substantial overlap with survReg but is more general in that it allows truncation as well as censoring (Venables and Ripley, 2002) (Table 4). The usage of survReg and censorReg is given below:

survReg(formula, data, dist, ...)
censorReg(formula, data, dist, ...)

where:

formula = A formula expression, as for other regression models
data = An optional data frame in which to interpret the variables occurring in the formula
dist = The assumed distribution for the y variable
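In R, the analogue of S-PLUS's survReg is survreg() in the survival package. A hedged sketch of the pH-on-EC fits follows; the soil data frame is simulated here because the paper's data are not reproduced, and under the non-informative prior the maximum-likelihood fit reported by survreg() closely approximates the posterior mode:

## Hypothetical stand-in for the soil data (pH and EC values are simulated)
library(survival)
set.seed(3)
soil <- data.frame(EC = runif(30, 0.2, 1.5))
soil$pH <- 7 + 0.5 * soil$EC + rnorm(30, sd = 0.3)

## Location-scale fits of pH on EC for the three models discussed above.
## Surv() with all observations uncensored reduces to ordinary regression.
fit.norm  <- survreg(Surv(pH) ~ EC, data = soil, dist = "gaussian")
fit.logis <- survreg(Surv(pH) ~ EC, data = soil, dist = "logistic")
fit.ev    <- survreg(Surv(pH) ~ EC, data = soil, dist = "extreme")

summary(fit.norm)   # coefficients, scale and standard errors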

Note to Table 2: D^T = (D_1, D_2, \ldots, D_p) is a (p×1) vector with elements D_i, i = 1, 2, \ldots, p; I stands for the (p×p) identity matrix and c is the normalizing constant.
