INTRODUCTION
The use of Generalized Estimating Equations (GEE) to analyze repeated binary data has become increasingly common in the health sciences, where GEE methodology is routinely used for parameter estimation with correlated binary responses. Assessing the adequacy of a fitted GEE model is problematic, however, because no likelihood exists and the residuals are correlated within a cluster. Tsiatis^{[2]} proposed a goodness-of-fit test for the logistic regression model that is asymptotically chi-squared and is computed as a quadratic form in the observed minus the expected counts. Stuart^{[3]} proposed a goodness-of-fit test statistic for regression with heterogeneous variance, which is asymptotically chi-squared if the given model is correct; it is computed as a quadratic form in the observed minus the predicted responses. Cessie^{[4]} introduced a global test statistic for models with continuous covariates and a binary response, based on nonparametric kernel methods. Explicit expressions are given for the mean and variance of the test statistic, its asymptotic properties are considered and approximate corrections for parameter estimation are presented. Cessie^{[5]} also considered testing the goodness-of-fit of regression models, with emphasis on generalized linear models with a canonical link function and a known dispersion parameter. That test is based on the score test for extra variation in a random effects model; by choosing a suitable form for the dispersion matrix, a goodness-of-fit test statistic is obtained that is quite similar to test statistics based on nonparametric kernel methods. The aim of the present study was to use the BIRDEM data to estimate the parameters of a main effects model and of a second model that includes the same main effects together with region effects, time effects and interaction effects, and then to test goodness-of-fit under various working correlation structures.
Generalized Estimating Equation (GEE): The GEE approach provides consistent estimators of the regression parameters. It requires only that the form of the mean function μ_{i} of each individual's response vector be correctly specified.
Suppose that each individual is observed on T occasions. For the ith individual we then have a T × 1 random vector of binary responses. Notationally,

$$Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{iT})',$$

where the binary random variable Y_{it} = 1 if subject i has response 1 (i.e., success) at time t and 0 otherwise. We took k independent variables, so for the ith individual we have a T × k matrix of covariates. Notationally,

$$X_i = (x_{i1}, x_{i2}, \ldots, x_{iT})', \qquad x_{it} = (x_{it1}, x_{it2}, \ldots, x_{itk})'.$$
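The usual GEE model for binary outcomes has the following setting. The mean vector is

$$\mu_i = E(Y_i) = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{iT})',$$

where

$$\operatorname{logit}(\mu_{it}) = \log\frac{\mu_{it}}{1 - \mu_{it}} = \beta_0 + x_{it}'\beta. \qquad (1)$$

So the variance of Y_{it} is

$$\operatorname{Var}(Y_{it}) = \mu_{it}(1 - \mu_{it}),$$

and the (working) variance-covariance matrix of Y_{i} is given by

$$V_i = A_i^{1/2} R_i A_i^{1/2}, \qquad A_i = \operatorname{diag}\{\mu_{i1}(1 - \mu_{i1}), \ldots, \mu_{iT}(1 - \mu_{iT})\}.$$

Estimates of β_{0} and β are obtained by solving the generalized estimating equations^{[6,7]}

$$\sum_{i=1}^{n} D_i' V_i^{-1} (Y_i - \mu_i) = 0, \qquad D_i = \frac{\partial \mu_i}{\partial (\beta_0, \beta')}, \qquad (2)$$

where n is the number of individuals and R_{i} is the working correlation matrix for Y_{i}.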
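The estimating equations can be solved numerically by Fisher scoring. The sketch below is a minimal illustration in Python/NumPy, not the software used for this analysis. It assumes a logit link and holds the working-correlation parameter α fixed; a full GEE fit would re-estimate α from the residuals between iterations. It shows only the independence, exchangeable and AR(1) structures, not the pairwise structure used in the paper. The function names (`working_corr`, `gee_score_and_info`, `fit_gee_logit`) and the simulated data are hypothetical.

```python
import numpy as np


def working_corr(T, alpha, kind="exchangeable"):
    """T x T working correlation matrix R(alpha) for a few common structures."""
    if kind == "independence":
        return np.eye(T)
    if kind == "exchangeable":
        return (1 - alpha) * np.eye(T) + alpha * np.ones((T, T))
    if kind == "ar1":
        lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
        return alpha ** lags.astype(float)
    raise ValueError(f"unknown working correlation: {kind}")


def gee_score_and_info(X, Y, beta, R):
    """Return sum_i D_i' V_i^{-1} (Y_i - mu_i) and sum_i D_i' V_i^{-1} D_i."""
    p = X.shape[2]
    score, info = np.zeros(p), np.zeros((p, p))
    for Xi, Yi in zip(X, Y):
        mu = 1.0 / (1.0 + np.exp(-Xi @ beta))      # inverse logit
        a = mu * (1.0 - mu)                          # Var(Y_it) = mu_it (1 - mu_it)
        sd = np.sqrt(a)
        V = sd[:, None] * R * sd[None, :]            # V_i = A_i^{1/2} R A_i^{1/2}
        D = a[:, None] * Xi                          # D_i = A_i X_i for the logit link
        VinvD = np.linalg.solve(V, D)
        score += VinvD.T @ (Yi - mu)
        info += D.T @ VinvD
    return score, info


def fit_gee_logit(X, Y, R, n_iter=100, tol=1e-10):
    """Fisher scoring for the GEE with a fixed working correlation R."""
    beta = np.zeros(X.shape[2])
    for _ in range(n_iter):
        score, info = gee_score_and_info(X, Y, beta, R)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical data: n subjects, T = 2 visits, columns = intercept, age, sex.
rng = np.random.default_rng(1)
n, T = 300, 2
age = rng.uniform(30, 70, size=(n, 1)) + 2 * np.arange(T)
male = np.repeat(rng.integers(0, 2, size=(n, 1)), T, axis=1)
X = np.stack([np.ones((n, T)), (age - 50) / 10, male], axis=2)
eta = -0.5 + 0.1 * X[..., 1] + 0.2 * X[..., 2]
Y = (rng.uniform(size=(n, T)) < 1 / (1 + np.exp(-eta))).astype(float)

R = working_corr(T, alpha=0.3, kind="exchangeable")
beta_hat = fit_gee_logit(X, Y, R)
print(beta_hat)
```

With the independence structure, this reduces to ordinary logistic regression fitted by iteratively reweighted least squares.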
Goodness-of-fit test: The test is constructed by first partitioning the covariate space into M distinct regions in the P-dimensional space. Let I_{it} = (I_{it1}, I_{it2}, ..., I_{itM})' be an M × 1 vector, where I_{itm} is the indicator variable that equals one if the ith subject is in the mth region at the tth occasion and zero otherwise. Barnhart and Williamson^{[1]} define the T × M matrix I_{i} as:

$$I_i = (I_{i1}, I_{i2}, \ldots, I_{iT})'.$$
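Building I_{i} only requires knowing which region a subject occupies at each occasion. The hypothetical helper below (Python/NumPy) assumes that regions are numbered 0, ..., M - 1. Because the regions partition the covariate space, every row of I_{i} sums to one. This makes the region columns linearly dependent with the intercept.

```python
import numpy as np


def region_indicator_matrix(region_index, M):
    """Return the T x M matrix I_i.

    region_index[t] is the (0-based) region of the subject at occasion t,
    so entry (t, m) is 1 exactly when the subject is in region m at time t.
    """
    region_index = np.asarray(region_index)
    T = region_index.size
    I_i = np.zeros((T, M))
    I_i[np.arange(T), region_index] = 1.0
    return I_i


# Hypothetical subject observed on T = 3 occasions with M = 4 regions.
I_i = region_indicator_matrix([2, 2, 0], M=4)
print(I_i)
```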
Let Z_{T} be the T × (T-1) matrix whose first row has all entries zero and whose remaining (T-1) rows form a (T-1) × (T-1) identity matrix. Consider the model (with the logit applied componentwise):

$$\operatorname{logit}(\mu_i) = \mathbf{1}_T \beta_0 + X_i \beta + Z_T \tau + I_i \gamma + S_i \rho, \qquad (4)$$

where

$$S_i = \big[\, z_j \circ I_i^{(m)} \,\big]_{j = 1, \ldots, T-1;\; m = 1, \ldots, M}$$

is a T × (T-1)M matrix, with z_{j} the jth column of Z_{T}, I_{i}^{(m)} the mth column of I_{i} and ∘ the componentwise product, and 0 is a (T-1)M × 1 vector of zeros. Note that τ is the (T-1) × 1 vector of time effects (the first occasion is the reference time point) and γ is the M × 1 vector of region effects. ρ is the (T-1)M × 1 vector of time-by-region interaction effects: each column of S_{i} results from the componentwise multiplication of two column vectors, one from Z_{T} and the other from I_{i}. A goodness-of-fit statistic consists of testing H_{0}: θ = 0, where θ = (τ', γ', ρ')' is a J × 1 vector with J = (T-1) + M + (T-1)M.
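The matrices Z_{T} and S_{i} are straightforward to assemble. The sketch below (Python/NumPy; the function names are hypothetical) orders the columns of S_{i} with the time index outer and the region index inner. The text does not fix this ordering, so it is an assumption; any consistent ordering describes the same model.

```python
import numpy as np


def z_matrix(T):
    """Z_T: T x (T-1) with a zero first row over a (T-1) x (T-1) identity."""
    return np.vstack([np.zeros((1, T - 1)), np.eye(T - 1)])


def interaction_matrix(Z, I_i):
    """S_i: T x (T-1)M; column (j, m) is Z[:, j] * I_i[:, m] componentwise.

    Column order (time index outer, region index inner) is an assumption.
    """
    n_time, M = Z.shape[1], I_i.shape[1]
    cols = [Z[:, j] * I_i[:, m] for j in range(n_time) for m in range(M)]
    return np.column_stack(cols)


# Hypothetical subject: T = 3 occasions, M = 2 regions.
T, M = 3, 2
Z = z_matrix(T)
I_i = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
S_i = interaction_matrix(Z, I_i)
J = (T - 1) + M + (T - 1) * M   # length of theta = (tau', gamma', rho')'
print(S_i)
```

Because the first row of Z_{T} is zero, the first occasion never contributes to S_{i}. This is what makes it the reference time point.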
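Let L = P + 1 + J be the number of parameters in the model presented in (4), and write η = (β_{0}, β', θ')'. Denote by U the L × 1 vector with lth component

$$U_l = \sum_{i=1}^{n} \left( \frac{\partial \mu_i}{\partial \eta_l} \right)' V_i^{-1} (Y_i - \tilde{\mu}_i), \qquad l = 1, \ldots, L,$$

where μ̃_{i} and V_{i} are evaluated at η̃ = (β̂_{0}, β̂', 0')', and (β̂_{0}, β̂')' is obtained as the solution to (2). Then, under H_{0}: θ = 0, the asymptotic distribution of U is multivariate normal with mean zero and covariance matrix^{[6]}

$$W_R = K \Big( \sum_{i=1}^{n} D_i' V_i^{-1} \operatorname{cov}(Y_i) V_i^{-1} D_i \Big) K', \qquad K = I_L - \mathcal{A}_{\cdot 1} \mathcal{A}_{11}^{-1} \begin{pmatrix} I_{P+1} & 0 \end{pmatrix}, \qquad \mathcal{A} = \sum_{i=1}^{n} D_i' V_i^{-1} D_i,$$

where D_{i} = ∂μ_{i}/∂η' is the T × L derivative matrix evaluated at η̃, 𝒜_{·1} denotes the first P + 1 columns of 𝒜, 𝒜_{11} is its leading (P + 1) × (P + 1) block, and cov(Y_{i}) is a T × T matrix. Note that cov(Y_{i}) can be consistently estimated by (Y_{i} - μ̃_{i})(Y_{i} - μ̃_{i})'. If the correlation matrix R_{i} is correctly specified, then cov(Y_{i}) = V_{i} and the asymptotic covariance matrix of U reduces to

$$W = K \mathcal{A} K'.$$

Let

$$U = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix}, \qquad W_R = \begin{pmatrix} W_{R,11} & W_{R,12} \\ W_{R,21} & C_R \end{pmatrix}, \qquad W = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & C \end{pmatrix}$$

be the partitioning for U, W_{R} and W, where U_{2} is the J × 1 vector and C_{R} and C are J × J matrices. Under H_{0}: θ = 0, both the proposed robust (empirically corrected) goodness-of-fit test statistic

$$Q_R = U_2' C_R^{-} U_2$$

and the proposed model-based goodness-of-fit test statistic

$$Q = U_2' C^{-} U_2$$

are asymptotically distributed as chi-square random variables with

$$Q_R \sim \chi^2_{\operatorname{rank}(C_R)}, \qquad Q \sim \chi^2_{\operatorname{rank}(C)},$$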
where G^{-} is any generalized inverse of the matrix G. The degrees of freedom of these chi-square random variables do not equal the number of parameters in θ. This is because of linear dependencies between the covariates in the model and the covariates from the region partitioning, i.e., C_{R} and C are singular matrices. Let H_{1} and H_{2} be the design matrices in models (1) and (4), respectively. Then, intuitively, the degrees of freedom of the above chi-square random variables equal rank(H_{2}) - rank(H_{1}). Let H_{2i} = [1_{T}, X_{i}, Z_{T}, I_{i}, S_{i}] be the design matrix for the ith subject in model (4). It is easily shown that the tj-th element of D_{i} = ∂μ_{i}/∂η' is equal to μ_{it}(1 - μ_{it})h_{itj}, where h_{itj} is the tj-th element of H_{2i}. Therefore, the goodness-of-fit test statistics Q and Q_{R} can be readily calculated once (β̂_{0}, β̂')' is obtained from the estimating equation (2).
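The whole procedure can be sketched end to end. The Python/NumPy code below is a minimal illustration under stated assumptions, not the authors' implementation, and it rests on the following choices:

- a logit link and a fixed working correlation R;
- the adjustment matrix K = I_{L} - 𝒜_{·1}𝒜_{11}^{-1}[I 0], with 𝒜 = Σ D_{i}'V_{i}^{-1}D_{i}, which accounts for estimating β under H_{0};
- the outer product (Y_{i} - μ̃_{i})(Y_{i} - μ̃_{i})' as the estimate of cov(Y_{i});
- a Moore-Penrose inverse, computed by eigendecomposition with a relative tolerance of 1e-8, as the generalized inverse.

The function names and the simulated data are hypothetical. The simulated design, with T = 2, an age-by-sex partition into M = 4 regions, a time effect and four interactions, mirrors the one described in the data section.

```python
import numpy as np


def accumulate(H, Y, coef, R):
    """U = sum D'V^-1(Y - mu), A = sum D'V^-1 D, B = sum s s' (logit link).

    Here A is the script-A matrix of the text, not the diagonal A_i.
    """
    L = H.shape[2]
    U, A, B = np.zeros(L), np.zeros((L, L)), np.zeros((L, L))
    for Hi, Yi in zip(H, Y):
        mu = 1.0 / (1.0 + np.exp(-Hi @ coef))
        a = mu * (1.0 - mu)
        sd = np.sqrt(a)
        V = sd[:, None] * R * sd[None, :]       # V_i = A_i^{1/2} R A_i^{1/2}
        D = a[:, None] * Hi                      # (t, j) entry: mu_it (1 - mu_it) h_itj
        VinvD = np.linalg.solve(V, D)
        s = VinvD.T @ (Yi - mu)                  # subject i's contribution to U
        U += s
        A += D.T @ VinvD
        B += np.outer(s, s)                      # cov(Y_i) estimated by e_i e_i'
    return U, A, B


def ginv_and_rank(G, rel_tol=1e-8):
    """Moore-Penrose inverse and numerical rank of a symmetric PSD matrix."""
    w, v = np.linalg.eigh((G + G.T) / 2)
    keep = w > rel_tol * w.max()
    return (v[:, keep] / w[keep]) @ v[:, keep].T, int(keep.sum())


def gof_tests(H2, Y, p1, R, n_iter=100, tol=1e-10):
    """Return (Q_R, df_R, Q, df) for H0: theta = 0.

    H2 is (n, T, L); its first p1 columns are the design of model (1).
    """
    L = H2.shape[2]
    beta = np.zeros(p1)
    for _ in range(n_iter):                      # Fisher scoring for Eq. (2)
        U1, A11, _ = accumulate(H2[..., :p1], Y, beta, R)
        step = np.linalg.solve(A11, U1)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    U, A, B = accumulate(H2, Y, np.concatenate([beta, np.zeros(L - p1)]), R)
    K = np.eye(L)                                # K = I_L - A_.1 A_11^{-1} [I 0]
    K[:, :p1] -= A[:, :p1] @ np.linalg.inv(A[:p1, :p1])
    C_R = (K @ B @ K.T)[p1:, p1:]
    C = (K @ A @ K.T)[p1:, p1:]
    U2 = U[p1:]
    C_R_inv, df_R = ginv_and_rank(C_R)
    C_inv, df = ginv_and_rank(C)
    return U2 @ C_R_inv @ U2, df_R, U2 @ C_inv @ U2, df


# Hypothetical data generated under model (1).
rng = np.random.default_rng(7)
n, T = 400, 2
age = rng.uniform(30, 70, size=(n, 1)) + 2 * np.arange(T)
male = np.repeat(rng.integers(0, 2, size=(n, 1)), T, axis=1).astype(float)
old = (age >= 50).astype(float)
regions = [old * male, old * (1 - male), (1 - old) * male, (1 - old) * (1 - male)]
time = np.tile(np.arange(T, dtype=float), (n, 1))            # 0 = first visit
cols = [np.ones((n, T)), (age - 50) / 10, male, time] + regions + [time * r for r in regions]
H2 = np.stack(cols, axis=2)                                   # (n, T, 12)
eta = -0.3 + 0.1 * H2[..., 1] + 0.3 * H2[..., 2]
Y = (rng.uniform(size=(n, T)) < 1 / (1 + np.exp(-eta))).astype(float)
R = 0.7 * np.eye(T) + 0.3 * np.ones((T, T))                   # exchangeable, alpha = 0.3
Q_R, df_R, Q, df = gof_tests(H2, Y, p1=3, R=R)
print(Q_R, df_R, Q, df)
```

The relative eigenvalue tolerance is a design choice. Exact linear dependencies leave eigenvalues of C_{R} and C at rounding-error level, and these must not be inverted.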
Data set and covariates: In our study we used repeated-measures data on diabetes mellitus to carry out the analysis. The follow-up data on 995 patients registered at BIRDEM (Bangladesh Institute of Research and Rehabilitation in Diabetes, Endocrine and Metabolic disorders) in 1984-94 are used to identify the risk factors responsible for transitions from the controlled diabetic state to the confirmed diabetic state, and from the confirmed diabetic state to the controlled state. The response variable is defined in terms of the blood glucose level observed two hours after a 75 g glucose load at each follow-up visit. The cutoff point for the blood glucose level is 11.1 mmol L^{-1}. If the observed response is less than 11.1, the patient is defined as non-diabetic (coded 0); if it is greater than or equal to 11.1, the patient is defined as diabetic (coded 1). We included two independent variables in the study: age and sex. Age is the respondent's age at each visit; it is a continuous variable and is used directly in the analysis. Sex is a dichotomous categorical variable coded 0 for female and 1 for male. In order to assess the performance of the proposed goodness-of-fit tests, we used data simulated with known distributions from models in the alternative hypothesis. To conduct the proposed goodness-of-fit tests, the covariate space was partitioned into four regions: region 1 if age is greater than or equal to 50 and the patient is male; region 2 if age is greater than or equal to 50 and female; region 3 if age is less than 50 and male; and region 4 if age is less than 50 and female. Each region indicator equals 1 if the individual falls in that region and 0 otherwise. The time effect represents the two consecutive visits; it is a dichotomous variable coded 0 for the first visit and 1 for the second visit.
Interactions 1, 2, 3 and 4 are the componentwise products of the time effect with region 1, region 2, region 3 and region 4, respectively.
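The degrees of freedom rank(H_{2}) - rank(H_{1}) can be checked numerically for this partition. The sketch below (Python/NumPy; the function name and the simulated ages are hypothetical) stacks the subject-visit rows for T = 2 visits. H_{1} holds the intercept, age and sex. H_{2} adds the time effect, the four region indicators and the four interactions. Although J = 1 + 4 + 4 = 9, there are three exact linear dependencies:

- the four region indicators sum to the intercept;
- regions 1 and 3 sum to the male indicator;
- the four interactions sum to the time effect.

For generic data the rank difference should therefore be 12 - 3 - 3 = 6.

```python
import numpy as np


def build_designs(age, male):
    """Stack subject-visit rows of H1 (model 1) and H2 (model 4) for T = 2.

    Regions follow the text: 1 = age >= 50 & male, 2 = age >= 50 & female,
    3 = age < 50 & male, 4 = age < 50 & female.
    """
    n, T = age.shape
    if T != 2:
        raise ValueError("this sketch assumes two visits (one time effect)")
    old = (age >= 50).astype(float)
    male = np.asarray(male, dtype=float)
    regions = [old * male, old * (1 - male), (1 - old) * male, (1 - old) * (1 - male)]
    time = np.tile([0.0, 1.0], (n, 1))            # 0 = first visit, 1 = second
    h1_cols = [np.ones((n, T)), age, male]
    h2_cols = h1_cols + [time] + regions + [time * r for r in regions]
    H1 = np.stack(h1_cols, axis=2).reshape(n * T, -1)
    H2 = np.stack(h2_cols, axis=2).reshape(n * T, -1)
    return H1, H2


rng = np.random.default_rng(3)
n = 100
age = rng.uniform(30, 70, size=(n, 1)) + np.array([0.0, 2.0])   # hypothetical ages
male = np.repeat(rng.integers(0, 2, size=(n, 1)), 2, axis=1)
H1, H2 = build_designs(age, male)
J = 1 + 4 + 4
df = np.linalg.matrix_rank(H2) - np.linalg.matrix_rank(H1)
print(H2.shape, J, df)
```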
RESULTS AND DISCUSSION
The logistic regression model is considered one of the most important and widely applicable techniques for analyzing binary outcome variables, including repeated binary outcomes. To assess the fit of a model, it is necessary to identify the influential elements. In the logistic regression analysis of the repeated binary measures we adjusted for the setting and the covariates. We fitted the model under independence, exchangeable, autoregressive and pairwise working correlation structures and obtained the corresponding standard errors. Table 1 lists the parameter estimates and standard errors for the initial model, which contains only main effects.
According to the likelihood test, the null hypothesis is rejected under all working correlation structures in GEE, which means that at least one of the coefficients differs from zero. According to the Wald test, sex is significant at the 5% level under the independence, exchangeable, autoregressive and pairwise correlation structures, and there is a positive association between the response variable and sex. The estimated coefficient of age is insignificant in all cases. Hence, it may be concluded that age has no significant effect on the transition from the confirmed diabetic state to the controlled diabetic state. In terms of the odds ratio, the odds of diabetes for male patients are 1.240775 times those for female patients. We considered additions to this main effects model to provide a better fit to the data. Table 2 displays the results from a model that includes regions, time effects and interactions.
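Because the model uses a logit link, an odds ratio for a binary covariate is the exponentiated coefficient. Assuming the reported value 1.240775 was computed this way, the implied sex coefficient and its interpretation follow directly:

```latex
\widehat{\mathrm{OR}}_{\text{sex}} = \exp(\hat{\beta}_{\text{sex}}) = 1.240775
\quad\Longrightarrow\quad
\hat{\beta}_{\text{sex}} = \ln(1.240775) \approx 0.216,
\qquad
100\,(\widehat{\mathrm{OR}}_{\text{sex}} - 1) \approx 24\%.
```

That is, holding age fixed, the odds of diabetes are about 24% higher for males than for females.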
Several of the effects in this model are significant, indicating their importance in modeling. The likelihood test rejects the null hypothesis under the independence, exchangeable, autoregressive and pairwise correlation structures, which again means that at least one of the coefficients differs from zero. We also found that, under all working correlation assumptions, region 1 and the time effect show positive associations with the response, while interaction 1 shows a negative association.
Table 1: Estimates obtained by GEE assuming various correlation structures within repeated outcomes, with associated Wald tests

*Significant at p<0.05 
Table 2: Estimates for Barnhart and Williamson's model obtained by GEE assuming various correlation structures within repeated outcomes, with associated Wald tests

Table 3: Goodness-of-fit tests under various correlation structures

Among these variables, region 1, the time effect and interaction 1 are significant at the 5% level in all cases. The coefficients of the other variables are insignificant in all cases. Hence, it may be concluded that these variables have no significant effect on the transition from the confirmed diabetic state to the controlled diabetic state.
From Table 3, the model-based test for the model suggested by Barnhart and Williamson^{[1]} is highly significant. Rejecting H_{0}: θ = 0 means that at least one of the region, time or interaction coefficients differs from zero, i.e., the main effects model does not fit adequately. The null hypothesis is also rejected by the empirically corrected test, so model (4) is again highly significant, indicating that the added covariates have a significant effect. Neither goodness-of-fit test provided evidence of lack of fit once the regions, time effect and interaction effects were added.
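Judging whether Q or Q_{R} is "highly significant" amounts to comparing it with the upper tail of the chi-square distribution with the stated degrees of freedom. For an even number of degrees of freedom 2k, the tail probability has the closed form P(χ²_{2k} > x) = e^{-x/2} Σ_{j=0}^{k-1} (x/2)^{j}/j!. The sketch below implements it using only the Python standard library. The statistic value in the usage line is hypothetical and is not taken from Table 3. df = 6 is what rank(H_{2}) - rank(H_{1}) gives for the age-by-sex partition with two visits under generic data; it is not a value reported in the paper.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    terms = (half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * sum(terms)


# Hypothetical statistic value; the 5% critical value for df = 6 is about 12.592.
q_hypothetical = 25.0
p_value = chi2_upper_tail_even_df(q_hypothetical, df=6)
print(f"p = {p_value:.4g}")
```

For odd degrees of freedom, a general routine for the incomplete gamma function would be needed instead.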
CONCLUSIONS
We fitted two models to the data. The first model includes only the main effects of age and sex. The second model includes the same main effects together with the region effects, the time effect and the region-by-time interactions. Age was dichotomized at 50 years and combined with sex, so that the covariate categories formed four regions. Both goodness-of-fit tests suggest that the model with only main effects did not fit the data well.
There is a significant region-by-time interaction effect (interaction 1), indicating that the change between the two visits differed for male patients aged 50 years or over (region 1). The model that includes the interaction terms fits the data well. The parameter estimates and the goodness-of-fit tests obtained here are very similar to the results obtained by using a weighted least squares approach. Thus, the goodness-of-fit tests successfully detected the departure from the main effects model. In addition, the efficiencies of the estimates of Barnhart and Williamson's suggested model under the identity (independence) correlation are higher than those under the exchangeable, autoregressive and pairwise correlation structures that we considered.
ACKNOWLEDGEMENTS
We would like to express our gratitude to the Director of BIRDEM for kindly permitting us to use their data. We are indebted to the Chairman, Department of Statistics, University of Dhaka, Bangladesh, for his kind cooperation throughout this research.