## The Nonequivalent Group Design: Part I

In this exercise you are going to create data for and analyze a nonequivalent group design. The design has several important characteristics. First, a pretest and posttest are given to all participants. Second, the design usually has two groups, one which gets some program or treatment and one which does not (usually termed the "program" and "comparison" groups, respectively). Third, the two groups are "nonequivalent groups", that is, we expect that they may differ prior to the study. Often, nonequivalent groups are simply two intact groups that are accessible to the researcher (e.g., two classrooms, two states, two cities, two mental health centers, etc.). We can depict the design using the following notation:

N  O  X  O
N  O       O

where the N indicates that the groups are nonequivalent, the first O represents the pretest, the X indicates administration of some program or treatment, and the last O signifies the posttest. Notice that the top line represents the program group while the bottom line signifies the comparison group.

To begin, get into MINITAB in the usual manner. You should see the MTB prompt (which looks like this: MTB>). Now you are ready to enter the following commands.

You will create two hypothetical tests as in previous exercises. Here, one test will be considered the pretest, the other the posttest. We will assume that both tests measure the same true ability and that they each have their own unreliability or error:

MTB> Random 500 C1;
SUBC> Normal 50 5.
MTB> Random 500 C2;
SUBC> Normal 0 5.
MTB> Random 500 C3;
SUBC> Normal 0 5.

Here C1 represents the true ability on the tests for 500 people. C2 and C3 represent random error for the pretest and posttest respectively. Notice that the mean ability score for the tests will initially be set to 50 test score units. Next, construct the observed test scores by adding the true score to each test's own error:

MTB> Let C4 = C1 + C2
MTB> Let C5 = C1 + C3

You should notice that each test has about equal amounts of true score and error (because all three Random/Normal statements above use a 5 unit standard deviation). Now, name the columns:

MTB> Name C1 ='true' C2 ='x error' C3 ='y error' C4 ='pretest' C5 ='posttest'
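For readers without MINITAB, the same data can be sketched in Python. This is only an illustrative analogue of the commands above, not part of the original exercise: the variable names mirror the column names, and the fixed seed is an assumption added so the sketch is repeatable. Each observed score is taken to be the true score plus that test's own error.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed is an assumption, used only for repeatability
n = 500

true = [rng.gauss(50, 5) for _ in range(n)]     # C1: true ability
x_error = [rng.gauss(0, 5) for _ in range(n)]   # C2: pretest error
y_error = [rng.gauss(0, 5) for _ in range(n)]   # C3: posttest error

# Observed score = true score + that test's own error
pretest = [t + e for t, e in zip(true, x_error)]    # C4
posttest = [t + e for t, e in zip(true, y_error)]   # C5

# Share of pretest variance that is true-score variance (should be near 0.5)
reliability = statistics.pvariance(true) / statistics.pvariance(pretest)

print("pretest mean:", round(statistics.mean(pretest), 1))
print("pretest sd:", round(statistics.stdev(pretest), 1))
print("approximate reliability:", round(reliability, 2))
```

Because the true score and the error both have a standard deviation of 5, roughly half of the variance in each observed test is true score and half is error.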

What you have done so far is to create a pretest and posttest for 500 hypothetical persons. Next, you have to create "nonequivalent" groups. For convenience, you will create groups of 250 persons each. To do this enter:

MTB> Set C6
DATA> 1:500
DATA> End
MTB> Code (1:250) 0 C6 C6
MTB> Code (251:500) 1 C6 C6

The SET statement (and the two associated DATA statements) simply numbers each person from 1 to 500 and puts this sequence of numbers in C6. The first code statement essentially says "change all the numbers from 1 to 250 in C6 to 0's and put these 0's back into C6." The second code replaces the numbers from 251 to 500 with a 1. You have created two groups of 250 persons each. You know which group a person is in by looking at their value in C6. If they have a 0, they are in one group; if they have a 1, they are in the other. For convenience, the persons having a zero will be the comparison group and those having a one will be the program group. You should name this new variable:

MTB> Name C6 = 'Group'

Try the following three statements to verify that you have two groups of 250 persons:

MTB> Table C6
MTB> Sign C6
MTB> Histogram 'Group'

Each of these presents slightly different information but all of them verify that you have two equal sized groups.
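The recoding logic can also be sketched in Python. This is a hypothetical equivalent of the Set and Code steps, with names chosen here for illustration: persons are numbered 1 to 500, the first 250 are coded 0 (comparison) and the rest are coded 1 (program), and the group sizes are then counted.

```python
from collections import Counter

person = list(range(1, 501))                     # like Set C6 with 1:500
group = [0 if p <= 250 else 1 for p in person]   # like the two Code commands

counts = Counter(group)
print("comparison group size:", counts[0])   # 250
print("program group size:", counts[1])      # 250
```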

But you have still not created "nonequivalent" groups. To see this, you will use the subcommand form of the TABLE command:

MTB> Table C6;
SUBC> means C4 C5.

The first row of the table gives the pretest and posttest means for the comparison group (C6 = 0) while the second row gives these values for the program group. At this point, all four means should be near 50 test score units.

In the nonequivalent group design we typically select two groups which we hope are similar or equivalent. Nevertheless, because we don't select these groups randomly, we expect that one group may be better or worse than the other before our study. You saw from the table command above that both groups appear to be similar on the pretest. Therefore, you can create nonequivalent groups by making the program group slightly better in test ability. This situation might occur in real life if we chose two classrooms of students that we thought were pretty similar, only to find out that one group scores on the average a few points better than the other. To create the "advantaged" program group do the following:

MTB> Let C4 = C4 + (5 * C6)
MTB> Let C5 = C5 + (5 * C6)

It is important to think about what these statements are doing. The first let command operates on the pretest scores. You add five test score points to each program group pretest score. How does this work? Remember that C6 has a 0 for all the comparison group persons and a 1 for the program people. When you multiply 5 times this C6 variable the result will be a zero for each comparison person and a 5 for each program person. You then add these 0 or 5 points to the original pretest score and put the result right back into C4. The second Let command does the same thing for the posttest scores and, as a result, this "advantage" should be seen on both the pre and posttest. Now verify that you have an "advantaged" program group (that is, that you have nonequivalent groups). You will again use the table command but will add another subcommand to give the standard deviations:

MTB> Table C6;
SUBC> means C4 C5;
SUBC> stdev C4 C5.

Clearly, the program group has pre and posttest averages in the vicinity of 55 test score units.
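A minimal Python sketch of this step, assuming the same setup as the MINITAB commands (the seed and the helper name `by_group` are illustrative choices, not part of the exercise). Multiplying the 0/1 group code by 5 adds nothing to comparison scores and 5 points to program scores, on both tests.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed is an assumption, used only for repeatability
group = [0] * 250 + [1] * 250
true = [rng.gauss(50, 5) for _ in group]

# 5 * z is 0 for comparison persons and 5 for program persons
pretest = [t + rng.gauss(0, 5) + 5 * z for t, z in zip(true, group)]
posttest = [t + rng.gauss(0, 5) + 5 * z for t, z in zip(true, group)]

def by_group(scores, z_value):
    """Return the scores of the persons whose group code equals z_value."""
    return [s for s, z in zip(scores, group) if z == z_value]

summary = {}
for z_value, label in [(0, "comparison"), (1, "program")]:
    pre, post = by_group(pretest, z_value), by_group(posttest, z_value)
    summary[label] = (statistics.mean(pre), statistics.mean(post),
                      statistics.stdev(pre), statistics.stdev(post))
    print(label, [round(v, 1) for v in summary[label]])
```

Each printed row lists the pretest mean, posttest mean, pretest standard deviation and posttest standard deviation for one group, much like the Table output.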

So far, you have created two nonequivalent groups having a pretest and posttest. But one of these groups received your program or treatment. Did it work? It would appear from the data that it did not. The difference between the group means on the pretest is about the same as their posttest difference. About the only way that you could claim that the program had an effect is if you had reason to believe that without it the posttest difference between the means would have been different than it is. This would be possible, for example, if the groups had been maturing at different rates (a selection-maturation threat), but without any evidence other than these test scores this would be a hard argument to accept. On the basis of these data you would probably conclude that the program was ineffective. This makes sense, especially because you did not build any program effect into the data. Now add 10 test score points to the posttest for each program person - a treatment effect of 10 points:

MTB> Let C5 = C5 + (10*C6)

which you should recognize as the same type of command that you used above to create nonequivalent groups in the first place. Now, look at the means and standard deviations:

MTB> Table C6;
SUBC> means C4 C5;
SUBC> stdev C4 C5.

Now, the pretest difference between the two groups is still about 5 points on the average, but the posttest difference is about 15 points. The "gain" of the program group over what you might expect on the basis of pretest scores appears to be about 10 points (which, of course, is exactly what you set it up to be).
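The arithmetic behind this comparison can be sketched in Python. The seed and the helper name `group_mean` are assumptions made for illustration. The sketch builds in the 5 point advantage and the 10 point effect, then compares the group difference on the pretest with the group difference on the posttest.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed is an assumption, used only for repeatability
group = [0] * 250 + [1] * 250
true = [rng.gauss(50, 5) for _ in group]

# Program persons get a 5 point advantage on both tests
# and a further 10 point program effect on the posttest only.
pretest = [t + rng.gauss(0, 5) + 5 * z for t, z in zip(true, group)]
posttest = [t + rng.gauss(0, 5) + 5 * z + 10 * z for t, z in zip(true, group)]

def group_mean(scores, z_value):
    """Mean score of the persons whose group code equals z_value."""
    return statistics.mean([s for s, z in zip(scores, group) if z == z_value])

pre_diff = group_mean(pretest, 1) - group_mean(pretest, 0)
post_diff = group_mean(posttest, 1) - group_mean(posttest, 0)
gain = post_diff - pre_diff

print("pretest difference:", round(pre_diff, 1))    # expected to be near 5
print("posttest difference:", round(post_diff, 1))  # expected to be near 15
print("apparent gain:", round(gain, 1))             # expected to be near 10
```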

At this point it is worth reflecting on what you have done. If you had conducted a study using the nonequivalent group design, you could have obtained data like those described in the last table. You would notice that the groups appear to be different on the pretest, with the program group having the advantage. You would also notice that the difference between the groups is considerably larger on the posttest. In fact, you have simulated data for a nonequivalent group design and (whether you realize it or not) you have explicitly controlled the size of the correlation between the measures, the number of persons in each group, the amount of nonequivalence between the groups, the size of the program effect, and so on. One reason we run simulations of this type is to determine whether the statistical analyses which we use give us accurate estimates of the effect of our programs. Since we specifically put in a 10 point program effect, we would expect that an accurate analysis would tell us that the effect was about that large. Let's find out if our analysis will work.

The typical strategy for analyzing pretest-posttest group designs is based on the Analysis of Covariance (ANCOVA). Essentially, we want to look at the difference between the two groups on the posttest after we have "adjusted for" the initial differences between the groups as measured on our covariate, the pretest. The ANCOVA can be carried out using multiple regression analysis. You should recognize that the ANCOVA is simply a special case of the multiple regression model: we would get exactly the same results whether we use a computer program which does ANCOVA or one which does regression, as long as we tell the regression program the correct model to estimate. We will generally use the regression command in MINITAB to conduct an Analysis of Covariance.

Before actually running the analysis you ought to look at plots of the data. Try some of the following:

MTB> Histogram C4
MTB> Histogram C5
MTB> Plot C5 * C4
MTB> Plot C5 * C4;
SUBC> symbol C6.

The Histogram commands show the distributions for the pretest and posttest. The first plot command shows the pre-post bivariate distribution. The second plot command shows this same distribution but uses different symbols for the program and comparison groups. Unfortunately, it may be difficult to see the program and comparison groups distinctly. As a side exercise, you might try to use the choose command to create separate columns of pre and posttest scores for the two groups so these distributions can be plotted separately.

Now run the ANCOVA using the MINITAB regression command. The regression model form of the ANCOVA can be stated as:

Y = b0 + b1X + b2Z + eY

where

Y = the posttest (C5)

X = the pretest (C4)

Z = the assignment variable (C6)

b0 = the intercept of the comparison group line

b1 = the slope of the regression lines (assumed to be the same for both groups)

b2 = the program effect

eY = random error

To do this analysis enter:

MTB> Regress C5 2 C4 C6

The computer will first print out the regression equation. The first number to the right of the equal sign is the intercept (b0) of the comparison group regression line (because you included C6, a 0,1 dummy variable, in the regression, in effect two lines are being fit to the data, one for each group). The second number in the equation gives the slope (b1) for the program and comparison group regression lines (recall that the Analysis of Covariance assumes that the slopes of the two groups are equal; thus, we only estimate a single value). The third number after the equal sign is the estimate of the program effect (b2). Recall that you put in a program effect of 10 points. Is this value close?

The table below the equation tests whether these three values are significantly different from zero. Since you are particularly interested in determining whether this analysis gives an accurate estimate of the program effect, you should look in the table for the line for variable C6, the "Group" variable. The coefficient or estimate that was shown in the equation is repeated first on this line. Then the standard deviation of the estimate (its standard error) is shown. To see whether the estimate given by the analysis is accurate at the .05 level of significance, you have to construct a 95% confidence interval for the estimate or coefficient. To do this, first multiply the standard deviation for that coefficient by 2 and then add and subtract this value from the estimate. For example, let's say the analysis tells you that the estimate or coefficient of the C6 variable is 11.3 and that its standard deviation is 0.5 units. Given this, the 95% confidence interval ranges from 10.3 to 12.3 (that is, 11.3 plus or minus 2 times 0.5). This analysis would be telling you that the best estimate of the program effect is 11.3 and that the odds are less than 5 out of 100 that the true effect is outside of that range.

Recall that you have simulated a true effect of ten points. In this example, you would wonder whether the analysis we used (ANCOVA) is working correctly because the program effect that you put in doesn't fall within the 95% confidence interval. When you construct the confidence interval, do you find that 10 is included within it or not? Is the estimate of effect above or below 10?
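The regression and the confidence interval can be sketched in Python without a statistics package. This is an illustrative sketch, not a description of how MINITAB computes its output: the function name `ancova` and the seed are assumptions. For one covariate and one 0/1 group variable, the least-squares solution has a closed form: the slope is the pooled within-group slope, and the program effect is the posttest mean difference minus the slope times the pretest mean difference.

```python
import math
import random
import statistics

def ancova(pre, post, group):
    """Least-squares fit of post = b0 + b1*pre + b2*group for a 0/1 group code."""
    x0 = [x for x, z in zip(pre, group) if z == 0]
    x1 = [x for x, z in zip(pre, group) if z == 1]
    y0 = [y for y, z in zip(post, group) if z == 0]
    y1 = [y for y, z in zip(post, group) if z == 1]
    mx0, mx1 = statistics.mean(x0), statistics.mean(x1)
    my0, my1 = statistics.mean(y0), statistics.mean(y1)
    # Pooled within-group sum of squares (pretest) and cross-products
    ss_x = sum((x - mx0) ** 2 for x in x0) + sum((x - mx1) ** 2 for x in x1)
    sp_xy = (sum((x - mx0) * (y - my0) for x, y in zip(x0, y0))
             + sum((x - mx1) * (y - my1) for x, y in zip(x1, y1)))
    b1 = sp_xy / ss_x                        # common slope
    b2 = (my1 - my0) - b1 * (mx1 - mx0)      # adjusted group difference
    b0 = my0 - b1 * mx0                      # comparison group intercept
    # Residual variance uses n - 3 degrees of freedom (three coefficients)
    sse = sum((y - (b0 + b1 * x + b2 * z)) ** 2
              for x, y, z in zip(pre, post, group))
    s2 = sse / (len(post) - 3)
    se_b2 = math.sqrt(s2 * (1 / len(x0) + 1 / len(x1) + (mx1 - mx0) ** 2 / ss_x))
    return b0, b1, b2, se_b2

rng = random.Random(4)  # fixed seed is an assumption, used only for repeatability
group = [0] * 250 + [1] * 250
true = [rng.gauss(50, 5) for _ in group]
pretest = [t + rng.gauss(0, 5) + 5 * z for t, z in zip(true, group)]
posttest = [t + rng.gauss(0, 5) + 15 * z for t, z in zip(true, group)]

b0, b1, b2, se_b2 = ancova(pretest, posttest, group)
low, high = b2 - 2 * se_b2, b2 + 2 * se_b2
print("slope b1:", round(b1, 2))
print("program effect b2:", round(b2, 2), "standard error:", round(se_b2, 2))
print("approximate 95% interval:", round(low, 2), "to", round(high, 2))
print("interval contains 10:", low <= 10 <= high)

# The worked example from the text: 11.3 plus or minus 2 times 0.5
example_low, example_high = 11.3 - 2 * 0.5, 11.3 + 2 * 0.5
print("worked example interval:", round(example_low, 1), "to", round(example_high, 1))
```

The posttest line adds 15 points for program persons, which is the 5 point advantage plus the 10 point program effect.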

If you have followed the instructions, you will find that most of the time you will not get an accurate estimate of the effect. In fact, ANCOVA yields biased estimates of effect for this type of nonequivalent group design. We do have better analysis strategies, but in order to understand them well it is important to understand why the ANCOVA strategy fails. You should try to get some idea of why ANCOVA fails by conducting simulations like the one above. Some variations are suggested below. The next exercise will present an analysis strategy which can often be used to obtain correct estimates of program effect.

• A key reason for the failure of ANCOVA is unreliability or error in the measures. You explicitly controlled the reliability by setting the standard deviations of the true and error scores in the Random/Normal statements. Try the simulation again setting the true score standard deviation to 10 and the error standard deviations to 1.

• Try the variation above but make the pretest more reliable than the posttest. To do this, use a small standard deviation for the pretest error (C2) and a larger one for posttest error (C3).

• Try to construct a simulation where the treatment group is disadvantaged relative to the comparison group. To do this, you will have to multiply the C6 variable by a negative number in the appropriate let statement above.

• Put in a negative program effect. To do this you will have to use a negative number where you used the +10 above. A negative effect implies that your program actually hurt rather than helped the program group relative to the comparison group.
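Before running these variations, it may help to sketch what ANCOVA is expected to report in each case. The function below is a large-sample approximation derived here from the way this simulation is built; it is not stated in the exercise and its name is hypothetical. The reasoning is that the fitted slope tends toward the pretest reliability (true score variance divided by total pretest variance), so ANCOVA removes only that fraction of the pretest advantage and leaves the rest in the effect estimate.

```python
def expected_ancova_estimate(true_sd, pre_error_sd, advantage, effect):
    """Approximate long-run ANCOVA effect estimate for this simulation.

    Assumes the group advantage is added to both tests and the program
    effect is added to the posttest only, as in the exercise.
    """
    reliability = true_sd ** 2 / (true_sd ** 2 + pre_error_sd ** 2)
    # Only reliability * advantage is adjusted away; the remainder stays in b2
    return effect + advantage * (1 - reliability)

scenarios = {
    "original settings": (5, 5, 5, 10),
    "more reliable measures": (10, 1, 5, 10),
    "disadvantaged program group": (5, 5, -5, 10),
    "negative program effect": (5, 5, 5, -10),
}
for label, args in scenarios.items():
    print(label, round(expected_ancova_estimate(*args), 2))
```

Under these assumptions the four lines print 12.5, 10.05, 7.5 and -7.5. Note that the posttest error standard deviation does not appear in the formula: in this setup it affects the precision of the estimate but not its expected value.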

When we use simulation techniques to investigate the accuracy of a statistical analysis we never rely on the results of a single run because the results could be wrong simply by chance. Typically, we would run the simulation several hundred times and average the estimates of program effect to see if the analysis is biased or not. Although that many runs is probably not feasible for you, it might be worthwhile for you to compare the estimates of effect that you got with estimates which others obtain. If you average these estimates, you should see more clearly that ANCOVA yields a biased estimate for the nonequivalent group design.
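Repeating the simulation many times is easy to sketch in Python. The function name `one_run`, the seed, and the choice of 200 runs are assumptions made for illustration. Each run regenerates the data with the original settings and computes the ANCOVA effect estimate from the pooled within-group slope; the estimates are then averaged.

```python
import random
import statistics

def one_run(rng, n_per_group=250, advantage=5, effect=10):
    """Simulate one data set and return the ANCOVA program effect estimate."""
    group = [0] * n_per_group + [1] * n_per_group
    true = [rng.gauss(50, 5) for _ in group]
    pre = [t + rng.gauss(0, 5) + advantage * z for t, z in zip(true, group)]
    post = [t + rng.gauss(0, 5) + (advantage + effect) * z
            for t, z in zip(true, group)]
    # Comparison persons come first in the lists, program persons second
    x0, x1 = pre[:n_per_group], pre[n_per_group:]
    y0, y1 = post[:n_per_group], post[n_per_group:]
    mx0, mx1 = statistics.mean(x0), statistics.mean(x1)
    my0, my1 = statistics.mean(y0), statistics.mean(y1)
    ss_x = sum((x - mx0) ** 2 for x in x0) + sum((x - mx1) ** 2 for x in x1)
    sp_xy = (sum((x - mx0) * (y - my0) for x, y in zip(x0, y0))
             + sum((x - mx1) * (y - my1) for x, y in zip(x1, y1)))
    b1 = sp_xy / ss_x                         # pooled within-group slope
    return (my1 - my0) - b1 * (mx1 - mx0)     # adjusted group difference

rng = random.Random(5)  # fixed seed is an assumption, used only for repeatability
estimates = [one_run(rng) for _ in range(200)]
average = statistics.mean(estimates)

print("average estimate over 200 runs:", round(average, 2))
print("spread of the estimates (sd):", round(statistics.stdev(estimates), 2))
```

With the original settings the average should settle noticeably above the 10 points that were built in, which is the bias the text describes.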