The Randomized Experimental Design

In this exercise you will simulate a simple pretest-posttest randomized experimental design. This design is of the form

R  O  X  O
R  O       O

and thus has a pretest, a posttest, and two groups that have been randomly assigned. Note that in randomized designs a pretest is technically not required, although one is often included as a covariate to increase the precision of program effect estimates. We will assume that we are comparing a program and a comparison group (instead of two programs or different levels of the same program).

To begin, get into MINITAB in the usual manner. You should see the MTB prompt (which looks like this: MTB>). Now you are ready to enter the following commands.

You will create two hypothetical tests as in previous exercises. Here, one test will be considered the pretest, the other the posttest. Assume that both tests measure the same true ability and that they each have their own unreliability or error:

MTB> Random 500 C1;
SUBC> Normal 50 5.
MTB> Random 500 C2;
SUBC> Normal 0 5.
MTB> Random 500 C3;
SUBC> Normal 0 5.

Here C1 represents the true ability on the tests for 500 people. C2 and C3 represent random error for the pretest and posttest respectively. Notice that the mean ability score for the tests will initially be set to 50 test score units. Next, construct the observed test scores:

MTB> Add C1 C2 C4.
MTB> Add C1 C3 C5.

You should notice that each test has about equal amounts of true score and error (because all three Random/Normal statements above use a 5 unit standard deviation). Now, name the columns:

MTB> Name C1 = 'true' C2 ='x error' C3 ='y error' C4 = 'pretest' C5 = 'posttest'
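For readers who want to check the logic outside MINITAB, the same construction can be sketched in Python. This is not part of the original exercise: the variable names and the seed are assumptions chosen to mirror C1 through C5. Because the true score and each error both have a standard deviation of 5 (variance 25), each observed test should have a variance near 50 (standard deviation about 7.1), and the pretest-posttest correlation should be near 25/50 = 0.5.

```python
import random
import statistics

random.seed(1)  # arbitrary seed (an assumption), only so reruns match
N = 500

# Mirrors C1-C3: true ability ~ Normal(50, 5); two independent errors ~ Normal(0, 5)
true = [random.gauss(50, 5) for _ in range(N)]
x_error = [random.gauss(0, 5) for _ in range(N)]
y_error = [random.gauss(0, 5) for _ in range(N)]

# Mirrors C4 and C5: observed score = true score + error
pretest = [t + e for t, e in zip(true, x_error)]
posttest = [t + e for t, e in zip(true, y_error)]


def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


print("pretest mean and sd:", statistics.mean(pretest), statistics.stdev(pretest))
print("posttest mean and sd:", statistics.mean(posttest), statistics.stdev(posttest))
print("pre-post correlation:", correlation(pretest, posttest))
```

Changing the seed plays the same role as rerunning the MINITAB commands: the numbers move a little each time but stay close to the values above.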

So far you have created a pretest and posttest for 500 hypothetical persons. Next, you need to randomly assign half of the people to the treated group and half to the control. One way to do this is to create a new random number for each individual. You will then use this variable to assign cases randomly. Since we want equal size groups (250 in each) you can assign all persons less than or equal to the median on this random number to one group, and all above the median to the other. Here is the way to do this:

MTB> random 500 C6;
SUBC> normal 0 5.

creates the random assignment number

MTB> let k1=min(C6)
MTB> let k2=median(C6)
MTB> let k3=max(C6)

gets the minimum, median and maximum values on this random assignment number. And

MTB> code (k1:k2) 0 (k2:k3) 1 c6 c7

creates the two equal size groups. To confirm that they are equal in size, do

MTB> table c7

and you should see that there are 250 0's and 250 1's.
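The median-split assignment can be sketched in Python as follows. This is an illustrative parallel rather than part of the exercise; the names assign, cut and group are assumptions standing in for C6, k2 and C7.

```python
import random
import statistics

random.seed(2)  # arbitrary seed (an assumption), only so reruns match
N = 500

# Mirrors C6: one random assignment number per person
assign = [random.gauss(0, 5) for _ in range(N)]

# Mirrors k2: with 500 values the median falls between the two middle values
cut = statistics.median(assign)

# Mirrors the Code command: at or below the median -> 0, above the median -> 1
group = [0 if a <= cut else 1 for a in assign]

print("controls:", group.count(0), "treated:", group.count(1))
```

Because the assignment number is unrelated to ability, splitting at its median divides the 500 people into two groups of 250 purely by chance. Shuffling a list of 250 zeros and 250 ones with random.shuffle would produce the same kind of assignment.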

Now, to be consistent with other exercises and to get rid of the unnecessary variable, put C7 into C6 and erase C7:

MTB> let C6=C7
MTB> erase C7

Then, name C6

MTB> name C6='group'

Try the following two statements to verify that you have two groups of 250 persons:

MTB> Sign C6
MTB> Histogram 'Group'

Each of these presents slightly different information but both verify that you have two equal sized groups.

Now that you have created two groups, let's say that your treatment had an effect. To put in an effect you have to create a posttest score that has something added into it for those people who received the treatment, and does not add this in for the control cases. Remember that to create the posttest originally, you just added together the True Score and Posttest Error for each individual. To create the posttest with a 10-point treatment effect built in, you would use the following formula

Y = T + eY + (10* Z)

where Z is the 0,1 group variable (C6) you just created. To do this in MINITAB do

MTB> let c7=c1 + c3 + (10*c6)
MTB> name c7='postgain'

Now, c5 is the posttest when there is no treatment effect and c7 is the posttest when there is a 10-point treatment effect.
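As a rough Python parallel of Y = T + eY + (10 * Z), the sketch below builds both versions of the posttest from the same true score and error and confirms that they differ only for treated cases. It is not part of the original exercise; EFFECT, posttest and postgain are assumed names standing in for the 10-point effect, C5 and C7.

```python
import random
import statistics

rng = random.Random(3)  # arbitrary seed (an assumption), only so reruns match
N, EFFECT = 500, 10

true = [rng.gauss(50, 5) for _ in range(N)]    # C1
y_error = [rng.gauss(0, 5) for _ in range(N)]  # C3
assign = [rng.gauss(0, 5) for _ in range(N)]
cut = statistics.median(assign)
group = [0 if a <= cut else 1 for a in assign]  # C6 after the median split

# C5: posttest with no treatment effect
posttest = [t + e for t, e in zip(true, y_error)]

# C7: the same posttest with EFFECT added for treated cases only
postgain = [t + e + EFFECT * z for t, e, z in zip(true, y_error, group)]

# Per-person difference between the two versions of the posttest
shift = [g - p for g, p in zip(postgain, posttest)]
print("distinct shifts:", sorted({round(s, 6) for s in shift}))
```

Multiplying the effect by the 0,1 group variable is what switches the effect on for treated cases and off for controls.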

At this point, it's worth stopping and thinking about what you've done. You created a random True Score (C1) and added it to independent error (C2) to create a pretest (C4) and to other independent error (C3) to create a posttest (C5). Then you randomly assigned half of the people to a treatment (C6=1) and half to a control (C6=0) condition. Finally, you created a posttest that has a 10-point treatment effect in it (C7). If this were a real study (and not a simulation), you would observe only three variables: the pretest (X, C4), the group (Z, C6) and the posttest with a treatment effect in it (Y, C7).

Let's imagine how we might analyze the data using these three variables, in order to see whether the treatment has an effect. One of the first things we might do is to look at some simple distributions for the pretest and posttest. First, look at some histograms:

MTB> Histogram 'pretest'.

MTB> Histogram 'postgain'.

MTB> Histogram 'pretest';
SUBC> MidPoint;
SUBC> Bar 'group'.

MTB> Histogram 'postgain';
SUBC> MidPoint;
SUBC> Bar 'group'.

The first two commands show the histograms for all 500 cases while the last two show histograms for the two groups separately. Can you see that the two groups differ on average on the posttest?

Now, look at the bivariate distribution:

MTB> Plot 'postgain' * 'pretest';
SUBC> Symbol 'group'.

You should see that the treated group has lots more high posttest scorers than the control group.

Now, look at some descriptive statistics tables.

MTB> Table 'Group';
SUBC> Means 'pretest' 'postgain';
SUBC> StDev 'pretest' 'postgain';
SUBC> N 'pretest' 'postgain'.

Here you should see clearly that while the two groups are very similar in average value on the pretest, they differ by nearly 10 points on the posttest.
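The same group-by-group summary can be sketched in Python. This is an assumed parallel of the Table command, not MINITAB output; the describe helper and the summary dictionary are hypothetical names introduced only for illustration.

```python
import random
import statistics

rng = random.Random(4)  # arbitrary seed (an assumption), only so reruns match
N, EFFECT = 500, 10

true = [rng.gauss(50, 5) for _ in range(N)]
pretest = [t + rng.gauss(0, 5) for t in true]
assign = [rng.gauss(0, 5) for _ in range(N)]
cut = statistics.median(assign)
group = [0 if a <= cut else 1 for a in assign]
postgain = [t + rng.gauss(0, 5) + EFFECT * z for t, z in zip(true, group)]


def describe(values, groups, g):
    """Return (mean, standard deviation, n) of values for cases in group g."""
    sub = [v for v, z in zip(values, groups) if z == g]
    return statistics.mean(sub), statistics.stdev(sub), len(sub)


summary = {
    (name, g): describe(vals, group, g)
    for name, vals in (("pretest", pretest), ("postgain", postgain))
    for g in (0, 1)
}

for (name, g), (mean, sd, n) in summary.items():
    print(f"{name:9s} group={g}  mean={mean:6.2f}  sd={sd:5.2f}  n={n}")
```

The pretest means should be close because assignment was random, while the postgain means should differ by roughly the 10 points that were built in.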

In a randomized experiment, you technically don't need to measure a pretest. You could have the design:

R    X  O
R         O

If you did, all you would be able to do to look for treatment effects is to compare the groups on the posttest. This might best be accomplished by conducting a simple t-test on the posttest:

MTB> TwoT 95.0 c7 c6;
SUBC> alternative 0.

You can get the same result by using regression analysis with the following formula

Y = b0 + b1Z + eY

where

Y = posttest

Z = the 0,1 assignment variable

b0 = posttest mean of the comparison group

b1 = difference between the program and comparison group posttest means

eY = random error

This model can be run in MINITAB using

MTB> Regress 'postgain' 1 'Group'.

This regresses the posttest score onto the 0,1 group variable Z. The results for both the t-test and regression versions should be identical, but you have to know where to look to see this. In the t-test results, the last line will contain 'T=' followed by a t-value. Given the way you set up the simulation, this t-value should be negative (it tests the control minus treatment difference, and the treatment group mean is larger by about ten points). Now look at the regression table under the heading 't-ratio'. The t-ratio for Group should be the same as the t-test result (except that the sign is reversed).

In general, the regression analysis method of testing for differences is easier to use and interpret than the t-test results. In the regression results, b0 is the coefficient for the Constant and b1 is the coefficient for Group. The b0 in this case is actually the average posttest value for the control group. The b1 is the amount you add to the control group average to get the treatment group posttest average, that is, the estimate of the difference between the two groups on the posttest. This should be somewhere around 10 points. Both coefficients are tested with a t-test. The p-value tells you the probability of obtaining a coefficient at least this far from zero by chance alone if the true coefficient were zero.
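The equivalence of the two analyses can be checked numerically. The sketch below is a Python parallel, not MINITAB output, and its variable names are assumptions. It computes a two-sample t for the control minus treatment difference using the pooled variance, then fits Y = b0 + b1Z by least squares. With equal group sizes the pooled and unpooled forms of the t statistic coincide, so that choice does not affect the t-value here.

```python
import math
import random
import statistics

rng = random.Random(5)  # arbitrary seed (an assumption), only so reruns match
N, EFFECT = 500, 10

true = [rng.gauss(50, 5) for _ in range(N)]
assign = [rng.gauss(0, 5) for _ in range(N)]
cut = statistics.median(assign)
group = [0 if a <= cut else 1 for a in assign]
postgain = [t + rng.gauss(0, 5) + EFFECT * z for t, z in zip(true, group)]

control = [y for y, z in zip(postgain, group) if z == 0]
treated = [y for y, z in zip(postgain, group) if z == 1]

# Two-sample t for (control - treated), using the pooled variance
n0, n1 = len(control), len(treated)
pooled_var = ((n0 - 1) * statistics.variance(control)
              + (n1 - 1) * statistics.variance(treated)) / (n0 + n1 - 2)
t_test = ((statistics.mean(control) - statistics.mean(treated))
          / math.sqrt(pooled_var * (1 / n0 + 1 / n1)))

# Least-squares fit of Y = b0 + b1*Z
z_bar = statistics.mean(group)
y_bar = statistics.mean(postgain)
s_zz = sum((z - z_bar) ** 2 for z in group)
s_zy = sum((z - z_bar) * (y - y_bar) for z, y in zip(group, postgain))
b1 = s_zy / s_zz
b0 = y_bar - b1 * z_bar
rss = sum((y - b0 - b1 * z) ** 2 for z, y in zip(group, postgain))
se_b1 = math.sqrt(rss / (N - 2) / s_zz)
t_reg = b1 / se_b1

print("control mean:", statistics.mean(control), " b0:", b0)
print("mean difference:", statistics.mean(treated) - statistics.mean(control), " b1:", b1)
print("t-test t:", t_test, " regression t-ratio:", t_reg)
```

Here b0 reproduces the control group mean, b1 reproduces the difference between group means, and the regression t-ratio matches the t-test value with the sign reversed.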

So far, all you've done is to look at the difference between groups on the posttest. But you also have a pretest measured. How does this pretest help in analyzing the data? In a randomized experiment, the pretest (or any other covariate) is used to reduce variability in the posttest that is unrelated to the treatment. If you reduce posttest variability in this way, it should be easier to see a treatment effect. In other terms, for the very same posttest, including a good pretest should yield a higher t-value associated with the coefficient for differences between groups. To see this, you have to run a regression model that includes the pretest values in it. This model is:

Y = b0 + b1X + b2Z + eY

where

Y = the posttest

X = the pretest

Z = the assignment variable

b0 = the intercept of the comparison group line

b1 = slope of regression lines

b2 = the program effect

eY = random error

You can run this in MINITAB by doing:

MTB> Regress 'postgain' 2 'pretest' 'Group'.

Now, if you look at the t-ratio associated with the Group variable you should see that it is higher than it was in the original regression equation you ran. Even though you used the exact same posttest variable, you are able to see the treatment effect more clearly (i.e., got a higher t-value) because you included a good covariate (the pretest) that reduced some of the noise in the posttest that might obscure the treatment effect.
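The gain in precision can be sketched in Python by fitting both models to the same simulated data. This is an assumed parallel of the two Regress commands, not MINITAB output, and the helper cross is a hypothetical name. Under this simulation's settings the posttest variance within a group is about 25 + 25 = 50, and adjusting for the pretest should cut the residual variance to about 37.5, so the Group t-ratio should rise by a factor of roughly 1.15.

```python
import math
import random
import statistics

rng = random.Random(6)  # arbitrary seed (an assumption), only so reruns match
N, EFFECT = 500, 10

true = [rng.gauss(50, 5) for _ in range(N)]
pretest = [t + rng.gauss(0, 5) for t in true]
assign = [rng.gauss(0, 5) for _ in range(N)]
cut = statistics.median(assign)
group = [0 if a <= cut else 1 for a in assign]
postgain = [t + rng.gauss(0, 5) + EFFECT * z for t, z in zip(true, group)]


def cross(a, b):
    """Sum of cross-products of deviations from the means of a and b."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b))


s_xx, s_zz, s_yy = cross(pretest, pretest), cross(group, group), cross(postgain, postgain)
s_xz, s_xy, s_zy = cross(pretest, group), cross(pretest, postgain), cross(group, postgain)

# Model 1: Y = b0 + b1*Z (posttest on Group only)
b_simple = s_zy / s_zz
rss_simple = s_yy - b_simple * s_zy
t_simple = b_simple / math.sqrt(rss_simple / (N - 2) / s_zz)

# Model 2: Y = b0 + b1*X + b2*Z, solved from the 2x2 normal equations
det = s_xx * s_zz - s_xz ** 2
b1 = (s_zz * s_xy - s_xz * s_zy) / det  # pretest slope
b2 = (s_xx * s_zy - s_xz * s_xy) / det  # program effect
rss_cov = s_yy - b1 * s_xy - b2 * s_zy
t_cov = b2 / math.sqrt(rss_cov / (N - 3) * s_xx / det)

print("Group only:      effect =", b_simple, " t-ratio =", t_simple)
print("Group + pretest: effect =", b2, " t-ratio =", t_cov, " pretest slope =", b1)
```

Both models estimate an effect near 10, but the second has the smaller standard error, which is why its t-ratio is larger.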

At this point you should be convinced of the following:

Downloading the MINITAB Commands

If you wish, you can download a text file that has all of the MINITAB commands in this exercise and run the exercise simply by executing this macro file. To find out how to call the macro, check the MINITAB help system on the machine you're working on. You may want to run the exercise several times; each time will generate an entirely new set of data and slightly different results.
Copyright 1996, William M.K. Trochim