Experimental designs are often touted as the most
"rigorous" of all research designs, or as the "gold standard" against
which all other designs are judged. In one sense, they probably are. If you
can implement an experimental design well (and that is a big "if" indeed), then
the experiment is probably the strongest design with respect to internal
validity. Why? Recall that internal validity is at the center of all
causal or cause-effect inferences. When you want to determine whether some program
or treatment causes some outcome or outcomes to occur, then you are interested in
having strong internal validity. Essentially, you want to assess the proposition:
If X, then Y
or, in more colloquial terms:
If the program is given, then the outcome occurs
Unfortunately, it's not enough just to show that when the program or treatment occurs
the expected outcome also happens. That's because there may be lots of reasons,
other than the program, why you observed the outcome. To really show that there
is a causal relationship, you have to simultaneously address the two propositions:
If X, then Y
and
If not X, then not Y
Or, once again more colloquially:
If the program is given, then the outcome occurs
and
If the program is not given, then the outcome does not
occur
If you are able to provide evidence for both of these propositions, then
you've in effect isolated the program from all of the other potential causes of the
outcome. You've shown that when the program is present the outcome occurs and when
it's not present, the outcome doesn't occur. That points to the causal effectiveness
of the program.
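The logic here can be written out compactly. As a sketch, using X and Y as above
(this is just a restatement of the two propositions, nothing more):

```latex
% Proposition 1: if the program, then the outcome
X \Rightarrow Y
% Proposition 2: if no program, then no outcome
% (by contraposition, this is the same as Y \Rightarrow X)
\neg X \Rightarrow \neg Y
% Together, the outcome occurs exactly when the program is given:
(X \Rightarrow Y) \land (\neg X \Rightarrow \neg Y)
  \;\equiv\; X \Leftrightarrow Y
```

The biconditional at the end is why both halves matter: the first proposition
alone leaves open the possibility that the outcome would have happened anyway.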
Think of all this like a fork in the road. Down one path, you
implement the program and observe the outcome. Down the other path, you don't
implement the program and the outcome doesn't occur. But how do we take both
paths in the same study? How can we be in two places at once? Ideally,
what we want is to have the same conditions -- the same people, context, time, and so on
-- and see whether when the program is given we get the outcome and when the program is
not given we don't. Obviously, we can never achieve this hypothetical situation.
If we give the program to a group of people, we can't simultaneously not give it!
So, how do we get out of this apparent dilemma?
Perhaps we just need to think about the problem a little differently.
What if we could create two groups or contexts that are as similar as we can
possibly make them? If we could be confident that the two situations are comparable,
then we could administer our program in one (and see if the outcome occurs) and not give
the program in the other (and see if the outcome doesn't occur). And, if the two
contexts are comparable, then this is like taking both forks in the road simultaneously!
We can have our cake and eat it too, so to speak.
That's exactly what an experimental design tries to achieve. In the
simplest type of experiment, we create two groups that are "equivalent" to each
other. One group (the program or treatment group) gets the program and the other
group (the comparison or control group) does not. In all other respects, the groups
are treated the same. They have similar people, live in similar contexts, have
similar backgrounds, and so on. Now, if we observe differences in outcomes between
these two groups, then the differences must be due to the only thing that differs between
them -- that one got the program and the other didn't.
OK, so how do we create two groups that are "equivalent"?
The approach used in experimental design is to randomly assign people from a common
pool into the two groups. The experiment relies on this idea of random
assignment as the basis for obtaining two groups that are similar. Then,
we give the program or treatment to one group and withhold it from the other,
and we measure the same outcomes in both groups.
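As a concrete illustration, here is a minimal sketch of random assignment in
Python. The pool of participants and its size are hypothetical, invented only
for this example:

```python
import random

def randomly_assign(pool, seed=None):
    """Shuffle a common pool of participants and split it into two
    equal-sized groups: one to receive the program, one not to."""
    rng = random.Random(seed)
    shuffled = list(pool)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical pool of 100 participants.
pool = [f"person_{i}" for i in range(100)]
treatment_group, control_group = randomly_assign(pool, seed=42)

# The program goes to treatment_group only; the SAME outcome
# measure is then collected from both groups and compared.
```

Because chance alone decides who lands in which group, any pre-existing
differences between people get spread across the two groups rather than piled
into one of them.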
The key to the success of the experiment is in the random assignment.
In fact, even with random assignment we never expect that the groups we create will
be exactly the same. How could they be, when they are made up of different people?
We rely on the idea of probability and assume that the two groups are
"probabilistically equivalent" or equivalent within known probabilistic ranges.
So, if we randomly assign people to two groups, and we have enough people
in our study to achieve the desired probabilistic equivalence, then we may consider the
experiment to be strong in internal validity and we probably have a good shot at assessing
whether the program causes the outcome(s).
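To make "probabilistically equivalent" and the role of sample size concrete,
the small simulation below (all numbers are illustrative assumptions, not from
the text) repeatedly splits a pool at random and tracks how far apart the two
groups' baselines land on average:

```python
import random
import statistics

def mean_baseline_gap(pool_size, trials=2000, seed=0):
    """Average absolute difference in a pre-existing trait between
    two randomly assigned halves of a pool, over many random splits."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        # A trait each person brings to the study (e.g., a prior
        # test score); mean 100 and sd 15 are illustrative choices.
        traits = [rng.gauss(100, 15) for _ in range(pool_size)]
        rng.shuffle(traits)  # random assignment to the two halves
        half = pool_size // 2
        group_a, group_b = traits[:half], traits[half:]
        gaps.append(abs(statistics.mean(group_a) - statistics.mean(group_b)))
    return statistics.mean(gaps)

for n in (20, 100, 500):
    print(f"pool of {n:>3}: average baseline gap = {mean_baseline_gap(n):.2f}")
```

With small pools the two groups can start out noticeably different just by
chance; as the pool grows, the expected gap shrinks, and with it the risk of
mistaking a baseline difference for a program effect.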
But there are lots of things that can go wrong. We may not have a
large enough sample. Or, we may have people who refuse to participate in our study
or who drop out partway through. Or, we may be challenged successfully on ethical
grounds (after all, in order to use this approach we have to deny the program to some
people who may be just as deserving of it as others). Or, we may get resistance from the
staff in our study who would like some of their "favorite" people to get the
program. Or, in an educational study, the mayor might insist that her daughter be
put into the new program because it may mean she'll get better grades.
The bottom line here is that experimental design is intrusive and
difficult to carry out in most real-world contexts. And, because an experiment is
often an intrusion, you are to some extent setting up an artificial situation so that you
can assess your causal relationship with high internal validity. In doing so, you are
limiting the degree to which you can generalize your results to real contexts where you
haven't set up an experiment. That is, you have reduced your external validity in
order to achieve greater internal validity.
In the end, there is just no simple answer (no matter what anyone tells
you!). If the situation is right, an experiment can be a very strong design to use.
But it isn't automatically so. My own guess is that randomized
experiments are probably appropriate in no more than 10% of the social research studies
that attempt to assess causal relationships.