Universidad de Barcelona

Departamento de Personalidad, Evaluación y Tratamientos Psicológicos

Computer-assisted courses over the internet
jgutierrez@psi.ub.es


 

Experimental Design
Source: Research Methods Knowledge Base (http://trochim.human.cornell.edu/kb/)


Experimental designs are often touted as the most "rigorous" of all research designs, or as the "gold standard" against which all other designs are judged.  In one sense, they probably are.  If you can implement an experimental design well (and that is a big "if" indeed), then the experiment is probably the strongest design with respect to internal validity.  Why?  Recall that internal validity is at the center of all causal or cause-effect inferences.  When you want to determine whether some program or treatment causes some outcome or outcomes to occur, then you are interested in having strong internal validity.  Essentially, you want to assess the proposition:

If X, then Y

or, in more colloquial terms:

If the program is given, then the outcome occurs

Unfortunately, it's not enough just to show that when the program or treatment occurs the expected outcome also happens.  That's because there may be lots of reasons, other than the program, for why you observed the outcome.  To really show that there is a causal relationship, you have to simultaneously address the two propositions:

If X, then Y

and

If not X, then not Y

Or, once again more colloquially:

If the program is given, then the outcome occurs

and

If the program is not given, then the outcome does not occur

If you are able to provide evidence for both of these propositions, then you've in effect isolated the program from all of the other potential causes of the outcome.  You've shown that when the program is present the outcome occurs and when it's not present, the outcome doesn't occur.  That points to the causal effectiveness of the program.

Think of all this like a fork in the road.  Down one path, you implement the program and observe the outcome.  Down the other path, you don't implement the program and the outcome doesn't occur.  But, how do we take both paths in the road in the same study?  How can we be in two places at once? Ideally, what we want is to have the same conditions -- the same people, context, time, and so on -- and see whether when the program is given we get the outcome and when the program is not given we don't.  Obviously, we can never achieve this hypothetical situation.   If we give the program to a group of people, we can't simultaneously not give it!   So, how do we get out of this apparent dilemma?

Perhaps we just need to think about the problem a little differently.   What if we could create two groups or contexts that are as similar as we can possibly make them?  If we could be confident that the two situations are comparable, then we could administer our program in one (and see if the outcome occurs) and not give the program in the other (and see if the outcome doesn't occur).  And, if the two contexts are comparable, then this is like taking both forks in the road simultaneously!   We can have our cake and eat it too, so to speak.

That's exactly what an experimental design tries to achieve.  In the simplest type of experiment, we create two groups that are "equivalent" to each other.  One group (the program or treatment group) gets the program and the other group (the comparison or control group) does not.  In all other respects, the groups are treated the same.  They have similar people, live in similar contexts, have similar backgrounds, and so on.  Now, if we observe differences in outcomes between these two groups, then the differences must be due to the only thing that differs between them -- that one got the program and the other didn't.

OK, so how do we create two groups that are "equivalent"?   The approach used in experimental design is to assign people randomly from a common pool of people into the two groups.  The experiment relies on this idea of random assignment to groups as the basis for obtaining two groups that are similar.  Then, we give one the program or treatment and we don't give it to the other.  We observe the same outcomes in both groups.
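To make the mechanics concrete, here is a minimal sketch in Python of random assignment from a common pool into two groups. The participant IDs, the 50/50 split, and the function name are hypothetical choices for illustration, not anything prescribed by the text.

    import random

    def randomly_assign(participants, seed=None):
        # Shuffle the common pool and split it in half, so that nothing about a
        # person (only chance) decides which group they land in.
        # (Minimal sketch: the 50/50 split and the names here are assumptions.)
        rng = random.Random(seed)
        pool = list(participants)
        rng.shuffle(pool)
        midpoint = len(pool) // 2
        return pool[:midpoint], pool[midpoint:]

    # Ten hypothetical participant IDs
    treatment_group, control_group = randomly_assign(range(10), seed=42)
    print("Treatment group:", treatment_group)
    print("Control group:  ", control_group)

The important property is that no characteristic of the person influences which group they end up in; only chance does.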

The key to the success of the experiment is in the random assignment.   In fact, even with random assignment we never expect that the groups we create will be exactly the same.  How could they be, when they are made up of different people?   We rely on the idea of probability and assume that the two groups are "probabilistically equivalent" or equivalent within known probabilistic ranges.

So, if we randomly assign people to two groups, and we have enough people in our study to achieve the desired probabilistic equivalence, then we may consider the experiment to be strong in internal validity and we probably have a good shot at assessing whether the program causes the outcome(s).
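One way to see what "probabilistically equivalent" means is a small simulation: give everyone in a hypothetical pool a pre-existing score, split the pool at random, and look at how far apart the two group means fall. The score distribution, the sample sizes, and the number of simulated splits below are made-up illustrations.

    import random
    import statistics

    def mean_gap_after_random_split(n_per_group, rng):
        # A pool with a pre-existing characteristic (e.g., a pretest score),
        # split at random into two equal groups; return the gap between means.
        pool = [rng.gauss(50, 10) for _ in range(2 * n_per_group)]
        rng.shuffle(pool)
        return abs(statistics.mean(pool[:n_per_group]) - statistics.mean(pool[n_per_group:]))

    rng = random.Random(0)
    for n in (10, 100, 1000):
        gaps = [mean_gap_after_random_split(n, rng) for _ in range(500)]
        print(f"n per group = {n:4d}: typical gap between group means = {statistics.mean(gaps):.2f}")

The two groups are never exactly the same, but the expected gap shrinks predictably as the sample grows, which is the sense in which random assignment yields equivalence "within known probabilistic ranges."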

But there are lots of things that can go wrong.  We may not have a large enough sample.  Or, we may have people who refuse to participate in our study or who drop out part way through.  Or, we may be challenged successfully on ethical grounds (after all, in order to use this approach we have to deny the program to some people who might be just as deserving of it as others). Or, we may get resistance from the staff in our study who would like some of their "favorite" people to get the program.  Or, the mayor might insist that her daughter be put into the new program in an educational study because it may mean she'll get better grades.

The bottom line here is that experimental design is intrusive and difficult to carry out in most real world contexts.  And, because an experiment is often an intrusion, you are to some extent setting up an artificial situation so that you can assess your causal relationship with high internal validity.  If so, then you are limiting the degree to which you can generalize your results to real contexts where you haven't set up an experiment.  That is, you have reduced your external validity in order to achieve greater internal validity.

In the end, there is just no simple answer (no matter what anyone tells you!).  If the situation is right, an experiment can be a very strong design to use.   But it isn't automatically so.  My own personal guess is that randomized experiments are probably appropriate in no more than 10% of the social research studies that attempt to assess causal relationships.

 

 

Although there are a great variety of experimental design variations, we can classify and organize them using a simple signal-to-noise ratio metaphor. In this metaphor, we assume that what we observe or see can be divided into two components, the signal and the noise (by the way, this is directly analogous to the true score theory of measurement).

[Figure: a time series decomposed into its two components, the signal (trend) and the noise]

The figure, for instance, shows a time series with a slightly downward slope. But because there is so much variability or noise in the series, it is difficult even to detect the downward slope. When we divide the series into its two components, we can clearly see the slope.

In most research, the signal is related to the key variable of interest -- the construct you're trying to measure, the program or treatment that's being implemented. The noise consists of all of the random factors in the situation that make it harder to see the signal -- the lighting in the room, local distractions, how people felt that day, etc. We can construct a ratio of these two by dividing the signal by the noise. In research, we want the signal to be high relative to the noise. For instance, if you have a very powerful treatment or program (i.e., strong signal) and very good measurement (i.e., low noise) you will have a better chance of seeing the effect of the program than if you have either a strong program and weak measurement or a weak program and strong measurement.
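The sketch below constructs a series like the one described: a slight downward trend (the signal) buried in much larger random variation (the noise), and then summarizes the two with a simple signal-to-noise ratio. The slope, the noise level, and the particular ratio used are arbitrary choices for illustration.

    import random
    import statistics

    rng = random.Random(1)

    time_points = range(100)
    signal = [10.0 - 0.05 * t for t in time_points]    # a slight downward slope
    noise = [rng.gauss(0, 2.0) for _ in time_points]   # random variation around it
    observed = [s + e for s, e in zip(signal, noise)]  # what we actually get to see

    # Compare how much the series varies because of the trend with how much it
    # varies because of the random noise around that trend.
    signal_sd = statistics.pstdev(signal)
    noise_sd = statistics.pstdev(noise)
    observed_sd = statistics.pstdev(observed)
    print(f"observed SD = {observed_sd:.2f}  (signal SD = {signal_sd:.2f}, noise SD = {noise_sd:.2f})")
    print(f"signal-to-noise ratio = {signal_sd / noise_sd:.2f}")

With these numbers the ratio comes out below one: the trend is there, but the noise swamps it, which is exactly why the slope is hard to see in the raw series.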

With this in mind, we can now classify the experimental designs into two categories: signal enhancers or noise reducers. Notice that doing either of these things -- enhancing signal or reducing noise -- improves the quality of the research. The signal-enhancing experimental designs are called the factorial designs. In these designs, the focus is almost entirely on the setup of the program or treatment, its components and its major dimensions. In a typical factorial design we would examine a number of different variations of a treatment.
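As a hypothetical sketch of the factorial idea, the code below crosses two made-up dimensions of a program (amount of instruction and setting), simulates an outcome for each combination, and prints the cell means. The factor names, effect sizes, and sample sizes are invented purely for illustration.

    import itertools
    import random
    import statistics

    rng = random.Random(2)

    amount = ["1 hour/week", "4 hours/week"]   # one dimension of the hypothetical program
    setting = ["in class", "pull-out"]         # a second dimension

    for a, s in itertools.product(amount, setting):
        # Simulated outcomes for this variation: a base score, a boost for each
        # "stronger" level (made-up effect sizes), plus random noise.
        scores = [50
                  + (5 if a == "4 hours/week" else 0)
                  + (2 if s == "pull-out" else 0)
                  + rng.gauss(0, 3)
                  for _ in range(30)]
        print(f"{a:>12} / {s:<9}: mean outcome = {statistics.mean(scores):.1f}")

Each cell is one variation of the treatment, and comparing the cell means is what lets a factorial design speak to which components and dimensions of the program matter.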

There are two major types of noise-reducing experimental designs: covariance designs and blocking designs. In these designs we typically use information about the makeup of the sample or about pre-program variables to remove some of the noise in our study.
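Here is a rough sketch of the noise-reducing idea using a covariance-style adjustment: each hypothetical participant has a pre-program score that strongly predicts the post-program score, and removing the part of the outcome predictable from the pretest shrinks the within-group variability without touching the treatment effect. All of the numbers (the 5-point effect, the score distributions) are assumptions for illustration.

    import random
    import statistics

    rng = random.Random(3)

    n = 200
    pre = [rng.gauss(50, 10) for _ in range(n)]        # pre-program scores (lots of person-to-person noise)
    treated = [rng.random() < 0.5 for _ in range(n)]   # random assignment to the two groups
    post = [p + (5 if t else 0) + rng.gauss(0, 3)      # outcome = pretest + 5-point effect + small noise
            for p, t in zip(pre, treated)]

    raw_treat = [y for y, t in zip(post, treated) if t]
    gain_treat = [y - p for y, p, t in zip(post, pre, treated) if t]       # post minus pre
    gain_ctrl = [y - p for y, p, t in zip(post, pre, treated) if not t]

    print(f"within-group SD of raw post scores:       {statistics.pstdev(raw_treat):.1f}")
    print(f"within-group SD after pretest adjustment: {statistics.pstdev(gain_treat):.1f}")
    print(f"estimated treatment effect (adjusted):    {statistics.mean(gain_treat) - statistics.mean(gain_ctrl):.1f}")

A blocking design pursues the same goal differently: it first groups similar people together (for example, by pretest score) and then randomizes within each block, so that comparisons are made among people who were alike to begin with.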