Validity and Reliability (more about validity)

Most of the definitions here are from:

T08. W. M. K. Trochim, “Research methods knowledge base,” Online, last accessed August 2008. [Online]. Available: http://www.socialresearchmethods.net/kb/index.php

There are other definitions. A longer-term task is to review a number of other textbooks on experimental design and cross-reference them.

Reliability and Validity

Reliability and validity apply to measurements. A hypothesis is not a measurement, so it is not meaningful to say that a hypothesis is reliable or valid. The terms refer to the measurements used to test it.

There are four main types of validity: internal validity, construct validity, external validity and conclusion validity.

The first type, internal validity, only applies where the study claims a causal relationship. The other three types apply to all types of experiment.

Internal Validity

Internal validity is “you have evidence that what you did in the study (i.e., the program) caused what you observed (i.e., the outcome) to happen. It doesn’t tell you whether what you did for the program was what you wanted to do or whether what you observed was what you wanted to observe — those are construct validity concerns. It is possible to have internal validity in a study and not have construct validity.”.

The single-group threats to internal validity are:

History Threat
Maturation Threat
Testing Threat
Instrumentation Threat
Mortality Threat
Regression Threat

External validity

“External validity refers to the approximate truth of conclusions that involve generalizations.” [T08]

There are three major threats to external validity because there are three ways you could be wrong — people, places or times. Your critics could come along, for example, and argue that the results of your study are due to the unusual type of people who were in the study. Or, they could argue that it might only work because of the unusual place you did the study in (perhaps you did your educational study in a college town with lots of high-achieving educationally-oriented kids). Or, they might suggest that you did your study in a peculiar time. For instance, if you did your smoking cessation study the week after the Surgeon General issues the well-publicized results of the latest smoking and cancer studies, you might get different results than if you had done it the week before.

Construct validity

“Construct validity refers to the degree to which inferences can legitimately be made from the operationalizations in your study to the theoretical constructs on which those operationalizations were based.”

Are we measuring what we really think we are measuring?

Threats include:

Inadequate Preoperational Explication of Constructs
Mono-Operation Bias (did you only use one version of the treatment?)
Mono-Method Bias (did you cross-validate your measurements?)
Interaction of Different Treatments
Interaction of Testing and Treatment
Restricted Generalizability Across Constructs
Confounding Constructs and Levels of Constructs

There are also “social threats”: hypothesis guessing, evaluation apprehension and experimenter expectancies.

Conclusion Validity

“Conclusion validity is the degree to which conclusions we reach about relationships in our data are reasonable.” [T08]

OK, there is a subtle difference here between conclusion validity and internal validity.

Conclusion validity only asks whether there is a relationship between our variables. Internal validity asks whether that relationship is causal, that is, whether the cause is what we expect it to be. So we could have an experiment with conclusion validity (there is some relationship measured between our variables) that is not internally valid, because the relationship is explained by some external, uncontrolled factor.
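
To make the distinction concrete, here is a minimal simulation sketch (my own illustration, not from T08). It assumes a hypothetical uncontrolled factor, `confounder`, that drives both of two measured variables, which I call `treatment` and `outcome`. All names, sample sizes and noise levels are made up. The two variables end up strongly correlated, so concluding that they are related is reasonable, but the correlation largely disappears once the confounder is removed, so a causal reading would not be internally valid.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a least-squares straight-line fit on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def simulate(n=2000, seed=0):
    """Hypothetical data: the confounder drives both variables; neither causes the other."""
    rng = random.Random(seed)
    confounder = [rng.gauss(0, 1) for _ in range(n)]
    treatment = [c + rng.gauss(0, 0.5) for c in confounder]
    outcome = [c + rng.gauss(0, 0.5) for c in confounder]
    return confounder, treatment, outcome


confounder, treatment, outcome = simulate()

# Relationship between the two measured variables (the conclusion validity question).
raw_r = pearson(treatment, outcome)

# Same relationship after removing what the confounder explains
# (a rough check on the internal validity question).
partial_r = pearson(residuals(treatment, confounder),
                    residuals(outcome, confounder))

print(f"raw correlation: {raw_r:.2f}")
print(f"correlation after removing the confounder: {partial_r:.2f}")
```

In a real study the confounder is usually not measured, which is exactly why this threat is hard to spot from the data alone.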

So in this case, we have the goal of “Drawing a valid conclusion about the relationship between typing styles and identity where the user is typing a strong password.”

Threats to conclusion validity include:

Noise
1. low reliability of measures
2. poor reliability of treatment implementation
3. random irrelevancies in the setting
4. random heterogeneity of respondents: a very diverse group of respondents is likely to vary more widely on your measures or observations.
Statistical power
1. fishing and the error rate problem
2. violated assumptions of statistical tests.
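
The "fishing and the error rate problem" item comes down to simple arithmetic, sketched below. The sketch assumes the tests are independent and all run at the same significance level, which is a simplification. The function names are my own, and the Bonferroni adjustment is one common correction that I have picked for illustration, not something prescribed by the list above.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across num_tests independent
    tests, each run at significance level alpha, when no real effect exists."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_alpha(alpha, num_tests):
    """Per-test significance level under a Bonferroni adjustment."""
    return alpha / num_tests


# Hypothetical scenario: fishing through 20 different typing features,
# each tested at the usual 0.05 level.
alpha = 0.05
num_tests = 20

unadjusted = familywise_error_rate(alpha, num_tests)
adjusted = familywise_error_rate(bonferroni_alpha(alpha, num_tests), num_tests)

print(f"unadjusted family-wise error rate: {unadjusted:.3f}")
print(f"with Bonferroni-adjusted alpha:    {adjusted:.3f}")
```

With 20 independent tests at 0.05, the chance of at least one spurious "significant" result is roughly 0.64, which is why running many analyses and reporting only the ones that come up significant threatens conclusion validity.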

Written by opusiti

September 3, 2008 at 12:57 am

Posted in experimental methods