Intelligence Is More Than An Arbitrary Collection of Abilities
IQ tests are not like marble collections
There is a common fallacy among critics of IQ that intelligence tests merely consist of an arbitrary selection of tasks. Linda Gottfredson says that critics who use this argument view intelligence as a marble collection.1 In this view, there is no objective way to determine which marble collection is the correct one, so the resulting IQ score reflects psychometricians’ preferences or personal beliefs rather than some objective fact about the test taker.
Intelligence researchers believe that intelligence is a unitary trait because a single factor, the g factor, emerges from a statistical method called factor analysis. Put more plainly: there is something that all cognitive tasks tap into, and we call it the g factor. We know this because scores on all cognitive tasks correlate positively with one another; if someone is good at one reasoning task, they will likely be good at other reasoning tasks too. Charles Spearman discovered this pattern, known as the positive manifold, and from it derived the principle of the “indifference of the indicator”: any sufficiently diverse set of cognitive tasks can serve to measure g. This is not a necessary fact of nature; it could have been the case that being good at mathematics predicted being worse at reading.
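To make the positive manifold concrete, here is a minimal Python sketch (my illustration, not from Gottfredson or Warne) assuming a simple one-factor model in which every task score mixes a shared ability g with task-specific noise. The loadings are invented for illustration:

```python
# A minimal simulation of the positive manifold, assuming a simple
# one-factor model: each task score = loading * g + task-specific noise.
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=5000)                          # latent general ability
loadings = np.array([0.8, 0.7, 0.6, 0.5, 0.4])     # how strongly each task taps g
noise = rng.normal(size=(5000, 5))
scores = g[:, None] * loadings + noise * np.sqrt(1 - loadings**2)

# Every off-diagonal entry of the correlation matrix is positive: doing
# well on one task predicts doing well on all the others.
print(np.corrcoef(scores.T).round(2))
```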
Factor analysis strips away the unique components of cognitive tasks and isolates what all tasks have in common, namely the g factor. You can accurately measure g using any sufficiently diverse collection of cognitive tasks. The tasks that appear on established intelligence tests, such as the Wechsler Adult Intelligence Scale (WAIS), are chosen because they measure intelligence more accurately and efficiently than the alternatives.
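And here is a sketch of the extraction step itself. As a simplifying assumption, it uses the first principal axis of the correlation matrix as a stand-in for a full factor analysis; that is enough to show how the shared component is recovered while task-specific variance is discarded:

```python
# Pulling the common factor out of a simulated task battery. The first
# eigenvector of the correlation matrix serves as a crude stand-in for a
# full factor analysis; all numbers are made up for illustration.
import numpy as np

rng = np.random.default_rng(1)
g = rng.normal(size=5000)
true_loadings = np.array([0.75, 0.65, 0.55, 0.70, 0.60, 0.50])
noise = rng.normal(size=(5000, 6))
scores = g[:, None] * true_loadings + noise * np.sqrt(1 - true_loadings**2)

R = np.corrcoef(scores.T)
eigvals, eigvecs = np.linalg.eigh(R)               # eigh: ascending eigenvalues
recovered = np.abs(eigvecs[:, -1]) * np.sqrt(eigvals[-1])

# The shared component survives while task-specific variance is left
# behind: the recovered loadings approximate the true ones.
print(recovered.round(2))    # close to true_loadings (slightly inflated)
print(true_loadings)
```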
If different intelligence tests were just different groups of arbitrarily selected tasks, then we wouldn’t expect them to correlate particularly well. If intelligence tests instead measure the g factor, then we would expect any two tests to correlate highly. This is exactly what we see. From In The Know2 by intelligence researcher Russell T. Warne:
The authors of one of the earliest studies of this type (Stauffer, Ree, & Carretta, 1996) gave 10 common pencil-and-paper intelligence subtests and a series of 25 computerized tasks called the Cognitive Abilities Measurement (CAM) battery. The CAM battery was intended to measure processing speed, working memory, declarative knowledge (i.e., information that the person can state that they know), and procedural knowledge (which is the knowledge of how to complete tasks). The intelligence subtests and the CAM battery each produced a g factor that correlated almost perfectly (r = .950 to .994).
In a more recent study (Keith, Kranzler, & Flanagan, 2001), a team of psychologists administered two intelligence tests, the Woodcock–Johnson III (WJ-III) and Cognitive Assessment System (CAS), to a sample of 155 children. Keith et al. (2001) used factor analysis to identify each test’s g factor and found that the correlation between the two was r = 0.98 (p. 108). What makes this result more remarkable is that the CAS was created by psychologists who did not intend to create a test that measured g. As a result, most of the tasks on the CAS do not resemble tasks on the WJ-III at all. Nevertheless, the CAS still produced a g factor, and the CAS’s g factor is identical to the g on the WJ-III test.
Floyd, Reynolds, Farmer, and Kranzler (2013) conducted a more elaborate follow-up with six samples of children or adolescents that took two intelligence tests out of a group of five tests: the Differential Ability Scales (DAS), DAS-II, Wechsler Intelligence Scale for Children (WISC)-IV, WISC-III, WJ-III, and Kaufman Assessment Battery for Children II. The sample sizes ranged from 83 to 200, and the correlations between these tests’ g factors ranged from r = .89 to r = 1.00 and averaged r = .95. Again, this shows that the g factors produced by different tests are largely identical. Additionally, Floyd et al. (2013) found that the similar Stratum II factors that each test produced were largely the same (e.g., the processing speed factor on one test was highly correlated with another test’s processing speed factor). This means that Stratum II abilities in the Cattell–Horn–Carroll model can also have a high degree of similarity across tests.
A team headed by psychologist Wendy Johnson found similar results with even larger samples. In a group of 436 adults who took three test batteries (the Comprehensive Ability Battery, the Hawaii Battery supplemented with some additional tests, and the Wechsler Adult Intelligence Scale), the different g factors from these test batteries all correlated r = .99 or r = 1.00 (W. Johnson, Bouchard, Krueger, McGue, & Gottesman, 2004). The researchers summed up their findings by saying that across these three tests there was “just one g” (p. 95). Johnson and her colleagues followed up this work with another study of 500 Dutch seamen. With four different tests (a test battery for the Royal Dutch Navy, a battery of 12 subtests from the Twente Institute of Business Psychology, the General Aptitude Test Battery, and the Groninger Intelligence Test), the correlations of their g factors were all between r = .95 and r = 1.00 (W. Johnson, te Nijenhuis, & Bouchard, 2008, p. 88).3
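The design of these studies is easy to mimic in simulation. The sketch below (a toy setup of mine, not the authors’ actual analysis) gives the same simulated examinees two batteries with no subtests in common, extracts a rough g score from each battery separately, and correlates the two:

```python
# A toy version of the study design quoted above, assuming a one-factor
# model. All loadings here are invented for illustration.
import numpy as np

rng = np.random.default_rng(2)
n = 5000
g = rng.normal(size=n)                              # latent general ability

def simulate_battery(loadings):
    """Subtest scores that all load on the same latent g, plus unique noise."""
    noise = rng.normal(size=(n, len(loadings)))
    return g[:, None] * loadings + noise * np.sqrt(1 - loadings**2)

def g_scores(scores):
    """Crude g estimate: scores weighted by the first principal factor."""
    R = np.corrcoef(scores.T)
    _, eigvecs = np.linalg.eigh(R)                  # eigh: ascending order
    v = eigvecs[:, -1]
    v = v if v.sum() > 0 else -v                    # fix sign indeterminacy
    z = (scores - scores.mean(0)) / scores.std(0)
    return z @ v

battery_a = simulate_battery(rng.uniform(0.70, 0.85, size=10))
battery_b = simulate_battery(rng.uniform(0.70, 0.85, size=10))

# Two batteries with entirely different subtests still yield g estimates
# that correlate around r = .9 or higher. The published studies correlate
# the latent factors themselves, which strips out the estimation noise in
# these crude scores and pushes the values toward 1.0.
print(np.corrcoef(g_scores(battery_a), g_scores(battery_b))[0, 1].round(3))
```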
So, we can see that intelligence tests measure some underlying unitary ability, rather than test takers’ performance on an arbitrary collection of tasks. But what use is this information? How do we know that it is in fact measuring anything like what we mean when we say “intelligence”?
People who are intelligent are good at mathematics, science, writing, reading, and philosophical reasoning. Intelligent people are often better educated and more academically successful. I think few would object to this outside the context of discussing IQ, but when debates turn to the value of IQ or true intelligence, people begin to raise objections about what “intelligence” really means.
I think we can sidestep this by saying that “IQ tests” (which should probably be called tests of cognitive ability) measure cognitive ability. Cognitive ability correlates with much of what people mean when they talk about intelligence. This is a fine way to frame the issue. We wouldn’t expect intelligence, as the word is used in everyday discussion, to correlate perfectly with scores on tests of cognitive ability, but we do know the two correlate highly, so the measure is worth discussing. Cognitive ability matters.
Gottfredson, L. S. (2009). Logical fallacies used to dismiss the evidence on intelligence testing. In R. P. Phelps (Ed.), Correcting fallacies about educational and psychological testing (pp. 11–65). Washington, DC: American Psychological Association.
Warne, R. T. (2020). In the Know: Debunking 35 Myths about Human Intelligence (pp. 33–34). Cambridge: Cambridge University Press. doi:10.1017/9781108593298
A fifth test, the Cattell Culture Fair Test, had a g factor that had a much weaker correlation with the other g factors: r = .77 to r = .96. This is almost certainly because this test consists of four extremely similar tasks, instead of, as on the other tests, a diverse set of tasks. A narrow variety of tasks on a test means that factor analysis cannot fully remove the unique aspects of each task when identifying a g factor. This lowers the correlation of the Cattell Culture Fair Test’s g factor with the g factors derived from other tests. W. Johnson et al.’s (2008) example shows a limitation of factor analysis: without a broad range of tasks on a test, the g factor identified in a test will not represent the entire breadth of intelligence. This is a well-known shortcoming of factor analysis (Jensen, 1998) and why the best intelligence tests include several different types of tasks for examinees to do.