Wednesday, April 25, 2012

The SAT Bell Curve


In The Mismeasure of Man, Stephen Jay Gould wrote an extended criticism of the quantification of general intelligence with factor analysis. I half expected public rebukes of Gaussian normal curves following the publication of The Bell Curve by Richard Herrnstein and Charles Murray. When the IQ distributions of whites and African Americans appear on the same graph in proportion to each group’s population, it can evoke a sense that one bell curve is physically dominating or even raping the other.
What the graph actually illustrates is that about as many dim white people live among us as dim black people because the graphs overlap at the left tail. This point can escape attention when this data, which Herrnstein and Murray borrowed from the National Longitudinal Survey of Youth, appears under the assumption of equal group size.
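The effect of group size on where two curves meet can be sketched numerically. What follows is a minimal illustration, assuming two normal distributions with equal standard deviations. The means, standard deviation, and population sizes are hypothetical placeholders rather than the National Longitudinal Survey of Youth figures, and the function names are invented for this sketch.

```python
import math
from statistics import NormalDist


def count_density(dist, population, score):
    """Approximate number of people per score point at `score`."""
    return population * dist.pdf(score)


def crossing_score(mu_a, n_a, mu_b, n_b, sigma):
    """Score at which two equal-sigma count curves intersect (needs mu_a != mu_b)."""
    return (mu_a + mu_b) / 2 - sigma ** 2 * math.log(n_a / n_b) / (mu_a - mu_b)


# Hypothetical values for illustration only; none come from the survey data.
MU_A, N_A = 100.0, 6_000_000   # larger group
MU_B, N_B = 85.0, 1_000_000    # smaller group
SIGMA = 15.0

group_a = NormalDist(MU_A, SIGMA)
group_b = NormalDist(MU_B, SIGMA)

# With equal group sizes the curves cross midway between the two means.
equal_size_crossing = crossing_score(MU_A, 1, MU_B, 1, SIGMA)
# Scaling each curve by its population moves the crossing toward the left tail.
weighted_crossing = crossing_score(MU_A, N_A, MU_B, N_B, SIGMA)

print(f"equal-size crossing: {equal_size_crossing:.1f}")
print(f"population-weighted crossing: {weighted_crossing:.1f}")
for score in (55, 70, 85, 100):
    a = count_density(group_a, N_A, score)
    b = count_density(group_b, N_B, score)
    print(f"score {score}: group A {a:,.0f} per point, group B {b:,.0f} per point")
```

Under these assumed numbers, the population-weighted curves cross well to the left of the midpoint between the means, which is the feature that equal-sized graphs hide.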
I decided to attempt to replicate these graphs with SAT data. Like it or not, the SAT is a sort of intelligence test, more so than the ACT exam that college applicants in the American heartland so commonly take. I shall quote extensively from a paper by Satoshi Kanazawa because he fairly succinctly summarized the case for the SAT as an intelligence test.
The SAT has a significant advantage as a proxy IQ test over other standardized academic tests, such as the American College Testing (ACT), an alternative university admissions test, or the National Assessment of Educational Progress (NAEP), administered to representative samples of fourth and eighth graders in public schools every year. While the SAT measures the students' critical reasoning ability, both the ACT and the NAEP measure their learned knowledge of academic subjects. This distinction between the SAT and the ACT is well recognized by both testing services…. A principal component analysis of SAT and ACT scores shows that the former load on two factors (verbal and quantitative) while the latter load on four additional factors (information, English, natural sciences, and social studies). Frey and Detterman (2004) show that the correlation between SAT scores and g is .857 (corrected for nonlinearity) when the measure of g is the Armed Services Vocational Aptitude Battery, and it is .72 (corrected for restricted range) when the measure of g is Raven's Advanced Progressive Matrices.

This is not to deny the complicating nuances of the research. After all, a genome-wide association study of intelligence determined that the examined single nucleotide polymorphisms of our DNA influenced fluid intelligence, which was partially derived from Raven’s Matrices, more than crystallized intelligence, which tests of acquired knowledge (like vocabulary) measure. However, the mysterious Flynn effect of rising intelligence in the industrialized world has elevated Raven’s Matrices scores more rapidly than scores on other intelligence tests.
Score distribution graphs for racial groups can be constructed from SAT data, but only for four years in the 1980s. In the case of black and white students, the years in question still likely reflect the present situation because the rapid decline in the black-white score gap occurred just prior to these years, and these score differences have, more or less, persisted since then.
[Three graphs]
Though the verbal and writing subtests might not elicit a Pavlovian reaction to bell curves, this seems to result from the test range chopping the black students’ curves into wedges. If the true IQ distribution of African Americans follows a bell-shaped Gaussian curve, then an artificial minimum SAT score could be misrepresenting the full ability spectrum of black students.

In 1996, SAT score distributions were “recentered” to reflect a new 1990 reference group that replaced the old 1941 reference group. Prior to the recentering, the greater decline of average verbal scores relative to mathematics subtest scores had concerned the College Board. Recentering also lowered the mathematics standard deviation to make black, Hispanic, and female students “appear less below average.” The following graph shows that recentering increased verbal scores even more for the black students in the 1990 reference group, giving them a bell-shaped distribution.
[Graph]
This does not convince me that the same occurred for actual post-recentering black SAT scores because the black-white gap remained virtually unchanged.
Certainly, the SAT verbal and math subtest distribution for the general population shifted higher, as shown below:
[Three graphs]
Notice that the percentages with the highest scores continued to increase even after the recentering, especially on the mathematics subtest.

Shifting all groups higher could hurt the black average verbal and writing SAT scores by revealing a full bell curve and thereby allowing the artificial floor to fall out from under the worst students, unless black performance improved simultaneously, causing the two phenomena to mask each other. However, if African Americans suddenly attained an extended bell-shaped distribution, I would expect an increase in their score variance on the verbal subtest, which would be reflected in an increased standard deviation. On the contrary, black students have long held the lowest standard deviations, and the graph of this quantity has been equally flat for math and verbal subtests.
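How a score floor could both reshape a curve and hold down its standard deviation can be checked with a small simulation. This is only a sketch under stated assumptions: the latent mean and standard deviation are hypothetical, the helper names are invented, and the only fixed fact is the 200 to 800 reporting range of each SAT subtest.

```python
import random
from statistics import pstdev

FLOOR, CEILING = 200, 800          # reporting range of one SAT subtest
LATENT_MEAN, LATENT_SD = 350, 100  # hypothetical latent distribution


def clamp(score, low=FLOOR, high=CEILING):
    """Report a latent score on the bounded scale."""
    return max(low, min(high, score))


rng = random.Random(2012)  # fixed seed so the run is repeatable
latent = [rng.gauss(LATENT_MEAN, LATENT_SD) for _ in range(100_000)]
reported = [clamp(s) for s in latent]

# Fraction of simulated test takers piled up at the minimum score.
share_at_floor = sum(1 for s in reported if s == FLOOR) / len(reported)
latent_sd = pstdev(latent)
reported_sd = pstdev(reported)

print(f"share reported at the floor: {share_at_floor:.3f}")
print(f"latent SD: {latent_sd:.1f}")
print(f"reported SD: {reported_sd:.1f}")
```

In this sketch the reported standard deviation comes out below the latent one, so a group escaping the floor's influence should show a rise in its reported standard deviation.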
The following graphs show the black and white score distributions without the population sizes being held equal. At the time, African Americans were the largest minority, and similar graphs for Hispanics and Asians make the respective groups’ curves almost imperceptible puddles, so I shall forgo posting them.
[Three graphs]
As the standard deviations graph above already revealed, Asians constitute the most heterogeneous group, and I find their distribution to be the most fascinating.
[Three graphs]
The most obvious characteristic of the verbal and writing graphs is the bimodal distribution, which one would expect in a group for whom English is frequently a second language. This matches the writing subtest distribution for Hispanics below, but the Asian verbal subtest graph has one other aspect lacking in the Hispanic counterpart. Despite the large number of poor performers on the left side, the most elite performers of the Asian graph appear to be present in roughly equal proportion to those of the white graph. In fact, a slightly higher proportion of Asians achieved the highest two verbal score ranges compared to the white group for each of the four years, and these were years prior to most of the Asian score improvement that I previously discussed.

On the mathematics subtest graph, the Asian distribution extends noticeably more into the higher ranges than the white distribution. Thus, a much greater proportion of Asians achieve the highest range of math performance, a point that I shall also extend to men.
[Three graphs]
Larry Summers resigned from his position as president of Harvard in 2006, after a 2005 faculty no-confidence vote that followed a speech in which he said the following:
There are three broad hypotheses … with respect to the presence of women in high-end scientific professions…. The second is what I would call different availability of aptitude at the high end…. It does appear that on many, many different human attributes—height, weight, propensity for criminality, overall IQ, mathematical ability, scientific ability—there is relatively clear evidence that whatever the difference in means—which can be debated—there is a difference in the standard deviation, and variability of a male and a female population…. Even small differences in the standard deviation will translate into very large differences in the available pool substantially out.
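The arithmetic behind that last sentence can be sketched with two normal distributions. The parameters here are hypothetical (equal means and a 10 percent difference in standard deviation), chosen only to show the mechanism rather than to describe any real test scores, and the function names are invented for the sketch.

```python
from statistics import NormalDist

# Hypothetical distributions in standard-deviation units of the narrower group.
narrow = NormalDist(mu=0.0, sigma=1.0)
wide = NormalDist(mu=0.0, sigma=1.1)


def share_above(dist, cutoff):
    """Fraction of the distribution lying above `cutoff`."""
    return 1.0 - dist.cdf(cutoff)


def tail_ratio(cutoff):
    """How many times larger the wide group's share above `cutoff` is."""
    return share_above(wide, cutoff) / share_above(narrow, cutoff)


# Cutoffs placed 2, 3, and 4 standard deviations above the shared mean.
ratios = {cutoff: tail_ratio(cutoff) for cutoff in (2.0, 3.0, 4.0)}
for cutoff, ratio in ratios.items():
    print(f"cutoff {cutoff:.0f} SD above the mean: ratio {ratio:.2f}")
```

The ratio grows as the cutoff moves outward, which is what "substantially out" refers to.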

The standard deviations graph above validates Summers’ observation about differing aptitude variability between the sexes, and this is especially the case on the mathematics subtest. The following graphs illustrate just how much the standard deviation difference in math (plus a difference in mean) translates into substantially more male students in the highest aptitude levels.
[Three graphs]
Dr. Summers, on behalf of Harvard University, I would like to offer you your job back.



Davies G, Tenesa A, Payton A, Yang J, Harris SE, Liewald D, Ke X, Le Hellard S, Christoforou A, Luciano M, McGhee K, Lopez L, Gow AJ, Corley J, Redmond P, Fox HC, Haggarty P, Whalley LJ, McNeill G, Goddard ME, Espeseth T, Lundervold AJ, Reinvang I, Pickles A, Steen VM, Ollier W, Porteous DJ, Horan M, Starr JM, Pendleton N, Visscher PM, & Deary IJ (2011). Genome-wide association studies establish that human intelligence is highly heritable and polygenic. Molecular Psychiatry, 16 (10), 996-1005. PMID: 21826061

Hiscock, M. (2007). The Flynn effect and its relevance to neuropsychology. Journal of Clinical and Experimental Neuropsychology, 29 (5), 514-529. DOI: 10.1080/13803390600813841

Kanazawa, S. (2006). IQ and the wealth of states. Intelligence, 34 (6), 593-600. DOI: 10.1016/j.intell.2006.04.003

13 comments:

Anonymous said...

What do you know about mutational loads and whether they vary across races?

nooffensebut said...

I don't know about genetic load (deleterious mutations), but recent allele fixations differ between Africans, Europeans, and Asians, per Coop et al. I would expect the putting off of childbirth until later in life to have a significant influence on population differences in genetic load.

J said...

Differences in genetic load is the most plausible explanation of most differences in general intelligence, i.e. people with fewer deleterious mutations have higher intelligence. The genetic load theory is compatible with everything we know about g (e.g. heritability, correlations with health). Evolutionarily, it makes much more sense than the idea that intelligence is selected for, in which case everybody would be smart.

Anonymous said...

Can you point to any good references for genetic load in relation to intelligence, behavior, etc.?

Steve Sailer said...

Cochran is working on the genetic load idea.

Steve Sailer said...

Wow, those are creepy graphs. The undulating Blob ...

Steve Sailer said...

Something you might consider doing is calculating just how small the number of ultra-elite black scorers is. How many 1500+ blacks are left after Harvard, Yale, Princeton, and Stanford get their bites at the apple?

Steve Sailer said...

Another myth that won't die is that we could just switch affirmative action from race to class. But, as far as I can tell, virtually all the 1300+ black students come from the middle class or above, or mixed race homes, or foreign elites.

brainsidea said...

Interesting data set.

May I suggest labeling the axes?

Also, concerning the male-female difference, you write that 'The standard deviations graph above validates Summers’ observation'. Summers' observation was that 'there is a difference in the standard deviation, and variability of a male and a female population'.

1) What do you mean by a standard deviations graph? I cannot find a single value (or graphical representation) for the standard deviations of men and women in this post.

2) Looking at the male and female plots makes me doubt whether male and female standard deviations will differ. The main difference seems to be one of means. You could check that with a t-test (comparing means) and a standard deviation test (comparing standard deviations).

3) What is the directionality? Are women worse on these tests, or are the tests less geared towards women? By agreeing with Summers, you suggest it is the former. Why?

nooffensebut said...

The graphs labeled "verbal/critical reading standard deviations" and "math standard deviations" graph standard deviations for men, women, and racial/ethnic groups. Men always had higher variance/standard deviations than women on the math subtest. This is reflected in a shorter and wider math bell curve for men. Summers was correct because far fewer women than men scored at the highest levels of the math subtest. I have never heard a case made that the math subtest has a gender bias, but a 2010 study determined that it does not have a cultural bias against black students, but the verbal subtest does, even though the black-white gap is larger on the math subtest.

Anonymous said...

I really appreciate what you post. You have a new subscriber now.

Anonymous said...

J said:

Differences in genetic load is the most plausible explanation of most differences in general intelligence, i.e. people with fewer deleterious mutations have higher intelligence. The genetic load theory is compatible with everything we know about g (e.g. heritability, correlations with health). Evolutionarily, it makes much more sense than the idea that intelligence is selected for, in which case everybody would be smart.


There is a healthy dose of nonsense there.

The brain is a very expensive organ, developmentally and operationally. There will be an optimum brain size that balances the cost of increasing brain size (and in any social species there will always be selection for increase in brain size to allow for exploitation of other members of your species as well as a conversion of some behaviors from learned to innate) against the benefits of that increased brain size in terms of reproductive success. In addition, we can expect the size of the selection effect to be correlated with the average group size. Thus selection for intelligence goes up as different groups start engaging in large-scale civilizations, like the Indians, the Chinese and the Caucasians.

Also, have you thought about how the argument about genetic load applies to other species? Why do those with lower genetic load not have higher intelligence than humans?

random mutation said...

J said:

Differences in genetic load is the most plausible explanation of most differences in general intelligence, i.e. people with fewer deleterious mutations have higher intelligence. The genetic load theory is compatible with everything we know about g (e.g. heritability, correlations with health). Evolutionarily, it makes much more sense than the idea that intelligence is selected for, in which case everybody would be smart.


If you are going to appeal to genetic load to explain differences in intelligence, then I think you also have to appeal to genetic load to explain:

1. Differences in running ability (long distance and sprinting)

2. Differences in propensity for violence

3. Differences in civilization building

Now, the problems with those things are that different groups are good at different things, so now you are reduced to claiming that Africans (and African Americans) have higher genetic load in their intelligence genes, whites and East Asians have higher genetic load in their running/sporting genes, and so on, which seems rather ad-hoc and silly.