Monday, 17 October 2016

Differences in sex differences: US trends and India


Sex differences fascinate, but would be easier to understand if only they would stand still for a moment! Reported sex differences vary in magnitude, 3 to 1, or 4 to 1, or 7 to 1.

As usual, it depends on the representativeness of samples, the abilities being measured, and also how far out on the right hand side of the bell curve you go when you measure the man/woman ratio of high achievers. In the early 1980s on the SAT-Math the sex ratio was approximately 2 to 1 for scores ≥500 (top 0.5%) and roughly 13 to 1 for students scoring ≥700 (top 0.01%).
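The dependence of the ratio on the cut-off can be illustrated with two overlapping bell curves. This is a minimal sketch and not a model of the SAT data: the means and standard deviations below are made-up values, chosen only to show how a modest difference between two distributions produces a ratio that grows as the cut-off moves right.

```python
from statistics import NormalDist

# Hypothetical distributions in standard-deviation units (assumptions for
# illustration only; they are not estimates from any test).
group_a = NormalDist(mu=0.1, sigma=1.1)
group_b = NormalDist(mu=0.0, sigma=1.0)


def tail_ratio(cutoff: float) -> float:
    """Ratio of the share of group_a above `cutoff` to the share of group_b."""
    share_a = 1.0 - group_a.cdf(cutoff)
    share_b = 1.0 - group_b.cdf(cutoff)
    return share_a / share_b


for cutoff in (1.0, 2.0, 3.0, 4.0):
    print(f"cut-off {cutoff:.0f} SD: ratio {tail_ratio(cutoff):.2f} to 1")
```

With these assumed parameters the ratio rises at every step, which is why a single "sex ratio" means little unless the cut-off is stated alongside it.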

As Gigerenzer keeps pointing out, most people misunderstand the combination of decimal points and percentage signs. 0.5% means 1 in 200, or 5 in a thousand, or 50 in ten thousand.  0.01% is even more tricky: it is not a fifth but a fiftieth of 0.5%. It is 1 in 10,000. In sum, males are 2 to 1 at a level of ability reached by 50 out of 10,000 students, but at the very high levels achieved by 1 out of 10,000 students the male advantage is 13 to 1. At least, that was the picture in the 1980s. What are things like now?
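For anyone who wants to check the conversions above, here is a small sketch using exact fractions, so that no rounding creeps in. The helper name `one_in` is my own invention for the purpose.

```python
from fractions import Fraction


def one_in(percent: str) -> Fraction:
    """Convert a percentage written as text (e.g. "0.5") to the N in '1 in N'."""
    return 1 / (Fraction(percent) / 100)


half_percent = one_in("0.5")
hundredth_percent = one_in("0.01")

print(f"0.5%  = 1 in {half_percent}")
print(f"0.01% = 1 in {hundredth_percent}")
print(f"0.01% is 1/{hundredth_percent / half_percent} of 0.5%")
```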

Before that, here is some background:

Maths is a man thing. September 2013

Advice to men caught unawares. November 2014

The historical record is clear: eminent men predominate by at least 7 to 1 or, in Charles Murray’s “Human Accomplishment”, 30 to 0 for the very top thinkers, people like Aristotle, Darwin, Galileo, Newton, Einstein (page 143). Women have the perfect alibi of motherhood, and as Larkin noted, sexual liberation did not come till after the Beatles’ first LP.

Sexual intercourse began

In nineteen sixty-three

(which was rather late for me) -

Between the end of the "Chatterley" ban

And the Beatles' first LP.

In fact, some liberation began in 1870 with the Education Act, more in 1914 with the First World War, more after the contraceptive pill in the early 1960s and more and more thereafter. Although that is true about our own age, perhaps this story is wrong, a mere blip of epoch-centric bias, and denies the rights and the impact women had made centuries before. Thirty wills survive today from the late Anglo-Saxon period and ten of those are the wills of women, each of whom was a significant property owner, with the same rights of ownership and bequeathal as any man. Women were highly significant figures in Saxon history, and were admired for their power and nepotism, even if it involved the occasional murder. Interestingly, royal succession was not by primogeniture, but by classifying royal progeny as aethelings (throne-worthy) and from this gene pool the royal family would select the one who seemed best qualified for the job. Meritocracy within aristocracy. So, when pressure groups today want to force employers to appoint women to high offices, they should recall that, as a rule of thumb, in the year 1000 it was already the case that about a third of the richest Saxons were women.

However, given the clamour for equality in modern times, surely the speed of women’s advance should be quickening?

The sex ratio in accomplishment depends on the skills being measured (harder subjects increase sex differences) and how accomplished you have to be to be judged accomplished (harder standards increase sex differences). So, if we go for Fields Medallists, the score is 55 to 1. Coming down slightly from those sorts of levels, how are young American men and women doing in Maths?

Matthew C. Makel, Jonathan Wai, Kristen Peairs, Martha Putallaz. Sex differences in the right tail of cognitive abilities: An update and cross cultural extension. Intelligence, Volume 59, November–December 2016, Pages 8–15.

In the Abstract they say: Male–female ability differences in the right tail (at or above the 95th percentile) have been widely discussed for their potential role in achievement and occupational differences in adults. The present study provides updated male–female ability ratios from 320,000 7th grade students in the United States in the right tail (top 5%) through the extreme right tail (top 0.01%) from 2011 to 2015 using measures of math, verbal, and science reasoning. Additionally, the present study establishes male-female ability ratios in a sample of over 7000 7th grade students in the right tail from 2011 to 2015 in India. Results indicate that ratios in the extreme right tail of math ability in the U.S. have shrunk in the last 20 years (still favoring males) and remained relatively stable in the verbal domain (still favoring females). Similar patterns of male-female ratios in the extreme right tail were found in the Indian sample.

They plot out the main US results in a dramatic graph.

SAT 700 sex ratios


The detailed results are in Table 1, so see what you think:

SAT maths table sex ratio

Looks like sometime between 1996 and 2000 a new score category of 800 was added. Why? 700+ was good enough before. That category shows the biggest male advantage compared to the 700+ column. Looks like either a) students got brighter or b) the test got easier.

However, the EXPLORE-Math score did not show a decline. Whether earlier changes on some tests and the on-going stability in other tests can be explained by potential ceiling effects in the measure in this sample (see Wai et al., 2012) or other reasons (e.g., lack of time for any intervention, the effects of test makers purposefully “juking” tests to reduce demographic differences as suggested by Loewen et al., 1988) is currently unknown.

Well, this leaves a lot unknown. The drop in the sex ratio between 1980 and 1990 is enormous. Something must have happened. Crack teams must have fanned out across America, treating Maths anxiety among girls, and giving them special tuition. There must have been summer schools for the brightest girls. I have never seen such a speedy change in a scholastic indicator, and that includes the rise in language ability of first generation immigrants. It is not clear to me whether the authors believe in the change or not, which is a pity, because this is apparently one of the best findings showing that a cultural intervention can overcome an apparently deep-seated biological difference between the sexes. To give the authors their due, they mention that the tests may have been tampered with, so as to reduce sex differences, but they are the ones closest to the data, so I am sure they could tell us a little more. For example, given that this particular period is so extraordinary, why not plot out the results for each year? Big oscillations in the sex ratio during those years would be suggestive of cultural changes coming in, and taking time to spread through all schools. A sharp fall in a single year would suggest that the test had been revised in a major way. Which is it? What did the test makers say about sex differences over the years? Did they ever mention working on items to make sure they were not sex biased?

At the moment all I can think of is that US Maths tests prior to 1991 had the following statement in the instructions: ALL THE QUESTIONS IN THIS EXAM RELATE TO SPARK PLUGS.

Despite all this, as late as 2010 boys outshone girls at 7 to 1 (actually 6.58 to 1, but I have rounded up for effect). On the ASSET test top score of 35 the ratio is 8 to 1.

It is a minor gripe, but having got some great data from India, the authors made it difficult to find in their table. Please label the Indian results India. Saves time.

Have we yet another result which shows a biologically based male/female difference, which is also subject to strong cultural forces? I cannot be sure. I don’t know enough about the test content, and what questions may have been dropped because of presumed sex bias. I don’t know if the tests have become easier overall, but suspect it, since during recent years GCSEs in the UK became much easier in terms of the overall pass rate, and are now becoming slightly harder again. Test constructors are under pressure to make sure that their tests are fair, and the concept of fair militates against finding sex differences, as well as the more familiar race differences.

Despite my uncertainties, this is a good paper, on a very sizeable population of test takers in the US and in India. In my view the authors have not mined the Indian material very much. Surely in these disparate US and Indian tests there must be some very similar test items which would allow a proper comparison between US and India. The authors do some comparisons which assume US intelligence is identical to India, which null hypothesis I think can be discarded. Time for them to team up with Richard Lynn and see if they can do more work on the sex ratios in different Indian provinces, which are extremely heterogeneous in terms of general ability. Not sure what my prediction about sex ratios would be: the brighter the province the higher the sex ratios?

Overall, an intriguing finding, strongly suggesting a change in the sex ratio for Maths, but with relevant points still unanswered. Some specific item analyses could be highly informative.

I have already hinted that I know of work which links intelligence to measured brain volumes of men and women, finding brain size to be a good predictor of sex differences, but that paper is only just now going before reviewers, so whereof one cannot speak one must remain silent.

Keep tuning in to Psychological Comments.

Thursday, 13 October 2016

Has Europe been enriched by contemporary immigration?


In a wish to show I am capable of building up dramatic tension, here is one slide from a talk by Prof Heiner Rindermann which shows the correlations between cognitive ability, institutions and the wealth of nations, arranged in a Structural Equation Model. The loadings have been removed just to make the picture clearer, but the fuller version can be found in the conference slides.  This is the trailer, to be gazed at while eating popcorn and waiting for the main feature.

Cognitive ability and wealth of nations


The main feature is in two formats, so split screen would probably be the best way to see things.

First, here is the link to the whole conference slide show “Has Europe been enriched by contemporary immigration?”

(Check out the SEM on page 12, to see whether you prefer the simplified version or the original)

Here is the link to the talk itself:

Wednesday, 12 October 2016

More markers, more differentiation, and people know what race they are anyway


Cultural lag is the polite term for habits and hypotheses that never die. They become immune to refutation by virtue of constant repetition.  One such meme, due to Lewontin (1972), asserts that there is more genetic variation within genetic groups than between them, and therefore that…… er, ….there is no difference between the groups/there is no genetic difference between genetic groups/any differences between groups cannot be due to genetic reasons/asserting that genetic group differences are discriminable by genetics would be arbitrary and wrong/genetic groups do not exist.

I had never been convinced by these arguments, on the simple basis that genetic groups are clearly visible, and sustain themselves by genetic means, and are usually halved by admixture. Also, it was only a vague thought, but it seemed to me that a t test could still be significant with relatively small mean differences if the sample size was high enough. Probably not relevant in genetics, I mused.

In fact, the ease with which you can separate two genetic groups depends, like all discriminations and all clustering, on the number of markers available for the discrimination and clustering techniques being used. With only a few markers, discrimination is difficult, and error prone. As you increase the number, allocation to different groups becomes progressively easier.
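The point about marker numbers can be shown with a toy simulation. This is a sketch under loudly stated assumptions, not the method of any paper discussed here: two made-up groups differ by a small, fixed amount in allele frequency at every marker (`freq_gap`, a hypothetical value), and each simulated individual is assigned to a group simply by counting alleles.

```python
import random


def simulate_accuracy(n_markers: int, n_per_group: int = 500,
                      freq_gap: float = 0.05, seed: int = 0) -> float:
    """Share of simulated individuals assigned to their own group.

    Each marker has allele frequency 0.5 + freq_gap in group A and
    0.5 - freq_gap in group B (hypothetical values). Individuals carry two
    alleles per marker and are classified by their total allele count.
    """
    rng = random.Random(seed)
    threshold = n_markers  # midpoint of the two expected allele counts
    correct = 0
    for group, freq in (("A", 0.5 + freq_gap), ("B", 0.5 - freq_gap)):
        for _ in range(n_per_group):
            count = sum(rng.random() < freq for _ in range(2 * n_markers))
            guess = "A" if count > threshold else "B"
            correct += guess == group
    return correct / (2 * n_per_group)


for n in (3, 30, 300):
    print(f"{n:>3} markers: {simulate_accuracy(n):.1%} correctly assigned")
```

At any single marker the two groups overlap almost completely, yet the assignment becomes steadily more reliable as markers are added, because small independent differences accumulate.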

So, to counter the endless echo of the original hypothesis, I am trying to put together a list of papers which explain and test the issue.

Tim Bates explains that Lewontin based his claims on blood type markers: about as advanced as it was possible to be in 1972, but hopeless to identify genetic clustering, therefore doomed to render a false negative.  The 2005 paper by Neil Risch (now cited 400 times) shows how inadequate that procedure was by showing one can now predict race near perfectly with random sets of SNPs.

Hua Tang, Tom Quertermous, Beatriz Rodriguez, Sharon L. R. Kardia, Xiaofeng Zhu, Andrew Brown, James S. Pankow, Michael A. Province, Steven C. Hunt, Eric Boerwinkle, Nicholas J. Schork, and Neil J. Risch. (2005) Genetic Structure, Self-Identified Race/Ethnicity, and Confounding in Case-Control Association Studies.  Am J Hum Genet. 2005 Feb; 76(2): 268–275.

The authors say in their abstract:

We have analyzed genetic data for 326 microsatellite markers that were typed uniformly in a large multi-ethnic population-based sample of individuals as part of a study of the genetics of hypertension (Family Blood Pressure Program). Subjects identified themselves as belonging to one of four major racial/ethnic groups (white, African American, East Asian, and Hispanic) and were recruited from 15 different geographic locales within the United States and Taiwan. Genetic cluster analysis of the microsatellite markers produced four major clusters, which showed near-perfect correspondence with the four self-reported race/ethnicity categories. Of 3,636 subjects of varying race/ethnicity, only 5 (0.14%) showed genetic cluster membership different from their self-identified race/ethnicity. On the other hand, we detected only modest genetic differentiation between different current geographic locales within each race/ethnicity group. Thus, ancient geographic ancestry, which is highly correlated with self-identified race/ethnicity—as opposed to current residence—is the major determinant of genetic structure in the U.S. population. Implications of this genetic structure for case-control association studies are discussed.


In their discussion they say:

Attention has recently focused on genetic structure in the human population. Some have argued that the amount of genetic variation within populations dwarfs the variation between populations, suggesting that discrete genetic categories are not useful (Lewontin 1972; Cooper et al. 2003; Haga and Venter 2003). On the other hand, several studies have shown that individuals tend to cluster genetically with others of the same ancestral geographic origins (Mountain and Cavalli-Sforza 1997; Stephens et al. 2001; Bamshad et al. 2003). Prior studies have generally been performed on a relatively small number of individuals and/or markers. A recent study (Rosenberg et al. 2002) examined 377 autosomal microsatellite markers in 1,056 individuals from a global sample of 52 populations and found significant evidence of genetic clustering, largely along geographic (continental) lines. Consistent with prior studies, the major genetic clusters consisted of Europeans/West Asians (whites), sub-Saharan Africans, East Asians, Pacific Islanders, and Native Americans. It is clear that the ability to define distinct genetic clusters depends on the number and type of markers used (Risch et al. 2002). Reports that document inability to define distinct clusters generally used only a modest number of markers and, hence, had little power to detect clusters (Romualdi et al. 2002). Studies with larger numbers of markers appear to show strong evidence of clustering (Stephens et al. 2001; Rosenberg et al. 2002).

Another major point of discussion has been the correspondence between genetic clusters and commonly used racial/ethnic labels. Some have argued for poor correspondence between these two entities (Lewontin 1972; Wilson et al. 2001), whereas others have suggested a strong correlation (Risch et al. 2002; Burchard et al. 2003). We have shown a nearly perfect correspondence between genetic cluster and SIRE for major ethnic groups living in the United States, with a discrepancy rate of only 0.14%.

In sum, you get a near perfect correspondence between genetic measures and the common racial labels, with a misclassification rate of a mere 14 per 10,000. Some of this is due to the admixed “other” category, and perhaps some existential confusion in the others, but 9,986 in 10,000 subjects can master the art of looking in a mirror and noting which race they most resemble, a task beyond the wit of some academics.

Tuesday, 11 October 2016

Scientist stabbed to death by mentally ill illegal immigrant


Murderer and victim

That is the striking headline in The Telegraph, with all the makings of a modern horror story.

The Daily Mail likewise:

The Guardian is more circumspect, but equally informative

Only a few weeks ago I was discussing the detection of violence in schizophrenic patients, and questioning the basis for saying that it was virtually impossible to do anything about it.

Although the poor widow of the murdered man has made her witness statement, it will have zero impact on sentencing, and it is unlikely to have any influence on the current academic wisdom, which is that it is impossible to prevent such murders without restricting large numbers of patients, say 35,000 of them.

As you will see in the above post, I have my doubts about this claim. The facts of this horrible murder are not in dispute: the man was known to be psychotic, to have stopped medication, to be carrying knives, and to have threatened a policeman. He was also a heavy cannabis user. What more does a person have to do to be rated as a risk to others?

In my post you will see that I had some difficulty understanding the “stranger murder” calculations, but all is much clearer if, instead of preventing a stranger murder, we try to prevent an assault. This is worth doing, because to be assaulted is a profoundly distressing event, and if injuries are caused, also a potentially life changing one.

Taking the very paper which provides the “35,000” figure for stranger murder, the figures for assault are shown below, and put things into a more manageable context. The annual rates for assault and violent crime are extraordinarily high, almost unbelievably so. Given the very high base rate, screening and monitoring are worthwhile.

Positive predictive value in schizophrenia


As the event becomes more rare, the positive predictive value of the risk-categorization becomes lower, and the error rate higher, with progressively more people needing to be monitored to prevent one rare event. However, to prevent an assault would require that 3 schizophrenic patients be monitored, calling them in to check they are taking their medication, and presumably (hardest part) searching for them if they failed to show up. Easier would be to link up with the Police, so that if a patient is brought in for violent behaviour of any sort there can be coordinated management of the offender. Devoutly to be wished, often denied, but in the manageable range given the will and the resources. It would provide a good service for the patients, reducing suicide attempts, improving the quality of their lives, and reducing threats to others. It would certainly be worth testing it out in a London Borough, and checking that the above figures, derived from the best sources, hold up on further examination.
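The link between rarity and positive predictive value is plain arithmetic, and can be sketched as follows. The sensitivity, specificity and base rates here are hypothetical numbers picked for illustration; they are not taken from the paper, and the last column counts people flagged per true case, which is not the same thing as events actually prevented.

```python
def positive_predictive_value(base_rate: float, sensitivity: float,
                              specificity: float) -> float:
    """Share of people flagged as high risk who go on to have the event."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


# Hypothetical screening tool: these accuracy figures are assumptions.
SENSITIVITY = 0.7
SPECIFICITY = 0.7

for base_rate in (0.1, 0.01, 0.0001):
    ppv = positive_predictive_value(base_rate, SENSITIVITY, SPECIFICITY)
    print(f"base rate {base_rate:.2%}: PPV {ppv:.2%}, "
          f"about {1 / ppv:.0f} flagged per true case")
```

The same imperfect tool looks respectable when the event is common and hopeless when it is very rare, which is the whole difference between screening for assault and screening for stranger murder.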

None of the media coverage goes into the question which arises out of normal curiosity: is psychotic behaviour more common among Africans in the UK? The picture above shows murderer and victim, and is an all too common pairing. The answer to the African question is: 6 to 9 times higher.

Morgan et al. (2006) First episode psychosis and ethnicity: initial findings from the AESOP study. World Psychiatry. 2006 Feb; 5(1): 40–46.

We found the incidence of all psychoses to be significantly higher in African-Caribbean and Black African populations across all three centres compared with the baseline White British population [African-Caribbeans: IRR 6.7 (5.4-8.3); Black Africans: IRR 4.1 (3.2-5.3)]. These differences were most marked for narrowly defined schizophrenia (F20) and manic psychosis (F30-31). For example, after adjusting for age, the incidence of schizophrenia across the three study centres was nine times higher in the African-Caribbean population [IRR 9.1 (6.6-12.6)] and six times higher in the Black African population [IRR 5.8 (3.9-8.4)]. The incidence rates for schizophrenia in the African-Caribbean and Black African populations (71 per 100,000 person years, and 40 per 100,000 person years, respectively) are among the highest ever reported. A strikingly similar pattern was evident for manic psychosis (F30-31). After adjusting for age, the incidence of manic psychosis was eight times higher for African-Caribbeans [IRR 8.0 (4.3-14.8)] and six times higher for Black Africans [IRR 6.2 (3.1-12.1)] compared with the White British baseline group. The rates of depressive psychosis were also raised, but more modestly [African-Caribbeans: IRR 3.1 (1.5-3.6); Black Africans: IRR 2.1 (0.9-5.0)]. Intriguingly, the incidence rates for all psychoses were also raised for all other ethnic groups (other White, Asian, mixed, other) compared with the White British populations, albeit much more modestly (IRRs for all psychoses ranged from 1.5 to 2.7).

If screening and/or monitoring were done on a rational basis, those of African descent would be given particular attention, because detection is easiest where the baseline rate is high. There is 10 year follow-up work, showing generally poor prognosis, but with some achievements.

So, screen Africans who are psychotic or manic, particularly those on cannabis or other drugs, unemployed, not compliant with treatment, and showing any threatening behaviours, and get them treated as quickly as possible.

As a historical note, and at the risk of confusing things by raising an idea since disproved, or at least called into question, it was argued in the 90s that the incidence of schizophrenia was the same the world over, but that has since been shown not to be the case, or at least subject to exceptions. Looking at the references in the above paper by Morgan et al. 2006 I don’t consider it a real refutation, but it would be good to repeat the WHO study again more extensively. However, in a case of cultural lag, since I knew the team at WHO who did the work, it lurks in me as a given and true fact, whatever the current concerns about it.

These studies in countries of origin are important. If the rates of serious mental illness are low, then a case can be made for the stress of migration, and stresses of living in Western society (a common interpretation) as being the causes of the disturbance. It still needs to be explained why other migrants are far less prone than Africans. That aside, I think we need better studies in the countries of origin before being sure about causation.





Sunday, 9 October 2016

Artificial general intelligence: A Von Neumann machine


Alpha Go team


Intelligence is the ability to perform well across a wide range of tasks.

Intuition is inexpressible implicit knowledge.

Creativity is synthesizing knowledge to produce novel ideas.

One day my daughter came back from school, very excited. Nothing particular in that: she enjoyed education. But this time it was more than a class discussion, a maths competition won, or the delights of Java programming. She had listened to a talk by an outside speaker and was inspired. So, the speaker became someone we lived with, in the ethereal but instructive sense of hearing her discuss the ideas he had engendered. She managed to get a week with him and his game company as part of work experience later in her education, and we all followed his illustrious career with a sense of identification. Moral for researchers: give at least one talk at a school.

Yesterday, thanks to a recommendation from Dominic Cummings, I listened to the same guy and have come away inspired, despite the contact being through a YouTube recording of an MIT lecture, and not face to face in a small classroom.

In the taped lecture below he discusses how his general intelligence system beat the world champion Go player. That is astounding in itself, but to me the most interesting aspect of his talk is his enthralling enquiry into the nature of thinking and problem solving. Has he found a technique with very powerful and wide application that will change the way we solve difficult problems?

His company employs 200 researchers, and attempts to fuse Silicon Valley with academia: the blue sky thinking of the ivory tower with the focus and energy of a start-up. With commendable enthusiasm and naïve impudence (doesn’t he know that many clever academics find these issues complicated, have studied them, and left them even more complicated?) he frames the problem thus:

Step 1 fundamentally solve intelligence.

Step 2 use it to solve everything else.

Who does he think he is? OK, a master chess player at 13, flourishing game company boss who developed Theme Park and Republic, double First in Computing at Cambridge, then PhD in cognitive neuroscience at UCL, lots of excellent publications, and all this without listening to wise advice that he was setting his sights too high.

He says: Artificial Intelligence is the most powerful technology we will ever invent.

What follows is my considerable simplification of his talk; the aphorisms at the very start are likewise my compressed renditions of his remarks and working principles.

More prosaically, the technology he has developed is based on general purpose learning algorithms which can learn automatically for themselves from raw inputs, and are not pre-programmed; and can operate across a broad range of tasks. Operationally, intelligence is the ability to perform well across a broad range of tasks. This artificial general intelligence is flexible, adaptive and inventive. It is built from the ground up to deal with the unexpected: things it has never seen before. Old style artificial intelligence was narrow: hand-crafted, specialist, single purpose, brittle. Deep Blue beat Kasparov, but could not play simpler games like tick-tack-toe.

Artificial general intelligence is based on a reinforcement learning framework, in which an agent operates in an environment and tries to achieve a goal: it can observe reality and obtain rewards. With only noisy, incomplete observations it must build a statistical model of the environment, and then decide what actions to take from the options available at any particular moment to achieve its goal. A machine that can really think has to be grounded in a rich sensorimotor reality. There should be no cheating, no getting to see the internal game code. (Cheating leaves the system superficial and dull). The thinking machine interacts with the world through its perception. Games are a good platform for developing and testing AI algorithms. There is unlimited training data, no testing bias (one side wins, the other loses), opportunities to carry out parallel testing, and measure progress accurately. End to end learning agents go from the very simplest sensory inputs to concrete actions.
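The agent, environment, action and reward loop described above can be shown in miniature. What follows is a toy sketch of tabular Q-learning, a textbook reinforcement learning method, on a five-cell corridor of my own devising; it is emphatically not DeepMind's system, which uses deep neural networks on raw pixels. The corridor, the reward and all the parameter values are assumptions made for illustration.

```python
import random

N_STATES = 5          # corridor cells 0..4; the goal is the last cell
GOAL = N_STATES - 1
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state: int, action: int) -> tuple[int, float]:
    """Environment: move along the corridor; reward 1.0 on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def train(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
          epsilon: float = 0.2, seed: int = 0) -> list[list[float]]:
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)   # explore, or break a tie
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward = step(state, action)
            # Move the estimate toward reward plus discounted future value.
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)  # learned action for each non-goal cell
```

The agent is never told where the goal is or what the actions mean. It only sees which cell it is in and whether it was rewarded, and from that it works out that stepping right is the thing to do.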

Deep reinforcement learning is the extension of reinforcement learning (conditioning, it used to be called: making actions conditional upon outcomes) so that it works at scale. DeepMind started its learning journey with Atari games from the 1980s. (How Douglas Adams would have loved this! It reminds me of showing him around the technology museum at Karlsruhe, and as I walked past what I assumed he would see as boring Atari kids’ games, he burbled with pleasure, and named every one of them and their characteristics. I digress.) The learning agents received nothing but the raw pixels (about 30,000 pixels per frame in the game), tried to learn how to maximise their scores, learnt everything from scratch, and developed ONE system to play ALL the different games. Hence, the systems were learning about the games at a very deep level. (Nature, Learning Curve, 2015 Mnih et al).

In a nod to neuroscience, systems can be considered to have a neurology at a very high computational level: algorithms, representations and architectures. Deep-reinforcement-trained machines can now cope with two-dimensional symbolic reasoning, similar to Tower of Hanoi problems, in which a start state is given and the device must follow the rules, but get to a specified Goal state. This is like (example comes from friends at lunch yesterday) trying to change round the furniture in their house and realising, late in the process, that the correct solution depended entirely on moving the small desk on the top landing.

“Go” is the perfect game to test the deep learning machine, previously trained up on the starter problem of all the Atari games. Go has 10 to the power 170 positions, a 19 by 19 board on which the “squares” are really intersections, and only two rules: stones are captured when they have no liberties (are surrounded, with no empty adjacent points); and a repeated board position is not allowed. It is the most complex, profound game, requires intuition and calculation, and pattern recognition plus long term planning: the pinnacle of information games. Brute force approaches don’t work, because the search space is really huge (branching factor of 200, compared to 20 in chess) and it is extremely hard to determine who is winning. A tiny change can transform the balance of power, a so called “divine” move can win the game, and change the history of the game. (See the pesky small desk at the top of the stairs).

To deep-learn the game of Go, the team downloaded 100,000 amateur games and trained a supervised learning “policy” network to predict and play the move the human player played. After a lot of work they got to 60% accuracy as to what a human would have done. They then made the system play itself millions of times, and rewarded it for wins, which made it slowly re-evaluate the value of each move. This got the win rate up to 80%. Then the system played itself another 30 million times. That meant for every position they knew the probability of winning the game, which gave them an evaluation function, previously thought an impossible achievement. They called this the value network, which allowed a calculation of who was winning, and by how much.

The Policy Network provides the input in terms of the probability of moves arising from a position, and the Value Network provides the game-winning value of a move. All this is great, but you still need a planning function. They used a Monte Carlo tree search, and instead of having to churn through 200 possibilities, they looked at the 2 or 3 moves most played by the amateurs. I have simplified this, but it made the search task manageable: a great breakthrough. Thus trained and maximized, AlphaGo could beat 494 out of 495 computer opponents. It then beat Fan Hui, a professional player, 5-0. (Silver et al. Nature 2016)
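To see why pruning with a policy matters so much, here is a toy sketch of the idea. It is not AlphaGo's Monte Carlo tree search: the move names, probabilities and win estimates are all invented, and the two dictionaries merely stand in for what a policy network and a value network might output for a single position.

```python
def top_k_moves(policy: dict[str, float], k: int) -> list[str]:
    """Keep the k moves to which the policy assigns the highest probability."""
    return sorted(policy, key=policy.get, reverse=True)[:k]


def tree_size(branching: int, depth: int) -> int:
    """Number of move sequences of the given depth at a fixed branching factor."""
    return branching ** depth


# Hypothetical policy output for one position (made-up moves and numbers).
policy = {"D4": 0.41, "Q16": 0.33, "C3": 0.12, "K10": 0.09, "A1": 0.05}
# Hypothetical value estimates (chance of winning after each move).
value = {"D4": 0.52, "Q16": 0.58, "C3": 0.40, "K10": 0.47, "A1": 0.10}

candidates = top_k_moves(policy, k=3)
best = max(candidates, key=value.get)
print(candidates, best)

depth = 6
print(f"all ~200 moves: {tree_size(200, depth):,} sequences")
print(f"top 3 moves only: {tree_size(3, depth):,} sequences")
```

Looking only six moves ahead, the full tree runs to tens of trillions of sequences while the pruned one has a few hundred, so the policy decides where to look and the value estimate decides what is worth keeping.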

Very interestingly, getting more computer power does not help AlphaGo all that much. Between the first match against the professional European Champion Fan Hui and then the test match against World Champion Lee Sedol, AlphaGo improved to a 99% win rate against the 6 month earlier version. Against the world champion Lee Sedol, AlphaGo played a divine move: a move with a human probability of only 1 in 1000, but a value move revealed 50 moves later to have been key to influencing power and territory in the centre of the board. (The team do not yet have techniques to show exactly why it made that move). Originally seen by commentators as a fat finger mis-click, it was the first indication of real creativity. Not a boring machine.

The creative capabilities of the deep knowledge system are only one aspect of this incredible achievement. More impressive is the rate at which it learnt the game, going up the playing hierarchy from nothing, 1 rank a month, to world champion in 18 months, and it is nowhere near asymptote yet. It does not require the computer power to compute 200 million positions a second that IBM’s Deep Blue required to beat Kasparov. Talk about a mechanical Turk! AlphaGo needed to look at only 100,000 positions a second for a game that was one order of magnitude more complicated than chess. It becomes more human, comparatively, the more you find out about it, yet what it does now is not rigid and handcrafted, but flexible, creative, deep and real.

Further, it is doing things which the creators cannot explain in detail. So intent were they on building a winner, they did not give it the capacity to give a running commentary. Now, post-win, they are going to build visualizers to show what is going on inside the Von Neumann mind. What will the system say? “Same stupid problem as Thursday?” “Don’t interrupt me while I am thinking?” Or just, every time: “comparing Policy with Network, considering the 3 most common moves, watching the clock and sometimes, just sometimes, finding a shortcut”.

What about us poor humans, of the squishy sort? Fan Hui found his defeat liberating, and it lifted his game. He has risen from 600th position to 300th position as a consequence of thinking about Go in a different way. Lee Sedol, at the very top of the mountain till he met AlphaGo, rated it the best experience of his life. The one game he won was based on a divine move of his own, another “less than 1 in 1000” move. He will help overturn convention, and take the game to new heights.

All the commentary on the Singularity is that when machines become brighter than us they will take over, reducing us to irrelevant stupidity. I doubt it. They will drive us to new heights.

On that note, the program was created by humans, as shown in the picture at the top of the post. The AlphaGo team, who in my mind must rank high in the annals of creative enterprise, are a snapshot of bright people on whom the rest of us rely for real innovation.

All those years ago, my daughter was right to think that Demis Hassabis showed promise.

Promise me you will give at least one talk at a school.

Wednesday, 5 October 2016

Richard Lynn Intelligence database: Becker edition


Whereas there are many very well funded projects which study national and international scholastic ability without mentioning intelligence, there is one database for the national intelligence of the countries of the world, and that was put together by one person, unfunded, working in his study. Prof Richard Lynn gathered together the very disparate studies which mention the nationality of test takers, and assembled them into one database.

David Becker, who works with Prof Heiner Rindermann at the Technical University of Chemnitz, Germany, has taken on the task of going through all the results and tracking down the references, an enormous labour. We want a copy of each reference, so that everything can be checked. David is a research student studying for his Masters and has been concentrating on cross-national differences in ability, their consequences, and their possible origins in early human migration. Unusually for our effete profession, he had an honest trade before he entered Psychology. He is a fully trained butcher (three years of training) and worked in that profession for two years, so has skills which will stand him in good stead in research, as he cleaves fatty residues away from good meat.

Here are two references to introduce you to his work:

Becker, D., & Rindermann, H. (2016). The relationship between cross-national genetic distances and IQ-differences. Personality and Individual Differences, 98, 300-310.

Rindermann, H., Becker, D., & Coyle, Th. R. (2016). Survey of expert opinion on intelligence: Causes of international differences in cognitive ability tests. Frontiers in Psychology, 7, 399.

The database gives the Country, the age of the testees, the N, the test, the IQ, the short and the full reference, and then a column indicating whether we have a copy of the reference (Y or N). Occasionally there are question marks where a reference has not been traced.

This is where you come in. Have a look at the list, and if you have a copy of the papers, send David Becker a scan of them.

Also, if you have extra papers which have not been included (we know that some of you have been extending the database considerably) please get in touch so that we can put everything, duly acknowledged, into one document. Here is how you email David Becker.  Write his name in lowercase, first and second name separated by a dot, then put in the eponymous at symbol, followed by “”

Now for the download. Use the following link, and you have the world at your fingertips. Every time you listen to the news, you can look up the IQ results for the relevant country, and draw your own conclusions.


Tuesday, 4 October 2016

A quick education in Edinburgh


One flight to Edinburgh and I could get an education:


R programming


R programming: so I could crunch data again without SPSS. It might drive me mad, but I am told that thereafter all is serene and pure, like Chapman’s Homer.


Cognitive genetics

Cognitive Genetics: so I could read results with more insight, and spot any errors or interesting connections as the genetic story unfolds.



Systematic reviews and meta-analysis: to check that these things are being done properly. On that general topic, I have already muttered a few suggestions about inclusion criteria in previous posts, suggesting they should be graded for two levels of methodological purity.

Cognitive testing and details

Cognitive testing: because, although I imagine I know about this, this will be the most recent stuff, and targeted at ageing research. Cognitive testing is advancing, particularly in internet driven research, and some assessments are now very fast and efficient.

After that, I could talk about almost everything of interest in psychometrics. The further particulars about applying are as shown above.

If I don’t make it, perhaps you would like to go along and then let me see your notes, sending them to me as a Christmas present. If I manage to get there, please sit with me at the back and explain things as they go along.

Monday, 3 October 2016

Cupid calls: research in progress


The received wisdom about lonely hearts ads is that men advertise their status and wealth, women their looks. It is a simple trade.

More nuanced approaches suggest that successful relationships will depend on similarities of character, interests and ambitions. More prosaically, that men and women will stay together when they do things together, because they like the hobbies and interests they have in common, and work together to build up those common interests.

Emil had collected publicly available data about the questions people ask each other when looking for a partner on OK Cupid. We do not know who is talking to whom, or with what outcome, but we have the anonymous questions, which can be linked to the anonymous basic details given by the person. No one’s privacy is being invaded, but we are getting a look at the questions Americans ask each other when looking for love. This is very interesting and informative.

Here is the link to Emil’s website on OK Cupid.

The attached video is Emil’s talk. The subsequent discussion is an illustration of how research gets done.

Also informative is the way that researchers see connections, then test the generality and strength of those connections. You probably know all this, and have better examples, but the exchanges between researchers are, to me, very interesting to listen to.


Sunday, 2 October 2016

Sunday lecture: Ancestry in the Americas: a meta-analysis


Traditionally, British Sundays were a day of repose, dedicated to the minority who wished to go to church, on whose behalf the godless majority forswore pleasure, and dedicated themselves to uplifting literature and improving healthy walks. Mostly, it rained, and Monday was a relief.

For your proper entertainment, here is Emil himself, in full flow.

Biogeographic Ancestry and Socioeconomic Outcomes in the Americas: a Meta-analysis

Speaker: Emil O. W. Kirkegaard

Co-authors: John Fuerst

A meta-analysis of American studies reporting associations between socioeconomic outcomes (S outcomes) and biogeographic ancestry (BGA) was conducted. 41 studies yielded a total of 167 datapoints and 57 non-overlapping effect sizes. European BGA was found to be positively associated with S outcomes r = .16 [95% CI: .12 to .20, K=23, N=20,837], while both Amerindian and African BGA was negatively so, -.12 [-.18 to -.06, K=17, N=15,870] and -.10 [-.16 to -.04, K=17, N=24,142], respectively. There was considerable cross-study variation in effect sizes (mean I²=90%), but there were too few datapoints to permit credible moderator analysis. Implications for future studies are discussed.
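The heterogeneity figure in that abstract may be unfamiliar, so here is a rough sketch of how such a statistic is commonly computed from a set of correlations: transform each r to Fisher's z, weight by n − 3, compute Cochran's Q, and express the excess over its degrees of freedom as a proportion. This is the textbook recipe, not necessarily the procedure Emil and John used, and the correlations and sample sizes below are invented purely for illustration.

```python
import math
from typing import Sequence


def i_squared(rs: Sequence[float], ns: Sequence[int]) -> float:
    """Heterogeneity (0 to 1) for correlations, via Fisher z and Cochran's Q."""
    zs = [math.atanh(r) for r in rs]          # Fisher z transform of each r
    ws = [n - 3 for n in ns]                  # inverse-variance weights for z
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    df = len(rs) - 1
    if q <= 0:                                # identical effects: no heterogeneity
        return 0.0
    return max(0.0, (q - df) / q)


# Hypothetical studies, NOT taken from the meta-analysis.
example_rs = [0.05, 0.30, 0.10, 0.25]
example_ns = [500, 800, 400, 1200]

i2 = i_squared(example_rs, example_ns)
print(f"I squared = {i2:.0%}")
```

A value near 90% says that most of the spread between studies is more than sampling error would produce, which is why the authors would have liked enough datapoints for a moderator analysis.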


Here is the full live version, in only 19 minutes, because Emil talks fast or, as I call it, “at normal speed”:

Much, much better than the box set you were thinking of watching.

Thursday, 29 September 2016

Goodbye Sweden: Can I have a quick reaction?


Journalists, being fed news of some dreadful event, are prone to ask their studio guests: “Can I have a quick reaction?” Almost always the Talking Head comes up with an off-the-cuff reaction, also known as an opinion, as to whether the event is the end of: a dictator/a government/a country/low cost oil/Western civilization/the planet.

I would not dream of criticising this response, particularly because in former times on TV I sometimes ventured minor versions of such a response. I have not yet been asked to comment in a public arena as to whether the finding that contemporary reaction times are slower than in times of yore indicates the decline and fall of our civilization. You know the story full well: the much championed Flynn effect suggests that good food, free education and proper drains have boosted our intelligence, as well they might have; the Woodley effect suggests we are slowing up, losing our intellectual sparkle, becoming more specialised in our abilities but very probably sinking into the mire of soggy stupidity.

Now we have some even more solid findings to favour The Woodley Effect. (By the way, Charles Murray, responsible for coining The Flynn Effect,  suggested to me that the contemporary lowering of intellect should be named in this way).

Guy Madison, Michael A. Woodley of Menie and Justus Sänger

Secular Slowing of Auditory Simple Reaction Time in Sweden (1959–1985). Front. Hum. Neurosci., 18 August 2016

They say: There are indications that simple reaction time might have slowed in Western populations, based on both cohort- and multi-study comparisons. A possible limitation of the latter method in particular is measurement error stemming from methods variance, which results from the fact that instruments and experimental conditions change over time and between studies. We therefore set out to measure the simple auditory reaction time (SRT) of 7,081 individuals (2,997 males and 4,084 females) born in Sweden 1959–1985 (subjects were aged between 27 and 54 years at time of measurement). Depending on age cut-offs and adjustment for aging related slowing of SRT, the data indicate that SRT has increased by between 3 and 16 ms in the 27 birth years covered in the present sample. This slowing is unlikely to be explained by attrition, which was evaluated by comparing the general intelligence × birth-year interactions and standard deviations for both male participants and dropouts, utilizing military conscript cognitive ability data. The present result is consistent with previous studies employing alternative methods, and may indicate the operation of several synergistic factors, such as recent micro-evolutionary trends favoring lower g in Sweden and the effects of industrially produced neurotoxic substances on peripheral nerve conduction velocity.

The authors have collected new data on a large sample, with 7081 usable respondents on which there was much background material from previous testing. They pursued the respondents with reminders, and tested them online, using the best available software to ensure consistent exposure and recording of responses. This cannot be the same as bringing them in to a standard experimental set up of reaction time equipment, but on the other hand it generates much higher numbers of respondents. They have also considered the impact of these variations in methods which, if anything, would obscure rather than reveal underlying trends.

Reaction times seem to slow up after 1970. The authors say:

We found clear trends toward slowing auditory SRT when birth year was regressed against year-on-year SRT means for the years 1959–1985. It is notable that even without adjustment for aging, the SRT speed of the oldest participants is about the same as that of the subsequent generation, whom in the late twenties are supposed to have the shortest SRTs of all age groups (Der and Deary, 2006).

the secular slowing trend was present in all cohort comparisons (males, females, and both sexes combined), and was significant across the entire range of birth years for both the males and the whole sample, but not for the females, who nonetheless exhibited an overall negative trend in SRT performance consistent with potential secular slowing.
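To make the method concrete, here is a minimal sketch of that kind of trend analysis: one mean reaction time per birth year, and an ordinary least squares line fitted through them. I have put reaction time on the vertical axis and birth year on the horizontal. The yearly means below are made up (a perfectly straight line rising 0.3 ms per birth year) so that the arithmetic is easy to check; they are not the Swedish data, and the authors' own analysis also adjusts for ageing.

```python
from typing import List, Tuple


def ols_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical yearly means in milliseconds, NOT the published results.
birth_years = list(range(1959, 1986))                        # 27 birth years
srt_means = [200.0 + 0.3 * (year - 1959) for year in birth_years]

slope, intercept = ols_line(birth_years, srt_means)
total_change = slope * (birth_years[-1] - birth_years[0])    # first to last cohort
print(f"slope = {slope:.2f} ms per birth year, total = {total_change:.1f} ms")
```

A positive slope means later-born cohorts are slower. With these invented numbers the total comes to just under 8 ms, which sits inside the 3 to 16 ms range the authors report.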

A potential cause of the apparent slowing may be exposure to neurotoxic industrial by-products such as heavy metals (Silverman, 2010) and dioxins (ten Tusscher et al., 2014), which may reduce SRT performance via their effects on peripheral nerve conduction velocity. However, as Silverman notes, known neurotoxins have come under tight governmental regulation, emissions have tended to decrease, and serum levels of lead, for example, have decreased since 1970 in the USA (Silverman, 2010, p. 46).

Another possible cause of this trend may be relatively recent micro-evolutionary trends favoring lower g in the population of Sweden. Several studies have revealed that g and fertility are inversely related in the US and the UK (as reviewed in Woodley of Menie, 2015) among cohorts born as far back as the 1890s (Lynn and Van Court, 2004; Lynn, 2011). However, the relationship between g and fertility in Scandinavian countries is less well characterized. Only one study has attempted to examine these trends across birth cohorts in Sweden (Vining et al., 1988). Utilizing aggregate data on fertility and IQ for a mixed-sex sample of Swedish cohorts resident in Stockholm county and born between 1909 and 1940 from Vining et al. (1988), it was possible to reconstruct predicted generational changes in genotypic IQ (i.e., the heritable variance component of IQ) due to the changing patterns of selection (i.e., the correlation between IQ and fertility established for each cohort) for four cohorts (see Appendix 2 for details of the method).

Main result here, but see the full paper:


In sum, this is strongly suggestive of a slowing of reaction times in Sweden, itself suggesting a possible drop in mental alertness and intelligence in that country. If the Flynn effect were a deep-seated real improvement in functioning then one would expect faster reaction times, not slower. An alarming result, worthy of further testing and attention.