Illustrative effect sizes for sex differences
Description
Ingalhalikar et al. (2013) study "sex differences in the structural connectome of the human brain" using a large sample of 949 individuals.
They report "conspicuous and significant sex differences that suggest fundamentally different connectivity patterns in males and females". They claim that their hypothesis that "male brains are optimized for communicating within the hemispheres, whereas female brains are optimized for interhemispheric communication" was "overwhelmingly supported ... at every level".
The paper contains only t- and p-values, without any estimates of effect size. One can approximately (ignoring covariates) convert t-statistics into Cohen's d effect size estimates using d = t / sqrt(n1*n2 / (n1+n2)), or d = 2 * t / sqrt(df), where df = 945 here.
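For readers who want to see where the two conversion formulas come from, here is a short derivation sketch. It assumes a plain two-sample t-test with pooled standard deviation s_p and no covariates, which is a simplification of the model in the paper (the symbols \bar{x}_1, \bar{x}_2, s_p and n are introduced here for illustration only).

```latex
\begin{align*}
t &= \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}},
\qquad
d = \frac{\bar{x}_1 - \bar{x}_2}{s_p} \\
\Rightarrow\quad
d &= t \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
   = \frac{t}{\sqrt{n_1 n_2 / (n_1 + n_2)}} \\
\text{and, if } n_1 = n_2 = n/2:\quad
d &= \frac{2t}{\sqrt{n}} \approx \frac{2t}{\sqrt{\mathrm{df}}}
\end{align*}
```

The second form is therefore exact only for balanced groups, and replacing n by df is a further small approximation when n is large.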
This figure illustrates some effect sizes by plotting a standard normal distribution and a distribution shifted by an amount corresponding to the Cohen's d values. The figure includes the paper's most significant effect (largest absolute t-value reported) and a key interhemispheric difference; these are compared to the effect size for a sex difference in height for illustration (data from Wikipedia).
The substantial overlap of the distributions highlights the danger of assuming that a significant difference from a large sample implies a fundamental/overwhelming difference between the sexes. The optimal (equal error) classification accuracy can be estimated as normcdf(d/2, 0, 1), which for the interhemispheric effect is about 56% (which is statistically significantly, but not really substantively, above chance).
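The accuracy estimate above can be reproduced with a few lines of Python. This is a minimal sketch, assuming two unit-variance normal distributions separated by d and a single threshold midway between them; the function names are illustrative and are not taken from the original figure code.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def equal_error_accuracy(d):
    """Accuracy of a midpoint threshold between two unit-variance normals
    whose means differ by d (the normcdf(d/2, 0, 1) expression)."""
    return STD_NORMAL.cdf(abs(d) / 2)


def overlap_coefficient(d):
    """Area shared by the two unit-variance normal densities."""
    return 2 * STD_NORMAL.cdf(-abs(d) / 2)


def d_for_accuracy(accuracy):
    """Invert equal_error_accuracy: the d implied by a given accuracy."""
    return 2 * STD_NORMAL.inv_cdf(accuracy)


# Illustrative d values only; they are not taken from the paper.
for d in (0.0, 0.3, 0.5, 1.0):
    print(f"d = {d:.1f}: accuracy = {equal_error_accuracy(d):.3f}, "
          f"overlap = {overlap_coefficient(d):.3f}")

# The d that a 56% equal-error accuracy corresponds to (roughly 0.3).
print(f"d implied by 56% accuracy: {d_for_accuracy(0.56):.2f}")
```

Under these assumptions the overlap is simply 2 * (1 - accuracy), so an accuracy close to 50% and a large overlap are two views of the same small effect.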
Comments (4)

0. This paper is worth reading and thinking about. However it, like every paper in the literature, is likely to be making errors due to bias and technical difficulties.
On the other hand
1. The original paper doesn't make any claims about magnitude of differences in connectivity. This is the only thing I care about until there is empirical data about the neural basis of differences between cognition in the sexes.
1a. The whole reason he has to estimate using Cohen's d is the absence of effect sizes in the published data.
2. I have not read it closely so feel free to correct me, but I only saw a correction using parametric methods (scrambling matrices and comparing). A 95 x 95 matrix will sing loudly for false positives.
2a. Something akin to a false discovery rate would certainly be worth discussing. False discovery rates are useful because they allow the investigator to set the tolerance for a number of false positives within their results, e.g. "if we want q<0.05, we can work with these data."
3. ad hominem attacks are for terrible college essays and flunky politicians. Critique methods, not pedigrees.
3a. Arguments to firepower neglect the extent to which fraud occurs in big labs and the way garbage papers from these same groups wind up getting high profile publications nobody else would. Note as point of fact, that someone published a human genome sequence without putting the data online for analysis and assessment. Publishing papers is a goofball game.
3b. I don't trust Einstein's ghost to tell the truth until I've seen the math and the experiments confirming it.
4. the whoooooooooooooooooooooooooooooooooooooooole point of science is that every question generating a falsifiable hypothesis should be addressed. eg, "without seeing effect sizes, I have a hard time believing these data are meaningful* under Cohen's D". So, acquire effect sizes and assess.
*meaningful being far more fascinating than significant.
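To make the false discovery rate idea in point 2a concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure. It is an assumption that this is the kind of FDR control the commenter has in mind, and the p-values below are made up purely for illustration; they are not from the paper.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    while controlling the false discovery rate at level q."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Unique region pairs in a 95 x 95 matrix, assuming it is symmetric
# and has no self-connections.
n_connections = 95 * 94 // 2

# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042,
             0.060, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example_p, q=0.05)
print(n_connections, sum(flags))
```

With thousands of tests, many uncorrected p-values below 0.05 are expected by chance alone, which is why some correction (FDR or otherwise) matters here.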
05/12/2013 by

I am grateful that Drs Dodge and Mooperson have taken the time to engage with this. I hope I can be forgiven for only selectively responding here.
Most importantly, I'd like to clarify that I did not and do not consider the original paper to be fraudulent in any way. My argument is that I personally do not believe the effect sizes are consistent with the idea that men's and women's brains are fundamentally "wired" differently (rather that they have subtle differences on average, which can be significant with a large n), but there are no objective thresholds for small or large effects, so this can only be subjective; I intended this figure merely as a contribution to a debate, not as a definitive rebuttal.
Regarding the approximation, I should have been clearer about that, but I expect it to be a reasonably close approximation. To give an example, the formula based on the pure two-sample t-test (without covariates), d = t / sqrt(n1*n2 / (n1+n2)), gives d = 0.4821 for t = 7.39; the formula based on DF (which is approximate for a pure t-test in the sense that it ignores the potential for unbalanced groups; it might perhaps nevertheless be considered a valid measure of effect size for any arbitrary t-contrast, but I leave that question for comments from more senior researchers), d = 2 * t / sqrt(df), gives 0.4808. The closeness of these results suggests (though certainly doesn't prove) that a more exact Cohen's d might not be so different. You will see from my figure that I used the larger of the two estimates, which was a deliberate attempt to be fairer to the authors.
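The two estimates can be checked with a short script. This is a sketch only: the group sizes of 428 males and 521 females are an assumption taken from the published paper rather than from this thread (they sum to the stated 949), and the function names are illustrative.

```python
import math


def d_from_t_groups(t, n1, n2):
    """Cohen's d from a two-sample t-statistic and the two group sizes."""
    return t / math.sqrt(n1 * n2 / (n1 + n2))


def d_from_t_df(t, df):
    """Cohen's d from a t-statistic and its degrees of freedom,
    treating the groups as balanced."""
    return 2 * t / math.sqrt(df)


T_MAX = 7.39  # largest absolute t-value, as quoted above
DF = 945      # degrees of freedom, as quoted above
N_MALE, N_FEMALE = 428, 521  # assumed group sizes, not stated in this thread

d_groups = d_from_t_groups(T_MAX, N_MALE, N_FEMALE)
d_df = d_from_t_df(T_MAX, DF)
print(f"group-size formula: d = {d_groups:.4f}")
print(f"df-based formula:   d = {d_df:.4f}")
```

With these inputs the two formulas agree to within about 0.002, in line with the values quoted above.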
There are of course other measures of effect size. Some that I (with my limited knowledge) believe to be generally applicable to any t- or F-statistics from linear regression models, and that can be computed from only the reported statistics, are eta-squared, epsilon-squared and omega-squared, all of which relate to the proportion of variance explained.
Based on t = 7.39 and df = 945, my calculations (which I would welcome others to check) suggest:
eta-squared = 0.0546, epsilon-squared = 0.0536, omega-squared = 0.0536.
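For anyone wishing to check these figures, the sketch below uses the common conversions for a single-degree-of-freedom contrast, where F = t^2. Whether these are exactly the formulas used for the numbers above is an assumption, although they do reproduce them; the function name is illustrative.

```python
def variance_explained_from_t(t, df_error):
    """Partial eta-, epsilon- and omega-squared for a one-df contrast,
    computed from the t-statistic and error degrees of freedom."""
    f = t ** 2  # F-statistic equivalent of a single-df t-contrast
    eta_sq = f / (f + df_error)
    epsilon_sq = (f - 1) / (f + df_error)
    omega_sq = (f - 1) / (f + df_error + 1)
    return eta_sq, epsilon_sq, omega_sq


eta_sq, epsilon_sq, omega_sq = variance_explained_from_t(7.39, 945)
print(f"eta-squared     = {eta_sq:.4f}")
print(f"epsilon-squared = {epsilon_sq:.4f}")
print(f"omega-squared   = {omega_sq:.4f}")
```

Epsilon- and omega-squared are bias-adjusted versions of eta-squared, which is why they come out slightly smaller.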
One can also transform Cohen's d into an equivalent correlational effect size measure. This book (http://books.google.co.uk/books?id=ByxHEePhwHIC) gives (equation 2.14) r = d / sqrt(d^2+4), which would convert the df-based d = 0.4808 into r = 0.2337, corresponding to r^2 = 0.0546. It is important to note that this agreement doesn't prove my original numbers are a good approximation though, since both sets of measures are similarly based on computing effect sizes without access to the original data, but it does at least suggest that my basic calculations are not dreadfully unreliable.
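The d-to-r conversion is easy to evaluate for both d estimates discussed in this comment. This is a minimal sketch of the quoted formula only, with an illustrative function name.

```python
import math


def d_to_r(d):
    """Convert Cohen's d to a correlation, r = d / sqrt(d^2 + 4)."""
    return d / math.sqrt(d ** 2 + 4)


for label, d in (("df-based", 0.4808), ("group-size", 0.4821)):
    r = d_to_r(d)
    print(f"{label:>10}: d = {d:.4f} -> r = {r:.4f}, r^2 = {r * r:.4f}")
```

Note that r = 0.2337 (r^2 = 0.0546) corresponds to the df-based d = 0.4808; the slightly larger d = 0.4821 gives r of about 0.2343. Substituting d = 2t / sqrt(df) into the formula gives r^2 = t^2 / (t^2 + df), so the match with eta-squared for the df-based d is exact by construction.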
I am now too nervous to state outright that (approximately) 5.5% of explained variance is too small to conclude that men are from Mars and women are from Venus, but I do feel it is fair to present the numbers (with the caveat that I am not a statistician nor a senior researcher; though since Dr Dodge dislikes approximations, I feel compelled to note that the award of my PhD was not "two years ago", but three years, eleven months and seven days).
Regarding my own bad practice of not always including effect sizes, the flippant response would be that I have never done a study with n near 1000, nor can I recall having claimed to have discovered any fundamental dichotomies rather than just group-average differences. However, I must accept that I have not reported effect sizes as much as I should have done. One example of when I have is this (open access) paper (http://dx.doi.org/10.1016/j.jalz.2011.09.225), where we report a correlational effect size in Fig. S2, computed from t-statistics in a very similar way to the d-to-r conversion mentioned above. The maximal effect is about r = 0.6 or r^2 = 0.36, but I think we are quite modest in interpreting the disease group differences due to their lack of statistical significance when correcting for multiple comparisons.
Finally, the last part of that digression reminds me to respond to Dr Mooperson, because the PNAS paper does indeed correct for multiple comparisons using permutation testing to control FWE (rather than FDR), at least for their Fig.2. However, I would suggest that further discussion of the PNAS article that is not related to the effect sizes would be better located elsewhere, for example on PubPeer, where there is already a healthy debate:
https://pubpeer.com/publications/3CFCCE950D22E7560E9B07C8B63979
05/12/2013 by

One would have to have one's head in the sand not to be cognizant of the huge wave of blogs and discussions regarding the paper in question. The issues raised here by experts like Dr Ridgway are also confirmed by others with varying degrees of expertise, to name a few:
Prof Dorothy Bishop
http://storify.com/deevybee/postpublicationpeerreviewonsexdifferencesin/slideshow
Prof Sophie Scott
https://sites.google.com/site/speechskscott/SpeakingOut/askingquestionsaboutmenandwomenbylookingatteenagers
Dr Tom Stafford
http://theconversation.com/aremenbetterwiredtoreadmapsorisitatiredclich21096
Christian Jarrett
http://www.wired.com/wiredscience/2013/12/gettinginatangleovermenandwomensbrainwiring/
Neuroskeptic:
http://blogs.discovermagazine.com/neuroskeptic/2013/12/03/menwomenbigpnaspapers/
Dr David Colquhoun
https://pubpeer.com/publications/3CFCCE950D22E7560E9B07C8B63979
Neurocritic blog:
http://neurocritic.blogspot.com/2013/12/menaremapreadersandwomenare.html
Dr Cordelia Fine:
https://theconversation.com/newinsightsintogenderedbrainwiringoraperfectcasestudyinneurosexism21083
This is not the first time that PNAS has published shiny but methodologically lax papers. So it is not logical to accept whatever PNAS publishes without thinking about the methods. There are several other examples of such low-quality papers published in PNAS and other (even more reputable) journals. And yes, I do not even have a PhD!
05/12/2013 by
Published on 03 Dec 2013  18:10 (GMT)
How very ironic that Dr. Ridgway should accuse the study's authors of violating effect size when in fact a quick and brief perusal of his own publications shows that he and his colleagues violate rules of effect size in their own research, and they do so often. What's perhaps even more ironic is that while criticizing the statistics reported in this PNAS article he does so by only approximating numbers; that is to say, he is inferring putative results, and per his own admission he is only "approximately converting." Yet we should believe what he has to say here. I don't know about any of you, but I am more likely to give credence to something published in the Proceedings of the National Academy of Sciences than I am to believe anything posted on Figshare. Moreover, I am more likely to believe something that was edited, reviewed, and vetted by Charles Gross, a leading scholar and pioneer in the field of neuroscience, before listening to a "research associate" who received his PhD 2 years ago. This is the so-called "expert" that some bloggers are referring to when saying that "other experts have crunched the numbers and they state that although the differences are statistically significant, they are actually not substantive." This guy did not "crunch" any numbers; he is at best guesstimating. And by the way, statistical analysis of data either yields significant results or it does not; there is no in-between as you suggest. Go back and take introductory stats. Finally, before saying things like "the paper contains only t- and p-values, without any estimates of effect size," and doing so in a tone that suggests the authors were somehow fraudulent in reporting their data, perhaps you may first want to consider that the Proceedings of the National Academy of Sciences does not require it! Or should we now also believe that you know better than all members of the National Academy of Sciences who set forth the editorial rules of the journal?
05/12/2013 by J. Dodge