Why GWASs are evil

OK, yes, I know that genome-wide association studies (GWASs) have identified some important genetic effects, but there are some serious downsides too. I truthfully do not know whether the benefits have outweighed the costs.

Why am I saying this? (None of this is supposed to be original. Conversely, I am not going to go to the trouble of finding the myriad people who may have said the same thing first. So sue me. I think that's what blogs are for?) Here are some pertinent points:

Nobody believes anything if it's not in a GWAS

Studies which would previously have been judged to give perfectly respectable evidence implicating a candidate gene are nowadays completely ignored because the stupid GWAS missed said gene. People don't believe a candidate gene study which yields a p value of 0.001 after correction for 20 markers, but they do believe a GWAS which produces a single p value of 10^-8. There is no logic in this position. But a lot of people take this attitude. At a recent conference summarising the results of association studies, everything except the GWASs was completely ignored.
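
To make the comparison concrete, here is a minimal Python sketch. It assumes Bonferroni correction (the post does not say which correction the candidate gene study used) and roughly 500K markers on the GWAS chip, the figure used later in this post. Under those assumptions the candidate gene study's best marker had a nominal p of 0.001/20 = 5 × 10^-5, while the GWAS hit corrects to 10^-8 × 500,000 = 0.005, which is weaker evidence than the candidate gene result.

```python
def bonferroni(p_nominal: float, n_tests: int) -> float:
    """Bonferroni-corrected p value: nominal p times number of tests, capped at 1."""
    return min(1.0, p_nominal * n_tests)


# Candidate gene study: corrected p = 0.001 across 20 markers,
# so the best single marker had a nominal p of 0.001 / 20.
candidate_corrected = 0.001
candidate_nominal = candidate_corrected / 20

# GWAS: one nominal p of 1e-8, corrected over an assumed 500,000 markers.
gwas_nominal = 1e-8
gwas_corrected = bonferroni(gwas_nominal, 500_000)

print(f"candidate gene: nominal {candidate_nominal:.1e}, corrected {candidate_corrected:.3f}")
print(f"GWAS:           nominal {gwas_nominal:.1e}, corrected {gwas_corrected:.3f}")
```

On this simple accounting the candidate gene study is the stronger of the two; other corrections (permutation, false discovery rate) would give different numbers but the same direction of comparison.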

People think that a GWAS covers the genome

It so does not! I have looked at GWAS data and there are simply enormous gaps where no marker is typed. Many of the alleged "markers" on a chip are not actually markers at all because they are completely or almost completely monomorphic. The reliability of genotypes is often atrocious. Try downloading the genotypes from some publicly available source and see how many stupid obvious errors you find. Then think of the much larger number of errors which you don't know about because they're not obvious. The notion that one has "covered" the genome by running it through a standard chip is nonsense.
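
The kind of check that exposes these problems takes only a few lines. This is a minimal Python illustration, not any particular QC pipeline. It assumes genotypes are coded 0, 1 or 2 (copies of an arbitrary counted allele) with None for a missing call, and reports the minor allele frequency, which is zero or close to zero for a monomorphic "marker", plus a Hardy-Weinberg chi-square, where a huge value (for example, an impossible shortage of heterozygotes) is a common sign of genotype-calling errors. The names `marker_qc` and `MarkerQC` and the 0.01 MAF cut-off in the usage lines are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class MarkerQC:
    n_called: int
    maf: float       # minor allele frequency among called genotypes
    hwe_chi2: float  # Hardy-Weinberg goodness-of-fit statistic (1 df)
    hwe_p: float     # p value for hwe_chi2


def marker_qc(genotypes: list[Optional[int]]) -> MarkerQC:
    """Summarise one marker; genotypes are 0/1/2 copies of the counted allele, None = missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    if any(g not in (0, 1, 2) for g in called):
        raise ValueError("genotypes must be 0, 1, 2 or None")
    n = len(called)
    observed = [called.count(0), called.count(1), called.count(2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)  # counted-allele frequency
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    # Skip empty expected classes so a monomorphic marker gives chi2 = 0, not a division by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    hwe_p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return MarkerQC(n, min(p, 1 - p), chi2, hwe_p)


# Hypothetical examples: a monomorphic "marker" and one with no heterozygotes at all.
mono = marker_qc([0] * 100)
no_hets = marker_qc([0] * 50 + [2] * 50 + [None] * 5)
print(f"monomorphic: MAF = {mono.maf}, flagged = {mono.maf < 0.01}")
print(f"no heterozygotes: chi2 = {no_hets.hwe_chi2:.1f}, p = {no_hets.hwe_p:.1e}")
```

Checks like these only catch the obvious errors; a genotype miscalled in a way that stays consistent with Hardy-Weinberg proportions passes straight through.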

A GWAS will never, ever, ever find an important but rare variant

Suppose a risk variant has what is nowadays thought of as a very large effect, a relative risk (RR) of 10, and an allele frequency of 0.0005, which is actually quite high relative to many known mutations. Even if this risk variant happens to be in complete linkage disequilibrium (LD) with a SNP (rather than falling in one of the many gaps), the SNP will still not show association. If the minor allele of the SNP has a frequency of 0.1 and the variant only ever occurs on haplotypes carrying that allele, then only 0.0005/0.1 = 0.5% of those haplotypes carry the variant, so the SNP itself has an RR of only 1 + 0.005 × (10 - 1) = 1.045. The GWAS will simply never, ever see important variants with fairly low frequency, even if the SNP is right on top of the variant and even if the sample is huge. Do the maths.
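
The arithmetic can be checked with a short Python sketch. It assumes, as the text does, that the risk variant sits only on haplotypes carrying the SNP's minor allele, and it further assumes a rare disease (so control allele frequencies are approximately population frequencies) and a per-allele risk model with non-carriers as the baseline. `snp_relative_risk` and `case_allele_freq` are names made up for this illustration.

```python
def snp_relative_risk(rr_variant: float, freq_variant: float, freq_snp_minor: float) -> float:
    """Relative risk seen at a SNP whose minor allele carries a rarer risk variant.

    Only freq_variant / freq_snp_minor of the SNP minor-allele haplotypes carry the
    variant; the rest have baseline risk 1. SNP major-allele haplotypes never carry
    the variant, so they are the baseline.
    """
    if not 0 < freq_variant <= freq_snp_minor:
        raise ValueError("variant must be no more common than the SNP allele that tags it")
    share = freq_variant / freq_snp_minor
    return (1 - share) * 1.0 + share * rr_variant


def case_allele_freq(freq: float, rr: float) -> float:
    """Frequency of an allele among cases, assuming a rare disease and per-allele risk rr."""
    return freq * rr / (freq * rr + (1 - freq))


rr_snp = snp_relative_risk(rr_variant=10, freq_variant=0.0005, freq_snp_minor=0.1)
print(f"SNP relative risk: {rr_snp:.3f}")
print(f"SNP minor allele: {case_allele_freq(0.1, rr_snp):.4f} in cases vs 0.1000 in controls")
```

Under these assumptions the SNP's minor allele frequency differs between cases and controls by only about 0.004, and that tiny difference is the entire signal the GWAS has to find.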

Anybody who frets about "missing variance" is stupid

Really, I can't think of a more polite way to put it without actually lying. It might be more accurate to say that anybody who frets about missing variance is stupid and evil, because such people are so damaging to sensible scientific thought, but I will admit that perhaps they are just stupid and unaware of the damage wrought by their promotion of their small-minded opinions. To these stupid people, my message is this. Of course a GWAS is not going to pick up all genetic variance, you idiot. If you read the above, you will see that a GWAS will not detect any rare variant. Not a single one. Never. It will systematically miss every single one, even the ones which fall right on top of one of your SNPs rather than into one of the many gaps between them. And then, after you've done a GWAS, you start agonising about "missing variance". Please. Of course there's going to be missing variance. (I'm not very good at talking to stupid people, but perhaps I'll improve with practice.)

GWASs burn up datasets

Once one has submitted a dataset to a GWAS, it becomes extremely difficult to extract any further information from it, because one is constantly hit by the correction factor which has to be applied to the GWAS. So all the important findings for the real variants are lost in a mass of noise. Even if somebody points to such a finding in the original analysis, or goes back and analyses another marker, nobody else will take any notice, because "overall" the result will not be significant. That's because you've got 500K tests' worth of noise obscuring the experiment you wanted to do. The dataset may consist of thousands of subjects and may have taken years to collect. It may include pretty much every case of the disease in question in the country, or on the continent. But once you've let somebody run a GWAS on it, it's dead.
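
A quick simulation shows what 500K tests' worth of noise looks like. This Python sketch makes simplifying assumptions that real data do not satisfy exactly: every marker except one is null and independent, so its p value is uniform on (0, 1). A genuine signal with a nominal p value of 10^-4 (a hypothetical figure) is then outranked by about 500,000 × 10^-4 = 50 null markers, and its Bonferroni-corrected p value is capped at 1.

```python
import random

N_MARKERS = 500_000  # markers on the chip, as in the text
REAL_P = 1e-4        # hypothetical nominal p value for a genuine effect

rng = random.Random(2012)  # fixed seed so the run is reproducible
null_ps = [rng.random() for _ in range(N_MARKERS - 1)]  # one uniform p value per null marker

# How many pure-noise markers look at least as convincing as the real one?
n_better_noise = sum(p <= REAL_P for p in null_ps)
expected_noise = (N_MARKERS - 1) * REAL_P

# Bonferroni over the whole chip: nominal p times number of tests, capped at 1.
real_corrected = min(1.0, REAL_P * N_MARKERS)

print(f"null markers with p <= {REAL_P}: {n_better_noise} (expected about {expected_noise:.0f})")
print(f"genuine marker after genome-wide correction: p = {real_corrected}")
```

Real markers are correlated through LD, so the effective number of tests is smaller than 500K, but the picture of one true signal buried among dozens of equally good-looking null results is the same.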

GWASs beg for larger and larger sample sizes

Because they incorporate tests of large numbers of markers, GWASs require huge sample sizes if they are to produce p values which will survive correction for multiple testing. A more carefully targeted study might have produced the same corrected p value with a far smaller resource. GWASs suck resources away from other projects.
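
The cost of the stricter threshold can be sketched with the usual normal-approximation sample-size formula, in which the sample needed to detect a fixed effect is roughly proportional to the square of (the z value for 1 - alpha/2 plus the z value for the desired power). This Python sketch assumes Bonferroni thresholds (0.05 over 20 candidate markers versus 0.05 over 500K GWAS markers) and 80% power; both are illustrative choices, not figures from the post.

```python
from statistics import NormalDist


def z_sum_squared(alpha: float, power: float) -> float:
    """(z_{1-alpha/2} + z_{power})^2; for a fixed effect size the required n scales with this."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2


POWER = 0.8
alpha_candidate = 0.05 / 20  # Bonferroni over 20 candidate-gene markers
alpha_gwas = 0.05 / 500_000  # Bonferroni over 500K GWAS markers

ratio = z_sum_squared(alpha_gwas, POWER) / z_sum_squared(alpha_candidate, POWER)
print(f"per-test alpha: {alpha_candidate:.1e} vs {alpha_gwas:.1e}")
print(f"the GWAS needs about {ratio:.1f} times the sample size for the same effect")
```

Under these assumptions the genome-wide threshold alone costs roughly two and a half times as many subjects for the same effect, before considering that the effects a GWAS can see are small to begin with.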

GWASs tend to find the unimportant results

Because they can't detect rare variants, even ones with large effects, GWASs only detect common variants. Selection pressures mean that common variants do not have large effects: an allele that substantially raises the risk of a serious disease tends to be weeded out before it can become common. So the only loci one "finds" with a GWAS tend to be those which are not aetiologically particularly important and whose effects are hard to understand. It would be much more useful to find a rare variant with a large effect, so that one could see what its function was and better understand disease causation. But a GWAS can't ever do that.

I could go on, but that's quite enough for today. Just letting off a bit of steam. Somebody whose opinion counts needs to take a hard look at these issues.

Comments

  1. Anybody remember affected sib linkage analysis? TDT studies?

    One day, GWASs will be seen the same way: as an expensive way to fail to discover anything useful.

  2. This article nicely points out a few of the problems with GWASs: Lots wrong with genomewide association studies, far more than noted here. http://jama.jamanetwork.com/article.aspx?articleid=1390329. Unfortunately there are plenty more which aren't mentioned.

  3. Good points, Dave. What about rare variants which are tagged by multiple SNPs?

  4. The other point I was trying to make yesterday is that if you keep looking in the same place (which is what happens when arrays covering the same set of SNPs are used across multiple GWASs), then you increase your chances of seeing something, so in theory you should also correct for multiple testing to account for all the GWASs that have been done.

    1. I don't think that's the way science usually works. If we do a study to see if a risk factor is connected with a disease, we don't correct for all the thousands of other studies which have been done looking at that risk factor and other diseases. The p<0.05 threshold is generally applied to each experiment. The problem with association studies was that it became easy to do multiple experiments, so GWAS thresholds addressed this by treating the whole GWAS as the experiment.
