Join PMWC 2018 Michigan This June to Learn More about How the Modulation of the Epigenome Increases the Risk for Type 2 Diabetes

Dr. Michael Boehnke (University of Michigan), who pioneered large-scale studies identifying genetic risk in diabetes and bipolar disorder, shared with us some insights into recent advances in exome and genome sequencing and their use in understanding disease biology and the etiology of psychiatric disorders; the role of statistical data analysis in overcoming the burden of multiple testing; and how the genome can influence the epigenome to modulate gene expression and risk of type 2 diabetes.

Dr. Boehnke is a true pioneer in precision medicine, having conducted several of the first large-scale studies to identify genetic risk factors for diabetes and bipolar disorder. As a biostatistician, he focuses his research on the genetic dissection of complex traits. In his 35-year career, he has developed methods for the analysis of human pedigrees, examined the history of breast cancer in genetically at-risk individuals, and contributed important discoveries on the genetics of type 2 diabetes and related traits such as obesity and blood lipid levels. He has served on the University of Michigan faculty since 1984, focusing on study design and statistical analysis of human genetic data, with a particular emphasis on developing and applying statistical methods for human gene mapping. His current focus is on disease and trait association studies based on genome sequence and genotype-array data.

PMWC 2018 Michigan takes place June 6-7, 2018.

Q&A with Michael Boehnke

Q: Little is known about the molecular basis of mood and psychotic disorders such as bipolar disorder and schizophrenia. How do genetic studies using whole-genome or exome analysis give us insight that can guide the development of novel drugs, therapies, and preventive strategies?

A: Genome-wide association studies (GWAS) based on genotype arrays or sequencing identify genetic variants associated with a disease (or trait), including these psychiatric disorders. The availability of low-cost arrays assaying millions of sites in the genome, together with clever statistical and computational tools, now allows us to assay all but the rarest or most complex genetic variation in hundreds of thousands or even millions of individuals. Exome and genome sequencing allow near-complete assay of even very rare genetic variation, but sequencing costs need to fall further before we can reach the sample sizes needed to identify disease-associated variants with high statistical confidence. Each disease-associated region we identify provides a potential entry point to understand disease biology and etiology, to suggest targets for new drugs, or to better target existing drugs to the people for whom they will be helpful and not harmful. Taking the step from associated variant to causal mechanism to drug is challenging, and it is a major focus for both academic and pharmaceutical researchers. The good news is that drug targets suggested by genetic studies have a substantially higher rate of progressing through the drug development pipeline than targets without support from genetic studies.
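To make "identify genetic variants associated with a disease" concrete, here is a minimal sketch of one of the simplest single-site tests: an allelic chi-square test on a 2×2 table of allele counts in cases and controls. This is a generic textbook test, not a method described in the interview; the function name and the example counts are hypothetical. Production GWAS more often use regression models that adjust for covariates such as ancestry, and repeat a test like this at every site in the genome.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Rows are cases / controls; columns are alternate / reference allele counts.
    Hypothetical helper for illustration. Returns (chi2, p_value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele frequency is the same in cases and controls.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the alternate allele is enriched in cases.
chi2, p = allelic_chi_square(case_alt=30, case_ref=70, ctrl_alt=10, ctrl_ref=90)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

With these made-up counts the statistic is 12.5; a p-value of that size would be far from significant once millions of sites are tested.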

Q: What are the challenges of analyzing genome or exome sequence data from tens of thousands of individuals, and what solutions have you developed?

A: Analysis of sequence data challenges us both computationally and statistically. A BAM file that contains the complete sequence information for a single human genome requires 25 × 10⁹ bytes (25 gigabytes) of computer storage, so our NHLBI-funded TOPMed project, which to date has sequenced more than 120,000 genomes, requires storage of 3 × 10¹⁵ bytes (3 petabytes) of data. It was only a few years ago that I learned what the prefix peta meant! Dealing with that much data requires careful attention to issues such as minimizing data transfers and avoiding multiple copies of the data. While the cost of generating sequence data has dropped by many orders of magnitude, the cost of computer storage has dropped more slowly, making careful data management critical.
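The storage figures above follow from simple multiplication. The sketch below reproduces that arithmetic using only the numbers quoted in the interview (25 GB per BAM file, 120,000 genomes) and shows what keeping a second copy of every file would cost; the function names are hypothetical and real BAM sizes vary with sequencing depth.

```python
# Figures quoted in the interview; real BAM sizes vary with sequencing depth.
BYTES_PER_GENOME_BAM = 25 * 10**9   # 25 GB per whole-genome BAM file
GENOMES_SEQUENCED = 120_000         # TOPMed: "more than 120,000 genomes" to date


def total_storage_bytes(n_genomes, bytes_per_genome, copies=1):
    """Total bytes needed to hold `copies` copies of every genome's BAM file."""
    return n_genomes * bytes_per_genome * copies


def to_petabytes(n_bytes):
    """Convert bytes to decimal (SI) petabytes, where 1 PB = 10**15 bytes."""
    return n_bytes / 10**15


one_copy = total_storage_bytes(GENOMES_SEQUENCED, BYTES_PER_GENOME_BAM)
two_copies = total_storage_bytes(GENOMES_SEQUENCED, BYTES_PER_GENOME_BAM, copies=2)
print(f"one copy:   {to_petabytes(one_copy):.1f} PB")   # 3.0 PB
print(f"two copies: {to_petabytes(two_copies):.1f} PB")  # 6.0 PB
```

Each redundant copy of the full data set adds another 3 PB, which is why avoiding duplicate copies matters at this scale.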

Statistical analysis of such large data files is also challenging and has required us to develop analysis software that is very computationally efficient. Testing for association at many millions of sites in the genome requires extreme levels of statistical significance to overcome the burden of multiple testing, and at those levels many of our standard statistical tests are no longer well behaved and have to be modified. For example, we use modified disease association tests when the number of cases with disease is much smaller than the number of controls without disease. Carrying out so many tests on such large samples also requires very careful quality control, since even a low rate of false positives could swamp true association signals with spurious ones. For example, we developed methods to identify and discard DNA samples that are contaminated by DNA from another person.
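To see why "extreme levels of statistical significance" are needed, consider a Bonferroni correction, one simple (and conservative) way to control the chance of any false positive across many tests. The interview does not say which corrections Dr. Boehnke's group uses, and this sketch does not address the case/control imbalance issue; the test count and function names are hypothetical. Dividing 0.05 by roughly one million independent tests gives 5 × 10⁻⁸, the threshold conventionally used for genome-wide significance in common-variant GWAS.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def expected_false_positives(per_test_alpha, n_null_tests):
    """Expected number of truly null tests that fall below a per-test cutoff."""
    return per_test_alpha * n_null_tests


N_TESTS = 1_000_000  # hypothetical number of independent association tests

# Using the usual single-test cutoff of 0.05 at every site:
naive_hits = expected_false_positives(0.05, N_TESTS)
# Bonferroni-corrected cutoff for the whole family of tests:
cutoff = bonferroni_threshold(0.05, N_TESTS)

print(f"expected false positives at p < 0.05: {naive_hits:,.0f}")
print(f"Bonferroni per-test cutoff: {cutoff:.0e}")
```

At the naive cutoff, about 50,000 null sites would look "significant" by chance alone, which is the swamping effect described above.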

Q: What are the challenges we face and the opportunities that exist in resolving the complex processes underlying common diseases such as breast cancer and obesity?

A: We geneticists always need to keep in mind that genetics is just a small part of the overall picture, and that environmental and behavioral factors are also critical to health and disease. Still, genetic information has the advantages that it is simple (a 4-letter alphabet), finite (3 billion base pairs is a lot, but it is finite), and unchanging (so we can measure it once and use it forever), whereas behaviors and the environment change all the time, and measuring and summarizing them is truly challenging. GWAS identify genetic regions associated with disease, providing valuable entry points to understand human biology and disease. We seek to move from genomic regions to specific causal variants, genes, and pathways, which in turn can illuminate the complex causal processes underlying these and other diseases.

Q: Last year you published a paper on genetic regulatory signatures associated with increased risk of type 2 diabetes. What is the significance of this discovery, and could it help lead to more personalized treatments for diabetes?

A: My research group and our collaborators are working to understand the genetic basis of type 2 diabetes. Our work has identified hundreds of regions in the human genome that affect risk of type 2 diabetes and variability in diabetes-related traits such as glucose and insulin levels. An important next step is to identify the specific genes and genetic variants involved, and their mechanisms of action. In Varshney et al., we presented an integrated analysis of molecular profiling data from human pancreatic islets. We found that genetic variants associated with type 2 diabetes are more frequently present in regions of the genome where the transcription factor Regulatory Factor X (RFX) is predicted to bind in an islet-specific manner, and that genetic variants that increase type 2 diabetes risk are predicted to disrupt RFX binding. Our findings provide a molecular mechanism by which the genome can influence the epigenome, and so modulate gene expression and risk of type 2 diabetes. We hope that mechanistic insights of this kind, which come from combining molecular data on open chromatin and gene expression with other sources of genomic annotation, can help pinpoint the functional mechanisms underlying type 2 diabetes and lead to a better understanding of its etiology and treatment.
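One common, generic way to predict whether a variant "disrupts" transcription factor binding is to score the reference and alternate sequences against a position weight matrix (PWM) and compare the best scores. The sketch below illustrates that idea only: the matrix is a made-up 4-position toy, not the real RFX motif, the function names are hypothetical, and it scans only the forward strand. It is not the analysis pipeline used in Varshney et al.

```python
# Hypothetical toy log-odds PWM; NOT the real RFX motif.
# Each entry is one motif position; values are log-odds scores for A, C, G, T.
TOY_PWM = [
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.5},
    {"A": -1.0, "C": 1.5, "G": -1.0, "T": -1.0},
    {"A": 1.5, "C": -1.0, "G": -1.0, "T": -1.0},
]


def best_motif_score(seq, pwm):
    """Highest PWM score over all windows of seq (forward strand only)."""
    width = len(pwm)
    best = float("-inf")
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(position[base] for position, base in zip(pwm, window))
        best = max(best, score)
    return best


def allelic_motif_delta(left_flank, ref, alt, right_flank, pwm):
    """Change in best motif score when the ref allele is replaced by alt.

    A negative value means the alternate allele weakens the predicted binding site.
    """
    ref_seq = left_flank + ref + right_flank
    alt_seq = left_flank + alt + right_flank
    return best_motif_score(alt_seq, pwm) - best_motif_score(ref_seq, pwm)


# Hypothetical variant: T>A inside the toy motif "GTCA".
print(allelic_motif_delta("AAG", "T", "A", "CAAA", TOY_PWM))  # -2.5
```

A variant outside the motif leaves the best score unchanged (a delta of 0), while a variant that breaks a high-scoring match produces a negative delta, which is the kind of signal summarized as "predicted to disrupt binding".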