3rd Annual Bioinformatics Retreat at the University of Iowa

Time of Event: 
Fri, 08/16/2013 - 9:00am - 3:30pm

The 3rd Annual Bioinformatics T32 Retreat at The University of Iowa will be held on Friday, August 16, 2013. All students and faculty with an interest in bioinformatics and computational biology are invited to attend. Lunch will be provided.



Using Bioinformatics and Genomics to Explore Genetics and Molecular Mechanisms in Human Eye Disease

Our keynote speaker this year will be Dr. Terry Gaasterland of the University of California, San Diego.





All attendees must first register for the retreat. We encourage authors to submit their work for posters and talks, but submission is not required to attend the retreat. The deadline for registration and poster abstract submission has been extended to 5:00 pm, Monday, August 12.




1505 Seamans Center, University of Iowa


9:00-9:30 Poster setup/coffee & bagels
9:30-10:30 Session I, Chaired by Dr. Terry A. Braun
10:30-10:40 Coffee
10:40-11:40 Session II, Chaired by Dr. Todd E. Scheetz
11:40-12:25 Lunch
12:25-1:10 Poster Session
1:10-2:10 Session III, Chaired by Dr. Tom L. Casavant
2:10-2:20 Coffee
2:20-3:30 Keynote Address - Dr. Terry Gaasterland, UCSD
3:30-3:35 Outstanding student talk award & closing remarks

Session I

Session Chair: Dr. Terry A. Braun



Elizabeth Stroebele

A computational approach for identifying and characterizing Notch-signal-dependent transcriptional enhancer sets


Bing He

Global View of Enhancer-Promoter Interactome in Human Cells


Mark Christopher

Computational Discovery of Optic Nerve Head Phenotypes


Volker Brendel

Scalable Genome Annotation (or How to Sail off the Genomic Cliff)



A computational approach for identifying and characterizing Notch-signal-dependent transcriptional enhancer sets


Elizabeth Stroebele and Albert Erives

Dept. of Biology, University of Iowa, Iowa City, IA 52242-3124



Transcriptional enhancers are key nodes at which developmental signals are integrated. Much work has gone into finding enhancers and determining how they work, but characterizing single enhancers is still difficult. Our lab has had success in determining the logic encoded in enhancers by studying them as functional sets. For example, the neurogenic ectoderm enhancers (NEEs) of Drosophila drive expression along the dorsal/ventral axis of the embryo using a unique combination and organization of transcription factor binding sites. One key signal integrated by the NEEs is the Notch-signaling pathway, which is used throughout development. We have built a computational pipeline for identifying other functional sets of related Notch target enhancers that are active throughout development and that can be used to study and generalize enhancer logic. The pipeline takes as input comparative phylogenomic data sets from multiple diverse Drosophila species, and we have used a variety of computational techniques to extract features (known TF binding sites, de novo motif predictions, dinucleotide frequencies, etc.) for candidate enhancer regions. This pipeline has found several new enhancers and allowed us to recover the NEE functional set de novo. Further work is planned to improve the methodology using hierarchical bi-clustering of the sequences based on these diverse feature sets.
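Of the features listed above, dinucleotide frequencies are the simplest to compute. The following is a minimal Python sketch, not the authors' pipeline, of how one candidate enhancer sequence could be turned into a fixed-length, 16-element dinucleotide feature vector; the function names are hypothetical, and skipping windows that contain ambiguous bases is an assumption.

```python
from itertools import product

# All 16 ordered DNA dinucleotides, in a fixed order so that feature
# vectors from different sequences line up column by column.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each overlapping dinucleotide in seq.

    Windows containing anything other than A/C/G/T (e.g. N) are skipped
    (an assumption made for this sketch).
    """
    seq = seq.upper()
    counts = {d: 0 for d in DINUCLEOTIDES}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return {d: 0.0 for d in DINUCLEOTIDES}
    return {d: c / total for d, c in counts.items()}


def feature_vector(seq: str) -> list[float]:
    """Hypothetical helper: dinucleotide frequencies as an ordered list."""
    freqs = dinucleotide_frequencies(seq)
    return [freqs[d] for d in DINUCLEOTIDES]
```

For example, `"ACGT"` contains the windows AC, CG, and GT, so each of those receives a frequency of 1/3 and the other 13 dinucleotides receive 0. Vectors like these could be concatenated with motif-based features before clustering.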


Global View of Enhancer-Promoter Interactome in Human Cells


Bing He, Changya Chen, Li Teng, and Kai Tan

Interdisciplinary Graduate Program in Genetics, Department of Internal Medicine, Department of Biomedical Engineering, University of Iowa



Purpose: Enhancer mapping has been greatly facilitated by the various genomic marks associated with enhancers. However, few tools are available to link enhancers with their target promoters, hampering mechanistic understanding of enhancer-promoter (EP) interactions.

Methods: We develop and characterize multiple genomic features for distinguishing true EP pairs from non-interacting pairs. We integrate these features into a probabilistic predictor for EP interactions.

Results: Multiple validation experiments demonstrate a significant improvement of our method over state-of-the-art approaches. Systematic analysis of EP interactions across eleven cell types reveals several global features of EP interactions: 1) about 60% of EP interactions are tissue-specific; 2) promoters controlled by multiple enhancers have higher tissue specificity, but their regulating enhancers are less conserved; 3) cohesin plays a role in mediating tissue-specific EP interactions via chromatin looping in a CTCF-independent manner.

Conclusion: Our approach presents a systematic and effective strategy to decipher the mechanisms underlying EP communication.


Computational Discovery of Optic Nerve Head Phenotypes


M Christopher, L Tang, JH Fingert, TE Scheetz, MD Abramoff

Dept. of Biomedical Engineering, University of Iowa

Dept. of Electrical and Computer Engineering, University of Iowa

Dept. of Ophthalmology and Visual Science, University of Iowa

Institute for Vision Research, University of Iowa

Dept. of Veterans Affairs, Iowa City, IA



Purpose: To apply computational methods in the discovery of 3D optic nerve head (ONH) structural features for detecting and monitoring glaucoma.

Methods: A subset of participants from the Ocular Hypertension Treatment Study was selected on the basis of the availability of simultaneous stereo fundus images. A stereo correspondence algorithm was applied to each participant's stereo fundus pair to produce a disparity map that quantitatively measured that participant's ONH structure. Principal component analysis (PCA) was applied to the disparity maps to extract computational 3D ONH structural features. The first 25 principal components, or features, were retained and examined individually in building predictive regression models for cup-to-disc ratio. The relationships between the ONH features and the demographic variables of gender, age, and ethnicity were also examined. Bonferroni correction was used to adjust for multiple hypothesis testing.

Results: Five of the 25 computational 3D ONH features were significantly associated (Bonferroni p<0.05) with cup-to-disc ratio. Significant associations were also found between the ONH features and the demographic variables of age and ethnicity.

Conclusions: Using computational methods, we generated a set of structural features that quantify the 3D shape of the ONH. These features had significant associations with, and predictive power for, cup-to-disc ratio, a clinically important measurement used to diagnose and monitor glaucoma. The ONH structural features were also associated with age and ethnicity. Future work will explore the power of these features to detect and track glaucoma, as well as their genetic basis.
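The Bonferroni correction used in this study is simple to state: with m tests, each raw p-value is multiplied by m (capped at 1), or equivalently compared against 0.05/m. With the 25 retained ONH features, a raw p-value must fall below 0.05/25 = 0.002 to be called significant. A minimal Python sketch follows; the function name and the p-values are hypothetical illustrations, not data from the study.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and a significance flag for each.

    Each raw p-value is multiplied by the number of tests m (capped at 1.0),
    which controls the family-wise error rate at level alpha.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj < alpha for p_adj in adjusted]
    return adjusted, significant


# Hypothetical raw p-values for 25 features (not study data).
raw = [0.001, 0.003] + [0.5] * 23
adjusted, significant = bonferroni(raw)
print(significant[:2])  # [True, False]
```

Here 0.001 × 25 = 0.025 stays below 0.05, while 0.003 × 25 = 0.075 does not, even though 0.003 would pass an uncorrected 0.05 threshold.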


Scalable Genome Annotation (or How to Sail off the Genomic Cliff)


VP Brendel

Indiana University

Department of Biology & School of Informatics and Computing

212 South Hawthorne Drive

Simon Hall 205C

Bloomington, IN 47405



Genome annotation has struggled to keep pace with the rate of genomic sequence generation and assembly. But without accurate annotation, genome sequences are of little value.  I will review current approaches to genome annotation as well as novel approaches pursued in my group to address the problem of fast, scalable genome annotation.  In particular, I will discuss our implementation of a complete workflow for genome annotation and its visualization, available through the iPlant cyberinfrastructure.



Session II

Session Chair: Dr. Todd E. Scheetz



Joel Sharbrough

Comparing radical and conservative amino acid substitutions in mitochondrial genomes of sexual and asexual lineages of Potamopyrgus antipodarum


Ali Berens

Transcriptomics of caste development and differential nourishment in primitively social Polistes wasps


Andrew Adrian

Modeling Recombination Rate Variation in D. melanogaster


Xiaoqiu Huang

Developing and using bioinformatics tools to address questions in molecular evolution



Comparing radical and conservative amino acid substitutions in mitochondrial genomes of sexual and asexual lineages of Potamopyrgus antipodarum


JT Sharbrough (1), MA Luse (1), JL Boore (2), JM Logsdon (1), MB Neiman (1)

(1) University of Iowa

(2) University of California-Berkeley



Comparing asexual and sexual organisms provides a particularly interesting setting in which to evaluate the connections between genotype and phenotype. Namely, asexual lineages are expected to experience reduced efficacy of purifying selection and thus to accumulate harmful mutations faster than their sexual counterparts. While several asexual taxa have been shown to accumulate harmful mutations more rapidly than related sexuals, it is still unclear whether mutation accumulation affects fitness enough to contribute to the evolutionary maintenance of sex or to the extinction of asexual lineages. To evaluate the potential effects of harmful mutations, we compared the rate of radical non-synonymous substitution and the rate of conservative non-synonymous substitution across 13 protein-coding loci (~11 kbp) in the mitochondrial genome (mtDNA) of six diploid sexual (2x) and 17 triploid asexual (3x) lineages of Potamopyrgus antipodarum, a New Zealand freshwater snail in which sexual and asexual individuals coexist and compete. Asexual lineages of P. antipodarum have already been shown to exhibit elevated rates of putatively harmful non-synonymous nucleotide substitution in their mtDNA relative to sexual conspecifics. Because radical non-synonymous substitutions (i.e., amino acid changes between biochemical groupings) are presumed to be more harmful than conservative non-synonymous substitutions (i.e., amino acid changes within biochemical groupings), we expected asexuals to have rates of radical substitution similar to those of sexuals but higher rates of conservative substitution, owing to decreased efficacy of selection. In our ongoing analyses, we have found that radical mutations are significantly more harmful than conservative mutations in this species.
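The radical/conservative distinction depends on how amino acids are grouped, which the abstract does not specify. The sketch below assumes a simple four-class grouping by side-chain charge and polarity (a hypothetical choice; published schemes differ) and labels each observed amino acid change. A real analysis would also translate codons with the appropriate mitochondrial genetic code and normalize counts by the number of radical and conservative sites to obtain rates; both steps are omitted here.

```python
# Illustrative grouping only (an assumption): amino acids grouped by
# side-chain charge/polarity. The study's actual grouping is not given.
GROUPS = {
    "nonpolar": set("AVLIMFWPG"),
    "polar": set("STCYNQ"),
    "positive": set("KRH"),
    "negative": set("DE"),
}
AA_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def classify_substitution(aa_from: str, aa_to: str) -> str:
    """Label an amino acid change as 'synonymous', 'conservative', or 'radical'.

    Identical residues are labeled 'synonymous' (no amino acid change).
    """
    if aa_from == aa_to:
        return "synonymous"
    if AA_TO_GROUP[aa_from] == AA_TO_GROUP[aa_to]:
        return "conservative"  # change within a biochemical group
    return "radical"  # change between biochemical groups


def count_substitutions(pairs):
    """Tally substitution classes over (ancestral, derived) residue pairs."""
    counts = {"synonymous": 0, "conservative": 0, "radical": 0}
    for a, b in pairs:
        counts[classify_substitution(a, b)] += 1
    return counts
```

Under this grouping, D→E (both negatively charged) is conservative, while D→K (negative to positive) is radical.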


Transcriptomics of caste development and differential nourishment in primitively social Polistes wasps


AJ Berens, JH Hunt, AL Toth

(AJB, ALT) Iowa State University, Program in Bioinformatics and Computational Biology, Ames, IA 50011; (AJB, ALT) Iowa State University, Department of Ecology, Evolution, and Organismal Biology, Ames, IA 50011; (ALT) Iowa State University, Department of Entomology, Ames, IA 50011; (JHH) North Carolina State University, Department of Biology, Raleigh, NC 27695; (JHH) North Carolina State University, Department of Entomology, Raleigh, NC 27695



Purpose: The presence of queen and worker castes is a defining feature of insect societies and a spectacular example of biological complexity, since alternative phenotypes can be expressed by a single genome. How is this accomplished? Polistes wasps are considered to be “primitively eusocial” because caste differences are flexible in adults, although they are biased during early development. It has been proposed that a developmental switch (such as nutrition) creates a caste bias in Polistes, leading to the adult phenotypes of workers or queens.

Methods: We sequenced transcriptomes from the heads of fifth-instar Polistes metricus larvae, using four biological replicates for each of two nutritional levels (restricted and ad libitum) and two castes (queen- and worker-destined).

Results: Between castes, we identified 736 differentially expressed transcripts, of which 90% were up-regulated in worker-destined relative to queen-destined larvae. Of the 284 transcripts differentially expressed between nutritional levels, 70% were up-regulated under restricted nutrition relative to ad libitum feeding. However, of the 43 transcripts that were differentially expressed for both caste and nutrition, 89% were up-regulated in worker-destined larvae and 85% were up-regulated with ad libitum nutrition, suggesting that a subset of genes shows important caste-nutrition connections. It is unknown whether the mechanisms of caste determination are conserved between eusocial species. Comparing across four social insect species, we found that some species pairs showed an over- or under-representation of shared caste-related differentially expressed genes, though the direction of regulation (up-regulation in workers or queens) is not necessarily conserved.

Conclusions: The vast majority of transcripts differentially expressed between castes are up-regulated in workers, and the vast majority of those differentially expressed between nutrition levels are up-regulated with restricted nutrition. There is little overlap between the transcripts differentially expressed by caste and those differentially expressed by nutrition. However, a large proportion of significantly enriched GO terms are shared between caste and nutrition, suggesting that similar pathways may be affected. Comparisons of genes over- and under-expressed in queen and worker castes of different species have suggested that there is a shared “toolkit” of genes or pathways involved in caste determination in several independently evolved social lineages. Comparative analysis of molecular signatures from a variety of social insects provides insights into the molecular mechanisms of eusocial evolution.


Modeling Recombination Rate Variation in D. melanogaster


AB Adrian, JM Comeron

University of Iowa



Purpose: Recent advances in the study of meiotic recombination patterning have revealed a few DNA motifs that are highly enriched within recombination hotspots. However, efforts to develop a predictive model of recombination capable of explaining its observed natural variation have been largely unsuccessful, as motif presence alone is a poor indicator of the likelihood of recombination events. Here, we attempt to develop a new predictive model of recombination occurrence that also incorporates motif localization and other dynamic properties of meiotic chromatin.

Methods: Using matrices of enriched motifs produced by MEME, we calculated the per-interval, FDR-corrected frequency of the top ten recombination-associated motifs. We similarly calculated per-interval measurements of meiotic transcription and other potential correlates of recombination. Using machine learning and a multivariate statistical approach, we are assessing the effect of each parameter in order to build a robust predictive model of recombination landscape patterning across genomes.

Results:  By applying a shrinkage and selection method for linear regression, we are able to account for nearly 50% of the variation in recombination patterning in Drosophila melanogaster.

Conclusions: As our approach is refined and additional parameters are incorporated, we intend to design the first model capable of predicting recombination patterns on a whole-genome scale. This model and approach should be applicable to many other genomes.
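The “shrinkage and selection method for linear regression” in the Results is presumably the lasso; the abstract does not name it, so that is an assumption. Below is a minimal pure-Python sketch of a lasso fit by cyclic coordinate descent, where each row of `X` would hold one genomic interval's predictors (for example, motif frequencies and meiotic transcription) and `y` its recombination rate. The data layout, function names, and toy values are hypothetical. The L1 penalty drives the coefficients of uninformative predictors exactly to zero, which is what makes the method perform selection as well as shrinkage.

```python
def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold, returning 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Fit lasso coefficients (no intercept) by cyclic coordinate descent.

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    Assumes the columns of X and the vector y are already centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes j's own term.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Toy centered data: y depends only on the first predictor.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-6.0, -3.0, 0.0, 3.0, 6.0]
beta = lasso_coordinate_descent(X, y, lam=0.1)
```

In the toy data the true coefficient of 3 is shrunk to about 2.95, and the irrelevant second predictor is set exactly to 0.0. For real analyses, scikit-learn's `sklearn.linear_model.Lasso` minimizes the same objective, with its `alpha` parameter playing the role of `lam`.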


Developing and using bioinformatics tools to address questions in molecular evolution


X. Huang

Department of Computer Science and Plant Sciences Institute

Iowa State University



Purpose: Sudden death syndrome (SDS) of soybean first appeared in Arkansas in 1971 and then spread within 30 years to all major soybean-producing regions of the United States. In North America, SDS is caused solely by the fungus Fusarium virguliforme. The rapid emergence and spread of F. virguliforme in the United States during the last 80 years of mass production of soybean (a crop native to East Asia) provides a great opportunity to address a central question in molecular evolution: whether random genetic drift or positive selection plays a major role in the adaptive evolution of F. virguliforme.

Methods: We have developed two versions of our PCAP genome assembly program: PCAP.454 for 454 reads and PCAP.solexa for Illumina reads. A draft assembly of 454 paired reads for the genome of the F. virguliforme isolate Mont-1 was produced with PCAP.454, and a draft assembly of Illumina paired-end reads for the genome of the F. virguliforme isolate Clinton-1B was produced with PCAP.solexa. Illumina paired-end reads were generated for seven new isolates of five closely related Fusarium species. For each new isolate, single nucleotide polymorphisms or point substitutions (SNPs) between the isolate and Mont-1 were found by mapping the Illumina paired-end reads from the isolate onto the Mont-1 genome assembly as a reference.

Results: The intraspecies SNP rate of a genomic region of F. virguliforme is more than 20 standard deviations above the mean intraspecies SNP rate of all genomic regions.

Conclusion: Positive selection plays a major role in the adaptive evolution of F. virguliforme.
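The Results are stated as a z-score: how many standard deviations one region's SNP rate lies above the mean over all regions. A minimal Python sketch, assuming SNPs have already been counted per region; the region boundaries, function names, and toy counts are hypothetical, since the abstract does not say how regions were defined or which form of the standard deviation was used (the population form is assumed here).

```python
from statistics import mean, pstdev


def snp_rate_zscores(snp_counts, region_lengths):
    """Return (rates, z-scores) for per-region SNP rates in SNPs per base."""
    rates = [c / length for c, length in zip(snp_counts, region_lengths)]
    mu = mean(rates)
    sigma = pstdev(rates)  # population standard deviation (an assumption)
    if sigma == 0.0:
        return rates, [0.0] * len(rates)
    return rates, [(r - mu) / sigma for r in rates]


def high_rate_regions(z_scores, threshold):
    """Indices of regions whose SNP rate exceeds the mean by > threshold SDs."""
    return [i for i, z in enumerate(z_scores) if z > threshold]


# Toy data: ten 1-kb regions, one with ten times the SNP density of the rest.
rates, z = snp_rate_zscores([1] * 9 + [10], [1000] * 10)
```

With a population standard deviation, no value among n values can lie more than sqrt(n - 1) standard deviations from their mean, so the single elevated region in the toy data sits at exactly that bound, sqrt(9) = 3. A region more than 20 standard deviations above the mean therefore implies that more than 400 regions were compared.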






Session III

Session Chair: Dr. Tom L. Casavant



Drena Dobbs

Analyzing RNA-protein complexes and interactions


Alex Wagner

Positive and Unlabeled Learning for Prioritization


Jacob Michaelson

Building a genome-wide map of nucleotide-level mutability


Julie Dickerson

Bioinformatics opportunities at the NSF




Analyzing RNA-protein complexes and interactions


R Walia, LC Xue, BA Lewis, UK Muppirala, V Honavar, D Dobbs

Bioinformatics and Computational Biology Program, Dept. of Genetics, Development & Cell Biology, and Dept. of Computer Science, Iowa State University, Ames, IA 50011



Purpose: Our long-term goals are twofold: i) to identify determinants of recognition specificity in RNA-protein interactions, and ii) to understand how networks of RNA-protein interactions are regulated and integrated into cellular regulatory and signaling networks. The purpose of this study is to develop reliable computational tools for analyzing RNA-protein complexes and interaction networks, including webservers for predicting interfaces in RNA-protein complexes and for predicting partners in RNA-protein interaction networks.

Methods: Here we focus on the “interface prediction” problem: identifying which residues in an RNA-binding protein are likely to contact RNA. We systematically compared the performance of sequence-based vs. structure-based methods, and evaluated both machine learning and homology-based methods. In support of these studies, we established two databases: PRIDB, a database of interfaces from all structurally characterized protein-RNA complexes; and RPIntDB, a database of experimentally validated RNA-protein interactions.

Results: Performance evaluation using benchmark datasets demonstrates that machine learning classifiers using PSSM-based encodings of protein sequences outperform classifiers that use encodings. Predictions from structure-based methods that exploit geometric features generally have higher specificity but reduced sensitivity. The performance of a sequence-based ensemble method, RNABindRPLUS, which combines our best machine learning and homology-based classifiers, is competitive with that of the best structure-based methods, providing predictions of interfacial residues in RNA-binding proteins with an overall accuracy of 90% and ROC AUC of 0.87.

Conclusions: Computational tools can provide reliable predictions of interfacial residues in RNA-protein complexes, focusing the attention of biomedical researchers on key residues for targeted mutagenesis and potentially identifying novel therapeutic targets.
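The Results report both overall accuracy and ROC AUC, which measure different things. Accuracy depends on a score threshold, whereas ROC AUC equals the probability that a randomly chosen interface residue is scored higher than a randomly chosen non-interface residue, independent of any threshold. A minimal Python sketch of both metrics; the scores, labels, and the 0.5 threshold are hypothetical and not from the study.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as one half. labels are 1 (interface residue) or 0 (non-interface).
    This O(P*N) pairwise form is for clarity; rank-based versions scale better.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of residues whose thresholded prediction matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical per-residue scores and true labels.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
```

On this toy example, 5 of the 6 positive/negative pairs are ordered correctly (AUC = 5/6), while thresholding at 0.5 gives an accuracy of 3/5. Because interface residues are usually a minority of all residues, accuracy alone can look high for a classifier that rarely predicts an interface, which is one reason to report AUC alongside it.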


Positive and Unlabeled Learning for Prioritization


AH Wagner, KR Taylor, AP Deluca, RF Mullins, TL Casavant, TE Scheetz, EM Stone, TA Braun

Center for Bioinformatics and Computational Biology

Institute for Vision Research

University of Iowa

Purpose: Diseases of the retina are complex disorders caused by numerous genetic factors. Identifying the genetic factors that contribute to a patient's disease phenotype can lead to a greater understanding of disease progression, heritability, and potential treatments. Although second-generation sequencing can identify potential disease-causing variants in the exome, the number of variants identified is often large. This study integrates datasets representing a diverse, quantitative, and informative feature set to prioritize variants identified by exome sequencing of patients with retinal degenerative disorders.

Methods: Analyses of ChIP-seq from mouse retina, mRNA-seq from human retina and sixteen other body tissues, gene characteristics such as length and exon count, and microarray data from ten tissues of the eye are used as the basis for our features. Features that best characterize the set of known retinal-disease-associated genes are identified using machine learning techniques and used to calculate, in an unbiased manner, the probability of association between any gene and a retinal degenerative disorder.

Results: We observe a highly significant enrichment for previously characterized disease genes using our prioritization method. Additionally, we prioritized a list of 33 variants identified in a recent exome sequencing study and found that the causative gene, DHDDS, was ranked at the top of the list.

Conclusion: These results together demonstrate that this system is able to leverage supplemental, quantitative data to effectively prioritize retinal disease genes in candidate lists. 


Building a genome-wide map of nucleotide-level mutability


JJ Michaelson

Department of Psychiatry and Department of Biomedical Engineering, University of Iowa



Not available.


Bioinformatics opportunities at NSF


J Dickerson

National Science Foundation



Not available.