Research Capabilities
Overview:
The Coordinated Laboratory for Computational
Genomics (CLCG) was founded in 1996 and has been engaged in a variety
of research topics in bioinformatics and computational biology. These
include: gene and mutation discovery, expression analysis and functional
genomics, genetic and physical mapping, genome-scale analyses, and information
management. The Center for Bioinformatics and Computational Biology (CBCB)
was established in 2002 to expand involvement in bioinformatics and
computational biology (BCB) research on the University of Iowa (UI) campus,
and to coalesce the efforts of current investigators. The CLCG
is a charter member lab of the CBCB.
The CLCG has an integrated sequence processing
pipeline that facilitates cDNA-based gene discovery projects. This pipeline
accepts raw chromatogram files, as well as sequence text, as input and
returns a set of high-quality, annotated sequences. The pipeline includes
base-calling, novel and repeat/low-complexity feature identification,
quality assessment/filtering, clustering, and annotation of the individual
sequences. Standard components are used where applicable (phred, RepeatMasker,
BLAST), but several components have been locally developed (ESTprep, UIcluster).
To date, we have processed and submitted data for more than 1,000,000
ESTs across more than ten organisms (Scheetz et al., in press).
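As a rough illustration of the quality assessment/filtering stage described
above, the following Python sketch trims low-quality bases from the ends of
each read and discards reads that end up too short. It is not the CLCG
pipeline, ESTprep, or phred: the Read type, the function names, and the
thresholds (min_q, min_len) are all hypothetical, chosen only to show how
such a stage might be chained.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Read:
    """Hypothetical container for one base-called sequence read."""
    name: str
    bases: str
    quals: List[int]  # one phred-style quality score per base


def trim_low_quality(read: Read, min_q: int = 20) -> Read:
    """Drop bases from both ends until a base with quality >= min_q is met."""
    start, end = 0, len(read.bases)
    while start < end and read.quals[start] < min_q:
        start += 1
    while end > start and read.quals[end - 1] < min_q:
        end -= 1
    return Read(read.name, read.bases[start:end], read.quals[start:end])


def run_quality_stage(reads: Iterable[Read], min_q: int = 20,
                      min_len: int = 100) -> List[Read]:
    """Trim every read, then keep only reads with at least min_len bases."""
    trimmed = (trim_low_quality(r, min_q) for r in reads)
    return [r for r in trimmed if len(r.bases) >= min_len]
```

In a fuller pipeline the reads returned by run_quality_stage would be passed
on to clustering and annotation; those stages are omitted here because their
details are not given in the text.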
The CLCG also has several years of experience
with expression analysis, including micro-array (synthetic oligo-nucleotide,
cDNA, and Affymetrix), SAGE, and EST-based expression platforms. Current
infrastructure for expression analysis includes a centralized data-store
(database – UIMADS, and hierarchical file system) for all expression-related
data (regardless of source). This encompasses a broad spectrum of storage
needs, from the image and chromatogram files which are stored on disk
array and/or DVD archive media, to analyzed expression data and experimental
descriptions which are stored within a relational database. In addition
to gathering and analyzing experimental expression data, we also have
experience in the design of the experiments themselves: controlling and minimizing
sources of variability, designing custom probe sets, and designing oligo-nucleotide
probes. We have utilized the 10-base SAGE tags (NlaI) for the past two
years, and have recently moved to the longer 14-base system utilizing
the RsaI restriction enzyme.
The CLCG supports both genetic and physical mapping
projects. This infrastructure includes
software (RHSCORER and RHMAPPER) and experience for radiation hybrid mapping,
as well as a system for storing and accessing high-throughput
genotyping resources (GenoScape and GenoMap) for genetic mapping.
Several analyses of genome-scale data sets are
currently being pursued within the CLCG. Current projects include identification
of microRNAs and their targets, full-length cDNAs, inter- and intra-species
gene families, transcription factor binding sites, transcription start
sites, alternative splice forms, alternative polyadenylation sites, and
simple and complex repetitive elements. These projects utilize diverse
publicly available and locally developed software.
We have also developed an integrated system for
managing information relating to disease gene mutation identification.
The TrAPSS system (Transcript Annotation Prioritization and Screening
System) manages sets of candidate genes for a wide range of diseases and
syndromes. The novel PAR algorithm (Prioritization of Annotated Regions;
T. Braun, in preparation) is used to identify regions that are most likely
to harbor disease-causing mutations. In addition, semi-automated primer
selection, ordering, and screening results may all be stored within TrAPSS,
greatly speeding the screening process.
Similar database-centric systems have also been
developed to manage clinical and genotyping data for disease linkage experiments,
and to manage the information related to full-length cDNA sequencing.
GenoScape and GenoMap (Scheetz et al., in press) are related components
that combine to facilitate high-throughput genotyping projects. Similarly,
our full-length cDNA sequencing pipeline provides the ability to track
the progress of individual cDNA libraries and clones, and to provide feedback
into the system regarding the quality of the clones selected for full-length
sequencing.
References:
GenoMap
T.E. Scheetz, T.A. Braun, T.L. Casavant, K.J. Munn, E.M. Stone, and V.C.
Sheffield, GenoMap: A Distributed System for Unifying Genotyping and Genetic
Linkage Analysis, Parallel Computing, Vol 24, 1998, pp. 1567-1592.
UIcluster
Pedretti K, Scheetz T, Braun T, Roberts C, Robinson N, Casavant T. A Parallel
Expressed Sequence Tag (EST) Clustering Program. Lecture Notes in Computer
Science, Vol 2127, 2001, pp. 490.
ESTprep
Scheetz TE, Trivedi N, Roberts CA, Kucaba T, Berger B, Robinson NL, Birkett
CL, Gavin AJ, O’Leary B, Braun TA, Bonaldo MF, Robinson JP, Sheffield
VC, Soares MB, Casavant TL. ESTprep: preprocessing cDNA sequence reads.
Bioinformatics. 2003 Jul 22;19(11):1318-24.
Mapping
Scheetz TE, Raymond MR, Nishimura DY, McClain A, Roberts C, Birkett C,
Gardiner J, Zhang J, Butters N, Sun C, Kwitek-Black A, Jacob H, Casavant
TL, Soares MB, Sheffield VC. Generation of a high-density rat EST map.
Genome Res. 2001 Mar;11(3):497-502.
Gene Discovery
Scheetz TE, Laffin JJ, Berger B, Mackerly S, Baumes SA, Brown II B, Chang
S, Coco J, Conklin J, Crouch K, Donohue M, Doonan G, Estes C, Eyestone
M, Fishler K, Gardiner J, Guo L, Johnson B, Keppel C, Kreger R, Lebeck
M, Marcelino R, Miljkovich V, Perdue M, Qui L, Rehmann J, Reiter RS, Rhoads
B, Schaefer K, Smith C, Sunjevaric I, Trout K, Wu N, Birkett CL, Bischof
J, Gackle B, Gavin A, Mokrzycki B, Moressi C, O’Leary B, Pedretti
K, Roberts C, Smith M, Tack D, Trivedi N, Kucaba T, Freeman T, Lin J,
Bonaldo MF, Casavant TL, Sheffield VC, Soares MB. High-throughput gene
discovery in the rat. Genome Research, in press.
Tuggle CK, Green JA, Fitzsimmons C, Woods R, Prather
RS, Malchenko S, Soares BM, Kucaba T, Crouch K, Smith C, Tack D, Robinson
N, O’Leary B, Scheetz T, Casavant T, Pomp D, Edeal BJ, Zhang Y,
Rothschild MF, Garwood K, Beavis W. EST-based gene discovery in pig: virtual
expression patterns and comparative mapping to human. Mamm Genome. 2003
Aug;14(8):565-79.
Dimopoulos G, Casavant TL, Chang S, Scheetz T,
Roberts C, Donohue M, Schultz J, Benes V, Bork P, Ansorge W, Soares MB,
Kafatos FC. Anopheles gambiae pilot gene discovery project: identification
of mosquito innate immunity genes from expressed sequence tags generated
from immune-competent cell lines. Proc Natl Acad Sci U S A. 2000 Jun 6;97(12):6619-24.
Computing Facilities and Space:
The CLCG and CBCB consist of approximately 30 full-time faculty, post-docs,
staff, and students occupying 3,000 sq.ft. housed in the Seamans Center
for the Engineering Arts and Sciences Building on the University of Iowa
main campus. This recently remodeled facility is wired for high-speed
networking (10- and 100-megabit, and gigabit ethernet – hardwired
and wireless), and includes 2 dedicated Linux clusters, more than 100
computing systems (workstations and servers), 137 CPUs, 97 Gigabytes of
RAM, and 4 Terabytes of Disk space. The following computer resources are
available:
1. A dedicated compute server cluster of 18 Linux
systems (36 CPUs) connected with a dedicated, switched, copper Gigabit
Ethernet intranet. (18 Dual AMD MP-2400 (2.2 GHz, 2GB memory, 40GB disk
each)).
2. A second dedicated compute server cluster of 16 Linux systems (32 CPUs)
connected with a dedicated, switched, fiber-optic Gigabit Ethernet intranet.
(12 Dual Pentium III (500 MHz, 1GB memory, 9GB disk each), and 4 Dual
Pentium III (500 MHz, 2GB memory, 9GB disk each)).
3. A dedicated, dual fiber channel, redundant disk storage system (RAID),
1.2 TB usable space.
4. A second dedicated, dual fiber channel, redundant disk storage system,
412GB usable space.
5. A collection of data-server systems. (1) Dual
Xeon (2.4 GHz, 2GB memory, 110GB disk), (1) Dual Pentium III (600 MHz,
1GB memory, 507GB disk), (1) Dual Pentium III (500 MHz, 1GB memory, 81GB
disk), (1) Dual Pentium III (550 MHz, 512MB memory, 80GB disk).
6. In addition, substantial computing infrastructure is currently in
place for development and monitoring of production computing. This includes:
(29) Pentium II/III/Athlon workstations running Linux (350-850 MHz, 128MB-1GB
memory, 4-36GB disk per system); 27 laptop Linux systems; a SPARC-20 database
server with 128MB of memory and 30GB disk, and a large collection of networked
Windows 2000/XP and Macintosh systems.
Space:
Office and laboratory space is available for all Principal Investigators,
co-investigators, post-docs, staff and students. Convenient meeting space
is also available. Offices are equipped with computers and printers, video-conferencing,
and telephone conferencing. PI offices are equipped with video projection
capabilities for small- to medium-sized collaborative meetings, and for
multi-site video conferencing. All computers are connected to a 100Mbit
switched Ethernet backbone, and most of the space is covered by 802.11a
and b standard wireless Ethernet. All key personnel involved in this project
have Linux and/or Mac/PC computers (desktop and portable) connected to
the network. Many of the faculty and staff have high-speed connections
at home as well.