The Eichler Lab computational facilities consist of three main components: a high-performance cluster, network available storage, and dedicated application/database systems. In addition to the Eichler Lab's dedicated systems, the department maintains a shared infrastructure and an IT team.
The Eichler Lab's Linux-based high-performance cluster comprises 110 nodes with an aggregate 1,048 CPU cores. The cluster is particularly well suited to embarrassingly parallel workloads, such as running RepeatMasker on many sequences or running BLAST with many queries against large databases.
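As a hypothetical illustration of how such embarrassingly parallel workloads are typically prepared for the cluster, the sketch below splits a multi-record FASTA file into chunks, one per cluster job; the function name, round-robin strategy, and chunk count are assumptions for illustration, not part of any lab pipeline:

```python
# Hypothetical sketch: partition FASTA records into chunks so that each chunk
# can be handed to a separate cluster job (e.g. a RepeatMasker or BLAST run).
def split_fasta(text, n_chunks):
    """Split FASTA text into up to n_chunks lists of records, round-robin."""
    records = []
    for block in text.split(">"):
        if block.strip():                      # skip the empty leading split
            records.append(">" + block)        # restore the record delimiter
    chunks = [[] for _ in range(min(n_chunks, len(records)))]
    for i, rec in enumerate(records):
        chunks[i % len(chunks)].append(rec)    # distribute records evenly
    return chunks

if __name__ == "__main__":
    fasta = ">seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n"
    for i, chunk in enumerate(split_fasta(fasta, 2)):
        print(f"chunk {i}: {len(chunk)} record(s)")
```

Each chunk would then typically be written to its own file and submitted as one task of a scheduler array job, so that all chunks run concurrently across the cluster's nodes.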
There are 491 terabytes (TB) of usable network available storage. The storage is a mix of EMC SAN-based storage (22 TB), a CORAID storage server (48 TB), three large Sun Microsystems storage servers (131 TB), and three Dell SAS servers (290 TB). To facilitate rapid analysis of data across systems, all storage can be made available to all cluster nodes, application servers, and desktop systems.
The dedicated application and database servers include six servers that use web-server front-ends connected to databases in order to share data with the research community and to provide an application/database development environment. Each of these servers has 8 CPU cores, and together they have a total of 8 TB of high-speed direct-attached disk. The data shared with the research community include various kinds of bioinformatics results mined from a variety of genomes. The lab also has a Sun Fire V40z server with 32 GB of RAM and a PowerEdge R905 server with 128 GB of RAM. These servers are used for memory-intensive computational analysis (e.g., applying phylogenetic algorithms to large numbers of sequence taxa, graph-theoretic analysis of multiple pairwise alignments, or de novo sequence assembly).
The Genome Sciences Department has a fully dedicated IT staff of nine professionals. This group, honored with the UW Distinguished Staff Award, consists of five System Engineers, three System Administrators, and an IT Director. Shared departmental computing resources include 20 terabytes (TB) of online SAN-connected storage, a high-performance cluster with an aggregate 332 CPU cores, and an Aspera server with a dedicated 10 Gb/s Science DMZ/I2 network link for data dissemination and acquisition. The IT group has significant bioinformatics and scientific computing experience; they manage a collection of lab-specific scientific computing systems that include high-performance clusters with 8,000+ total CPU cores, 8 petabytes (PB) of network available storage, and 700+ server systems. All systems are housed in the department's dedicated data center with redundant cooling, UPS/generator-backed power, key-card controlled access, environmental monitoring, seismically braced racks, and gas fire suppression. Departmental and lab-specific systems are backed up to tape; tapes are regularly shipped offsite for third-party vaulted storage. In addition to close monitoring, weekly restore tests of randomly selected file systems ensure backups are functional. Tape backups are accomplished using two Oracle SL3000 tape robots with 40 tape drives and a 12 PB tape slot capacity. Server and storage systems are interconnected with 10 Gigabit Ethernet and 40 Gigabit QDR InfiniBand.
Multiple Alignment Manipulator (MaM)
mrFAST: micro-read Fast Alignment Search Tool
mrsFAST: micro-read substitution-only Fast Alignment Search Tool
drFAST: dibase-read Fast Alignment Search Tool
VariationHunter/CommonLAW: Tool for Structural Variation Detection using Next-Gen Sequencing
NovelSeq: Tool for Novel Sequence Detection using Next-Gen Sequencing
SPLITREAD: Split-read-based INDEL/SV caller for detecting structural variants and indels from genome and exome sequencing data
CoNIFER: Copy Number Inference from Exome Reads