
Eichler Lab

Department of Genome Sciences,
University of Washington

Computational Facilities:


The Eichler Lab computational facilities consist of three main components: a high-performance computing cluster, network-available storage, and special-purpose servers with support infrastructure. In addition to the Eichler Lab's dedicated systems, the department maintains a shared infrastructure and an IT team.


The Eichler computing facility consists of three main components. The first component is a Linux-based high-performance computing (HPC) cluster dedicated to large-scale, parallel applications associated with genome-wide analyses. The Eichler Lab's HPC cluster includes a total of 218 nodes with an aggregate 3,056 CPU cores. This cluster is specifically structured to execute many operations in parallel, which is critical for analyses and for the development of new computational methods involving whole-genome or exome next-generation sequencing (NGS) datasets, including long-read data generated from single-molecule, real-time (SMRT) sequencing platforms. The second component is 7.6 petabytes (PB, usable) of disk-based network-available storage, the majority of which is DDN GS7k based. The third component is special-purpose servers and support infrastructure. The HPC component includes three large-memory Dell systems with 512 GB, 2,048 GB, and 3,072 GB of RAM, which are available for jobs with large memory footprints, such as de novo genome assembly and the detection of structural variation across large combined cohort populations. We also have a Linux system running MySQL databases and an Apache web server designed to publicly share data with the research community. Each is a standalone server dedicated to disseminating structural variation/duplication data for genomes. Database servers are mainly used to assist lab research. All systems are connected to a lab-dedicated core switch that provides 40 GB/sec connectivity between systems.


The department has eight professional IT support staff members dedicated to the design, maintenance, and security of computing systems, including the Eichler computing facility. Shared departmental computing resources include 40 terabytes (TB) of online storage, an HPC cluster with an aggregate 580 CPU cores, and a Globus/SFTP server with a dedicated 10 GB/sec Internet2 “Science DMZ” link for data dissemination and acquisition. Departmental and Eichler Lab systems are backed up to tapes, which are regularly shipped offsite for third-party vaulted storage. In addition to close monitoring, weekly restore tests of randomly selected systems ensure that backups are functional. Tape backups are performed using two Oracle SL3000 tape robots with 36 tape drives and 18 PB of tape slot capacity.


Databases

Software

Segmental Duplication Assembler
SMRT-SV v2
PARASIGHT
Multiple Alignment Manipulator (MaM)
DupMasker
mrFAST: micro-read Fast Alignment Search Tool
mrsFAST: micro-read substitution-only Fast Alignment Search Tool
drFAST: dibase-read Fast Alignment Search Tool
VariationHunter/CommonLAW: Tool for Structural Variation Detection using Next-Gen Sequencing
NovelSeq: Tool for Novel Sequence Detection using Next-Gen Sequencing
SPLITREAD: Split read-based INDEL/SV caller for detecting structural variants and indels from genome and exome sequencing data
CoNIFER: Copy Number Inference from Exome Reads