Computational Facilities
The Eichler lab's computational facilities comprise a dedicated high-performance cluster, network-available storage, and special-purpose servers. In addition to the lab's dedicated systems, the department maintains shared infrastructure and an IT team.
The Eichler computing facility consists of three main components. The first component is a Linux-based high-performance computing (HPC) cluster dedicated to large-scale, parallel applications associated with genome-wide analyses. The cluster includes a total of 60 nodes with an aggregate 1,820 CPU cores and more than 22 terabytes (TB) of memory. In addition to these CPU nodes, three GPU-specific nodes containing 16 A100 and 2 V100 GPUs are used for base calling and machine-learning applications. The cluster is structured to execute many operations in parallel, which is critical for analyzing whole-genome or exome next-generation sequencing datasets and for developing new computational methods that use them, including long-read data generated on SMRT (single-molecule, real-time) sequencing platforms. The HPC component also includes three large-memory Dell systems with 512 GB, 2,048 GB, and 3,072 GB of RAM, which are available for jobs with large memory footprints, such as de novo genome assembly and the detection of structural variation across large, combined cohort populations.
The second component is 4.8 petabytes (PB) of usable, disk-based, network-available storage. The majority of the storage is DDN GS7k based.
The third component is special-purpose servers and support infrastructure. We also have a Linux system running MySQL databases and an Apache web server designed to share data publicly with the research community. Each is a standalone server dedicated to disseminating structural variation and duplication data for genomes. The database servers are used mainly to support lab research. All systems are connected to a lab-dedicated core switch that provides 40 GB/sec connectivity between systems.
The department has eight professional IT support staff members dedicated to the design, maintenance, and security of computing systems, including the Eichler computing facility. Shared departmental computing resources include 40 TB of online storage, an HPC cluster with an aggregate 580 CPU cores, and a Globus/SFTP server with a dedicated 10 GB/sec Internet2 "Science DMZ" link for data dissemination and acquisition. Departmental and Eichler lab systems are backed up to tapes, which are regularly shipped offsite for third-party vaulted storage. In addition to close monitoring, weekly restore tests of randomly selected systems ensure that backups are functional. Tape backups are performed using two Oracle SL3000 tape robots with 36 tape drives and 18 PB of tape slot capacity.
Software
Locityper [Paper: in preparation] *(developed in the laboratory of Tobias Marschall)
SVbyEye [Paper: SVbyEye: A visual tool to characterize structural variation among whole-genome assemblies]
GAVISUNK [Paper: GAVISUNK: Genome assembly validation via inter-SUNK distances in Oxford Nanopore reads]
StainedGlass [Paper: StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps]
PAV: Phased Assembly Variant Caller [Paper: Haplotype-resolved diverse human genomes and integrated analysis of structural variation]
Segmental Duplication Assembler (SDA) [Paper: Long-read sequence and assembly of segmental duplications]
SMRT-SV v2 [Paper: Discovery and genotyping of structural variation from long-read haploid genome sequence data]
CoNIFER: Copy Number Inference from Exome Reads [Paper: Copy number variation detection and genotyping from exome sequence data]
SPLITREAD (Split read-based INDEL/SV caller for detecting structural variants and indels from genome and exome sequencing data) [Paper: Detection of structural variants and indels within exome data]
mrsFAST: micro-read substitution-only Fast Alignment Search Tool [Papers: mrsFAST: a cache-oblivious algorithm for short-read mapping & mrsFAST-Ultra: a compact, SNP-aware mapper for high performance sequencing applications]
mrFAST: micro-read Fast Alignment Search Tool (GitHub) & mrFAST (original) [Paper: Personalized Copy-Number and Segmental Duplication Maps using Next-Generation Sequencing]
VariationHunter & new replacement code TARDIS [Papers: Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes & Next-generation VariationHunter: combinatorial algorithms for transposon insertion discovery]
DupMasker [Paper: DupMasker: A tool for annotating primate segmental duplications]
PARASIGHT [Paper: unpublished]