# Description
Assembly errors detected with NucFlag

# Methods
NucFlag is a tool that detects regions of the assembly that are potentially misassembled based on read alignments.

PacBio HiFi reads are first aligned to the genome assembly via pbmm2 v1.14.99 with the following command:

`pbmm2 align --log-level DEBUG --preset SUBREAD --min-length 5000`

Then, NucFlag:
1. Generates a read pileup of the first and second most common base frequencies per position in the genome.
2. Calculates the heterozygous base ratio.
3. Calls peaks in both the first and second signals.
4. Filters peaks with the most abnormal read alignment coverage based on given thresholds.
5. Overlaps peaks to determine misassembly classifications.

# Misassemblies
MISJOIN (orange)
* Drop in first most common base coverage or a coverage gap.
* This region has minimal-to-no reads supporting it or is a scaffold.
* Can overlap a region with secondary base coverage.

COLLAPSE (green)
* Collapse with no variants.

COLLAPSE_VAR (blue)
* Collapse with variants. Overlaps a region with high secondary base coverage.

COLLAPSE_OTHER (red)
* Region with a high het ratio.

HET (teal)
* Possible heterozygous (het) site.
* Determined by the het ratio: the coverage of the second most common base divided by the sum of the first and second most common base coverages.

# Credits
Keisuke K. Oshima, Glennis A. Logsdon
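
# Example: het ratio
The het ratio described under HET can be illustrated with a minimal Python sketch. This is not NucFlag's own code: the function name `het_ratio`, the convention of returning 0.0 at positions with no coverage, and the sample coverage values are all assumptions made for illustration.

```python
def het_ratio(first_cov: int, second_cov: int) -> float:
    """Second most common base coverage divided by the sum of the
    first and second most common base coverages.

    Returning 0.0 for uncovered positions is an assumption of this
    sketch, not documented NucFlag behavior.
    """
    total = first_cov + second_cov
    if total == 0:
        return 0.0
    return second_cov / total


# Hypothetical per-position coverages: (first, second most common base).
positions = [(30, 0), (20, 10), (15, 15), (0, 0)]
ratios = [het_ratio(first, second) for first, second in positions]

for (first, second), ratio in zip(positions, ratios):
    print(f"first={first} second={second} het_ratio={ratio:.3f}")
```

Because the second most common base can never have more coverage than the first, the ratio ranges from 0 (no secondary signal) to 0.5 (both bases equally supported).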