# Description

Non-B DNA motif annotation with gfa.

# Methods

Each updated assembly file was annotated with gfa (https://github.com/abcsFrederick/non-B_gfa) using the following command: `gfa -seq assembly.fa -out assembly_prefix -skipGQ`.

The output tsv files were converted to bed (subtracting one from the start coordinate to give the 0-based start that bed uses) with the command: `awk -v OFS="\t" '(NR>1){s=$4-1; print $1,s,$5}' assembly_prefix_$type.tsv |sort -k1,1 -k2,2n >assembly_prefix_$type.bed`, where `$type` is one of the non-B motif types "APR" (A-phased repeats), "DR" (direct repeats), "IR" (inverted repeats), "MR" (mirror repeats), "STR" (short tandem repeats) or "Z" (Z-DNA). Triplex motifs (mirror repeats that are annotated as likely to form triplex DNA) were extracted using the command: `awk -v OFS="\t" '($12==1){s=$4-1; print $1,s,$5}' assembly_prefix_MR.tsv |sort -k1,1 -k2,2n >assembly_prefix_TRI.bed`.

Bed files were converted to bigbed files with the command `bedToBigBed assembly_prefix_$type.bed assembly.sizes assembly_prefix.$type.bb`, where assembly.sizes is a two-column file listing every chromosome and its size.

Note that strand information is not given. All motifs except APR are symmetric and exist on both strands. For APR, gfa searches for both A-tracts and T-tracts on the forward strand, so motifs on the reverse strand are also found.
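To make the coordinate shift in the tsv-to-bed conversion concrete, here is a minimal sketch that wraps the awk and sort steps in a shell function and runs them on a toy file. The function name `tsv_to_bed`, the toy file, and its header names are hypothetical and only for illustration; the sketch assumes nothing about the gfa output beyond what the command above uses (a header line, sequence name in column 1, start in column 4, end in column 5).

```shell
#!/bin/sh
# Convert a gfa-style tsv into a sorted 3-column bed.
# Skips the header line, subtracts 1 from the start (column 4),
# keeps the end (column 5), then sorts by sequence name and start.
tsv_to_bed() {
    awk -v OFS="\t" '(NR>1){s=$4-1; print $1,s,$5}' "$1" | sort -k1,1 -k2,2n
}

# Hypothetical toy input: header names and columns 2-3 are placeholders.
tmpdir=$(mktemp -d)
{
    printf 'name\tc2\tc3\tstart\tend\n'
    printf 'chr2\tx\tDR\t11\t20\n'
    printf 'chr1\tx\tDR\t101\t130\n'
    printf 'chr1\tx\tDR\t6\t15\n'
} > "$tmpdir/toy_DR.tsv"

tsv_to_bed "$tmpdir/toy_DR.tsv" > "$tmpdir/toy_DR.bed"
cat "$tmpdir/toy_DR.bed"
```

With real data, the same function could be applied once per motif type, for example `for type in APR DR IR MR STR Z; do tsv_to_bed assembly_prefix_$type.tsv > assembly_prefix_$type.bed; done`.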

# Credits

Linnéa Smeds <lbs5874@psu.edu>, Kateryna D. Makova <kdm16@psu.edu>