###METADATA for "Whole-genome alignments with grasses" tracks:
"Syntenome" refers to genomic regions in maize B73 version 5 that are calculated to be syntenic with genomic regions in other plants. These tracks represent syntenic alignment coordinates between maize and the indicated plant. They include regions that extend beyond, or are not represented by, annotated gene models. The purpose of these tracks is to show which maize regions are syntenic with other plants, either to highlight gene model annotations that are likely to be conserved or to highlight regions where gene model annotations might be missing.
#Genome alignment of B73v5 vs non-Zea plants:
The syntenomes between maize and non-Zea plants were generated using the program AnchorWave https://github.com/baoxingsong/AnchorWave with the command
'''
anchorwave proali -i Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3 -as B73v5_cds.fa -r Zm-B73-REFERENCE-NAM-5.0.fa -a _cds.sam -ar B73v5_ref.sam -s .fasta -n v5_.anchors -R 1 -Q 2 -t 10 -o v5__alignment.maf -f v5__alignment.f.maf > v5__alignment.log
'''
where the name of the target genome (Sorghum, rice, etc.) completes each query-specific file name.
#Post-processing:
Maize reference and plant query genomes in the AnchorWave maf output files were swapped using the script anchorwave-maf-swap.py, split by plant query chromosome using UCSC's mafSplit, swapped back so that maize was again the reference, then converted to gvcf files using the MAFToGVCFPlugin from the PHG pipeline https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/CreatePHG_step2_MAFToGVCFPluginDetails.md
'''
singularity exec phg_latest.sif /tassel-5-standalone/run_pipeline.pl -Xmx300g -MAFToGVCFPlugin -referenceFasta Zm-B73-REFERENCE-NAM-5.0.fa -mafFile -sampleName -gvcfOutput -fillGaps true > 2>&1
'''
Finally, the gvcf outputs were converted to bed files and deletions were subtracted from the aligned regions using bedtools https://bedtools.readthedocs.io/en/latest/ with the command
'''
# Pass 1: collect deletions (variant records whose REF allele spans more than one base)
for sample in v5_${1}_*.gvcf.gz
do
zcat ${sample} | grep -v "#" | tr ';' '\t' | grep "Strand" |
awk -v OFS="\t" '{if($5 ~/,/)print "chr"$1,$2,length($4),"variant";else if ($5 !~/,/)print "chr"$1,$2,$12,"align"}' |
sed -e 's/END=//g' |
awk -v OFS="\t" '{if($4 ~/variant/)print $1,$2,$2+$3-1,$4;else print}' |
awk -v OFS="\t" '{if($4 ~/variant/ && $3 - $2 > 0)print $1,$2,$3,"del"}' |
uniq >> v5-${1}_syntenome_whole-genome_del.bed
done
sort -k1,1 -k2,2n v5-${1}_syntenome_whole-genome_del.bed > v5-${1}_syntenome_whole-genome_del_sorted.bed
# Pass 2: collect all aligned intervals and merge those within 1 bp of each other
for sample in v5_${1}_*.gvcf.gz
do
zcat ${sample} | grep -v "#" | tr ';' '\t' | grep "Strand" |
awk -v OFS="\t" '{if($5 ~/,/)print "chr"$1,$2,length($4),"variant";else if ($5 !~/,/)print "chr"$1,$2,$12,"align"}' |
sed -e 's/END=//g' |
awk -v OFS="\t" '{if($4 ~/variant/)print $1,$2,$2+$3-1,$4;else print}' |
awk -v OFS="\t" '{if($4 ~/variant/ && $3 - $2 > 0)print $1,$2,$3,"del"; else print $1,$2,$3,"other"}' |
bedtools merge -d 1 >> v5-${1}_temp
done
sort -k1,1 -k2,2n v5-${1}_temp > v5-${1}_syntenome_whole-genome_sorted_ndrm.bed
# Subtract the deletions from the merged aligned intervals
bedtools subtract -a v5-${1}_syntenome_whole-genome_sorted_ndrm.bed -b v5-${1}_syntenome_whole-genome_del_sorted.bed > v5-${1}_syntenome_whole-genome_sorted_minus-del.bed
'''
and all bed files for a given maize/query-plant analysis were concatenated and sorted into a single, whole-genome bed file.
##For B73v5 vs Zea, the program AnchorWave genoAli was used for alignment
'''
anchorwave genoAli -i Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3 -as B73v5_cds.fa -r Zm-B73-REFERENCE-NAM-5.0.fa -a _cds.sam -ar B73v5_ref.sam -s .fa -n v5__alignment.anchors -IV true -o v5__alignment.maf -t 10 -f v5__alignment.f.maf > v5__alignment.log
'''
The maf file was converted to gvcf using the MAFToGVCFPlugin as described, then the final result was converted to bed files as described.
_________
###METADATA for "Lifted gene model annotations from grasses" tracks:
#For all grasses except Gigi and A. virginicus, gene model annotations from each genome were lifted to B73v5 using the program liftoff https://github.com/agshumate/Liftoff, run on each B73v5 chromosome fasta file separately. The lifted gene model annotations from each liftoff output were concatenated and sorted, and only those annotations that overlapped the syntenome bed files were selected for the final output. For Gigi and A. virginicus, gene model annotations from each genome were lifted to B73v5 using liftoff in a single run. Only those lifted annotations that overlapped the syntenome bed files were selected for the final output.
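The text above does not say which tool performed the overlap selection. As an illustration only, the following is a minimal sketch of one way to keep lifted annotations that overlap a syntenome bed file, using plain awk. The function name filter_by_syntenome and the file names in the usage line are hypothetical. The sketch assumes standard BED coordinates (0-based start, half-open end) and GFF3 coordinates (1-based, inclusive); bed files derived directly from gvcf positions, as above, may need the start conversion adjusted.

```shell
# Hypothetical helper (not from the original pipeline): print the GFF3
# features that overlap at least one interval in a BED file.
# Assumes BED is 0-based half-open and GFF3 is 1-based inclusive.
# Usage: filter_by_syntenome syntenome.bed lifted.gff3 > lifted_syntenic.gff3
filter_by_syntenome() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR {                        # first file: load the BED intervals
      n[$1]++
      s[$1, n[$1]] = $2 + 1            # convert start to 1-based inclusive
      e[$1, n[$1]] = $3 + 0
      next
    }
    /^#/ { next }                      # skip GFF3 header and comment lines
    {
      # print the feature once if it overlaps any interval on its chromosome
      for (i = 1; i <= n[$1]; i++)
        if (($4 + 0) <= e[$1, i] && ($5 + 0) >= s[$1, i]) { print; break }
    }
  ' "$1" "$2"
}
```

This linear scan over intervals is simple rather than fast. For whole-genome files, an interval tool such as bedtools intersect with the -u option (report each -a entry once if it has any overlap in -b) would be a common alternative.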