Output#
MirMachine writes intermediate files under data/ and analyses/ and
final run outputs under results/predictions/.
Main Output Layout#
For --species ExampleSpecies, the key files are:
data/yamls/ExampleSpecies.yaml
results/predictions/gff/ExampleSpecies.PRE.gff
results/predictions/filtered_gff/ExampleSpecies.PRE.gff
results/predictions/fasta/ExampleSpecies.PRE.fasta
results/predictions/heatmap/ExampleSpecies.heatmap.csv
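The layout above can be turned into a small path helper. This is a minimal sketch, not part of MirMachine: the function name `expected_outputs` and the `root` parameter are hypothetical, and it assumes the run was started from `root` so that the relative paths listed above apply.

```python
from pathlib import Path


def expected_outputs(species: str, root: str = ".") -> dict[str, Path]:
    """Return the key output paths for one species (hypothetical helper).

    Assumes the directory layout documented above, relative to `root`.
    """
    base = Path(root)
    pred = base / "results" / "predictions"
    return {
        "yaml": base / "data" / "yamls" / f"{species}.yaml",
        "gff": pred / "gff" / f"{species}.PRE.gff",
        "filtered_gff": pred / "filtered_gff" / f"{species}.PRE.gff",
        "fasta": pred / "fasta" / f"{species}.PRE.fasta",
        "heatmap": pred / "heatmap" / f"{species}.heatmap.csv",
    }
```

Checking `path.exists()` on each returned value is one way to confirm that a run finished writing all of its outputs.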
Interpretation#
results/predictions/gff/*.PRE.gff: Unfiltered predictions after overlap resolution.
results/predictions/filtered_gff/*.PRE.gff: High-confidence subset after family-specific trusted cutoff filtering.
results/predictions/fasta/*.PRE.fasta: FASTA entries for predicted loci, including confidence label and seed annotations in the header.
results/predictions/heatmap/*.heatmap.csv: Summary table for downstream plotting with per-family total and filtered hit counts.
GFF Header Fields#
Header:
##gff-version 3
# MirMachine version: (MirMachine version used)
# CM Models: Built using MirGeneDB 3.0 (database version)
# Total families searched: (Total families searched for this run)
# Node: (Node name given)
# Model: (selected model)
# Genome file: (genome file location)
# Species: (species name given)
# Params: (Command line parameters)
# microRNA families searched: (searched families)
# Expected microRNA family losses: (microRNA family losses for this Node)
# microRNA score: (percent of score families with >=1 hit)
# microRNA seed score: (percent of score families with >=1 hit that has seed != None)
# microRNA hiconf seed score: (percent of score families with >=1 hit that has at least one * seed)
The filtered and unfiltered files each contain all three scores in the header.
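The header fields can be collected into a dictionary for downstream use. The following is a minimal sketch, assuming each metadata line has the form `# Key: value` as shown above; `parse_gff_header` is a hypothetical helper, not a MirMachine function, and values are kept as raw strings because the exact formatting of the scores is not specified here.

```python
def parse_gff_header(lines):
    """Collect '# Key: value' header lines into a dict (hypothetical helper).

    Stops at the first non-comment line; '##' directives such as
    '##gff-version 3' are skipped.
    """
    meta = {}
    for line in lines:
        if not line.startswith("#"):
            break  # first prediction row reached
        if line.startswith("##"):
            continue  # GFF directive, not a key/value field
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line[1:].partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta
```

Because any iterable of lines is accepted, an open file handle can be passed directly, for example `parse_gff_header(open(path))`.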
Example GFF Row#
An example prediction line:
chrI MirMachine microRNA 9379947 9380005 57.4 - . gene_id=Mir-71.PRE;E-value=3e-10;sequence_with_30nt=TCACACACAGAGGTTGTCTGCTCTGAACGATGAAAGACATGGGTAGTGAGACGTCGGAGCCTCGTCGTATCACTATTCTGTTTTTCGCCGTCGGGATCGTGACCTGGAAGCTGTAAACT
Field meanings:
chrI: chromosome/contig name
MirMachine: annotation source
microRNA: feature type
9379947 and 9380005: start/end
57.4: CM bitscore
-: strand
gene_id: miRNA family ID
E-value: hit E-value
sequence_with_30nt: hit sequence with 30 nt flanks
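A prediction row can be split into these fields with a few lines of Python. This is a sketch under the assumption that the files are tab-delimited with nine columns, as the GFF3 format specifies (the example above is shown with spaces); `Prediction` and `parse_gff_row` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    seqid: str
    source: str
    feature: str
    start: int
    end: int
    bitscore: float
    strand: str
    attributes: dict


def parse_gff_row(line: str) -> Prediction:
    """Parse one tab-delimited prediction row (assumes 9 GFF3 columns)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    # Column 9 holds key=value pairs separated by ';'.
    attrs = {}
    for item in cols[8].split(";"):
        key, sep, value = item.partition("=")
        if sep:
            attrs[key] = value
    return Prediction(
        cols[0], cols[1], cols[2],
        int(cols[3]), int(cols[4]), float(cols[5]),
        cols[6], attrs,
    )
```

Attribute values are left as strings, so a caller that needs a numeric E-value would convert it explicitly, for example `float(row.attributes["E-value"])`.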
FASTA Files#
Example FASTA of a prediction:
>Mir-9.PRE_chrI_9332963_9333028_(+)_56.8_HIGHconf_p5_seed(CTTTGGT)_p3_seed(AAAGCTA,TAAAGCT)
TCTTTGGTGATTCAGCTTCAATGATTGGCTACAGGTTTCTTTCATAAAGCTAGGTTACCAAAGCTC
Header pattern:
>family_chr_start_end_(strand)_bitscore_confidence_seed_annotations
Confidence values are HIGHconf or LOWconf.
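The header pattern can be unpacked with a regular expression. This is a minimal sketch based only on the example header above: `parse_fasta_header` is a hypothetical helper, the family name is assumed to contain no underscore, and seed annotations are assumed to follow the `p5_seed(...)` / `p3_seed(...)` form with comma-separated entries. Contig names that contain underscores are handled by anchoring on the `_(strand)_` field.

```python
import re

# Pattern derived from the documented header layout (an assumption,
# not taken from MirMachine's source code).
HEADER_RE = re.compile(
    r"^>(?P<family>[^_]+)_(?P<chrom>.+)_(?P<start>\d+)_(?P<end>\d+)"
    r"_\((?P<strand>[+-])\)_(?P<bitscore>[-\d.]+)"
    r"_(?P<confidence>HIGHconf|LOWconf)(?:_(?P<seeds>.*))?$"
)
SEED_RE = re.compile(r"(p5|p3)_seed\(([^)]*)\)")


def parse_fasta_header(header: str) -> dict:
    """Split a prediction FASTA header into its fields."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognised header: {header!r}")
    fields = match.groupdict()
    fields["start"] = int(fields["start"])
    fields["end"] = int(fields["end"])
    fields["bitscore"] = float(fields["bitscore"])
    # Map each arm (p5/p3) to its list of seed strings, kept verbatim.
    seed_text = fields.pop("seeds") or ""
    fields["seeds"] = {
        arm: seeds.split(",") for arm, seeds in SEED_RE.findall(seed_text)
    }
    return fields
```

Filtering a FASTA file down to high-confidence entries then reduces to checking `fields["confidence"] == "HIGHconf"` for each header.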
Heatmap CSV#
results/predictions/heatmap/<species>.heatmap.csv includes metadata lines
from the GFF header followed by a CSV table with this schema:
species,query_node,family,node,total_hits,filtered_hits,unfiltered_seed,filtered_seed,unfiltered_hiconf_seed,filtered_hiconf_seed
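The table can be loaded with the standard `csv` module once the metadata lines are skipped. This sketch assumes that the metadata lines keep the leading `#` they have in the GFF header and that the first remaining line is the column header shown above; neither detail is confirmed here, and `read_heatmap` is a hypothetical helper.

```python
import csv


def read_heatmap(lines):
    """Return the heatmap table as a list of dicts (hypothetical helper).

    Assumes metadata lines start with '#' and the first remaining
    line is the CSV column header.
    """
    table = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    return list(csv.DictReader(table))
```

All values come back as strings, so count columns such as `total_hits` and `filtered_hits` need an explicit `int(...)` conversion before plotting or arithmetic.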