Abstract
The number of publicly available bacterial genome sequences is staggering (1.1 million assemblies in NCBI alone), and the rate of deposition is only increasing, with Brucella sequences being no exception. This wealth of data contrasts with the lack of phylogenetic methods that robustly place these sequences within an evolutionary context. Phylogenetic placement not only aids taxonomic classification but also informs the evolution of novel phenotypes (host switching), targets of selection (host immune evasion mechanisms), and horizontal gene transfer (AMR/virulence genes). Methods for reconstructing trees include comparison of 16S ribosomal or other single loci, multilocus analysis, whole-genome alignment, and k-mer-based SNP analysis. All of these methods suffer from narrow taxonomic resolution: 16S works well for higher taxonomic divisions, and whole-genome alignments work well for closely related samples. Here I present OrthoPhylo, a phylogenetic pipeline that takes bacterial genomes, annotates them, identifies orthologs, converts protein alignments to nucleotide alignments, and then builds species trees with both concatenated-alignment and gene-tree-to-species-tree methods. The workflow is designed to accept large numbers of input genomes (>1000) by identifying samples that represent the diversity of the whole dataset and using these genomes to build models and identify orthologs. This strategy allows the generation of trees for ~1000 bacterial assemblies in ~30 hours using 30 CPUs, with the majority of this time taken up by ML-based tree generation. The pipeline is designed to be an easy-to-install, turnkey solution for generating high-resolution bacterial trees from species that can differ by more than 30% nucleotide identity. I also present findings on the state of publicly available Brucella assemblies and their phylogenetic placements, with a focus on sequences from East Africa, an understudied region of endemic Brucella infections.
This effort is a part of a recently initiated five-year cross-sectional survey to assess risk factors associated with brucellosis in the East African countries of Tanzania and Rwanda.
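
As an illustration of the protein-to-nucleotide alignment conversion named in the pipeline description, the following is a minimal Python sketch of the general technique (threading each coding sequence onto its gapped protein alignment row, codon by codon). It is not OrthoPhylo's actual implementation: the function name `protein_to_codon_alignment`, the toy sequences, and the assumption that every protein has a matching in-frame coding sequence are all hypothetical.

```python
def protein_to_codon_alignment(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto a gapped protein alignment row.

    Each aligned residue consumes one codon (3 nt) from the CDS;
    each gap in the protein row becomes a 3-nt gap in the output.
    """
    n_residues = sum(1 for ch in aligned_protein if ch != gap)
    # Assumption: the CDS is in frame and may carry one trailing stop codon
    # that is absent from the protein sequence.
    if len(cds) not in (3 * n_residues, 3 * n_residues + 3):
        raise ValueError("CDS length does not match the number of residues")

    codons = []
    pos = 0
    for ch in aligned_protein:
        if ch == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Hypothetical toy data: two aligned protein rows and their coding sequences.
protein_rows = {"genomeA": "MKV", "genomeB": "M-V"}
cds_by_genome = {"genomeA": "ATGAAAGTT", "genomeB": "ATGGTC"}

codon_rows = {
    name: protein_to_codon_alignment(row, cds_by_genome[name])
    for name, row in protein_rows.items()
}
for name, row in codon_rows.items():
    print(name, row)
```

Aligning at the protein level and then restoring codons keeps the alignment anchored on the more conserved amino acid sequence while retaining nucleotide-level signal, which is one plausible reason a pipeline of this kind would take this route for divergent genomes.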