INNUENDO Whole Genome and Core Genome MLST Schemas and Datasets for Yersinia enterocolitica

Rossi, Mirko; Silva, Mickael Santos Da; Ribeiro-Gonçalves, Bruno Filipe; Silva, Diogo Nuno; Machado, Miguel Paulo; Oleastro, Mónica; Borges, Vítor; Isidro, Joana; Viera, Luis; Halkilahti, Jani; Jaakkonen, Anniina; Laukkanen-Ninios, Riikka; Fredriksson-Ahomaa, Maria; Salmenlinna, Saara; Hakkinen, Marjaana; Garaizar, Javier; Bikandi, Joseba; Hilbert, Friederike; Carriço, João André

doi:10.5281/ZENODO.1323671

Rossi, Mirko ¹
Silva, Mickael Santos Da ²
Ribeiro-Gonçalves, Bruno Filipe ²
Silva, Diogo Nuno ²
Machado, Miguel Paulo ²
Oleastro, Mónica ³
Borges, Vítor ³
Isidro, Joana ³
Viera, Luis ³
Halkilahti, Jani ⁴
Jaakkonen, Anniina ⁵
Laukkanen-Ninios, Riikka ¹
Fredriksson-Ahomaa, Maria ¹
Salmenlinna, Saara ⁴
Hakkinen, Marjaana ⁵
Garaizar, Javier ⁶
Bikandi, Joseba ⁶
Hilbert, Friederike ⁷
Carriço, João André ²

1 University of Helsinki, Helsinki, Finland (ROR https://ror.org/040af2s02)
2 Universidade de Lisboa, Lisboa, Portugal (ROR https://ror.org/01c27hj86)
3 Instituto Nacional de Saúde Dr. Ricardo Jorge, Lisboa, Portugal (ROR https://ror.org/03mx8d427)
4 Terveyden ja hyvinvoinnin laitos
5 Elintarviketurvallisuusvirasto
6 Universidad del País Vasco/Euskal Herriko Unibertsitatea, Lejona, Spain (ROR https://ror.org/000xsnr85)
7 University of Veterinary Medicine Vienna, Vienna, Austria (ROR https://ror.org/01w6qp003)


Publisher: Zenodo

Publication year: 2018

Type: Dataset

DOI: 10.5281/ZENODO.1323671 (open access)

Abstract

Dataset

All the raw reads deposited in the European Nucleotide Archive (ENA) or in the NCBI Sequence Read Archive (SRA) as Y. enterocolitica at the time of the analysis (August 2018) were retrieved using getSeqENA. A total of 252 genomes were successfully assembled using INNUca v3.1. In addition to the publicly available genomes, the database includes 79 novel Y. enterocolitica strains that belong to the INNUENDO Sequence Dataset (PRJEB27020).

File 'Metadata/Yenterocolitica_metadata.txt' contains metadata for each strain, including country and year of isolation, source classification, taxon of the host, serotype, biotype, pathotype (according to the patho_typing software) and the classical pubMLST 7-gene ST according to Hall et al., 2005.

The directory 'Genomes' contains all 331 INNUca v3.1 assemblies of the strains listed in 'Metadata/Yenterocolitica_metadata.txt'.

Schema creation and validation

All 331 genomes were used to create the schema with the chewBBACA suite. The quality of the loci was assessed using chewBBACA Schema Evaluation. Loci with single alleles, loci with high length variability (i.e. if more than 1 allele is outside the mode +/- 0.05 size) and loci present in less than 1% of the genomes were removed. The wgMLST schema was further curated by excluding all loci detected as “Repeated Loci” and loci annotated as “non-informative paralogous hit (NIPH/ NIPHEM)” or “Allele Larger/ Smaller than length mode (ALM/ ASM)” by the chewBBACA Allele Calling in more than 1% of a dataset.

File 'Schema/Yenterocolitica_wgMLST_ 6344_schema.tar.gz' contains the wgMLST schema formatted for chewBBACA and includes a total of 6,344 loci.

File 'Schema/Yenterocolitica_cgMLST_ 2406_listGenes.txt' contains the list of genes from the wgMLST schema that defines the cgMLST schema. The cgMLST schema consists of 2,406 loci, defined as the loci present in at least 99% of the 331 Y. enterocolitica genomes. Genomes have no more than 2% of missing loci.

File 'Allele_Profles/Yenterocolitica_wgMLST_alleleProfiles.tsv' contains the wgMLST allelic profile of the 331 Y. enterocolitica genomes of the dataset. Please note that missing loci follow the annotation of the chewBBACA Allele Calling software.

File 'Allele_Profles/Yenterocolitica_cgMLST_alleleProfiles.tsv' contains the cgMLST allelic profile of the 331 Y. enterocolitica genomes of the dataset. Please note that missing loci are indicated with a zero.

Additional citation

The schemas are prepared to be used with chewBBACA. When using the schemas in this repository, please also cite:

Silva M, Machado M, Silva D, Rossi M, Moran-Gilad J, Santos S, Ramirez M, Carriço J. chewBBACA: A complete suite for gene-by-gene schema creation and strain identification. 15/03/2018. M Gen 4(3): doi:10.1099/mgen.0.000166 http://mgen.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000166
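The abstract states that missing loci in the cgMLST allelic profile are indicated with a zero, that genomes have no more than 2% of missing loci, and that cgMLST loci are those present in at least 99% of the genomes. The following is a minimal Python sketch of how such checks could be run on a tab-separated profile table. It is not part of the dataset or of chewBBACA. The table layout (one header row, genome identifier in the first column, one column per locus), the header name `FILE`, the function names and the toy data are all assumptions made for illustration.

```python
import csv
import io


def read_profiles(handle):
    """Read a tab-separated allelic profile table.

    Assumed layout (not confirmed by the record): one header row, the
    genome identifier in the first column, one column per locus.
    Returns (list of loci, {genome: {locus: allele string}}).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    loci = header[1:]
    profiles = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        profiles[row[0]] = dict(zip(loci, row[1:]))
    return loci, profiles


def missing_fraction(profile):
    """Fraction of loci recorded as '0' (missing) in one genome."""
    missing = sum(1 for allele in profile.values() if allele == "0")
    return missing / len(profile)


def core_loci(loci, profiles, threshold=0.99):
    """Loci with a non-zero call in at least `threshold` of the genomes."""
    n_genomes = len(profiles)
    kept = []
    for locus in loci:
        present = sum(1 for p in profiles.values() if p[locus] != "0")
        if present / n_genomes >= threshold:
            kept.append(locus)
    return kept


# Toy table with made-up genomes and loci, only to exercise the functions.
toy = (
    "FILE\tlocus1\tlocus2\tlocus3\n"
    "genomeA\t1\t2\t0\n"
    "genomeB\t1\t0\t0\n"
    "genomeC\t3\t2\t5\n"
    "genomeD\t1\t2\t4\n"
)
loci, profiles = read_profiles(io.StringIO(toy))
fractions = {g: missing_fraction(p) for g, p in profiles.items()}
core_75 = core_loci(loci, profiles, threshold=0.75)
```

To apply the same functions to the real file, one would pass an open handle for 'Allele_Profles/Yenterocolitica_cgMLST_alleleProfiles.tsv' to `read_profiles`, after confirming that its layout matches the assumptions above. The wgMLST profile uses chewBBACA annotations rather than zeros for missing loci, so this sketch does not apply to it unchanged.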
