INNUENDO Whole Genome and Core Genome MLST Schemas and Datasets for Campylobacter jejuni

  1. Rossi, Mirko 1
  2. Silva, Mickael Santos Da 2
  3. Ribeiro-Gonçalves, Bruno Filipe 2
  4. Silva, Diogo Nuno 2
  5. Machado, Miguel Paulo 2
  6. Oleastro, Mónica 3
  7. Borges, Vítor 3
  8. Isidro, Joana 3
  9. Viera, Luis 3
  10. Barker, Dillon OR 4
  11. Llarena, Ann-Katrin 1
  12. Halkilahti, Jani 5
  13. Jaakkonen, Anniina 6
  14. Kivistö, Rauni 1
  15. Kovanen, Sara 7
  16. Nieminen, Timo 8
  17. Hänninen, Marja-Liisa 1
  18. Salmenlinna, Saara 5
  19. Hakkinen, Marjaana 6
  20. Garaizar, Javier 9
  21. Bikandi, Joseba 9
  22. Hilbert, Friederike 10
  23. Taboada, Eduardo N 4
  24. Carriço, João André 2
  1. 1 University of Helsinki
    Helsinki, Finland

    ROR https://ror.org/040af2s02

  2. 2 Universidade de Lisboa
    Lisbon, Portugal

    ROR https://ror.org/01c27hj86

  3. 3 Instituto Nacional de Saúde Dr. Ricardo Jorge
    Lisbon, Portugal

    ROR https://ror.org/03mx8d427

  4. 4 Public Health Agency of Canada
  5. 5 Terveyden ja hyvinvoinnin laitos (Finnish Institute for Health and Welfare)
  6. 6 Elintarviketurvallisuusvirasto (Finnish Food Safety Authority)
  7. 7 University of Helsinki
  8. 8 Helsingin yliopisto Ruralia-instituutti (University of Helsinki, Ruralia Institute)
  9. 9 Universidad del País Vasco/Euskal Herriko Unibertsitatea
    Lejona, Spain

    ROR https://ror.org/000xsnr85

  10. 10 University of Veterinary Medicine Vienna
    Vienna, Austria

    ROR https://ror.org/01w6qp003

Publisher: Zenodo

Year of publication: 2018

Type: Dataset

CC BY 4.0

Abstract

<strong>Dataset</strong> Raw reads deposited in the European Nucleotide Archive (ENA) or in the NCBI Sequence Read Archive (SRA) as <em>C. jejuni</em> were retrieved in April 2017. In total, 5,691 genomes that passed the INNUca v3.1 pipeline were selected. Additionally, 566 sets of raw reads previously published in Kovanen et al., 2016, Llarena et al., 2016, Kovanen et al., 2014, Kovanen et al., 2014 and Garcia-Sanchez et al., 2017 were included. The database also includes 269 <em>C. jejuni</em> genomes belonging to the INNUENDO Sequence Dataset (PRJEB27020). All genomes were assembled using the INNUca v3.1 pipeline and passed its quality control (QC). The file 'Metadata/Cjejuni_metadata.txt' contains metadata for each strain, including country and year of isolation, source classification, host taxon, and the classical seven-gene pubMLST sequence type (ST) and clonal complex (CC). The directory 'Genomes' contains all 6,526 INNUca v3.1 assemblies (5,691 + 566 + 269) of the strains listed in 'Metadata/Cjejuni_metadata.txt'.

<strong>Schema creation and validation</strong> Draft genome assemblies were annotated using Prokka, and the initial pangenome was defined using Roary. The <em>chewBBACA CreateSchema.py</em> script was used to create a whole-genome schema from the Roary pangenome. The schema initially comprised 5,447 loci and was populated with alleles from the 6,526 <em>C. jejuni</em> genomes. Locus quality was assessed using <em>chewBBACA Schema Evaluation</em>. Loci with a single allele and loci with high length variability (i.e. more than one allele with a length outside the mode ± 5%) were removed. The wgMLST schema was then further curated by excluding all loci detected as “Repeated Loci”, as well as loci classified as “non-informative paralogous hit (NIPH/NIPHEM)” or “Allele Larger/Smaller than length mode (ALM/ASM)” by the <em>chewBBACA Allele Calling</em> engine in more than 1% of the genomes in the <em>C. jejuni</em> dataset.
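The two filtering rules above (length variability around the modal allele length, and the 1% limit on problematic allele calls) can be illustrated with a short Python sketch. It is not chewBBACA's implementation: the helper names (`modal_length`, `passes_length_filter`, `passes_call_filter`), the input shapes (a list of allele lengths per locus, and a count of genomes in which a locus was classified NIPH/NIPHEM/ALM/ASM), and the reading of "mode ± 0.05" as ±5% of the modal length are all assumptions made for illustration.

```python
from collections import Counter


def modal_length(lengths: list[int]) -> int:
    """Return the most frequent allele length (ties go to the shorter length)."""
    counts = Counter(lengths)
    best = max(counts.values())
    return min(length for length, count in counts.items() if count == best)


def passes_length_filter(lengths: list[int], tolerance: float = 0.05) -> bool:
    """Hypothetical rule: keep a locus only if it has more than one allele and
    at most one allele falls outside the window mode * (1 +/- tolerance)."""
    if len(lengths) < 2:
        return False  # single-allele loci are removed
    mode = modal_length(lengths)
    low, high = mode * (1 - tolerance), mode * (1 + tolerance)
    outliers = sum(1 for n in lengths if n < low or n > high)
    return outliers <= 1


def passes_call_filter(flagged: int, total: int, max_fraction: float = 0.01) -> bool:
    """Hypothetical rule: keep a locus unless it is flagged (NIPH/NIPHEM/ALM/ASM)
    in more than max_fraction of the genomes."""
    return flagged / total <= max_fraction


# Hypothetical allele lengths (bp) for three loci.
loci = {
    "locusA": [300, 300, 303, 297],  # every allele within 285-315 bp
    "locusB": [300, 300, 240, 360],  # two alleles outside the window
    "locusC": [450],                 # only one allele
}
kept = [name for name, lengths in loci.items() if passes_length_filter(lengths)]
print(kept)  # ['locusA']
print(passes_call_filter(65, 6526), passes_call_filter(66, 6526))  # True False
```

With 6,526 genomes, the 1% rule as sketched tolerates a locus flagged in up to 65 genomes (65/6,526 ≈ 0.996%), whereas 66 flagged genomes (≈ 1.01%) exceed it.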
The file 'Schema/Cjejuni_wgMLST_2795_schema.tar.gz' contains the wgMLST schema formatted for chewBBACA, comprising a total of 2,795 loci. The file 'Schema/Cjejuni_cgMLST_678_listGenes.txt' lists the wgMLST loci that define the cgMLST schema. The cgMLST schema consists of 678 loci, defined as the loci present in at least 99.9% of the 6,526 <em>C. jejuni</em> genomes. No genome has more than 2% missing loci. The file 'Allele_Profles/Cjejuni_wgMLST_alleleProfiles.tsv' contains the wgMLST allelic profiles of the 6,526 <em>C. jejuni</em> genomes in the dataset; missing loci are labelled with the classification codes of the chewBBACA Allele Calling software. The file 'Allele_Profles/Cjejuni_cgMLST_alleleProfiles.tsv' contains the cgMLST allelic profiles of the 6,526 <em>C. jejuni</em> genomes; here, missing loci are indicated with a zero.

<strong>Additional citations</strong> The schemas are prepared for use with <strong>chewBBACA</strong>. When using the schemas in this repository, please also cite: Silva M, Machado M, Silva D, Rossi M, Moran-Gilad J, Santos S, Ramirez M, Carriço J. chewBBACA: A complete suite for gene-by-gene schema creation and strain identification. Microb Genom. 2018;4(3). Published 15/03/2018. doi:10.1099/mgen.0.000166 http://mgen.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000166
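The cgMLST definition in this record (loci called in at least 99.9% of genomes, with missing calls written as zero in the cgMLST profile) can be sketched as a small Python function over a wgMLST profile table. This is a hypothetical helper, not part of chewBBACA: `normalize_call` and `derive_cgmlst` are illustrative names, and the sketch assumes a tab-separated table whose first column holds the genome identifier (the `FILE` header label is illustrative), treats a cell as a called allele only if it is a positive integer, optionally carrying chewBBACA's `INF-` prefix for newly inferred alleles, and treats every other code (for example LNF or NIPH) as missing.

```python
import csv
import io


def normalize_call(cell: str) -> str:
    """Return the allele ID as a string, or '0' if the cell is not a call.
    Assumes newly inferred alleles may carry chewBBACA's 'INF-' prefix."""
    value = cell[4:] if cell.startswith("INF-") else cell
    return value if value.isdecimal() and int(value) > 0 else "0"


def derive_cgmlst(tsv_text: str, threshold: float = 0.999):
    """Return (core_loci, rows): loci called in >= threshold of the genomes,
    and per-genome profiles restricted to those loci with missing calls as '0'."""
    table = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, data = table[0], table[1:]
    loci = header[1:]  # first column assumed to be the genome identifier
    n_genomes = len(data)
    core_idx = [
        j for j in range(len(loci))
        if sum(normalize_call(row[j + 1]) != "0" for row in data) >= threshold * n_genomes
    ]
    core_loci = [loci[j] for j in core_idx]
    rows = [[row[0]] + [normalize_call(row[j + 1]) for j in core_idx] for row in data]
    return core_loci, rows


# Toy wgMLST profile: four genomes, three loci (l3 is missing from g1).
example = (
    "FILE\tl1\tl2\tl3\n"
    "g1\t1\t2\tLNF\n"
    "g2\t1\tINF-5\t3\n"
    "g3\t2\t2\t3\n"
    "g4\t1\t7\t3\n"
)
core, rows = derive_cgmlst(example)
print(core)     # ['l1', 'l2']
print(rows[1])  # ['g2', '1', '5']
```

Applied to 6,526 genomes, a 99.9% threshold requires a locus to be called in at least 6,520 genomes (0.999 × 6,526 = 6,519.474), so a core locus may be missing from at most 6 genomes.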