- Research article
- Open Access
A GWAS on Helicobacter pylori strains points to genetic variants associated with gastric cancer risk
- Elvire Berthenet1,
- Koji Yahara2,
- Kaisa Thorell3,
- Ben Pascoe4,
- Guillaume Meric4,
- Jane M. Mikhail1, 5,
- Lars Engstrand3,
- Helena Enroth6,
- Alain Burette7,
- Francis Megraud8, 9,
- Christine Varon9,
- John C Atherton10,
- Sinead Smith11,
- Thomas S. Wilkinson1,
- Matthew D. Hitchings1,
- Daniel Falush4 and
- Samuel K. Sheppard4
© Sheppard et al. 2018
- Received: 12 June 2018
- Accepted: 19 July 2018
- Published: 2 August 2018
Helicobacter pylori are stomach-dwelling bacteria that are present in about 50% of the global population. Infection is asymptomatic in most cases, but it has been associated with gastritis, gastric ulcers and gastric cancer. Epidemiological evidence shows that progression to cancer depends upon host and pathogen factors, but questions remain about why cancer phenotypes develop in only a minority of infected people. Here, we use comparative genomics approaches to understand how genetic variation amongst bacterial strains influences disease progression.
We performed a genome-wide association study (GWAS) on 173 H. pylori isolates from the European population (hpEurope) with known disease aetiology, including 49 from individuals with gastric cancer. We identified SNPs and genes that differed in frequency between isolates from patients with gastric cancer and those with gastritis. The gastric cancer phenotype was associated with the presence of babA and genes in the cag pathogenicity island, one of the major virulence determinants of H. pylori, as well as non-synonymous variations in several less well-studied genes. We devised a simple risk score based on the risk level of associated elements present, which has the potential to identify strains that are likely to cause cancer but will require refinement and validation.
There are a number of challenges to applying GWAS to bacterial infections, including the difficulty of obtaining matched controls, multiple strain colonization and the possibility that causative strains may not be present when disease is detected. Our results demonstrate that bacterial factors have a sufficiently strong influence on disease progression that even a small-scale GWAS can identify them. Therefore, H. pylori GWAS can elucidate mechanistic pathways to disease and guide clinical treatment options, including for asymptomatic carriers.
- Helicobacter pylori
- Gastric cancer
The bacterium Helicobacter pylori can colonize the stomach for years without causing any symptoms, but its presence is associated with several serious clinical diseases including peptic ulcer, gastric cancer and MALT lymphoma. Progression to clinical disease depends in part upon diet, environment and host factors [2, 3] as well as the genotypes of the bacteria.
A detailed understanding of the pathways to disease and H. pylori’s role at each stage has the potential to inform treatment options. For example, eradication of H. pylori is recommended for asymptomatic cases in parts of the world where gastric cancer risk is high, but eradication can be difficult and expensive, especially due to increasing antimicrobial resistance. A better understanding of the role of H. pylori in causing disease and identification of virulent strains would allow intervention to be targeted at patients most at risk of the subsequent disease.
Genome-wide association studies (GWAS) have become popular in human genetics as a way of investigating the basis of susceptibility to particular diseases. Individuals with the disease and matched controls are genotyped, and statistical tests are performed to identify variants that are disease-associated. Functional characterization of the associated regions provides insight into how disease develops and allows the identification of “at risk” individuals for prophylactic treatments. GWAS can also be applied to bacteria [8, 9]. There are several challenges that are shared with human association studies, such as the difficulty of accurately delineating phenotypes and obtaining matched controls as well as potential false positives resulting from population structure and genetic linkage.
There are also challenges specific to H. pylori GWAS. For example, causative strains may be absent when disease is detected, particularly because precancerous lesions change the physiology of the stomach and can destroy the niche that the bacterium previously occupied. Furthermore, the pathway from asymptomatic carriage to disease can vary, as can the outcome. For example, antral-predominant gastritis is often associated with a higher level of acid production and is more likely to evolve into duodenal ulcer or MALT lymphoma, whereas corpus-predominant atrophic gastritis is associated with a lower level of acid production and can lead to gastric ulcer or gastric cancer.
Here, we assemble an H. pylori isolate genome collection from clinically characterized samples, including from individuals with non-atrophic gastritis, atrophic gastritis, intestinal metaplasia, and gastric cancer. We applied GWAS techniques that have been developed for other bacteria, limiting analysis to isolates from the hpEurope population to avoid confounding by population structure. We show that signals of association are sufficiently strong to identify putative cancer-associated elements using a small number of samples, highlighting the potential of bacterial GWAS to inform treatment of H. pylori infection.
Genome-wide association study
Summary of the hits obtained in the genome-wide association studies based on 173 strains from hpEurope-derived sub-populations based upon patient disease phenotype
Columns: Number of hits with p value ≤ 10−6; Number of genes with hits of p value ≤ 10−5; Number of genes with hits of p value ≤ 10−6
Rows:
- Gastric cancer vs others (k-mer)
- Non-atrophic gastritis vs others (k-mer)
- Gastric cancer vs others (SNP)
- Non-atrophic gastritis vs others (SNP)
Amongst the 32 genes with hits at a p value ≤ 10−5 (Additional file 5: Table S3), 13 had putative functions associated with virulence of H. pylori, such as CagPAI and the type IV secretion system [12, 13] (11 genes), buffering of gastric acid (ureG) or adherence (babA). Further, 8 genes had putative functions that may also be indirectly linked to virulence, such as colonization (hpaA), motility (fliK) or more generally membrane and outer membrane proteins (5 genes). A total of 12 genes were either hypothetical proteins with unknown functions (2 genes), or had functions not previously linked to virulence; amongst them were genes associated with enzymes (6 genes), ribosome maturation factors (2 genes), transporters (1 gene) and a DNA-binding protein (1 gene).
The most significantly associated 118 GWAS hits were in 12 genes (64 SNPs and 46 k-mers) and had a frequency difference > 20% and a p value ≤ 10−6 (Table 1). Only one gene, babA, was a hit with a p value ≤ 10−6 in two GWAS experiments (SNP GC vs rest and SNP NAG vs rest). In order to keep the number of genes for risk score calculation low, only these 12 genes were investigated further.
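The selection rule described above (frequency difference > 20% and p value ≤ 10−6) can be illustrated with a minimal Python sketch. The `Hit` record, its field names and the example values are assumptions made for illustration only; they are not the data structures or results of the study.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One associated element (SNP or k-mer); field names are hypothetical."""
    gene: str
    p_value: float
    freq_cases: float     # frequency in gastric cancer isolates (0..1)
    freq_controls: float  # frequency in the comparison group (0..1)


def strongest_hits(hits, max_p=1e-6, min_freq_diff=0.20):
    """Keep hits with p <= max_p and an absolute frequency difference > min_freq_diff."""
    return [
        h for h in hits
        if h.p_value <= max_p and abs(h.freq_cases - h.freq_controls) > min_freq_diff
    ]


def genes_with_hits(hits):
    """Return the sorted, de-duplicated gene names carrying at least one hit."""
    return sorted({h.gene for h in hits})


# Made-up example values, not results from the paper.
example = [
    Hit("geneA", 5e-7, 0.60, 0.30),
    Hit("geneA", 2e-7, 0.70, 0.35),
    Hit("geneB", 5e-7, 0.40, 0.35),  # fails the frequency filter
    Hit("geneC", 1e-4, 0.90, 0.10),  # fails the p value filter
]
selected = strongest_hits(example)
```

Here `genes_with_hits(selected)` would list only `geneA`, since the other two elements fail one of the two thresholds.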
Cancer risk genotypes identified in genome-wide association studies of 173 hpEurope isolates
Column headings: p value (min); Effect on amino acid sequence
HP1055 [981621–982565] (−); S, associated with G to A substitution at position 797: non-synonymous with T in safe, A in risk; Outer membrane protein
HP0797 [506543–507325] (+); C + T; 325 and 334; T + G; NS: L/S in safe, F/A in risk; Neuraminyllactose-binding hemagglutinin (HpaA)
HP1243, babA1 [1314192–1316405] (−); BabA (outer membrane protein)
HP0747 [317158–317757] (+); 934 to 937; NS: KA in safe, GT in risk
HP0709 [598549–599451] (−); NS: D in safe, N in risk
HP0532, cag12 [817677–818519] (+); CagT protein (Censini, 1996)
HP0468 [925539–927026] (+); 705 to 708; NS: T in safe, A in risk
HP0531, cag11 [816985–817641] (+); CagU protein (Censini, 1996)
HP0541, cag20 [825334–826446] (−); CagH protein (Censini, 1996)
It has been known for some time that the presence of certain genes in H. pylori strains increases the risk that the host will develop gastric cancer, and for genes such as those in the Cag pathogenicity island, the mechanism is well-characterized. Technical advances in high-throughput DNA sequencing and the increasing availability of whole genome data for diverse H. pylori isolate collections provide opportunities for quantitative genomic analysis of population structure and the genetic determinants of important disease phenotypes.
Host and environmental factors and different pathways to disease impose additional complexity when identifying cancer-associated genes in H. pylori, compared to standard binary bacterial GWAS. However, even in the relatively small isolate collection in this study, variation in known cancer-associated genes, including CagPAI, was identified, as well as in genes that have not previously been associated with virulence.
Cancer-associated nucleotide variation was largely the result of the presence of accessory genes and enrichment for non-synonymous SNPs in homologous sequence. Whole-gene insertions and variation that changes protein sequence are the easiest to interpret in functional terms, but 3 of the 12 most significant GWAS hits were synonymous SNPs associated with gastric cancer isolates. There are several potential explanations for these hits. First, synonymous sequence variation associated with isolates from gastric cancer patients can be in linkage disequilibrium with non-synonymous SNPs, which may show weaker statistical association (higher p values) despite being the functional drivers of the association. Second, synonymous mutations can have functional effects, and there is evidence of selection acting across the H. pylori genome. Third, frameshifts or uncharacterized start codons can lead to misinterpretation of non-synonymous SNPs as synonymous. Finally, some may represent false positives.
Investigating the putative function of genes containing sequence elements associated with cancer can provide clues about the bacterial phenotypes that promote the development of disease in infected individuals, as well as providing novel targets for diagnosis and intervention. As expected from previous studies, our GWAS identified elements in CagPAI genes (cag11, cag12 and cag20) and babA that were associated with isolates from patients with gastric cancer. CagPAI-positive strains are known to predominate in gastric cancer patients and are associated with enhanced immune response through diverse pathways starting with the injection of CagA through a type IV secretion system into host epithelial cells.
The blood group antigen-binding adhesin BabA is an outer membrane protein linked to the activity of the CagPAI island through adhesion to the host cells. The binding characteristics of babA in different strains are known to vary in relation to the blood types in host populations, showing an important and specific evolutionary pressure on H. pylori isolates. BabA expression is regulated by phase variation and recombination between babA and highly homologous genes babB and babC with important consequences for binding characteristics and affinities. The homology between bab genes, which can all be absent or present as duplicates, imposes challenges for the de novo assemblies of Illumina short reads in this study. Specifically, of 173 sequences annotated, 48 contained full babA sequence, while for other genomes, only partial bab gene sequence(s) were annotated, often at the end of a contig, reflecting challenges associated with genome assembly and the interchangeability of these loci.
In addition to quantifying the effect of known H. pylori virulence genes, the GWAS approach employed here also provided evidence for a role for genes that have not previously been linked to gastric cancer. In addition to BabA, a second outer membrane protein, encoded by HP1055, was strongly associated with cancer. While little is known about the specific function of this gene, other than its essentiality demonstrated in transposon mutagenesis experiments, outer membrane proteins can influence host-bacteria interactions mediating virulence by modulating colonization and adherence to the host cells and facilitating secretion of virulence factors. A possible link to enhanced cancer risk is that HP1055 contains sequence enriched for African ancestry and conflicts between the host and bacterial genetic population are a risk factor for gastric cancer [25, 26].
The function of the gene harbouring the second strongest cancer-associated GWAS hit in this study, HpaA (HP0797), is the subject of some debate. Originally described as a sialic acid-binding protein involved in adhesion, it is now thought to have a role as a lipoprotein and is essential for stomach colonization in an in vivo mouse model. A speculative role in disease progression could be related to the strong immunogenic properties of the HpaA protein, and the substitutions described in our study alter the orientation of one of the helix formations (Additional file 4: Figure S4) which may be related to changes in protein function. This protein is considered as a target for vaccine development [31, 32].
Other H. pylori genes in which a highly significant association with gastric cancer was found included trmB (HP0747) and the less well-annotated HP0709 and HP0468. trmB, homologous to E. coli yggH, encodes a predicted S-adenosylmethionine-dependent methyltransferase regulated by the H. pylori orphan response regulator HP1021, presumably involved in the regulation of acetone metabolism. It has also been identified as a gene with overrepresented radical substitutions in fast-evolving regions. HP0709 encodes an enzyme that is involved in either methylation of DNA and proteins or in the synthesis of the branched amino acids valine, leucine and isoleucine. However, the exact function is not certain and conflicting annotations make protein structure prediction problematic, making it difficult to compare alleles in our dataset beyond the identification of cancer-associated SNPs. HP0468 encodes a hypothetical protein, poorly conserved outside the Helicobacter genus. It is upregulated by molecular hydrogen in chemolithoautotrophically enhanced growth of H. pylori, but its exact function is yet to be determined.
The GWAS approach used in this study supports known genotype-phenotype associations as well as providing information about specific genetic variations and highlighting a potential role for candidate genes that have not previously been related to gastric cancer. Quantitative GWAS using natural H. pylori populations is complicated by numerous host and pathogen changes in the progression from asymptomatic carriage to gastric cancer. This involves changes to stomach cells, pH, the extracellular mucus layer and changes in the selective landscape for the pathogen, promoting strains with functions related to adherence, motility and immune evasion that can survive in the harsh changing acidic environment. These changes make the phenotype complex, especially since the strains that are most responsible for disease progression need not be those that are isolated from gastric cancer patients. Nevertheless, our results are encouraging since they suggest that the most important factors may have a large effect on progression and therefore be detectable in GWAS cohorts despite inevitable imperfections in the sampling design due to the difficulty of finding well-matched cases and controls.
In addition to providing information on the biology of disease progression, GWAS may be of direct relevance in the clinic. By sequencing strains before eradicating them, we could assess the risk of gastric cancer, enabling closer surveillance of those at increased risk while avoiding unnecessary treatment for the others, thereby reducing the proportion of highly pathogenic strains in the overall H. pylori population and mitigating the spread of antimicrobial resistance.
Isolates and genome sequencing
A total of 565 H. pylori isolate genomes were analysed in this study (Additional file 5: Table S1). This dataset comprised 122 strains isolated from clinical samples including strains isolated in France (from patients from different areas of France enrolled in studies carried out by the GEFH, the GELD and FFCG, and the GELA), Belgium (from patients attending the endoscopy clinic of CHIREC, sites de la Basilique and E. Cavell, Brussels), the UK (biopsies from patients attending for upper GI endoscopy at Nottingham University Hospitals NHS Trust), Sweden (eight hospitals) and Dublin (the Meath Foundation Research Laboratory, Tallaght Hospital, Dublin), and 444 publicly available genomes from published papers and the NCBI database. Swedish isolates were a subset of the collection assembled by Enroth and colleagues in a previously published study. For isolates sequenced for this study, bacteria were sampled by gastric biopsy from patients presenting with gastric cancer, gastritis, gastrointestinal stromal tumour (GIST) or no symptoms, from 1995 to present, and grown on H. pylori-selective medium (Dent plates) at 37 °C in a microaerophilic environment (CampyGen or microaerophilic cabinet) for 5 to 10 days. Isolates from gastric MALT lymphoma or other non-adenocarcinoma forms of cancer (apart from 1 GIST isolate) were excluded from analysis.
Single colonies were isolated and subcultured on fresh blood agar plates to obtain sufficient growth for genomic DNA extraction. DNA was quantified using a NanoDrop spectrophotometer, as well as the Quant-iT DNA Assay Kit (Life Technologies, Paisley, UK), before sequencing. High-throughput genome sequencing was performed using a HiSeq 2500 machine (Illumina, San Diego, CA, USA), and the 100-bp paired-end short read data were assembled using the de novo assembly algorithm Velvet (version 1.2.08). The VelvetOptimiser script (version 2.2.4) was run for all odd k-mer values from 21 to 99. The minimum output contig size was set to 200 bp with default settings, and the scaffolding option was disabled. The average number of contiguous sequences (contigs) for genomes sequenced in this study was 111, with an average total assembled genome size of 1,630,194 bp and an average N50 length of 55.98 kbp. Short reads for the 107 genomes sequenced and assembled in Swansea are available from the NCBI short read archive (SRA) associated with BioProject: PRJNA395900. All 565 contiguous assemblies of whole genome sequences were individually archived on the web-based database platform BIGSdb and are available at the public data repository figshare (https://figshare.com/articles/Helicobacter_pylori_from_clinical_gastric_infection/5245837).
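For readers unfamiliar with the N50 statistic reported above, the following minimal Python sketch shows one common way to compute it from a list of contig lengths. It is an illustration under the usual definition (the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the assembly size); it is not the code used to produce the reported values, and the function names are hypothetical.

```python
def filter_contigs(contig_lengths, min_length=200):
    """Drop contigs shorter than min_length (200 bp mirrors the cut-off stated in the text)."""
    return [length for length in contig_lengths if length >= min_length]


def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths in bp."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("N50 is undefined for an empty assembly")
```

For example, contigs of 2, 3, 4, 5, 6, 7, 8, 9 and 10 bp sum to 54 bp; the three longest (10 + 9 + 8) reach 27 bp, so the N50 is 8.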
Individual genes from the H. pylori 26695 reference genome were locally aligned to the 776 Helicobacter pylori genomes available at the time of analysis using default BLAST parameters implemented in BIGSdb. A gene was recorded as present when the local alignment had at least 70% sequence identity over at least 50% of the sequence length. This allowed gene discovery, sequence export and local gene-by-gene alignments using MAFFT, as previously described [41, 42]. Sixty strains that were not from a human clinical source and 5 strains with fewer than 1000 genes were removed, and a tree was constructed from an alignment of the remaining strains using FastTree v2.0. One hundred forty-six clones were removed from the analysis based on the clustering observed on the tree. The remaining 565 strains constituted our working dataset, and the population structure amongst these strains was inferred from genome-wide haplotype data using chromosome painting and fineSTRUCTURE, as in previously published H. pylori genome analysis. Briefly, donor and recipient DNA chunks were inferred for each recipient haplotype using ChromoPainter (version 0.04). The number of recombination-derived chunks from each donor to each recipient was summarized in a co-ancestry matrix. fineSTRUCTURE (version 0.02) was run with 100,000 burn-in iterations and 100,000 MCMC iterations to cluster isolates based on the co-ancestry matrix. Principal component analysis was carried out on our data using the standard PCA implemented in Eigensoft. Specifically, on all biallelic data after pruning of SNPs with r² > 0.7, Popstats (“GitHub - pontussk/popstats: Population genetic summary statistics,” n.d.) was used to calculate D-statistics and specify previously described H. pylori populations.
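The presence criterion stated above (at least 70% identity over at least 50% of the gene length) can be written as a small predicate. This is a minimal sketch that assumes the percent identity and aligned length have already been taken from a local alignment such as a BLAST hit; the function and parameter names are hypothetical and do not reflect how BIGSdb implements the check internally.

```python
def gene_present(identity_pct, aligned_length, gene_length,
                 min_identity=70.0, min_coverage=0.5):
    """Return True if an alignment meets both the identity and the coverage threshold.

    identity_pct   : percent identity of the local alignment (0..100)
    aligned_length : number of reference gene positions covered by the alignment
    gene_length    : full length of the reference gene
    """
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    coverage = aligned_length / gene_length
    return identity_pct >= min_identity and coverage >= min_coverage
```

Under these assumptions, an 85% identical alignment covering 600 of 1000 bp counts as present, whereas a 90% identical alignment covering only 400 bp does not.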
Isolate genomes were partitioned into groups based upon metadata from patient information collected as part of this study or taken from existing publications. To be able to identify risk factors of the carcinogenic progression, three groups were applied: (i) isolates from patients with gastric cancer (GC), (ii) isolates from individuals with intestinal metaplasia or atrophic gastritis, which we termed “progressive to cancer” (Prog) and (iii) isolates from individuals with non-atrophic gastritis (NAG). To reduce the impact of the phylogeographic structure on identification of disease-associated genetic elements, the remaining analyses focussed on the largest dataset for which patient data and geographic origin were available within one unique fineSTRUCTURE population. This included 173 hpEurope isolates (Additional file 5: Table S2). Subpopulations included in hpEurope were based on a previous study and included hspEuropeColombia, hspEuropeN and hspEuropeS. A phylogeny for the 173 isolates was constructed for visualization of the population using the tree-building software FastTree v2.0 and annotated using iTOL v3ic (Fig. 1). Input data included 1573 concatenated genes, identified in the 26695 reference strain, aligned for all isolate genomes.
Genome-wide association studies
The genome-wide association study (GWAS) was conducted with a pipeline based on the bugwas package, as in a recent study. Briefly, in this k-mer-based approach, the genome sequence of each isolate was fragmented into unique, overlapping, 31-bp DNA motifs or k-mers. This allowed the identification of nucleotide variation including single nucleotide polymorphisms (SNPs), indels and the presence or absence of a whole gene or gene region associated with different phenotype groups. DNA motifs significantly associated with gastric cancer were explored after accounting for the inter-dependence of the strains and population structure. Genetic covariance amongst the isolate genomes was summarized in an n × n relatedness matrix, and each k-mer was tested with a linear mixed regression model, which uses the relatedness matrix to model the background random effect. Unlike related methods [9, 47], this method does not depend on a single clonal tree, which is impossible to construct reliably because of the high rate of recombination in H. pylori. A second GWAS, also implemented in the bugwas package, was carried out based upon SNPs rather than k-mers. Only the SNPs contained in coding sequences were considered. The k-mer and SNP GWAS approaches were applied to bacterial datasets in two binary phenotype association experiments: (i) GC vs Prog and NAG isolates and (ii) NAG vs GC and Prog isolates. This gave a total of four GWAS experiments.
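The fragmentation step described above can be sketched in a few lines of Python. This is an illustration only: it records which isolates carry each overlapping k-mer, but it ignores reverse complements and ambiguous bases, and it does not reproduce the bugwas pipeline or its mixed-model test. The function names and the input layout (a dictionary of isolate name to contig sequences) are assumptions.

```python
def unique_kmers(sequence, k=31):
    """Return the set of overlapping k-mers in one sequence (empty if shorter than k)."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_presence(genomes, k=31):
    """Map each k-mer to the set of isolates that contain it.

    genomes: dict of isolate name -> list of contig sequences (hypothetical layout).
    k-mers are taken within contigs only, never across contig boundaries.
    """
    presence = {}
    for name, contigs in genomes.items():
        for contig in contigs:
            for kmer in unique_kmers(contig, k):
                presence.setdefault(kmer, set()).add(name)
    return presence


# Toy example with k = 3 so the result is easy to inspect.
toy = kmer_presence({"isolate_a": ["ACGT"], "isolate_b": ["CGTT"]}, k=3)
```

The resulting presence/absence pattern of each k-mer across isolates is the kind of input that an association test would then compare against the phenotype groups.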
Analysis of associated elements
The odds ratio and p value were calculated for associated elements in the GWAS experiments, and the positions of hits were determined in a reference genome. Specifically, a reference pan genome was produced using Roary software with default parameters, and annotation was carried out using Prokka. GWAS hits, representing both core and accessory nucleotide variation, were then analysed individually to investigate the putative function of the associated genes and the effect of the variations identified in the amino acid sequence. Positions of hits in all analyses were considered using the reference strain ELS37 (GCA_000255955.1). This reference strain was chosen because it is one of the GC isolates used in our study and has a closed genome sequence.
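As an illustration of the odds ratio mentioned above, the sketch below computes it from a 2 × 2 table of element presence against phenotype. The text does not state how zero counts were handled; the 0.5 continuity correction used here (Haldane-Anscombe) is an assumption, and the counts in the example are invented.

```python
def odds_ratio(cases_with, cases_without, controls_with, controls_without,
               correction=0.5):
    """Odds ratio for carrying an element in cases versus controls.

    If any cell is zero, `correction` is added to every cell so that the
    ratio stays finite (an assumption of this sketch, not of the study).
    """
    cells = [cases_with, cases_without, controls_with, controls_without]
    if any(count < 0 for count in cells):
        raise ValueError("counts must be non-negative")
    if any(count == 0 for count in cells):
        cells = [count + correction for count in cells]
    a, b, c, d = cells
    return (a * d) / (b * c)


# Invented counts: element present in 30 of 49 cases and 20 of 124 controls.
example_or = odds_ratio(30, 19, 20, 104)
```

An odds ratio above 1 indicates that the element is more common in the case group; values below 1 indicate enrichment in the comparison group.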
A limitation of k-mer-based GWAS approaches is that they reveal significantly associated sequence within genes and not the presence or absence of the entire gene. For this reason, the prevalence of genes (presence/absence) containing at least one significant k-mer (p value ≤ 10−5) was determined for genomes in our dataset (n = 143) using BLAST. A gene was considered present when the sequence from the genome shared more than 70% sequence identity with the corresponding gene sequence from H. pylori reference strain ELS37. We examined the correlation of prevalence patterns across our dataset for these genes by using the rcorr function in the Hmisc R package to compute correlation coefficients and the p value of the correlation for all possible pairs of gene presence/absence patterns. The input was a binary matrix of presence/absence of the genes in 143 genomes.
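The pairwise comparison of presence/absence patterns can be sketched as follows. The analysis above used the rcorr function in R; this Python version is only an approximation of the idea, computing a Pearson coefficient for every pair of genes from 0/1 vectors and omitting the p value of the correlation. The input layout and names are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors; None if either is constant."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # gene present in all or in no genomes: correlation undefined
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(presence):
    """presence: dict of gene -> list of 0/1 values, one per genome, same genome order."""
    return {
        (g1, g2): pearson(presence[g1], presence[g2])
        for g1, g2 in combinations(sorted(presence), 2)
    }
```

Genes that are always gained or lost together, as might be expected for neighbouring genes on the same island, would give coefficients close to 1.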
All the genes that contained a GWAS hit at p value < 1 × 10−6 were individually investigated using BioEdit, based on a global alignment obtained from the GWAS. Synonymous and non-synonymous variation was identified by comparison to amino acid sequence alignments, and non-synonymous hits were further studied using WebLogo figures showing the distribution of amino acids at each position according to the GWAS group of each strain. The genes identified showed clear enrichment for particular alleles in GC strains. Genes with GWAS hits (p value < 1 × 10−6) were mapped to the corresponding genome position on the reference genome ELS37 (GCA_000255955.1) using Circos V0.69, and the context of individual genes was characterized with BioCyc.
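The distinction between synonymous and non-synonymous variation drawn above can be illustrated with a minimal Python sketch that translates the reference and variant codons and compares the amino acids. It assumes an in-frame coding sequence on the forward strand, a 0-based SNP position and the standard amino acid assignments; the study itself made this comparison from alignments, and the function name is hypothetical.

```python
from itertools import product

# Standard codon table built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds, position, alt_base):
    """Classify a single-base change in an in-frame coding sequence.

    cds      : coding sequence (forward strand, length a multiple of 3)
    position : 0-based index of the changed base within cds
    alt_base : the variant base
    Returns (kind, reference amino acid, variant amino acid).
    """
    cds = cds.upper()
    offset = position % 3
    start = position - offset
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa
```

For the toy sequence ATGGATAAA, changing the first base of the second codon from G to A turns GAT (D) into AAT (N), a non-synonymous change, while changing its third base from T to C leaves the amino acid as D.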
Prediction of protein structure
For the genes where the risk alleles were associated with non-synonymous changes to the amino acid sequence of the encoded proteins, we tried to predict the impact of these changes on the tertiary structure of the proteins. For this purpose, we used HHpred, as implemented in the MPI Bioinformatics Toolkit as of 4 June 2017, to identify the most suitable template structure. This structure was then used to model both the safe and risk sequences using default parameters, and the models were annotated and visualized in Swiss-PDB viewer.
We thank all the researchers worldwide that have whole-genome sequenced Helicobacter pylori isolates and made their data available to us, either by personal connections or by making the data publicly available. We thank Dr. Youri Glupczynski, Dr. Yvette Miendje-Deyi, Dr. Deirdre McNamara, and Dr. Colm O’Morain for the recruitment of patients and collection of biopsies.
Elvire Berthenet is funded by a grant from HCRW. Sam K Sheppard is a principal investigator for the MRC CLIMB consortium (MR/L015080/1) and Daniel Falush is supported by a fellowship as part of MRC CLIMB (MR/M501608/1). S.K.S. is also funded by MRC grant G0801929, BBSRC grant BB/I02464X/1 and the Wellcome Trust. Jane Mikhail received funding from MITReG, St David’s Medical Foundation and ABMUHB.
Availability of data and materials
All data generated or analysed during this study are included in this published article and its supplementary information files. Short reads for the 107 genomes sequenced and assembled in Swansea are available from the NCBI short read archive (SRA) associated with BioProject: PRJNA395900. All 565 contiguous assemblies of whole-genome sequences are available at the public data repository figshare (https://figshare.com/articles/Helicobacter_pylori_from_clinical_gastric_infection/5245837).
EB gathered the dataset, analysed the GWAS results and performed the risk score calculation and was a major contributor in writing the manuscript. KY performed the fineSTRUCTURE and GWAS analyses. KT investigated the babA alignment and was a major contributor in discussing the results and writing the manuscript. BP and MDH were involved in sequencing the isolates. GM produced Fig. 2 and Fig. 3. JM was involved in gathering the strains from collaborators and initiating the project. LE, HE, AB, FM, CV, JCA and SS shared strains and information linked to the strains. TSW supervised EB and reviewed the manuscript. DF and SKS led the project and were major contributors in writing the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
All strains were collected with full ethics approval. No animal or human tissue was used in this study.
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Peek RM, Blaser MJ. Helicobacter pylori and gastrointestinal tract adenocarcinomas. Nat Rev Cancer. 2002;2:28–37.
- Cristescu R, et al. Molecular analysis of gastric cancer identifies subtypes associated with distinct clinical outcomes. Nat Med. 2015;21:449–56.
- Suerbaum S, Josenhans C. Helicobacter pylori evolution and phenotypic diversification in a changing host. Nat Rev Microbiol. 2007;5:441–52.
- Cover TL. Helicobacter pylori diversity and gastric cancer risk. mBio. 2016;7:e01869–15.
- Ierardi E, Giorgio F, Losurdo G, Di Leo A, Principi M. How antibiotic resistances could change Helicobacter pylori treatment: a matter of geography? World J Gastroenterol. 2013;19:8168–80.
- Binh TT, Suzuki R, Trang TTH, Kwon DH, Yamaoka Y. Search for novel candidate mutations for metronidazole resistance in Helicobacter pylori using next-generation sequencing. Antimicrob Agents Chemother. 2015;59:2343–8.
- Wellcome Trust Case Control Consortium, et al. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–78.
- Earle SG, et al. Identifying lineage effects when controlling for population structure improves power in bacterial association studies. Nat Microbiol. 2016;1:16041.
- Sheppard SK, et al. Genome-wide association study identifies vitamin B5 biosynthesis as a host specificity factor in Campylobacter. Proc Natl Acad Sci U S A. 2013;110:11923–7.
- Chung D, Glickman J, Carey M, Chung R. HST.121 Gastroenterology. Fall 2005. Massachusetts Institute of Technology: MIT OpenCourseWare; 2005.
- Lawson DJ, Hellenthal G, Myers S, Falush D. Inference of population structure using dense haplotype data. PLoS Genet. 2012;8:e1002453.
- Bhattacharya S, Mukherjee O, Mukhopadhyay AK, Chowdhury R. A conserved Helicobacter pylori gene, HP0102, is induced upon contact with gastric cells and has multiple roles in pathogenicity. J Infect Dis. 2016; https://doi.org/10.1093/infdis/jiw139.
- Parsonnet J, Friedman GD, Orentreich N, Vogelman H. Risk for gastric cancer in people with CagA positive or CagA negative Helicobacter pylori infection. Gut. 1997;40:297–301.
- Mobley HL, Hu LT, Foxal PA. Helicobacter pylori urease: properties and role in pathogenesis. Scand J Gastroenterol Suppl. 1991;187:39–46.
- Kim A, et al. Helicobacter pylori bab paralog distribution and association with cagA, vacA, and homA/B genotypes in American and South Korean clinical isolates. PLoS One. 2015;10:e0137078.
- Gerhard M, et al. Clinical relevance of the Helicobacter pylori gene for blood-group antigen-binding adhesin. Proc Natl Acad Sci U S A. 1999;96:12778–83.
- Thorell K, et al. Rapid evolution of distinct Helicobacter pylori subpopulations in the Americas. PLoS Genet. 2017;13:e1006546.
- Bentley SD, Parkhill J. Comparative genomic structure of prokaryotes. Annu Rev Genet. 2004;38:771–91.
- Yahara K, et al. Genome-wide survey of codons under diversifying selection in a highly recombining bacterial species, Helicobacter pylori. DNA Res. 2016;23:135–43.
- Odenbreit S, et al. Translocation of Helicobacter pylori CagA into gastric epithelial cells by type IV secretion. Science (New York, N.Y.). 2000;287:1497–500.
- Borén T, Falk P, Roth KA, Larson G, Normark S. Attachment of Helicobacter pylori to human gastric epithelium mediated by blood group antigens. Science (New York, N.Y.). 1993;262:1892–5.
- Aspholm-Hurtig M, et al. Functional adaptation of BabA, the H. pylori ABO blood group antigen binding adhesin. Science (New York, N.Y.). 2004;305:519–22.
- Thorell K, et al. Identification of a Latin American-specific BabA adhesin variant through whole genome sequencing of Helicobacter pylori patient isolates from Nicaragua. BMC Evol Biol. 2016;16:53.
- Salama NR, Shepherd B, Falkow S. Global transposon mutagenesis and essential gene analysis of Helicobacter pylori. J Bacteriol. 2004;186:7926–35.
- de Sablet T, et al. Phylogeographic origin of Helicobacter pylori is a determinant of gastric cancer risk. Gut. 2011;60:1189–95.
- Kodaman N, et al. Human and Helicobacter pylori coevolution shapes the risk of gastric disease. Proc Natl Acad Sci U S A. 2014;111:1455–60.View ArticlePubMedPubMed CentralGoogle Scholar
- Evans DG, Karjalainen TK, Evans DJ, Graham DY, Lee CH. Cloning, nucleotide sequence, and expression of a gene encoding an adhesin subunit protein of Helicobacter pylori. J Bacteriol. 1993;175:674–83.View ArticlePubMedPubMed CentralGoogle Scholar
- O'Toole PW, et al. The putative neuraminyllactose-binding hemagglutinin HpaA of Helicobacter pylori CCUG 17874 is a lipoprotein. J Bacteriol. 1995;177:6049–57.View ArticlePubMedPubMed CentralGoogle Scholar
- Carlsohn E, Nyström J, Bölin I, Nilsson CL, Svennerholm A-M. HpaA is essential for Helicobacter pylori colonization in mice. Infect Immun. 2006;74:920–6.View ArticlePubMedPubMed CentralGoogle Scholar
- Sutton P, et al. Effectiveness of vaccination with recombinant HpaA from Helicobacter pylori is influenced by host genetic background. FEMS Immunol Med Microbiol. 2007;50:213–9.View ArticlePubMedGoogle Scholar
- Tobias J, Lebens M, Wai SN, Holmgren J, Svennerholm A-M. Surface expression of Helicobacter pylori HpaA adhesion antigen on Vibrio cholerae, enhanced by co-expressed enterotoxigenic Escherichia coli fimbrial antigens. Microb Pathog. 2017;105:177–84.Google Scholar
- Zhang R, et al. Construction of a recombinant Lactococcus lactis strain expressing a fusion protein of Omp22 and HpaA from Helicobacter pylori for oral vaccine development. Biotechnol Lett. 2016;38:1911–6.View ArticlePubMedGoogle Scholar
- Pflock M, et al. The orphan response regulator HP1021 of Helicobacter pylori regulates transcription of a gene cluster presumably involved in acetone metabolism. J Bacteriol. 2007;189:2339–49.View ArticlePubMedPubMed CentralGoogle Scholar
- Zheng Y, Roberts RJ, Kasif S. Identification of genes with fast-evolving regions in microbial genomes. Nucleic Acids Res. 2004;32:6347–57.View ArticlePubMedPubMed CentralGoogle Scholar
- Deng H, O'Hagan D. The fluorinase, the chlorinase and the duf-62 enzymes. Curr Opin Chem Biol. 2008;12:582–92.View ArticlePubMedGoogle Scholar
- Kuhns LG, et al. Carbon fixation driven by molecular hydrogen results in chemolithoautotrophically enhanced growth of Helicobacter pylori. J Bacteriol. 2016;198:1423–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Enroth H, Kraaz W, Engstrand L, Nyrén O, Rohan T. Helicobacter pylori strain types and risk of gastric cancer: a case-control study. Cancer Epidemiol Biomarkers Prev. 2000;9:981–5.PubMedGoogle Scholar
- Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18(5):821–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Jolley KA, Maiden MC. BIGSdb: scalable analysis of bacterial genome variation at the population level. BMC Bioinformatics. 2010;11:595.View ArticlePubMedPubMed CentralGoogle Scholar
- Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80.View ArticlePubMedPubMed CentralGoogle Scholar
- Méric G, et al. A reference pan-genome approach to comparative bacterial genomics: identification of novel epidemiological markers in pathogenic Campylobacter. PLoS One. 2014;9:e92798.View ArticlePubMedPubMed CentralGoogle Scholar
- Sheppard SK, Jolley KA, Maiden MCJ. A gene-by-gene approach to bacterial population genomics: whole genome MLST of Campylobacter. Genes. 2012;3:261–77.View ArticlePubMedPubMed CentralGoogle Scholar
- Price MN, Dehal PS, Arkin AP. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5:e9490.View ArticlePubMedPubMed CentralGoogle Scholar
- Yahara K, et al. Chromosome painting in silico in a bacterial species reveals fine population structure. Mol Biol Evol. 2013;30:1454–64.View ArticlePubMedPubMed CentralGoogle Scholar
- Letunic I, Bork P. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 2016;44:W242–5.View ArticlePubMedPubMed CentralGoogle Scholar
- Suzuki M, et al. A genome-wide association study identifies a horizontally transferred bacterial surface adhesin gene associated with antimicrobial resistant strains. Sci Rep. 2016;6:37811.View ArticlePubMedPubMed CentralGoogle Scholar
- Pascoe B, et al. Enhanced biofilm formation and multi-host transmission evolve from divergent genetic backgrounds in Campylobacter jejuni. Environ Microbiol. 2015;17:4779–89.View ArticlePubMedPubMed CentralGoogle Scholar
- Page AJ, et al. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics. 2015;31:3691–3.View ArticlePubMedPubMed CentralGoogle Scholar
- Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30:2068–9.View ArticlePubMedGoogle Scholar
- Hall TA. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. 1999;41:95–8.Google Scholar
- Crooks GE, Hon G, Chandonia J-M, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–90.View ArticlePubMedPubMed CentralGoogle Scholar
- Krzywinski M, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639–45.View ArticlePubMedPubMed CentralGoogle Scholar
- Caspi R, et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 2016;44:D471–80.View ArticlePubMedGoogle Scholar
- Hildebrand A, Remmert M, Biegert A, Soding J. Fast and accurate automatic structure prediction with HHpred. Proteins. 2009;77(Suppl 9):128–32.View ArticlePubMedGoogle Scholar
- Alva V, Nam SZ, Soding J, Lupas AN. The MPI bioinformatics toolkit as an integrative platform for advanced protein sequence and structure analysis. Nucleic Acids Res. 2016;44(W1):W410–5.View ArticlePubMedPubMed CentralGoogle Scholar
- Guex N, Peitsch MC. SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis. 1997;18(15):2714–23.View ArticlePubMedGoogle Scholar