An Attempt to Elucidate the Genes Encoding the Surface Exposed Proteins R3, Z1 and Z2 in Two Strains of Streptococcus agalactiae from Zimbabwe
Surface-exposed proteins of Streptococcus agalactiae (GBS) may be used in serotyping and may have a potential role as vaccine candidates. The proteins R3 and the recently discovered Z1 and Z2 were found to be important markers in GBS from Zimbabwe. However, their prevalence in most geographical areas is unknown, and the genes encoding these proteins have so far not been identified. Therefore, the aim of this work was to identify candidate genes (CGs) for the surface-exposed proteins R3, Z1 and Z2 in GBS. Two GBS strains from Zimbabwe were genome sequenced: GMFR293, found to express R3, Z1 and Z2, and CMFR30, found to express Z1. CMFR30 was sequenced on a Pacific Biosciences instrument and assembled to a complete genome. GMFR293 was sequenced by Roche 454 pyrosequencing, which was combined with optical mapping for assembly to a complete genome. RAST was used for in silico gene prediction and functional annotation of each genome, for comparison of predicted coding sequences (CDSs), and for comparison with four reference genomes of R3-, Z1- and Z2-negative strains. The CDSs were analysed with various bioinformatics tools to estimate the molecular weight (MW) of the encoded proteins and to predict their potential surface exposure. Based on previously published characteristics of the R3, Z1 and Z2 proteins, CGs were chosen among CDSs encoding proteins with a MW higher than 50 kDa that were functionally annotated as membrane- or surface-associated proteins, or as hypothetical proteins (HPs) predicted to be potentially surface exposed. The genome of GBS strain GMFR293 comprised 2,037,090 bp and that of CMFR30 2,062,772 bp. A total of 2023 CDSs were predicted in GMFR293 and 2060 in CMFR30. Around 80% of all CDSs had a putative assigned function. Unique genes were identified by comparison with the other GBS strains. Of the CDSs from both genomes, 26% were predicted to be transmembrane (TM) proteins.
Of these, 113 CDSs from strain GMFR293 had a MW >50 kDa: 21 harboured a signal peptide, eight and four had an LPxTG and/or YSIRK signal, respectively, and 14 were identified as lipoproteins. In comparison, of the 70 CDSs predicted as TM proteins in CMFR30 that had a MW >50 kDa, nine harboured a signal peptide, seven and one had an LPxTG and/or YSIRK signal, respectively, and six were identified as lipoproteins. Finally, 51 CDSs were chosen as CGs for R3, Z1 and Z2 in the GMFR293 genome, and 32 CDSs were chosen as CGs for Z1 in the CMFR30 genome. Among them were CDSs annotated as hypothetical proteins, some with a putative function and some with a predicted function. The CGs identified by in silico analyses in this study need to be further tested in experimental analyses. This work demonstrates that candidate genes for the surface-exposed proteins R3, Z1 and Z2 can be identified by comprehensive in silico characterization of selected reference genomes. Among the CGs for R3 was a hypothetical protein of 105 kDa which showed 97% similarity with the R5 (BPS) protein encoded by the sar5 gene published in NCBI. To test the hypothesis that R3 may be similar or identical to R5, the sar5 gene was cloned in E. coli LB21, and expression of the R3 protein was thereafter tested by immunological methods. The transformants were negative for expression of R3 by immunofluorescence testing, which may indicate that R3 and R5 are different proteins. However, there may be other possible explanations for these results, which need to be evaluated in further experiments. In this study we have assembled two GBS strains to near complete genomes and carried out a thorough in silico characterization of the two genomes, with prioritization of potential candidate genes for the surface-associated proteins R3, Z1 and Z2.
Final identification of the genes encoding these proteins depends either on more information about the physical and phenotypic characteristics of these proteins becoming available in the future, or on experimental analysis of expression of the proteins in overexpression or gene knockout experiments. This work describes the first attempt to identify CGs for these three GBS proteins.
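
The candidate-gene selection criteria described above (MW higher than 50 kDa, combined with either a membrane- or surface-associated annotation or a hypothetical protein predicted to be surface exposed) can be illustrated with a minimal Python sketch. This is not the pipeline used in this work: the `CDS` record, the function names, the keyword matching on annotation text and the `predicted_surface_exposed` flag are all hypothetical, and the MW is estimated simply from standard average amino acid residue masses.

```python
import re
from dataclasses import dataclass

# Standard average residue masses (Da) for the 20 common amino acids.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528  # one water per polypeptide chain (free termini)


@dataclass
class CDS:
    """Hypothetical record for one predicted coding sequence."""
    name: str
    protein: str                     # translated amino acid sequence
    annotation: str                  # functional annotation text
    predicted_surface_exposed: bool = False  # assumed output of a separate predictor


def estimate_mw_kda(protein: str) -> float:
    """Estimate molecular weight in kDa from average residue masses.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    seq = protein.upper().rstrip("*")  # drop a trailing stop symbol, if any
    return (sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS) / 1000.0


def has_lpxtg(protein: str) -> bool:
    """Report whether the sequence contains an LPxTG motif (x = any residue)."""
    return re.search(r"LP.TG", protein.upper()) is not None


def is_candidate(cds: CDS, min_kda: float = 50.0) -> bool:
    """Apply the selection criteria: MW above the threshold, and either a
    membrane/surface annotation or a hypothetical protein flagged as
    potentially surface exposed."""
    if estimate_mw_kda(cds.protein) <= min_kda:
        return False
    annotation = cds.annotation.lower()
    if "membrane" in annotation or "surface" in annotation:
        return True
    return "hypothetical" in annotation and cds.predicted_surface_exposed
```

In practice the surface-exposure flag would come from dedicated prediction tools for signal peptides, TM regions, lipoproteins and cell-wall anchoring motifs, and the annotation matching would follow the vocabulary of the annotation service rather than simple keywords.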