University of Göttingen  |  Faculty of Biology  |  Institute of Microbiology and Genetics  |  Dept. of Bioinformatics

jpHMM-HIV [Method]

About jpHMM

jpHMM is a probabilistic approach to compare a nucleic acid or protein sequence S to a given multiple alignment A of a sequence family for which a classification into subtypes is available. We call this approach a jumping profile HMM because it combines the idea of a profile HMM with the jumping alignment (JALI) approach proposed by Spang et al. as a strategy for database searching. In contrast to standard methods, JALI does not align and compare a database sequence S to the multiple alignment A as a whole. Instead, it aligns local segments of S to those segments of individual sequences from A that are most similar to them. Within this alignment, S can jump between different sequences of A at arbitrary positions. This approach is particularly useful if S is a recombinant sequence, i.e. different parts of S are related to different sequences from A.
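A deliberately simplified, ungapped version of a jumping alignment illustrates the idea: each query position is scored against one sequence of A, and switching to a different sequence costs a fixed penalty. This is a sketch under our own assumptions (no gaps, a toy match/mismatch score, and a hypothetical `jump_penalty` parameter; `jumping_alignment` is an illustrative name), not the JALI scoring scheme, which also handles insertions and deletions.

```python
from typing import List, Tuple


def jumping_alignment(query: str, alignment: List[str],
                      match: int = 1, mismatch: int = -1,
                      jump_penalty: int = 3) -> Tuple[int, List[int]]:
    """Ungapped jumping alignment (simplified sketch, not JALI itself).

    Assumes every sequence in `alignment` has the same length as `query`.
    Returns the best total score and, for each query position, the index
    of the alignment sequence that the position is aligned to.
    """
    n, k = len(query), len(alignment)
    # score[i][j]: best score of query[:i+1] whose last position uses sequence j
    score = [[0] * k for _ in range(n)]
    back = [[0] * k for _ in range(n)]
    for j in range(k):
        score[0][j] = match if query[0] == alignment[j][0] else mismatch
    for i in range(1, n):
        for j in range(k):
            # Either stay on sequence j or jump from another sequence p.
            best_prev, best_val = j, score[i - 1][j]
            for p in range(k):
                if p != j and score[i - 1][p] - jump_penalty > best_val:
                    best_prev, best_val = p, score[i - 1][p] - jump_penalty
            s = match if query[i] == alignment[j][i] else mismatch
            score[i][j] = best_val + s
            back[i][j] = best_prev
    # Trace back from the best-scoring final sequence.
    j = max(range(k), key=lambda x: score[n - 1][x])
    total = score[n - 1][j]
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return total, path
```

For example, `jumping_alignment("AAAAACCCCC", ["AAAAAAAAAA", "CCCCCCCCCC"])` jumps once, from sequence 0 to sequence 1 at position 5, because the gain of five extra matches outweighs the single jump penalty.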
In our approach we assume that a partition of the multiple alignment A into subtypes is given. Each subtype of the alignment is modeled as a profile hidden Markov model. In addition to the usual state transitions within these profile HMMs, transitions between the different profile HMMs, called jumps, are allowed, so that a path through the model can switch between states of different subtypes depending on which subtype is locally most similar to the database sequence S. The most probable path through this model, the Viterbi path, determines the jumping alignment of the query sequence S to the multiple alignment A and thus the predicted recombinant structure of S: the subtype to which a sequence position is aligned is the predicted subtype for that position, and positions where the path jumps between subtypes indicate phylogenetic recombination breakpoints. A detailed description of the algorithm can be found in Schultz et al. 2006. For more information about this web server, please see Zhang et al.
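The decoding step can be sketched with a toy jumping HMM. The assumptions here are ours, not jpHMM's: each subtype is reduced to an ungapped profile (one nucleotide distribution per column, no insert or delete states), a jump to any other subtype has the same hypothetical probability `p_jump`, and `toy_profile`, `viterbi_subtypes` and `breakpoints` are illustrative names.

```python
import math
from typing import Dict, List

Profile = List[Dict[str, float]]  # one nucleotide distribution per column


def toy_profile(consensus: str, p: float = 0.85) -> Profile:
    """Hypothetical ungapped profile: probability p for the consensus base."""
    return [{b: (p if b == c else (1 - p) / 3) for b in "ACGT"} for c in consensus]


def viterbi_subtypes(query: str, profiles: Dict[str, Profile],
                     p_jump: float = 1e-3) -> List[str]:
    """Most probable subtype per position in a toy jumping HMM (needs >= 2 subtypes)."""
    names = list(profiles)
    k = len(names)
    stay, jump = math.log(1 - p_jump), math.log(p_jump / (k - 1))

    def emit(s: str, i: int) -> float:
        return math.log(max(profiles[s][i].get(query[i], 0.0), 1e-6))

    v = {s: math.log(1.0 / k) + emit(s, 0) for s in names}  # uniform start
    back = []
    for i in range(1, len(query)):
        nv, bp = {}, {}
        for s in names:
            # Best predecessor: stay in subtype s or jump from another subtype.
            best = max(names, key=lambda p: v[p] + (stay if p == s else jump))
            nv[s] = v[best] + (stay if best == s else jump) + emit(s, i)
            bp[s] = best
        v = nv
        back.append(bp)
    state = max(names, key=v.get)
    path = [state]
    for bp in reversed(back):  # trace back the Viterbi path
        state = bp[state]
        path.append(state)
    path.reverse()
    return path


def breakpoints(path: List[str]) -> List[int]:
    """Positions where the predicted subtype changes (the path jumps)."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]
```

For the query `AAAACCCC` against an A-rich profile `B` and a C-rich profile `C` (built with `toy_profile`, `p_jump=0.01`), the sketch assigns the first four positions to `B` and the last four to `C`, giving a single breakpoint at index 4.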

New feature of jpHMM:

To assess the reliability of jpHMM's recombination predictions, i.e. the accuracy of the predicted breakpoint positions and parental subtypes, we extended the jpHMM output to mark regions where the model is 'uncertain' about the parental subtype and to provide an interval estimate for each breakpoint, called a breakpoint interval (see Schultz et al. 2009). For each sequence position, the so-called posterior probability of each subtype is calculated. This is the probability that the position belongs to the considered subtype, under the assumption that the whole sequence is generated by the model (output sample). The length of a breakpoint interval depends on how precisely the breakpoint can be located: a long interval reflects the model's uncertainty about the exact position of the breakpoint between two subtypes. Thus, the user can see which breakpoints are located relatively precisely and which are only approximate.
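Posterior probabilities of this kind can be computed with the forward-backward algorithm. The sketch below makes simplifying assumptions of its own: each subtype is an ungapped toy profile, jumps use a uniform hypothetical probability `p_jump`, and `toy_profile` and `subtype_posteriors` are illustrative names, not jpHMM functions. Because this toy model has exactly one state per subtype and position, no summing over profile states is needed.

```python
import math
from typing import Dict, List

Profile = List[Dict[str, float]]  # one nucleotide distribution per column


def toy_profile(consensus: str, p: float = 0.85) -> Profile:
    """Hypothetical ungapped profile: probability p for the consensus base."""
    return [{b: (p if b == c else (1 - p) / 3) for b in "ACGT"} for c in consensus]


def _logsumexp(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def subtype_posteriors(query: str, profiles: Dict[str, Profile],
                       p_jump: float = 1e-3) -> List[Dict[str, float]]:
    """P(position i belongs to subtype s | query) for a toy jumping HMM."""
    names = list(profiles)
    k, n = len(names), len(query)

    def trans(p: str, s: str) -> float:
        return math.log(1 - p_jump) if p == s else math.log(p_jump / (k - 1))

    def emit(s: str, i: int) -> float:
        return math.log(max(profiles[s][i].get(query[i], 0.0), 1e-6))

    # Forward pass (log space): probability of query[:i+1], ending in s.
    fwd = [{s: math.log(1.0 / k) + emit(s, 0) for s in names}]
    for i in range(1, n):
        fwd.append({s: _logsumexp([fwd[-1][p] + trans(p, s) for p in names])
                       + emit(s, i) for s in names})
    # Backward pass: probability of query[i+1:], given state p at position i.
    bwd = [dict.fromkeys(names, 0.0) for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for p in names:
            bwd[i][p] = _logsumexp([trans(p, s) + emit(s, i + 1) + bwd[i + 1][s]
                                    for s in names])
    log_z = _logsumexp([fwd[-1][s] for s in names])  # log P(query)
    return [{s: math.exp(fwd[i][s] + bwd[i][s] - log_z) for s in names}
            for i in range(n)]
```

At every position the posteriors over all subtypes sum to 1; positions close to a breakpoint typically show intermediate values, which is what makes an interval estimate of the breakpoint possible.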
Within uncertainty regions, no parental subtype can be determined with confidence. However, the graph of the posterior probabilities shows which subtypes are most closely related to the query in these regions.
Outside uncertainty regions and breakpoint intervals, the user can now be more confident in the predicted parental subtype, as our results show (to be published soon).
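Given a table of per-position subtype posteriors, uncertainty regions and the closest subtypes inside them could be extracted as sketched below. The threshold rule is an assumption for illustration (a position counts as uncertain when no subtype reaches a hypothetical posterior of 0.9), not necessarily the criterion jpHMM uses; see Schultz et al. 2009 for how these regions are defined there. `uncertainty_regions` and `closest_subtypes` are illustrative names.

```python
from typing import Dict, List, Tuple

Posteriors = List[Dict[str, float]]  # per position: subtype -> posterior


def uncertainty_regions(post: Posteriors,
                        threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Maximal runs (start, end inclusive) where no subtype reaches `threshold`."""
    regions, start = [], None
    for i, col in enumerate(post):
        uncertain = max(col.values()) < threshold
        if uncertain and start is None:
            start = i
        elif not uncertain and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:  # region runs to the end of the sequence
        regions.append((start, len(post) - 1))
    return regions


def closest_subtypes(post: Posteriors, region: Tuple[int, int]) -> List[str]:
    """Subtypes ranked by mean posterior over the region, highest first."""
    start, end = region
    span = end - start + 1
    mean = {s: sum(post[i][s] for i in range(start, end + 1)) / span
            for s in post[start]}
    return sorted(mean, key=mean.get, reverse=True)
```

With positions 1 and 2 split between subtypes (for example B 0.5 / D 0.4 and B 0.3 / D 0.5), the sketch reports one uncertainty region `(1, 2)` and ranks D ahead of B there, since D has the higher mean posterior.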

Evaluation

To evaluate the performance of this tool, a large set of virus genome sequences as well as simulated recombinant genome sequences were tested. The recombination breakpoints identified by jpHMM were found to be significantly more accurate than breakpoints defined by traditional methods based on comparisons with single representative sequences. Details can be found in the references.

Contributions

This project is a collaborative effort between the Department of Bioinformatics of the University of Göttingen and the Los Alamos HIV Sequence Database Group. A.-K. Schultz developed, implemented and tested the jpHMM algorithm. M. Zhang tested and evaluated the program on HIV data and designed and programmed the jpHMM Web interface. C. Calef developed the HIV genome mapper tool, C. Kuiken evaluated the program on HCV data, and T. Leitner, B. Korber and B. Morgenstern guided the project. B. Korber first conceived the idea of applying jpHMM to HIV subtyping. M. Stanke developed the jpHMM approach and supervised the programming. Members of Prof. Burkhard Morgenstern's group and of the Los Alamos HIV Sequence Database provided much of the help that made this project possible.

This project was funded in part by grant NIH Y1-AI-1500-01, the NIH-DOE interagency agreement on HIV Immunology and Sequence Database and by grant MO 1048/1-1 of the Deutsche Forschungsgemeinschaft.

References

Please cite one of the following papers if you use this tool in your publication (a list of all references can be found here):

Questions or comments? Email contact
Copyright © 2005-2006 Dept. of Bioinformatics (IMG)