3.2 Hidden Markov Model (HMM)

Introduction

Previously

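Problem: the state model above is not sufficient on its own to perform sequence alignment, because it only distinguishes the match state M from the gap states X and Y and does not take the actual residues into account.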

Solution

Hidden Markov Model (HMM)

The observable symbols (“tokens”, y(t)) are generated according to their corresponding states (x(t)).

The notion of symbols (tokens) is added on top of the states.
In addition to the state transition probabilities, each state of an HMM has a probability distribution over the possible output tokens (the emission probability).

In other words, every state has its own emission probability distribution and can produce a set of observable symbols, each with its own probability.
Thus, an HMM consists of two strings of information:
- The state path
- The token path (the emitted sequence)
Unlike in a plain Markov model, however, the state path of an HMM is not directly visible.
Instead, we have to infer the underlying state path from the observable token path, for example by taking the state path with the highest probability.
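
One common way to find the highest-probability state path is the Viterbi algorithm, which these notes do not name at this point. The following is a minimal sketch on a hypothetical two-state HMM; the states `A` and `B`, the tokens `x` and `y`, and all probabilities are made up for illustration.

```python
import math

# Hypothetical toy HMM: every name and number here is an illustrative assumption.
STATES = ["A", "B"]
INITIAL = {"A": 0.5, "B": 0.5}
TRANSITION = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.2, "B": 0.8},
}
EMISSION = {
    "A": {"x": 0.7, "y": 0.3},
    "B": {"x": 0.1, "y": 0.9},
}


def viterbi(tokens):
    """Return (best_state_path, probability) for a non-empty list of tokens."""
    # best[s] holds (log-probability, path) of the best path that ends in state s.
    best = {
        s: (math.log(INITIAL[s]) + math.log(EMISSION[s][tokens[0]]), [s])
        for s in STATES
    }
    for token in tokens[1:]:
        new_best = {}
        for s in STATES:
            # Choose the best predecessor state p for the current state s.
            score, path = max(
                (best[p][0] + math.log(TRANSITION[p][s]), best[p][1])
                for p in STATES
            )
            new_best[s] = (score + math.log(EMISSION[s][token]), path + [s])
        best = new_best
    score, path = max(best.values())
    return path, math.exp(score)
```

With these toy parameters, `viterbi(["x", "x", "y", "y", "y"])` returns the state path `A, A, B, B, B`: the tokens alone are visible, and the state path is reconstructed from them. Working in log space avoids numerical underflow on long sequences.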
Given an HMM, a sequence of tokens can be generated as follows:
- When we “visit” a state, we emit a token from the state’s emission probability distribution.
- Then, we choose which state to visit next, according to the state’s transition probability distribution.
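
The two generation steps above can be sketched as a short sampling loop. This is only an illustration: the two-state model, its state and token names, the initial distribution, and all probabilities are hypothetical and not taken from the notes.

```python
import random

# Hypothetical toy HMM: every name and number here is an illustrative assumption.
INITIAL = {"A": 0.5, "B": 0.5}
TRANSITION = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.2, "B": 0.8},
}
EMISSION = {
    "A": {"x": 0.7, "y": 0.3},
    "B": {"x": 0.1, "y": 0.9},
}


def draw(distribution, rng):
    """Draw one outcome from an {outcome: probability} dictionary."""
    outcomes = list(distribution)
    weights = [distribution[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def generate(length, seed=0):
    """Return (state_path, token_path), each containing `length` items."""
    rng = random.Random(seed)
    state = draw(INITIAL, rng)
    state_path, token_path = [], []
    for _ in range(length):
        state_path.append(state)
        token_path.append(draw(EMISSION[state], rng))  # step 1: emit a token
        state = draw(TRANSITION[state], rng)           # step 2: choose the next state
    return state_path, token_path
```

Calling `generate(10)` yields both paths; in a real setting only the token path would be observed, and the state path would stay hidden.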

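The sequence alignment problem

So far, the residues have not been taken into account.

An HMM fills this gap.

The residues are handled through the emission probabilities.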
Sequence alignment with HMM
- Each “token” of the HMM is an aligned pair of two residues (M state), or of a residue and a gap (X or Y state).
  - Transition and emission probabilities define the probability of each aligned pair of sequences.
- Based on the HMM, each alignment of two sequences can be assigned a probability.
  - Given two input sequences, we look for an alignment with the maximum probability.
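
As a rough sketch of how an alignment receives a probability, the code below multiplies one transition and one emission probability per alignment column. Everything numeric is an assumption made for illustration: the gap-open and gap-extend values `DELTA` and `EPSILON`, the toy DNA emission probabilities, the convention that X means a residue of the first sequence opposite a gap, and the simplification that the model starts as if coming from M and has no explicit end state.

```python
GAP = "-"
DELTA = 0.1    # hypothetical gap-open probability (M -> X or M -> Y)
EPSILON = 0.3  # hypothetical gap-extend probability (X -> X or Y -> Y)

TRANSITION = {
    "M": {"M": 1 - 2 * DELTA, "X": DELTA, "Y": DELTA},
    "X": {"M": 1 - EPSILON, "X": EPSILON, "Y": 0.0},
    "Y": {"M": 1 - EPSILON, "X": 0.0, "Y": EPSILON},
}


def emission(state, a, b):
    """Toy DNA emissions: M emits a residue pair, X and Y emit a single residue."""
    if state == "M":
        # 4 identical pairs * 0.2 + 12 different pairs * (0.2 / 12) = 1
        return 0.2 if a == b else 0.2 / 12
    return 0.25  # uniform over A, C, G, T


def column_state(a, b):
    """Map one alignment column to the state that emits it."""
    if a != GAP and b != GAP:
        return "M"
    if a != GAP:
        return "X"  # assumed convention: residue in sequence 1, gap in sequence 2
    if b != GAP:
        return "Y"  # residue in sequence 2, gap in sequence 1
    raise ValueError("a column cannot contain two gaps")


def alignment_probability(row1, row2):
    """Probability of one gapped alignment, given as two equal-length rows."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    probability = 1.0
    previous = "M"  # assumption: the start behaves like the M state
    for a, b in zip(row1, row2):
        state = column_state(a, b)
        probability *= TRANSITION[previous][state] * emission(state, a, b)
        previous = state
    return probability
```

For example, `alignment_probability("GATT", "GAT-")` and `alignment_probability("GATT", "GA-T")` score two candidate alignments of the same pair of sequences, and the one with the larger value is preferred under this toy model. Searching over all alignments for the maximum is normally done with dynamic programming rather than by enumerating candidates.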
Advantages of the hidden Markov model
- Probabilistic interpretation: it gives sequence alignment a sound probabilistic interpretation.
- Probabilistic inference: it lets us apply probability theory to analyze alignments.
  - For example, we can calculate the probability that a given pair of sequences is related by any (unspecified) alignment.
    - Or, what is the best likelihood we can expect for two given sequences?
  - Given the nature of an HMM, many different state paths can give rise to the same token sequence, so we can simply sum their probabilities to get the full probability of a given token sequence.
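
The summation over state paths can be illustrated on a plain (non-pair) HMM. The sketch below computes the sum twice: once by enumerating every state path, and once with the forward algorithm, a standard dynamic-programming shortcut that these notes do not name here. The two-state model and all of its probabilities are hypothetical.

```python
import itertools

# Hypothetical toy HMM: every name and number here is an illustrative assumption.
STATES = ["A", "B"]
INITIAL = {"A": 0.5, "B": 0.5}
TRANSITION = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.2, "B": 0.8},
}
EMISSION = {
    "A": {"x": 0.7, "y": 0.3},
    "B": {"x": 0.1, "y": 0.9},
}


def path_probability(path, tokens):
    """Joint probability of one state path and the token sequence."""
    p = INITIAL[path[0]] * EMISSION[path[0]][tokens[0]]
    for prev, state, token in zip(path, path[1:], tokens[1:]):
        p *= TRANSITION[prev][state] * EMISSION[state][token]
    return p


def total_by_enumeration(tokens):
    """Sum the joint probability over every possible state path."""
    return sum(
        path_probability(path, tokens)
        for path in itertools.product(STATES, repeat=len(tokens))
    )


def total_by_forward(tokens):
    """Same sum via the forward recursion, linear in the sequence length."""
    # f[s] = probability of the tokens seen so far, with the path ending in s.
    f = {s: INITIAL[s] * EMISSION[s][tokens[0]] for s in STATES}
    for token in tokens[1:]:
        f = {
            s: EMISSION[s][token] * sum(f[p] * TRANSITION[p][s] for p in STATES)
            for s in STATES
        }
    return sum(f.values())
```

Both functions should agree up to floating-point rounding on any non-empty token list. The enumeration grows exponentially with the sequence length, which is why the recursive form is preferred in practice.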