Temporal Difference Learning Method is a mix of Monte Carlo method and Dynamic programming method. Some of the advantages of this method include:
- It can learn in every step online or offline.
- It can learn from a sequence which is not complete as well.
- It can work in continuous environments.
- It has lower variance compared to MC method and is more efficient than MC method.
Limitations of TD method are: - It is a biased estimation.
- It is more sensitive to initialization.