Say you're a car that wants to get ice cream while avoiding zombies. Which action would be best to take?
x2-y2 is state 1
x2-y1 is state 2
x1-y2 is state 3
x1-y1 is state 4
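The state numbering above follows a simple pattern on the 2x2 grid, which the short sketch below encodes. The function names `state_number` and `coordinates` are hypothetical helpers for illustration, not part of the original demo.

```python
def state_number(x: int, y: int) -> int:
    """Map grid cell (x, y), with x and y in {1, 2}, to its state number.

    Encodes the table above: x2-y2 -> 1, x2-y1 -> 2, x1-y2 -> 3, x1-y1 -> 4.
    """
    if x not in (1, 2) or y not in (1, 2):
        raise ValueError(f"cell ({x}, {y}) is outside the 2x2 grid")
    return 1 + 2 * (2 - x) + (2 - y)


def coordinates(state: int) -> tuple[int, int]:
    """Inverse mapping: state number (1-4) back to grid cell (x, y)."""
    if state not in (1, 2, 3, 4):
        raise ValueError(f"unknown state {state}")
    x = 2 - (state - 1) // 2
    y = 2 - (state - 1) % 2
    return x, y


for s in range(1, 5):
    print(s, coordinates(s))
```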
Syntax: state action reward next-state
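Each input line records one experience tuple in the order given above. The sketch below parses such lines, assuming one tuple per line with whitespace-separated fields; the `Transition` type, `parse_transitions` function, and the action names and rewards in the example are hypothetical.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    state: int
    action: str
    reward: float
    next_state: int


def parse_transitions(text: str) -> list[Transition]:
    """Parse lines of the form 'state action reward next-state'.

    Blank lines are skipped; a line with the wrong number of fields raises ValueError.
    """
    transitions = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if len(parts) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(parts)}")
        s, a, r, s2 = parts
        transitions.append(Transition(int(s), a, float(r), int(s2)))
    return transitions


# Hypothetical input: from state 4 drive right to state 2 (reward 0),
# then drive up to state 1 (reward 10, assumed here to be the ice cream).
example = "4 right 0 2\n2 up 10 1"
print(parse_transitions(example))
```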
Results: (Note: press the Compute Policy button repeatedly until it reaches the optimal policy.)
Per Action:
Congratulations! Found the optimal solution from every state.
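The page does not say which algorithm the Compute Policy button runs. One plausible reading, sketched below as an assumption, is tabular Q-learning applied to the recorded tuples, where each press performs one sweep of updates. Repeated sweeps are then needed because reward information propagates back only one step per pass. The transitions, rewards (ice cream at state 1, zombies at state 3), learning rate, and discount are all hypothetical.

```python
from collections import defaultdict

# Hypothetical experience tuples (state, action, reward, next_state).
# Assumed layout: ice cream at state 1 (+10), zombies at state 3 (-10); both terminal.
TRANSITIONS = [
    (4, "right", 0.0, 2),
    (4, "up", -10.0, 3),
    (2, "up", 10.0, 1),
    (2, "left", 0.0, 4),
]


def q_sweep(q, transitions, alpha=0.5, gamma=0.9):
    """One pass of tabular Q-learning updates over the recorded tuples."""
    for s, a, r, s2 in transitions:
        # Best estimated value from the next state; 0.0 if it has no known actions.
        next_best = max(q[s2].values(), default=0.0)
        q[s][a] += alpha * (r + gamma * next_best - q[s][a])


def greedy_policy(q):
    """Pick the highest-valued action in every state that has any actions."""
    return {s: max(acts, key=acts.get) for s, acts in q.items() if acts}


q = defaultdict(lambda: defaultdict(float))
for _ in range(50):  # each iteration stands in for one button press (assumption)
    q_sweep(q, TRANSITIONS)

print(greedy_policy(q))  # prints {2: 'up', 4: 'right'}
```

With these made-up rewards, the per-action values converge to Q(2, up) = 10, Q(4, right) = 0.9 × 10 = 9, Q(2, left) = 0.9 × 9 = 8.1, and Q(4, up) = -10. The greedy policy therefore drives around the zombies to reach the ice cream.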
A more complicated example