Solving a Puzzle with Q-Learning

Say you're a car that wants to get ice cream while avoiding zombies. Which action is best to take?

1.) Mapping the agent's (the car's) coordinates and situation into states

x2-y2 is state 1
x2-y1 is state 2
x1-y2 is state 3
x1-y1 is state 4
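The mapping above can be written as a simple lookup table. The following is a minimal Python sketch, assuming x and y are integer grid coordinates taking the values 1 or 2. The names `STATE_OF`, `COORDS_OF` and `to_state` are hypothetical and only illustrate the idea.

```python
# Hypothetical lookup: the car's (x, y) grid coordinates -> state number,
# following the list above (x2-y2 -> 1, x2-y1 -> 2, x1-y2 -> 3, x1-y1 -> 4).
STATE_OF = {
    (2, 2): 1,
    (2, 1): 2,
    (1, 2): 3,
    (1, 1): 4,
}

# Inverse lookup, useful when printing a learned policy back onto the grid.
COORDS_OF = {state: xy for xy, state in STATE_OF.items()}


def to_state(x: int, y: int) -> int:
    """Return the state number for the car at coordinates (x, y)."""
    return STATE_OF[(x, y)]


print(to_state(1, 1))  # 4
print(COORDS_OF[2])    # (2, 1)
```

A dictionary keeps the mapping identical to the list above; with a larger grid you would more likely compute the index arithmetically instead of listing every cell.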

Each recorded experience uses the syntax: state action reward next-state
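To show how experiences in this syntax feed Q-learning, here is a minimal Python sketch that parses such lines and applies the standard Q-learning update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max over a' of Q(s',a') - Q(s,a)) once per line. Everything concrete here is an assumption, not taken from the original: the action names (up/down/left/right), the ice cream sitting in state 1 with reward +10, a zombie in state 2 with reward -10, both of those states ending the episode, and the values alpha = 0.5 and gamma = 0.9.

```python
from collections import defaultdict

# Hypothetical experiences in the "state action reward next-state" syntax.
# Assumptions: ice cream in state 1 (+10), zombie in state 2 (-10), both
# terminal, so no actions are recorded from them. Action names are illustrative.
EXPERIENCES = """
4 up 0 3
4 right -10 2
3 right 10 1
3 down 0 4
"""

ALPHA = 0.5  # learning rate (assumed)
GAMMA = 0.9  # discount factor (assumed)


def parse(text):
    """Turn each 'state action reward next-state' line into a tuple."""
    rows = []
    for line in text.strip().splitlines():
        s, a, r, s2 = line.split()
        rows.append((int(s), a, float(r), int(s2)))
    return rows


def q_update(q, s, a, r, s2):
    """One Q-learning step on the table q (state -> action -> value)."""
    # States with no recorded actions (the terminal ones here) are worth 0.
    best_next = max(q.get(s2, {}).values(), default=0.0)
    target = r + GAMMA * best_next
    q[s][a] += ALPHA * (target - q[s][a])


q = defaultdict(lambda: defaultdict(float))
for s, a, r, s2 in parse(EXPERIENCES):
    q_update(q, s, a, r, s2)

for s in sorted(q):
    print(s, dict(q[s]))
```

After this single pass, Q(4, up) is still 0: the +10 from the ice cream has so far only reached Q(3, right), and it takes further passes over the same experiences to carry it back to state 4.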

Results (note: press the compute policy button multiple times until it reaches the optimal policy):
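It is not stated what the compute policy button does internally. One plausible reading is that each press replays the recorded experiences once with the Q-learning update, so repeated presses let reward information spread further back through the states. The minimal Python sketch below follows that assumption: it repeats sweeps until no Q-value changes by more than a small tolerance, then picks the highest-valued action in each state. The experiences, rewards (ice cream +10 in state 1, zombie -10 in state 2, both terminal), action names, alpha = 0.5, gamma = 0.9 and the helpers `sweep` and `greedy_policy` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical recorded experiences: (state, action, reward, next_state).
# Assumed layout: ice cream in state 1 (+10), zombie in state 2 (-10),
# both terminal; action names are illustrative only.
EXPERIENCES = [
    (4, "up", 0.0, 3),
    (4, "right", -10.0, 2),
    (3, "right", 10.0, 1),
    (3, "down", 0.0, 4),
]

ALPHA, GAMMA = 0.5, 0.9  # assumed learning rate and discount factor


def sweep(q):
    """Replay every experience once with the Q-learning update.

    Returns the largest absolute change made to any Q-value.
    """
    biggest = 0.0
    for s, a, r, s2 in EXPERIENCES:
        # Terminal states have no recorded actions, so they count as 0.
        best_next = max(q.get(s2, {}).values(), default=0.0)
        delta = ALPHA * (r + GAMMA * best_next - q[s][a])
        q[s][a] += delta
        biggest = max(biggest, abs(delta))
    return biggest


def greedy_policy(q):
    """Pick the highest-valued action in every state with recorded actions."""
    return {s: max(actions, key=actions.get) for s, actions in q.items()}


q = defaultdict(lambda: defaultdict(float))
sweeps = 0
# Each iteration stands in for one press of "compute policy" (assumed).
while True:
    sweeps += 1
    if sweep(q) < 1e-6 or sweeps >= 1000:
        break

policy = greedy_policy(q)
print(f"converged after {sweeps} sweeps")
print(policy)
for s in sorted(q):
    print(s, {a: round(v, 3) for a, v in q[s].items()})
```

With these assumed numbers the values settle near Q(3, right) = 10, Q(3, down) = 8.1, Q(4, up) = 9 and Q(4, right) = -10, so the greedy policy is up from state 4 and right from state 3: the route to the ice cream that never enters the zombie's state.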

Per Action:

Congratulations! Found the optimal solution from every state.