Once ZV had explored the maze, it did a speed run, and the path it took in the upper right portion of the maze was very different from the paths the other competitors took.
I finally got a chance to run the path through the simulator and understand what happened. The image below shows the path ZV took. Notice that in the upper right, instead of turning in, it went straight and then turned in. The path it took is fairly suboptimal.
The maze solver I have is an orthogonal maze solver, so it only takes into account the cost of a 90-degree turn and the acceleration/deceleration when moving multiple cells in a straight line. The weighting for a turn in the cost array is too high, so the mouse preferred straight paths.
Here is the cost ZV used: 178, 236, 305, 366, 421, 467, 506, 539, 572, 605, 638, 671, 704, 737, 770. This cost array says the mouse prefers to go forward three cells to avoid a turn.
If I reduce the cost of the turn, the mouse will then take paths with more turns.
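To illustrate the trade-off, here is a minimal sketch of how an orthogonal solver might score two candidate paths. It is not ZV's code: the run-cost table, the separate `turn_cost` parameter, and the two example paths are made up for illustration, and the way ZV's real cost array is indexed is not assumed here.

```python
def path_cost(runs, run_cost, turn_cost):
    """Cost of an orthogonal path given as a list of straight-run lengths (in cells).

    A path made of k straight runs has k - 1 ninety-degree turns between them.
    """
    straight = sum(run_cost[n - 1] for n in runs)  # run_cost[i] = cost of a run of i + 1 cells
    turns = max(len(runs) - 1, 0)
    return straight + turns * turn_cost


# Hypothetical run costs (NOT ZV's values): longer runs cost less per cell
# because the mouse has room to accelerate.
RUN_COST = [10, 18, 25, 31, 36, 40, 44, 48]

SHORT_TWISTY = [2, 1, 2]   # 5 cells, 2 turns
LONG_STRAIGHT = [4, 3]     # 7 cells, 1 turn


def preferred(turn_cost):
    """Name of the cheaper of the two example paths for a given turn cost."""
    twisty = path_cost(SHORT_TWISTY, RUN_COST, turn_cost)
    straight = path_cost(LONG_STRAIGHT, RUN_COST, turn_cost)
    return "short_twisty" if twisty < straight else "long_straight"


print(preferred(20))  # high turn cost
print(preferred(5))   # reduced turn cost
```

With these made-up numbers the longer, straighter path wins at a turn cost of 20 and the shorter path with more turns wins at 5, which is the same kind of flip I would expect from lowering the turn weighting in the real table.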
I noticed that ZV did not explore the top right section, which is presumably why it failed to turn in there.
The learning algorithm is set up to search the maze until it finds the best path.
In this case, given the cost table, it decided that a path going through the upper right was worse than the longer but straighter path, so it didn’t bother to explore the region. I experimented with a different cost table where the turn cost wasn’t so high and the simulation did explore that region. I think the “right” solution is to write a diagonal solver. I will probably attempt that next summer or, more likely, try to convince a friend to write one that takes into account all the different turn costs and the different acceleration, deceleration, and maximum-velocity profiles.
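The "search until the best path is known" behaviour described above can be sketched roughly as follows. This is an assumption about the general technique, not ZV's actual implementation: the planner treats every wall it has not seen as open, finds the cheapest path under a simple step-plus-turn cost, and exploration is considered finished once that path runs only through visited cells. The function names, the tiny 4x3 maze, and the cost values are all hypothetical.

```python
import heapq

DIRS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W as (dx, dy)


def best_path(width, height, walls, start, goal, turn_cost, step_cost=1):
    """Cheapest orthogonal path; any wall not listed in `walls` is assumed open.

    Dijkstra over (cell, heading) states. The first move is never charged
    as a turn; any later change of heading costs `turn_cost`.
    """
    tie = 0  # unique tie-breaker so heap entries never compare cells or paths
    heap = [(0, tie, start, None, [start])]
    seen = set()
    while heap:
        cost, _, cell, heading, path = heapq.heappop(heap)
        if cell == goal:
            return cost, path
        if (cell, heading) in seen:
            continue
        seen.add((cell, heading))
        for d, (dx, dy) in enumerate(DIRS):
            nxt = (cell[0] + dx, cell[1] + dy)
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            if frozenset((cell, nxt)) in walls:
                continue
            turn = turn_cost if heading is not None and d != heading else 0
            tie += 1
            heapq.heappush(heap, (cost + step_cost + turn, tie, nxt, d, path + [nxt]))
    return None


def exploration_done(path, visited):
    """True when the current best path uses only cells the mouse has visited."""
    return all(cell in visited for cell in path)


# Hypothetical 4x3 maze: three known walls, and a visited set that covers
# a long detour along the bottom and right edges.
WALLS = {
    frozenset(((0, 0), (0, 1))),
    frozenset(((2, 0), (2, 1))),
    frozenset(((1, 1), (1, 2))),
}
VISITED = {(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (2, 2), (1, 2)}

for turn_cost in (5, 1):
    cost, path = best_path(4, 3, WALLS, (0, 0), (2, 2), turn_cost)
    print(turn_cost, cost, exploration_done(path, VISITED))
```

In this toy maze a high turn cost makes the already-explored detour the best path, so the search stops, while a low turn cost makes a staircase through unvisited cells look better, so the mouse would keep exploring.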