This post describes the data structures used in the diagonal solver.
Coordinates – these will be used not just the cell center but also the edges.
• X, Y with a range of 2 * width + 1 and 2 * height + 1.
○ 16 x 16 => 33 x 33.
○ 32 x 32 => 65 x 65.
• The start cell is coordinate (1, 1).
• The start bottom left peg, which will be south west of the start cell is (0, 0).
• An Orthogonal cell has an odd coordinate for both the X and the Y coordinate.
• A diagonal/edge has an even coordinate for either the X or the Y coordinate.
• Coordinates are specified as (x, y).
○ x is the horizontal coordinate.
○ y is the vertical coordinate.
• Conversion between (x, y) and memory address is as follows:
Memory address = y * (2 * width + 1) + x.
ExpandQueue – used to track which cells we need to expand.
• This is a FIFO of a structure that has two elements – Coordinate and direction to expand.
○ The coordinate is the cell to expand next.
○ The direction is the direction to expand next from the cell.
• The size of the FIFO is (2 * width + 1) * (2 * height + 1).
○ It shouldn’t need to be this big.
○ The worst case scenario is to flood an empty maze because this will result in expansion of very cell and edge.
• If we implementing an A* algorithm based solver, this queue would be sorted with the coordinate expected to be on the lowest time path at the front of the FIFO.
For a 16 x 16 maze, where the coordinate and direction are two byte and one byte respectively, the FIFO will (2 * 16 + 1) * (2 * 16 + 1) * 3 = 3,267 bytes of RAM.
MoveCost Array – used to keep track of the path and cost of the path.
• This is an array that has two elements for each direction – the move that got us here and the cost to get here.
○ The reason entries are needed for each direction is that different paths could be passing through the cell in different direction.
• The path is an enumeration of moves. See “Diagonal Solver Introduction and Flooding” for types of moves.
• The cost is a 16 bit number.
○ To make it faster for the ARM architecture, it could be 32 bit if we have the RAM.
• The size of the array is (2 * width + 1) * (2 * height + 1) * 8 directions.
○ It shouldn’t need entries for the pegs but in the initial implementation, that won’t be optimized.
○ It should only need four directions because:
§ In the cell center, you can only be facing North, East, South and West.
§ In the edge, you can only be facing NorthEast, SouthEast, SouthWest, NorthWest.
However, for the first implementation, we are going to make this eight directions so that the direction from the ExpandQueue can be used directly. If this was only four directions, then when in the cell center, the direction would have to mean North, East, South and West and on an edge, the same four indices would have to mean NorthEast, SouthEast, SouthWest, NorthWest and we would need a comparison to determine this.
For a 16 x 16 maze, where the path is one byte and the cost is two bytes, the array will (2 * 16 + 1) * (2 * 16 + 1) * 8 * 3 = 26,136 bytes of RAM.
good work! thanks for sharing your precious experience with us. I am surprised it actually consumed 26K ram, even though LQFP64 STM32 has maximum 64K ram from F103. But I feel this will be efficient since you trade time with memory, and it should worth it. My method uses less memory but takes longer timer to compute since I only use 1byte for each cell for its weight for running time but the weight determine process is way more complicated than this and I hope mine can finish within 1 second(which is my bottom line) 🙂
The STM32F103 I use has 512KB of code store and 64K of RAM. I think currently, the entire micromouse code is around 40KB. I’m with you that throwing memory for speed is worth it.
As part of this effort, I went back and looked at gcc and ARM code generation for 8 bit and 16 bit variables. It turns out that code to use 8 bit and 16 bit variables is longer and slower than that for 32 bit variables. So, I spent some time figuring out how to use bit-banding without some of the problems Peter and I had seen in the past – there is a write up on this site. My plan is to store wall/no wall info. in the bit-band region. I’ll try to find time to write up how I did this.
I think one second is way, way too long for the solver to run. I don’t follow why the weight determination process has to be more complicated than this.
On a tangent, I noticed from your IP address that you are in the Los Angeles area. The APEC (www.apec-conf.org) micromouse contest is in Long Beach, on March 18th. Would be great to see student entries there.
Thanks for your apply.
Since we both have 64K version STM32, ram might not an issue for us. What I am curious is Kato’s tetra only has 20K ram on his STM32 since he uses LQFP48 package and 20K is the max for that series. He got to have something different. I barely see people share these interesting details other than this place(micromouseonline.com), I am looking forward to see more of these kinda of topics on this site!.
I did some quick research on bit-band region and according to what they described on http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0337h/Behcjiic.html, seem it worth us to work on it. I utilized systick to track program execution time in us in my STM32 and turned out the stack push and pull process for stack(I use my own stack instead of stl stack for c++) takes most of the time and the pure math computation was too fast to be neglected. I saw you mentioned 8 bit and 16 bit do take longer time than 32 bit variables, and the variable I push and pull into stack is 8 bit unsigned int, that might be the reason to slower down my overall computation speed for path generation. I said 1 second is my bottom line, it actually execute faster than my eye blink speed : ), the results come out pretty much instantaneously if I don’t print any extra information when I used my own stm32 dev board simulate my floodfill and print to screen threw serial port. But eventually it will gets slower because I am still add more details to make the weight determine more accurate.
Peter noticed my location last year when I was making some comments on his post and he invited my to APEC as well. I personally really want to go APEC unfortunately I can’t make it that day. I will definitely attend it if I can manage time that day, thanks once again.
The memory requirement for an orthogonal solver is much lower – in the order of a couple K bytes of RAM. I think the total RAM usage for my mouse with the orthogonal solver is around 4K to 5K of RAM.
I think if you work at it a little, the diagonal solver’s RAM footprint can cut down to 1/2 to 1/4 of what I have. In the Maze Cost array, I include storage for the pegs which is not necessary. Also, I include space for eight directions – this isn’t necessary. You really just need storage for four directions.
Since you have SysTick working, if you want, you can use that to time your code.
Thanks for your advice. My orthogonal solver was pretty conservative and I ended up use 256byte x 2 + 256byte for stack + variables. I use MDK to compile my code and it had some compatible issues between printf and C++ STL stack. even though I didn’t enable MicroLib and only redirect print to serial port, calling printf still made my MCU freeze, I ended up wrote another function similar to printf and wrote my own stack running in C with some reference online. I used malloc to request memory from system and system only gave me 420bit for maximum, I search and found similar issue online but didn’t find any solution so far, I’ve heard the memory that malloc give you is based on your hardware and compiler. This postponed my plan on doing halfsize micromouse this year since a 32 by 32 maze will need a stack with minimum 32x32x16bit = 2kB size for stack for worst case. I don’t know if you use floodfill for your orthogonal solver and I want to know what are you going to do if the stack(or queue) you have has a size limit especially on a larger maze(32by32 halfsize mouse)?
I use a floodfill for the orthogonal solver too. It is very similar, in concept, to the diagonal solver. Instead of expanding eight ways (see flooding article), it only expands three ways.
Using the heap is very slow. This means, you shouldn’t be using malloc/free. Instead, create an array that is the right size for what you need and then use a head and tail pointer or index to keep track of the valid elements in the array.
I thought about create an array for a larger size stack but I thought it would be slower than the way I am using now. I thought this was a ‘stupid’ method and if I couldn’t figure out a better way to replace the current method I am using stack, I will use array stack for my future half-size micromouse. Thank you very much for telling me this good news, I will write one and test in systick to see if it is faster than the memory allocated by malloc, if so, I can do more aggressive strategy on my orthogonal solver : D. My case now is the stack push-pull process takes max 2200us for worst case, and I in order to make it solve faster, I only allow the solver to push 7 by 7 cell around the current cell in order to reduce the work load and ended up the max computation time stayed at 710us for worst case with about 112 push-pull action. Even though my pid is running under 1ms interrupt, and this won’t be a big issue even if I push the entire cell needed into stack but I feel pretty uncomfortable with this :(. This will limit the potential for my mouse to reach faster exploring speed in future. It definitely worth me to rewrite my stack if what you said was true, thanks once again.