Recently, I found an interesting programming exercise, which read:
"A robot is located at the bottom-left corner of a 5x5 grid. The robot can move either up, down, left, or right, but can not visit the same spot twice. The robot is trying to reach the top-right corner of the grid."
In previous posts we'd seen crude solutions to the problem above. Good news is that there's little respite to be found in this post also and it's more of the same sort of entertainment. Those concerns we've had so far remain, a burgeoning memory requirement, whilst looking to apply this same calculation for a 10x10 grid. The problem has been that our stack grows and then explodes resulting in an OutOfMemoryException. Naturally, the answer must be to use a disk-based stack. You can't help but think that the right answer for this involves NoSQL but, dammit, we're going to prove a relational database can't handle it before removing our tin foil hat!
First there's just the small matter of some unexpected performance gains to be made by replacing the LinkedList<int> with List<int> when representing robot journeys. Another little refactoring results in a decent performance improvement! Our Program class stripped of validation and instrumentation looks like this:
[code:c#] public static Board Board;
static void Main(string[] args)
{
var boardLength = int.Parse(args[0]);
Board = new Board(boardLength);
var newJourney = new List(){ Board.StartLocation };
var explorer = new BoardExplorer(newJourney);
var journeys = new Stack<List>();
var factory = new BoardExplorerCloneFactory(journeys);
journeys.Push(explorer.Journey);
while (factory.HasClones)
{
var explorersCount = 0;
UpdateInstrumentation();
foreach (var movedExplorer in factory.GetExplorers())
{
if (movedExplorer.IsExploring)
journeys.Push(movedExplorer.Journey);
explorersCount++;
}
}
}
[/code]
Changes to the BoardExplorer are almost too trivial to show once again. More fluff than stuff, cue our newest, lighter-weight robot:
[code:c#] public struct BoardExplorer
{
public List Journey;
public BoardExplorer(List journey)
{
this.Journey = journey;
}
internal int GetLocation()
{
return Journey[Journey.Count-1];
}
public bool IsExploring
{
get
{
return GetLocation() != Program.Board.EndLocation;
}
}
internal bool HasVisited(int location)
{
foreach (var visitedLocation in Journey)
if (visitedLocation == location)
return true;
return false;
}
public BoardExplorer Move(int location)
{
var journeyBeyond = new List(Journey);
journeyBeyond.Add(location);
var moved = new BoardExplorer(journeyBeyond);
if (!moved.IsExploring)
BoardExplorerController.CompleteJourneys.Enqueue(moved.Journey);
return moved;
}
}
[/code]
The Controller continues to look as anaemic as ever.
[code:c#] public class BoardExplorerController
{
public static Queue<List> CompleteJourneys = new Queue<List>();
internal static IEnumerable GetValidMoves(BoardExplorer explorer)
{
var availableMoves = Program.Board.GetAvailableMovesFromLocation(explorer.GetLocation());
while (availableMoves.Count > 0)
{
var location = availableMoves.Pop();
if (!explorer.HasVisited(location))
yield return location;
}
}
}
[/code]
And whoops of delight followed. Not really. Better numbers though.
There are 1,262,816 unique paths from the top-left corner to the bottom-right corner of a 6x6 grid!
There were 18,470,410 robots used to discover this!
Total memory: 321,723,412
Time elapsed: 44,092ms
Garbage collection gen0: 2,923
Garbage collection gen1: 1,019
Garbage collection gen2: 0
We managed to knock off 30 seconds from the time it takes to calculate those paths. Almost twice the performance greets muted celebration. Is this good enough for a grid almost three times the size? Few seem to think so. First, let's see how this copes with a 7x7 grid.
Out of Memory exception! 4,884,621 paths found before I capitulated.
Looking at the current state of our robot affairs we believe there's one final optimization we can make before we resort to storing the journey stack on disk. The 'epiphany' is this: Why do we need the robot at all? It was suggested by the way the problem was phrased and had 'robot' been omitted from the problem statement then R2 might never have made it into our code. Essentially, we care only about unique paths so we're about to send R2 into retirement. Our new Program minus BoardExplorer results in impressive performance gains:
There are 1,262,816 unique paths from the top-left corner to the bottom-right corner of a 6x6 grid!
There were 18,470,410 robots used to discover this!
Total memory: 319,371,768
Time elapsed: 34,679ms
Garbage collection gen0: 2,906
Garbage collection gen1: 1,029
Garbage collection gen2: 0
Something to cheer also is that we've reduced the size of the application. Fewer lines of readable code is always better. I can't think of an exception right now so it must be true. Our new Program without robot participation:
[code:c#] static void Main(string[] args)
{
var boardLength = int.Parse(args[0]);
Board = new Board(boardLength);
Journeys = new Stack<List>();
var inaugural = new List();
inaugural.Add(Board.StartLocation);
Journeys.Push(inaugural);
var factory = new JourneyFactory();
var warmTimer = GetStartedStopWatch();
while (factory.HasIncompleteJourneys)
{
var explorersCount = 0;
UpdateInstrumentation();
foreach (var journey in factory.GetJourneys())
{
if (journey[journey.Count - 1] == Board.EndLocation)
JourneyController.CompleteJourneys.Enqueue(journey);
else
Journeys.Push(journey);
explorersCount++;
}
}
}
[/code]
Without the BoardExplorer it no longer makes sense to have a BoardExplorerController so the controller becomes a JourneyController:
[code:c#] public class JourneyController
{
public static Queue<List> CompleteJourneys = new Queue<List>();
internal static IEnumerable GetValidMoves(List journey)
{
var availableMoves = Program.Board.GetAvailableMovesFromLocation(journey[journey.Count - 1]);
while (availableMoves.Count > 0)
{
var location = availableMoves.Pop();
if (!journey.Contains(location))
yield return location;
}
}
}
[/code]
Our BoardExplorerCloneFactory is renamed to the more pleasant JourneyFactory:
[code:c#] class JourneyFactory
{
public static int Count = 0;
public bool HasIncompleteJourneys { get; private set; }
public JourneyFactory()
{
this.HasIncompleteJourneys = true;
}
internal IEnumerable<List> GetJourneys()
{
HasIncompleteJourneys = false;
while (Program.Journeys.Count > 0)
{
var explorer = Program.Journeys.Pop();
foreach (var location in JourneyController.GetValidMoves(explorer))
{
HasIncompleteJourneys = true;
Count++;
var journey = new List(explorer);
journey.Add(location);
yield return journey;
}
}
}
}
[/code]
You know, we're quietly optimistic that we've a decent chance now at taking on this 7x7 grid. And our journey planner says:
Out of Memory exception! 4,839,550 paths found before I croaked.
Okay, we've clearly found the limit. 'Clearly' because we've said so. It's time for our relational database management system and there's just one more change to make before we're good to call on SQL Server. We'll transform the List<int> into a comma separated list of integers in string format for database table insertion. In order to be able to easily switch between using a RAM based stack and a disk based stack we'll introduce an interface 'IStack':
[code:c#] interface IStack
{
int GetCount();
void Push(List journey);
List Pop();
}
[/code]
The RAM based implementation embraces composition over inheritence:
[code:c#] class MemStack : IStack
{
private Stack<List> journeys;
public MemStack()
{
journeys = new Stack<List>();
}
public void Push(List journey)
{
journeys.Push(journey);
}
public List Pop()
{
var journey = journeys.Pop();
return journey;
}
public int GetCount()
{
return journeys.Count;
}
}
[/code]
In order to use the new MemStack there are three lines of code that change in the main Program:
[code:c#] public static Board Board;
public static IStack Journeys; // <- Change #1 here
static void Main(string[] args)
{
var boardLength = int.Parse(args[0]);
Board = new Board(boardLength);
Journeys = new MemStack(); // <- Change #2 here
var inaugural = new List();
inaugural.Add(Board.StartLocation);
Journeys.Push(inaugural);
var factory = new JourneyFactory();
while (factory.HasIncompleteJourneys)
{
var explorersCount = 0;
foreach (var journey in factory.GetJourneys()) // <- Change #3 here
{
if (journey[journey.Count - 1] == Board.EndLocation)
JourneyController.CompleteJourneys.Enqueue(journey);
else
Journeys.Push(journey);
explorersCount++;
}
}
}
[/code]
Before going any further we make sure that we've not broken anything by running the application to test performance. It's interesting to note here that we've moved away from TDD and we're focused on PDD (Performance Driven Development). I didn't coin this phrase so if you've not heard of it then it's because the phrase never really caught on! The reason unit tests are of minimal importance here is because we're consistently able to guarantee the units work because if they didn't then we'd see an incorrect count for the number of unique paths. The performance after this refactoring show we're still good.
There are 1,262,816 unique paths from the top-left corner to the bottom-right corner of a 6x6 grid!
There were 18,470,410 robots used to discover this!
Total memory: 321,396,728
Time elapsed: 32,512ms
Garbage collection gen0: 2,920
Garbage collection gen1: 1,021
Garbage collection gen2: 0
Now let's turn attention to the database and the SQL script for pursuing this absurd course:
[code:tsql] CREATE TABLE [dbo].[Journey](
[Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED ,
[Path] [varchar](128) NOT NULL
)
GO
CREATE PROCEDURE [dbo].[Push]
@path varchar(128)
AS
DECLARE @id int
INSERT INTO [Journey] ([Path])
SELECT @path
SELECT @id = SCOPE_IDENTITY()
GO
CREATE PROCEDURE [dbo].[Pop]
AS
DECLARE @id int,
@path varchar(128)
SELECT TOP 1
@id = [Id],
@path = [Path]
FROM
[Journey]
ORDER BY
id DESC
DELETE FROM [Journey]
WHERE id = @id
SELECT @path 'Path'
GO
CREATE PROCEDURE [dbo].[CountPaths]
AS
SELECT COUNT(*) 'Count'
FROM [Journey]
GO
[/code]
The database based stack implementation implements the IStack interface and uses Dapper.NET, a simple object mapper for .Net, for data access.
[code:c#] class DbStack : IStack
{
private string connectionString = "Server=.;Database=Wookie;Trusted_Connection=True;";
public void Push(List journey)
{
using (var connection = new SqlConnection(connectionString))
{
connection.Open();
var param = journey.ToCommaSeperatedIntegers();
var sql = string.Format("EXEC Push '{0}'", param);
var result = connection.ExecuteMapperQuery(sql);
}
}
private int maxCount = 0;
public int MaxCount()
{
return maxCount;
}
public int Count()
{
using (var connection = new SqlConnection(connectionString))
{
connection.Open();
var sql = string.Format("EXEC CountPaths");
var result = connection.ExecuteMapperQuery(sql);
var count = result[0].Count;
if (count > maxCount)
maxCount = count;
return count;
}
}
public List Pop()
{
using (var connection = new SqlConnection(connectionString))
{
connection.Open();
var sql = string.Format("EXEC pop");
var results = connection.ExecuteMapperQuery(sql);
return results.ToList();
}
}
}
[/code]
In order to find out how large the stack grew, because this has implications on database performance, we've added a MaxCount method to the IStack interface.
[code:c#] interface IStack
{
int Count();
int MaxCount();
void Push(List journey);
List Pop();
}
[/code]
We're finally good to go so what's the reason for the pessimism that this is going to give less than stellar results? Well, for example:
There are 8,512 unique paths from the top-left corner to the bottom-right corner of a 5x5 grid!
There were 90,110 robots used to discover this!
Total memory: 24,096,248
Max stack: 16
Time elapsed: 884,238ms
At this point we can forget about metrics in milliseconds. Using the new DbStack to calculate the unique paths in a 5x5 grid took almost fifteen minutes. Using MemStack, however, the same calculation takes ~200 milliseconds.
There are 8,512 unique paths from the top-left corner to the bottom-right corner of a 5x5 grid!
There were 90,110 robots used to discover this!
Total memory: 24,096,248
Max stack: 16
Time elapsed: 163ms
DbStack is an incredible 5000x slower than MemStack in this instance so it appears we're not going to be able to calculate the paths for a 10x10 grid using a relational database to maintain the stack after all. In the next post we look at a more hopeful approach that involves NoSQL.