How is TeamMentor being used?

by Kofi Sarfo 29. April 2013 00:31

In order to assess how TeamMentor is being used, many user events are recorded. This includes each user logging in and logging out, for example, as well as each article being read.

The recorded properties are Date, Time, Username, IP Address, Action and (action) Detail where Action might be View Html Article and Detail might be Allocate Enough Buffer Space for Copying Data. This information is made available via TBOT (TeamMentor Bot) Reports shown below. Of particular interest here are UsersActivities and UsersExpiration:


The first report showing Users'Activities, History [example], includes links to limit the information being shown to just a single type of action, action detail or user name.


The second report, History Filtered [example], differs from the earlier type by allowing you to drill down with each selection so that it's possible to view data based on a combination of filters.


The reports on Users' Expiration show Users with Disabled Accounts, Users with Expired Accounts and Users with Expired Passwords.

TeamMentor is ASP.NET, yes, but not as you've seen it often elsewhere

by Kofi Sarfo 20. April 2013 08:32

The SPA [Single Page Application] is all the rage... err, and has been for a while. Gmail appeared in 2004. In that light TeamMentor ought not to seem unusual, but it does. At its heart, it is a Single Page Application with a twist.

TeamMentor is an ASP.NET Webforms application but there isn't a single postback. Plus, it's built on top of the O2 platform which means extension methods everywhere. There is a presentation on the platform given by its creator, @DinisCruz, entitled "OWASP O2 Platform - Automating Security Knowledge through Unit Tests". It's designed to offer something like a Strongly Typed Python.

Opening the solution reveals Global.asax.cs is a wrapper for an O2 action pipeline per application/session event. The start page TeamMentor.html shows that the views are dynamically loaded using JavaScript. Also, there's hand-rolled bundling and minification rather than taking a dependency on ASP.NET MVC 4. It might be worth asking Dinis why he prefers using techniques he's discovered ahead of ASP.NET implementations.

At first glance, it's an interesting piece of software with a good amount of thought behind it. There's an interesting philosophy driving its development and implementation pulls inspiration from other languages / practices. I'll happily pair with anyone on this for an hour or two if you're so inclined.

Disclaimer: I'm not a Web Developer.

In return for your time I'll buy you dinner.

Thanks

@wimirotech

NCrunch and TeamMentor

by Kofi Sarfo 18. April 2013 13:59

I've been a huge fan of continuous testing since discovering NCrunch! Along with ReSharper and TestDriven.NET it's an IDE extension I'll install given two minutes.

Lacking the discipline to run tests after each change the instant feedback, without having to ALT-2 (shortcut for running tests), is delightful. I now tend to look to NCrunch for information on whether the solution compiles ahead of the ReSharper indicators. If you too like not to keep having to run tests manually then this is for you. With TeamMentor you'll get the following error, initially:

The "PostSharp21MoveWithRetry" task was not found

Rub Configuration belly thus (by clicking the NCrunch > Configuration menu item):

Make sure the following directories are selected:

And finally, it's worth increasing the Request.Timeout value in the function used to detect whether an internet connection exists here:

TeamMentor.CoreLib\TM_AppCode\O2_Scripts_APIs\_Extra_Web_ExtensionMethods.cs

    public static class Extra_ExtensionMethods_Web
    {
        public static WebHeaderCollection HEAD_Headers(this Uri uri)
        {
            var request = (HttpWebRequestWebRequest.Create(uri);
            request.Timeout = 3000;
            request.AllowAutoRedirect = false;            
            request.Method = "HEAD";
            try
            {
                return request.GetResponse().Headers;                                
            }
            catch (WebException)
            {                
                return null;
            }
        }
        public static bool HEAD(this Uri uri)
        {
            return uri.HEAD_Headers().notNull();
        }
    }

And now you should have all tests passing (and without having to run them manually).

Inconsistent Line Endings Can Cost You Hours

by Kofi Sarfo 18. April 2013 08:21

"The line endings in the following file are not consistent. Do you want to normalize the line endings?"

Text on a dialog box designed to relieve any solid sense of contentment.

Having started using Github for something other than to marvel at what the cool kids now do with JavaScript nowadays, I find that Stack Overlow is, well, overflowing with questions on how to make Git work so that it doesn't evolve into the timesink it threatens to become. My interpretation.

Jeff Atwood has a great post, The Newline Schism (2010), with a detailed explanation on what's happening here.

Getting Started With TeamMentor

by Kofi Sarfo 17. April 2013 11:31

When running TeamMentor for the first time you should notice that no content has been loaded.

The following series of steps shows how this is done. Previously this knowledge had passed from one developer to another via hushed tones in secret meetings. Please treat this knowledge with care.


  • Fork the TeamMentor Development repo

  • Make sure you have PostSharp installed

  • Run Visual Studio 2010 as Administrator

  • Hit CTRL F5 (run web application)

  • Log into TeamMentor as Admin

  • Click Control Panel
  • Once the main screen opens you'll find the link to the Control Panel to the top right hand side of the screen

    Team Mentor > Find Control Panel

     

  • Click Install / Upload libraries
  •  

  • Click OWASP to select this library for installation
  •  

  • Click Install (installs the OWASP library)
  •  

  • Click Top 20 Vulnerabilities to select this library for installation; click Install
  •  

  • Click TM Documentation to select this library for installation; click Install
  • Click TM Documentation to select this library for installation; click Install
  • Once these three libraries have been installed, we need to reload the server cache

  • Click Open TBot
  • Click Reload Server Objects
  • Click Reload TMConfig 
  •  

  • Click Reload Cache

 

And that's it!

Return to the main screen and you should see the libraries have been installed and are available to use.

You're right. Perhaps a video would have been better :)

Open Source Security

by Kofi Sarfo 17. April 2013 09:20

A little over a year ago I began working on my|deposits Scotland, an ASP.NET MVC3 web application for managing tenancy deposits. It was a month into the project before the subject of security arose and without any security expertise in the team, the podcast episode in which Troy Hunt Secures ASP.NET was welcome and timely.

Fast forward twelve months and I've been invited to help Security Innovation with their product, Team Mentor, which assists development teams reduce application security risk. It's quite a departure from the multi-threaded, multi-process, service oriented architecture work for pricing journeys at Transport for London. It's a good excuse to return briefly to some web development between these server-side gigs.

This project is a little unique in that all the source code is available on github. It's an interesting model in that the platform is entirely free and is designed to serve as an example of best practise, besides allowing analysis using the tools available once compiled/hosted. So the license (less free), essentially, grants access to the encyclopedia of vulnerabilities and prevention guidance.

I'm working with @diniscruz for the next ten days. This looks interesting,

For When You Absolutely Need a Bigger Stack Than RAM Allows

by Kofi Sarfo 6. April 2011 10:32

Recently, I found an interesting programming exercise, which read:

"A robot is located at the bottom-left corner of a 5x5 grid. The robot can move either up, down, left, or right, but can not visit the same spot twice. The robot is trying to reach the top-right corner of the grid."

In a series of blog posts we've managed to solve this simply and reasonably quickly whilst demonstrating the process of going from problem to solution. However, we cannot say we've done this very efficiently. Shredding through MB of memory in order to complete calculations like this one above meant we tired quickly of the OutOfMemoryException when attempting this for grids larger than 6x6. To help put this in perspective, that's only 36 squares! It shouldn't be so difficult.

The method in a nutshell: The paths being taken were stored on a Stack and at each juncture where, for the next step, there was more than one square to choose from a path was added to the stack for each after the current path had been removed from the stack. See, is this why few bother with pseudo-code. Thankfully, nobody's reading this so no clearer explanation is necessary.

Having worked out that some kind of database is required and having ruled out a relational database we turn to Redis, an open source, advanced key-value store. The best part is that apart from being lightening fast it implements a stack. Perfect!

Whilst we're adding code to use Redis stack functionality we're interested in being able to preserve and swap out the different stack implementations: Memory, SQL Server & Redis. To do this we abstract towards an interface:

interface IStack { int Count(); int MaxCount(); void Push(T journey); T Pop(); }

The IStack of T is introduced to allow List and List to be used indiscriminately for journeys as we've not settled on a representation for our disk based stack. The implementation for each of the three now looks like this:

class MemStack : IStack { private Stack journeys; public MemStack() { journeys = new Stack(); } public void Push(T journey) { journeys.Push(journey); } public T Pop() { var journey = journeys.Pop(); return journey; } private int maxCount; public int MaxCount() { return maxCount; } public int Count() { var count = journeys.Count; if (count > maxCount) maxCount = count; return journeys.Count; } } class RediStack : IStack<List<int>> { RedisNativeClient redis = null; private const string explorerKey = "Whitt-Ku.Journeys"; public RediStack() { redis = new RedisNativeClient(); redis.FlushAll(); } public void Push(List<int> journey) { redis.RPush(explorerKey, journey.GetBytes()); } public List<int> Pop() { var bytes = redis.RPop(explorerKey); return bytes.ToList<int>(); } private int maxCount; public int MaxCount() { return maxCount; } public int Count() { var count = redis.LLen(explorerKey); if (count > maxCount) maxCount = count; return count; } }

The RedisNativeClient is in fact a class within the ServiceStack.Redis library. It's an "Open Source C# Redis client based on Miguel de Icaza previous efforts with redis-sharp." We change the program for the last time:

public static Board Board; public static IStack<List<int>> Journeys; static void Main(string[] args) { var boardLength = int.Parse(args[0]); Board = new Board(boardLength); Journeys = new RediStack(); var inaugural = new List<int>(); inaugural.Add(Board.StartLocation); Journeys.Push(inaugural); var factory = new JourneyFactory(); while (factory.HasIncompleteJourneys) { var explorersCount = 0; foreach (var journey in factory.GetJourneys()) { if (journey[journey.Count - 1] == Board.EndLocation) JourneyController.CompleteJourneys.Enqueue(journey); else Journeys.Push(journey); } } }

We run this with fingers crossed for a 5x5 grid and find that it's ~70x slower than using RAM:


	There are 8,512 unique paths from the top-left corner to the bottom-right corner of a 5x5 grid!
	There were 90,110 robots used to discover this!
        Max stack: 16
        Total memory: 3,740,864
        Time elapsed: 18,538ms
        Garbage collection gen0: 701
        Garbage collection gen1: 10
        Garbage collection gen2: 0

The most recent results using RAM and List Generics implementation:


	There are 8,512 unique paths from the top-left corner to the bottom-right corner of a 5x5 grid!
	There were 90,110 robots used to discover this!
        Max stack: 16
	Total memory: 2,704,944
	Time elapsed: 256ms
	Garbage collection gen0: 25
	Garbage collection gen1: 6
	Garbage collection gen2: 0

The reason why the same implementation using List was never used as the approach for Redis is shown below:


	There are 8,512 unique paths from the top-left corner to the bottom-right corner of a 5x5 grid!
	There were 90,110 robots used to discover this!
        Max stack: 16
	Total memory: 3,798,576
	Time elapsed: 1,302ms
	Garbage collection gen0: 49
	Garbage collection gen1: 4
	Garbage collection gen2: 0

The code responsible for converting List<int> to List<string> has terrible performance so a binary representation of List<int> is used instead as the value stored in Redis.

Finally, with little more performance to squeeze out using the current architecture we invite you to use this let us know how many unique paths there are on a 10x10 grid. Thanks.

SQL Server Based Stacks. Why Would You, Really?

by Kofi Sarfo 5. April 2011 07:47

Recently, I found an interesting programming exercise, which read:

"A robot is located at the bottom-left corner of a 5x5 grid. The robot can move either up, down, left, or right, but can not visit the same spot twice. The robot is trying to reach the top-right corner of the grid."

In previous posts we'd seen crude solutions to the problem above. Good news is that there's little respite to be found in this post also and it's more of the same sort of entertainment. Those concerns we've had so far remain, a burgeoning memory requirement, whilst looking to apply this same calculation for a 10x10 grid. The problem has been that our stack grows and then explodes resulting in an OutOfMemoryException. Naturally, the answer must be to use a disk-based stack. You can't help but think that the right answer for this involves NoSQL but, dammit, we're going to prove a relational database can't handle it before removing our tin foil hat!

First there's just the small matter of some unexpected performance gains to be made by replacing the LinkedList<int> with List<int> when representing robot journeys. Another little refactoring results in a decent performance improvement! Our Program class stripped of validation and instrumentation looks like this:

public static Board Board; static void Main(string[] args) { var boardLength = int.Parse(args[0]); Board = new Board(boardLength); var newJourney = new List<int>(){ Board.StartLocation }; var explorer = new BoardExplorer(newJourney); var journeys = new Stack<List<int>>(); var factory = new BoardExplorerCloneFactory(journeys); journeys.Push(explorer.Journey); while (factory.HasClones) { var explorersCount = 0; UpdateInstrumentation(); foreach (var movedExplorer in factory.GetExplorers()) { if (movedExplorer.IsExploring) journeys.Push(movedExplorer.Journey); explorersCount++; } } }

Changes to the BoardExplorer are almost too trivial to show once again. More fluff than stuff, cue our newest, lighter-weight robot:

public struct BoardExplorer { public List<int> Journey; public BoardExplorer(List<int> journey) { this.Journey = journey; } internal int GetLocation() { return Journey[Journey.Count-1]; } public bool IsExploring { get { return GetLocation() != Program.Board.EndLocation; } } internal bool HasVisited(int location) { foreach (var visitedLocation in Journey) if (visitedLocation == location) return true; return false; } public BoardExplorer Move(int location) { var journeyBeyond = new List<int>(Journey); journeyBeyond.Add(location); var moved = new BoardExplorer(journeyBeyond); if (!moved.IsExploring) BoardExplorerController.CompleteJourneys.Enqueue(moved.Journey); return moved; } }

The Controller continues to look as anaemic as ever.

public class BoardExplorerController { public static Queue<List<int>> CompleteJourneys = new Queue<List<int>>(); internal static IEnumerable<int> GetValidMoves(BoardExplorer explorer) { var availableMoves = Program.Board.GetAvailableMovesFromLocation(explorer.GetLocation()); while (availableMoves.Count > 0) { var location = availableMoves.Pop(); if (!explorer.HasVisited(location)) yield return location; } } }

And whoops of delight followed. Not really. Better numbers though.


	There are 1,262,816 unique paths from the top-left corner to the bottom-right corner of a 6x6 grid!
	There were 18,470,410 robots used to discover this!
	Total memory: 321,723,412
	Time elapsed: 44,092ms
	Garbage collection gen0: 2,923
	Garbage collection gen1: 1,019
	Garbage collection gen2: 0

We managed to knock off 30 seconds from the time it takes to calculate those paths. Almost twice the performance greets muted celebration. Is this good enough for a grid almost three times the size? Few seem to think so. First, let's see how this copes with a 7x7 grid.


	Out of Memory exception! 4,884,621 paths found before I capitulated.

Looking at the current state of our robot affairs we believe there's one final optimization we can make before we resort to storing the journey stack on disk. The 'epiphany' is this: Why do we need the robot at all? It was suggested by the way the problem was phrased and had 'robot' been omitted from the problem statement then R2 might never have made it into our code. Essentially, we care only about unique paths so we're about to send R2 into retirement. Our new Program minus BoardExplorer results in impressive performance gains:


	There are 1,262,816 unique paths from the top-left corner to the bottom-right corner of a 6x6 grid!
	There were 18,470,410 robots used to discover this!
	Total memory: 319,371,768
	Time elapsed: 34,679ms
	Garbage collection gen0: 2,906
	Garbage collection gen1: 1,029
	Garbage collection gen2: 0

Something to cheer also is that we've reduced the size of the application. Fewer lines of readable code is always better. I can't think of an exception right now so it must be true. Our new Program without robot participation:

static void Main(string[] args) { var boardLength = int.Parse(args[0]); Board = new Board(boardLength); Journeys = new Stack<List<int>>(); var inaugural = new List<int>(); inaugural.Add(Board.StartLocation); Journeys.Push(inaugural); var factory = new JourneyFactory(); var warmTimer = GetStartedStopWatch(); while (factory.HasIncompleteJourneys) { var explorersCount = 0; UpdateInstrumentation(); foreach (var journey in factory.GetJourneys()) { if (journey[journey.Count - 1] == Board.EndLocation) JourneyController.CompleteJourneys.Enqueue(journey); else Journeys.Push(journey); explorersCount++; } } }

Without the BoardExplorer it no longer makes sense to have a BoardExplorerController so the controller becomes a JourneyController:

public class JourneyController { public static Queue<List<int>> CompleteJourneys = new Queue<List<int>>(); internal static IEnumerable<int> GetValidMoves(List<int> journey) { var availableMoves = Program.Board.GetAvailableMovesFromLocation(journey[journey.Count - 1]); while (availableMoves.Count > 0) { var location = availableMoves.Pop(); if (!journey.Contains(location)) yield return location; } } }

Our BoardExplorerCloneFactory is renamed to the more pleasant JourneyFactory:

class JourneyFactory { public static int Count = 0; public bool HasIncompleteJourneys { get; private set; } public JourneyFactory() { this.HasIncompleteJourneys = true; } internal IEnumerable<List<int>> GetJourneys() { HasIncompleteJourneys = false; while (Program.Journeys.Count > 0) { var explorer = Program.Journeys.Pop(); foreach (var location in JourneyController.GetValidMoves(explorer)) { HasIncompleteJourneys = true; Count++; var journey = new List<int>(explorer); journey.Add(location); yield return journey; } } } }

You know, we're quietly optimistic that we've a decent chance now at taking on this 7x7 grid. And our journey planner says:


	Out of Memory exception! 4,839,550 paths found before I croaked.

Okay, we've clearly found the limit. 'Clearly' because we've said so. It's time for our relational database management system and there's just one more change to make before we're good to call on SQL Server. We'll transform the List<int> into a comma separated list of integers in string format for database table insertion. In order to be able to easily switch between using a RAM based stack and a disk based stack we'll introduce an interface 'IStack':

interface IStack { int GetCount(); void Push(List<int> journey); List<int> Pop(); }

The RAM based implementation embraces composition over inheritence:

class MemStack : IStack { private Stack<List<int>> journeys; public MemStack() { journeys = new Stack<List<int>>(); } public void Push(List<int> journey) { journeys.Push(journey); } public List<int> Pop() { var journey = journeys.Pop(); return journey; } public int GetCount() { return journeys.Count; } }

In order to use the new MemStack there are three lines of code that change in the main Program:

public static Board Board; public static IStack Journeys; // <- Change #1 here static void Main(string[] args) { var boardLength = int.Parse(args[0]); Board = new Board(boardLength); Journeys = new MemStack(); // <- Change #2 here var inaugural = new List<int>(); inaugural.Add(Board.StartLocation); Journeys.Push(inaugural); var factory = new JourneyFactory(); while (factory.HasIncompleteJourneys) { var explorersCount = 0; foreach (var journey in factory.GetJourneys()) // <- Change #3 here { if (journey[journey.Count - 1] == Board.EndLocation) JourneyController.CompleteJourneys.Enqueue(journey); else Journeys.Push(journey); explorersCount++; } } }

Before going any further we make sure that we've not broken anything by running the application to test performance. It's interesting to note here that we've moved away from TDD and we're focused on PDD (Performance Driven Development). I didn't coin this phrase so if you've not heard of it then it's because the phrase never really caught on! The reason unit tests are of minimal importance here is because we're consistently able to guarantee the units work because if they didn't then we'd see an incorrect count for the number of unique paths. The performance after this refactoring show we're still good.


	There are 1,262,816 unique paths from the top-left corner to the bottom-right corner of a 6x6 grid!
	There were 18,470,410 robots used to discover this!
	Total memory: 321,396,728
	Time elapsed: 32,512ms
	Garbage collection gen0: 2,920
	Garbage collection gen1: 1,021
	Garbage collection gen2: 0

Now let's turn attention to the database and the SQL script for pursuing this absurd course:

CREATE TABLE [dbo].[Journey]( [Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED , [Path] [varchar](128) NOT NULL ) GO CREATE PROCEDURE [dbo].[Push] @path varchar(128) AS DECLARE @id int INSERT INTO [Journey] ([Path]) SELECT @path SELECT @id = SCOPE_IDENTITY() GO CREATE PROCEDURE [dbo].[Pop] AS DECLARE @id int, @path varchar(128) SELECT TOP 1 @id = [Id], @path = [Path] FROM [Journey] ORDER BY id DESC DELETE FROM [Journey] WHERE id = @id SELECT @path 'Path' GO CREATE PROCEDURE [dbo].[CountPaths] AS SELECT COUNT(*) 'Count' FROM [Journey] GO

The database based stack implementation implements the IStack interface and uses Dapper.NET, a simple object mapper for .Net, for data access.

class DbStack : IStack { private string connectionString = "Server=.;Database=Wookie;Trusted_Connection=True;"; public void Push(List<int> journey) { using (var connection = new SqlConnection(connectionString)) { connection.Open(); var param = journey.ToCommaSeperatedIntegers(); var sql = string.Format("EXEC Push '{0}'", param); var result = connection.ExecuteMapperQuery(sql); } } private int maxCount = 0; public int MaxCount() { return maxCount; } public int Count() { using (var connection = new SqlConnection(connectionString)) { connection.Open(); var sql = string.Format("EXEC CountPaths"); var result = connection.ExecuteMapperQuery(sql); var count = result[0].Count; if (count > maxCount) maxCount = count; return count; } } public List<int> Pop() { using (var connection = new SqlConnection(connectionString)) { connection.Open(); var sql = string.Format("EXEC pop"); var results = connection.ExecuteMapperQuery(sql); return results.ToList(); } } }

In order to find out how large the stack grew, because this has implications on database performance, we've added a MaxCount method to the IStack interface.

interface IStack { int Count(); int MaxCount(); void Push(List<int> journey); List<int> Pop(); }

We're finally good to go so what's the reason for the pessimism that this is going to give less than stellar results? Well, for example:


	There are 8,512 unique paths from the top-left corner to the bottom-right corner of a 5x5 grid!
	There were 90,110 robots used to discover this!
	Total memory: 24,096,248
	Max stack: 16
	Time elapsed: 884,238ms

At this point we can forget about metrics in milliseconds. Using the new DbStack to calculate the unique paths in a 5x5 grid took almost fifteen minutes. Using MemStack, however, the same calculation takes ~200 milliseconds.


	There are 8,512 unique paths from the top-left corner to the bottom-right corner of a 5x5 grid!
	There were 90,110 robots used to discover this!
	Total memory: 24,096,248
	Max stack: 16
	Time elapsed: 163ms

DbStack is an incredible 5000x slower than MemStack in this instance so it appears we're not going to be able to calculate the paths for a 10x10 grid using a relational database to maintain the stack after all. In the next post we look at a more hopeful approach that involves NoSQL.

Structs For a More Satisfying Robot Factory

by Kofi Sarfo 4. April 2011 12:52

Recently, I found an interesting programming exercise, which read:

"A robot is located at the bottom-left corner of a 5x5 grid. The robot can move either up, down, left, or right, but can not visit the same spot twice. The robot is trying to reach the top-right corner of the grid."

So far we've come up with a solution but not altogether happy it's fit for future purpose. We're currently optimizing and the title gives away the fact that the next optimisation will be to use a Struct rather than a Class for the BoardExplorer. If we're going to be creating well over hundreds of millions of these robots then ideally we'd like them as lightweight as possible. The key aspects noteworthy parts of our new improved robot are shown below:

public struct BoardExplorer { public LinkedList<int> Journey; public int NextLocation; public BoardExplorer(int location, int nextLocation, LinkedList<int> journey) { this.Journey = journey; this.NextLocation = nextLocation; } internal int GetLocation() { return Journey.Last.Value; } internal void Move() { Journey.AddAfter(Journey.Last, NextLocation); } public bool IsExploring { get { return GetLocation() != Program.Board.EndLocation; } } internal LinkedList<int> CopyJourney() { var copy = new LinkedList<int>(); var steps = new int[Journey.Count]; Journey.CopyTo(steps, 0); return new LinkedList<int>(steps); } public BoardExplorer Clone(int location) { return new BoardExplorer(GetLocation(), location, CopyJourney()); } }

And the metrics for this change confirm this as being the way to go.

There are 1,262,816 unique paths from the top-left corner to the bottom-right corner of a 6x6 grid!
There were 18,470,410 robots used to discover this!
Total memory: 869,039,780
Time elapsed: 74,598ms
Garbage collection gen0: 4740
Garbage collection gen1: 1627
Garbage collection gen2: 0

The surprise here is the performance improvement. Memory use hasn't been affected much. Things to note here is that because a struct is a value type rather than a reference type these BoardExplorers aren't being stored on the heap so we need to explain why there is increased garbage collection activity!

Though our robots aren't being allocated memory on the heap their journeys do appear there as their of type LinkedList. Significantly, these are more transient and are removed more quickly than the earlier type of robots being stored and none of these robots too remain in memory by the time the calculation has been performed. Whilst this is certainly preferable it doesn't promise being able to use this improved solution for a 10x10 grid.

I think we can do better still and so we consider the next refactoring. We consider that the robot need not end on the queue as their journey is all we really care about so we make the following changes without tests driving these. In a contained system as small as this we have the luxury of being able to hold all the code and dependencies inside our developer's head. Ordinarily this may not be the case at all.

It's been a about a minute. How about some more refactoring? The BoardExplorerController is about to lose further responsibility!

public class BoardExplorerController { public static Queue<LinkedList<int>> CompleteJourneys = new Queue<LinkedList<int>>(); internal static IEnumerable<int> GetValidMoves(BoardExplorer explorer) { var availableMoves = Program.Board.GetAvailableMovesFromLocation(explorer.GetLocation()); while (availableMoves.Count > 0) { var location = availableMoves.Pop(); if (!explorer.HasVisited(location)) yield return location; } } }

The BoardExplorer has their Clone method replaced by a Move method which now also performs the action of propelling the robot onto the next square and then checking whether they've reached their destination.

public BoardExplorer Move(int location) { var journeyBeyond = CopyJourney(); journeyBeyond.AddLast(location); var moved = new BoardExplorer(journeyBeyond); if (!moved.IsExploring) BoardExplorerController.CompleteJourneys.Enqueue(moved.Journey); return moved; }

This looks like we've reintroduced recursion but the stack for maintaining robots only ever drops one level deep so there's little concern about additional memory requirement. New metrics will show this we're sure. Also the BoardExplorerCloneFactory needs to spit out BoardExplorers based on journeys found on the stack maintaining journeys in progress.

class BoardExplorerCloneFactory { public static int Count = 0; private Stack<LinkedList<int>> journeys; // <- changed from Stack public bool HasClones { get; private set; } public BoardExplorerCloneFactory(Stack<LinkedList<int>> journeys) { this.HasClones = true; this.journeys = journeys; } internal IEnumerable GetExplorers() { HasClones = false; while (journeys.Count > 0) { var journey = journeys.Pop(); var explorer = new BoardExplorer(journey); // <- new addition here too! foreach (var location in BoardExplorerController.GetValidMoves(explorer)) { HasClones = true; Count++; yield return explorer.Move(location); } } } }

I didn't think the additional step of constructing a robot from a saved journey would add much overhead either and it didn't:

There are 1262,816 unique paths from the top-left corner to the bottom-right corner of a 6x6 grid!
There were 184,70,410 robots used to discover this!
Total memory: 870,567,144
Time elapsed: 70,736ms
Garbage collection gen0: 4,694
Garbage collection gen1: 1,610
Garbage collection gen2: 0

The four second performance gain is spectacularly inconsequential. The memory requirement hasn't been dented either. To show that this three minute refactoring didn't need a test-first approach the changes to the main program appear below and this compiled first time without errors/warnings:

Board = new Board(boardLength); var newJourney = new LinkedList<int>(); newJourney.AddFirst(Board.StartLocation); var explorer = new BoardExplorer(newJourney); var journeys = new Stack<LinkedList<int>>(); var factory = new BoardExplorerCloneFactory(journeys); var warmTimer = GetStartedStopWatch(); journeys.Push(explorer.Journey); while (factory.HasClones) { var explorersCount = 0; UpdateInstrumentation(); foreach (var movedExplorer in factory.GetExplorers()) { if (movedExplorer.IsExploring) journeys.Push(movedExplorer.Journey); explorersCount++; } }

So we've abandoned fixing tests until the very end and we're happy with performance. This is because a contrived example allows it. Also, we know that the 'architecture' is going to keep changing and we don't yet know what it will resemble ultimately so many of the tests we write en-route will i) be thrown away almost immediately; and ii) slow us down. These robots are no Mars Rover!

To recap the series of changes, replacing instances of BoardExplorerControllers with a single static version yielded some gains. Replacing the BoardExplorer Class with a Struct had greater benefit too but it's really the BoardExplorerCloneFactory that's had the greatest impact. Storing journeys instead of robots helped only marginally in terms of performance and next we look at changing the representation stored once more and look outside for help in order to allow the size of the stack to grow without having the amount of memory consumed explode. Eventually, we may even be able to calculate the number of paths for a 10x10 grid.

Performance Metrics for a Design Driven by Tests

by Kofi Sarfo 2. April 2011 08:03

Recently, I found an interesting programming exercise, which read:

"A robot is located at the bottom-left corner of a 5x5 grid. The robot can move either up, down, left, or right, but can not visit the same spot twice. The robot is trying to reach the top-right corner of the grid."

Problem solved. We have a simple application that maps unique paths on a square board. However, there's one correctly held suspicion that this cannot scale. The reason for this is likely to be down to how memory is managed in .NET and that robot references are held as the controller recurses down to spawn more robots to explore the grid. This is an entirely crude implementation, and remember, one that had no prior design.

With the creation of each robot, a little more memory is required. For a grid of 3x3, for example, only ~250KB is used for the 50 or so robots used to map out the 12 unique paths. Once the grid increases to 5x5, however, there are 90,000+ robots chewing through ~65MB! of RAM. Despite this the Garbage Collector cleans up generation 0 only 47 times and the answer appears in less than half a second!

The way memory management occurs in .NET hasn't changed significantly since an article in the MSDN Magazine from November 2000. Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework. Essentially, every so often the garbage collector will remove objects from memory that no longer have objects referencing them. How often this clean up process occurs is determined by how often the memory allocated for the shortest living objects (Generation Zero) is filled. Recursion and deep stacks are a sure way for this to happen more frequently and to reduce the amount of memory available to quickly multiplying machines. There isn't even enough memory to calculate paths for an 8x8 grid so the design will need some attention!

First we allow for some instrumentation:

public class ProgramInstrumentation { public static int CollectionCount; public static long TotalMemory; } public struct BoardExplorerControllerInstrumentation { public int CollectionCount; public long TotalMemory; public BoardExplorerControllerInstrumentation(int collectionCount, long totalMemory) { this.CollectionCount = collectionCount; this.TotalMemory = totalMemory; } }

And adding a property on the BoardExplorerController to keep score.

// allows memory comparisons between board explorer controllers public BoardExplorerControllerInstrumentation Instrumentation { get; private set; }

In the previous version each robot had a property offering current location information

private LinkedListNode<int> CurrentLocation;

It's replaced by the last node of the journey because they point to the same location and this is redundant!

internal int LocationId { get { return Journey.Last.Value; } }

No celebration just yet, losing one reference type only buys us 16 bytes on x64 so we press on. If the problem is that we have controllers that create new controllers, each with the possibility of creating four new robots (one for each direction) then fixing this might mean separating creation from the responsibility for maintaining each bot. The controller passes off responsibility for newing up bots to a factory and then can place them on a queue. No further need for a robot's controller once this has been done then!

So let's use a BoardExplorerController with static methods rather than an instance of the object; no reference is held to either Board or BoardExplorer. A controller stripped bare:

public static Queue<LinkedList<int>> CompleteJourneys = new Queue<LinkedList<int>>(); internal static IEnumerable<int> GetValidMoves(BoardExplorer explorer) { var availableMoves = Program.Board.GetAvailableMovesFromLocation(explorer.Location); while (availableMoves.Count > 0) { var location = availableMoves.Pop(); if (!explorer.HasVisited(location)) yield return location; } } internal static BoardExplorer Move(BoardExplorer explorer) { explorer.Move(); if (explorer.Location == Program.Board.FinalSquare) { var journey = explorer.Journey; explorer = null; CompleteJourneys.Enqueue(journey); } return explorer; }

Before being placed on the queue, each robot now needs to know where it's going to be moved next so a property is added to the BoardExplorer.

public int NextLocation { get; set; }

The result is somewhat expected. Memory use is improved requiring only ~10MB, down from ~65MB and performance is comparable. Go team! The secret source is in the Factory and the main Program refactored to perform more coordination:

class BoardExplorerCloneFactory { public static int Count = 0; private Stack explorers; public bool HasLocations { get; private set; } public BoardExplorerCloneFactory(Stack explorers) { this.HasLocations = true; this.explorers = explorers; } internal IEnumerable GetExplorers() { HasLocations = false; while (explorers.Count > 0) { var explorer = explorers.Pop(); foreach (var location in BoardExplorerController.GetValidMoves(explorer)) { HasLocations = true; Count++; yield return explorer.Clone(location); } } } }

The Main method now resembles the following:

static void Main(string[] args) { Board = new Board(int.Parse(args[0])); var explorer = new BoardExplorer(); var explorers = new Stack(); var factory = new BoardExplorerCloneFactory(explorers); var warmTimer = GetStartedStopWatch(); explorers.Push(explorer); while (factory.HasLocations) { var explorersCount = 0; UpdateInstrumentation(); foreach (var newExplorer in factory.GetExplorers()) { var movedExplorer = BoardExplorerController.Move(newExplorer); if (movedExplorer != null) explorers.Push(movedExplorer); movedExplorer = null; explorersCount++; } } }

So our project stakeholders are thrilled that we now use less memory! Actually no. They don't care particularly. They want a solution that is capable of doing 10x10 grids and up. The metrics for a smaller grid 6x6 suggest that this solution won't get us there:


    There are 1,262,816 unique paths from the top-left corner to the bottom-right corner of a 6x6 grid!
    There were 18,470,410 robots used to discover this!
    Total memory: 995,957,488
    Generation Zero Garbage Collections: 2,936
    Time elapsed: 157,840ms

It's impressive to be creating and managing over 100,000 robots a second but at the expense of ~100MB this approach doesn't inspire much confidence. There must be more we can do in terms of the data structure we're using and that's a segue way into the next post.

Kiva Loans

  • Banco Comunal Corazon De Jesus Group

    Banco Comunal Corazon De Jesus Group

    Grocery Store

    Requested loan: $1500

    Amount raised: $0

    Leon, Nicaragua

    to buy stock merchandise for her store.

    Loan Now »

  • Nuestros Lirios De Los Valles Group

    Nuestros Lirios De Los Valles Group

    Grocery Store

    Requested loan: $2425

    Amount raised: $0

    Leon, Nicaragua

    to invest in working capital and stock her store for buying and selling of groceries such as coffee, rice, sugar, milk, bread, beans, detergents, soap, juice, and soda.

    Loan Now »

  • Vamos En Victoria Group

    Vamos En Victoria Group

    Grocery Store

    Requested loan: $950

    Amount raised: $0

    Leon, Nicaragua

    Buy food products such as coffee, rice, beans, oil, sugar, bread, milk, and eggs

    Loan Now »

To see more entrepreneurs »

Make a loan to an entrepreneur across the globe for as little as $25. Kiva is the world's first online lending platform connecting online lenders to entrepreneurs across the globe.