Thinking Spatially
Twitter: JosephDaigle
E-mail: joseph _at_ cridion.com
December 3, 2010
Automatic App Startup with ASP.NET 4.0

Step 1: Set up your Application and Application Pool as you normally would.

Step 2: Change your Application Pool configuration:

For this we will change the startMode property from OnDemand to AlwaysRunning.

%windir%\system32\inetsrv\appcmd.exe set apppool "MyApplicationPool" /startMode:AlwaysRunning

Step 3: Change your Application configuration:

We need to set some properties. First, serviceAutoStartEnabled is set to true. Then you must provide a serviceAutoStartProvider. If you have AppFabric installed, then you should already have one called Service in your applicationHost.config. Another blog post will detail the process of creating an Auto Start Provider.

%windir%\system32\inetsrv\appcmd.exe set app "Default Web Site/MyApplication" /serviceAutoStartEnabled:true /serviceAutoStartProvider:Service

And that’s it!

UPDATE

I had mistakenly indicated that you need to set the serviceAutoStartMode property to All or Custom in your Application. However, this property only exists if you have AppFabric installed. It is used by AppFabric’s Auto Start Provider to know which WCF services to start up automatically.

July 21, 2010
From a Big Ball of Mud to Query Model Zen

I have found that moving away from an anemic domain model has been a relatively organic process, particularly on the querying side of the equation. Like most that have come before, our single-model system architecture pattern was based on NHibernate. This meant that the ORM was responsible for pretty much everything: querying, persisting, transaction management, and to an extent, how we built out our behaviors. While re-architecting with a CQRS approach, the path of least resistance seemed to be to continue to use NHibernate to drive querying.

Dreaming About ICriteria

This approach works, and pretty well I might add. There are three basic steps:

  1. Build your DTO
  2. Create your NHibernate class map
  3. Build a query object to encapsulate ICriteria/HQL/LINQ/whatever

And it actually turned out to be extremely simple. That was… until we weren’t doing simple queries.

Dreams Turn into Nightmares

Our query model only had a few requirements. Most notably, our DTOs needed to be flat representations of the specific data we needed for our view, and we needed to control the specific SQL statements that are run, for performance reasons. This essentially means each DTO is the view model itself, but populated from a database query.

But what about a view model that requires pulling in data from multiple tables, spanning several joins, and joins from joined tables? NHibernate’s mapping model favors building out many entities instead of trying to cram everything into one. As a result, you can only map a join one level deep. So this isn’t going to work.

But there is a solution, and an elegant one at that. We can simply define the base of our query (SELECT … FROM … JOIN …) in a SQL view, and map our DTO straight to that view. Success! This approach is actually quite clean, fairly elegant, and more importantly it works. It really allows us to create exactly the SQL we want executed on the server.

Exactly the SQL We Want Executed

Let’s take a step back and think about this… what is the simplest possible way to execute exact SQL against a database and return some data to our view? That would probably be writing some SQL and using a data reader. Maybe we could take the results from that data reader and map them to a DTO, 1-1 with columns/properties. Why do we even need an ORM?

The ORM adds complexity. We need to create mappings which end up 1-1 99.99% of the time anyway. So let’s just get rid of that and map directly via some simple reflection.

Our queries are defined as having fixed sets of parameters. Why do we need to create a query using a dynamic engine such as ICriteria? So let’s just pass our parameters directly into a SQL command.
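A minimal sketch of those two ideas, reflection-based mapping and parameters passed straight into a command, might look like the following. The names DtoReader, ReadAll, Query and PatientSummary are all hypothetical, and it assumes column names match property names (ignoring case) and that column types match property types exactly; it is not the code we actually run.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Reflection;

// Hypothetical flat DTO: one property per column returned by the query.
public sealed class PatientSummary
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class DtoReader
{
    // Maps every row of the reader to a new T by matching column names to
    // public writable properties (case-insensitive). Assumes the column
    // type is assignable to the property type; no conversion is attempted.
    public static List<T> ReadAll<T>(IDataReader reader) where T : new()
    {
        var results = new List<T>();
        while (reader.Read())
        {
            var dto = new T();
            for (int i = 0; i < reader.FieldCount; i++)
            {
                PropertyInfo property = typeof(T).GetProperty(
                    reader.GetName(i),
                    BindingFlags.Public | BindingFlags.Instance | BindingFlags.IgnoreCase);

                // Skip unmapped columns and NULLs (the property keeps its default).
                if (property == null || !property.CanWrite || reader.IsDBNull(i))
                    continue;

                property.SetValue(dto, reader.GetValue(i), null);
            }
            results.Add(dto);
        }
        return results;
    }

    // Runs fixed SQL with a fixed set of named parameters and maps the rows.
    public static List<T> Query<T>(IDbConnection connection, string sql,
        IDictionary<string, object> parameters) where T : new()
    {
        using (IDbCommand command = connection.CreateCommand())
        {
            command.CommandText = sql;
            foreach (KeyValuePair<string, object> pair in parameters)
            {
                IDbDataParameter parameter = command.CreateParameter();
                parameter.ParameterName = pair.Key;
                parameter.Value = pair.Value ?? DBNull.Value;
                command.Parameters.Add(parameter);
            }
            using (IDataReader reader = command.ExecuteReader())
            {
                return ReadAll<T>(reader);
            }
        }
    }
}
```

Because the mapper only depends on IDataReader, it can be exercised against an in-memory DataTable (via CreateDataReader) without touching a real database.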

Now we just need a place to define our SQL statement.

Query Nirvana

We’re almost there. We’ve now simplified our model to only requiring a DTO and an object to encapsulate finding a query by name and parameter. Now we just need to store our SQL statements.

The easiest solution might be just to write the SQL in code, returning SQL commands. Yeah… that’s easy. It’s also messy.

What about an XML file? I know… it’s ugly. But this isn’t code; it’s just a resource our code consumes, so I think it’s perfectly acceptable. XML also lets us declaratively describe other things about the query, such as parameter sizes (to optimize the number of query plans compiled on the server), and lets us have an optimized “Select” and “Count” statement for each named query. So now we have three items: our DTO, an object that returns queries for that DTO, and a file containing the named query definitions. Our query definitions can live right next to where they are used, allowing for quick maintenance.
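To make that concrete, here is a rough sketch of loading such a file. The XML layout (queries/query/select/count/parameter and the name and size attributes) is invented for this example, as are the NamedQuery and NamedQueryFile types; only the LINQ to XML calls are real.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Assumed (hypothetical) file shape:
//   <queries>
//     <query name="PatientsByFacility">
//       <select>SELECT ... WHERE FacilityId = @facilityId</select>
//       <count>SELECT COUNT(*) ... WHERE FacilityId = @facilityId</count>
//       <parameter name="@facilityId" size="4" />
//     </query>
//   </queries>
public sealed class NamedQuery
{
    public string Name { get; set; }
    public string SelectSql { get; set; }
    public string CountSql { get; set; }
    public Dictionary<string, int> ParameterSizes { get; set; }
}

public static class NamedQueryFile
{
    // Parses the XML text into a lookup of query name -> definition.
    public static Dictionary<string, NamedQuery> Parse(string xml)
    {
        XDocument document = XDocument.Parse(xml);
        return document.Root
            .Elements("query")
            .Select(q => new NamedQuery
            {
                Name = (string)q.Attribute("name"),
                SelectSql = ((string)q.Element("select") ?? "").Trim(),
                CountSql = ((string)q.Element("count") ?? "").Trim(),
                ParameterSizes = q.Elements("parameter").ToDictionary(
                    p => (string)p.Attribute("name"),
                    p => (int)p.Attribute("size"))
            })
            .ToDictionary(q => q.Name);
    }
}
```

The query object for a DTO would then look up its definition by name and hand the Select or Count text, plus its parameters, to the command.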

We could take this a step further and build some sort of testing framework which can consume all of the queries in the system, verifying that they will run against a particular database schema. Automate this and you won’t have to worry about queries becoming out of sync with the database.

Summary

We split up our single-model architecture into separately architected components so that we could optimize how we query against the database. Essentially this means crafting each query so that the SQL is exactly what we want to execute on the server. Starting with NHibernate, because it was the closest thing to how queries were previously working, we discovered that more time was spent getting NHibernate to execute specific SQL than we were spending creating new queries. As a result, moving completely away from NHibernate and just crafting SQL queries exactly how we want was found to be the simplest solution. With some clever architecture, we can build a system that uses hand-crafted SQL but is still maintainable.

July 20, 2010
Event Sourcing and a Relational Database: Best of Both Worlds?

Many proponents of event sourcing (the pattern of generating events that encapsulate a series of changes to the state of your system) like to argue that a relational database adds too much friction to your overall system architecture. For instance, Greg Young suggests that events can simply be serialized into an event store and published such that a reporting store can subscribe and update persistent view models. The approach does not necessarily require a relational database system to store the consistent state of your system; instead, state is recreated by replaying events from a snapshot.

I like the simplicity of this architecture. But I also feel it adds complexity and a certain level of co-dependence on a particular system architecture. For instance, what if the system architecture radically changes in the next 5 years? Where is all my data?

I believe a relational database with a well designed and normalized schema will allow your business data to long outlive any applications that use it. I’ve seen this firsthand as applications are rebuilt from scratch: it works.

However, I think it’s entirely possible to build an OLTP system, based on a relational database, and still use event sourcing to solve certain problems.

Queries

All applications perform queries. When building an application with many views it becomes a fairly straightforward task of first defining the data necessary for the view, and then constructing a database query which returns exactly the data required.

A normalized database schema means there is no doubt I’m getting the most consistent and up-to-date state of information that has been processed by the system. However, the major caveat is that many systems will perform an order of magnitude more reads than writes, and in certain high-load scenarios, you could easily make the relational database unavailable for other reads, or more importantly the processing of transactions.

Caches

Enter caching: everyone’s favorite way to keep a database from being hammered by queries. But there is a catch: the moment you introduce caching, you give up your consistency.

Well okay, so no multi-user system is actually 100% consistent anyway. Users are always looking at stale data regardless of how awesome our database is. The great thing about building a cache is that we can control the SLA for the staleness of the data. One minute? Five minutes? Three hours? Take your pick, and be prepared to explain eventual consistency to your business analysts and sales folks.

But what if we could invalidate our caches when the underlying data changed? I think this is a perfect example of where event sourcing shines.

Event Sourcing

Let’s imagine your system displays a nearly real-time list of patients checking in at a very busy doctor’s office. The system, behind the scenes, is doing all sorts of fancy processing such as checking the patient’s insurance for eligibility and benefits. The information is then presented to the front-desk workers nearly as soon as it updates.

Now let’s imagine it’s a web application where we can’t easily subscribe to events. Instead we need to poll the server every few seconds. Multiply this by several hundred facilities that your system supports, and suddenly we’re performing more queries than the database can easily support.

Wouldn’t it be better if the information displayed on these lists was in some sort of materialized view? By this I mean a table or cache that contains exactly the data we need to display, indexed and specifically optimized for the one query we need to perform.

In my mind, this is an example of where event sourcing will shine. We can have some autonomous component that subscribes to events that would ultimately cause data stored in this materialized view to change. Depending on exactly how this data is stored, we might be provided with a row identifier along with our event, or perhaps an identifier for a subset of rows. Now our “de-normalizer” can perform a query against the normalized database schema, and pre-populate our materialized view. The view then queries the materialized data, which is returned far more efficiently than joining it together from our normalized schema.
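A rough in-memory sketch of such a de-normalizer follows. Everything here is hypothetical: the PatientStatusChanged event, the CheckInRow shape, and the delegate standing in for "a query against the normalized schema". A real one would write to a table or distributed cache rather than a Dictionary, and would need to think about concurrency.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event: carries only the identifier of the row that changed.
public sealed class PatientStatusChanged
{
    public int PatientId { get; set; }
}

// Hypothetical flat row, shaped exactly like the check-in list needs it.
public sealed class CheckInRow
{
    public int PatientId { get; set; }
    public int FacilityId { get; set; }
    public string Name { get; set; }
    public string Status { get; set; }
}

public sealed class CheckInDenormalizer
{
    // Stand-in for the (expensive) query against the normalized schema.
    private readonly Func<int, CheckInRow> queryNormalizedSchema;

    // Stand-in for the materialized view, keyed by patient.
    private readonly Dictionary<int, CheckInRow> view = new Dictionary<int, CheckInRow>();

    public CheckInDenormalizer(Func<int, CheckInRow> queryNormalizedSchema)
    {
        this.queryNormalizedSchema = queryNormalizedSchema;
    }

    // Called once per event: refresh just the affected row.
    public void Handle(PatientStatusChanged e)
    {
        CheckInRow row = queryNormalizedSchema(e.PatientId);
        if (row == null)
            view.Remove(e.PatientId);   // row no longer belongs in the list
        else
            view[e.PatientId] = row;
    }

    // Called by the polling web page: never touches the normalized schema.
    public List<CheckInRow> ForFacility(int facilityId)
    {
        return view.Values.Where(r => r.FacilityId == facilityId).ToList();
    }
}
```

The point of the shape is that the normalized schema is queried once per change, while the polling pages only ever read the pre-built rows.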

This type of caching might also require some sort of SLA for what would be considered “live” data. We can’t honestly be expected to maintain the cache of all possible records in our system (maybe we could because storage is cheap), but we don’t necessarily want to. So instead we only keep track of the rows that count.

Pretty cool.

November 15, 2009
Version Numbering for Frequently Released Software

Proper software version numbering is always a complicated decision to make. Virtually all software shops implement their own system or standard. However, for software that is frequently released, such as a website, I have discovered and refined a standard that I very much enjoy:

[Year].[Month].[Day].[Hour]

This gives you several advantages:

  1. By the version number alone, you know when the release was made.
  2. You can support more than 1 release in a day using the [Hour] increment.
  3. Software is easier to roll back based on a date.

So for instance, if I were to version this blog post, it would be: 2009.11.15.2151 (the last field here carries both the hour and the minute, 21:51).
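Generating the number is a one-liner. This sketch assumes the last field is the 24-hour time as hour plus minute (which the 2151 in the example suggests) and that month and day are zero-padded; the ReleaseVersion class is a made-up name.

```csharp
using System;
using System.Globalization;

public static class ReleaseVersion
{
    // Formats a release timestamp as [Year].[Month].[Day].[HourMinute].
    // The dots are quoted so they are always treated as literals.
    public static string FromTimestamp(DateTime releasedAt)
    {
        return releasedAt.ToString("yyyy'.'MM'.'dd'.'HHmm", CultureInfo.InvariantCulture);
    }
}
```

Calling ReleaseVersion.FromTimestamp(new DateTime(2009, 11, 15, 21, 51, 0)) gives 2009.11.15.2151.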

August 17, 2009
Achieving the Singleton Pattern Without Going Mad

In a world that disavows the mistreated Singleton Pattern we can indeed discover inner peace with our code.

The singleton anti-pattern usually appears in this naive form:

public sealed class Foo
{
    private static Foo instance;

    public static Foo Current
    {
        get
        {
            // Naive lazy creation: not thread-safe.
            if (instance == null)
            {
                instance = new Foo();
            }
            return instance;
        }
    }

    private Foo()
    {
    }
}

But this creates a world of hurt. What if my Foo takes in a dependency Bar? What if Bar takes a dependency Baz? Now we’re in trouble. We have to manage the lifetime of our Foo object AND handle dependencies? What about concurrency? If only there was a tool that did these sorts of things for us…

Oh wait! DI and IoC to the rescue. I’m a personal fan of StructureMap, but in the end I don’t care what library you use as long as you use it to create your singletons.
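To show the idea without tying it to any one library, here is a hand-rolled stand-in for what a container does when you register something with a singleton lifetime. This is not StructureMap's API; CompositionRoot and GetFoo are hypothetical names, and a real container would resolve the dependency chain for you instead of having it spelled out.

```csharp
using System;

public sealed class Baz { }

public sealed class Bar
{
    public Baz Baz { get; private set; }
    public Bar(Baz baz) { Baz = baz; }
}

// Foo is now an ordinary class: public constructor, explicit dependency,
// no static state and no knowledge of its own lifetime.
public sealed class Foo
{
    public Bar Bar { get; private set; }
    public Foo(Bar bar) { Bar = bar; }
}

// Hypothetical stand-in for a container with Foo registered as a singleton.
public sealed class CompositionRoot
{
    // Lazy<T> created with a factory delegate is thread-safe by default,
    // so the factory runs at most once even under concurrent access.
    private readonly Lazy<Foo> foo =
        new Lazy<Foo>(() => new Foo(new Bar(new Baz())));

    public Foo GetFoo()
    {
        return foo.Value;
    }
}
```

The lifetime decision lives in one place, outside Foo, which is exactly what you get from registering the type as a singleton in whichever container you prefer.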

June 11, 2009
Why Debug.Assert is only useful for defining behavior and not contracts.

As programmers we often find ourselves writing code which will ultimately be reused by another programmer. In order to accomplish this we generally like to provide two useful bits of information. The first is a description of the expected behavior of a method or function. It’s usually pretty easy to figure this out, and really easy if the programmer properly documented the code!

The code contract is the other part of the equation. Unfortunately C# only gets us part of the way there. It defines input parameters (and their types) and any return types. But it doesn’t indicate anything further. Null references? Out-of-bounds values? These can’t be specified by the language itself.

The following is taken from Microsoft Research http://research.microsoft.com/en-us/projects/contracts/faq.aspx on why Debug.Assert is not fully suited for specifying these sorts of “method-boundary” contracts.

Precondition visibility: Preconditions specify the conditions a caller should establish prior to calling a method. Because of this, preconditions must be visible to the caller in the sense that the caller should be able to determine if he/she satisfies the preconditions. Thus, preconditions should not refer to internal state that the caller cannot access. Debug.Assert on the other hand can be used to specify internal consistency.

Postconditions: Using Debug.Assert for postconditions is tedious and error-prone. A normal postcondition should be checked on every normal return from a method. With Debug.Assert, the programmer is forced to insert it at the appropriate places in the code. This is error prone, causes duplication, and is hard to maintain.

Inheritance: Method contracts of overridden methods should have the same precondition as their parent methods, and postconditions that are at least as strong as the parent method’s postconditions. With Debug.Assert, the programmer has to duplicate pre- and postcondition checks in their code, making it error prone. Also, there is no tool support to enforce the required consistency among contracts in overridden methods.

To really fix the issue of method-boundary contracts we would need to introduce new language or CLR features. However there are a few patterns you can follow.

  1. Check all parameter arguments at the start of the method and throw the appropriate exception or use Debug.Assert. This is the easiest to implement, but it is tedious and repetitive.

  2. Check the reference or value of the return type when the method returns and throw the appropriate exception or use Debug.Assert. This pattern seems easy enough, but can get awfully complicated, especially dealing with a method or function with multiple places it can return. It quickly becomes really hard to maintain.
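As a small illustration of both patterns, here is a sketch with a made-up method (TextUtil.Take is hypothetical). Preconditions are checked up front and throw, since the caller can see and satisfy them; the postcondition is a Debug.Assert placed before the single return, which is only practical because this method has exactly one exit point.

```csharp
using System;
using System.Diagnostics;

public static class TextUtil
{
    // Returns the first 'count' characters of 'text'.
    public static string Take(string text, int count)
    {
        // Pattern 1: validate every argument at the start of the method.
        if (text == null)
            throw new ArgumentNullException("text");
        if (count < 0 || count > text.Length)
            throw new ArgumentOutOfRangeException("count");

        string result = text.Substring(0, count);

        // Pattern 2: check the return value just before returning.
        // Debug.Assert is compiled out of release builds.
        Debug.Assert(result != null && result.Length == count,
            "Take must return exactly 'count' characters.");
        return result;
    }
}
```

With several return statements, the assert would have to be repeated before each one, which is the maintenance problem described above.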

The hope is that the Microsoft Code Contracts project evolves well enough to help solve these problems at the CLR level. We’ll just have to wait and see.

April 28, 2009
Building a “Database Schema Discoverable” DAL

I spend a lot of my time thinking about and researching ORM technology since the software I write is usually working directly at the database and data model level. From my findings I’ve discovered that regardless of whether they are commercial, free or open source, ORMs all share a flaw: they never match your data access needs 100%. Some products are just too simple or not flexible enough for your software goals. Other products are heavy and bloated, and sometimes they even make solving business problems harder and more complicated. My conclusion: ORMs suck. So I ended up rolling my own data access layer. I don’t necessarily recommend or encourage others to do this because it’s not a trivial task by any means. But I had a goal in mind to basically build an anti-ORM. I may in the future talk about how I accomplished this, but today I will be focusing on schema discovery techniques.

Database schema discovery is a key requirement for the type of data access layer needed for the software we write here. What does it entail? Because we don’t have an ORM we don’t have any static information about the persistence structure of a database. So for a given connection we will obtain a representation of tables and columns which will be used by a query engine to generate proper SQL commands.

The approach I took is to discover information on a table-by-table basis. When a table is referenced in code by name it will first check the local schema information cache. If that results in a miss, a call is made to some code which will attempt to pull information about that table down from a connection. If it can’t find the table, nothing is returned. The code that retrieves the schema is of course going to be somewhat specific to your persistence layer. But if you can establish an OleDb connection to your persistence, this is probably the easiest approach. The code is pretty simple and looks something like this:


// Requires: using System.Data; using System.Data.OleDb;
using (OleDbCommand command = 
       new OleDbCommand(fullyQualifiedTableName, connection))
{
    command.CommandType = CommandType.TableDirect;
    try
    {
        using (OleDbDataReader reader = 
               command.ExecuteReader(CommandBehavior.KeyInfo | 
                                     CommandBehavior.SchemaOnly))
        {
            DataTable schemaTable = reader.GetSchemaTable();
            foreach (DataRow row in schemaTable.Rows)
            {
                if (row.Table.Columns.Contains("IsHidden") && 
                   (bool)row["IsHidden"])
                {
                    // Don't process "hidden" columns. Some views
                    // have duplicate columns which are hidden.
                    continue;
                }
                // Get field information here
                string columnName = (string)row["ColumnName"];
            }
        }
    }
    catch (OleDbException exception)
    {
        // Table not found: 0x80040E37 (DB_E_NOTABLE)
        if (exception.ErrorCode == -2147217865)
            return null;
        else
            throw;
    }
}

That’s basically it. But you should refer to the GetSchema implementation details for OleDb for all the column information you can pull out of the database.
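The cache-then-discover lookup described earlier can be sketched roughly like this. TableSchema and SchemaCache are hypothetical names, the discovery step is passed in as a delegate (it could wrap the OleDb code above), and treating table names as case-insensitive is an assumption that may not hold for your database. The sketch is also not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal schema description for one table.
public sealed class TableSchema
{
    public string TableName { get; set; }
    public List<string> ColumnNames { get; set; }
}

public sealed class SchemaCache
{
    // Assumption: table names are compared case-insensitively.
    private readonly Dictionary<string, TableSchema> cache =
        new Dictionary<string, TableSchema>(StringComparer.OrdinalIgnoreCase);

    // Pulls schema for one table from the connection; returns null
    // when the table does not exist.
    private readonly Func<string, TableSchema> discover;

    public SchemaCache(Func<string, TableSchema> discover)
    {
        this.discover = discover;
    }

    public TableSchema Find(string tableName)
    {
        TableSchema schema;
        if (cache.TryGetValue(tableName, out schema))
            return schema;              // cache hit

        schema = discover(tableName);   // cache miss: go to the database
        if (schema != null)
            cache[tableName] = schema;  // only remember tables that exist
        return schema;
    }
}
```

Missing tables are deliberately not cached here, so a table created later will be found on the next lookup; whether that trade-off is right depends on how often unknown names are requested.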

For ArcObjects it’s even easier, so easy I won’t even go into that much detail. From your open object class using the ITable interface you can pull out an enumeration of fields. The IField interface contains all the schema information you want. I will however point out the ISubtypes and IWorkspaceDomains interfaces. These are really useful for obtaining subtype information from a class and domain information from the SDE schema itself, which is something you probably want to use in your applications.

April 22, 2009
…you can’t replace a programmer with just any other programmer and get similar results.

April 16, 2009
An Introduction

I have two goals in mind with this blog. First to provide an outlet for the crazy ideas I get in my head. The second is to give back to the community of programmers and GIS experts I rely on and hopefully provide something useful to others.

That said I plan to cover topics ranging from enterprise .NET programming and software engineering to GIS and everything in between.

Unfortunately for the time being there will be a few rules. Well, really only one. I need to keep my employer and industry confidential; the competition is tight in this extremely niche market!

I’m looking forward to this. It will be a grand adventure.

April 15, 2009
Hello World?

First post.