Leaky Abstraction – Linq Usage

I am not sure what percentage of developers think about Leaky Abstraction when coding, especially under the OOP umbrella. Me? Not much until recently. I do not know why; I simply did not think about it. The common trend is that we, as developers, focus on new technologies, design patterns, best practices, … all the cool and fancy stuff. Many developers build great things without knowing the concept. I understand that. However, if we know it, we might build a better product with fewer bugs that is easier to maintain. When I actually understood it, I felt smarter. Let’s see what I am talking about.

First, check it out from these trusted resources. If you are a developer, there is a high chance that you know those sources.

Wikipedia

Joel on Software

Coding Horror

In recent years, I have improved my skills via Pluralsight. The courses from Zoran Horvat have changed the way I think about programming and about OOP. At the same time, I have many codebases that I have been working on for years. When I looked at that code and compared it to what I had learned, there was a big gap. The problem was that I did not know how to close that gap. I was kind of stuck in my own thinking.

With time, I started to understand those ideas more deeply. Then I started to make small changes in a stable, safe way. Let’s start the journey.

 

Let’s assume that we have a School and many Teachers. At the end of each school year, the school has to report statistics on:

  1. How many teachers does it have?
  2. How many mathematics teachers?
  3. How many of them have been in the school for more than 10 years?

They are very basic requirements. Without any hesitation, one can come up with this code. It works perfectly as required.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class School
    {
        public string Name { get; set; }
        public IList<Teacher> Teachers { get; set; }
    }

    public class Teacher
    {
        public string Name { get; set; }
        public string Specialty { get; set; }
        public DateTime StartedOn { get; set; }
        public bool IsStillAtWork { get; set; }
    }
    class Program
    {
        static void Main(string[] args)
        {
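            var school = new School
            {
                Name = "Gotham City",
                // Teachers has no default value, so create the list before adding to it.
                Teachers = new List<Teacher>()
            };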
            school.Teachers.Add(new Teacher
            {
                Name = "Batman",
                Specialty = "Mathematics",
                StartedOn = DateTime.Now.AddYears(-11),
                IsStillAtWork = true
            });
            school.Teachers.Add(new Teacher
            {
                Name = "Joker",
                Specialty = "Chemical",
                StartedOn = DateTime.Now.AddYears(-6),
                IsStillAtWork = false
            });
            Console.WriteLine("Total teachers: {0}", school.Teachers.Count(x => x.IsStillAtWork));
            Console.WriteLine("Mathematics teachers are: {0}", 
                string.Join("; ", school.Teachers
                                        .Where(x => x.Specialty== "Mathematics")
                                        .Select(x => x.Name)));
            Console.WriteLine("> 10 years teachers are: {0}",
                string.Join("; ", school.Teachers
                                        .Where(x => DateTime.Now >= x.StartedOn.AddYears(10))
                                        .Select(x => x.Name)));
        }
    }

How many issues can you find in the above code? The total teacher count includes only teachers whose IsStillAtWork is true. However, the next two statements do not apply that filter. Once the issue is identified, a developer can fix the code easily by adding one more condition to each Where clause. A short revised version:

            Console.WriteLine("Total teachers: {0}", school.Teachers.Count(x => x.IsStillAtWork));
            Console.WriteLine("Mathematics teachers are: {0}", 
                string.Join("; ", school.Teachers
                                        .Where(x => x.Specialty== "Mathematics" && x.IsStillAtWork)
                                        .Select(x => x.Name)));
            Console.WriteLine("> 10 years teachers are: {0}",
                string.Join("; ", school.Teachers
                                        .Where(x => DateTime.Now >= x.StartedOn.AddYears(10) && x.IsStillAtWork)
                                        .Select(x => x.Name)));

So far so good! Where is the problem? Where is the “Leaky Abstraction”?

 

Let’s distinguish the consumer from the domain. The “Program” class is the consumer. School and Teacher are the domain. For those simple requirements, the consumer has to know too much domain knowledge, which should be captured by the domain itself.

  • The consumer has to know how to filter teachers that are at work.
  • The consumer has to know how to decide whether a teacher teaches mathematics.
  • The consumer has to know how to decide whether a teacher has been at the school for a long time.

What if we have many consumers of the School class? Then each consumer has to have that knowledge and make its own implementation. Here we have a real problem of Leaky Abstraction at the simplest level: using Linq to filter data. We also have a duplication issue, because the logic is repeated in every consumer. If the domain has only one consumer, that code is fine. It does what it is expected to do. Unfortunately, in many applications that is not the case.

It is really hard to write code without any leaky abstraction. It is kind of an impossible mission. What we should do is be aware of the situation and weigh the pros and cons of fixing each leak.

 

So what are possible solutions? The goal is to capture the logic inside the domain.

Solution 1: We could move the logic into the School class as below

   public class School
    {
        public string Name { get; set; }
        public IList<Teacher> Teachers { get; set; }

        public int CountTeacherAtWork()
        {
            return Teachers.Count(x => x.IsStillAtWork);
        }

        public IEnumerable<Teacher> MathematicsTeachers()
        {
            return Teachers.Where(x => x.Specialty == "Mathematics" && x.IsStillAtWork);
        }

        public IEnumerable<Teacher> ExperiencedTeachers()
        {
           return Teachers.Where(x => DateTime.Now >= x.StartedOn.AddYears(10) && x.IsStillAtWork);
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
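            var school = new School
            {
                Name = "Gotham City",
                // Teachers has no default value, so create the list before adding to it.
                Teachers = new List<Teacher>()
            };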
            school.Teachers.Add(new Teacher
            {
                Name = "Batman",
                Specialty = "Mathematics",
                StartedOn = DateTime.Now.AddYears(-11),
                IsStillAtWork = true
            });
            school.Teachers.Add(new Teacher
            {
                Name = "Joker",
                Specialty = "Chemical",
                StartedOn = DateTime.Now.AddYears(-6),
                IsStillAtWork = false
            });
            Console.WriteLine("Total teachers: {0}", school.CountTeacherAtWork());
            Console.WriteLine("Mathematics teachers are: {0}", 
                string.Join("; ", school.MathematicsTeachers()
                                        .Select(x => x.Name)));
            Console.WriteLine("> 10 years teachers are: {0}",
                string.Join("; ", school.ExperiencedTeachers()
                                        .Select(x => x.Name)));
        }
    }

The logic is captured in 3 methods: CountTeacherAtWork, MathematicsTeachers, and ExperiencedTeachers. So far so good! Any consumer can use the API without worrying about the logic. And we also solve the duplication issue.

But that solution has some issues:

  1. The number of methods in the School class will explode.
  2. Did we forget to check whether the Teachers list is null? Are we sure that we have a valid Teacher list?
  3. When adding new methods that operate on the Teacher list, someone might forget to add the IsStillAtWork condition.

Just to name a few. In my opinion, the second issue is the worst.

Solution 2: Capture logic in a Collection class

It is a better solution. Instead of thinking about school and teacher, what if we think of a “Collection of Teachers”? So at any point in time, instead of working with a single Teacher, we work with a collection of teachers. Sometimes the collection might be empty or contain just one item.

In OOP, logic should be captured inside objects. Let’s look at another version, where the logic is captured in an object.

    public class TeacherCollection
    {
        private readonly IList<Teacher> _teachers = new List<Teacher>();

        public TeacherCollection(IEnumerable<Teacher> teachers)
        {
            if (teachers != null)
                _teachers = teachers.Where(x => x.IsStillAtWork).ToList();
        }

        public TeacherCollection WhereTeachMathematics()
        {
            return new TeacherCollection(_teachers.Where(x => x.Specialty == "Mathematics"));
        }

        public TeacherCollection WhereExperienced()
        {
            return new TeacherCollection(_teachers.Where(x => DateTime.Now >= x.StartedOn.AddYears(10)));
        }

        public int Count
        {
            get { return _teachers.Count; }
        }

        public IEnumerable<Teacher> AsEnumerable
        {
            get { return _teachers.AsEnumerable(); }
        }
    }

    public class School
    {
     
        public string Name { get; set; }
        public IList<Teacher> Teachers { get; set; }

        public TeacherCollection TeacherCollection
        {
            get
            {
                return new TeacherCollection(Teachers);
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
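            var school = new School
            {
                Name = "Gotham City",
                // Teachers has no default value, so create the list before adding to it.
                Teachers = new List<Teacher>()
            };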
            school.Teachers.Add(new Teacher
            {
                Name = "Batman",
                Specialty = "Mathematics",
                StartedOn = DateTime.Now.AddYears(-11),
                IsStillAtWork = true
            });
            school.Teachers.Add(new Teacher
            {
                Name = "Joker",
                Specialty = "Chemical",
                StartedOn = DateTime.Now.AddYears(-6),
                IsStillAtWork = false
            });
            Console.WriteLine("Total teachers: {0}", school.TeacherCollection.Count);
            Console.WriteLine("Mathematics teachers are: {0}", 
                string.Join("; ", school.TeacherCollection
                                        .WhereTeachMathematics()
                                        .AsEnumerable
                                        .Select(x => x.Name)));
            Console.WriteLine("> 10 years teachers are: {0}",
                string.Join("; ", school.TeacherCollection
                                        .WhereExperienced()
                                        .AsEnumerable
                                        .Select(x => x.Name)));
        }
    }

By introducing the TeacherCollection object, we can handle the query side of the object. As requirements grow, the number of “Where” methods also increases. That is a challenge, and we must keep an eye on the design and modify it when necessary. Regardless of the problems that might arise with the new design, we gain these benefits:

  1. The collection object is always complete. We do not have to check the state of the object. The inner collection is never null.
  2. It follows a filter-and-map style of pipeline (loosely, the “Map-Reduce” idea), where each “Where” narrows the collection. By introducing a number of proper “Where” methods, we can capture the logic and chain the conditions to reach the final collection that we want. It is also easy to create a Map function that transforms the collection into another shape.
  3. Immutable. Each “Where” results in a new collection object. Because the collection object is designed to support the query side of the operation, you must not add methods that change the internal state of the collection object.
  4. Improved readability. The better the naming, the better the readability!
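
To make points 2 and 3 more concrete, here is a minimal sketch of chaining two “Where” methods and then mapping the result. The `Map` method, the `TeacherQueries` helper, and its `ExperiencedMathematicsTeacherNames` method are assumptions added for illustration; they are not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Teacher
{
    public string Name { get; set; }
    public string Specialty { get; set; }
    public DateTime StartedOn { get; set; }
    public bool IsStillAtWork { get; set; }
}

public class TeacherCollection
{
    private readonly IList<Teacher> _teachers = new List<Teacher>();

    public TeacherCollection(IEnumerable<Teacher> teachers)
    {
        // A null input simply leaves the inner list empty.
        if (teachers != null)
            _teachers = teachers.Where(x => x.IsStillAtWork).ToList();
    }

    public TeacherCollection WhereTeachMathematics()
    {
        return new TeacherCollection(_teachers.Where(x => x.Specialty == "Mathematics"));
    }

    public TeacherCollection WhereExperienced()
    {
        return new TeacherCollection(_teachers.Where(x => DateTime.Now >= x.StartedOn.AddYears(10)));
    }

    // Hypothetical addition: project every teacher into another shape
    // without exposing or changing the inner list.
    public IEnumerable<TResult> Map<TResult>(Func<Teacher, TResult> selector)
    {
        return _teachers.Select(selector).ToList();
    }
}

// Hypothetical consumer-side helper, only here to show the chaining.
public static class TeacherQueries
{
    public static IList<string> ExperiencedMathematicsTeacherNames(IEnumerable<Teacher> teachers)
    {
        return new TeacherCollection(teachers)
            .WhereTeachMathematics()   // each call returns a new collection
            .WhereExperienced()
            .Map(x => x.Name)
            .ToList();
    }
}
```

Because every “Where” returns a new TeacherCollection, the order of the two filters does not matter, and the original collection is never modified.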

Hmm, where is the command side? How do we modify the teachers? The command operations should belong to the main domain object, which is the School class. I still keep the Teachers property for that purpose. I know that exposing the Teachers property this way is not a good design, but it is there to demonstrate the point. In production code, I would design a better solution to deal with command operations. That is outside the scope of this post.
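
For a rough idea of what “commands belong to School” could look like, here is one possible sketch. It is an assumption for illustration only, not the production design mentioned above: the `AddTeacher` method is hypothetical, and `Teacher` and `TeacherCollection` are trimmed down so the example stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Teacher
{
    public string Name { get; set; }
    public bool IsStillAtWork { get; set; }
}

// Trimmed-down query object: it only filters and counts.
public class TeacherCollection
{
    private readonly IList<Teacher> _teachers = new List<Teacher>();

    public TeacherCollection(IEnumerable<Teacher> teachers)
    {
        if (teachers != null)
            _teachers = teachers.Where(x => x.IsStillAtWork).ToList();
    }

    public int Count
    {
        get { return _teachers.Count; }
    }
}

public class School
{
    // The list is private and always initialized, so consumers can
    // neither replace it nor find it null.
    private readonly List<Teacher> _teachers = new List<Teacher>();

    public string Name { get; set; }

    // Command side (hypothetical): the only way to change the list.
    public void AddTeacher(Teacher teacher)
    {
        if (teacher == null) throw new ArgumentNullException("teacher");
        _teachers.Add(teacher);
    }

    // Query side: hands out a fresh, read-only view each time.
    public TeacherCollection TeacherCollection
    {
        get { return new TeacherCollection(_teachers); }
    }
}
```

With this shape, a consumer calls `school.AddTeacher(...)` to change data and `school.TeacherCollection` to read it, and the two sides no longer share a public list.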

 

Wow, that is cool! We should go in and fix all the places in our current codebase. No! No! And No! Blindly applying anything will cause more damage than benefit. Linq is a powerful tool and we use it everywhere in the code. Our main problem is that we have not asked the right questions. “Is it the right place to use Linq? Should we wrap the logic somewhere?” Those are wonderful questions to ask whenever we code.

Ok, then what do I take from here? What should I do? I have a few suggestions:

  1. Take a moment to understand leaky abstraction. Try to map it to a real-life example. The textbook definition is hard to understand.
  2. Look at your current codebase. Find a place where you think there might be a leaky abstraction. Hints: look for Linq usage, or properties of type List, Collection, or Enumerable.
  3. To find out whether your classes have a Leaky Abstraction issue, look at them from the consumer’s point of view.
  4. Evaluate the pros and cons. Simply consider: is it worth the effort or not?
  5. Make one small change at a time.

It is a journey. Good things take time. Using the same approach, we can find more leaky abstractions in other areas of our code.

Thank you for your time. I hope you can take something from this post and apply it in your daily job.

Unit Test from Pain to Joy

Recently I made an improvement to the unit tests in a project. The improvement has had a huge impact on the system, in a good way. It has also had a huge impact on my thinking and my philosophy about unit testing, again in a good way.

The Story

I have been working on a huge, complex codebase. It is about 6 years old and still written for .NET 4.0. Part of the system is a WCF service built in CQRS style. The code has existing unit tests, and we have added more. The tests include both integration tests and mocked tests.

Integration tests, in our context, mean starting from the top-most layer (the presentation layer at the WCF service) down to the database, and then getting data back from the database using a Query Handler or Repository. In short, it is a complex testing style. In my opinion, it is not good. I would not have done that.

Mocked tests, in our context, mean mocking all the dependencies. In short, all the interfaces are mocked. Even for the domain objects, we create proxy objects and mock their properties and methods. It turned out to be a big mistake.

Most of the time, we are mocking instead of testing state. There are a few problems (pains) with our approach.

First, the tests are hard to write. To mock dependencies, you have to know all the dependencies. That means developers have to read the implementation code, figure out how the pieces interact with each other, and figure out which methods are called on which interfaces. That is not really fun, and it does not bring any real value.

Second, the tests are fragile. Whenever we add code or refactor a piece of code, the unit tests break, because they assume that certain methods exist and that they are called in a specific order. In our context, that assumption is not suitable.

Third, it is damn hard to write tests to verify a bug fix.

Root Cause

How on earth did I end up in that situation? Everything has its reasons, and I want to figure out mine.

I bought this excuse for so long, and unfortunately I was happy with it:

Because the codebase is complex. The design was wrong. We do not have time to redesign it.

The fingers were pointing at the other guys, the other developers. No, it was not true.

“What role have I played in that mess?”, I asked. Oh, it turns out I played a very big role. After reflection, here are some reasons why I did not do well in that area.

Wrong Mindset

A long time ago, when I started learning about and writing unit tests, I was sold on the idea that we have to cut the dependencies, especially the database dependency. With the rise of mocking and the promise of TDD, I mocked as much as I could (in the hope of cutting dependencies). It works in many scenarios. However, because I believed it was the right thing, I forgot to ask the right questions.

Not Asking the Right Questions

I just wrote tests without asking the right questions.

What am I testing here?

It is kind of a stupid question, but a very powerful one. Depending on the type of application and on the architecture, the answers are different. Because of my wrong mindset, I focused on how instead of what. Answering that question allows me to analyze further and to look at the system in a systematic way instead of a theoretical way.

Solution

First, I decided to throw away what I thought I knew about Unit Test. Here is what I want my Unit Tests to be:

  1. Easy to write, easy to understand from code.
  2. Resilient to refactoring. I should not have to modify unit tests when the implementation code switches to another interface. In short, the tests should be there to guarantee code correctness during the maintenance phase.

While writing this post, I created a GitHub repo; welcome to DotConnect.

What are We Testing?

Such a simple but powerful question! However, I sometimes forgot to ask it. We know that we should write unit tests for our code. Have we ever tried to answer that question properly? Take an example: we will build a simple web service (WCF) to CRUD an Account in a SQL database. What are we going to test? Each of us will have a different answer, and that answer drives the unit test implementation.

When asking that question, the important thing is to remove the term Unit. I find it is a trap. When that term is present, my mind is trapped in defining what a unit is, and I forget the purpose of my testing.

In my opinion, at the abstract level, I categorize tests into 2 categories: 1) Functional Test and 2) Architecture Test.

Functional Test

These are tests that govern the correctness of the system. For this kind of test, we have to define clearly what the final state is.

[Figure: Input and Output. The simple diagram of all processes]

To implement a proper test, one must clearly define the Output. Some common outputs are 1) Database, 2) Filesystem

To define a good output, we have to define the proper scope (which comes later in this post)

Architecture Test

For some systems, architecture is important. Let’s say all the calls to the database must go through a repository. Or the Controller (in an MVC application) must delegate the work to the next layer (such as a Service or Command/Query Handler).

Usually, we use mocks to accomplish this testing goal, because we do not really care about the actual implementation. We care about the sequence of calls.

What are Dependencies?

Dependencies must be listed out explicitly. At the minimum, there should be a simple diagram like this

[Figure: Dependencies. High-level dependencies of the system. Each box (such as Google) is a dependency]

And do not go into the details of those dependencies. It is better to keep the high-level view.

What is the Scope?

Without a scope, things get messy. A proper, explicitly stated scope helps to define the Input and Output. I got this question wrong, so I defined the wrong scope. I once defined the scope at the Project level. I had a unit test for the Command Handler project, which mocked the dependency on the Repository project. Then I had another unit test for the Repository project. At first, they looked logical and reasonable. However, the test of time proved that I was wrong.

Once I realized it (and that is why I write this post), I defined the scope at the Command Handler level only and removed the concept of a Repository test. That allows me to define the Input as the Command, and the Output as the changes in the database.

This is a game-changing step for me. For years, I have been focusing on the term Unit. The problem is that it is hard to define the unit. Will it be a function, a method? Will it be a class? Or will it be an assembly? Well, I do not know. I had better just forget about the term.

So what do I have so far in my toolbox regarding unit tests? Here they are:

  1. Ask the question: What am I testing?
  2. Explicitly list all dependencies at a high level
  3. Define the testing scope

[Figure: My Unit Test toolbox]

Application

Back to the story: the system I have been working with is a complex, data-driven system. The data is backed by SQL Server. From the architecture point of view, it is a WCF service with CQRS architecture. When a command is executed, a bunch of things are involved: the domain, the domain services, the external services (AD FS, payment service, …), … Eventually, the data is saved into the SQL Server database.

From the command side:

Q: What am I testing here?

A: Save data correctly in the database

We should not care about which domain objects are involved, which domain services are called, … They are internal implementation, and they change frequently. We went down the wrong rabbit hole.

From the query side:

Q: What am I testing here?

A: Get data from the database correctly.

We should not care about how data is filtered, how data is combined, … They are internal implementation. The same reasoning applies as on the command side.

In both cases, the test gives an input and verifies the output. However, we still have a problem with dependencies. A big change in my mindset is that I no longer see the database as a dependency. Rather, it is a part of the internal system. Why? Because it is an essential part, and it can be set up locally and in a CI environment. Therefore, my definition of dependency is:

A dependency is an external system that we do not control, that is hard to set up, and whose implementation we do not care about. The database should NOT be one of them.

How do we mock those dependencies without using a mocking framework? A good practice is that for every dependency, there should be a Proxy (the Proxy design pattern). The proxy implementation is injected at runtime with the help of an IoC framework such as Windsor Container. For Unit Tests, I create a fake implementation and tweak it as I want.
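
A minimal sketch of that idea, assuming a payment service as the external dependency. The names `IPaymentServiceProxy`, `FakePaymentServiceProxy`, and `PayInvoiceHandler` are all hypothetical; the container registration is left out, and the fake is passed in by hand.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical proxy interface that sits in front of the external payment service.
// In production, the IoC container would supply an implementation that calls the real service.
public interface IPaymentServiceProxy
{
    bool Charge(string accountId, decimal amount);
}

// Hand-written fake used only by tests. It is plain code, so it can be tweaked freely.
public class FakePaymentServiceProxy : IPaymentServiceProxy
{
    // Tweak point: decide in the test whether the payment goes through.
    public bool ShouldSucceed = true;

    // Records which accounts were charged, in case a test wants to look.
    public readonly List<string> ChargedAccounts = new List<string>();

    public bool Charge(string accountId, decimal amount)
    {
        ChargedAccounts.Add(accountId);
        return ShouldSucceed;
    }
}

// Hypothetical handler that depends only on the proxy interface.
public class PayInvoiceHandler
{
    private readonly IPaymentServiceProxy _payments;

    public PayInvoiceHandler(IPaymentServiceProxy payments)
    {
        _payments = payments;
    }

    public string Handle(string accountId, decimal amount)
    {
        return _payments.Charge(accountId, amount) ? "Paid" : "Rejected";
    }
}
```

A test can then create the handler with `new PayInvoiceHandler(new FakePaymentServiceProxy { ShouldSucceed = false })` and check the resulting state, without knowing how the handler talks to the proxy internally.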

It took me a little while to set up all those things. But it works, and it pays off a lot.

Implementation Detail

[PS: This section is not finished yet. However, this post was started a while ago. I think I should publish it and save the implementation detail for another post.]

To implement this kind of test, I need to interact with the database to insert and get data independently of the system logic. This separation is very important. To accomplish this, I use Linq To SQL.

Due to a confidentiality agreement, I am going to create a simple demo instead of using the real application. Let’s create a simple User form MVC application.

[Code]

Having a separate assembly allows me to isolate the changes. The Linq To SQL DataContext allows me to interact with the database as I need.

All the tests follow a pattern:

  1. Assumption: Prepare data. This step inserts into the database the data that the command needs in order to execute.
  2. Arrange: Prepare the command to verify.
  3. Act: Invoke the command handler corresponding to the command. Each command has its own Command Handler.
  4. Assert: Verify the result. Use Linq To SQL to get the data from the database and verify our expectations.
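
The four steps can be sketched as follows. This is only an illustration of the shape of such a test: `InMemoryDatabase` is a stand-in so the sketch runs on its own, whereas the tests described here go to a real SQL Server database through Linq To SQL. The `RenameUserCommand`, its handler, and the test class are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the real database, used ONLY to keep this sketch self-contained.
public class InMemoryDatabase
{
    public readonly Dictionary<int, string> UserNames = new Dictionary<int, string>();
}

// Hypothetical command and its handler.
public class RenameUserCommand
{
    public int UserId { get; set; }
    public string NewName { get; set; }
}

public class RenameUserCommandHandler
{
    private readonly InMemoryDatabase _db;

    public RenameUserCommandHandler(InMemoryDatabase db)
    {
        _db = db;
    }

    public void Handle(RenameUserCommand command)
    {
        _db.UserNames[command.UserId] = command.NewName;
    }
}

public static class RenameUserCommandTest
{
    // Returns the stored name so a caller can inspect it; throws if the expectation fails.
    public static string Run()
    {
        // 1. Assumption: insert the data the command needs, bypassing the system logic.
        var db = new InMemoryDatabase();
        db.UserNames[1] = "Bruce";

        // 2. Arrange: prepare the command to verify.
        var command = new RenameUserCommand { UserId = 1, NewName = "Batman" };

        // 3. Act: invoke the handler that corresponds to the command.
        new RenameUserCommandHandler(db).Handle(command);

        // 4. Assert: read the data back directly and verify the expectation.
        var stored = db.UserNames[1];
        if (stored != "Batman")
            throw new Exception("Expected the user to be renamed to Batman.");
        return stored;
    }
}
```

Note that the test never looks inside the handler. It only sets up state, sends a command, and checks the state afterwards, so the handler can be refactored without touching the test.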

Instead of repeating the steps in every test, I capture them with the Strategy pattern.

When to Mock?

There are still scenarios where mocking is a perfect solution. Usually, it is at the top layer of the system. Let’s take MVC (WebAPI) as an example. In my opinion, the Controller should be as lightweight as possible. What a controller should do is:

  1. Validate input
  2. Prepare a request (can be a Command or a Query)
  3. Dispatch the request to the next layer. If the system employs CQRS, that layer is usually a Command Handler or Query Handler.
  4. Return the result

Which step should be mocked? Step #3. What are we testing? We test to ensure that the Controller sends the correct command/query to the next layer. We test the behavior of Controllers. A mock might be a perfect fit for an Architecture Test.
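
A minimal sketch of such a controller test, under a few assumptions: `ICommandDispatcher`, `CreateUserCommand`, and `UserController` are hypothetical names, the controller is a plain class rather than a real MVC controller, and a hand-written `RecordingDispatcher` plays the role that a mocking framework would normally play.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical command sent to the next layer.
public class CreateUserCommand
{
    public string Name { get; set; }
}

// Hypothetical abstraction over "the next layer" (step #3).
public interface ICommandDispatcher
{
    void Dispatch(object command);
}

// Hand-written mock: it records every dispatched command for later verification.
public class RecordingDispatcher : ICommandDispatcher
{
    public readonly List<object> Dispatched = new List<object>();

    public void Dispatch(object command)
    {
        Dispatched.Add(command);
    }
}

// Simplified stand-in for an MVC/WebAPI controller.
public class UserController
{
    private readonly ICommandDispatcher _dispatcher;

    public UserController(ICommandDispatcher dispatcher)
    {
        _dispatcher = dispatcher;
    }

    public string Create(string name)
    {
        // 1. Validate input.
        if (string.IsNullOrWhiteSpace(name)) return "Invalid";

        // 2. Prepare the request.
        var command = new CreateUserCommand { Name = name.Trim() };

        // 3. Dispatch to the next layer (the only step that is mocked).
        _dispatcher.Dispatch(command);

        // 4. Return the result.
        return "Created";
    }
}
```

The test passes a `RecordingDispatcher` to the controller, calls `Create`, and then checks that exactly one `CreateUserCommand` with the expected name was dispatched, and that invalid input dispatches nothing.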

[Code]

What’s Next?

The implementation detail for all the stuff I wrote about here. Now it is time to let this post out so I get something DONE.