Leaky Abstraction – Linq Usage

I am not sure what percentage of developers think about leaky abstractions when coding, especially under the OOP umbrella. Me? Not much until recently. I do not know why; I simply did not think about it. The common trend is that we, as developers, focus on new technologies, design patterns, best practices, … all that cool and fancy stuff. Many developers build great things without knowing the concept, and I understand that. However, if we know it, we might build better products that have fewer bugs and are easier to maintain. Once I actually understood it, I felt smarter. Let’s see what I am talking about.

First, read about it in these trusted sources. If you are a developer, there is a good chance that you already know them.

Wikipedia

Joel on Software

Coding Horror

In recent years, I have improved my skills via Pluralsight. The courses from Zoran Horvat have changed the way I think about programming and the way I think about OOP. At the same time, I have many codebases that I have been working on for years. When I looked at that code and compared it to what I had learned, there was a big gap. The problem was that I did not know how to close that gap. I was kind of stuck in my own thinking.

With time, I came to understand those ideas more deeply, and I started to make small changes in a stable, safe way. Let’s start the journey.

 

Let’s assume that we have a School with many Teachers. At the end of each school year, the school has to report statistics on:

  1. How many teachers does it have?
  2. How many mathematics teachers?
  3. How many of them have been in the school for more than 10 years?

These are very basic requirements. Without any hesitation, one can come up with the following code. At first glance, it does what is required.

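    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class School
    {
        public string Name { get; set; }
        // Initialized so that Teachers.Add(...) below does not throw a NullReferenceException.
        public IList<Teacher> Teachers { get; set; } = new List<Teacher>();
    }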

    public class Teacher
    {
        public string Name { get; set; }
        public string Specialty { get; set; }
        public DateTime StartedOn { get; set; }
        public bool IsStillAtWork { get; set; }
    }
    class Program
    {
        static void Main(string[] args)
        {
            var school = new School
            {
                Name = "Gotham City"
            };
            school.Teachers.Add(new Teacher
            {
                Name = "Batman",
                Specialty = "Mathematics",
                StartedOn = DateTime.Now.AddYears(-11),
                IsStillAtWork = true
            });
            school.Teachers.Add(new Teacher
            {
                Name = "Joker",
                Specialty = "Chemical",
                StartedOn = DateTime.Now.AddYears(-6),
                IsStillAtWork = false
            });
            Console.WriteLine("Total teachers: {0}", school.Teachers.Count(x => x.IsStillAtWork));
            Console.WriteLine("Mathematics teachers are: {0}", 
                string.Join("; ", school.Teachers
                                        .Where(x => x.Specialty == "Mathematics")
                                        .Select(x => x.Name)));
            Console.WriteLine("> 10 years teachers are: {0}",
                string.Join("; ", school.Teachers
                                        .Where(x => DateTime.Now >= x.StartedOn.AddYears(10))
                                        .Select(x => x.Name)));
        }
    }

How many issues can you find in the above code? The total teacher count only includes teachers whose IsStillAtWork flag is true. However, the next two statements do not apply that filter. Once the issue is identified, a developer can fix it easily by adding one more condition to each Where clause. A short revised version:

            Console.WriteLine("Total teachers: {0}", school.Teachers.Count(x => x.IsStillAtWork));
            Console.WriteLine("Mathematics teachers are: {0}", 
                string.Join("; ", school.Teachers
                                        .Where(x => x.Specialty == "Mathematics" && x.IsStillAtWork)
                                        .Select(x => x.Name)));
            Console.WriteLine("> 10 years teachers are: {0}",
                string.Join("; ", school.Teachers
                                        .Where(x => DateTime.Now >= x.StartedOn.AddYears(10) && x.IsStillAtWork)
                                        .Select(x => x.Name)));

So far so good! Where is the problem? Where is the “Leaky Abstraction”?

 

Let’s distinguish the consumer from the domain. The “Program” class is the consumer. The School and Teacher classes are the domain. Even for those simple requirements, the consumer has to know too much domain knowledge, which should be captured by the domain itself.

  • The consumer has to know how to filter teachers that are at work.
  • The consumer has to know how to decide whether a teacher teaches mathematics.
  • The consumer has to know how to decide whether a teacher has been at the school for a long time.

What if we have many consumers of the School class? Then each consumer has to have that knowledge and make its own implementation. Here we have a real leaky abstraction problem at the simplest level: using Linq to filter data. We also have a duplication issue, because the same logic is repeated in every consumer. If the domain has only one consumer, that code is fine. It does what it is expected to do. Unfortunately, in many applications that is not the case.
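To make the duplication concrete, here is a minimal sketch of two consumers that each reimplement the "mathematics teacher" rule. The consumer classes YearEndReport and TimetablePlanner, and the teacher named Riddler, are hypothetical names I made up for illustration; they are not part of the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Teacher
{
    public string Name { get; set; }
    public string Specialty { get; set; }
    public bool IsStillAtWork { get; set; }
}

// Hypothetical consumer 1: remembers the "still at work" rule.
public static class YearEndReport
{
    public static int CountMathematicsTeachers(IEnumerable<Teacher> teachers)
    {
        return teachers.Count(x => x.Specialty == "Mathematics" && x.IsStillAtWork);
    }
}

// Hypothetical consumer 2: written later, forgets the rule.
public static class TimetablePlanner
{
    public static int CountMathematicsTeachers(IEnumerable<Teacher> teachers)
    {
        return teachers.Count(x => x.Specialty == "Mathematics");
    }
}

public static class Program
{
    public static void Main()
    {
        var teachers = new List<Teacher>
        {
            new Teacher { Name = "Batman", Specialty = "Mathematics", IsStillAtWork = true },
            new Teacher { Name = "Riddler", Specialty = "Mathematics", IsStillAtWork = false }
        };

        // Same data, same question, two different answers.
        Console.WriteLine(YearEndReport.CountMathematicsTeachers(teachers));    // prints 1
        Console.WriteLine(TimetablePlanner.CountMathematicsTeachers(teachers)); // prints 2
    }
}
```

Neither consumer is obviously wrong when read on its own, which is what makes this kind of drift hard to spot.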

It is really hard to write code without leaky abstractions. It is kind of an impossible mission. What we should do is be aware of the situation and weigh the pros and cons of fixing them.

 

So what are possible solutions? The goal is to capture the logic inside the domain.

Solution 1: We could move the logic into the School class as below

    public class School
    {
        public string Name { get; set; }
        // Initialized so the sample runs; the public setter can still assign null.
        public IList<Teacher> Teachers { get; set; } = new List<Teacher>();

        public int CountTeacherAtWork()
        {
            return Teachers.Count(x => x.IsStillAtWork);
        }

        public IEnumerable<Teacher> MathematicsTeachers()
        {
            return Teachers.Where(x => x.Specialty == "Mathematics" && x.IsStillAtWork);
        }

        public IEnumerable<Teacher> ExperiencedTeachers()
        {
            return Teachers.Where(x => DateTime.Now >= x.StartedOn.AddYears(10) && x.IsStillAtWork);
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var school = new School
            {
                Name = "Gotham City"
            };
            school.Teachers.Add(new Teacher
            {
                Name = "Batman",
                Specialty = "Mathematics",
                StartedOn = DateTime.Now.AddYears(-11),
                IsStillAtWork = true
            });
            school.Teachers.Add(new Teacher
            {
                Name = "Joker",
                Specialty = "Chemical",
                StartedOn = DateTime.Now.AddYears(-6),
                IsStillAtWork = false
            });
            Console.WriteLine("Total teachers: {0}", school.CountTeacherAtWork());
            Console.WriteLine("Mathematics teachers are: {0}", 
                string.Join("; ", school.MathematicsTeachers()
                                        .Select(x => x.Name)));
            Console.WriteLine("> 10 years teachers are: {0}",
                string.Join("; ", school.ExperiencedTeachers()
                                        .Select(x => x.Name)));
        }
    }

The logic is captured in 3 methods: CountTeacherAtWork, MathematicsTeachers, and ExperiencedTeachers. So far so good! Any consumer can consume the API without worrying about the logic. And we also solve the duplication issue.

But that solution has some issues:

  1. The number of methods in the School class will explode.
  2. Did we forget to check whether the Teachers list is null? Are we sure that we have a valid Teacher list?
  3. When adding new methods that operate on the Teacher list, someone might forget to add the IsStillAtWork condition.

Just to name a few. In my opinion, the second issue is the worst.
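Here is a small sketch of why the second issue worries me. It assumes a trimmed-down School with only one of the query methods; the point is that any consumer can assign null through the public setter, and the query then fails at a place far from the assignment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Teacher
{
    public string Name { get; set; }
    public bool IsStillAtWork { get; set; }
}

public class School
{
    // Public setter: nothing stops a consumer from assigning null.
    public IList<Teacher> Teachers { get; set; } = new List<Teacher>();

    public int CountTeacherAtWork()
    {
        // Enumerable.Count throws ArgumentNullException when the source is null.
        return Teachers.Count(x => x.IsStillAtWork);
    }
}

public static class Program
{
    public static void Main()
    {
        var school = new School();
        school.Teachers = null; // e.g. a mapper or deserializer that found no data

        try
        {
            Console.WriteLine(school.CountTeacherAtWork());
        }
        catch (ArgumentNullException)
        {
            Console.WriteLine("The query failed because the Teachers list was null.");
        }
    }
}
```

Every query method added to School would need the same null guard, which is one more thing to forget.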

Solution 2: Capture logic in a Collection class

It is a better solution. Instead of thinking about school and teacher, what if we think of a “Collection of Teachers”? So at any point in time, instead of working with a single Teacher, we work with a collection of teachers. Sometimes, the collection might be empty or contain just one item.

In OOP, when there is logic, it should be captured inside objects. Let’s look at another version, where the logic is captured in an object.

    public class TeacherCollection
    {
        private readonly IList<Teacher> _teachers = new List<Teacher>();

        public TeacherCollection(IEnumerable<Teacher> teachers)
        {
            if (teachers != null)
                _teachers = teachers.Where(x => x.IsStillAtWork).ToList();
        }

        public TeacherCollection WhereTeachMathematics()
        {
            return new TeacherCollection(_teachers.Where(x => x.Specialty == "Mathematics"));
        }

        public TeacherCollection WhereExperienced()
        {
            return new TeacherCollection(_teachers.Where(x => DateTime.Now >= x.StartedOn.AddYears(10)));
        }

        public int Count
        {
            get { return _teachers.Count; }
        }

        public IEnumerable<Teacher> AsEnumerable
        {
            get { return _teachers.AsEnumerable(); }
        }
    }

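    public class School
    {
        public string Name { get; set; }
        // Initialized so that Teachers.Add(...) in Main does not throw a NullReferenceException.
        public IList<Teacher> Teachers { get; set; } = new List<Teacher>();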

        public TeacherCollection TeacherCollection
        {
            get
            {
                return new TeacherCollection(Teachers);
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var school = new School
            {
                Name = "Gotham City"
            };
            school.Teachers.Add(new Teacher
            {
                Name = "Batman",
                Specialty = "Mathematics",
                StartedOn = DateTime.Now.AddYears(-11),
                IsStillAtWork = true
            });
            school.Teachers.Add(new Teacher
            {
                Name = "Joker",
                Specialty = "Chemical",
                StartedOn = DateTime.Now.AddYears(-6),
                IsStillAtWork = false
            });
            Console.WriteLine("Total teachers: {0}", school.TeacherCollection.Count);
            Console.WriteLine("Mathematics teachers are: {0}", 
                string.Join("; ", school.TeacherCollection
                                        .WhereTeachMathematics()
                                        .AsEnumerable
                                        .Select(x => x.Name)));
            Console.WriteLine("> 10 years teachers are: {0}",
                string.Join("; ", school.TeacherCollection
                                        .WhereExperienced()
                                        .AsEnumerable
                                        .Select(x => x.Name)));
        }
    }

By introducing the TeacherCollection object, we can handle the query side of the object. As requirements grow, the number of “Where” methods also increases. That is a challenge, and we must keep an eye on the design and modify it when necessary. Regardless of the problems that might arise with the new design, we gain these benefits:

  1. The collection object is always complete. We do not have to check the state of the object. The inner collection is never null.
  2. It fits the “Map-Reduce” style of working with collections. Strictly speaking, each “Where” is a filter step that narrows the collection, and Count reduces it to a single value. By introducing a number of proper “Where” methods, we can capture the logic and chain the conditions until we reach the final collection that we want. It is also easy to add a Map function that transforms the items of the collection into something else.
  3. Immutable. Each “Where” results in a new collection object. Because the collection object is designed to support the query side of the operation, it is crucial to keep in mind that you must not add methods that change the internal state of the collection object.
  4. Improved readability. The better the naming, the better the readability!
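The chaining and the Map idea can be sketched as follows. The Map method and the teacher named Robin are my own hypothetical additions for illustration; the rest mirrors the TeacherCollection shown earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Teacher
{
    public string Name { get; set; }
    public string Specialty { get; set; }
    public DateTime StartedOn { get; set; }
    public bool IsStillAtWork { get; set; }
}

public class TeacherCollection
{
    private readonly IList<Teacher> _teachers = new List<Teacher>();

    public TeacherCollection(IEnumerable<Teacher> teachers)
    {
        if (teachers != null)
            _teachers = teachers.Where(x => x.IsStillAtWork).ToList();
    }

    public TeacherCollection WhereTeachMathematics()
    {
        return new TeacherCollection(_teachers.Where(x => x.Specialty == "Mathematics"));
    }

    public TeacherCollection WhereExperienced()
    {
        return new TeacherCollection(_teachers.Where(x => DateTime.Now >= x.StartedOn.AddYears(10)));
    }

    public int Count
    {
        get { return _teachers.Count; }
    }

    // Hypothetical addition: project each teacher to another shape.
    // ToList() materializes the result so callers never see the inner list.
    public IReadOnlyList<TResult> Map<TResult>(Func<Teacher, TResult> selector)
    {
        if (selector == null) throw new ArgumentNullException(nameof(selector));
        return _teachers.Select(selector).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var teachers = new TeacherCollection(new[]
        {
            new Teacher { Name = "Batman", Specialty = "Mathematics", StartedOn = DateTime.Now.AddYears(-11), IsStillAtWork = true },
            new Teacher { Name = "Robin", Specialty = "Mathematics", StartedOn = DateTime.Now.AddYears(-3), IsStillAtWork = true },
            new Teacher { Name = "Joker", Specialty = "Chemical", StartedOn = DateTime.Now.AddYears(-6), IsStillAtWork = false }
        });

        // Each call returns a new collection, so the conditions chain naturally.
        var experiencedMath = teachers.WhereTeachMathematics().WhereExperienced();

        Console.WriteLine(experiencedMath.Count);                               // prints 1
        Console.WriteLine(string.Join("; ", experiencedMath.Map(x => x.Name))); // prints Batman
        Console.WriteLine(teachers.Count);                                      // prints 2, the original is unchanged
    }
}
```

Because every “Where” returns a fresh TeacherCollection, the original collection still reports two teachers after the chain has run.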

Hmm, where is the command side? How do we modify the teachers? The command operations should belong to the main domain object, which is the School class. I still keep the Teachers property for that purpose. I know that exposing the Teachers property like this is not a good design, but it is there to demonstrate the point. In production code, I would design a better solution to deal with command operations. That is out of the scope of this post.
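A full design is out of scope, but as a rough sketch, one possible shape is to keep the list private and expose a command method on School. The Hire method is a hypothetical name of mine, and TeacherCollection is trimmed down to what the sketch needs; treat this as one option under those assumptions, not the recommended design.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Teacher
{
    public string Name { get; set; }
    public bool IsStillAtWork { get; set; }
}

// Trimmed-down query object: only what this sketch needs.
public class TeacherCollection
{
    private readonly IList<Teacher> _teachers = new List<Teacher>();

    public TeacherCollection(IEnumerable<Teacher> teachers)
    {
        if (teachers != null)
            _teachers = teachers.Where(x => x.IsStillAtWork).ToList();
    }

    public int Count
    {
        get { return _teachers.Count; }
    }
}

public class School
{
    // Private and readonly: consumers can neither replace the list nor set it to null.
    private readonly List<Teacher> _teachers = new List<Teacher>();

    public string Name { get; set; }

    // Hypothetical command: the single entry point for adding a teacher.
    public void Hire(Teacher teacher)
    {
        if (teacher == null) throw new ArgumentNullException(nameof(teacher));
        _teachers.Add(teacher);
    }

    // Query side: a fresh snapshot on every call.
    public TeacherCollection TeacherCollection
    {
        get { return new TeacherCollection(_teachers); }
    }
}

public static class Program
{
    public static void Main()
    {
        var school = new School { Name = "Gotham City" };
        school.Hire(new Teacher { Name = "Batman", IsStillAtWork = true });
        school.Hire(new Teacher { Name = "Joker", IsStillAtWork = false });

        Console.WriteLine(school.TeacherCollection.Count); // prints 1
    }
}
```

With this shape, commands go through School and queries go through TeacherCollection, so the consumer never touches the raw list.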

 

Wow, that is cool! Should we go in and fix all the places in our current codebase? No! No! and No! Blindly applying anything will cause more damage than benefit. Linq is a powerful tool and we use it everywhere in the code. Our main problem is that we have not asked the right questions. “Is it the right place to use Linq? Should we wrap the logic somewhere?” Those are wonderful questions to ask whenever we code.

Ok, then what do I take from here? What should I do? I have a few suggestions:

  1. Take a moment to understand leaky abstraction. Try to map it to a real-life example. The textbook definition is hard to understand.
  2. Look at your current codebase. Find a place where you think there is a possibility of a leaky abstraction. Hints: look for Linq usage, or properties of type List, Collection, or Enumerable.
  3. To find out whether your classes have a leaky abstraction issue, try to look at them from the consumer’s point of view.
  4. Evaluate the pros and cons. Simply consider: is it worth the effort or not?
  5. Make one small change at a time.

It is a journey. Good things take time. Using the same approach, we can find more leaky abstractions in other areas of our code.

Thank you for your time. I hope you can take something from this post and apply it to your daily job.
