Tag: .Net

  • Why Unit of Work is an anti-pattern but not Repository


    Are Repository and Unit of Work (UoW) anti-patterns? In the .NET/C# world, it is often said that Repository/UoW is an anti-pattern and that one should probably use Entity Framework Core directly.

    But if it is an anti-pattern, why are people still using it? Even Mosh Hamedani – a respected YouTube coding trainer whom I follow – wrote about common mistakes when applying this pattern. Surely it must be popular.

    Purpose of the Repository/UoW Pattern

    Let’s talk about why this design pattern exists. It is essentially a form of the Adapter pattern, and its primary purpose is to separate concerns between the domain (or “business logic”) and the infrastructure (or “data access”).

    The other motivation for using this pattern is to allow mocking repositories for unit testing.

    Must Repository go together with Unit of Work?

    It is also very common to see the Repository pattern used together with the Unit of Work (UoW) pattern. The UoW pattern is added to manage transactions, a feature relational databases provide to guarantee ACID operations.

    However, there is growing tolerance for eventual consistency and weaker atomicity; as more software adopts a microservice architecture, fully ACID operations matter less than they used to.

    So unless you are a bank or are dealing with critical financial data, building your software around transaction management, a feature most closely associated with RDBMSes, might not be the kind of dependency you want.

    Why UoW is sometimes an anti-pattern

    I’ll start by saying that everything depends on your use case, but I tend to feel that Unit of Work (but not Repository) is an anti-pattern.

    Let me explain why.

    You see, the repository interface can be described in its simplest form as an interface for read/write operations.

    // What a typical repository looks like
    public interface IMyRepository
    {
      MyObject GetById(int id); // Read
      IEnumerable<MyObject> SearchByName(string name); // Read
      void Update(MyObject obj); // Write ("object" is a reserved keyword in C#)
      void DeleteById(int id); // Write
    }

    I’ll use an analogy here. In good old C, the standard input/output header <stdio.h> defines basic read/write operations. It is an abstraction over the underlying data access implementation and is close in spirit to the repository pattern: the interface doesn’t care whether you are writing to the console, a file, or a serial port, just as a repository interface shouldn’t care whether it is backed by an RDBMS, a CSV file, or even a remote API call.

    Does <stdio.h> expose specific features of a particular filesystem or a serial port? No. So why should a specific feature of an RDBMS (transaction management) be depended upon outside of a repository interface?

    userRepository.Update(user);
    unitOfWork.Save(); // Without this call, the record is never updated!

    Using the C example from earlier: imagine having to always call fsync() (from <unistd.h>) after calling fwrite() in order to commit changes. Does that make sense? Sure, you may call fsync() if you want to force your changes to disk immediately, but you do not, and should not, have to call it explicitly.

    (In .NET there is TransactionScope in System.Transactions, but that is a topic for another day.)
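    To make the stdio analogy concrete, here is a minimal sketch of a repository whose writes are durable as soon as the method returns, so the caller has nothing to “save”. The User record, this IUserRepository shape and the CsvUserRepository class are hypothetical, and the one-line-per-user file format (id,name) has no escaping, locking or error handling; it only illustrates where the “commit” lives.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical domain type used only for this sketch.
public record User(int Id, string Name);

public interface IUserRepository
{
    User GetById(int id);
    void Update(User user);
}

// Hypothetical write-through repository: every call reads or rewrites the
// backing file directly, so there is no separate Save()/commit step.
public class CsvUserRepository : IUserRepository
{
    private readonly string _path;

    public CsvUserRepository(string path) => _path = path;

    public User GetById(int id) =>
        ReadAll().FirstOrDefault(u => u.Id == id)
        ?? throw new KeyNotFoundException($"User {id} not found.");

    public void Update(User user)
    {
        // Drop any existing line for this id, then append the new version
        // (so this behaves as an upsert).
        var users = ReadAll().Where(u => u.Id != user.Id).ToList();
        users.Add(user);

        // The change is persisted before Update returns.
        File.WriteAllLines(_path, users.Select(u => $"{u.Id},{u.Name}"));
    }

    private IEnumerable<User> ReadAll()
    {
        if (!File.Exists(_path)) return Enumerable.Empty<User>();

        return File.ReadAllLines(_path)
            .Where(line => line.Length > 0)
            .Select(line => line.Split(',', 2)) // id, then the rest as the name
            .Select(parts => new User(int.Parse(parts[0]), parts[1]));
    }
}
```

    The caller’s code shrinks to userRepository.Update(user); with no Save() to forget.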

    How much to depend on the RDBMS?

    RDBMSes are great tools with important features, but as an application architect, I often ask myself if it really matters to the application I am building.

    Sure, 99% of the time the RDBMS will never be swapped for anything else. Heck, some applications may even run on the same version of MySQL for the rest of their lives, probably because many applications are designed with a database-first approach (and I have another blog article coming up about why a database-first approach should probably be avoided).

    So you may think, YAGNI: let’s not over-design the application. In such a case, you may be better off without the Repository/UoW pattern entirely, but… this is 2021, and you are unit testing your code, rightttt?

    Unit tests and mocks

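    If you have ever attempted to mock an ORM framework, you’ll know it is nearly impossible. Sure, EF Core has an in-memory database provider that can be used for testing, but it has a lot of caveats: it is not a relational database, so, for example, it does not enforce referential integrity and it does not support transactions.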

    As a result, it is usually easier to put the ORM behind a repository interface and mock that interface, rather than attempting to mock the ORM framework itself.
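    As a sketch of what that looks like in practice (all names here are hypothetical), the code under test depends only on a repository interface, and the test swaps in a hand-written in-memory fake instead of an ORM or a database.

```csharp
using System;
using System.Collections.Generic;

public record User(int Id, string Name);

public interface IUserRepository
{
    User GetById(int id);
    void Update(User user);
}

// Hypothetical domain service: it only knows about the interface.
public class UserService
{
    private readonly IUserRepository _users;

    public UserService(IUserRepository users) => _users = users;

    public void Rename(int id, string newName)
    {
        if (string.IsNullOrWhiteSpace(newName))
            throw new ArgumentException("Name must not be empty.", nameof(newName));

        var user = _users.GetById(id);
        _users.Update(user with { Name = newName });
    }
}

// Hand-written in-memory fake for unit tests: no ORM, no database.
public class InMemoryUserRepository : IUserRepository
{
    private readonly Dictionary<int, User> _store = new();

    public User GetById(int id) =>
        _store.TryGetValue(id, out var user)
            ? user
            : throw new KeyNotFoundException($"User {id} not found.");

    public void Update(User user) => _store[user.Id] = user;
}
```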

    Mocking the test, or testing the mock?

    Ever had unit tests pass but integration tests fail because of database constraints? Ever tried testing code that relied on a transaction rollback? Mocking the behavior of an RDBMS is extremely difficult, and we shouldn’t spend half our lives trying to mock how an RDBMS works, or we end up testing our mocks instead of our code.

    // Typical use of UoW
    try
    {
      userRepository.Update(user);
      unitOfWork.Save(); // This commits the transaction
    }
    catch (Exception)
    {
      unitOfWork.Rollback();
      throw; // Rethrow so the caller still sees the failure
    }

    Example: How would you mock and test the rollback?
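    One way to answer that is a hand-written fake, sketched below with hypothetical names (UserService, FailingUserRepository, RecordingUnitOfWork). Note what such a test actually proves: that Rollback() was called, not that any data was restored, because the fake unit of work never held any data in the first place.

```csharp
using System;

public interface IUserRepository
{
    void Update(string userName);
}

public interface IUnitOfWork
{
    void Save();
    void Rollback();
}

// Hypothetical service following the typical UoW usage shown above.
public class UserService
{
    private readonly IUserRepository _users;
    private readonly IUnitOfWork _unitOfWork;

    public UserService(IUserRepository users, IUnitOfWork unitOfWork)
    {
        _users = users;
        _unitOfWork = unitOfWork;
    }

    public void UpdateUser(string userName)
    {
        try
        {
            _users.Update(userName);
            _unitOfWork.Save();
        }
        catch (Exception)
        {
            _unitOfWork.Rollback();
            throw;
        }
    }
}

// Fake repository that always fails, to force the rollback path.
public class FailingUserRepository : IUserRepository
{
    public void Update(string userName) =>
        throw new InvalidOperationException("Simulated failure");
}

// Fake unit of work that only records which methods were called.
public class RecordingUnitOfWork : IUnitOfWork
{
    public bool Saved { get; private set; }
    public bool RolledBack { get; private set; }

    public void Save() => Saved = true;
    public void Rollback() => RolledBack = true;
}
```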

    Instead, we should implement repositories as if we were writing data to regular storage – think of it as writing to a CSV file, memory, Redis cache, or something else.

    Do CSV files have transactions? No.

    Then, maybe we should not use Unit of Work:

    try
    {
      userRepository.Update(user);
    }
    catch (Exception ex)
    {
      // Log the error and fix it manually, retry the operation, or place
      // the update in a queue to be processed later if you really HAVE
      // to make this update; otherwise, just rethrow the exception!
    }
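    For the “retry the operation” option in the comment above, a bounded retry can be as small as the sketch below. Retry.Execute is a hypothetical helper, not a framework API, and retrying is only safe when the operation is idempotent (re-running the same Update is; re-running an insert may not be).

```csharp
using System;
using System.Threading;

// Hypothetical helper: retries an operation a bounded number of times and
// lets the last exception propagate if every attempt fails.
public static class Retry
{
    public static void Execute(Action operation, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                operation();
                return;
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Not the last attempt yet: wait, then try again.
                // On the last attempt the filter is false and the exception escapes.
                Thread.Sleep(delay);
            }
        }
    }
}
```

    Usage: Retry.Execute(() => userRepository.Update(user), maxAttempts: 3, delay: TimeSpan.FromMilliseconds(200));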

    The generics phenomenon

    A typical Repository/UoW implementation makes each repository represent a single domain entity, e.g.

    public interface IUserRepository
    {
      public User GetById(int id);
      public void Create(User user);
      public void Update(User user);
      public void Delete(int id);
    }
    
    public interface IGroupRepository
    {
      public Group GetById(int id);
      public void Create(Group group);
      public void Update(Group group);
      public void Delete(int id);
    } 

    As a result, these are commonly reduced further to a generic interface to remove the repetition of CRUD methods, e.g.

    public interface IRepository<TEntity>
    {
      public TEntity GetById(int id);
      public void Create(TEntity entity);
      public void Update(TEntity entity);
      public void Delete(int id);
    }
    
    public interface IUserRepository : IRepository<User> { ... }
    public interface IGroupRepository : IRepository<Group> { ... }

    However, many of the applications I have written do not need full CRUD operations on every single repository. Some tables are read-only, some records never require an update, and some never get deleted.
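    To illustrate the problem, here is a sketch (with a hypothetical Country lookup table) of what happens when a read-only table is forced through the generic interface: the write methods must still exist, and the only honest implementation is to throw at run time.

```csharp
using System;
using System.Collections.Generic;

public interface IRepository<TEntity>
{
    TEntity GetById(int id);
    void Create(TEntity entity);
    void Update(TEntity entity);
    void Delete(int id);
}

// Hypothetical read-only lookup table.
public record Country(int Id, string Name);

public class CountryRepository : IRepository<Country>
{
    private readonly Dictionary<int, Country> _countries = new()
    {
        [1] = new Country(1, "Japan"),
        [2] = new Country(2, "Norway"),
    };

    public Country GetById(int id) => _countries[id];

    // The generic interface forces these members to exist even though the
    // table is never written to, so they can only fail at run time.
    public void Create(Country entity) =>
        throw new NotSupportedException("Countries are read-only.");

    public void Update(Country entity) =>
        throw new NotSupportedException("Countries are read-only.");

    public void Delete(int id) =>
        throw new NotSupportedException("Countries are read-only.");
}
```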

    The N-N relationship

    Next, how do I add a user to a group, or assign a group to a user?

    Create yet another repository for the intermediary N-N relationship table and insert a record!

    public interface IUserGroupRepository : IRepository<UserGroup> { ... }
    
    // Usage
    var userGroup = new UserGroup(userId, groupId);
    userGroupRepository.Create(userGroup);
    unitOfWork.Save(); // Always remember to save!

    This is why Repository/UoW implementations often end up with a crazy list of interfaces and classes, and it’s probably not easy for a developer to figure out which repository to use. Is it called UserGroup or GroupUser? @#$%^&

    It is also extremely counter-intuitive to create a new UserGroup object just to relate two objects that already exist. In regular object-oriented code, it would probably be written like this:

    group.AddUser(user);
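    A minimal sketch of that object-oriented style, assuming hypothetical User and Group types where the group simply tracks its member ids in memory:

```csharp
using System;
using System.Collections.Generic;

public record User(int Id, string Name);

// Hypothetical domain object: the relationship is a method on the group,
// not a separately created UserGroup row.
public class Group
{
    private readonly HashSet<int> _memberIds = new();

    public int Id { get; }
    public string Name { get; }

    public Group(int id, string name)
    {
        Id = id;
        Name = name;
    }

    public IReadOnlyCollection<int> MemberIds => _memberIds;

    public void AddUser(User user)
    {
        if (user is null) throw new ArgumentNullException(nameof(user));
        _memberIds.Add(user.Id); // Adding the same user twice is a no-op.
    }
}
```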

    Why aren’t repositories written more expressively?

    Unlike <stdio.h> in C, which I used as an analogy earlier, repository interfaces are not doing byte-level I/O; they handle richer data types, so their methods should be written more expressively.

    For example, why not write the User-Group relationship methods in such a manner?

    public interface IUserRepository 
    {
      public void AddGroup(int userId, int groupId); // Inserts into UserGroup
      ...
    }
    
    public interface IGroupRepository
    {
      public void AddUser(int groupId, int userId); // Inserts into UserGroup
      ...
    }

    Even better – if your application will never ever see the need to store Users and Groups in different data stores, why not simply combine them into one repository interface?

    public interface IAccountServiceRepository
    {
      public void CreateUser(User user);
      public void CreateGroup(Group group);
      public void CreateUserWithNewGroup(User user, Group group); // Can use a transaction
      public void AddUserToGroup(int userId, int groupId); // Inserts into UserGroup
      ...
    }

    (Imagine you were writing to the UNIX /etc/passwd and /etc/group – how would you implement it?)

    One may argue that I’ve come full circle: I am replicating a UoW while also violating the single-responsibility principle. Then again, what is the “single responsibility” of a repository? The term is often taken out of context from how its originator, Robert C. Martin, expressed it: “A class should have only one reason to change”. What external or structural influences may cause the interface above to change?

    Lastly, if CreateUserWithNewGroup() required a transaction for an atomic operation, shouldn’t the transaction be managed at the repository rather than by the domain? Should the onus of transaction management be placed on the domain layer or the repository layer? Is handling transactions in the domain logic also violating the single-responsibility principle?
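    As a rough illustration of the repository owning atomicity, here is an in-memory sketch of some of the combined interface’s methods. InMemoryAccountServiceRepository, the User/Group records and the UserExists/IsMember query helpers are all hypothetical. Atomicity here comes from a lock plus “validate everything, then write everything”; an RDBMS-backed implementation could instead begin and commit a database transaction inside CreateUserWithNewGroup, and the domain layer would never see it.

```csharp
using System;
using System.Collections.Generic;

public record User(int Id, string Name);
public record Group(int Id, string Name);

// Hypothetical in-memory implementation of part of the combined repository.
public class InMemoryAccountServiceRepository
{
    private readonly object _gate = new();
    private readonly Dictionary<int, User> _users = new();
    private readonly Dictionary<int, Group> _groups = new();
    private readonly HashSet<(int UserId, int GroupId)> _userGroups = new();

    public void CreateGroup(Group group)
    {
        lock (_gate)
        {
            if (_groups.ContainsKey(group.Id))
                throw new InvalidOperationException($"Group {group.Id} already exists.");
            _groups.Add(group.Id, group);
        }
    }

    public void CreateUserWithNewGroup(User user, Group group)
    {
        lock (_gate)
        {
            // Validate every write before performing any of them, so a failure
            // leaves no partial state (no user without its group, or vice versa).
            if (_users.ContainsKey(user.Id))
                throw new InvalidOperationException($"User {user.Id} already exists.");
            if (_groups.ContainsKey(group.Id))
                throw new InvalidOperationException($"Group {group.Id} already exists.");

            _users.Add(user.Id, user);
            _groups.Add(group.Id, group);
            _userGroups.Add((user.Id, group.Id)); // The N-N row stays an internal detail.
        }
    }

    public void AddUserToGroup(int userId, int groupId)
    {
        lock (_gate)
        {
            if (!_users.ContainsKey(userId))
                throw new KeyNotFoundException($"User {userId} not found.");
            if (!_groups.ContainsKey(groupId))
                throw new KeyNotFoundException($"Group {groupId} not found.");
            _userGroups.Add((userId, groupId));
        }
    }

    // Hypothetical query helpers, added only so the behavior can be checked.
    public bool UserExists(int userId)
    {
        lock (_gate) return _users.ContainsKey(userId);
    }

    public bool IsMember(int userId, int groupId)
    {
        lock (_gate) return _userGroups.Contains((userId, groupId));
    }
}
```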

    Conclusion

    Once again, if your software or company really, really, really depends on having consistent data in an RDBMS, by all means continue to use the Repository/UoW pattern (or simply use the ORM directly, since it is a hard dependency).

    But for a large majority of cases, YAGNI. A single repository interface alone is probably good enough.

    This blog article was more of me thinking out loud than trying to encourage or influence a change in how people implement Repository/UoW, and comments are most certainly welcome.

  • Stop returning null


    Have you ever debugged a null reference error? It’s like staring into the void. The error often doesn’t tell you anything. Null handling sucks the life out of developers.

    Developers should stop returning null. Modern programming languages have exceptions – use them.

    There was a time when we avoided throwing exceptions because exception handling was thought to be slow, among other criticisms.

    However, returning null instead of throwing an exception is far worse.

    Let me explain.

    Suppose you were tasked with implementing this method to retrieve a user:

    public User GetUserById(int id);

    If no user with the given id is found, you are probably tempted to return null.

    But what is null? Is it an invalid id value (e.g. id <= 0)? Is it because the user is not found? Is it because maybe the user record has been disabled?

    In most cases I have come across, a null return would mean that something erroneous has happened. “User does not exist” is an error!

    All is fine if your team is disciplined and religiously handles null across the entire application, but the chances of that are very slim: it is difficult to handle null consistently when built-in value types (such as int or double) can never be null, so developers get used to not checking.

    Uncaught null exceptions (a NullReferenceException in .NET) are terrible to debug, not only because they don’t tell you much, but also because they often require back-tracing through many lines of code to figure out how and why you got a null.

    Uncaught null values are no different from uncaught exceptions, and if you have been writing code long enough you’ll know that null exceptions are one of the most common exceptions you have to debug.

    If the developer had thrown something like UserNotFoundException from the method above, life would be so much easier, even if it went uncaught. Making it a habit to throw an exception as part of input validation or error handling forces you to think about the error scenario and the error message.
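    Applied to GetUserById from earlier, a minimal sketch might look like this. It separates two of the cases listed above, an invalid id and a missing user, into distinct exceptions with descriptive messages. UserNotFoundException and the in-memory UserRepository are hypothetical; a disabled user could get its own exception in the same way.

```csharp
using System;
using System.Collections.Generic;

public record User(int Id, string Name);

// Hypothetical exception type, as suggested in the text.
public class UserNotFoundException : Exception
{
    public int UserId { get; }

    public UserNotFoundException(int userId)
        : base($"User with id {userId} was not found.")
    {
        UserId = userId;
    }
}

// Hypothetical in-memory repository used only to demonstrate the idea.
public class UserRepository
{
    private readonly Dictionary<int, User> _users = new();

    public void Add(User user) => _users.Add(user.Id, user);

    public User GetUserById(int id)
    {
        // Input validation: an invalid id is a caller bug, not a missing user.
        if (id <= 0)
            throw new ArgumentOutOfRangeException(nameof(id), id, "Id must be positive.");

        // A missing user gets its own, self-describing exception instead of null.
        if (!_users.TryGetValue(id, out var user))
            throw new UserNotFoundException(id);

        return user;
    }
}
```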

    Null is bad for health. Null exceptions are like black holes. Null is less than nothing…

    “What do you mean less than nothing? I don’t think there is any such thing as less than nothing. Nothing is absolutely the limit of nothingness. It’s the lowest you can go. It’s the end of the line. How can something be less than nothing? If there were something that was less than nothing, then nothing would not be nothing, it would be something – even though it’s just a very little bit of something. But if nothing is nothing, then nothing has nothing that is less than it is.”

    E.B. White, Charlotte’s Web