Is Repository and Unit of Work (UoW) an anti-pattern? In the .NET/C# world, it is often said that Repository/UoW is an anti-pattern and one should probably use Entity Framework Core directly.
But if it is an anti-pattern, why are people still using it? Even Mosh Hamedani – a respected YouTube coding trainer whom I follow – wrote about common mistakes when applying this pattern. Surely it must be popular.
Purpose of the Repository/UoW Pattern
Let’s talk about why this design pattern exists. It is arguably a kind of Adapter pattern, and its primary purpose is to separate the domain (or “business logic”) from the infrastructure (or “data access”).
The other motivation for using this pattern is to allow mocking repositories for unit testing.
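As a minimal sketch of that second motivation, a hand-written fake can stand in for the database during a unit test. The names `User`, `IUserRepository`, `GreetingService`, and `FakeUserRepository` are hypothetical and exist only for illustration; they are not taken from any library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical domain type, for illustration only.
public sealed class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

// The domain depends on this interface, never on the ORM or the database.
public interface IUserRepository
{
    User GetById(int id);
}

// Hypothetical piece of business logic under test.
public sealed class GreetingService
{
    private readonly IUserRepository _users;

    public GreetingService(IUserRepository users)
    {
        _users = users ?? throw new ArgumentNullException(nameof(users));
    }

    public string Greet(int userId)
    {
        return "Hello, " + _users.GetById(userId).Name + "!";
    }
}

// Hand-written fake used only by tests: a dictionary instead of a database.
public sealed class FakeUserRepository : IUserRepository
{
    private readonly Dictionary<int, User> _store = new Dictionary<int, User>();

    public void Seed(User user)
    {
        _store[user.Id] = user;
    }

    // Throws KeyNotFoundException when the id was never seeded.
    public User GetById(int id)
    {
        return _store[id];
    }
}
```

A test would seed the fake with a user, pass it to `new GreetingService(fake)`, and assert on the string that `Greet` returns, all without touching a real data store.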
Must Repository go together with Unit of Work?
It is also very common to see the Repository pattern used together with the Unit of Work (UoW) pattern. The UoW pattern is added to manage transactions – a feature of relational databases to perform ACID operations.
However, there is increasing tolerance for eventual consistency and less atomicity; fully ACID operations are starting to matter less these days as software adopts the microservice architecture.
So unless you are a bank, or dealing with critical financial data, building your software around transaction management – a feature that is specific to RDBMSes – might not be the kind of dependency you want.
Why UoW is sometimes an anti-pattern
I’ll start by saying that everything depends on your use case, but for me – I tend to feel that Unit of Work (but not repository) is an anti-pattern.
Let me explain why.
You see, the repository interface can be described in its simplest form as an interface for read/write operations.
// What a typical repository looks like
public interface IMyRepository
{
    MyObject GetById(int id); // Read
    IEnumerable<MyObject> SearchByName(string name); // Read
    void Update(MyObject obj); // Write ("object" is a reserved keyword in C#)
    void DeleteById(int id); // Write
}
I’ll use an analogy here: In good old C, the standard input/output header defines basic read/write operations. The interface <stdio.h> is an abstraction of the underlying data access implementation and is a close example of the repository pattern – the interface doesn’t care if you are writing to the console, to a filesystem, or to a serial port – just like a repository interface shouldn’t care if it was an RDBMS, or a CSV file, or even a remote API call.
Does <stdio.h> expose specific features of a particular filesystem or a serial port? No. So why should a specific feature of an RDBMS (transaction management) be depended upon outside of a repository interface?
userRepository.Update(user);
unitOfWork.Save(); // If this was not done, the record will not be updated!
Using the C example from earlier: Imagine having to always call fsync() (from <unistd.h>) after calling fwrite() in order to commit changes – does it make sense? Sure – you may call fsync() if you wanted to force your changes to disk immediately, but you do not and should not have to explicitly call it.
(In .NET there is TransactionScope, but that is a topic for another day.)
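Continuing the fsync() analogy, one way to keep Save() out of calling code is a thin wrapper that commits on the caller's behalf. This is only a sketch under assumptions: `SelfCommittingUserRepository`, `IUnitOfWork`, and the two recording stand-ins are hypothetical names, and a real implementation would wrap whatever persistence session the application already uses.

```csharp
using System;

// Hypothetical domain type, for illustration only.
public sealed class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public interface IUserRepository
{
    void Update(User user);
}

// The UoW-style session that calling code should no longer need to see.
public interface IUnitOfWork
{
    void Save();
}

// Decorator: every write is committed before the method returns,
// so callers cannot forget the Save() step.
public sealed class SelfCommittingUserRepository : IUserRepository
{
    private readonly IUserRepository _inner;
    private readonly IUnitOfWork _unitOfWork;

    public SelfCommittingUserRepository(IUserRepository inner, IUnitOfWork unitOfWork)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _unitOfWork = unitOfWork ?? throw new ArgumentNullException(nameof(unitOfWork));
    }

    public void Update(User user)
    {
        _inner.Update(user);
        _unitOfWork.Save();
    }
}

// Trivial stand-ins so the sketch can be exercised without a database.
public sealed class RecordingUserRepository : IUserRepository
{
    public int UpdateCount { get; private set; }
    public void Update(User user) { UpdateCount++; }
}

public sealed class RecordingUnitOfWork : IUnitOfWork
{
    public int SaveCount { get; private set; }
    public void Save() { SaveCount++; }
}
```

With this wrapper in place, calling code reduces to `userRepository.Update(user);` and the commit becomes an implementation detail of the repository.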
How much to depend on the RDBMS?
RDBMSes are great tools with important features, but as an application architect, I often ask myself if it really matters to the application I am building.
Sure, almost 99% of the time the RDBMS is unlikely to be switched for anything else. Heck, in some applications it may even be the same version of MySQL for the rest of the application’s life – that’s probably because many applications were designed with a database-first approach (and I have another blog article coming up about why a database-first approach should probably be avoided.)
So you may think, YAGNI – let’s not over-design the application. Maybe in such a case, you may be better off without the Repository/UoW pattern entirely, but… this is 2021, and you are unit testing your code, rightttt?
Unit tests and mocks
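If you have ever attempted to mock an ORM framework, you’ll know it is close to impossible. Sure, EF Core has an In-Memory Provider that can be used for testing, but that has a lot of caveats: for example, it is not a relational database, so it does not enforce referential integrity or support transactions.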
As a result, it would be easier to apply the repository pattern instead of attempting to mock the ORM framework.
Mocking the test, or testing the mock?
Ever had unit tests pass but integration tests fail because of database constraints? Ever tried testing code that relied on a transaction rollback? Mocking the behavior of an RDBMS is extremely difficult, and we should not spend half our lives imitating one, or we end up testing our mocks instead of our code.
// Typical use of UoW
try
{
    userRepository.Update(user);
    unitOfWork.Save(); // This commits the transaction
}
catch (Exception ex)
{
    unitOfWork.Rollback();
}
Example: How would you mock and test the rollback?
Instead, we should implement repositories as if we were writing data to regular storage – think of it as writing to a CSV file, memory, Redis cache, or something else.
Do CSV files have transactions? No.
Then, maybe we should not use Unit of Work:
try
{
    userRepository.Update(user);
}
catch (Exception ex)
{
    // Log an error and fix it manually, retry the operation, place
    // the update in a queue to be processed later if you really HAVE
    // to make this update, otherwise just throw the exception!
}
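The "retry the operation" option from the comment above could be sketched as a small helper. `Retry.Run` and its parameters are hypothetical names, not part of .NET, and this version makes simplifying assumptions: it retries on any exception and does not wait between attempts.

```csharp
using System;

public static class Retry
{
    // Runs the action, making at most maxAttempts attempts in total.
    // If the last attempt also fails, its exception propagates to the caller.
    public static void Run(Action action, int maxAttempts)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                action();
                return; // Success: stop retrying.
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Swallow the failure and loop again; the filter lets the
                // final failure escape without being caught here.
            }
        }
    }
}
```

Usage would look like `Retry.Run(() => userRepository.Update(user), 3);`. Whether retrying is safe depends on the update being idempotent, which is a property of the application rather than of this helper.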
The generics phenomenon
The typical Repository/UoW pattern is to make each repository represent a single domain entity, e.g.
public interface IUserRepository
{
    User GetById(int id);
    void Create(User user);
    void Update(User user);
    void Delete(int id);
}
public interface IGroupRepository
{
    Group GetById(int id);
    void Create(Group group);
    void Update(Group group);
    void Delete(int id);
}
And as a result, it is common to have these further reduced to a generic interface to reduce the repetition on CRUD methods, e.g.
public interface IRepository<TEntity>
{
    TEntity GetById(int id);
    void Create(TEntity entity);
    void Update(TEntity entity);
    void Delete(int id);
}
public interface IUserRepository : IRepository<User> { ... }
public interface IGroupRepository : IRepository<Group> { ... }
However, quite a large number of applications I have written do not use the full CRUD operations on every single repository. Some tables are read-only, some never require an update, some never get deleted.
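One possible way to avoid exposing operations that a table never needs is to split the generic interface by capability and compose only what each repository uses. This is a sketch, not an established convention: all of the interface and class names below (`IReadRepository`, `ICountryRepository`, `IAuditLogRepository`, and so on) are hypothetical.

```csharp
using System.Collections.Generic;

// One small interface per capability instead of one full-CRUD interface.
public interface IReadRepository<TEntity> { TEntity GetById(int id); }
public interface ICreateRepository<TEntity> { void Create(TEntity entity); }
public interface IUpdateRepository<TEntity> { void Update(TEntity entity); }
public interface IDeleteRepository { void Delete(int id); }

// Hypothetical entities, for illustration only.
public sealed class Country
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public sealed class AuditEntry
{
    public int Id { get; set; }
    public string Message { get; set; } = "";
}

// A read-only lookup table exposes reads and nothing else.
public interface ICountryRepository : IReadRepository<Country> { }

// An append-only log can be read and created, but never updated or deleted.
public interface IAuditLogRepository
    : IReadRepository<AuditEntry>, ICreateRepository<AuditEntry> { }

// In-memory implementation so the sketch can be exercised without a database.
public sealed class InMemoryAuditLogRepository : IAuditLogRepository
{
    private readonly Dictionary<int, AuditEntry> _entries = new Dictionary<int, AuditEntry>();

    // Throws KeyNotFoundException when the id does not exist.
    public AuditEntry GetById(int id) => _entries[id];

    // Throws ArgumentException when the id already exists.
    public void Create(AuditEntry entity) => _entries.Add(entity.Id, entity);
}
```

The trade-off is more interfaces in exchange for a compiler that rejects, say, an `Update` call on the audit log.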
The N-N relationship
Next, how do I add a user to a group, or assign a group to a user?
Create yet another repository for the intermediary N-N relationship table and insert a record!
public interface IUserGroupRepository : IRepository<UserGroup> { ... }
// Usage
var userGroup = new UserGroup(userId, groupId);
userGroupRepository.Create(userGroup);
unitOfWork.Save(); // Always remember to save!
This is why implementations of the Repository/UoW often end up with a crazy list of interfaces and classes, and it’s probably not easy for a developer to figure out which repository to use. Is it called UserGroup or GroupUser? @#$%^&
It is also extremely counter-intuitive to have to create a new object just to express a relationship. In regular object-oriented code, it would probably be written like this:
group.AddUser(user);
Why aren’t repositories written more expressively?
Unlike <stdio.h> in C, which I used as an analogy earlier, repository interfaces are not doing byte-level I/O operations – they handle more complex data types, so their methods should be written more expressively.
For example, why not write the User-Group relationship methods in such a manner?
public interface IUserRepository
{
    void AddGroup(int userId, int groupId); // Inserts into UserGroup
    ...
}
public interface IGroupRepository
{
    void AddUser(int groupId, int userId); // Inserts into UserGroup
    ...
}
Even better – if your application will never ever see the need to store Users and Groups in different data stores, why not simply combine them into one repository interface?
public interface IAccountServiceRepository
{
    void CreateUser(User user);
    void CreateGroup(Group group);
    void CreateUserWithNewGroup(User user, Group group); // Can use a transaction
    void AddUserToGroup(int userId, int groupId); // Inserts into UserGroup
    ...
}
(Imagine you were writing to the UNIX /etc/passwd and /etc/group – how would you implement it?)
One may argue that I’ve come full circle and am replicating a UoW, while also violating the single-responsibility principle. Then again, what is the “single responsibility” of a repository? The term is often taken out of the context in which its originator, Robert C. Martin, expressed it: “A class should have only one reason to change”. What external or structural influences may cause the interface above to change?
Lastly, if CreateUserWithNewGroup() required a transaction for an atomic operation, shouldn’t the transaction be managed at the repository rather than by the domain? Should the onus of transaction management be placed on the domain layer or the repository layer? Is handling transactions in the domain logic also violating the single-responsibility principle?
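To make the question concrete, here is a minimal in-memory sketch in which the repository alone guarantees that CreateUserWithNewGroup() is all-or-nothing, so the domain never sees a transaction object. The class `InMemoryAccountServiceRepository`, its `UserCount`, `GroupCount`, and `IsMember` members, and the entity shapes are assumptions made for illustration; a relational implementation would presumably wrap the same steps in a database transaction instead of a lock.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entities, for illustration only.
public sealed class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public sealed class Group
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public sealed class InMemoryAccountServiceRepository
{
    private readonly object _gate = new object();
    private readonly Dictionary<int, User> _users = new Dictionary<int, User>();
    private readonly Dictionary<int, Group> _groups = new Dictionary<int, Group>();
    private readonly HashSet<(int UserId, int GroupId)> _memberships =
        new HashSet<(int UserId, int GroupId)>();

    public int UserCount { get { lock (_gate) { return _users.Count; } } }
    public int GroupCount { get { lock (_gate) { return _groups.Count; } } }

    public bool IsMember(int userId, int groupId)
    {
        lock (_gate) { return _memberships.Contains((userId, groupId)); }
    }

    public void CreateGroup(Group group)
    {
        lock (_gate)
        {
            if (_groups.ContainsKey(group.Id))
                throw new InvalidOperationException("Group already exists.");
            _groups.Add(group.Id, group);
        }
    }

    // All-or-nothing: every check runs before any state is changed, so a
    // failure leaves the store exactly as it was. The caller sees one method
    // call and no transaction or Save() step.
    public void CreateUserWithNewGroup(User user, Group group)
    {
        lock (_gate)
        {
            if (_users.ContainsKey(user.Id))
                throw new InvalidOperationException("User already exists.");
            if (_groups.ContainsKey(group.Id))
                throw new InvalidOperationException("Group already exists.");

            _groups.Add(group.Id, group);
            _users.Add(user.Id, user);
            _memberships.Add((user.Id, group.Id));
        }
    }
}
```

If the group id is already taken, the call throws and no user is created; the domain code only needs to decide what to do with the exception.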
Conclusion
Once again, if your software or company really, really, really depends for its life on having consistent data in an RDBMS, then by all means continue to use the Repository/UoW pattern (or simply use the ORM directly, since it is a hard dependency).
But for a large majority of cases, YAGNI. A single repository interface alone is probably good enough.
This blog article was more of me thinking out loud than trying to encourage or influence a change in how people implement Repository/UoW, and comments are most certainly welcome.