Tenant Isolation Strategies for Multi-Tenant Systems
Many businesses and their applications serve multiple customers. Each customer’s data must remain “isolated” in one form or another so that it does not leak to, or become visible to, other customers. Such systems are called “Multi-Tenant”, as each customer is considered a tenant of the system.
There are three main approaches to multi-tenancy, each with its pros and cons.
- Database per tenant. A new database is created for each tenant, while the same schema definition is maintained across all of them.
- Schema per tenant. Only one database exists, but a separate schema is created for each tenant, again with the same schema definition in each.
- Column Discriminator. All the data sits within the same database and schema, and a column in each row identifies the tenant/customer it belongs to.
Database per Tenant
There is no right or wrong answer; it all depends on your requirements and your risk appetite.
The two can overlap. On the one hand, the requirement can come from your customers, or be dictated by your domain: for one reason or another, they prefer their data to be stored separately. This approach also opens a pathway for your product to work with databases self-hosted by the customer, to take copies of a single customer’s database, or to apply individual changes to only one customer. If you have such requirements, I would advise going with the “Database per tenant” approach.
Furthermore, in some domains such as healthcare, it might be a legal requirement or part of an ISO certification that your data is separated to the greatest extent possible. These days, encryption of data is usually a sufficient form of “data separation”; however, as I mentioned, the other parameter is your “Risk Appetite”. Complete isolation of the data in separate databases can provide the simplest, most secure way of making sure that data does not leak from one tenant to another. This still depends, however, on how your application identifies the tenant, which we will discuss later on.
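To make the idea concrete, here is a minimal sketch of routing each tenant to its own database. It uses in-memory SQLite databases purely as a stand-in; the `TenantDatabases` class, the `invoices` table, and the tenant names are assumptions made for illustration, and a real system would hold one connection string per tenant instead.

```python
import sqlite3


class TenantDatabases:
    """Keeps one separate database per tenant (hypothetical helper)."""

    def __init__(self, schema_sql: str) -> None:
        self._schema_sql = schema_sql
        self._connections = {}  # tenant id -> sqlite3.Connection

    def provision(self, tenant_id: str) -> None:
        # Every ":memory:" connection is an independent SQLite database,
        # so each tenant gets the same schema but its own storage.
        if tenant_id in self._connections:
            raise ValueError(f"tenant {tenant_id!r} is already provisioned")
        conn = sqlite3.connect(":memory:")
        conn.executescript(self._schema_sql)
        self._connections[tenant_id] = conn

    def connection_for(self, tenant_id: str) -> sqlite3.Connection:
        # Unknown tenants fail loudly instead of falling back to a default.
        try:
            return self._connections[tenant_id]
        except KeyError:
            raise LookupError(f"unknown tenant {tenant_id!r}") from None


SCHEMA = "CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL NOT NULL);"

databases = TenantDatabases(SCHEMA)
databases.provision("customer1")
databases.provision("customer2")

# A write for customer1 only ever touches customer1's database.
databases.connection_for("customer1").execute(
    "INSERT INTO invoices (amount) VALUES (?)", (100.0,)
)


def invoice_count(tenant_id: str) -> int:
    conn = databases.connection_for(tenant_id)
    return conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
```

Because no query ever spans two connections, a forgotten filter cannot expose another tenant's rows; the remaining risk is picking the wrong connection in the first place.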
Schema per Tenant
This approach is fairly similar to the Database per Tenant approach, and most of the criteria and requirements above should be discussed before deciding.
The main difference is infrastructure. Separate databases usually cost more and introduce more points of failure. You can imagine that maintaining hundreds of databases is a more complex task than maintaining one database with multiple schemas.
Of course, in the current age of cloud providers such as AWS, Azure, and Google, the issue would mostly be pricing rather than maintenance.
Some people mention database limitations; however, at least for the big databases such as Postgres, there is no hard limit on how far this can scale, and the limitation will come from the hardware itself.
I would suggest this approach if you would like to go with Database per Tenant but your budget and resources are limited.
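As a rough sketch of the schema-per-tenant idea, the example below keeps one connection and gives each tenant its own namespace with an identical table definition. SQLite has no `CREATE SCHEMA`, so `ATTACH DATABASE ... AS <name>` is used here as a stand-in for a Postgres schema; the helper names, the `tenant_` prefix, and the `invoices` table are assumptions for illustration.

```python
import re
import sqlite3

# Schema names cannot be bound as query parameters, so tenant identifiers
# are checked against a strict allow-list before being placed in SQL text.
_TENANT_ID = re.compile(r"[a-z0-9_]{1,24}")


def schema_for(tenant_id: str) -> str:
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant identifier: {tenant_id!r}")
    return f"tenant_{tenant_id}"


def create_tenant_schema(conn: sqlite3.Connection, tenant_id: str) -> None:
    schema = schema_for(tenant_id)
    # SQLite stand-in for "CREATE SCHEMA": attach a separate in-memory
    # database under the tenant's schema name.
    conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    # Same table definition in every tenant schema.
    conn.execute(
        f"CREATE TABLE {schema}.invoices ("
        " id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
    )


def invoice_count(conn: sqlite3.Connection, tenant_id: str) -> int:
    schema = schema_for(tenant_id)
    return conn.execute(f"SELECT COUNT(*) FROM {schema}.invoices").fetchone()[0]


# Autocommit mode: SQLite refuses ATTACH inside an open transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
create_tenant_schema(conn, "customer1")
create_tenant_schema(conn, "customer2")
conn.execute(
    f"INSERT INTO {schema_for('customer1')}.invoices (amount) VALUES (?)",
    (100.0,),
)
```

The design choice worth noting is that every query goes through `schema_for`, so the tenant identifier is validated at the single point where it becomes part of the SQL text.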
Column Discriminator
This approach is the simplest one in terms of infrastructure and puts the burden solely on the application itself.
Every table that you wish to make tenant-aware must contain a column that identifies the tenant, usually by holding that tenant’s unique identifier.
Your database of choice can impact this approach: look for a feature called “Row Level Security” (RLS), which greatly improves the implementation. Of course, the Column Discriminator can be implemented just by adding query filters in your application; however, this leaves room for isolation failures caused by simple coding bugs, e.g. someone forgetting to add the filter to a specific query.
By leveraging RLS at the database level, there is higher confidence that the data will remain isolated even with such bugs.
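A minimal sketch of such a table, using SQLite so that it runs anywhere. The `invoices` table, the `tenant_id` column name, and the `invoices_for` helper are assumptions chosen for illustration.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    " id INTEGER PRIMARY KEY,"
    " tenant_id TEXT NOT NULL,"  # the discriminator column
    " amount REAL NOT NULL)"
)
# Almost every query filters on tenant_id, so it is worth indexing.
conn.execute("CREATE INDEX idx_invoices_tenant ON invoices (tenant_id)")
conn.executemany(
    "INSERT INTO invoices (tenant_id, amount) VALUES (?, ?)",
    [("customer1", 100.0), ("customer1", 50.0), ("customer2", 75.0)],
)


def invoices_for(tenant_id: str) -> List[float]:
    """Return invoice amounts for one tenant only."""
    rows = conn.execute(
        "SELECT amount FROM invoices WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    ).fetchall()
    return [amount for (amount,) in rows]
```

All rows live side by side in one table; only the `WHERE tenant_id = ?` clause keeps one tenant's data apart from another's.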
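For Postgres, an RLS setup could look roughly like the statements built below. This is only a sketch: the code assembles SQL strings and does not execute them, and the helper name `rls_statements`, the policy name `tenant_isolation`, and the custom setting `app.tenant_id` are assumptions. It also assumes the tenant column is of type text, since `current_setting` returns text.

```python
from typing import List


def rls_statements(
    table: str,
    tenant_column: str = "tenant_id",
    setting: str = "app.tenant_id",
) -> List[str]:
    """Build Postgres statements that restrict a table to the current tenant."""
    for identifier in (table, tenant_column):
        # Identifiers end up in SQL text, so accept only plain names.
        if not identifier.isidentifier():
            raise ValueError(f"unsafe identifier: {identifier!r}")
    return [
        # Turn row level security on for the table.
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # Apply the policy to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        # Rows are visible only when their tenant column matches the
        # tenant stored in the session setting.
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING ({tenant_column} = current_setting('{setting}'));",
    ]


# Run once per request or transaction before any tenant query. %s is the
# placeholder style used by psycopg, so the driver binds the value; the
# final "true" limits the setting to the current transaction.
SET_TENANT_SQL = "SELECT set_config('app.tenant_id', %s, true);"

statements = rls_statements("invoices")
```

With a policy like this in place, a query that forgets its `WHERE tenant_id = ...` clause is still filtered by the database, which is the extra safety net the text describes.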
Identifying the Tenant in your application
After you have selected the isolation approach of your liking, there is one more thing to consider: “How does my application know which Tenant I need to work with?”. This question is the same for all the approaches.
There are a few approaches here; however, I would say they do not differ much, and your choice will depend on the resources you have and on how professional you wish your application to look.
- Subdomain definition. Let’s assume you are acme.com and have multiple clients. With this approach, you give each client a unique URL based on a subdomain, e.g. customer1.acme.com
In my experience, this approach has one limitation: the customer identifier must be unique, although this should not be a problem.
It is the most professional/enterprise approach you can employ, and even though your API might not use it, it is definitely an upselling point for many services out there.
- URL path definition. Let’s continue our assumption that you are acme.com and have multiple clients. With this approach, you add the client’s unique identifier to the request URL, e.g. https://acme.com/customerIdentifier/
This approach doesn’t differ much from the previous one, except that it requires no configuration of your DNS and proxy settings.
- Cookies/Headers/Request/Token. Let’s continue our assumption that you are acme.com and have multiple clients. With this approach, the client’s identifier does not appear in the request URL or domain at all; instead, you rely on other methods of storing or constantly transmitting that information. I won’t dive into the specifics of each one, as they are fairly similar.
I believe this approach has a few more pitfalls: you need to decide which source is the top authority for setting the client’s identifier, it gives more room for attacks on your system, and it might even be more complicated than the “URL path definition”.
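The three options above can be sketched as small extraction functions. This is a framework-agnostic illustration using only the standard library; the function names and the `X-Tenant-ID` header name are assumptions, while acme.com comes from the running example.

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit

BASE_DOMAIN = "acme.com"       # the example domain used in the text
TENANT_HEADER = "X-Tenant-ID"  # hypothetical header name


def tenant_from_host(host: str) -> Optional[str]:
    """Subdomain definition: customer1.acme.com -> customer1."""
    hostname = host.split(":", 1)[0].lower().rstrip(".")  # drop port, normalise
    suffix = "." + BASE_DOMAIN
    if not hostname.endswith(suffix):
        return None  # the bare domain or a foreign host
    label = hostname[: -len(suffix)]
    # Accept exactly one subdomain label.
    if not label or "." in label:
        return None
    return label


def tenant_from_path(url: str) -> Optional[str]:
    """URL path definition: the first path segment is the tenant."""
    segments = [part for part in urlsplit(url).path.split("/") if part]
    return segments[0] if segments else None


def tenant_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Header definition: read the tenant from a dedicated header."""
    wanted = TENANT_HEADER.lower()  # header names are case-insensitive
    for name, value in headers.items():
        if name.lower() == wanted and value.strip():
            return value.strip()
    return None
```

Whichever source is used, the extracted value comes from the client, so it should still be checked against what the authenticated user is allowed to access before it is trusted.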
In practice, you might end up using more than one approach across your complete application. For example, you might offer the subdomain as a premium feature used only by the front end, while the API communicates via headers or tokens.
The simplest approach in my opinion - especially when you have a software developer-heavy team and lack infra/DevOps - is to use headers and tokens. The front end never shows any information about the tenant, and every call to the API carries the identifier in a header or token.
Personally, I have found that you may even end up using a hierarchy of multiple approaches to ensure fallback or feature compatibility.
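One way such a hierarchy could be sketched is as an ordered list of resolvers, where the first source that yields a tenant wins and any later source that disagrees causes the request to be rejected. The request shape (a flat mapping with the keys `token_tenant`, `header_tenant`, and `host`), the resolver names, and the priority order are all assumptions for illustration.

```python
from typing import Callable, Mapping, Optional, Sequence

Request = Mapping[str, str]  # hypothetical flattened view of a request
Resolver = Callable[[Request], Optional[str]]


def from_token(request: Request) -> Optional[str]:
    # Tenant claim taken from an already verified token (assumed).
    return request.get("token_tenant") or None


def from_header(request: Request) -> Optional[str]:
    return request.get("header_tenant") or None


def from_subdomain(request: Request) -> Optional[str]:
    label, _, rest = request.get("host", "").partition(".")
    return label if label and rest == "acme.com" else None


# Highest authority first; later entries act as fallbacks.
PRIORITY = [from_token, from_header, from_subdomain]


def resolve_tenant(request: Request, resolvers: Sequence[Resolver]) -> str:
    """Return the tenant from the highest-priority source that provides one.

    Lower-priority sources may be absent, but if present they must agree.
    """
    chosen: Optional[str] = None
    for resolver in resolvers:
        candidate = resolver(request)
        if candidate is None:
            continue
        if chosen is None:
            chosen = candidate
        elif candidate != chosen:
            raise PermissionError("conflicting tenant identifiers in request")
    if chosen is None:
        raise LookupError("no tenant identifier found in request")
    return chosen
```

Rejecting conflicts rather than silently preferring one source is a deliberate choice in this sketch: it settles the question of which source is the top authority while making mismatched requests visible.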