This article shows a simple way to give applications with multi-tenant architectures the ability to add custom fields while keeping the performance, type safety and reporting capabilities of the relational model and SQL language.
SaaS Takes Over the ASP Model
In the late 90's application service providers started leveraging the availability of abundant bandwidth and the popularity of the web to deliver applications in an ASP model. Essentially the application would be hosted on a shared server and managed by the ASP. While this was a step in the right direction, there were still significant challenges with this approach. These models evolved into what is now knows as Software as a Service or SaaS. SaaS architectures introduced multi-tenancy which allowed pools of high powered resources to be shared in a different way. Rather than customers having their own web server and database, a shared infrastructure served dozens or sometimes thousands of customers.
The Customization Requirement
Businesses generally need some level of customization for all their software. The most common need is to store information that is specific to that business, but may not be something every single customer needs to have. For instance, we are a salesforce.com customer. For all our leads, we track the download date so when know when a lead has downloaded a trial of our self-service reporting product for evaluation purposes. Since most salesforce.com customers do not have downloadable trials, it would not make sense for them add this just for us. If you did that in a multi-tenant environment, you'd quickly end up with a data architecture that is unwieldy from a maintenance perspective. Instead, salesforce let's us add custom fields that are relevant to us. They've done a great job of engineering the system so the custom fields work pretty much like a stock field that the system ships with. If you are working on a SaaS application, you will likely run into the same scenario.
Inevitably, customers needed the ability to extend the applications. Since there was a shared code base and schema, vendors needed to find a way to deliver this functionality without changing the database schema which was shared across all customers. Two main approaches became popular.
Two Popular Methods
| Method | Description | Advantages | Disadvantages |
| Entity Attribute Value(EAV) | This method uses name-value pairs | Single universal schema across all tenants. | Eliminates all advantages of using a relational database. Special Code must be written to access the data. Significant performance and scalability problems with large data sets or objects with a large number of fields. Reporting becomes nearly impossible |
| Custom Joined Table | This method adds a secondary table per tenant and which will be joined. | Relatively high performance Retains the benefits of SQL technology | Eliminates all advantages of using a relational database. Special Code must be written to access the data. Significant performance and scalability problems with large data sets or objects with a large number of fields. Reporting becomes nearly impossible |
There Is A Better Way One way to retain all the benefits of the relational model without adding lots of customer-specific tables is to use a single, generic table to store the data and a second table to map it to customer specific fields. Since databases optimize types differently, we will create a set of columns for each time. Note that under this approach we do have a limit to the number of maximum fields there are. For extrely large systems, you may want to break each type up into a separate table.
Create script for custom fields table.
CREATE TABLE [dbo].[CustomFields](
[TenantID] [int],
[TableName] [varchar](50),
[EntityID] [int],
[Int1] [int] NULL,
[Int2] [int] NULL,
[Int3] [int] NULL,
[String1] [varchar](50) NULL,
[String2] [varchar](50) NULL,
[String3] [varchar](50) NULL,
[DateTime1] [datetime] NULL,
[DateTime2] [datetime] NULL,
[DateTime3] [datetime] NULL
) ON [PRIMARY]
This table will store the actual data. The TenantID field locks the data to that particular account or customer in a multi-tenant system with a shared database. TableName represents the name of the table where the entity is stored. The EntityID is the primary key of whatever object this links to. The rest of the fields store actual data based on the type of the field. So if we just created a Lead in our CRM system and needed to record a download date as a custom field our insert would look something like this.
INSERT INTO CustomFields(
TenantID,
TableName,
EntityID,
DateTime1)
VALUES (1,'Leads',1,'08/18/2010')
To retrieve the data we would do a LEFT JOIN. This lets us retrieve any custom fields.
SELECT * FROM Leads
LEFT JOIN CustomFields ON
Leads.TenantID=CustomFields.TenantID
AND TableName='Leads' AND EntityID=LeadID
Next we need the ability to map or alias the name into a more meaningful label. We would do this in the user interface or reporting tool dynamically. To do so we'd need to store the "metadata" whenever a custom field is created. Here's what our tracking table looks like.
CREATE TABLE [dbo].[CustomFieldNames](
[TenantID] [int] NULL,
[TableName] [varchar](50) NULL,
[FieldIndex] [int] NULL,
[Type] [varchar](50) NULL,
[Label] [varchar](50) NULL
) ON [PRIMARY]
To setup our "Download Date" field, we would use an insert like this the first time the field is setup.
INSERT INTO CustomFieldNames(
TenantID,
TableName,
FieldIndex,
[Type],
Label)
VALUES (1,'Leads',1,'datetime','Download Date')
Essentially this would map to DateTime1 and the application UI or reporting tool would alias the field name dynamically based on which tenant the user belonged to.
A more optimized version of the code
For the sake of simplicity, all the TSQL code listed in this article show the basic usage/syntax. For large performance critical applications, you might want to use a more advanced version of the samples code given in this article.
Under default transaction Isolation level (READ COMMITTED) SQL Server will apply a SHARED lock on a row when it tries to read information. If another process has locked the row exclusively for an update/delete operation, the read attemp will block. This type of blocks can happen frequently in a highly transactional OLTP system. This will affect the performance of the SELECT queries that are running for reporting purpose considerably.
A work around is to ask SQL Server, NOT to apply a lock while reading the information. This can be done by specifying NOLOCK in your query. Here is an example that shows a query presented in this article.
SELECT
*
FROM Leads l
LEFT JOIN CustomFields c ON
l.TenantID = c.TenantID
AND TableName = 'Leads'
AND EntityID = LeadID
Here is an enhanced version of the query that uses NOLOCK to tell SQL Server not to apply SHARED locks on the rows being read.
SELECT
*
FROM Leads l WITH(NOLOCK)
LEFT JOIN CustomFields c WITH(NOLOCK) ON l.TenantID = c.TenantID
AND TableName = 'Leads'
AND EntityID = LeadID
You might find this version of the query considerably faster than the version that does not use NOLOCK if the database is highly transactional. On an idle database, you may not see much difference.
Awesome! Isn't it? So what next? Should we add NOLOCK on every query? Wait a second! It is not a good idea to BLINDLY put NOLOCK on all the queries. Using NOLOCK can result in "dirty-reads".
So what are "dirty-reads"? Well this is a term used to describe a scenario where a SELECT query returns a value that is "dirty" or "incorrect". It can happen in a scenario given in the following example:
- Process A starts a transaction
- Current Balance of Customer A is 100$
- Process A adds a deposit of 100$ and updates customer balance to 200$
- Process B runs a query that checks the customer balance with NOLOCK
- Process B gets 200$
- Process B does a withdrawal of 200$. Customer balance changes to 0$
- Process A executes a transaction ROLLBACK
- SQL Server discards the deposit entry and the customer balance falls back to -100$
Well, this is a serious business transaction and a dirty read is not acceptable. However, when it comes to reporting, most applications - or most part of an application can afford to have a dirty read. When the information is not very sensitive/critical, a dirty read is acceptable.
So, what is the option when the data is sensitive and a dirty read is not acceptable? SQL Server 2005 introduced a new transaction isolation level SNAPSHOT ISOLATION. SNAPSHOT isolation uses row-versioning where your SELECT query will get the COMMITTED value prior to the start of the transaction. It does not apply SHARED locks on the rows being read. So the end result is that you get better performance without fear of a dirty read.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
SELECT
*
FROM Leads l
LEFT JOIN CustomFields c ON l.TenantID = c.TenantID
AND TableName = 'Leads'
AND EntityID = LeadID
Under SNAPSHOT isolation level, the example we saw earlier may be presented as follows:
- Process A starts a transaction
- Current Balance of Customer A is 100$
- Process A adds a deposit of 100$ and updates customer balance to 200$
- Process B runs a query that checks the customer balance with SNAPSHOT
ISOLATION
- Process B gets 100$
- Process B does a withdrawal of 100$. Note that it cannot withdraw 200$
because the other transaction has not committed yet.
- Process A executes a transaction ROLLBACK
- SQL Server discards the deposit entry and the customer balance falls back
to 0$ - because Process B has withdrawn the 100$.
Conclusion
We now have the entire data architecture capable of storing custom fields for our multi-tenant application. Each customer can have their own set of custom fields. Best of all, we keep all the benefits of the relational model and can plug this directly into a reporting tool like Izenda. The only overhead is a single join and adding an index will remove most of that. For producton ready versions of the examples check out the source.
About the author
|
|
Sanjay Bhatia is a first generation American, inventor, entrepreneur and pioneer in decision science and interactive data visualization. Sanjay is founder of Izenda, the market leader in self-service reporting for Microsoft's ASP.NET platform. The company has been highlighted by Microsoft for its refreshingly simple approach to data analytics, is an ATDC Graduate and has been named one of the most innovative companies in georgia by TAG. TechAmerica nominated Izenda "Cool Technology" of the year. Sanjay has also won TIE entrepreneur of the year award as well as The Apex society's Power 30 under 30.
|
|