Wednesday, February 21, 2007

Effective use of TableAdapters and DataSets

Some of our customers are not ready to use an O/R Mapper as the data access layer, so I have to do the boring job of implementing it manually the old-fashioned way (sigh...). I then use VS2005 TableAdapters to implement the domain object persistence using only the wizards, and thus mostly generated code, for the DAL.

TableAdapters are code-generated extensions to typed datasets; and even if I am no fan of datasets, as long as they stay inside the black box, I can live with them. In addition to the auto-generated TableAdapters, I must manually create adapters that map my domain objects to/from the typed datasets to complete the repository component. This is where the effort goes; I have never written fewer SQL statements than since I started using TableAdapters - and as a DBA, I actually like writing and tuning SQL statements.
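To give an idea of where that effort goes, here is a minimal sketch of the kind of mapping such a custom adapter does. The DocumentCard domain class and its members are assumptions for illustration; the column names are taken from the filter example further down:

    // hypothetical mapping method in the custom adapter; the typed row
    // exposes Is<Column>Null() methods for nullable columns
    private DocumentCard MapToDomain(DocumentCardDataSet.DocumentCardRow row)
    {
        DocumentCard card = new DocumentCard();
        card.Code = row.Code;
        card.Title = row.Title;
        card.Description = row.IsDescriptionNull() ? null : row.Description;
        return card;
    }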

I have been using TableAdapters since fall 2005, and this post is about effective use of TableAdapters and their DataSets. To illustrate my recommendations, I will use the DataSet shown in this figure:


The central entity table in the dataset is 'DocumentCard', an aggregate root that contains a file-collection table (an identifying relationship) and some extra information tables that are needed in the 'DocumentCard' domain object. The document cards are structured into a category treeview using the 'DocumentCategory' entity. The TableAdapter options are configured to make the DataSets updatable as applicable for each table:


The DataSet tables are connected to each other using Relation objects, and together with the generated CRUD SQL statements, this makes reading and modifying data easy the typed dataset way. Applying a filter when reading data, using a combination of a single query and the specification pattern, is not as easy, but I show how this can be done towards the end.

I recommend that the default query of the table adapter is used only as a 'schema definition' query for the dataset, i.e. the query used for defining the table and for generating the CRUD statements that make the typed dataset updatable, so that TableAdapter.Update(DataSet) automagically modifies the database to reflect the changes to the dataset. I give the default query a name like 'FillByNoFilter' (perhaps 'DefineSchemaQuery' would be better), and then add a separate 'Fill' query for use by the code. This way the table schema and the C_UD statements stay constant when the inevitable changes affect the logic of the data access layer.
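At run time only the added queries are ever executed. A minimal sketch of the resulting usage, assuming the added 'Fill' query takes the customer and project keys as parameters (as in the filter example further down), where customerKey and projectKey stand in for values resolved elsewhere:

    DocumentCardDataSet ds = new DocumentCardDataSet();
    DocumentCardTableAdapter ta = new DocumentCardTableAdapter();

    // 'FillByNoFilter' is never called; it only defines the table schema
    // and drives the C_UD statement generation
    ta.Fill(ds.DocumentCard, customerKey, projectKey);

    // the generated C_UD statements stay in sync with the schema query
    ta.Update(ds.DocumentCard);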

Note that changes to the default query are immediately propagated by the table adapter mechanism to all other queries in the TableAdapter to ensure that they all adhere to the same schema. E.g. if you remove a field from the schema, all queries are automatically updated to have the same SELECT as the default query. This has a rather nasty side-effect on a specific type of scalar query, e.g. the 'LookupCardIdFromKey' method from the above example; the propagation will just replace the select statement returning the scalar value with the default select statement returning a record set. This only happens to scalar queries that do not use a SQL function, so I recommend applying the MAX() function to any such query that returns just a field instead of a computed value; e.g. use SELECT MAX(CardId) FROM ... rather than SELECT CardId FROM ... to keep the query scalar.

I recommend using "info" as the suffix for read-only datasets that are aggregated into a domain object. In the above example, status info about the physical files is aggregated from the FileHandler file management system; e.g. file upload progress. The info-tables have no generated C_UD SQL statements as they are read-only. I do not recommend aggregating this kind of info-table into the root-table of the entity as "assimilated" fields, as this quickly leads to the TableAdapter not being able to generate the C_UD statements for the dataset. You could of course add the info fields manually to the dataset and set their read-only property to true, but as you add more queries to the table adapter, you will have a hard time making the returned data fit the dataset schema - and not the least, ensuring that the fields are filled with data.

The relations of the dataset are useful for more than just defining the structure of the tables making up the dataset. The way a typed dataset automatically populates the relations when filling the related tables, and then exposes a relation as a collection, makes it possible to keep the number of database queries to a minimum when filling a complex object graph such as the three-level-deep annotated structure shown in the above example.

A typical implementation of filling a treeview with structured master-details data for a specific customer would be to fetch the nodes and then, for each node, fetch the related data. I call this read approach recursive-chunked. If there are N nodes, there will be N+1 database reads: one for the node table, and recursively one read per node to read the related table in chunks. In my example there are five tables, and with a large N, the number of database reads would explode using such a fill implementation.

I recommend that each table in a relation structure has a query that takes the aggregate root 'read specification' as parameters and returns non-chunked record sets, leaving it to the typed dataset to fill the relations and automatically provide the structure. This way, if there are M tables, there will be only M database reads to fill the complete structure. If there are e.g. two tables, the recursive-chunked approach will use N+1 reads, while the non-chunked approach will always use 1+1 reads. Using the relation mechanism of typed datasets of course incurs some overhead, but as the use of TableAdapters implies use of typed datasets, why not utilize them to the maximum?

In my example the non-chunked implementation will always require 5 database reads. The recursive-chunked approach would hardly ever use fewer reads; e.g. just a single category with two related document cards would result in 1+1*(1+2*(1+1+1)) = 8 reads. If there were four categories with six cards each, there would be 1+4*(1+6*(1+1+1)) = 77 reads. So even if the chunked reads each contain less data, the overhead of the extra database roundtrips will soon dwarf the five non-chunked roundtrips.
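To make the non-chunked approach concrete, here is a minimal sketch of such a fill. The 'DocumentCategoryTableAdapter' name, the typed GetDocumentCardRows() child accessor, and the customerKey/projectKey parameters are assumptions based on the example dataset and the typed dataset naming conventions:

    DocumentCardDataSet ds = new DocumentCardDataSet();

    // one non-chunked read per table, all taking the same root parameters
    new DocumentCategoryTableAdapter().Fill(ds.DocumentCategory, customerKey, projectKey);
    new DocumentCardTableAdapter().Fill(ds.DocumentCard, customerKey, projectKey);
    // . . . fill the file-collection and info tables the same way

    // the typed dataset has now wired up the relations; traversing the
    // structure causes no further database reads
    foreach (DocumentCardDataSet.DocumentCategoryRow category in ds.DocumentCategory)
    {
        foreach (DocumentCardDataSet.DocumentCardRow card in category.GetDocumentCardRows())
        {
            // build the treeview node for the card
        }
    }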

The last tip about effective use of table adapters is about how to support the specification pattern while keeping the number of TableAdapter queries to a minimum. Even if the queries are created using wizards and generated code, each extra query is still another item to maintain. My way of doing this is based on my previous recommendation to use the default query as a schema definition and SQL statement provider only, making it the stable information source for the table adapter.

I recommend using the default query as the basis for building a dynamic parameterized SQL statement that filters the record set according to the specification, and dynamically injecting it into the TableAdapter to apply the selection. The custom ORM adapter is the specification interpreter and needs to build a DbCommand object containing a parameterized SELECT statement adhering to the schema of the TableAdapter. Thus, the interpreter needs access to the TableAdapter's private command collection to get the schema definition query. In addition, the adapter needs access to the private DataAdapter to be able to inject and execute the dynamic query to fill the DataSet.

As the needed TableAdapter members are private, you need to utilize the partial class mechanism to create extra internal members that provide access to the private ones. Double-click the design canvas of the dataset class to generate a partial class extending the dataset, then add the members to the table adapter:

using System.Data.SqlClient;

namespace KjellSJ.DataAccess.DocumentCardDataSetTableAdapters
{
    public partial class DocumentCardTableAdapter
    {
        // expose the private data adapter so that a dynamic select
        // command can be injected and executed
        internal SqlDataAdapter DataAdapter
        {
            get { return this._adapter; }
        }

        // the CommandText of the default query, i.e. the schema definition query
        internal string SelectFromStatement()
        {
            return this.CommandCollection[0].CommandText;
        }
    }
}

namespace KjellSJ.DataAccess
{
    partial class DocumentCardDataSet
    {
        //empty class generated by double-clicking the dataset design canvas
    }
}


Adding specification filters that require the use of related database tables in the query complicates the logic of the selection interpreter, especially when the filter is Nullable&lt;T&gt;. A nullable filter should only be applied if it is not null, i.e. its .HasValue property is true. Basically you have two options: joining the related table to the table of the defining query, or using sub-selects. As the interpreter should only add the related table to the query when at least one referring filter is defined in the specification, using a sub-select is usually the simplest thing to do.

This is how you can implement a hard-coded interpreter and apply the parameterized selection to the TableAdapter:

using System;
using System.Data;
using System.Data.SqlClient;
using KjellSJ.DataAccess.DocumentCardDataSetTableAdapters;

namespace KjellSJ.DataAccess
{
    public class DocumentCardAdapter
    {
        public ProjectDocuments ListProjectDocuments(EApprovalSession session)
        {
            return DoListProjectDocuments(session, null, null);
        }

        public ProjectDocuments ListProjectDocuments(EApprovalSession session, DocumentCardFilter cardFilter)
        {
            DocumentCardTableAdapter ta = new DocumentCardTableAdapter();
            SqlCommand filterCommand = new SqlCommand();
            filterCommand.CommandType = CommandType.Text;

            // start from the schema definition query and append criteria;
            // "WHERE 1=1" makes every criterion a simple " AND ..." append
            string sql = ta.SelectFromStatement();
            sql += " WHERE 1=1";

            // customerKey and projectKey are resolved from the session (elided);
            // the column names in this criterion are assumed for illustration
            sql += " AND ClientKey = @ClientKey AND ProjectKey = @ProjectKey";
            filterCommand.Parameters.AddWithValue("@ClientKey", customerKey);
            filterCommand.Parameters.AddWithValue("@ProjectKey", projectKey);

            // nullable filters are applied only when .HasValue is true,
            // using a sub-select instead of joining the related table
            if (cardFilter.IsMapped.HasValue)
            {
                sql += (cardFilter.IsMapped.Value) ? " AND EXISTS" : " AND NOT EXISTS";
                sql += " (SELECT * FROM DocumentRequirementMapping . . .)";
            }

            if (!String.IsNullOrEmpty(cardFilter.FreeText))
            {
                sql += " AND (Code LIKE @FreeText";
                sql += " OR Title LIKE @FreeText";
                sql += " OR Description LIKE @FreeText";
                sql += ")";

                // wrap the free text in SQL wildcards (assumed convention)
                string wildcard = "%" + cardFilter.FreeText + "%";
                filterCommand.Parameters.AddWithValue("@FreeText", wildcard);
            }
            // . . . more filter criteria

            filterCommand.CommandText = sql;

            return DoListProjectDocuments(session, filterCommand, cardFilter.ExcludeEmptyCategoryNodes);
        }

        private ProjectDocuments DoListProjectDocuments(EApprovalSession session, SqlCommand filterCommand, bool? excludeEmptyNodes)
        {
            DocumentCardTableAdapter taData = new DocumentCardTableAdapter();
            DocumentCardDataSet ds = new DocumentCardDataSet();
            // . . . resolve customerKey and projectKey, fill the other tables

            if (filterCommand == null)
            {
                // no specification: use the standard generated 'Fill' query
                taData.Fill(ds.DocumentCard, customerKey, projectKey);
            }
            else
            {
                // inject the dynamic query through the partial class members
                filterCommand.Connection = taData.Connection;
                taData.DataAdapter.SelectCommand = filterCommand;
                taData.DataAdapter.Fill(ds.DocumentCard);
            }
            // . . . map the filled dataset to the ProjectDocuments domain object
        }
    }
}

Having SQL fragments as text in the code should be avoided, but this is the simplest way to make a table adapter support dynamic criteria. Alternatively, I could have used a stored procedure taking the specification as parameters and implementing all this logic in the database itself, with a 'FillBySpecification' query using the sproc. Alas, I don't recommend using sprocs for queries, and I hardly ever use sprocs for CRUD operations either. Why? Well, read the discussions summarized in this post by Frans Bouma and decide for yourself.

Following these guidelines should help you get the most out of table adapters while writing almost no SQL statements, and still have a maintainable data access layer. My strongest advice is to follow the 'schema definition' query recommendation, as having a stable foundation for the TableAdapter is a must for ensuring that the dataset CRUD operations stay correct as the adapters evolve.

2 comments:

Anonymous said...

Nice article. Is there any way to have a TableAdapter collection and do the following in a base class:


foreach (TableAdapter oTa in TableAdapter.Collection)
{
    oTa.Update(dataset);
}

Kjell-Sverre Jerijærvi said...

Use a generics collection such as List&lt;TableAdapter&gt; to store them with full type fidelity.

Alternatively, cast them back like this:

foreach (Component tableAdapt in TableAdapterCollection)
{
    TableAdapter adapter = tableAdapt as TableAdapter;
    adapter.Update(...);
}