Monday, February 26, 2007

Two years of blogging

Today is my two year anniversary as a blogger. I have managed a rate of one post per week on average, which is far better than I expected when I started. The topics have changed over time as my projects have moved from MSCRM+Exchange+Outlook+ActiveDirectory via InfoPath and .NET 2/VS2005 stuff to service-oriented solutions using WCF and other NETFX3 technologies.

The most popular article on my blog has been 'Outlook 2003 .NET add-ins with VSTO 2005' for quite a while, with Sue Mosher's OutlookCode.com as the main referrer. My posts about .NET 2 WinForms databinding have also been very popular, especially those about object binding source problems and how to solve them. Google is the top-dog referrer to my blog, which is nice as it shows that what I write about is useful for the community.

Of my recent posts, the WCF 'Providing exception shielding details using Generics' article has entered the top five list, along with the 'How to expose a WCF service also as an ASMX web-service' post. I suspect that the growing popularity of Flex for RIAs causes some of the traffic for the latter post.

That's all for now; I'll be back soon with more about WCF and stuff from EntLib3 that you've just got to love.

Wednesday, February 21, 2007

Effective use of TableAdapters and DataSets

Some of our customers are not ready to use an O/R mapper as the data access layer, so I have to do the boring job of implementing it manually the old-fashioned way (sigh...). I then use VS2005 TableAdapters to implement the domain object persistence using only the wizards, and thus mostly generated code, for the DAL.

TableAdapters are code-generated extensions to typed datasets; and even if I am no fan of datasets, as long as they stay in the black box, I can live with them. In addition to the auto-generated TableAdapters, I must manually create adapters that map my domain objects to/from the typed datasets to complete the repository component. This is where the effort goes; I have never written fewer SQL statements than since I started using TableAdapters - and as a DBA, I actually like writing and tuning SQL statements.

I have been using TableAdapters since fall 2005, and this post is about effective use of TableAdapters and their DataSets. To illustrate my recommendations, I will use the DataSet shown in this figure (click to enlarge):


The central entity table in the dataset is 'DocumentCard', which is an aggregate root that contains a file-collection table (identifying relationship) and some extra information tables that are needed in the 'DocumentCard' domain object. The document cards are structured into a category treeview, using the 'DocumentCategory' entity. The TableAdapter options are configured to make the DataSets updatable as applicable for each table:


The DataSet tables are connected to each other using Relation objects, and together with the generated CRUD SQL statements, this makes reading and modifying data the typed dataset way quite easy. Applying a filter when reading data, using a combination of a single query and the specification pattern, is not as easy, but I show how this can be done towards the end.

I recommend that the default query of the table adapter be used only as a 'schema definition' query for the dataset, i.e. the query used for defining the table and for generating the CRUD statements that make the typed dataset updatable, so that TableAdapter.Update(DataSet) automagically modifies the database to reflect the changes to the dataset. I give the default query a name like 'FillByNoFilter' (perhaps 'DefineSchemaQuery' would be better), and then add a separate 'Fill' query for use by the code. This way the table schema and the C_UD statements stay constant when the inevitable changes affect the logic of the data access layer.
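As a small sketch of how this plays out in the repository code ('FillByNoFilter' and 'Fill' are the query names from the figure; the key parameter types and method names here are illustrative):

public DocumentCardDataSet LoadDocumentCards(string customerKey, string projectKey)
{
    DocumentCardDataSet ds = new DocumentCardDataSet();
    DocumentCardTableAdapter ta = new DocumentCardTableAdapter();

    //'FillByNoFilter' is only the 'schema definition' query and is never called here;
    //day-to-day reads use the added, parameterized 'Fill' query
    ta.Fill(ds.DocumentCard, customerKey, projectKey);
    return ds;
}

public void SaveDocumentCards(DocumentCardDataSet ds)
{
    DocumentCardTableAdapter ta = new DocumentCardTableAdapter();
    //Update() relies on the C_UD statements generated from the stable default query
    ta.Update(ds.DocumentCard);
}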

Note that changes to the default query are immediately propagated by the table adapter mechanism to all other queries in the TableAdapter to ensure that they all adhere to the same schema. E.g. if you remove a field from the schema, all queries are automatically updated to have the same SELECT as the default query. This has a rather nasty side-effect on a specific type of scalar query, e.g. the 'LookupCardIdFromKey' method from the above example; it will just replace the select statement returning the scalar value with the default select statement returning a record set. This only happens to scalar queries that do not use a SQL function, and I recommend applying the MAX() function to any such query that returns just a field instead of a computed value.

I recommend using "info" as the suffix for read-only datasets that are aggregated into a domain object. In the above example, status info about the physical files is aggregated from the FileHandler file management system; e.g. file upload progress. The info-tables have no generated C_UD SQL statements as they are read-only. I do not recommend aggregating this kind of info-table into the root-table of the entity as "assimilated" fields, as this quickly leads to the TableAdapter not being able to generate the C_UD statements for the dataset. You could of course add the info fields manually to the dataset and set their read-only property to true, but as you add more queries to the table adapter, you will have a hard time making the returned data fit the dataset schema - and not the least, ensuring that the fields are filled with data.

The relations of a dataset are useful for more than just defining the structure of its tables. The way a typed dataset automatically populates the relations when filling the related tables, and then exposes each relation as a collection, makes it possible to keep the number of database queries to a minimum when filling a complex object graph such as the three-level-deep annotated structure shown in the above example.

A typical implementation of filling a treeview with structured master-detail data for a specific customer would be to fetch the nodes and then, for each node, fetch the related data. I call this read approach recursive-chunked. If there are N nodes, there will be N+1 database reads: one for the node table, and recursively one read per node to read the related table in chunks. In my example there are five tables, and with a large N, the number of database reads would explode with such a fill implementation.

I recommend that each table in a relation structure has a query that takes the aggregate root 'read specification' as parameters and returns non-chunked datasets, leaving it to the typed dataset to fill the relations and automatically provide the structure. This way, if there are M tables, there will be only M database reads to fill the complete structure. If there are e.g. two tables, the recursive-chunked approach will use N+1 reads, while the non-chunked approach will always use 1+1 reads. Using the relation mechanism of typed datasets of course incurs some overhead, but as the use of TableAdapters implies use of typed datasets, why not utilize them to the maximum?

In my example the non-chunked implementation would always require 5 database reads. The recursive-chunked approach would hardly ever use fewer reads; e.g. just a single category with two related document cards would result in 1+1*(1+2*(1+1+1)) = 8 reads. If there were four categories with six cards each, there would be 1+4*(1+6*(1+1+1)) = 77 reads. So even if the chunked reads each contain less data, the overhead of more database roundtrips will soon dwarf the five non-chunked roundtrips.
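A sketch of the non-chunked fill could look like this (the table adapter and relation method names are assumed from the table names in the figure; the key parameter types are illustrative):

public DocumentCardDataSet LoadProjectStructure(string customerKey, string projectKey)
{
    DocumentCardDataSet ds = new DocumentCardDataSet();

    //one Fill per table, each taking the aggregate root parameters: M tables => M reads
    new DocumentCategoryTableAdapter().Fill(ds.DocumentCategory, customerKey, projectKey);
    new DocumentCardTableAdapter().Fill(ds.DocumentCard, customerKey, projectKey);
    new DocumentCardFileTableAdapter().Fill(ds.DocumentCardFile, customerKey, projectKey);
    //...one Fill per remaining info-table...

    //navigate the structure via the relations instead of issuing a query per node
    foreach (DocumentCardDataSet.DocumentCategoryRow category in ds.DocumentCategory)
    {
        foreach (DocumentCardDataSet.DocumentCardRow card in category.GetDocumentCardRows())
        {
            //card.GetDocumentCardFileRows(), etc.
        }
    }
    return ds;
}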

The last tip about effective use of table adapters is about how to support the specification pattern while keeping the number of TableAdapter queries to a minimum. Even if the queries are created using wizards and generated code, each extra query is still another item to maintain. My way of doing this is based on my previous recommendation to use the default query as a schema definition and SQL statement provider only, making it the stable information source for the table adapter.

I recommend using the default query as the basis for building a dynamic parameterized SQL statement that filters the record set according to the specification, and dynamically injecting it into the TableAdapter to apply the selection. The custom ORM adapter is the specification interpreter and needs to build a DbCommand object containing a parameterized SELECT statement adhering to the schema of the TableAdapter. Thus, the interpreter needs access to the TableAdapter's private command collection to get the schema definition query. In addition, the adapter needs access to the private DataAdapter to be able to inject and execute the dynamic query to fill the DataSet.

As the needed TableAdapter members are private, you need to utilize the partial class mechanism to create extra internal properties that provide access to the private members. Double-click the design canvas of the dataset class to generate a partial class extending the dataset, then add the properties to the table adapter:

using System.Data.SqlClient;

namespace KjellSJ.DataAccess.DocumentCardDataSetTableAdapters
{
    public partial class DocumentCardTableAdapter
    {
        //expose the private data adapter so a dynamic select command can be injected
        internal SqlDataAdapter DataAdapter
        {
            get { return this._adapter; }
        }

        //the 'schema definition' (default) query is the first command in the collection
        internal string SelectFromStatement()
        {
            return this.CommandCollection[0].CommandText;
        }
    }
}

namespace KjellSJ.DataAccess
{
    partial class DocumentCardDataSet
    {
        //empty class generated by double-clicking the dataset design canvas
    }
}


Adding specification filters that require the use of related database tables in the query complicates the logic of the selection interpreter, especially when the filter is Nullable&lt;T&gt;. A nullable filter should only be applied if it is not null, i.e. the .HasValue property is true. Basically you have two options: joining the related table to the table of the defining query, or using sub-selects. As the interpreter should only add the related table to the query when at least one referring filter is defined in the specification, using a sub-select is usually the simplest thing to do.

This is how you can implement a hard-coded interpreter and apply the parameterized selection to the TableAdapter:

namespace KjellSJ.DataAccess
{
    public class DocumentCardAdapter
    {
        public ProjectDocuments ListProjectDocuments(EApprovalSession session)
        {
            return DoListProjectDocuments(session, null, null);
        }

        public ProjectDocuments ListProjectDocuments(EApprovalSession session, DocumentCardFilter cardFilter)
        {
            DocumentCardTableAdapter ta = new DocumentCardTableAdapter();
            SqlCommand filterCommand = new SqlCommand();
            filterCommand.CommandType = CommandType.Text;

            //start from the stable 'schema definition' query of the table adapter
            string sql = ta.SelectFromStatement();
            sql += " WHERE 1=1";

            //customerKey, projectKey and wildcard are derived from the session and filter (details elided)
            filterCommand.Parameters.AddWithValue("@ClientKey", customerKey);
            filterCommand.Parameters.AddWithValue("@ProjectKey", projectKey);

            //nullable filters are applied only when they have a value, using a sub-select
            if (cardFilter.IsMapped.HasValue)
            {
                sql += (cardFilter.IsMapped.Value) ? " AND EXISTS " : " AND NOT EXISTS";
                sql += " (SELECT * FROM DocumentRequirementMapping . . .)";
            }

            if (String.IsNullOrEmpty(cardFilter.FreeText) == false)
            {
                sql += " AND (Code like @FreeText";
                sql += " OR Title like @FreeText";
                sql += " OR Description like @FreeText";
                sql += ")";

                filterCommand.Parameters.AddWithValue("@FreeText", wildcard);
            }
            . . .

            filterCommand.CommandText = sql;

            return DoListProjectDocuments(session, filterCommand, cardFilter.ExcludeEmptyCategoryNodes);
        }

        private ProjectDocuments DoListProjectDocuments(EApprovalSession session, SqlCommand filterCommand, bool? excludeEmptyNodes)
        {
            DocumentCardTableAdapter taData = new DocumentCardTableAdapter();
            . . .
            if (filterCommand == null)
            {
                taData.Fill(ds.DocumentCard, customerKey, projectKey);
            }
            else
            {
                //inject the dynamic, parameterized query into the private data adapter and fill the dataset
                filterCommand.Connection = taData.Connection;
                taData.DataAdapter.SelectCommand = filterCommand;
                taData.DataAdapter.Fill(ds.DocumentCard);
            }
            . . .
        }
    }
}

Having SQL fragments as text in the code should be avoided, but this is the simplest way to make a table adapter support dynamic criteria. Alternatively, I could have used a stored procedure taking the specification as parameters and implementing all this logic in the database itself, with a 'FillBySpecification' query using the sproc. Alas, I don't recommend using sprocs for queries, and I hardly ever use sprocs for CRUD operations either. Why? Well, read the discussions summarized in this post by Frans Bouma and decide for yourself.

Following these guidelines should help you get the most out of table adapters while writing almost no SQL statements, and still have a maintainable data access layer. My strongest advice is to follow the 'schema definition' query recommendation, as having a stable foundation for the TableAdapter is a must for ensuring that the dataset CRUD operations stay correct as the adapters evolve.

Wednesday, February 07, 2007

WCF Streaming: Upload files over HTTP

We have a web application that requires a robust and reliable document upload mechanism for very large files, and are currently using Microsoft BITS technology to achieve this. With the introduction of WCF in the solution, I set out to implement an upload service prototype using WCF and a HTTP binding.

You have two main options for uploading files in WCF: streamed or buffered/chunked. The latter option is the one that provides reliable data transfer, as you can use wsHttpBinding with WS-ReliableMessaging turned on. Reliable messaging cannot be used with streaming, as the WS-RM mechanism requires processing the data as a unit to apply checksums, etc. Processing a large file this way would require a huge buffer and thus a lot of available memory on both client and server; denial-of-service comes to mind. The workaround for this is chunking: splitting the file into e.g. 64KB fragments and using reliable, ordered messaging to transfer the data.
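A chunked contract could look something like this rough sketch (the names are illustrative, not from our solution); the idea is a sessionful service over wsHttpBinding with reliableSession enabled and ordered delivery, so the fragments arrive in sequence:

[ServiceContract(SessionMode = SessionMode.Required)]
public interface IChunkedFileUploadService
{
    [OperationContract(IsOneWay = true)]
    void StartUpload(string fileName, long fileSize);

    [OperationContract(IsOneWay = true)]
    void UploadChunk(byte[] chunk); //e.g. 64KB fragments, delivered in order by WS-RM

    [OperationContract]
    void EndUpload(); //two-way, so the client knows the file is complete on the server
}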

If you do not need the robustness of reliable messaging, streaming lets you transfer large amounts of data using small message buffers without the overhead of implementing chunking. Streaming over HTTP requires you to use basicHttpBinding; thus you will need SSL to encrypt the transferred data. Buffered transfer, on the other hand, can use wsHttpBinding, which by default provides integrity and confidentiality for your messages; thus there is no need for SSL then.

Read more about 'Large Data and Streaming' at MSDN.

So I implemented my upload prototype following the 'How to: Enable Streaming' guidelines, using a VS2005 web application (Cassini) as the service host. I made my operation an IsOneWay=true void operation with some SOAP headers to provide meta-data:

[OperationContract(Action = "UploadFile", IsOneWay = true)]
void UploadFile(ServiceContracts.FileUploadMessage request);

[MessageContract]
public class FileUploadMessage
{
    [MessageHeader(MustUnderstand = true)]
    public DataContracts.DnvsDnvxSession DnvxSession;
    [MessageHeader(MustUnderstand = true)]
    public DataContracts.EApprovalContext Context;
    [MessageHeader(MustUnderstand = true)]
    public DataContracts.FileMetaData FileMetaData;
    [MessageBodyMember(Order = 1)]
    public System.IO.Stream FileByteStream;
}

The reason for using message headers for meta-data is that WCF requires the stream object to be the only item in the message body of a streamed operation. Headers are the recommended way of sending meta-data when streaming. Note that headers are always sent before the body, thus you can depend on receiving the header data before processing the streamed data.

Note that you should provide a separate endpoint (address, binding, contract) for your streamed services. The main reason for this is that binding configuration such as transferMode="Streamed" applies to all operations in the endpoint. The same goes for maxBufferSize, maxReceivedMessageSize, receiveTimeout, etc.

I use MTOM as the encoding format for the streamed data and set the max file size to 64MB. I also set the buffer size to 64KB even though I read the input stream in 4KB chunks, to avoid receive buffer underruns. My binding config looks like this:

<basicHttpBinding>
<!-- buffer: 64KB; max size: 64MB -->
<binding name="FileTransferServicesBinding"
closeTimeout="00:01:00" openTimeout="00:01:00" 
receiveTimeout="00:10:00" sendTimeout="00:01:00"
transferMode="Streamed" messageEncoding="Mtom"
maxBufferSize="65536" maxReceivedMessageSize="67108864">
<security mode="None">
<transport clientCredentialType="None"/>
</security>
</binding>
</basicHttpBinding>

I have set the transport security to none as I am testing the prototype without using SSL. I have not modified the service behavior; the default authentication (credentials) and authorization behaviors of the binding are used.

Note that the setting for transferMode does not propagate to clients when using a HTTP binding. You must manually edit the client config file to set transferMode = "Streamed" after using 'Add service reference' or running SVCUTIL.EXE. If you forget to do this, the transfer mode will be "Buffered" and you will get an error like this:

System.ServiceModel.CommunicationException: An error occurred while receiving the HTTP response to http://localhost:1508/Host/FileTransferService.svc. This could be due to the service endpoint binding not using the HTTP protocol. This could also be due to an HTTP request context being aborted by the server (possibly due to the service shutting down). See server logs for more details.
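As an alternative to hand-editing the generated config, you could set the streamed transfer mode in code when constructing the proxy; a sketch only, where the proxy class name is whatever 'Add service reference' generated for you and the address is the one from my test above:

BasicHttpBinding binding = new BasicHttpBinding();
binding.TransferMode = TransferMode.Streamed; //the setting that does not propagate to clients
binding.MessageEncoding = WSMessageEncoding.Mtom;
binding.MaxReceivedMessageSize = 67108864; //64MB, matching the service config

EndpointAddress address = new EndpointAddress("http://localhost:1508/Host/FileTransferService.svc");
FileTransferServiceClient proxy = new FileTransferServiceClient(binding, address);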



After ensuring that the client basicHttpBinding config was correct, I ran the unit test once more. Now I got this error from VS2005 Cassini:

System.ServiceModel.ProtocolException: The remote server returned an unexpected response: (400) Bad Request.
System.Net.WebException: The remote server returned an error: (400) Bad Request.



I inspected the HTTP traffic using Fiddler and could see nothing wrong with the MTOM encoded request. Googling led me to a lot of frustrated "streamed" entries at the forums.microsoft.com "Indigo" group, but no solution to my problem. So I made a self-hosted service using a console application, using the code provided in the WCF samples (see below), and it worked like a charm. I expect that it is just VS2005 Cassini that does not support streamed transfers/MTOM encoding.
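A minimal self-hosting sketch of that setup could look like this (the service, contract and address names are illustrative); note the dedicated endpoint for the streamed contract, picking up the 'FileTransferServicesBinding' configuration shown above:

ServiceHost host = new ServiceHost(typeof(FileTransferService),
    new Uri("http://localhost:8080/FileTransferService"));

//separate endpoint for the streamed operations, reusing the streamed binding config
BasicHttpBinding binding = new BasicHttpBinding("FileTransferServicesBinding");
host.AddServiceEndpoint(typeof(IFileTransferService), binding, "");

host.Open();
Console.WriteLine("FileTransferService host running, press ENTER to stop...");
Console.ReadLine();
host.Close();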

I then installed IIS to check whether HTTP streaming works better with IIS. The first error I ran into when uploading my 1KB test file was this:

System.ServiceModel.CommunicationException: The underlying connection was closed: The connection was closed unexpectedly.
System.Net.WebException: The underlying connection was closed: The connection was closed unexpectedly.



Inspecting the traffic with Fiddler, I found the HTTP request to be fine, but no HTTP response. Knowing that this is a sure sign of a server-side exception, I debugged the service operation. The error was caused by the IIS application pool identity (ASPNET worker process) lacking NTFS rights to store the uploaded file on disk. Note that WCF by default will not impersonate the caller, thus the identity used to run your service will need to have sufficient permissions on all resources that your service needs to access. This includes the temp folder, which is used to provide the WSDL file for HTTP GET.

As the UploadFile operation has [OperationContract(IsOneWay = true)], it cannot have a response, nor a fault contract. Thus, there will be no HTTP response when a server-side exception occurs, just the famous "The underlying connection was closed" message.

I assigned the correct NTFS rights to my upload folder and re-ran the unit test: streamed upload to a WCF service hosted by IIS worked. The next unit test used a 43MB file to check if uploading data larger than the WCF buffer size works. The test gave me this error:



System.ServiceModel.CommunicationException: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:00:59.8590000'.System.IO.IOException: Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host.
System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host



I was a bit puzzled, as the file was less than the WCF maxReceivedMessageSize limit of 64MB, and the stream read buffer was only 4KB compared to the WCF maxBufferSize of 64KB. I knew that IIS/ASP.NET has a system-enforced limit on the size of an HTTP request to prevent denial-of-service attacks, and even though this limit seems unrelated to WCF as I do not use the AspNetCompatibilityRequirements service setting, I decided to set the maxRequestLength limit to 64MB as well (default: 4096KB). This time the "stream large file upload" test passed!

So for WCF streaming to IIS to work properly, you need this <system.web> config in addition to the <system.serviceModel> config in the IIS web.config file:


<!-- maxRequestLength (in KB) max size: 2048MB -->
<httpRuntime 
maxRequestLength="65536"
/>

Make the maximum HTTP request size equal to the WCF maximum request size.

I have to recommend the 'Service Trace Viewer' (SvcTraceViewer.EXE) and the 'Service Configuration Editor' (SvcConfigEditor.EXE) provided in the WCF toolbox (download .NET3 SDK). These tools make it really simple to add tracing and logging to your service for debugging and diagnosing errors. Start the WCF config editor, open your WCF config file, select the 'Diagnostics' node and just click "Enable tracing" in the "Tracing" section to turn on tracing. The figure shows some typical settings (click to enlarge):




Save the config file and run the unit test to invoke the service. This will generate the web_tracelog.svclog file to use as input for the trace viewer. Open the trace file in the trace viewer and look for the errors (red text). Click an error to see the exception details. The figure shows some of the details available for the "Maximum request length exceeded" error (click to enlarge):


The WCF tools installed by the SDK are located here:
C:\Program Files\Microsoft SDKs\Windows\v6.0\Bin\

The code to process the incoming stream is rather simple, storing the uploaded files to a single folder on disk and overwriting any existing files:

public void UploadFile(FileUploadMessage request)
{
    FileStream targetStream = null;
    Stream sourceStream = request.FileByteStream;

    string uploadFolder = @"C:\work\temp\";
    string filename = request.FileMetaData.Filename;
    string filePath = Path.Combine(uploadFolder, filename);

    using (targetStream = new FileStream(filePath, FileMode.Create,
                              FileAccess.Write, FileShare.None))
    {
        //read from the input stream in 4KB chunks
        //and save to the output stream
        const int bufferLen = 4096;
        byte[] buffer = new byte[bufferLen];
        int count = 0;
        while ((count = sourceStream.Read(buffer, 0, bufferLen)) > 0)
        {
            targetStream.Write(buffer, 0, count);
        }
        targetStream.Close();
        sourceStream.Close();
    }
}

The 'Stream Sample' available on MSDN contains all the code you need to upload a file as a stream to a self-hosted WCF service and then save it to disk on the server by reading the stream in 4KB chunks. The download contains about 150 solutions with more than 4800 files, so there is a lot of stuff in it. Absolutely worth a look. The streaming sample is located here:
<unzip folder> \TechnologySamples\Basic\Contract\Service\Stream


Please let me know if you have been able to get HTTP streaming with MTOM encoding to work with Microsoft VS2005 Cassini.
Finally, a useful tip for streaming download: How to release streams and files using Dispose() on the message contract.
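The gist of that tip, as I read it, is to implement IDisposable on the streamed message contract so that WCF can release the stream (and thus the underlying file) when it disposes the message after sending the reply; a rough sketch for a download counterpart of the upload contract above (the class name is made up):

[MessageContract]
public class FileDownloadMessage : IDisposable
{
    [MessageBodyMember(Order = 1)]
    public System.IO.Stream FileByteStream;

    //WCF disposes the message contract when the operation completes, closing the stream here
    public void Dispose()
    {
        if (FileByteStream != null)
        {
            FileByteStream.Close();
            FileByteStream = null;
        }
    }
}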

Thursday, February 01, 2007

WCF: Importing InfoPath forms to Data Contracts

The solution that we are implementing includes a WCF service that allows a set of document metadata to be imported into our system. The operation is ImportProjectDocuments( ImportProjectDocumentsMessage request) and the message contains the ProjectDocuments data contract specifying the document structure. The data contract of course adheres to my 'separate structure from data' rule. The import operation is intended for application-to-application layer usage by our e-biz partners, for connecting our systems in a service-oriented distributed architecture.

Then came the need for allowing users to manually enter the document structure in a disconnected manner and submit it to our service later on; i.e. a human-to-application layer service.

Enter InfoPath 2003 as the occasionally connected client (OCC), a perfect match for our existing data contract (see closing note) and a nice data entry application. Note that InfoPath is not used to submit the entered data directly to the WCF service endpoint, just to fill out a form and submit it to our server as an XML file - by e-mail or to a SharePoint form library when connected. You may think of the form library as the human-friendly "message queue".


By the way - having an OCC service consumer is a rather good test of whether your services adhere to SOA best practices such as explicit boundaries, message-based interaction, shared contract, and self-contained business event operations based on the paper form metaphor.


The submitted InfoPath forms must then be processed, typically by a workflow system such as K2.NET or even WWF, monitoring the form library. Each XML form must be taken off the "message queue", deserialized using the DataContractSerializer and passed to our service, invoking the correct operation for the submitted XML data.

This is all quite trivial, but even if the XML generated by InfoPath adheres to the XSD schema, InfoPath adds some processing instructions that the deserializer chokes on:

System.Runtime.Serialization.SerializationException:
There was an error deserializing the object of type DNVS.DNVX.eApproval.DataContracts.ProjectDocuments.
Processing instructions (other than the XML declaration) and DTDs are not supported.

You must remove the InfoPath processing instructions before deserialization, and the easiest way to do this is by using a standard XmlTextReader instead of the WCF XmlDictionaryReader used in the MSDN documentation:

[Test]
public void DeserializeDataContractFromInfoPathXmlFileTest()
{
    string fileName = @".\form1.xml";
    ProjectDocuments documents = null;

    DataContractSerializer xlizer = new DataContractSerializer(typeof(ProjectDocuments));

    FileStream fs = new FileStream(fileName, FileMode.Open);

    //NOTE: must get rid of InfoPath processing instructions
    //XmlDictionaryReader reader = XmlDictionaryReader.CreateTextReader(fs, new XmlDictionaryReaderQuotas());
    XmlTextReader reader = new XmlTextReader(fs);

    documents = (ProjectDocuments)xlizer.ReadObject(reader, true);

    reader.Close();
    fs.Close();

    Assert.IsNotNull(documents, "Could not deserialize the project documents XML");
}


The ReadObject() method is what actually converts the XML into a WCF data contract instance, ready for processing by the service layer just as if it had arrived through the WCF endpoint.

Note that the duration data type defined in the XSD schemas generated by WCF will cause an InfoPath 2003 parse error. You need to remove it before you start to design a new InfoPath form based on the data contract XSD schemas.