ObjectSharp Blogs


DataSet FAQ

The Place to Go for DataSets


DataSet Serialization: Smaller & Faster

DataSets serialize to XML quite naturally, and you have lots of control over the result. Typed DataSets carry an XSD with properties that shape that XML (and of course you can do the same thing programmatically). But one of the common problems with remoting DataSets is that the default "binary" serialization is actually just the XML representation written out as text. Crude. Some people have even taken this as evidence that the DataSet is XML internally, which isn't true.
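To see why shipping XML text is so costly, here is a rough size comparison. It is a Python illustration only, not the DataSet's actual wire format. The table contents and the two helper functions are hypothetical. One encodes rows as XML elements; the other packs the same values with fixed binary layouts.

```python
import struct
from xml.sax.saxutils import escape

# Hypothetical table: 1,000 rows of (id: int, price: float, name: str).
rows = [(i, i * 1.25, f"item{i}") for i in range(1000)]

def to_xml_text(rows):
    """Serialize rows as XML text, roughly in the shape of DataSet XML."""
    parts = ["<NewDataSet>"]
    for row_id, price, name in rows:
        parts.append(
            f"<Table><Id>{row_id}</Id><Price>{price}</Price>"
            f"<Name>{escape(name)}</Name></Table>"
        )
    parts.append("</NewDataSet>")
    return "".join(parts).encode("utf-8")

def to_binary(rows):
    """Pack each row as int32 + float64 + length-prefixed UTF-8 string."""
    out = bytearray()
    for row_id, price, name in rows:
        encoded = name.encode("utf-8")
        out += struct.pack("<idH", row_id, price, len(encoded))
        out += encoded
    return bytes(out)

xml_bytes = to_xml_text(rows)
bin_bytes = to_binary(rows)
print(len(xml_bytes), len(bin_bytes))
```

Every value in the XML version pays for an opening and a closing tag, and numbers are spelled out as characters. Most of the gap comes from that overhead.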

This is improved in Whidbey (.NET 2.0), where you can set a DataSet's RemotingFormat property to SerializationFormat.Binary. But until then, what's a serializer to do?

Several people have done great work on custom binary serialization of DataSets. To mention a few:

  • Microsoft Knowledge Base Article 829740 by Ravinder Vuppula, in which he demonstrates his DataSetSurrogate class. It wraps a DataSet and converts its internal items into ArrayLists, which the framework can then binary serialize by default. It also provides a ConvertToDataSet method, so you can turn a deserialized surrogate back into a DataSet.
  • Dino Esposito demonstrates a GhostSerializer for a DataTable in his MSDN article, which does a similar ArrayList conversion.
  • Richard Lowe's Fast Binary Serialization
  • Dominic Cooney's Pickle
  • Angelo Scotto's CompactFormatter goes a step further: its serialization technique doesn't rely on the BinaryFormatter, so it works on the Compact Framework, and its output is even more compact than the BinaryFormatter's.
  • Peter Bromberg builds on top of the CompactFormatter to add compression, using Mike Krueger's ICSharpCode SharpZipLib in-memory zip library.
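Most of the approaches above share the same surrogate pattern. They copy the table into plain lists that the default serializer handles cheaply, serialize that, and optionally compress the bytes. Reversing the steps rebuilds the table.

The sketch below shows that pattern in Python. `pickle` and `zlib` stand in for the BinaryFormatter and SharpZipLib. The `Table` and `TableSurrogate` classes are hypothetical stand-ins, not the KB 829740 DataSetSurrogate code.

```python
import pickle
import zlib
from dataclasses import dataclass, field

@dataclass
class Table:
    """Hypothetical stand-in for a DataTable: a name, typed columns, and rows."""
    name: str
    columns: list  # list of (column_name, type_name) pairs
    rows: list = field(default_factory=list)  # list of tuples

@dataclass
class TableSurrogate:
    """Holds only plain lists, so the default serializer handles it cheaply."""
    name: str
    column_names: list
    column_types: list
    values: list  # flat, row-major list of cell values

def to_surrogate(table):
    values = [cell for row in table.rows for cell in row]
    return TableSurrogate(
        table.name,
        [c for c, _ in table.columns],
        [t for _, t in table.columns],
        values,
    )

def from_surrogate(s):
    # Rebuild row tuples from the flat value list, one row per `width` cells.
    width = len(s.column_names)
    if width == 0:
        rows = []
    else:
        rows = [tuple(s.values[i:i + width]) for i in range(0, len(s.values), width)]
    return Table(s.name, list(zip(s.column_names, s.column_types)), rows)

def serialize(table, compress=True):
    payload = pickle.dumps(to_surrogate(table), protocol=pickle.HIGHEST_PROTOCOL)
    return zlib.compress(payload) if compress else payload

def deserialize(data, compressed=True):
    payload = zlib.decompress(data) if compressed else data
    return from_surrogate(pickle.loads(payload))

orders = Table("Orders", [("Id", "int"), ("Customer", "str")], [(1, "Ann"), (2, "Bob")])
restored = deserialize(serialize(orders))
assert restored == orders
```

Compression is optional here because it trades CPU time for bytes on the wire. On a slow device that trade may not pay off.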

 

Comments

  • datasetfaq June 22, 2004 10:10 AM

    There are two bugs in DataSetSurrogate from KB829740.
    1) In ConvertToDataColumn(), DefaultValue is assigned before DataType. That's why DefaultValue is always converted to System.String.
    2) In IsSchemaIdentical(DataColumn dc), DefaultValue is compared with operator ==, which compares the boxed values by reference. Object.Equals should be used instead.
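The first bug is an ordering problem, sketched below in Python. The `Column` class is hypothetical. It coerces its default value to whatever type the column has at assignment time, which is the behavior the comment describes, and a new column starts out typed as string.

The second bug has no direct Python analogue, because Python's `==` already compares values. In C#, the static `object.Equals(a, b)` gives a value comparison for boxed defaults.

```python
class Column:
    """Hypothetical column that coerces its default value to its current type."""

    def __init__(self):
        self._data_type = str  # a new column starts out typed as string
        self._default = None

    @property
    def data_type(self):
        return self._data_type

    @data_type.setter
    def data_type(self, t):
        self._data_type = t  # note: does NOT re-convert an existing default

    @property
    def default_value(self):
        return self._default

    @default_value.setter
    def default_value(self, value):
        # Coerce at assignment time, using whatever type is set right now.
        self._default = None if value is None else self._data_type(value)

def build_buggy(default, dtype):
    col = Column()
    col.default_value = default  # assigned first, so it is coerced to str
    col.data_type = dtype
    return col

def build_fixed(default, dtype):
    col = Column()
    col.data_type = dtype        # set the type first...
    col.default_value = default  # ...so the default is coerced to it
    return col

buggy = build_buggy(42, int)
fixed = build_fixed(42, int)
print(type(buggy.default_value).__name__, type(fixed.default_value).__name__)  # str int
```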

  • datasetfaq July 13, 2004 3:24 PM

    I think I went about as far as possible with custom serialization of DataSets. I ended up using OpenNetCF.org's CSV method and built some additional code to add compression. I got a 2-minute load time down to about 12 seconds, but on the PC the same DataSet loads in less than 3 seconds, via full-sized XML.

    I've done quite a bit of reading, and the main difference I've seen discussed by MS is that the CF doesn't include the System.Reflection.Emit namespace.

    From this, I'm guessing that the DataSet creates a custom XML serializer on the fly, based on the schema of the DataTable it's saving or loading. IMO, this would explain the HUGE disparity in performance between PC and PDA, even factoring in the processor speeds.

    Does anyone know if this is correct? If so, then read on...

    How hard could it be to engineer a similar custom serializer for a given DataTable? I don't know about others, but I know ahead of time what schema I'll need and would happily do some coding around that for the sake of performance. A plug-in for VS.NET would be a perfect way to take a given DataSet (typed or otherwise) and generate the code for a CF-side serializer.
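Here is a rough sketch of a schema-specific serializer like the one proposed above. It inspects a known schema once and returns pack/unpack functions that never look at column types again. It is a Python illustration, not CF or DataSet code.

The `FORMAT_CODES` table, `make_row_codec`, and the sample schema are hypothetical. Only fixed-width column types are handled; strings would need a length prefix.

```python
import struct

# Hypothetical mapping from schema type names to struct format codes.
FORMAT_CODES = {"int32": "i", "int64": "q", "float64": "d", "bool": "?"}

def make_row_codec(schema):
    """Build a dump/load pair for one fixed schema of (name, type) pairs.

    The schema is inspected once, up front; the returned functions then
    pack and unpack rows with a precompiled layout.
    """
    row_struct = struct.Struct("<" + "".join(FORMAT_CODES[t] for _, t in schema))
    count_struct = struct.Struct("<I")  # row count header

    def dump(rows):
        out = bytearray(count_struct.pack(len(rows)))
        for row in rows:
            out += row_struct.pack(*row)
        return bytes(out)

    def load(data):
        (n,) = count_struct.unpack_from(data, 0)
        offset = count_struct.size
        rows = []
        for _ in range(n):
            rows.append(row_struct.unpack_from(data, offset))
            offset += row_struct.size
        return rows

    return dump, load

schema = [("OrderId", "int32"), ("Amount", "float64"), ("Shipped", "bool")]
dump, load = make_row_codec(schema)
rows = [(1, 19.5, True), (2, 7.25, False)]
assert load(dump(rows)) == rows
```

A code generator would go one step further and emit this specialized code as source. The principle is the same: per-row work no longer depends on reflecting over the schema.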

    Any thoughts?

  • datasetfaq September 21, 2004 1:12 AM

    Hi all,

    I don't know whether this is the right place to post a question; sorry if it isn't.

    I want to pass data between the application layer and the business logic layer. What is the best way to pass data: passing objects, using a DataSet, or using a serialized DataSet? Which is best, and why? Is there a way that is even better than all of these?

  • datasetfaq September 21, 2004 8:54 AM

    If by "layers" we mean logical layers, not physically distributed "tiers," then we are passing data around in-process. In that case, passing an object around is the best approach: there is no need to serialize or deserialize.
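The difference can be sketched in Python. The `Order` class and `business_layer_apply_discount` function are hypothetical. An in-process call hands the callee the same object. Crossing a process or machine boundary instead requires serializing a copy, shown here with `pickle`.

```python
import pickle

class Order:
    """Hypothetical business object passed between layers."""
    def __init__(self, order_id, amount):
        self.order_id = order_id
        self.amount = amount

def business_layer_apply_discount(order):
    # In-process call: caller and callee share the same object, nothing is copied.
    order.amount *= 0.9
    return order

order = Order(1, 100.0)
same = business_layer_apply_discount(order)
print(same is order)  # True

# Across a tier boundary, the receiver only ever sees a deserialized copy.
remote_copy = pickle.loads(pickle.dumps(order))
print(remote_copy is order, remote_copy.amount == order.amount)  # False True
```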

  • datasetfaq August 1, 2005 1:57 PM

    I can't get the CompactFormatter code in http://www.eggheadcafe.com/articles/20031219.asp to work. I keep getting:
    Object reference not set to an instance of an object.
    CompactFormatter: exception raised: System.NullReferenceException: Object reference not set to an instance of an object.
    at Serialization.Formatters.CompactFormatter.Deserialize(Stream Wire, Object& parent) in f:\compactformatterdemo\compressdataset\compactformatter.cs:line 680
    at Serialization.Formatters.CompactFormatter.Deserialize(Stream Wire) in f:\compactformatterdemo\compressdataset\compactformatter.cs:line 144
    at Serialization.Formatters.CompactFormatter.Deserialize(Stream Wire, ArrayList ObjectTable) in f:\compactformatterdemo\compressdataset\compactformatter.cs:line 138
    at Serialization.Formatters.CompactFormatter.Deserialize(Stream Wire, Object& parent) in f:\compactformatterdemo\compressdataset\compactformatter.cs:line 322

    Has anyone had a similar problem?

  • TrackBack August 19, 2005 5:13 PM


    A few notes on speeding up the response time for retrieving large datasets via web services... This focuses on determining the slow areas of the calls and then speeding up the relevant bits where appropriate. This is a simplified overview of what goes on

  • datasetfaq August 31, 2005 3:36 AM

    Hi,

    I tried to use the CompactFormatter, but it is not working for me. I was using the BinaryFormatter in my code and replaced it with the CompactFormatter, but it is unable to deserialize the file I am opening.
    It throws an exception.

    Regards

    Naveen

  • datasetfaq October 7, 2005 1:56 PM

    I have made an almost trivial change to the surrogate class that reduces the serialization size and appears to improve performance.

    To summarize: in DataTableSurrogate, create a hash table of values, and place only the indexes into that table in the _results array. For our data we see significant reductions in serialization size, because the data contains many redundant values.
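The commenter didn't post the code, so this is only a hypothetical Python reconstruction of the described idea (dictionary encoding). Each distinct value is stored once, and each row holds indexes into that value list. The `results` list plays the role of `_results`; the function names are made up.

```python
def encode_table(rows):
    """Replace each cell with an index into a list of distinct values."""
    value_index = {}  # (type, value) -> position in `values`
    values = []       # distinct values, in first-seen order
    results = []      # one list of indexes per row
    for row in rows:
        encoded_row = []
        for cell in row:
            # Key on the type too: in Python 1 == 1.0 == True share a hash,
            # and we must not collapse them into a single entry.
            key = (type(cell), cell)
            idx = value_index.get(key)
            if idx is None:
                idx = len(values)
                value_index[key] = idx
                values.append(cell)
            encoded_row.append(idx)
        results.append(encoded_row)
    return values, results

def decode_table(values, results):
    return [tuple(values[i] for i in row) for row in results]

rows = [
    ("Toronto", "Open", 1),
    ("Ottawa", "Open", 1),
    ("Toronto", "Closed", True),
]
values, results = encode_table(rows)
print(values)   # ['Toronto', 'Open', 1, 'Ottawa', 'Closed', True]
print(results)  # [[0, 1, 2], [3, 1, 2], [0, 4, 5]]
assert decode_table(values, results) == rows
```

The savings depend entirely on redundancy. When most values are distinct, the index lists add overhead instead of removing it. Cells must also be hashable to serve as dictionary keys.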

    If anyone is interested in the code changes, let me know.
