The problem with ORM

20 years ago, this was all the rage: Object Relational Mapping. NHibernate. Entity Framework. I don't use either. You know why: it takes too damn long to write code, just to save a few milliseconds of performance and a few megabytes of space on a hard drive.

IPersistance. In the Google Web Service proxies, and eventually in the Cartography Salesman DLL, you are going to see an IPersistance abstraction layer. The reason for this is so I don't have to map fields to columns or byte offsets or however many different ways there are to parse data. I use the runtime serializer for everything. It's slow, it's bulky, and it takes just a few seconds of cut and paste to get working. That last part has me convinced already.

Problem. The runtime serializer was primarily a way to save a POCO into a big stream of bytes, easily interpreted as (I gather) UTF-8 encoded text. It is not easily interpreted as database columns. In some applications, it becomes a configuration hassle to get access to the local filesystem, and it is more convenient to save things to a database. In fact, if you're going to data mine, or re-transmit the data to an upstream data mart, it's practically a requirement.

So IPersistance gives a simple abstraction layer to read and save a stream to whatever is supposed to be the persistent data layer. And we can write an implementation that saves a stream to (in our case) a MS SQL Server varbinary(max) column. The table just needs a filename column, to mimic a file system. In fact, the naming is more flexible than a real file system's.
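
For concreteness, here is a minimal sketch of the backing table. The [Filename] and [Blob] columns are the ones used in the queries below; the [LastModifiedDate] column is an assumption on my part, based on the nightly-run note near the end.

CREATE TABLE dbo.[File]
(
    [Filename]         varchar(400)   NOT NULL PRIMARY KEY,      -- mimics a file path
    [Blob]             varbinary(max) NOT NULL,                  -- serialized POCO, stored as raw bytes
    [LastModifiedDate] datetime       NOT NULL DEFAULT GETDATE() -- lets a nightly job pick up changes
);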

Bads. It's slow. Real f'ing slow compared to a file system. You're saving a ton of useless XML tags, clogging up database space that could be used for something else. But unless you work with files a lot, you will never really notice the speed difference. Databases are slower than the file system; it's just incredibly more convenient to have the data in a database. As for the space requirements, you can refactor at a later date, if you actually become successful. There is no reason why you can't implement a custom IPersistance subtype handler later, one that takes specific POCO types and saves them to their mapped columns. This isn't necessarily the way to go for an enterprise that knows its application is CERTAIN to get usage; there, field-to-column mapping might as well be a labor cost undertaken up front, rather than reloading XML from the database into another table later. But to get something running ASAP, as a proof of concept, I think this is the way to go.

What you're going to read next is how to use IPersistance's MSSQL implementation as a low-volume persistent data store for your application. It can still be used as an upstream system for transmitting data to a data mart, when you decide you want to mine the information, in traditional relational table format, using SQL tools.

Tools. It's already built into MSSQL. The xml column type has .nodes(), .exist(), and .value() methods with which to extract data from it. Though it feels like a Dali painting sometimes, trying to get data out of it. We are going to try to simplify it: if you use IPersistance's MSSQL implementation, you can extract the fields you want into another table.

Step 1. Take a look at the XML text saved by IPersistance. The default database used is XmlFs. The table's name is dbo.[File]. It has a [Filename] column to identify the record. The [Blob] column is the varbinary(max) with the XML data, which SQL Server treats as unidentified binary data. To read it, cast the varbinary(max) to varchar(max):

SELECT [Filename], CAST([Blob] AS varchar(max))
FROM dbo.[File]
WHERE [Filename] LIKE '%.extension';

Unfortunately, SQL Server Management Studio will truncate the string. But that's OK. You are just trying to see what the format of the XML is, so you can create an XPath to parse it. The first tag is usually the type you serialized, so if you named your application's files with their own extension, and always save the same type, the XML format you encounter should be consistent enough to query.
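
If the truncation gets in the way, one workaround I believe works in Management Studio is to cast all the way to xml; SSMS renders xml values in the results grid as clickable links that open the whole document:

SELECT [Filename], CAST([Blob] AS xml)
FROM dbo.[File]
WHERE [Filename] LIKE '%.extension';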

Step 2. Figure out the XPath to the data you want.

Step 3. Create a regular table, preferably in another schema. Create a new schema. We'll create the table later the easy way: SELECT INTO, then script out the CREATE TABLE and INSERT SELECT statements.
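
Creating the schema is a one-liner. TravelingSalesmanAnalysis is the schema name used in the queries below; substitute your own.

CREATE SCHEMA TravelingSalesmanAnalysis;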

Step 4. The varbinary is cast as an xml column, so that you can use the xml methods on it. This is the part I don't understand, but it REALLY helps execution time: you need to extract the XML into a temporary table before extracting the individual nodes.

-- temporary table
SELECT [Filename], CAST([Blob] AS xml) AS [xmlcol]
INTO #tempxml
FROM dbo.[File]
WHERE [Filename] LIKE '%weblog.xml'
AND CAST([Blob] AS xml).exist('/DeferredLogEntryOfDeferredStringArraySavedSolutionHistoryHtmlTags') = 1
AND [Filename] NOT IN (
    SELECT [Filename] FROM TravelingSalesmanAnalysis.AlgInput
    UNION ALL
    SELECT [Filename] FROM TravelingSalesmanAnalysis.AlgPerf
    UNION ALL
    SELECT [Filename] FROM TravelingSalesmanAnalysis.MarketResearchDestinations
);

Step 5. For each table you want to create, run this first:

SELECT
    -- start of data that occurs once per record
    x.[xmlcol].value('(/CLRtype/Filename)[1]', 'varchar(400)') AS [Filename],
    x.[xmlcol].value('(/CLRtype/node)[1]', 'varchar(256)') AS [col],
    x.[xmlcol].value('(/CLRtype/node2)[1]', 'int') AS [col1],
    x.[xmlcol].value('(/CLRtype/node3)[1]', 'int') AS [col2],
    -- end of data that occurs once per record

    -- start of data from the repeated nodes
    xml.xmldata.value('(relative path starting from the repeated node)[1]', 'float(53)') AS [Elapsed],
    xml.xmldata.value('(node/node1)[1]', 'varchar(1024)') AS [Algorithm],
    xml.xmldata.value('(node/node2)[1]', 'int') AS [Cost]
    -- end of data from the repeated nodes

INTO [the schema you just created].[the table you want to create]
FROM #tempxml AS x

-- Truthfully, I don't understand CROSS APPLY intuitively yet. Is it an in-line FROM, as part of a JOIN?
-- Roughly: .nodes() turns each matching node into a row, and CROSS APPLY correlates those rows with the
-- outer row, like a correlated inline join. You need it to get the multiple nodes in one xml column to
-- repeat as several records.
CROSS APPLY x.[xmlcol].nodes('XPath of the repeated nodes you wish to turn into multiple records -- you cannot use a variable here, don''t ask why') AS xml (xmldata);

The strange syntax of SQL Server's .value() XPath is because it assumes that any path can match multiple instances in an XML document; it forces the programmer to put in the index. I assume you are familiar with XML and XSL-style XPath queries, and smart enough to know this is possible and plan accordingly.
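
For example (the node names here are hypothetical), only the indexed form compiles, because the [1] guarantees a singleton:

-- compiles: the [1] promises at most one value
SELECT x.[xmlcol].value('(/root/item/name)[1]', 'varchar(100)') FROM #tempxml AS x;

-- fails with an error along the lines of "'value()' requires a singleton (or empty sequence)":
-- SELECT x.[xmlcol].value('/root/item/name', 'varchar(100)') FROM #tempxml AS x;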

Step 6. Once the tables are created, use SQL Server Management Studio to script out the CREATE TABLE statements. Along with the SELECT INTO #tempxml script, paste them into another file.
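
The scripted output will look roughly like this, given the Step 5 column list (the types come from the .value() calls; SELECT INTO creates the columns as nullable):

CREATE TABLE [the schema you just created].[the table you want to create]
(
    [Filename]  varchar(400)  NULL,
    [col]       varchar(256)  NULL,
    [col1]      int           NULL,
    [col2]      int           NULL,
    [Elapsed]   float         NULL,
    [Algorithm] varchar(1024) NULL,
    [Cost]      int           NULL
);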

Step 7. Turn all of your SELECT INTO statements into INSERT INTO ... SELECT statements, and paste them underneath the SELECT INTO #tempxml and CREATE TABLE statements that were there before.
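
For example, the Step 5 statement becomes something like this (same placeholder names as above; the column list must line up with the SELECT):

INSERT INTO [the schema you just created].[the table you want to create]
    ([Filename], [col], [col1], [col2], [Elapsed], [Algorithm], [Cost])
SELECT
    x.[xmlcol].value('(/CLRtype/Filename)[1]', 'varchar(400)'),
    -- ... the remaining Step 5 column expressions ...
    xml.xmldata.value('(node/node2)[1]', 'int')
FROM #tempxml AS x
CROSS APPLY x.[xmlcol].nodes('XPath of the repeated nodes') AS xml (xmldata);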

Step 8. Add DROP TABLE #tempxml at the end.
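
A guarded version, in case the script dies halfway and leaves the temp table behind:

IF OBJECT_ID('tempdb..#tempxml') IS NOT NULL
    DROP TABLE #tempxml;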

You should be able to figure out the rest. This pulls the data from your varbinary(max) columns and turns it into an xml column in a temporary table. You query that into multiple records per XML field, and insert them into your new data mart tables. Then you can do what you want with them: transmit them, data mine with SQL queries, etc. And the same script can be run nightly. You just need to add WHERE clauses on the [LastModifiedDate], and remove the old records in the data mart before inserting. There is no such thing as overwrite in databases; it's called an UPDATE statement. Though you have to decide which is more annoying to implement.
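
A sketch of the nightly variant, assuming the [LastModifiedDate] column on dbo.[File] and the delete-before-insert approach (the one-day window is arbitrary):

-- 1. restrict the temp-table load to recently modified files
SELECT [Filename], CAST([Blob] AS xml) AS [xmlcol]
INTO #tempxml
FROM dbo.[File]
WHERE [Filename] LIKE '%weblog.xml'
AND [LastModifiedDate] >= DATEADD(day, -1, GETDATE());

-- 2. remove the old copies of those records from the data mart
DELETE t
FROM TravelingSalesmanAnalysis.AlgPerf AS t
WHERE t.[Filename] IN (SELECT [Filename] FROM #tempxml);

-- 3. re-run the INSERT INTO ... SELECT statements from Step 7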

Addendum:
OR you can use MERGE, but you'll be best served by turning those INSERT INTO ... SELECT statements back into SELECT INTO #temptable first.

MERGE <target table> [AS TARGET]
USING <table or query> [AS SOURCE]
ON <merge condition>
[WHEN MATCHED
    THEN <update or delete> ]
[WHEN NOT MATCHED [BY TARGET]
    THEN <insert> ]
[WHEN NOT MATCHED BY SOURCE
    THEN <update or delete> ];
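
A concrete sketch, assuming the Step 5 rows were loaded into #temptable, and assuming [Filename] plus [Algorithm] identifies a row (these are the placeholder columns from Step 5; pick whatever is actually unique in your data):

MERGE TravelingSalesmanAnalysis.AlgPerf AS target
USING #temptable AS source
    ON target.[Filename] = source.[Filename]
   AND target.[Algorithm] = source.[Algorithm]
WHEN MATCHED THEN
    UPDATE SET target.[Elapsed] = source.[Elapsed],
               target.[Cost] = source.[Cost]
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([Filename], [Algorithm], [Elapsed], [Cost])
    VALUES (source.[Filename], source.[Algorithm], source.[Elapsed], source.[Cost])
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;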


Correction, 2019-05-02: Entity Framework will allow a Code First approach,

https://www.entityframeworktutorial.net/code-first/code-first-conventions.aspx

using their DbContext as a base class.

I was wondering how they attacked the POCO modification problem, with respect to existing data (they give you the option to drop the database and restart afresh... uhh, no... even in development... who wants to create all those test cases again?). But I assume that these strategies allow them to "hedge that a little". I'm not sure; I haven't actually tried it. Table per Type seems to allow you to use inheritance to extend columns. Hell, if that is OK, they can refactor "code first" and wipe out the table whenever they are ready to finalize, because by then I will be done with the test data.

And I was wondering if they use 2 users: a temporary one with ddladmin that gets erased, and one without it, to use the tables (I don't think they do this). I didn't do this myself until I had to deploy. But I don't have a staff of dozens. Just dogs. You really can't expect much help there. But they give awesome moral support, usually... when they aren't barking.


References

https://docs.microsoft.com/en-us/sql/t-sql/xml/xml-data-type-methods?view=sql-server-2017

https://stackoverflow.com/questions/10344553/how-can-i-query-a-value-in-sql-server-xml-column

http://www.brokenwire.net/bw/Programming/125/querying-xml-fields-using-t-sql

https://blogs.msdn.microsoft.com/simonince/2009/04/24/flattening-xml-data-in-sql-server/

SELECT or Query nodes in hierarchical or nested XML

Inserting XML into a temp table

https://www.mssqltips.com/sqlservertip/1704/using-merge-in-sql-server-to-insert-update-and-delete-at-the-same-time/
