T-SQL Tuesday #74 – Shredding nested XML

TSQLTuesday

It’s T-SQL Tuesday,  the blog party founded by Adam Machanic (blog|@AdamMachanic) and this month hosted by Robert Davis (blog|@SQLSoldier). The topic, Be the Change.

 

Start of the New Year is a great time to Be the Change. You could do that by donating to Mercy Corps (full disclosure: my employer) with their familiar tag line Be the change.

BeTheChange

You could also do it by digging into data changes, how it changes, where it changes and why it changes. One of my more interesting data changes is with an XML shred. It starts off with a download from an exchange rate API, saving the XML file, shredding it into a tabular format and uploading it to a table. The point I find most interesting is the XML shred.

The below XML has data nested in different levels that requires the nodes method to join them together. The nodes method accepts a XML string and returns a rowset. That rowset can then be used with CROSS APPLY to effectively link your way down.

nodes (XQuery) as Table(Column)

The tabular format I need requires data from 3 different levels of this XML gob and I need to wade through 5 “tables” to get there.

So I start with this….

<response>
 <base_currency>USD</base_currency> 
 <meta>
   <effective_params>
     <data_set>oanda</data_set> 
     <date>2016-01-09</date> 
     <decimal_places>5</decimal_places> 
     <fields>
       <field>averages</field> 
     </fields>
     <quote_currencies>
       <currency>AFN</currency> 
       <currency>BRL</currency> 
       <currency>KZT</currency> 
     </quote_currencies>
   </effective_params>
   <request_time>2016-01-11T12:30:04+0000</request_time> 
  </meta>
  <quotes>
   <quote>
     <ask>353.06000</ask> 
     <bid>347.57000</bid> 
     <currency>KZT</currency> 
     <date>2016-01-09T19:00:00+0000</date> 
   </quote>
   <quote>
     <ask>4.02911</ask> 
     <bid>4.02305</bid> 
     <currency>BRL</currency> 
     <date>2016-01-09T19:00:00+0000</date> 
   </quote>
   <quote>
     <ask>68.79000</ask> 
     <bid>68.59000</bid> 
     <currency>AFN</currency> 
     <date>2016-01-09T12:00:00+0000</date> 
   </quote>
  </quotes>
</response>

And want to get here…

baseRate exchangeRate oDate cDate ask bid requestDate
USD KZT 2016-01-09T19:00:00+0000 1/9/2016 7:00:00 PM 353.06 347.57 1/9/2016
USD BRL 2016-01-09T19:00:00+0000 1/9/2016 7:00:00 PM 4.02911 4.02305 1/9/2016
USD AFN 2016-01-09T12:00:00+0000 1/9/2016 12:00:00 PM 68.79 68.59 1/9/2016

Tenge, Real and Afghani. Oh my!

So I have to do this…

SELECT 
 mainT.mainC.value('(base_currency/text())[1]','VARCHAR(3)') AS baseRate
, quoteT.quoteC.value('(currency/text())[1]','VARCHAR(3)') as exchangeRate
, quoteT.quoteC.value('(date/text())[1]','VARCHAR(40)') as oDate
, CONVERT(DATETIME,LEFT(REPLACE(REPLACE(quoteT.quoteC.value('(date/text())[1]','VARCHAR(40)') ,'T',' '),'+','.'),23),120) as cDate
, quoteT.quoteC.value('(ask/text())[1]','FLOAT') AS ask
, quoteT.quoteC.value('(bid/text())[1]','FLOAT') AS bid
, epT.epC.value('(date/text())[1]','VARCHAR(40)') AS requestDate
FROM @xml.nodes('/response') mainT(mainC) 
    /*mainT = main table, mainC = main column*/
CROSS APPLY mainT.mainC.nodes('meta') as metaT(metaC)
CROSS APPLY metaT.metaC.nodes('effective_params') as epT(epC) 
    /*ep = effective_params*/
CROSS APPLY mainT.mainC.nodes('quotes') as quotesT(quotesC)
CROSS APPLY quotesT.quotesC.nodes('quote') as quoteT(quoteC)

In the beginning of the FROM line I am grabbing the “response” line from @xml (a variable that I loaded the XML file into) and returning the main table/column, referred to as mainT(mainC). In the next line I join to that row set via a CROSS APPLY the “meta” table and so on and so for to get through the “effective_parms”, “quotes” and “quote”. Great naming there, watch for the “s”.

You can see back in the SELECT section I used the mainT.mainC.value to pull the base_currency column. Each rowset returned by the nodes function can be used in the SELECT.

There is also this little bit of fun to change a datetime stored as a string with a “T” in it, “2016-01-11T12:30:04+0000” to a more respectable format that will insert nicely.

CONVERT(DATETIME,LEFT(REPLACE(REPLACE(quoteT.quoteC.value('(date/text())[1]','VARCHAR(40)') ,'T',' '),'+','.'),23),120

Happy New Year everyone and I hope everything changes for the best.

Advertisements

5 thoughts on “T-SQL Tuesday #74 – Shredding nested XML

  1. sqlsoldier January 20, 2016 / 1:46 am

    Thanks for participating! It’s not a coincidence that I chose Be The Change for the topic this month. I got saw a tweet from you saying that and decided to use that as my topic.

    Like

  2. Jason Corbett September 2, 2016 / 10:47 pm

    Hi Tim – I have been trying to apply this to my table and have not been successful (I’m fairly new to SQL). Any chance you might take a quick look?

    Thanks

    Like

    • Tim September 3, 2016 / 12:48 am

      Hi Jason,

      Sure. Can you post your XML and desired result set somewhere like SQL Scribble? Stack overflow is another great resource. People will fight over answering your question.

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s