Friday, November 6, 2009

XML PATHs Of Glory

In a past blog post, I illustrated how you can use the FOR XML PATH clause to create comma-separated lists of items. In this entry, I’ll go into detail as to how FOR XML PATH can be used for what it was designed for: to shape actual XML output. Finally, I’ll use FOR XML PATH to create some HTML output as well.

The PATH option was introduced in SQL2005 to provide a flexible and easier approach to constructing XML output. I thank my lucky stars that I started with T-SQL at the SQL2005 level, because the SQL2000 method of using the EXPLICIT option looks like a complete nightmare. (If you’re into torture, take a look at Books Online for documentation on how to use the EXPLICIT option. When you're done screaming, then come back and read on).

Let's take a quick look at the output that results with the FOR XML PATH clause. If you pass no specific path name, then it assumes a path of ‘row’:

select ID=ContactID
,FirstName
,LastName
,Phone
from Person.Contact
where ContactID between 90 and 94
for xml path
/*
<row>
<ID>90</ID>
<FirstName>Andreas</FirstName>
<LastName>Berglund</LastName>
<Phone>795-555-0116</Phone>
</row>
<row>
<ID>91</ID>
<FirstName>Robert</FirstName>
<LastName>Bernacchi</LastName>
<Phone>449-555-0176</Phone>
</row>
<row>
<ID>92</ID>
<FirstName>Matthias</FirstName>
<LastName>Berndt</LastName>
<Phone>384-555-0169</Phone>
</row>
<row>
<ID>93</ID>
<FirstName>John</FirstName>
<LastName>Berry</LastName>
<Phone>471-555-0181</Phone>
</row>
<row>
<ID>94</ID>
<FirstName>Steven</FirstName>
<LastName>Brown</LastName>
<Phone>280-555-0124</Phone>
</row>
*/
For the query below, let's supply a specific path name of ‘Contact’. And it’s usually good practice to create XML with a root tag, and we can do that by adding the ROOT directive like so:

select ID=ContactID
,FirstName
,LastName
,Phone
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
<Contact>
<ID>90</ID>
<FirstName>Andreas</FirstName>
<LastName>Berglund</LastName>
<Phone>795-555-0116</Phone>
</Contact>
<Contact>
<ID>91</ID>
<FirstName>Robert</FirstName>
<LastName>Bernacchi</LastName>
<Phone>449-555-0176</Phone>
</Contact>
<Contact>
<ID>92</ID>
<FirstName>Matthias</FirstName>
<LastName>Berndt</LastName>
<Phone>384-555-0169</Phone>
</Contact>
<Contact>
<ID>93</ID>
<FirstName>John</FirstName>
<LastName>Berry</LastName>
<Phone>471-555-0181</Phone>
</Contact>
<Contact>
<ID>94</ID>
<FirstName>Steven</FirstName>
<LastName>Brown</LastName>
<Phone>280-555-0124</Phone>
</Contact>
</Contacts>
*/
You’ll note that the column names were used as the tags for each element in the XML. For example, I renamed the first column to be ID rather than ContactID and therefore the element tag <ID></ID> was created.

You have the ability to shape the XML in whatever ways you wish based on what names you give to your columns. For example, any column that starts with an at-sign (@) will create attributes rather than elements, as illustrated below:

select "@ID"=ContactID
,"@FirstName"=FirstName
,"@LastName"=LastName
,"@Phone"=Phone
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
<Contact ID="90" FirstName="Andreas" LastName="Berglund" Phone="795-555-0116" />
<Contact ID="91" FirstName="Robert" LastName="Bernacchi" Phone="449-555-0176" />
<Contact ID="92" FirstName="Matthias" LastName="Berndt" Phone="384-555-0169" />
<Contact ID="93" FirstName="John" LastName="Berry" Phone="471-555-0181" />
<Contact ID="94" FirstName="Steven" LastName="Brown" Phone="280-555-0124" />
</Contacts>
*/
You can mix attributes and elements together like so:

select "@ID"=ContactID
,FirstName
,LastName
,Phone
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
<Contact ID="90">
<FirstName>Andreas</FirstName>
<LastName>Berglund</LastName>
<Phone>795-555-0116</Phone>
</Contact>
<Contact ID="91">
<FirstName>Robert</FirstName>
<LastName>Bernacchi</LastName>
<Phone>449-555-0176</Phone>
</Contact>
<Contact ID="92">
<FirstName>Matthias</FirstName>
<LastName>Berndt</LastName>
<Phone>384-555-0169</Phone>
</Contact>
<Contact ID="93">
<FirstName>John</FirstName>
<LastName>Berry</LastName>
<Phone>471-555-0181</Phone>
</Contact>
<Contact ID="94">
<FirstName>Steven</FirstName>
<LastName>Brown</LastName>
<Phone>280-555-0124</Phone>
</Contact>
</Contacts>
*/
And you can create nested attributes and elements, as illustrated below:

select "@ID"=ContactID
,"Name/@Title"=Title
,"Name/@Suffix"=Suffix
,"Name/First"=FirstName
,"Name/Last"=LastName
from Person.Contact
where ContactID between 92 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
<Contact ID="92">
<Name Title="Mr.">
<First>Matthias</First>
<Last>Berndt</Last>
</Name>
<Phone>384-555-0169</Phone>
</Contact>
<Contact ID="93">
<Name>
<First>John</First>
<Last>Berry</Last>
</Name>
<Phone>471-555-0181</Phone>
</Contact>
<Contact ID="94">
<Name Title="Mr." Suffix="IV">
<First>Steven</First>
<Last>Brown</Last>
</Name>
<Phone>280-555-0124</Phone>
</Contact>
</Contacts>
*/
In the above query, I introduced a Name element with two attributes (Title and Suffix) and two sub-elements (First and Last). You can also see that some of the contacts had NULL for the Title and Suffix and therefore those attributes were not created for those contacts.

Note that attributes must be introduced first, before the elements. For example, if I tried to do the following, I would get an error:

select "@ID"=ContactID
,"Name/First"=FirstName
,"Name/Last"=LastName
,"Name/@Title"=Title
,"Name/@Suffix"=Suffix
,Phone
from Person.Contact
where ContactID between 92 and 94
for xml path('Contact'),root('Contacts')
/*
Msg 6852, Level 16, State 1, Line 1
Attribute-centric column 'Name/@Title' must not come after a
non-attribute-centric sibling in XML hierarchy in FOR XML PATH.
*/
When you have two adjacent columns with the same name, then their data will be concatenated together in one element, like so:

select Name=Title
,Name=FirstName
,Name=MiddleName
,Name=LastName
,Name=Suffix
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
<Contact><Name>AndreasBerglund</Name></Contact>
<Contact><Name>Mr.RobertM.Bernacchi</Name></Contact>
<Contact><Name>Mr.MatthiasBerndt</Name></Contact>
<Contact><Name>JohnBerry</Name></Contact>
<Contact><Name>Mr.StevenB.BrownIV</Name></Contact>
</Contacts>
*/
Note again that NULL column values are ignored in the concatenation.

If you wanted to construct a nice readable single element consisting of the contact’s full name (Title, FirstName, MiddleName, LastName, and Suffix), you could approach it like this:

select Name=coalesce(Title+' ','')
+FirstName+' '
+coalesce(MiddleName+' ','')
+LastName
+coalesce(' '+Suffix,'')
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
<Contact><Name>Andreas Berglund</Name></Contact>
<Contact><Name>Mr. Robert M. Bernacchi</Name></Contact>
<Contact><Name>Mr. Matthias Berndt</Name></Contact>
<Contact><Name>John Berry</Name></Contact>
<Contact><Name>Mr. Steven B. Brown IV</Name></Contact>
</Contacts>
*/
But look all the logic required to handle possible NULL values in the Title and MiddleName and Suffix columns. Well, good news! You can use the following trick. Incorporate data() into the column name as illustrated below, and it will take care of concatenating it all together with spaces between and eliminating all the NULL values automatically:

select "Name/data()"=Title
,"Name/data()"=FirstName
,"Name/data()"=MiddleName
,"Name/data()"=LastName
,"Name/data()"=Suffix
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
<Contact><Name>Andreas Berglund</Name></Contact>
<Contact><Name>Mr. Robert M. Bernacchi</Name></Contact>
<Contact><Name>Mr. Matthias Berndt</Name></Contact>
<Contact><Name>John Berry</Name></Contact>
<Contact><Name>Mr. Steven B. Brown IV</Name></Contact>
</Contacts>
*/
However, this approach will not work if you were trying to construct a Name attribute as opposed to a Name element:

select "@Name/data()"=Title
,"@Name/data()"=FirstName
,"@Name/data()"=MiddleName
,"@Name/data()"=LastName
,"@Name/data()"=Suffix
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
Msg 6850, Level 16, State 1, Line 1
Column name '@Name/data()' contains an invalid XML identifier as required by FOR XML;
'@'(0x0040) is the first character at fault.
*/
But you can handle that through a sub-query like so:

select "@Name"=(select "data()"=Title
,"data()"=FirstName
,"data()"=MiddleName
,"data()"=LastName
,"data()"=Suffix
from Person.Contact c2
where c2.ContactID=Contact.ContactID
for xml path(''))
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
<Contact Name="Andreas Berglund" />
<Contact Name="Mr. Robert M. Bernacchi" />
<Contact Name="Mr. Matthias Berndt" />
<Contact Name="John Berry" />
<Contact Name="Mr. Steven B. Brown IV" />
</Contacts>
*/
Besides data(), you can also incorporate text() or node() into a column name or give a column a wildcard name (*) and the data will be inserted directly as text. They are all interchangeable, as you can see in the following example:

select "text()"=Title
,"node()"=FirstName
,"*"=MiddleName
,"node()"=LastName
,"text()"=Suffix
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
<Contact>AndreasBerglund</Contact>
<Contact>Mr.RobertM.Bernacchi</Contact>
<Contact>Mr.MatthiasBerndt</Contact>
<Contact>JohnBerry</Contact>
<Contact>Mr.StevenB.BrownIV</Contact>
</Contacts>
*/
Remember, two adjacent columns with names that incorporate data() will be separated by a space, but, as you see above, those named with text() or node() or a wildcard are just concatenated directly with no intervening space.

You only really need to specify text() or node() or wildcard names if you want to insert a text element directly subordinate to the main path element, as we saw in the previous query. If, on the other hand, you are inserting text in a sub-element like so…:

select "Name/text()"=Title
,"Name/node()"=FirstName
,"Name/*"=MiddleName
,"Name/node()"=LastName
,"Name/text()"=Suffix
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
<Contact><Name>AndreasBerglund</Name></Contact>
<Contact><Name>Mr.RobertM.Bernacchi</Name></Contact>
<Contact><Name>Mr.MatthiasBerndt</Name></Contact>
<Contact><Name>JohnBerry</Name></Contact>
<Contact><Name>Mr.StevenB.BrownIV</Name></Contact>
</Contacts>
*/
…then you’ll see that they are really unnecessary, since the following query (which we looked at earlier) does the exact same thing:

select Name=Title
,Name=FirstName
,Name=MiddleName
,Name=LastName
,Name=Suffix
from Person.Contact
where ContactID between 90 and 94
for xml path('Contact'),root('Contacts')
/*
<Contacts>
<Contact><Name>AndreasBerglund</Name></Contact>
<Contact><Name>Mr.RobertM.Bernacchi</Name></Contact>
<Contact><Name>Mr.MatthiasBerndt</Name></Contact>
<Contact><Name>JohnBerry</Name></Contact>
<Contact><Name>Mr.StevenB.BrownIV</Name></Contact>
</Contacts>
*/
You can also incorporate comment() or processing-instruction() into the column names to create those kinds of elements, as illustrated below:

select "@ID"=ContactID
,"comment()"='Modified on '+convert(varchar(30),ModifiedDate,126)
,"comment()"=case when ContactID=92 then 'Here is Contact#92' end
,"processing-instruction(EmailPromo)"=EmailPromotion
,"Name/@First"=FirstName
,"Name/@Last"=LastName
,"Name"='This is inserted directly as text'
,"Name"='...And so is this'
,"*"='This is inserted as text in the main Contact path'
from Person.Contact
where ContactID between 90 and 92
for xml path('Contact'),root('Contacts')
/*
<Contacts>
<Contact ID="90">
<!--Modified on 2001-08-01T00:00:00-->
<?EmailPromo 0?>
<Name First="Andreas" Last="Berglund">
This is inserted directly as text...And so is this
</Name>
This is inserted as text in the main Contact path
</Contact>
<Contact ID="91">
<!--Modified on 2002-09-01T00:00:00-->
<?EmailPromo 1?>
<Name First="Robert" Last="Bernacchi">
This is inserted directly as text...And so is this
</Name>
This is inserted as text in the main Contact path
</Contact>
<Contact ID="92">
<!--Modified on 2002-08-01T00:00:00-->
<!--Here is Contact#92-->
<?EmailPromo 1?>
<Name First="Matthias" Last="Berndt">
This is inserted directly as text...And so is this
</Name>
This is inserted as text in the main Contact path
</Contact>
</Contacts>
*/
You’ll note above that the two adjacent columns named comment() do NOT concatenate together like other adjacent columns with the same name. They are always separate elements. The same is true for processing-instruction() columns.

You can also concatenate whole individual XML documents together, as illustrated below, where we use two scalar subqueries to construct XML data from the Sales.SalesPerson and Sales.SalesReason tables. Since we did not give actual column names to the two subqueries, they are inserted directly as is. (Note that we could have named each of them node() or a wildcard and it would have worked the same. However, it’s important to note that you may NOT use text() or data() in naming true XML datatype columns):

select (select "@ID"=SalesPersonID
,"@Quota"=SalesQuota
from Sales.SalesPerson
for xml path('Person'),root('SalesPeople'),type)
,(select "@ID"=SalesReasonID
,"@Name"=Name
,"@Type"=ReasonType
from Sales.SalesReason
for xml path('Reason'),root('SalesReasons'),type)
for xml path('MyData')
/*
<MyData>
<SalesPeople>
<Person ID="268" />
<Person ID="275" Quota="300000.0000" />
<Person ID="276" Quota="250000.0000" />
<Person ID="277" Quota="250000.0000" />
<Person ID="278" Quota="250000.0000" />
<Person ID="279" Quota="300000.0000" />
<Person ID="280" Quota="250000.0000" />
<Person ID="281" Quota="250000.0000" />
<Person ID="282" Quota="250000.0000" />
<Person ID="283" Quota="250000.0000" />
<Person ID="284" />
<Person ID="285" Quota="250000.0000" />
<Person ID="286" Quota="250000.0000" />
<Person ID="287" Quota="300000.0000" />
<Person ID="288" />
<Person ID="289" Quota="250000.0000" />
<Person ID="290" Quota="250000.0000" />
</SalesPeople>
<SalesReasons>
<Reason ID="1" Name="Price" Type="Other" />
<Reason ID="2" Name="On Promotion" Type="Promotion" />
<Reason ID="3" Name="Magazine Advertisement" Type="Marketing" />
<Reason ID="4" Name="Television Advertisement" Type="Marketing" />
<Reason ID="5" Name="Manufacturer" Type="Other" />
<Reason ID="6" Name="Review" Type="Other" />
<Reason ID="7" Name="Demo Event" Type="Marketing" />
<Reason ID="8" Name="Sponsorship" Type="Marketing" />
<Reason ID="9" Name="Quality" Type="Other" />
<Reason ID="10" Name="Other" Type="Other" />
</SalesReasons>
</MyData>
*/
Note that the ,TYPE directive was used to make sure that the XML subqueries came through as true XML datatypes. This is very important. If we had left off the ,TYPE directive, they would be processed as strings and then when they were incorporated into the main query, the main FOR XML PATH(‘MyData’) would encode all of the less-than and greater-than signs into this ugly mess:

select (select "@ID"=SalesPersonID
,"@Quota"=SalesQuota
from Sales.SalesPerson
for xml path('Person'),root('SalesPeople'))
,(select "@ID"=SalesReasonID
,"@Name"=Name
,"@Type"=ReasonType
from Sales.SalesReason
for xml path('Reason'),root('SalesReasons'))
for xml path('MyData')
/*
<MyData>
&lt;SalesPeople&gt;
&lt;Person ID="268" /&gt;
&lt;Person ID="275" Quota="300000.0000" /&gt;
&lt;Person ID="276" Quota="250000.0000" /&gt;
&lt;Person ID="277" Quota="250000.0000" /&gt;
&lt;Person ID="278" Quota="250000.0000" /&gt;
&lt;Person ID="279" Quota="300000.0000" /&gt;
&lt;Person ID="280" Quota="250000.0000" /&gt;
&lt;Person ID="281" Quota="250000.0000" /&gt;
&lt;Person ID="282" Quota="250000.0000" /&gt;
&lt;Person ID="283" Quota="250000.0000" /&gt;
&lt;Person ID="284" /&gt;
&lt;Person ID="285" Quota="250000.0000" /&gt;
&lt;Person ID="286" Quota="250000.0000" /&gt;
&lt;Person ID="287" Quota="300000.0000" /&gt;
&lt;Person ID="288" /&gt;
&lt;Person ID="289" Quota="250000.0000" /&gt;
&lt;Person ID="290" Quota="250000.0000" /&gt;
&lt;/SalesPeople&gt;
&lt;SalesReasons&gt;
&lt;Reason ID="1" Name="Price" Type="Other" /&gt;
&lt;Reason ID="2" Name="On Promotion" Type="Promotion" /&gt;
&lt;Reason ID="3" Name="Magazine Advertisement" Type="Marketing" /&gt;
&lt;Reason ID="4" Name="Television Advertisement" Type="Marketing" /&gt;
&lt;Reason ID="5" Name="Manufacturer" Type="Other" /&gt;
&lt;Reason ID="6" Name="Review" Type="Other" /&gt;
&lt;Reason ID="7" Name="Demo Event" Type="Marketing" /&gt;
&lt;Reason ID="8" Name="Sponsorship" Type="Marketing" /&gt;
&lt;Reason ID="9" Name="Quality" Type="Other" /&gt;
&lt;Reason ID="10" Name="Other" Type="Other" /&gt;
&lt;/SalesReasons&gt;
</MyData>
*/
Now that we’ve learned so much about FOR XML PATH, let’s put our knowledge to use. Let’s say that you want to construct a webpage or an e-mail that incorporates a table in HTML format. Using our knowledge of FOR XML PATH, we will construct all the HTML between the <table></table> tags. That can then be incorporated into the correct spot in the webpage or e-mail.

Note that this query below uses most of what we learned in this article. You’ll see the following:
  • We create ALIGN and VALIGN attributes to align the table headers correctly.
  • We put a <br /> tag into the Phone Number header to split it into two lines.
  • We use data() to construct the Full Name of the contact.
  • We create a hyperlink for the E-Mail Address
  • We subtly color the E-Mail Address in a pale yellow color if EmailPromotion is equal to 1.
  • We use the ,TYPE directive in our XML CTEs so that we can concatenate them in subsequent CTEs.

Here’s the query, which creates a single NVARCHAR(MAX) variable called @TableHTML:

declare @TableHTML nvarchar(max);

with HTMLTableHeader(HTMLContent) as
(
select "th/@align"='right'
,"th/@valign"='bottom'
,"th"='ContactID'
,"*"=''
,"th/@valign"='bottom'
,"th"='Full Name'
,"*"=''
,"th"='Phone'
,"th/br"=''
,"th"='Number'
,"*"=''
,"th/@valign"='bottom'
,"th"='Email Address'
for xml path('tr'),type
)
,
HTMLTableDetail(HTMLContent) as
(
select "td/@align"='right'
,"td"=ContactID
,"*"=''
,"td/data()"=Title
,"td/data()"=FirstName
,"td/data()"=MiddleName
,"td/data()"=LastName
,"td/data()"=Suffix
,"*"=''
,"td"=Phone
,"*"=''
,"td/@bgcolor"=case when EmailPromotion=1 then '#FFFF88' end
,"td/a/@href"='mailto:'+EmailAddress
,"td/a"=EmailAddress
from Person.Contact
where ContactID between 90 and 94
for xml path('tr'),type
)
,
HTMLTable(HTMLContent) as
(
select "@border"=1
,(select HTMLContent from HTMLTableHeader)
,(select HTMLContent from HTMLTableDetail)
for xml path('table') /*No TYPE because we want a string */
)
select @TableHTML=(select HTMLContent from HTMLTable);
Remember the rule that if two adjacent columns have the same name, their data will concatenated? I had to prevent that from happening with adjacent columns that I named th and td by inserting a blank column with a wildcard name between them to force them to come out as discrete elements.

And here are the contents of that variable as a result of that query:

/*
<table border="1">
<tr>
<th align="right" valign="bottom">ContactID</th>
<th valign="bottom">Full Name</th>
<th>Phone<br />Number</th>
<th valign="bottom">Email Address</th>
</tr>
<tr>
<td align="right">90</td>
<td>Andreas Berglund</td>
<td>795-555-0116</td>
<td>
<a href="mailto:andreas1@adventure-works.com">andreas1@adventure-works.com</a>
</td>
</tr>
<tr>
<td align="right">91</td>
<td>Mr. Robert M. Bernacchi</td>
<td>449-555-0176</td>
<td bgcolor="#FFFF88">
<a href="mailto:robert4@adventure-works.com">robert4@adventure-works.com</a>
</td>
</tr>
<tr>
<td align="right">92</td>
<td>Mr. Matthias Berndt</td>
<td>384-555-0169</td>
<td bgcolor="#FFFF88">
<a href="mailto:matthias1@adventure-works.com">matthias1@adventure-works.com</a>
</td>
</tr>
<tr>
<td align="right">93</td>
<td>John Berry</td>
<td>471-555-0181</td>
<td>
<a href="mailto:john11@adventure-works.com">john11@adventure-works.com</a>
</td>
</tr>
<tr>
<td align="right">94</td>
<td>Mr. Steven B. Brown IV</td>
<td>280-555-0124</td>
<td>
<a href="mailto:steven1@adventure-works.com">steven1@adventure-works.com</a>
</td>
</tr>
</table>
*/
Our webpage template looks like this, with a placeholder where we want to insert our table:

/*
<html>
<head>
<title>HTML Table constructed via FOR XML PATH</title>
</head>
<body style="font-family:Arial; font-size:small">
<span style="font-size:x-large">
<b>Selected Contacts:</b>
</span>
<!-- Insert Table Here -->
</body>
</html>
*/
And here is the final result, with the table data inserted in the placeholder position, when we look at the web page in Internet Explorer:

HTML Table constructed via FOR XML PATH

I hope this article gave you a tantalizing look at the possibilities of things you can accomplish with the FOR XML PATH clause. In future blog entries, I’ll explore some other aspects of XML.

9 comments:

  1. Very cool post. I never thought to use SQL Servers xml functionality to generate HTML

    ReplyDelete
  2. Wow, this covers a lot of stuff. Too much to absorb in one session.

    What i got here was an understanding of some of the elements, so next time i look at it, i won't be as confused.

    Thanx Brad!

    ReplyDelete
  3. Excellent post, congrats! I learned a lot. Today I'll go to sleep less silly. Thanks Brad

    ReplyDelete
  4. this is the most well explained blog for xml path. did really learned stuffs here. thanks!

    ReplyDelete
  5. Anonymous and Anonymous:

    Thanks for the great feedback! I'm glad you found it useful!

    ReplyDelete
  6. I spent hours searching for something this good. Well done my good sir.

    ReplyDelete
  7. Very good information thanks.

    ReplyDelete
  8. Great ! Thanks for sharing

    ReplyDelete
  9. Thank you for taking the time to share this knowledge with everyone. I plan on using it to put together an audit document.

    ReplyDelete