My solution started with another great open source XSLT template that converts XML Nodesets to strings. However, I could not use this template as is, for a number of reasons. First, I wanted to include my HTML snippet in the XSLT template itself as valid XML. So I created a template that serves as a “helper function”: it stores the HTML in a variable and calls the nodetostring template to convert it to a string. This way, I can call my helper function anytime I need to convert HTML to a string while building my JSON response. I had to store the HTML in a variable because there is no way to call a template with the mode attribute: xsl:call-template does not accept mode, only xsl:apply-templates does. So I have to save the HTML as a variable and then use <xsl:apply-templates select="$html" mode="nodetostring"/>. The problem is that when you store XML in a variable, it is not stored as a Nodeset. It is stored as a Result Tree Fragment. So you need an XSLT extension function to convert the Result Tree Fragment to a Nodeset. Since I am doing all this in Java, I was able to use the xalan:nodeset($html) function for this, as shown in the example below (full details omitted for brevity).
<!-- HTML SNIPPETS -->
<!-- JSON OUTPUT -->
<!-- some arbitrary xml -->
<!-- more arbitrary xml -->
<!-- HTML snippet serialized as json string -->
<!-- HELPER FUNCTIONS -->
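A minimal, self-contained sketch of this approach, run from Java through the standard JAXP API. It is an illustration under assumptions, not the original stylesheet: the class name HtmlSnippetToJson, the helper template name greeting-html, the sample snippet, and the tiny nodetostring stand-in (element names and text only, no attributes or escaping) are all hypothetical. It also assumes the XSLT processor on the classpath recognizes the xalan:nodeset extension in the http://xml.apache.org/xalan namespace; with another processor, an equivalent such as the EXSLT node-set function may be needed instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class HtmlSnippetToJson {

    // Stylesheet kept inline so the example is self-contained.
    // XML attributes use single quotes to keep the Java string readable.
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:xalan='http://xml.apache.org/xalan'",
        "    exclude-result-prefixes='xalan'>",
        "  <xsl:output method='text'/>",
        "",
        "  <!-- JSON OUTPUT -->",
        "  <xsl:template match='/'>",
        "    <xsl:text>{\"html\":\"</xsl:text>",
        "    <xsl:call-template name='greeting-html'/>",
        "    <xsl:text>\"}</xsl:text>",
        "  </xsl:template>",
        "",
        "  <!-- HELPER FUNCTION (hypothetical name): the HTML snippet is stored",
        "       as XML in a variable, which makes it a result tree fragment -->",
        "  <xsl:template name='greeting-html'>",
        "    <xsl:variable name='html'>",
        "      <div><b>Hello</b> World!</div>",
        "    </xsl:variable>",
        "    <!-- convert the fragment to a node-set, then serialize it by mode -->",
        "    <xsl:apply-templates select='xalan:nodeset($html)/*' mode='nodetostring'/>",
        "  </xsl:template>",
        "",
        "  <!-- Simplified stand-in for the open source nodetostring template -->",
        "  <xsl:template match='*' mode='nodetostring'>",
        "    <xsl:text>&lt;</xsl:text><xsl:value-of select='name()'/><xsl:text>&gt;</xsl:text>",
        "    <xsl:apply-templates mode='nodetostring'/>",
        "    <xsl:text>&lt;/</xsl:text><xsl:value-of select='name()'/><xsl:text>&gt;</xsl:text>",
        "  </xsl:template>",
        "  <xsl:template match='text()' mode='nodetostring'>",
        "    <xsl:value-of select='.'/>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    /** Runs the inline stylesheet against the given XML and returns the text output. */
    static String transform(String inputXml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)))
                .transform(new StreamSource(new StringReader(inputXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The input document is irrelevant here; the snippet lives in the stylesheet.
        System.out.println(transform("<root/>"));
    }
}
```

The stand-in serializer deliberately skips attributes and JSON escaping, so it only produces valid JSON for snippets as simple as the one shown.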
The second issue was that the nodetostring XSLT template did not encode the HTML as JSON exactly the way I wanted. I found four gotchas that I had to correct in the open source template.
- I had to escape the quotation marks used around HTML attribute values, so that each " is written as \".
- I had to escape the forward slash in my HTML end tags, like so: <div>Hello World!<\/div>. This is an ancient artifact of an old HTML spec that didn’t want HTML parsers to get confused when putting strings in a <SCRIPT> tag. For some reason, today’s browsers still like it.
- I had to write empty elements with a space before the closing slash. If you try to add an <li> tag as a child of an unordered list that is formatted like so: <ul/>, it won’t work. It gets added to the DOM after the ul tag. But if the code looks like this: <ul /> (notice the space before the /), everything works fine. Very strange indeed.
- I had to be sure to encode any quotation marks that might be included in (bad) HTML. This is the only thing that would really break the JSON, by accidentally terminating the string early. I used the escape-quot-string template from xml2json.xsl (link in first paragraph) to search for " and convert it to \".
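The escaping rules from this list can be summarized in a few lines of plain Java. This is only a sketch for comparison, not the XSLT template described above: the class and method names are hypothetical, and the backslash and whitespace escapes are an addition that JSON requires but the list does not mention. It covers the quotation marks and the end-tag slash, and leaves the spacing of empty elements to the serializer.

```java
class JsonHtmlEscaper {

    /** Escapes an HTML string so it can be placed between the quotes of a JSON string. */
    static String escapeForJson(String html) {
        StringBuilder out = new StringBuilder(html.length() + 16);
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '"') {
                out.append("\\\"");          // " becomes \" (attribute quotes and stray quotes alike)
            } else if (c == '\\') {
                out.append("\\\\");          // a literal backslash must itself be escaped
            } else if (c == '/' && i > 0 && html.charAt(i - 1) == '<') {
                out.append("\\/");           // </ becomes <\/ in end tags only
            } else if (c == '\n') {
                out.append("\\n");
            } else if (c == '\r') {
                out.append("\\r");
            } else if (c == '\t') {
                out.append("\\t");
            } else {
                out.append(c);               // other control characters are not handled in this sketch
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<div class=\"greeting\">Hello World!</div>";
        System.out.println("{\"html\":\"" + escapeForJson(html) + "\"}");
    }
}
```

Escaping the slash only when it directly follows < keeps URLs and other slashes in attribute values readable, while still covering the end-tag case.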
Hopefully anyone attempting to serialize HTML as JSON will find these lessons-learned helpful. There was one other issue I had to deal with involving a Java bug, but more on that later.