ADF: Querying JSON documents

In my previous blog post I talked about how to read from an XML Webervice and use xpath to query the XML on the expressions side of things. You can read the XML article here (https://the.agilesql.club/2021/02/adf-xml-objects-and-xpath-in-the-expression-language/). Now, what if we don’t have XML but have JSON? Well well indeed, what if there was a way to query JSON documents using a query, imagine if you will a JSONQuery where you can pass a similar query to an xpath query to retrieve specific values from the JSON document. That would be awesome wouldn’t it? Well it isn’t available here, the is no support fpr JSONQuery in ADF. However, we can create an xml document from a JSON document and then query that, if we are very very careful about how we pass the JSON document to the xml function.

In this article, we will again go back to the UK government website where they have a JSON document that describes all the holidays in Scotland, England and Wales (note my use of comma’s is correct, the English and Welsh holidays come in one section which is important here). This is a sample of the file:


{
   "england-and-wales":{
      "division":"england-and-wales",
      "events":[
         {
            "title":"New Year’s Day",
            "date":"2016-01-01",
            "notes":"",
            "bunting":true
         },
         {
            "title":"Good Friday",
            "date":"2016-03-25",
            "notes":"",
            "bunting":false
         },
...
    "scotland":{
      "division":"scotland",
      "events":[
         {
            "title":"New Year’s Day",
            "date":"2016-01-01",
            "notes":"",
            "bunting":true
         },
         {
            "title":"2nd January",
            "date":"2016-01-04",
            "notes":"Substitute day",
            "bunting":true
         },

The full document is available: https://www.gov.uk/bank-holidays.json (it is a real beut).

Now, the xml function in ADF is a finickety little bugger, you can’t just pass any old JSON document through to it and if we tried with this we would get an error about too many top level documents. Instead we need to do some massaging and there are two ways, the first and bad choice is to make a massively complictaed disgusting blob of an expression. The second choice, that we will choose here, is to use variables to split the string parsing into little sections:

Variables on variables on variables

The approach:

  • Get just the “Scottish” events
  • Create a JSON document the xml function will be happy with
  • Convert the JSON document into the XML document
  • Run an xpath query to get the date of one specific holiday

Why can’t you just pass the whole document to xml and get that to create an xml document? Well, let’s try it and see - if you have a webservice activity that calls the url and pass the output to xml you get this (think @xml(string(activity('Get All Holiday Dates').output.value))):

The function 'xml' parameter is not valid. The provided value cannot be converted to XML: 'Data at the root level is invalid. Line 1, position 1

Now, if you look at the image above you can see it is all about the variables - you could do this with a single string and not have all these variables but this helps to debug what is going wrong. Also, as I mentioned in the last post, if you can use something like a function to a webservice call to do any logic like this - only use ADF if you are a bit crazy and life hands you a bad hand.

First, we use the output from the lookup activity to get the scottish events, but because the regions are within an array, we need to use first to get all the regions and then navigate to just the scottish events. We then convert the valid JSON document into a string because we are using variables and variables can’t be JSON objects, so we need to effectively serialize our objects as strings to pass between variables:

@string(first(activity('Get All Holiday Dates').output.value).scotland.events)

Next, we create a well formed JSON document that the ADF xml function does like:

{ 
 "blah":
 { 
  "event":
                @{variables('scotlandEvents')}
 }
}

No idea why it has to be like this, but it does. If the value of scotlandEvents wasn’t an array then you don’t need the second level of nesting.

We then convert to an XML document using xml but because we had to convert the JSON document into a string so that we could store it in a variable we need to undo that and turn it back into a JSON document first. When we are done we are saving it in a variable so switch it back to a string:

@string(xml(json(variables('jsonWithSingleParent'))))

Finally, we can read the variable, convert it back to an XML document and run our xpath query:

@xpath(xml(variables('xmlDocument')),'string(//*[name()="title"][text()="Good Friday"]/../*[name()="date"][contains(text(), "2021")])')

The output of this is the date of the good friday holiday in 2021 (which happens to be very close to my birthday)

Now, the “england-and-wales” version is slightly more complicated because england-and-wales causes problems all over the shop, so we additionally do a string replace to rid ourselves of those nasty -’s:

@string(first(json(replace(string(
       activity('Get All Holiday Dates').output.value)
   , 'england-and-wales', 'englandandwales'))).englandandwales.events)

Good luck, I genuinely hope no one has to ever go through this pain - the point of this blog is to show that you can do this stuff but it isn’t fun and if you are ever in the situation where you need to parse a JSON document in pure ADF expressions that there is some hope.

The full code is:

https://github.com/GoEddie/XmlParsingInADFExpressions/blob/main/pipeline/Parsing%20JSON%20via%20XML.json