If you write code to SQL Server then you might be interested in this: I have written a tSQLt tdd training course which has helped over 500 people learn both tSQLt and how to apply TDD practices to their SQL Server T-SQL development, you can join the course at https://courses.agilesql.club.

Passing status messages and results back from Databricks to ADF

When we use ADF to call Databricks we can pass parameters, nice. When we finish running the Databricks notebook we often want to return something back to ADF so ADF can do something with it. Think that Databricks might create a file with 100 rows in (actually big data 1,000 rows) and we then might want to move that file or write a log entry to say that 1,000 rows have been written.

So, in the Notebook we can exit using dbutils.notebook.exit('plain boring old string') and in ADF we can retrieve that string using @activity('RunNotebookActivityName').output.runOutput, that is runOutput, in this case, will be “plain boring old string”.

Now, sometimes we want to do more exciting things like return a few pieces of information like a file name and a row count and while you could do something yucky like: dbutils.notebook.exit('plain boring old string, some other value, yuck, yuck, and yuck') and then do a string split somewhere else down the line. Now, we can do better than this - there is a quirk of ADF, I won't say feature because if it was intentional then it is the only feature of ADF that isn't completly underwhelming. If we return JSON data in our string, ADF thinks that it is an object that we can query:

dbutils.notebook.exit('{"an_object": {"name": {"value": "exciting"}}}')

In ADF we can retrieve “exciting” using:

@activity('Run Notebook - JSON Response').output.runOutput.an_object.name.value

Now that is exciting, imagine if we had a dataset we wanted to return, we could use:

dbutils.notebook.exit(spark.sql('select id from range(100)').toJSON().collect())

** NOTE there is a 2 MB limit here so don't go over that **

This got me wondering what else we can do, how about returning a list?

dbutils.notebook.exit(['first', 'second', 3])

Wouldn't you believe that works like a charm?

@activity('Run Notebook - String List Response').output.runOutput[2]

I then wondered what would happen with a dict, now at this point I was beyond excited for what might happen:

dbutils.notebook.exit({'a_value': 'yo yo'})

However, ADF didn't like that, not one bit so returning a dict failed - however all is not lost, if you need to return a dict then you can use json.dumps() and we already know we can access a JSON string as an object in the output of a notebook.

Hope it helps, this part .output.runOutput. alone took me a little while to find out so hopefully no one else has to waste their time fiddling with ADF and notebooks.

Subscribe

* indicates required

Please select all the ways you would like to hear from Agile Sql Club:

You can unsubscribe at any time by clicking the link in the footer of our emails. For information about our privacy practices, please visit our website.

We use Mailchimp as our marketing platform. By clicking below to subscribe, you acknowledge that your information will be transferred to Mailchimp for processing. Learn more about Mailchimp's privacy practices here.