ADF: Publish suddenly includes everything where it used to be incremental changes since the last publish
I recently encountered an interesting issue with ADF where the publish feature suddenly attempted to republish every single object, claiming they were new, despite having incrementally published changed objects for some time.
We were using the publish feature where you work on a branch until you are happy, then you raise a PR to main, merge to main, and then switch back to ADF and click publish to push the changes to the adf_publish branch.
TL;DR:
I started writing a TL;DR, and then I realized that the solution is more nuanced and that people shouldn’t use it without understanding the context, so I decided against it. Sorry - if you really wanna cut to the chase, ask your favorite LLM to summarise.
Initially, I went with it and republished all the changes, but we encountered errors during publishing, and something didn’t feel right.
I did some searching and found one Stack Overflow post where someone described something similar:
There was a single answer (now two, with this added suggestion) with some suggestions, but none of them worked. The issue affected multiple people, so it wasn’t a browser or cache issue or even a user issue but something else.
API Calls
The next step was to determine what happens when you click the “Publish All Changes” button. Either there would be an API call, and the ADF service would handle everything, or the client would work it out. In the first case, the ADF service would work out the changes, and we could see what information is passed to the API to generate them. That in itself might lead us to the root cause.
Using the Network tab in Chrome dev tools, I saw these requests leave my browser:
1. Git Provider - Get Commits on the collaboration branch (main) x 3
2. Git Provider - Download publish branch: /ADF-Factory-Name as a compressed stream
3. Git Provider - Download publish branch: /ADF-Factory-Name/linkedTemplate as a compressed stream
4. Git Provider - Download publish branch: /ADF-Factory-Name/globalParameters as a compressed stream
5. Git Provider - List all items under “/” on the collaboration branch, including the name and commit ID, but not the contents
6. Git Provider - Get the publish branch description
7. Git Provider - Get the collaboration branch description
8. Call Azure Management API and get a summary of the ADF
9. Attempt to get ADF/arm-template-parameters-definition.json
So this looks promising. There were no API calls that returned a list of changed files, so logically, the changes must be calculated locally.
The next step was to try and find the code that was doing the comparison. I examined the caller’s network requests, but it was a typical obfuscated JavaScript stack trace with numerous callbacks, providing little help. I then went through all the JavaScript files downloaded as part of ADF Studio. Although the JavaScript was obfuscated, luckily, some names were not changed, and I found a function called getChangesToPublish. When I set a breakpoint there, it was hit when I clicked the “Publish all changes” button. Excellent.
The getChangesToPublish function did two things. First, it called a function called getLatestCommit, which went to the collaboration branch and got a list of all the commits and how many changes were in each. This was the first network call that we saw, “Git Provider - Get Commits on the collaboration branch (main)”. If there were no commits, it would return a message to say there were no changes to commit. Second, getChangesToPublish called getLatestPublishedCommit, which made the call to the Azure API, “8. Call Azure Management API and get a summary of the ADF.”
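To make that flow easier to follow, here is a rough sketch of the two steps as I understood them from the debugger. It is written in Python for readability and is not the ADF Studio code: the two callables stand in for getLatestCommit and getLatestPublishedCommit, and the returned dictionary, its keys, and the "publish_everything" branch are my own assumptions based on the behaviour I observed.

```python
from typing import Callable, Optional


def get_changes_to_publish(
    get_latest_commit: Callable[[], Optional[str]],
    get_last_published_commit: Callable[[], Optional[str]],
) -> dict:
    """Hypothetical outline of the two steps seen in getChangesToPublish."""
    # Step 1: newest commit on the collaboration branch (Git provider call).
    latest = get_latest_commit()
    if latest is None:
        # No commits at all, so report that there are no changes.
        return {"status": "no_changes", "since": None, "until": None}

    # Step 2: the commit that was published last (Azure Management API call).
    last_published = get_last_published_commit()
    if last_published is None:
        # Assumption: with no baseline commit, every object looks new.
        return {"status": "publish_everything", "since": None, "until": latest}

    # Baseline known: only commits after it need to be considered.
    return {"status": "incremental", "since": last_published, "until": latest}
```

The point of the sketch is the middle branch: if the second lookup comes back empty, the client has nothing to diff against.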
The call to the Azure Management API returned a list of properties from ADF that you can see in the Azure portal when you click the “JSON View” button on the ADF resource. This is an example of the Azure JSON from my test subscription:
{
  "name": "adf99111",
  "id": "/subscriptions/xxxxxxxxxxxxxxxxxxxx/resourceGroups/na-adf/providers/Microsoft.DataFactory/factories/adf99111",
  "type": "Microsoft.DataFactory/factories",
  "properties": {
    "provisioningState": "Succeeded",
    "createTime": "2025-02-19T13:49:03.726962Z",
    "version": "2018-06-01",
    "factoryStatistics": {
      "totalResourceCount": 0,
      "maxAllowedResourceCount": 0,
      "factorySizeInGbUnits": 0,
      "maxAllowedFactorySizeInGbUnits": 0
    },
    "repoConfiguration": {
      "type": "FactoryVSTSConfiguration",
      "accountName": "xxxxxxxxxxxxxxxxxxxx",
      "repositoryName": "xxxxxxxxxxxxxxxxxxxx",
      "projectName": "Projects",
      "collaborationBranch": "main",
      "rootFolder": "/",
      "lastCommitId": "6647766ee20b7544249e8d96fff442e2feaa5ab7",
      "tenantId": "xxxxxxxxxxxxxxxxxxxx",
      "disablePublish": false
    },
    "publicNetworkAccess": "Enabled"
  },
  "eTag": "\"39033534-0000-1100-0000-xxxxxxxxxxxxxxxxxxxx\"",
  "location": "uksouth",
  "identity": {
    "type": "SystemAssigned",
    "principalId": "xxxxxxxxxxxxxxxxxxxx",
    "tenantId": "xxxxxxxxxxxxxxxxxxxx"
  },
  "tags": {},
  "apiVersion": "2018-06-01"
}
The interesting part is the repoConfiguration:
"repoConfiguration": {
  "type": "FactoryVSTSConfiguration",
  "accountName": "xxxxxxxxxxxxxxxxxxxx",
  "repositoryName": "xxxxxxxxxxxxxxxxxxxx",
  "projectName": "Projects",
  "collaborationBranch": "main",
  "rootFolder": "/",
  "lastCommitId": "6647766ee20b7544249e8d96fff442e2feaa5ab7",
  "tenantId": "xxxxxxxxxxxxxxxxxxxx",
  "disablePublish": false
},
In this example, you can see the “lastCommitId” property, but when I looked at our ADF that had the issue of trying to republish every single object, the “lastCommitId” property was not there. It was entirely missing!
I looked into why it might be missing. I have an idea, but nothing I can really point to. What I do know is that when I compare an ADF that tries to republish everything with one that publishes only the changes, this one property stands out as highly relevant: without it, the client can’t figure out which commits contain changes that need to be deployed.
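If you want to check your own factory, a quick way is to copy the text from the portal “JSON View” and test for the property. This is a minimal sketch, assuming the JSON has the same shape as my example above; the function name is made up for illustration.

```python
import json


def has_last_commit_id(factory_json: str) -> bool:
    """True if properties.repoConfiguration.lastCommitId is present and non-empty."""
    factory = json.loads(factory_json)
    # Each level may be missing or null, so fall back to an empty dict.
    properties = factory.get("properties") or {}
    repo = properties.get("repoConfiguration") or {}
    return bool(repo.get("lastCommitId"))


healthy = '{"properties": {"repoConfiguration": {"collaborationBranch": "main", "lastCommitId": "6647766ee20b7544249e8d96fff442e2feaa5ab7"}}}'
broken = '{"properties": {"repoConfiguration": {"collaborationBranch": "main"}}}'

print(has_last_commit_id(healthy))  # True
print(has_last_commit_id(broken))   # False
```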
I set a breakpoint in the “getLastPublishedCommit” function, and the return value was undefined, so I was pretty sure that this was the cause of my issue. Digging into the code, “getLastPublishedCommit” did two things: first, it called the API to get the “lastCommitId”; then, if that was undefined, it looked at the tags on the ADF resource to see if there was a tag called __LAST_PUBLISHED_COMMIT_ID___.
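The lookup order is simple enough to express in a few lines. The sketch below models it in Python against the factory JSON shown earlier; it is my reading of the obfuscated code rather than a copy of it, and the function name is hypothetical. The tag name is spelled exactly as I found it.

```python
from typing import Optional

# Tag name as observed in the ADF Studio code (undocumented).
FALLBACK_TAG = "__LAST_PUBLISHED_COMMIT_ID___"


def resolve_last_published_commit(factory: dict) -> Optional[str]:
    """Prefer repoConfiguration.lastCommitId, then fall back to the resource tag."""
    properties = factory.get("properties") or {}
    repo = properties.get("repoConfiguration") or {}
    commit = repo.get("lastCommitId")
    if commit:
        return commit

    # Second chance: a tag on the ADF resource holding the commit ID.
    tags = factory.get("tags") or {}
    return tags.get(FALLBACK_TAG) or None
```

With neither value present the function returns None, which matches the undefined I saw at the breakpoint.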
So if “lastCommitId” is not available, there is a check for a tag. I looked up the last commit that we had published, added its ID as the value of that tag, and then clicked the “Publish all changes” button. This time the list of changes was the expected, incremental set: the pipelines changed in the commits since the tag value. I published the changes and then removed the tag again. If I had left it in place and this issue happened again in the future, ADF wouldn’t try to publish everything but would instead publish the changes since the stale tag value, which would be really confusing!
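If you would rather script the tag than click through the portal, the sketch below only builds the request and sends nothing. It assumes the Azure Resource Manager Tags API (a PATCH on Microsoft.Resources/tags/default under the resource ID, with a "Merge" or "Delete" operation) and the 2021-04-01 API version; please check both against the current Microsoft documentation before relying on them. The function name and the returned dictionary are hypothetical.

```python
ARM_ENDPOINT = "https://management.azure.com"
TAG_NAME = "__LAST_PUBLISHED_COMMIT_ID___"
TAGS_API_VERSION = "2021-04-01"  # assumption: verify against current docs


def build_tag_request(factory_id: str, commit_id: str, operation: str = "Merge") -> dict:
    """Describe the PATCH that adds ("Merge") or removes ("Delete") the tag.

    factory_id is the "id" value from the JSON View, starting with /subscriptions/.
    """
    if operation not in ("Merge", "Delete"):
        raise ValueError("operation must be 'Merge' or 'Delete'")
    url = (
        f"{ARM_ENDPOINT}{factory_id}"
        f"/providers/Microsoft.Resources/tags/default"
        f"?api-version={TAGS_API_VERSION}"
    )
    body = {"operation": operation, "properties": {"tags": {TAG_NAME: commit_id}}}
    return {"method": "PATCH", "url": url, "body": body}
```

Calling it once with "Merge" before publishing and once with "Delete" afterwards mirrors the add-then-remove steps I did by hand. Merging rather than replacing matters here, because it leaves any other tags on the factory untouched.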
So, to summarise: if ADF publish forgets which commit it published last and tries to publish everything, check whether repoConfiguration in the portal “JSON View” includes a “lastCommitId”. If it doesn’t, you can likely fix it by setting the __LAST_PUBLISHED_COMMIT_ID___ tag on the ADF resource to the ID of the last published commit. This is undocumented, though, and Microsoft may change it at any time, so use it at your own risk.