I had a customer who was rolling out a new Anti-Virus product across tens of thousands of machines via the classic deployment tool ZenWorks 7. However, the deployment success rate was quite low, around 80%.
Initial diagnostics by support teams suggested the cause of the problem was a corrupt “NALCACHE”. The NALCACHE is a folder ZenWorks uses to cache all downloaded packages. Across the internets you can find all kinds of crazy recommendations on how to avoid NALCACHE problems, such as deleting the folder and creating a blank file called NALCACHE. (Please don’t do this; the consequences are not worth it!)
They believed they had proven the NALCACHE was corrupt by the following test:
1) Anti-Virus fails to deploy to machine
2) Delete NALCACHE and relevant registry keys
3) Restart computer
4) Anti-Virus installs fine.
However, herein we see a common failing in troubleshooting processes. Deleting the NALCACHE and restarting the computer does not necessarily prove the NALCACHE is corrupt. The test changes two things at once (the cache is cleared and the machine is restarted), so it cannot tell you which of the two made the difference. As my job often has me providing “IT support for IT support”, I see this approach all the time: too many assumptions are made about a certain “hack fix” instead of trying to understand what the fix is actually affecting.
Through testing we found that no package at all would deploy to the affected machines, and in fact they were missing some updates to Flash/Adobe Reader that should already have been deployed. A quick look in Process Explorer and the true culprit was easily found. In the Process Tree view we saw that Novell Workstation Manager Service (C:\Program Files\Novell\ZENworks\wm.exe) had launched a process called Helper DLL Processor (C:\Program Files\Novell\ZENworks\WMRUNDLL.EXE), which in turn had launched Windows Script Host (wscript.exe) with a .bat file as the argument:
This WScript was not doing anything or going anywhere, simply because if you try to launch a batch file with wscript you get the error message: There is no script engine for file extension “.bat”.
Because the service that launched this process is not marked as Interactive, you do not see the message box, and ZenWorks stops processing all further actions until the invisible OK button gets clicked. (On Vista or later, not only would the service need to be interactive, you would also need the Interactive Services Detection (C:\Windows\system32\UI0Detect.exe) service started.)
As soon as we killed this wscript.exe process, packages galore came gushing down like a waterfall.
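If you wanted to hunt for this condition across many machines rather than eyeballing Process Explorer on each one, the check could be sketched roughly as follows. This is a minimal Python sketch, not something we used on the day. The function name, the shape of the process records and the example paths are all hypothetical, and the list of extensions assumes only the default Windows Script Host engines are registered.

```python
import ntpath

# Extensions the default Windows Script Host engines handle.
# Assumption: no third-party script engines are registered on the machine.
WSH_EXTENSIONS = {".vbs", ".vbe", ".js", ".jse", ".wsf", ".wsh"}
SCRIPT_HOSTS = {"wscript.exe", "cscript.exe"}


def find_stuck_script_hosts(processes):
    """Return (pid, script) pairs for script hosts started with a file WSH cannot run.

    `processes` is an iterable of dicts with 'pid', 'name' and 'cmdline'
    keys, where 'cmdline' is a list of arguments with the executable first.
    """
    suspects = []
    for proc in processes:
        name = (proc.get("name") or "").lower()
        if name not in SCRIPT_HOSTS:
            continue
        args = proc.get("cmdline") or []
        for arg in args[1:]:
            if arg.startswith("//"):
                # Host options such as //B or //Nologo, not the script itself.
                continue
            ext = ntpath.splitext(arg)[1].lower()
            if ext and ext not in WSH_EXTENSIONS:
                suspects.append((proc["pid"], arg))
            break  # only the first non-option argument is the script
    return suspects


# Hypothetical process snapshot; the paths are made up for illustration.
snapshot = [
    {"pid": 100, "name": "wscript.exe",
     "cmdline": ["wscript.exe", "C:\\temp\\install.bat"]},
    {"pid": 101, "name": "wscript.exe",
     "cmdline": ["wscript.exe", "//B", "C:\\temp\\ok.vbs"]},
    {"pid": 102, "name": "wm.exe", "cmdline": ["wm.exe"]},
]
print(find_stuck_script_hosts(snapshot))
```

The records could come from any process listing you trust. One option, if the third-party psutil package is available, is `psutil.process_iter(["pid", "name", "cmdline"])` and the `.info` dictionary of each result. The sketch only flags candidates; whether to kill them is still a decision for a human.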
Even without being able to see the WScript window, we can still use Process Explorer to see that the process is waiting on a message box, by opening the process properties, selecting the relevant thread and clicking ‘Stack’.
To find the faulty package we simply looked at the NALCACHE folder to see which packages had come down most recently. (Censored version)
Sure enough, several affected machines had this package. Even better, this poor logic had been copied into some other NAL objects, which also had to be removed.
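The "what came down most recently" check is easy to script if you need to repeat it on more than a handful of machines. Here is a minimal Python sketch, assuming you pass in the path to the NALCACHE folder yourself. The function name and the `limit` parameter are hypothetical, and modification time is only a rough proxy for when a package arrived.

```python
import os


def newest_cache_entries(cache_dir, limit=5):
    """Return the names of the most recently modified entries in cache_dir, newest first."""
    with os.scandir(cache_dir) as entries:
        # Pair each entry with its modification time so we can sort on it.
        stamped = [(entry.stat().st_mtime, entry.name) for entry in entries]
    stamped.sort(reverse=True)
    return [name for _, name in stamped[:limit]]
```

Calling `newest_cache_entries(path_to_nalcache)` on a few affected machines and comparing the results should surface the package they have in common. Bear in mind a folder's modification time also changes when files inside it are added or removed, so treat the ordering as a hint rather than proof.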
I close with one of my original songs I dedicate to whoever did this (and didn’t even test their own work properly):