Case of the DllHost.exe Crash

A problem case had been open for some time about DllHost.exe (the COM Surrogate host) crashing across many Citrix servers, at a rate of about 1,200 crashes a week.

We set up a test server to capture dmp files on application crash. However, due to a previous case where 10,000 instances of werfault.exe had piled up on a Citrix server, werfault.exe had already been disabled from launching by setting, under

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\Werfault.exe

the REG_SZ value Debugger to NUL.
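For reference, a minimal PowerShell sketch of applying that setting (same key and value as above; run elevated):

# block WerFault.exe from launching by pointing its IFEO Debugger value at NUL
$ifeo = "HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\Werfault.exe"
New-Item -Path $ifeo -Force | Out-Null
New-ItemProperty -Path $ifeo -Name Debugger -PropertyType String -Value "NUL" -Force | Out-Null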


Because of this, we couldn't use the Windows built-in application dump collection described here: http://msdn.microsoft.com/en-us/library/windows/desktop/bb787181(v=vs.85).aspx
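For reference, that built-in collection is configured via the WER LocalDumps registry key; a minimal sketch of what we would otherwise have used (the folder path here is illustrative):

# Windows Error Reporting user-mode dump collection (value names per the MSDN page above)
$ld = "HKLM:\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps"
New-Item -Path $ld -Force | Out-Null
New-ItemProperty -Path $ld -Name DumpFolder -PropertyType ExpandString -Value "D:\Tools\Dumps" -Force | Out-Null
New-ItemProperty -Path $ld -Name DumpType -PropertyType DWord -Value 2 -Force | Out-Null   # 2 = full dump
New-ItemProperty -Path $ld -Name DumpCount -PropertyType DWord -Value 10 -Force | Out-Null # keep at most 10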

To work around this we set the Debugger value under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug to launch a little PowerShell script, with %ld (the process ID) passed as its first parameter.
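The resulting Debugger value looks something like this (the script name and path are illustrative, not the exact ones we used):

powershell.exe -ExecutionPolicy Bypass -File D:\Tools\ApplicationCrash\AutoProcDump.ps1 %ld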


The main point of this script was to do some additional checking of free disk space, as in this case D:\ also held the SCCM cache and we needed to ensure plenty remained available.

It works as follows: first, check that the disk has at least 10GB free; if it doesn't, no dmp file is written.

Then, after a dmp file is written, check whether the folder holds more than 10GB of dmp files; if so, the oldest ones are deleted until the folder is back under 10GB.

Warning: Quick & Dirty Ugly scripting follows.

 

Param(
    [Parameter(Mandatory=$true,Position=0)]
    [string]$procID
)

[bool]$ok = $false

# force a single instance at a time
# just to limit load, paranoia/etc
$m = New-Object System.Threading.Mutex($true, "AutoProcDump.Debugger", [ref]$ok)
if (!$ok)
{
    "Another instance is already running."
    return 0
}

# get D: drive free space
$driveData = Get-WmiObject -Class Win32_LogicalDisk -Filter "Name = 'D:'" | Select-Object FreeSpace
$driveDataSize = [int]($driveData.FreeSpace / 1GB)

# if D: has less than 10GB free, don't write any dump
if ($driveDataSize -le 10) { return }

$Folder = "D:\Tools\Dumps"

# write a full (-ma) dump of the crashing process with procdump
$psi = New-Object System.Diagnostics.ProcessStartInfo
$psi.FileName = "D:\Tools\ApplicationCrash\procdump.exe"
$psi.UseShellExecute = $false
$psi.WorkingDirectory = $Folder
$psi.Arguments = "-accepteula -ma $($procID)"
Write-Host $psi.FileName $psi.Arguments
$p = [System.Diagnostics.Process]::Start($psi)
$p.WaitForExit(60000)
if (!$p.HasExited)
{
    # running too long
    $p.Kill()
}

# now see how big the dump folder is, in GB
$foldersize = Get-ChildItem $Folder | Measure-Object -Property Length -Sum
$GBsize = $foldersize.Sum / 1GB

# if it's over 10GB, delete the oldest dumps until it's back under 10GB
# (note: compare the size numerically; formatting it into a string first
# would make -gt do a string comparison)
if ($GBsize -gt 10)
{
    do
    {
        # get the oldest file (sorted by LastWriteTime) and remove it
        dir $Folder\*.dmp | sort LastWriteTime | select -First 1 | Remove-Item -Force
        # recheck the folder size
        $foldersize = Get-ChildItem $Folder | Measure-Object -Property Length -Sum
        $GBsize = $foldersize.Sum / 1GB
        # print the folder size for testing
        "{0:N5}" -f $GBsize
    }
    until ($GBsize -lt 10)
    Write-Host "Deletes Done"
}
else
{
    "No deletes needed"
}
return

However, after all this, no dumps were collected. The reason: the issue was simply not occurring on the test server, even after a week.

Then, almost by accident, while looking for some other dmp files, I ran

dir *.dmp /s

and found a hidden cache of hundreds upon hundreds of minidump files in the D:\EdgeSight\EdgeSight folder on the Citrix server. Better than nothing; I'll take what I can get.

So I got a list of all Citrix servers and stole all the minidumps I could find:

FOR /F %i IN (server_list.txt) DO ( xcopy \\%i\d$\EdgeSight\EdgeSight\FaultReports C:\support\minidumps /s /q )

 

However, these dump files had funny random-looking names.

To fix this I ran an automated WinDbg script against all the dmp files, based on one from Volume 1 (http://www.patterndiagnostics.com/ultimate-memory-analysis-reference):

.symfix C:\symbols
.reload
vertarget
r
kv 100
!analyze -v
r
kv 100
ub eip
u eip
uf eip
dps esp-3000 esp+3000
dpu esp-3000 esp+3000
dpa esp-3000 esp+3000
lmv
~*k
q

I then saved the above as C:\support\autodbg.txt and ran a single command line to process all dmp files in the current folder (cdb.exe was accessible from this directory).

If running from a batch file, change %i to %%i.

FOR /f "delims=/" %i IN ('dir *.dmp /b') DO ( cdb -z "%i" -command "$$><C:\support\autodbg.txt" > "%i.txt" )

Then I used this script to rename all the dmp files, adding a prefix of process name and bucket ID taken from the !analyze -v output:

$files = Get-ChildItem -Path c:\support\minidumps -Filter *.txt
ForEach ($file in $files)
{
    $sr = New-Object System.IO.StreamReader($file.FullName)
    $text = $sr.ReadToEnd()
    $sr.Close()
    # ignore invalid dmp files
    if (!$text.Contains("Could not open dump file"))
    {
        # pull the PROCESS_NAME: and BUCKET_ID: values out of the !analyze -v output
        # (note: IndexOf matches the first occurrence, i.e. DEFAULT_BUCKET_ID:)
        $proc_start = $text.IndexOf("PROCESS_NAME:") + "PROCESS_NAME:".Length
        $proc_end = $text.IndexOf("`n", $proc_start)
        $bucket_start = $text.IndexOf("BUCKET_ID:") + "BUCKET_ID:".Length
        $bucket_end = $text.IndexOf("`n", $bucket_start)
        $proc = $text.Substring($proc_start, $proc_end - $proc_start).Trim()
        $bucket = $text.Substring($bucket_start, $bucket_end - $bucket_start).Trim()
        # the log for abc.dmp is named abc.dmp.txt, so strip the trailing .txt
        # to get the matching dump's name
        $dmpFileName = $file.FullName -replace '\.txt$', ''
        $dstFileName = [String]::Format("{0}_{1}_{2}", $proc, $bucket, $file.BaseName)
        Rename-Item $dmpFileName $dstFileName
    }
}

This converted a folder full of randomly named dmp files into files prefixed with process name and bucket ID.

Unfortunately these were all minidumps, but they still held some important information. Looking at the !analyze -v output:

FAULTING_IP: 
ole32!CStdMarshal::CreateStub+8c
000007fe`fe0dc170 498b0424        mov     rax,qword ptr [r12]

EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 000007fefe0dc170 (ole32!CStdMarshal::CreateStub+0x000000000000008c)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 00000000027621c8
Attempt to read from address 00000000027621c8

DEFAULT_BUCKET_ID:  INVALID_POINTER_READ

PROCESS_NAME:  dllhost.exe

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_PARAMETER1:  0000000000000000

EXCEPTION_PARAMETER2:  00000000027621c8

READ_ADDRESS:  00000000027621c8

FOLLOWUP_IP: 
esint+2190
00000000`6bc22190 ??              ???

DETOURED_IMAGE: 1

MOD_LIST: <ANALYSIS/>

LAST_CONTROL_TRANSFER:  from 000007fefe0dc063 to 000007fefe0dc170

FAULTING_THREAD:  0000000000001d70

PRIMARY_PROBLEM_CLASS:  INVALID_POINTER_READ

BUGCHECK_STR:  APPLICATION_FAULT_INVALID_POINTER_READ

STACK_TEXT:  
00000000`03cdeaa0 000007fe`fe0dc063 : 00000000`00000001 00000000`002de9f0 00000000`03cdeb98 00000000`03cdeb60 : ole32!CStdMarshal::CreateStub+0x8c
00000000`03cdeb30 000007fe`fe0dbf32 : 00000000`002fb4a8 00000000`00000000 00000000`002f97b0 00000000`00000000 : ole32!CStdMarshal::ConnectSrvIPIDEntry+0x2f
00000000`03cdeb80 000007fe`fe0e21ef : 00000000`00000000 00000000`002fb4a8 00000000`03cdeca0 00000000`00326780 : ole32!CStdMarshal::MarshalServerIPID+0xb6
00000000`03cdec20 000007fe`fe0e209f : 00000000`00000001 000007fe`fe0e2018 00000000`00000002 00000000`00000001 : ole32!CStdMarshal::MarshalIPID+0x34
00000000`03cdec60 000007fe`ffa0ff85 : 00000000`00000006 00000000`03cdf140 00000000`03cded60 00000000`00000001 : ole32!CRemoteUnknown::RemQueryInterface+0x2f5
00000000`03cded30 000007fe`ffabb68e : 00000000`00000006 00000000`002d8840 000007fe`fe247da8 00000000`002f0090 : rpcrt4!Invoke+0x65
00000000`03cdeda0 000007fe`ffa12496 : 00000000`77859fc0 00000000`0000ffff 00000000`00000000 00000000`77859fd0 : rpcrt4!Ndr64StubWorker+0x61b
00000000`03cdf360 000007fe`fe220883 : 00000000`00000000 00000000`00000000 000007fe`fe253870 00000000`002de320 : rpcrt4!NdrStubCall3+0xb5
00000000`03cdf3c0 000007fe`fe220ccd : 00000000`00000001 00000000`00000000 00000000`02b96850 00000000`00000000 : ole32!CStdStubBuffer_Invoke+0x5b
00000000`03cdf3f0 000007fe`fe220c43 : 00000000`002f0090 00000000`002e07d4 00000000`00000000 000007fe`fe2371e0 : ole32!SyncStubInvoke+0x5d
00000000`03cdf460 000007fe`fe0da4f0 : 00000000`002f0090 00000000`002e6980 00000000`002f0090 000007fe`fe0d1b00 : ole32!StubInvoke+0xdb
00000000`03cdf510 000007fe`fe0ed551 : 00000000`00000000 ab08e781`00000001 00000000`002d6450 00000000`002de320 : ole32!CCtxComChnl::ContextInvoke+0x190
00000000`03cdf6a0 000007fe`fe22347e : 00000000`002e6980 00000000`00000000 00000000`002d8840 00000000`00000000 : ole32!STAInvoke+0x91
00000000`03cdf6f0 000007fe`fe22122b : 00000000`d0908070 00000000`002e6980 00000000`002ee330 00000000`002d8840 : ole32!AppInvoke+0x1aa
00000000`03cdf760 000007fe`fe223542 : 00000000`002f0000 00000000`00000400 00000000`00000000 000007fe`fe0bb3c4 : ole32!ComInvokeWithLockAndIPID+0x52b
00000000`03cdf8f0 000007fe`fe0ed42d : 00000000`002de320 00000000`00000000 00000000`002d7fc8 00000000`002f0000 : ole32!ComInvoke+0xae
00000000`03cdf920 000007fe`fe0ed1d6 : 00000000`002e6980 00000000`002f0008 00000000`00000400 00000000`00000000 : ole32!ThreadDispatch+0x29
00000000`03cdf950 00000000`775c9bd1 : 00000000`00000000 00000000`00000000 00000000`00000000 53d9b361`91bf321e : ole32!ThreadWndProc+0xaa
00000000`03cdf9d0 00000000`775c98da : 00000000`03cdfb30 000007fe`fe0ed12c 000007fe`fe285780 00000000`00806d70 : user32!UserCallWinProcCheckWow+0x1ad
00000000`03cdfa90 000007fe`fe0ed0ab : 00000000`02cd0606 00000000`02cd0606 000007fe`fe0ed12c 00000000`00000000 : user32!DispatchMessageWorker+0x3b5
00000000`03cdfb10 000007fe`fe213e57 : 00000000`002e6980 00000000`00000000 00000000`002e6b60 000007fe`fe0d3032 : ole32!CDllHost::STAWorkerLoop+0x68
00000000`03cdfb70 000007fe`fe0c0106 : 00000000`002e6980 00000000`002d6350 00000000`00000000 00000000`00000000 : ole32!CDllHost::WorkerThread+0xd7
00000000`03cdfbb0 000007fe`fe0c0182 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ole32!CRpcThread::WorkerLoop+0x1e
00000000`03cdfbf0 00000000`7729652d : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ole32!CRpcThreadCache::RpcWorkerThreadEntry+0x1a
00000000`03cdfc20 00000000`7782c541 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
00000000`03cdfc50 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d

SYMBOL_STACK_INDEX:  2

SYMBOL_NAME:  esint+2190

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: esint

IMAGE_NAME:  esint.dll

DEBUG_FLR_IMAGE_TIMESTAMP:  5385f6ee

STACK_COMMAND:  .cxr 0000000000000000 ; kb ; ~4s; .ecxr ; kb

FAILURE_BUCKET_ID:  INVALID_POINTER_READ_c0000005_esint.dll!Unknown

BUCKET_ID:  X64_APPLICATION_FAULT_INVALID_POINTER_READ_DETOURED_esint+2190

WATSON_STAGEONE_URL:  http://watson.microsoft.com/StageOne/dllhost_exe/6_1_7600_16385/4a5bca54/ole32_dll/6_1_7601_17514/4ce7c92c/c0000005/0002c170.htm?Retriage=1

Followup: MachineOwner

 

We can see esint.dll, the module pointed to by !analyze -v, is from Citrix:

Loaded symbol image file: esint.dll
Image path: c:\program files (x86)\Citrix\system monitoring\Agent\edgesight\esint.dll
Image name: esint.dll
Timestamp:        Thu May 29 00:47:10 2014 (5385F6EE)
CheckSum:         00013F89
ImageSize:        00013000
File version:     5.4.16.19
Product version:  5.4.16.19
File flags:       0 (Mask 3F)
File OS:          40004 NT Win32
File type:        2.0 Dll
File date:        00000000.00000000
Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

However, I didn't suspect this component was actually at fault.

Dllhost.exe crashes are almost always caused by third-party viewers, codecs, printer drivers, etc. So I carefully examined the output of lmv

and found this…

000007fe`dcc00000 000007fe`dd9ab000   npdf       (deferred)            
    Image path: C:\Program Files\Nitro\Pro 9\npdf.dll
    Image name: npdf.dll
    Timestamp:        Tue Jun 24 12:13:04 2014 (53A8DEB0)
    CheckSum:         00C96AB8
    ImageSize:        00DAB000
    File version:     9.5.19.13
    Product version:  3.9.0.0
    File flags:       28 (Mask 3F) Private Special
    File OS:          40004 NT Win32
    File type:        2.0 Dll
    File date:        00000000.00000000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

 

I then checked DllHost.exe instances on several machines, including the DllHost.exe launched when printing, and none of them had this DLL loaded, even though the application was installed.
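One way to do that kind of check from PowerShell (my own sketch; Process Explorer or tasklist /m npdf.dll work just as well):

# list processes that currently have npdf.dll loaded
Get-Process | Where-Object {
    try { ($_.Modules | Select-Object -ExpandProperty ModuleName) -contains "npdf.dll" } catch { $false }
} | Select-Object Id, ProcessName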

This suggested the DLL was loaded only for tasks related to this plugin.

However, I had several hundred minidumps; did they all have this module loaded?

To check this I ran another quick PowerShell script against the output text files:

$files = Get-ChildItem -Path c:\support\minidumps -Filter *dllhost*.txt
$hasNitro = 0
$noNitro = 0
ForEach ($file in $files)
{
    $sr = New-Object System.IO.StreamReader($file.FullName)
    $text = $sr.ReadToEnd()
    $sr.Close()
    if ($text.Contains("npdf"))
    {
        $hasNitro++
    }
    else
    {
        # print the file name of any dump without the module, for manual follow-up
        $file.FullName
        $noNitro++
    }
}

 

Checking the values of $hasNitro vs $noNitro at the end, we saw 153 dumps had the DLL loaded and 2 didn't. Because those two were different, the script output their file names for further manual analysis. (The final 2 were related to the Photo Preview Handler.)

What was it doing to crash? For various reasons we were restricted and not able to talk to users to find out what they were doing leading up to the crash. So I examined dllhost.exe on its own and compared its loaded DLLs with those in the crash dumps. (Note that when printing you will also see a dllhost.exe launch; this will have a different set of DLLs again.)


All the crash dumps also had the Microsoft Thumbnail Cache DLL loaded:

000007fe`f4010000 000007fe`f402f000   thumbcache   (deferred)            
    Mapped memory image file: C:\symbols\thumbcache.dll\4CE7C9D01f000\thumbcache.dll
    Image path: C:\Windows\System32\thumbcache.dll
    Image name: thumbcache.dll
    Timestamp:        Sun Nov 21 00:14:56 2010 (4CE7C9D0)
    CheckSum:         00022DBA
    ImageSize:        0001F000
    File version:     6.1.7601.17514
    Product version:  6.1.7601.17514
    File flags:       0 (Mask 3F)
    File OS:          40004 NT Win32
    File type:        2.0 Dll
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      Microsoft Corporation
    ProductName:      Microsoft® Windows® Operating System
    InternalName:     thumbcache.dll
    OriginalFilename: thumbcache.dll
    ProductVersion:   6.1.7601.17514
    FileVersion:      6.1.7601.17514 (win7sp1_rtm.101119-1850)
    FileDescription:  Microsoft Thumbnail Cache
    LegalCopyright:   © Microsoft Corporation. All rights reserved.

So from this we can guess: the crashes occurred while building thumbnails of PDFs.

From http://msdn.microsoft.com/en-us/library/windows/desktop/cc144118(v=vs.85).aspx we can see thumbnail handlers use the shell extension GUID {E357FCCD-A995-4576-B01F-234630154E96}.

Looking up .pdf in HKEY_CLASSES_ROOT, we see (Default) is set to NitroPDF.Document.9.

We then look up HKEY_CLASSES_ROOT\NitroPDF.Document.9

And sure enough, we can see it has a thumbnail handler installed under ShellEx\{e357fccd-a995-4576-b01f-234630154e96}.

Note: {8895b1c6-b41f-4c1c-a562-0d564250836f} is for the preview handler. http://msdn.microsoft.com/en-us/library/windows/desktop/cc144144(v=vs.85).aspx
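A quick PowerShell sketch of walking that chain (my own illustration of the lookups above):

# resolve the ProgID registered for .pdf, then check for a thumbnail handler
$progId = (Get-ItemProperty "Registry::HKEY_CLASSES_ROOT\.pdf")."(default)"
Get-Item "Registry::HKEY_CLASSES_ROOT\$progId\ShellEx\{e357fccd-a995-4576-b01f-234630154e96}" -ErrorAction SilentlyContinue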


We can remove the thumbnail handler by deleting this registry key.
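For example (this is the exact key shown above; export a backup of it first):

# unregister the Nitro PDF thumbnail handler by removing its ShellEx registration
Remove-Item "Registry::HKEY_CLASSES_ROOT\NitroPDF.Document.9\ShellEx\{e357fccd-a995-4576-b01f-234630154e96}" -Recurse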

We contacted the vendor, who confirmed their product caused this issue and that the latest version of the product had fixed the bug.
