Case of the Disappearing PDF

Yesterday a colleague showed me an interesting problem. Opening a PDF from Internet Explorer resulted in the following error message:

There was an error opening this document. This file cannot be found.

(Apologies to the world; we were unfortunately stuck with Windows XP and IE6 in this scenario.)

To make it more interesting, this only occurred for certain PDFs. Even more interesting, it only occurred for those PDFs when opened through Exchange 2010 Outlook Web Access. Way more interesting is the fact that the exact same PDF would open fine if the email had been sent by certain people. On top of all this, when we logged onto the machine as another user, the exact same attachments all opened fine.

First we tried clearing the cache and running IE with add-ons disabled. Unfortunately Adobe still failed spectacularly, so we started ProcMon (http://live.sysinternals.com/ProcMon.exe).

A ProcMon a day keeps the bugs at bay.

So with my filter set to include only events where Path contains .pdf, we were ready for action.

So the file is there, and then it suddenly disappears. How did that happen?

I think this is a perfect example of why, even if you are not a Win32 developer, some basic knowledge of how the Windows APIs work is very helpful. With ProcMon showing operations like CreateFile and CloseFile, you might think deleting a file would show up as DeleteFile… but no. I notice a lot of people get confused by this, so here is some explanation.

Firstly, CreateFile can be used to create files, but it is also the API used to open existing files (http://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx).

If we look at the Detail column in ProcMon for the CreateFile event, we can see that Delete access was requested.

While Windows does have a DeleteFile API (http://msdn.microsoft.com/en-us/library/windows/desktop/aa363915(v=vs.85).aspx), you will never see DeleteFile in ProcMon, because ProcMon records the underlying file system operations rather than the Win32 calls that caused them. Instead we see a SetDispositionInformationFile event with Delete: True in the Detail column.

A single DeleteFile call typically shows up in ProcMon as a sequence similar to this: CreateFile (with Delete access requested), SetDispositionInformationFile (Delete: True), then CloseFile.
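If you save the ProcMon trace as CSV, this deletion pattern can also be found with a few lines of script instead of by eye. The sketch below assumes ProcMon's default columns (Time of Day, Process Name, PID, Operation, Path, Result, Detail) and assumes the Detail wording for CreateFile shown in the sample; the sample rows and the helper names `requested_delete_access` and `deletion_events` are hypothetical, made up for illustration rather than taken from our trace.

```python
import csv
import io

# Hypothetical rows shaped like a ProcMon CSV export with the default columns.
# The CreateFile Detail wording is an assumption; "Delete: True" is what
# ProcMon shows for SetDispositionInformationFile.
SAMPLE_CSV = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"10:01:02.100","IEXPLORE.EXE","1234","CreateFile","C:\Temp\Tide_Tables.pdf","SUCCESS","Desired Access: Generic Write, Disposition: OverwriteIf, ShareMode: Read, Write, Delete"
"10:01:02.200","IEXPLORE.EXE","1234","CreateFile","C:\Temp\Tide_Tables.pdf","SUCCESS","Desired Access: Read Attributes, Delete, Disposition: Open, ShareMode: Read, Write, Delete"
"10:01:02.201","IEXPLORE.EXE","1234","SetDispositionInformationFile","C:\Temp\Tide_Tables.pdf","SUCCESS","Delete: True"
"10:01:02.202","IEXPLORE.EXE","1234","CloseFile","C:\Temp\Tide_Tables.pdf","SUCCESS",""
"10:01:02.500","AcroRd32.exe","5678","CreateFile","C:\Temp\Tide_Tables.pdf","NAME NOT FOUND","Desired Access: Generic Read, Disposition: Open, ShareMode: Read"
'''


def requested_delete_access(detail):
    """True if a CreateFile Detail string lists Delete under Desired Access.

    Only the text between 'Desired Access:' and 'Disposition:' is checked,
    because ShareMode can also contain the word Delete.
    """
    if "Desired Access:" not in detail:
        return False
    access = detail.split("Desired Access:", 1)[1].split("Disposition:", 1)[0]
    return "Delete" in (part.strip() for part in access.split(","))


def deletion_events(csv_text):
    """Return (process, operation, path) for delete-related events."""
    events = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        op, detail = row["Operation"], row["Detail"]
        marks_delete = op == "SetDispositionInformationFile" and "Delete: True" in detail
        opens_for_delete = op == "CreateFile" and requested_delete_access(detail)
        if marks_delete or opens_for_delete:
            events.append((row["Process Name"], op, row["Path"]))
    return events


if __name__ == "__main__":
    for process, op, path in deletion_events(SAMPLE_CSV):
        print(process, op, path)
```

Checking only the Desired Access segment matters: a naive search for "Delete" anywhere in Detail would also match share modes that merely allow other handles to delete the file.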

OK, enough going off track.

Now we know why Adobe can’t find the file: Internet Explorer deleted it. But why?

I selected the CreateFile event, right-clicked and chose Properties, then checked the Stack tab for clues… in vain.

No luck there, so I turned to my favourite web debugging proxy, Fiddler (http://www.fiddler2.com). Don’t leave home without it.

I wanted to compare the request/response headers of the working and broken versions, and also to confirm the PDF downloaded correctly.

As these sites were served over https://, I had to enable the Decrypt HTTPS traffic option in Tools | Fiddler Options…

At this stage you will be asked whether you want to install the Fiddler Root Certificate, so you won’t get HTTPS warnings while capturing with Fiddler. If you are capturing on a user’s machine, remember to use Remove Interception Certificates when done.

Enabling this feature requires restarting Fiddler.

So we captured our first broken/working scenario: opening the attachment from the same email, but through two different Outlook Web Access servers.

I used Ctrl+F and searched for PDF. This highlights all sessions whose requests or responses contain the text PDF.

The broken PDF

I first checked that the file itself had downloaded correctly. I saved the PDF by right-clicking the session and saving the Response Body. The saved PDF opened fine, so this at least rules out some kind of download or file corruption…

Next I compared the response headers from the working and broken PDFs:

Broken Response Header
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: application/pdf; authoritative=true;
Expires: Mon, 13 Feb 2012 23:40:06 GMT
Server: Microsoft-IIS/7.5
X-OWA-Version: 14.1.323.3
Content-Disposition: attachment; filename="Tide_Tables.pdf"
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
X-UA-Compatible: IE=EmulateIE7
Date: Tue, 14 Feb 2012 23:40:06 GMT
Connection: Keep-Alive
Content-Length: 3216993
Vary: Accept-Encoding

Working Response Header
HTTP/1.1 200 OK
Cache-Control: private
Content-Length: 3216993
Content-Type: application/pdf
Expires: Mon, 13 Feb 2012 23:42:43 GMT
Server: Microsoft-IIS/7.0
X-AspNet-Version: 2.0.50727
X-OWA-Version: 8.2.234.1
Content-Disposition: attachment; filename="Tide_Tables.pdf"
X-Powered-By: ASP.NET
Date: Tue, 14 Feb 2012 23:42:43 GMT
Connection: keep-alive
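Eyeballing two header dumps works, but a small diff makes the differences jump out. Below is a minimal sketch, assuming raw header text copied out of Fiddler; it lower-cases header names (HTTP header names are case-insensitive) and compares values as exact strings. The sample blocks are trimmed copies of the two responses above, and `parse_headers` and `diff_headers` are hypothetical helper names.

```python
def parse_headers(raw):
    """Split a raw HTTP header block into (start_line, {lower-cased name: value})."""
    lines = [line for line in raw.strip().splitlines() if line.strip()]
    start_line, headers = lines[0], {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        key, value = name.strip().lower(), value.strip()
        # Repeated headers are joined with ", ", which is good enough for a diff.
        headers[key] = f"{headers[key]}, {value}" if key in headers else value
    return start_line, headers


def diff_headers(broken_raw, working_raw):
    """Return {name: (broken_value, working_value)} for headers that differ."""
    _, broken = parse_headers(broken_raw)
    _, working = parse_headers(working_raw)
    return {
        name: (broken.get(name), working.get(name))
        for name in sorted(broken.keys() | working.keys())
        if broken.get(name) != working.get(name)
    }


# Trimmed copies of the two responses above.
BROKEN = """HTTP/1.1 200 OK
Cache-Control: private
Content-Type: application/pdf; authoritative=true;
Server: Microsoft-IIS/7.5
X-OWA-Version: 14.1.323.3
Content-Length: 3216993
Vary: Accept-Encoding
"""

WORKING = """HTTP/1.1 200 OK
Cache-Control: private
Content-Length: 3216993
Content-Type: application/pdf
Server: Microsoft-IIS/7.0
X-OWA-Version: 8.2.234.1
"""

if __name__ == "__main__":
    for name, (b, w) in diff_headers(BROKEN, WORKING).items():
        print(f"{name}: broken={b!r} working={w!r}")
```

Because values are compared as exact strings, running it on the full dumps would also flag the Date and Expires timestamps and Connection: Keep-Alive versus keep-alive, so some noise is expected.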

A quick search around the internet found many people complaining about issues opening PDFs served from IIS 7.5. Microsoft had a hotfix available here:

http://support.microsoft.com/kb/979543

But unfortunately our scenario didn’t match that of the KB article…

So we then tried our next scenario: the same PDF, opened with the same Outlook Web Access server, just sent by different people.

In this case the only difference we saw was that the broken PDF had

Content-Type: application/pdf; authoritative=true;

Whereas the working PDF had

Content-Type: application/octet-stream

It seems that attaching the PDF with different email programs may have resulted in a different content type being used.
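To compare Content-Type values properly it helps to separate the media type from its parameters. Here is a minimal sketch; the `parse_content_type` name is hypothetical, and the parser is deliberately simple (it does not handle quoted parameter values that contain semicolons).

```python
def parse_content_type(value):
    """Split a Content-Type value into (media_type, {param: value}).

    Deliberately minimal: empty segments (e.g. from a trailing ';') are
    skipped, and quoted values containing ';' are not supported.
    """
    media_type, *segments = value.split(";")
    params = {}
    for segment in segments:
        segment = segment.strip()
        if not segment:
            continue
        name, _, param_value = segment.partition("=")
        params[name.strip().lower()] = param_value.strip()
    return media_type.strip().lower(), params


if __name__ == "__main__":
    print(parse_content_type("application/pdf; authoritative=true;"))
    print(parse_content_type("application/octet-stream"))
```

Applied to the two values above, it shows the responses differ in media type as well as in the extra authoritative=true parameter.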

But this still didn’t explain why it worked as a different user.

If it worked as a different user, I suspected the request headers must be different, resulting in a different response.

The broken user’s HTTP request looked like this:

GET /owa/attachment.ashx?attach=1&id=RgAAAAD6A0B1FqqoQ69T0wAFn9y3BwBqT9ej25owTbmDHT%2bj1249AAAANkksAAAr69Y%2fPZnDSY%2f2aPbTHl1AAAAAAHjAAAAJ&attid0=BAAAAAAA&attcnt=1 HTTP/1.0
Accept: */*
Referer: https://webmail.somecompany.com.au/owa/?ae=Item&t=IPM.Note&id=RgAAAAD6A0B1FqqoQ69T0wAFn9y3BwBqT9ej25owTbmDHT%2bj1249AAAANkksAAAr69Y%2fPZnDSY%2f2aPbTHl1AAAAAAHjAAAAJ
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
Host: webmail.somecompany.com.au
Connection: Keep-Alive
Cookie: s_vi=[CS]v1|279D7657051D31A9-4000012880032DB2[CE]; s_cc=true; s_sq=%5B%5BB%5D%5D; BIGipServerH_Exchange_2010__single_owa_pool=691176970.47873.0000; OutlookSession=88af8ba7095647cd9ba6e1f4e43812c6; PBack=0; sessionid=3c6fef26-a905-4db0-adb7-4c5a17324685; cadata="4SpfNuAR3tXBNnDdHk2D8oOV4S6+gG68tDjP3s1QSd7LDfky1FBC0d8vA64/RP18oLHmFAxn+9Z1v6J04QlPueJ/UO4fhHtdfAA5gOmG1lT0="; UserContext=df3e2c771f5844ebada88e3c33f78d04; tzid=AUS Eastern Standard Time
X-NovINet: v1.2

The working user’s HTTP request looked like this:

GET /owa/attachment.ashx?attach=1&id=RgAAAAD6A0B1FqqoQ69T0wAFn9y3BwBqT9ej25owTbmDHT%2bj1249AAAANkksAAAr69Y%2fPZnDSY%2f2aPbTHl1AAAAAAHjAAAAJ&attid0=BAAAAAAA&attcnt=1 HTTP/1.1
Accept: */*
Referer: https://webmail.somecompany.com.au/owa/?ae=Item&t=IPM.Note&id=RgAAAAD6A0B1FqqoQ69T0wAFn9y3BwBqT9ej25owTbmDHT%2bj1249AAAANkksAAAr69Y%2fPZnDSY%2f2aPbTHl1AAAAAAHjAAAAJ
Accept-Language: en-au
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
Host: webmail.somecompany.com.au
Connection: Keep-Alive
Cookie: s_vi=[CS]v1|279D89BF851D0ADC-40000102000171FB[CE]; s_cc=true; s_sq=%5B%5BB%5D%5D; BIGipServerH_Exchange_2010__single_owa_pool=691176970.47873.0000; OutlookSession=b8a42aafc1d7412dab3ec1bd16a5c4e0; PBack=0; sessionid=6af1def4-40c6-4f0b-94fa-38c2066d45b5; cadata="49TqgrxdNz9t7TdPNGwNXhyqNxcFpTsqdxi7qx2AVgwN/SM3SbGxNefHDKnjac15+cyh3hNjfJLHeF6L13gNG0W5OafxFtd41FtBeOwozv3w="; UserContext=570d6a25896947b980468b2e0a9d5bed; tzid=AUS Eastern Standard Time
X-NovINet: v1.2

So did you spot the difference? The working user’s GET request specifies HTTP/1.1, and with HTTP/1.1 enabled IE also sends Accept-Encoding: gzip, deflate (and Accept-Language). The broken user’s GET request used HTTP/1.0 and sent no Accept-Encoding header at all. For a detailed explanation of the key differences between HTTP/1.0 and HTTP/1.1, refer to http://www8.org/w8-papers/5c-protocols/key/key.html
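When there are many sessions to compare, checking the request line and Accept-Encoding header of each one is easy to script. This sketch assumes raw request text as captured by Fiddler; the sample requests are trimmed copies of the two above (long id, Referer and Cookie removed), and `request_summary` is a hypothetical helper name.

```python
def request_summary(raw_request):
    """Return (HTTP version, Accept-Encoding value or None) for a raw request."""
    lines = raw_request.strip().splitlines()
    _method, _target, version = lines[0].split()
    accept_encoding = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "accept-encoding":
            accept_encoding = value.strip()
    return version, accept_encoding


# Trimmed copies of the two requests above (long id, Referer and Cookie removed).
BROKEN_REQUEST = """GET /owa/attachment.ashx?attach=1&attcnt=1 HTTP/1.0
Accept: */*
Host: webmail.somecompany.com.au
Connection: Keep-Alive
"""

WORKING_REQUEST = """GET /owa/attachment.ashx?attach=1&attcnt=1 HTTP/1.1
Accept: */*
Accept-Language: en-au
Accept-Encoding: gzip, deflate
Host: webmail.somecompany.com.au
Connection: Keep-Alive
"""

if __name__ == "__main__":
    for label, raw in (("broken", BROKEN_REQUEST), ("working", WORKING_REQUEST)):
        version, encoding = request_summary(raw)
        flag = "  <-- HTTP/1.0" if version == "HTTP/1.0" else ""
        print(f"{label}: {version}, Accept-Encoding={encoding}{flag}")
```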

Checking the user’s Internet Options (Advanced tab) confirmed HTTP/1.1 was disabled. After enabling Use HTTP 1.1 and Use HTTP 1.1 through proxy connections, all PDFs opened happily.
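The same two checkboxes can be inspected without opening Internet Options. The sketch below assumes they are stored as the DWORD values EnableHttp1_1 and ProxyHttp1.1 under HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings; treat those names as an assumption to verify on your own machine. Reading the registry only works on Windows, so the pure checking function (`disabled_http11_options`, a hypothetical name) is kept separate from the reading code.

```python
import sys

# Assumed registry location and value names behind the two checkboxes.
INTERNET_SETTINGS = r"Software\Microsoft\Windows\CurrentVersion\Internet Settings"
HTTP11_OPTIONS = {
    "EnableHttp1_1": "Use HTTP 1.1",
    "ProxyHttp1.1": "Use HTTP 1.1 through proxy connections",
}


def disabled_http11_options(values):
    """Return the checkbox labels whose registry value is present and 0."""
    return [label for name, label in HTTP11_OPTIONS.items() if values.get(name) == 0]


def read_http11_values():
    """Read the assumed values from the current user's registry (Windows only)."""
    import winreg  # standard library, but only available on Windows

    values = {}
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, INTERNET_SETTINGS) as key:
        for name in HTTP11_OPTIONS:
            try:
                values[name], _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                pass  # value not set; leave it out rather than guess a default
    return values


if __name__ == "__main__" and sys.platform == "win32":
    print(disabled_http11_options(read_http11_values()))
```

A missing value is reported as neither enabled nor disabled, since the sketch does not assume what IE treats as the default.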

As for why IE chose to delete the PDF even though it downloaded fine, that is still a mystery to me. If you have any idea, let me know :) (I would have spent more time figuring out why on IE9/10, but with IE6 my care factor dropped to zero.)

3 Responses to Case of the Disappearing PDF

ramonhimera says:

July 10, 2012 at 2:34 am

Very interesting seeing your modus operandi – thx!
I did some rooting around and found that this was the thing that worked for me…

header("Cache-Control: private");

Apparently the browser tries to cache the file, finds it cannot so deletes it, then Acrobat tries to open it anyway!

aks says:

July 16, 2012 at 7:19 pm

Very good article….Thanks for sharing…

Teo Ortega says:

July 27, 2012 at 1:52 am

Excellent article, it helped me understand how ProcMon shows file deletes... thanks