Troubleshooting 101

I love going deep: digging through memory dumps, gigabyte-sized ProcMon logs, pages of API traces, and so on. That said, there is a case for the old-fashioned, classic troubleshooting methods that apply in pretty much any industry and technology. The great thing about these methods is that, without even knowing the technology or product, you can often make more progress than with a random attack on mountains of data. OK, it doesn’t help that I’m no Raymond Chen on WinDbg; I have no choice but to start with the basics ;)

I say this having seen many such examples throughout my career, but I am writing this because today I saw another great one.

A large organization had Outlook crashing all over the place. It was unstable. It froze. It took 5 minutes to send an email. It was a consistently reproducible problem. Microsoft was engaged in a support case, as was the anti-virus vendor, since anti-virus was suspected as a potential cause. Microsoft support asked for tests to be run. These are valuable tests, and ones I would run myself. However, this is where they started:

1) Reproduce problem 5 times, 1 minute apart, each time running ADPlus -hang to generate 5 user mode dumps.

2) Reproduce problem, generate a FULL memory dump on the client.

3) Reproduce problem, generate a ProcMon log.

All great steps in troubleshooting application hangs; I don’t doubt that. So these tests were run on 3 machines, and 16 GB of logs were produced to transfer to Microsoft support. Two days later (possibly overwhelmed by all the data), Microsoft support asked for even more tests using additional tracing tools. Again, perfectly valid for identifying application hangs.

However, during this period I managed to get involved, and I am proud to say that, despite Microsoft having a week’s head start on me, I was able to very rapidly identify the root cause and resolve the issue without examining any logs or memory dumps. In fact, total troubleshooting time was probably about 15 minutes of testing.

My conversation with a technical resource at the customer site went something like this…

When did this start happening?

Since they migrated from GroupWise to Outlook.

How frequently does this occur?

Every time they send an email.

What version of Outlook / OS are you using?

Office XP + Office 2007, Windows XP SP3. We think the crashes might be caused by the different office versions.

Have you tried running Outlook without add-ins?

No.

Can you please run Outlook /safe?

OK. Just a moment…

5 minutes later…

OMG! The problem doesn’t occur anymore.

What add-ins do you have?

Three add-ins….

Please enable one at a time to work out which add-in is causing the problem. Make a table like this:

Add-In #1    Add-In #2    Add-In #3    Issue Occurs?
Disabled     Disabled     Disabled     No
Enabled      Enabled      Enabled      Yes
Enabled      Disabled     Disabled     Yes
Disabled     Enabled      Disabled     No
Disabled     Disabled     Enabled      No
Disabled     Enabled      Enabled      No

A quick look at this table shows that Add-in #1 is the culprit: the issue occurs in exactly the rows where it is enabled, and never when it is disabled. OK…
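If you have more add-ins than you can eyeball, the same reading of the table can be sketched in a few lines of Python. This is only an illustration, not something that was used on the case: the names RESULTS and find_culprits are made up for this example, and it assumes a single add-in is responsible on its own (it will not detect a problem caused by two add-ins interacting).

```python
# Each row mirrors one test from the table above:
# (Add-In #1, Add-In #2, Add-In #3) with True = enabled, then whether the issue occurred.
RESULTS = [
    ((False, False, False), False),
    ((True,  True,  True),  True),
    ((True,  False, False), True),
    ((False, True,  False), False),
    ((False, False, True),  False),
    ((False, True,  True),  False),
]


def find_culprits(results):
    """Return the 1-based numbers of add-ins whose enabled state matches
    the issue in every single test (enabled -> issue, disabled -> no issue)."""
    addin_count = len(results[0][0])
    return [
        index + 1
        for index in range(addin_count)
        if all(states[index] == issue for states, issue in results)
    ]


print(find_culprits(RESULTS))
```

An empty list would mean that no single add-in explains the results on its own, which is a hint to test combinations or look elsewhere.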

What version of add-in #1 do you have?

Version 7.5.

A quick Google search later, I found that this version of the add-in was not supported on Outlook 2007, and that the latest version was 9.2.

I contacted the vendor: the customer was entitled to a free client upgrade. We upgraded 3 users as a test and the problem instantly disappeared. All without touching a log file…

Moral of the story: advanced troubleshooting techniques are great, but don’t use them as a replacement for the basics. Check the basics first. What’s changed? When did it start happening? How many users are affected? Does it happen at home, in the office, etc.? What version of the software? Anything in the event log? Does it happen on another machine? Any 3rd-party add-ins? And so on…

EDIT: 3 weeks later Microsoft came back and confirmed what I found.

About chentiangemalc

specializes in end-user computing technologies. disclaimer 1) use at your own risk. test any solution in your environment. if you do not understand the impact/consequences of what you're doing please stop, and ask advice from somebody who does. 2) views are my own at the time of posting and do not necessarily represent my current view or the view of my employer and family members/relatives. 3) over the years Microsoft/Citrix/VMWare have given me a few free shirts, pens, paper notebooks/etc. despite these gifts i will try to remain unbiased.

2 Responses to Troubleshooting 101

  1. Pingback: Case of the Outlook 2010 Crash on Startup … even Outlook /Safe | chentiangemalc

  2. iisbrasil says:

    Very good!! Congrats!

    Hugs.