Exchange 2010 MSExchangeTransport service crashed, Forefront Mail Pickup service error, Event ID: 4999, Event ID: 10003, Event ID: 5167, Event ID: 17007


Yesterday I ran into a problem in one of our customers' Exchange 2010 SP2 environments. The environment has an 8-node DAG spanning 2 datacenters and 4 combined CAS/HUB servers balanced with Windows NLB. On one of the CAS/HUB servers, the transport service crashed, logging the following Event IDs in sequence:

  • Scan error on the poison message, Event ID: 10003

[Screenshot: initial error, Event ID 10003]

  • I/O exception on the disk, Event ID: 4999

[Screenshot: Event ID 4999]

  • Forefront scan error due to EdgeTransport.exe shutdown, Event ID: 5167

[Screenshot: Event ID 5167]

Problem:

  • Transport service crashed on the server NODE1
  • Forefront Mail Pickup service crashed
  • Messages were delayed, and some of them were lost due to content failure; these were returned to the sender with an NDR, so the sender could send them again.

Symptoms:

  • The mail submission queue started to fill with messages before the Transport service crashed
  • The Microsoft Forefront Server Protection Mail Pickup service crashed (this service sends e-mail generated by Forefront)
  • All users whose connections were load balanced to this server could not send e-mail messages immediately
  • Restarting the MSExchangeTransport service or the affected server didn't help
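The growing submission queue was the first visible symptom, and it can be watched from the Exchange Management Shell. A minimal sketch, assuming the server name NODE1 used in this post; the threshold of 100 messages is an arbitrary value chosen for illustration, not something taken from this incident:

```powershell
# List every transport queue on NODE1 with its status and message count
Get-Queue -Server NODE1 | Format-Table Identity, Status, MessageCount, NextHopDomain -AutoSize

# Warn when the submission queue grows beyond an arbitrary threshold (assumption: 100)
$threshold  = 100
$submission = Get-Queue -Identity "NODE1\Submission"
if ($submission.MessageCount -gt $threshold) {
    Write-Warning ("Submission queue on NODE1 holds {0} messages" -f $submission.MessageCount)
}
```

Running a check like this periodically could surface a backlog before users start reporting delayed mail.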

Root cause:

  • The IP Filter database was logically corrupted. The cause of the logical corruption could not be determined; a possible reason is that the MDS disk was unreachable for a short time or its performance was too low.
  • As a result, the Transport service could not start or provide its service, generating the following Event ID sequence (read from bottom to top):

[Screenshot: IP Filter database problem log sequence]

  • The root cause was identified later from Event ID 17007:

[Screenshot: Event ID 17007]
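To check whether another hub server is logging the same sequence, the Application log can be filtered for the Event IDs mentioned in this post. A minimal sketch, assuming the events are written to the Application log (as Exchange 2010 transport events are) and that the script runs locally with rights to read that log; the 24-hour window is arbitrary:

```powershell
# Event IDs seen in this incident: 10003, 4999, 5167 and 17007
$eventIds = 10003, 4999, 5167, 17007

# Query the Application log for those IDs over the last 24 hours (arbitrary window).
# SilentlyContinue suppresses the error Get-WinEvent raises when nothing matches.
Get-WinEvent -FilterHashtable @{
    LogName   = 'Application'
    Id        = $eventIds
    StartTime = (Get-Date).AddHours(-24)
} -ErrorAction SilentlyContinue |
    Sort-Object TimeCreated |
    Format-List TimeCreated, Id, ProviderName, Message
```

Sorting by TimeCreated shows the events oldest first, which is the reverse of the Event Viewer order used in the screenshots above.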

Workaround:

  • The server was removed from the Windows NLB cluster and the transport service was stopped, because the event logs and error types pointed to a hardware or logical failure on one of the disks, or on the MDS disk (E:)
  • Forefront was disabled on the affected server, since it kept crashing. From the Forefront PowerShell console, run:
.\FSCUtility.exe /disable
  • All Exchange services were stopped. This is a very important step, so that Store.exe cannot contact the faulty server. The MSExchangeADTopology service needs the -Force parameter to stop, because other Exchange services depend on it:
Get-Service MSE* | Stop-Service -Force
  • The transport (queue) database and the IP Filter database were temporarily moved to the D: drive. From the Scripts directory in the Exchange install path, run:
.\Move-TransportDatabase.ps1 -IPFilterDatabasePath <IPDBPath> -IPFilterDatabaseLoggingPath <IPDBPath> -QueueDatabasePath <TransportDBPath> -QueueDatabaseLoggingPath <TransportDBPath>
  • Almost all messages were delivered with a delay of no more than a few hours. 23 messages in the poison queue were lost due to integrity failure.
  • The server is configured as the file share witness (FSW) for the DAG, so the FSW was temporarily moved to NODE2:
Set-DatabaseAvailabilityGroup DAG1 -WitnessServer Node2 -WitnessDirectory D:\FSW_DAG_TMP
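Before and after running Move-TransportDatabase.ps1 it is worth confirming where the queue and IP Filter databases actually live. In Exchange 2010 these locations are stored as appSettings keys in EdgeTransport.exe.config in the Exchange Bin folder. A minimal sketch that reads them, assuming the default $env:ExchangeInstallPath variable is set on the server; the function name Get-TransportDatabasePath is hypothetical, and the code sticks to PowerShell 2.0 syntax, which is what the Exchange 2010 Management Shell runs on Windows Server 2008 R2:

```powershell
# Hypothetical helper: returns the transport database path settings from EdgeTransport.exe.config
function Get-TransportDatabasePath {
    param(
        # Defaults to the Bin folder under the Exchange install path
        [string]$ConfigPath = (Join-Path $env:ExchangeInstallPath 'Bin\EdgeTransport.exe.config')
    )

    # The four path keys relevant to this post
    $keys = 'QueueDatabasePath', 'QueueDatabaseLoggingPath',
            'IPFilterDatabasePath', 'IPFilterDatabaseLoggingPath'

    # Get-Content returns lines; the [xml] cast parses them as one document
    [xml]$config = Get-Content -Path $ConfigPath

    # GetAttribute avoids any clash between the 'value' attribute and XmlNode.Value
    $config.configuration.appSettings.add |
        Where-Object { $keys -contains $_.GetAttribute('key') } |
        ForEach-Object {
            New-Object PSObject -Property @{
                Key   = $_.GetAttribute('key')
                Value = $_.GetAttribute('value')
            }
        }
}
```

On the transport server, `Get-TransportDatabasePath | Format-Table Key, Value -AutoSize` lists the current paths; comparing the output before and after the move shows whether the script updated the configuration.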

Solution:

  • Files from the E: drive were copied to another server
  • All Exchange services and the Patrol agent were stopped
  • The E: drive was formatted to clear the logical corruption on the MDS disk
  • The transport database and the IP Filter database were moved back to the E: drive:
.\Move-TransportDatabase.ps1 -IPFilterDatabasePath <IPDBPath> -IPFilterDatabaseLoggingPath <IPDBPath> -QueueDatabasePath <TransportDBPath> -QueueDatabaseLoggingPath <TransportDBPath>
  • The Transport service started automatically
  • The Transport service was tested by sending 250 messages with a 350 kB attachment in quick succession:
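# Send 250 test messages through NODE1; the attachment can be any file of roughly 350 kB
$i = 0
do {
    $i++
    $i   # print a progress counter
    Send-MailMessage -From testmail@domain.com -To testmail@domain.com -SmtpServer Node1.domain.com -Subject "Test $i" -Attachments "D:\asdevicestats.csv"
} until ($i -ge 250)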
  • The Forefront protection agent was integrated again:
.\FSCUtility.exe /enable
  • The Transport service was started and tested again by sending 250 messages with attachments in quick succession
  • All other Exchange services were started and their health was checked:
Get-Service MSE* | Start-Service
Test-ServiceHealth
  • The server was added back to the load balancer
  • Other services were tested (OWA, EAS and so on, since this is a combined server role)
  • The FSW was moved back to NODE1:
Set-DatabaseAvailabilityGroup DAG1 -WitnessServer Node1 -WitnessDirectory D:\FSW_DAG
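Once the FSW is back on NODE1 and the server has rejoined the load balancer, a few read-only checks can confirm the end state. A minimal sketch, assuming the names DAG1 and NODE1 from this post and an Exchange Management Shell session:

```powershell
# Confirm that the DAG witness points back to NODE1 and its original directory
Get-DatabaseAvailabilityGroup DAG1 | Format-List Name, WitnessServer, WitnessDirectory

# Check that the queues on NODE1 have drained
Get-Queue -Server NODE1 | Format-Table Identity, Status, MessageCount -AutoSize

# List anything still sitting in the poison queue
Get-Message -Queue "NODE1\Poison" | Format-Table Identity, FromAddress, Subject, Status -AutoSize
```

All three commands only read state, so they are safe to run while the server is already back in production.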

Hopefully this helps you save some time.


