Sandheep Unnikrishnan

“Senior System Administrator – Windows Server & Messaging” at Qatar University. I started working in early 2004 as a Computer Hardware Engineer in a very small organization, and later worked with several other MNCs such as Wipro, Sumeru Software Solutions, Symphony Services, Saggezza Technologies, Alcatel-Lucent, and Hewlett-Packard (HP) ...

Thursday, November 17, 2011

Backup fails with Storage Group Consistency Check errors on Exchange Server 2010 Enterprise

Environment Details:
====================
Application: Exchange 2010 Enterprise SP1 RU4v2
High Availability: DAG with 2 MBX servers and 2 copies of each DB.
OS: Windows Server 2008 R2 SP1

Backup Solution: Symantec NetBackup 7.0
=========================================================================


Problem Definition:
=========================================================================
The backup fails intermittently.
NetBackup reports the backup as partially successful.
We checked with the Symantec NetBackup product support team, who confirmed that nothing was wrong on their end.
The backup runs from the passive copy in the DAG.

Whenever the backup fails, the following events appear in the Application log.


----------------------------------
Log Name:      Application
Source:        Storage Group Consistency Check
Date:          9/19/2011 9:19:22 PM
Event ID:      403
Task Category: Termination
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      MBX2.mydomain.com
Description:
Instance 1: The physical consistency check successfully validated 15134 out of 13668672 pages of database '\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy307\DB02\DB02.edb'. Because some database pages were either not validated or failed validation, the consistency check has been considered unsuccessful.
-------------------------------------------

Log Name:      Application
Source:        Storage Group Consistency Check
Date:          9/19/2011 9:19:22 PM
Event ID:      401
Task Category: Termination
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      MBX2.mydomain.com
Description:
Instance 1: The physical consistency check has completed, but one or more errors were detected. The consistency check has terminated with error code of -106 (0xffffff96).

-------------------------------------------

Log Name:      Application
Source:        Storage Group Consistency Check
Date:          9/19/2011 9:21:35 PM
Event ID:      204
Task Category: Database Header Validation
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      MBX2.mydomain.com
Description:
Instance 1: An attempt to read the database header of '\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy309\DB03\DB03.edb' failed with error code -1811 (0xfffff8ed). Database header validation failed with this error.


-------------------------------------------


Log Name:      Application
Source:        Storage Group Consistency Check
Date:          9/19/2011 9:21:35 PM
Event ID:      405
Task Category: Termination
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      MBX2.mydomain.com
Description:
Instance 1: The physical consistency check did not successfully validate the transaction log files in '\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy310' with a base name of 'E03'. Because some (or all) log files were either not validated or failed validation, the consistency check has been considered unsuccessful.


-------------------------------------------

Log Name:      Application
Source:        ESE BACKUP
Date:          9/20/2011 1:01:25 AM
Event ID:      914
Task Category: General
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      MBX2.mydomain.com
Description:
Information Store (7768) The surrogate backup by MBX2 has stopped with error 0xFFFFFFFF.

---------------------------------------

Log Name:      Application
Source:        ESE
Date:          9/20/2011 1:01:25 AM
Event ID:      215
Task Category: Logging/Recovery
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      MBX2.mydomain.com
Description:
Information Store (7768) DB07: The backup has been stopped because it was halted by the client or the connection with the client failed.

----------------------------------------------
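Each error code in these events is printed twice: as a signed decimal and as the same 32-bit value in hexadecimal. The following minimal Python sketch uses only values quoted in the events above to show how the two forms relate, and how small a share of DB02 was validated before the check stopped. The helper names are illustrative, and the 32 KB page size (the Exchange 2010 database page size) is used only to estimate the size of the database.

```python
import re

PAGE_KIB = 32  # Exchange 2010 database page size in KiB (used for a size estimate only)

def to_hex32(code):
    """Format a signed 32-bit code as the 8-digit hex form shown in the events."""
    return f"0x{code & 0xFFFFFFFF:08x}"

def from_hex32(text):
    """Convert a 32-bit hex string back to the signed decimal form."""
    value = int(text, 16)
    return value - (1 << 32) if value >= (1 << 31) else value

def validated_pages(description):
    """Extract (validated, total) page counts from an event 403 description."""
    match = re.search(r"validated (\d+) out of (\d+) pages", description)
    if match is None:
        raise ValueError("no page counts found in description")
    return int(match.group(1)), int(match.group(2))

# Text quoted from event 403 above.
EVENT_403 = ("The physical consistency check successfully validated "
             "15134 out of 13668672 pages of database")

done, total = validated_pages(EVENT_403)
print(to_hex32(-106), to_hex32(-1811), from_hex32("0xFFFFFFFF"))
print(f"{done / total:.2%} of pages validated")
print(f"approx. database size: {total * PAGE_KIB / 1024 / 1024:.0f} GiB")
```

Read this way, event 403 says the check gave up almost immediately rather than finding widespread damage, which fits a snapshot or file-access problem more than a corrupt database.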

Workaround/Solution
-------------------------------
Run 'vssadmin list writers' and make sure that no writer reports an error. If one does, the problem lies with the VSS writer; you can try updating VSS to a newer version.
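Checking the writer list by eye gets tedious when backups fail intermittently, so here is a rough Python sketch that flags any writer that is not in the Stable state or whose last error is not "No error". It assumes the English-language layout of the 'vssadmin list writers' output; the function name and the sample text are illustrative and do not come from the affected server.

```python
import re

def unhealthy_writers(vssadmin_output):
    """Return (name, state, last_error) for writers that are not stable or report an error."""
    problems = []
    name = state = None
    for raw_line in vssadmin_output.splitlines():
        line = raw_line.strip()
        match = re.match(r"Writer name: '(.+)'", line)
        if match:
            # A new writer block starts here.
            name, state = match.group(1), None
        elif line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
        elif line.startswith("Last error:") and name is not None:
            last_error = line.split(":", 1)[1].strip()
            if state is None or "Stable" not in state or last_error != "No error":
                problems.append((name, state, last_error))
            name = None
    return problems

# Illustrative, trimmed sample (not captured from the server in this post).
SAMPLE = """
Writer name: 'Microsoft Exchange Replica Writer'
   State: [8] Failed
   Last error: Retryable error

Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error
"""

for writer in unhealthy_writers(SAMPLE):
    print(writer)
```

On a real server the input text could be captured with subprocess.run(["vssadmin", "list", "writers"], capture_output=True, text=True).stdout from an elevated prompt.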

Antivirus software is another possible cause.
The AV scanner might be scanning the DB and log files.
Make sure the AV exclusions are configured correctly: they should cover the DB and log files, as well as the Exchange bin directory.
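One way to sanity-check an exclusion list is to compare it against the paths that must be covered. The sketch below is only a folder-prefix check written in Python; real AV products have their own exclusion syntax (wildcards, process exclusions), which it does not model. The DB and log paths and the exclusion entries are hypothetical; only the Exchange bin directory is taken from the console prompt shown elsewhere on this blog.

```python
from pathlib import PureWindowsPath

def uncovered_paths(required, exclusions):
    """Return the required paths that no exclusion entry covers."""
    excluded = [PureWindowsPath(entry) for entry in exclusions]
    missing = []
    for item in required:
        path = PureWindowsPath(item)
        # Covered if the path itself, or any parent folder, is excluded.
        # PureWindowsPath comparisons ignore case, as Windows paths do.
        if not any(path == entry or entry in path.parents for entry in excluded):
            missing.append(item)
    return missing

# Hypothetical layout, for illustration only.
required = [
    r"D:\DB02\DB02.edb",
    r"L:\Logs\DB02",
    r"C:\Program Files\Microsoft\Exchange Server\V14\Bin",
]
exclusions = [
    r"d:\db02",
    r"C:\Program Files\Microsoft\Exchange Server\V14\Bin",
]

print(uncovered_paths(required, exclusions))
```

With this sample data the log folder is the one path left without an exclusion.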
==========================================================================
##########################################################################

Wednesday, April 20, 2011

Run Soft Recovery with eseutil - Mailbox DB failed to mount.

While we were restoring a mailbox DB (to its original location, without a Recovery Database), the DB failed to mount.

Couldn't mount the database that you specified. Specified database: test100; Error code: An Active Manager operation failed with a transient error. Please retry the operation. Error: A transient error occurred during discovery of the database availability group topology. Error: Database action failed with transient error. Error: A transient error occurred during a database operation. Error: MapiExceptionJetErrorFileIOBeyondEOF: Unable to mount database. (hr=0x80004005, ec=-4001)
 [Database: test100, Server: MBX2.dom.com].
    + CategoryInfo          : InvalidOperation: (test100:ADObjectId) [Mount-Database], InvalidOperationException
    + FullyQualifiedErrorId : BDB1C812,Microsoft.Exchange.Management.SystemConfigurationTasks.MountDatabase

I even tried to force the mount (Mount-Database -Force), but with no luck.

Troubleshooting

Ran ‘eseutil /mh’ against the DB file and found that it was in Dirty Shutdown state and needed a couple of log files before it could mount successfully.
Verified that all the logs were present on the log drive.
Tried to replay the logs (Eseutil.exe /c /f); it failed because the base log file (E0B.log) was missing.
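The header check in the first step can be sketched in Python: read the State and Log Required lines from the 'eseutil /mh' output, then list the log files that must be present before recovery. The sample header values are illustrative and are not the ones from test100. The naming scheme assumed here is the log prefix followed by the generation as 8 hex digits; the highest required generation may still be the current log (E0B.log) rather than a numbered file, so treat the list as a guide.

```python
import re

def parse_header(header_text):
    """Return (state, (low, high)) parsed from 'eseutil /mh' output."""
    state = re.search(r"^\s*State:\s*(.+?)\s*$", header_text, re.MULTILINE)
    logs = re.search(r"^\s*Log Required:\s*(\d+)-(\d+)", header_text, re.MULTILINE)
    if state is None or logs is None:
        raise ValueError("State or Log Required line not found")
    return state.group(1), (int(logs.group(1)), int(logs.group(2)))

def required_log_names(prefix, low, high):
    """List the log file names for generations low..high (empty if none are required)."""
    if low == 0 and high == 0:
        return []
    return [f"{prefix}{generation:08X}.log" for generation in range(low, high + 1)]

# Illustrative header fragment; the numbers are made up for the example.
SAMPLE_HEADER = """
            State: Dirty Shutdown
     Log Required: 4660-4662 (0x1234-0x1236)
"""

state, (low, high) = parse_header(SAMPLE_HEADER)
print(state)
for name in required_log_names("E0B", low, high):
    print(name)
```

A database in Clean Shutdown state reports a Log Required range of 0-0, in which case the list is empty and no log replay is needed.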

At this point I decided to run soft recovery using eseutil.exe /r:


PS C:\Program Files\Microsoft\Exchange Server\V14\Bin> eseutil /r E0B /d X:\test100\ /l X:\test100\test100 /s X:\test100\test100

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 14.00
Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating RECOVERY mode...
    Logfile base name: E0B
            Log files: X:\test100\test100
         System files: X:\test100\test100
   Database Directory: X:\test100\

Performing soft recovery...

[...]


It failed :(
But I saw that there is an option for lossy recovery (/a); something is better than nothing.


C:\Program Files\Microsoft\Exchange Server\V14\Bin> eseutil /r E0B /d X:\test100\ /l X:\test100\test100 /s X:\test100\test100 /a

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 14.00
Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating RECOVERY mode...
    Logfile base name: E0B
            Log files: X:\test100\test100
         System files: X:\test100\test100
   Database Directory: X:\test100\

Performing soft recovery...
                      Restore Status (% complete)

          0    10   20   30   40   50   60   70   80   90  100
          |----|----|----|----|----|----|----|----|----|----|
          .......................................................

Operation completed successfully in 397.600 seconds.

Soft recovery completed successfully, and the DB then mounted without errors.
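The two eseutil /r runs above differ only in the trailing /a switch. As a small sketch, the Python helper below builds both argument lists from the same inputs: /r takes the log base name, /d the database directory, /l the log file path, /s the system (checkpoint) file path, and /a allows recovery to continue even if some committed data is lost. The function name is illustrative, and nothing here runs eseutil; it only prints the command lines.

```python
def soft_recovery_args(base, db_dir, log_dir, system_dir, allow_data_loss=False):
    """Build the argument list for an 'eseutil /r' soft recovery run."""
    args = ["eseutil", "/r", base, "/d", db_dir, "/l", log_dir, "/s", system_dir]
    if allow_data_loss:
        # Lossy recovery: only worth trying after a normal soft recovery has failed.
        args.append("/a")
    return args

# Paths as used in the commands above.
DB_DIR = "X:\\test100\\"
LOG_DIR = "X:\\test100\\test100"

first_try = soft_recovery_args("E0B", DB_DIR, LOG_DIR, LOG_DIR)
second_try = soft_recovery_args("E0B", DB_DIR, LOG_DIR, LOG_DIR, allow_data_loss=True)
print(" ".join(first_try))
print(" ".join(second_try))
```

Keeping /a behind an explicit flag mirrors the order followed here: try the normal soft recovery first, and accept possible data loss only when that fails.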