SQL Server database mail问题诊断一例

产品环境sql server database的mail发不出邮件,影响客户的业务,在数据库中进行诊断

诊断sql:

EXEC msdb.dbo.sp_send_dbmail

@profile_name = ‘DB Messaging’,

@recipients = ‘jamesz@gmail.com’,

@body = ‘James_test_2019’,

@body_format = ‘HTML’,

@subject = ‘James_test_2019’

执行过后并没有收到邮件,查询event log表,也没有任何记录:

SELECT * FROM msdb.dbo.sysmail_event_log

ORDER BY log_id DESC

检查邮件的状态,都是unsent:

SELECT top 100 * FROM msdb.dbo.sysmail_allitems 

ORDER BY send_request_date DESC

根据以下链接的建议进行诊断:

所有诊断项均正常。

同时检查sql server errorlog, windows event log和windows cluster log,发现周末出现的storage异常和mail功能失效关联端倪

The brief analysis for mail can’t sent problem:

sql errorlog显示sql instance的恢复启动时间是 2019-10-25 11:04:50.050 CDT

但是在 windows cluster log,  clussvc.exe 在时间段 2019/10/25-07:54:02.903 to 2019/10/25-16:04:18.499并没有记录

恰好在这段时间,storage 存储异常

之后sql instance在cluster service有问题的情况下启动工作

并且获取了本地local的信息而不是cluster service的信息.

windows event log显示,database mail通过连接. 到db而出现错误证明上述猜测

Server Name: ., Database Name: msdb

解决方案就是重启备用节点后,做一个干净状态的failover,让sql instance在正常的cluster service基础之上。

 SQL Server Errorlog:

2019-10-25 11:04:50.050  Server  Microsoft SQL Server 2014 (SP3-CU2) (KB4482960) – 12.0.6214.1 (X64)    Feb  2 2019 01:10:18    Copyright (c) Microsoft Corporation   Enterprise Edition: Core-based Licensing (64-bit) on Windows NT 6.3 <X64> (Build 9600: )  

2019-10-25 11:04:50.050  Server  UTC adjustment: -5:00

2019-10-25 11:04:50.050  Server  (c) Microsoft Corporation.

2019-10-25 11:04:50.050  Server  All rights reserved.

2019-10-25 11:04:50.050  Server  Server process ID is 5700.

2019-10-25 11:04:50.050  Server  System Manufacturer: ‘Cisco Systems Inc’, System Model: ‘UCSB-B200-M3’.

2019-10-25 11:04:50.060  Server  Authentication mode is MIXED.

2019-10-25 11:04:50.060  Server  Logging SQL Server messages in file ‘D:\MSSQL12.MSSQLSERVER\MSSQL\Log\ERRORLOG’.

2019-10-25 11:04:50.060  Server  The service account is ‘TAN\svc.DBAHFDN01VS.sql’. This is an informational message; no user action is required.

2019-10-25 11:04:50.060  Server  Registry startup parameters:     -d D:\MSSQL12.MSSQLSERVER\MSSQL\DATA\master.mdf    -e D:\MSSQL12.MSSQLSERVER\MSSQL\Log\ERRORLOG    -l D:\MSSQL12.MSSQLSERVER\MSSQL\DATA\mastlog.ldf    -T 3226

2019-10-25 11:04:50.060  Server  Command Line Startup Parameters:    -s “MSSQLSERVER”

2019-10-25 11:04:50.450  Server  SQL Server detected 2 sockets with 10 cores per socket and 20 logical processors per socket, 40 total logical processors; using 40 logical processors based on SQL Server licensing. This is an informational message; no user action is required.

2019-10-25 11:04:50.450  Server  SQL Server is starting at normal priority base (=7). This is an informational message only. No user action is required.

2019-10-25 11:04:52.180  spid9s  The NETBIOS name of the local node that is running the server is ‘DBAHFDN01N1’. This is an informational message only. No user action is required.

 cluster log:

00002750.0000251c::2019/10/25-07:54:02.903 DBG   [RCM] rcm::RcmResource::Control: (QUORUM_NEW, SWITCH_MONITORS)

00002750.00000ed8::2019/10/25-07:54:02.903 DBG   [RCM] QUORUM_NEW: Ignoring reset because resource flag 2 is not set.

00002750.00000ed8::2019/10/25-07:54:02.903 DBG   [RCM] QUORUM_NEW: Ignoring reset because resource flag 32 is not set.

00002750.0000251c::2019/10/25-07:54:02.903 DBG   [RCM] Switch monitor call for QUORUM_NEW when there were pending controls, enqueuing the switch

00002750.0000251c::2019/10/25-07:54:02.903 DBG   [RCM] Pending controls for QUORUM_NEW are empty now but we’re not in a state that can switch monitors; removing SWITCH_MONITORS control.

00002750.00000ed8::2019/10/25-07:54:02.903 DBG   [RCM] rcm::RcmGroup::ComputeFailoverThreshold=> (Cluster Group, 0, computed)

00002750.00000ed8::2019/10/25-07:54:02.903 DBG   [RCM] rcm::RcmResource::StmPreOnlineCheck(QUORUM_NEW)

00000b4c.00000b50::2019/10/25-16:04:18.499 INFO  —————————–+ LOG BEGIN +—————————–

00000b4c.00000b50::2019/10/25-16:04:18.499 INFO  [CS] Starting clussvc as a service

00000b4c.00000b50::2019/10/25-16:04:18.499 INFO  [CS] cluster service logging level is 5

00000b4c.000009cc::2019/10/25-16:04:18.546 INFO  [CS] Creating cluster node <vector len=’1′>

00000b4c.000009cc::2019/10/25-16:04:18.546 INFO      <item>ClusSvc</item>

00000b4c.000009cc::2019/10/25-16:04:18.546 INFO  </vector>

00000b4c.000009bc::2019/10/25-16:04:18.546 INFO  [StartupConfig]: Initializing.

Windwos event log:

10/30/2019 12:05:03 PM

1) Exception Information

===================

Exception Type: Microsoft.SqlServer.Management.SqlIMail.Server.Common.BaseException

Message: There was an error on the connection. Reason: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 – Could not open a connection to SQL Server), connection parameters: Server Name: ., Database Name: msdb

Data: System.Collections.ListDictionaryInternal

TargetSite: Void OpenConnection(Microsoft.SqlServer.Management.Common.SqlConnectionInfo)

HelpLink: NULL

Source: DatabaseMailEngine

StackTrace Information

===================

   at Microsoft.SqlServer.Management.SqlIMail.Server.DataAccess.ConnectionManager.OpenConnection(SqlConnectionInfo connectionInfo)

   at Microsoft.SqlServer.Management.SqlIMail.Server.DataAccess.DataAccessAdapter.OpenConnection(String dbServerName, String dbName, String userName, String password, String appName, Int32 connectionTimeout)

   at Microsoft.SqlServer.Management.SqlIMail.Server.DataAccess.DataAccessAdapter.OpenConnection(String dbServerName, String dbName, String userName, String password, Int32 connectionTimeout)

   at Microsoft.SqlServer.Management.SqlIMail.IMailProcess.QueueItemProcesser.ProcessQueueItems(String dbName, String dbServerName, Int32 lifetimeMinimumSec, LogLevel loggingLevel, Byte[] encryptionKey, Int32 connectionTimeout)

2) Exception Information

===================

Exception Type: System.Data.SqlClient.SqlException

Errors: System.Data.SqlClient.SqlErrorCollection

Class: 20

LineNumber: 0

Number: 2

Procedure: NULL

Server: 

State: 0

Source: .Net SqlClient Data Provider

ErrorCode: -2146232060

Message: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 – Could not open a connection to SQL Server)

Data: System.Collections.ListDictionaryInternal

TargetSite: System.Data.ProviderBase.DbConnectionInternal GetConnection(System.Data.Common.DbConnection)

HelpLink: NULL

StackTrace Information

===================

   at System.Data.ProviderBase.DbConnectionPool.GetConnection(DbConnection owningObject)

   at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)

   at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)

   at System.Data.SqlClient.SqlConnection.Open()

   at Microsoft.SqlServer.Management.SqlIMail.Server.DataAccess.ConnectionManager.OpenConnection(SqlConnectionInfo connectionInfo)

发表评论

电子邮件地址不会被公开。 必填项已用*标注