今天测试反应问题,性能测试环境一个脚本执行了大约3个小时没有出结果,期间其他dba已经建立了一些索引但是没有效果。
语句:
DELETE T from License T WHERE exists (select 1 from #i_RollbackTrans r where r.LicenseID= T.LicenseID)
表 License数据量4千万
表 #i_RollbackTrans数据量5万
最终删除数据量5万
部分执行计划:

可以发现表上有很多索引, 临时表经过distinct之后和索引进行nested loop,问题不大。

通过以下sql查询会话的read write和cpu 变化:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263 | ; WITH task_space_usage AS ( -- SUM alloc/delloc pages SELECT session_id, request_id, SUM (internal_objects_alloc_page_count) AS internal_alloc_pages, SUM (user_objects_alloc_page_count) AS user_alloc_pages FROM sys.dm_db_task_space_usage WITH (NOLOCK) WHERE session_id <> @@SPID GROUP BY session_id, request_id ) SELECT r.session_id ,r.percent_complete --,r.query_hash --,r.sql_handle --,r.plan_handle ,r.start_time ,getdate() ,datediff(ss,r.start_time,getdate()) duration , DB_NAME(r.database_id) AS dbname ,[Parent Query] = st.text -- ,[Individual Query] = SUBSTRING (st.text, (r.statement_start_offset/2) + 1, --((CASE WHEN r.statement_end_offset = -1 --THEN LEN(CONVERT(NVARCHAR(MAX), st.text)) * 2 -- ELSE r.statement_end_offset -- END - r.statement_start_offset)/2) + 1 ,[Individual Query] = SUBSTRING (st.text, (r.statement_start_offset/2), (( CASE WHEN r.statement_end_offset = -1 THEN DATALENGTH(st.text) ELSE r.statement_end_offset END - r.statement_start_offset)/2)) ,r.wait_type ,r.cpu_time, r.reads, r.writes ,r.blocking_session_id,r.session_id --,dbo.udf_GetHeadBlocker(r.session_id) AS 'blockChain' ,r.percent_complete ,r.wait_resource,r.status, r.command, r.transaction_isolation_level ,s.host_name, s.login_name , case when s.program_name like 'SQLAgent - TSQL JobStep%' then --substring(s.program_name,30,34) ( select name from msdb.dbo.sysjobs where cast (job_id as binary (16)) = convert (varbinary(16), substring (s.program_name,30,34),1)) else s.program_name end as program_name , s.host_process_id, c.client_net_address --,deqp.query_plan ,blocking_cache.text AS blocking_text , waitstats.wait_duration_ms ,TSU.internal_alloc_pages * 1.0 / 128 AS [internal object MB space ] ,TSU.user_alloc_pages * 1.0 / 128 AS [ user object MB space ] FROM sys.dm_exec_requests AS r left JOIN task_space_usage AS TSU ON TSU.session_id = r.session_id AND TSU.request_id = r.request_id left JOIN sys.dm_exec_sessions AS s ON s.session_id=r.session_id left JOIN sys.dm_exec_connections AS c ON s.session_id=c.session_id LEFT JOIN sys.dm_exec_connections AS blocking ON blocking.session_id = r.blocking_session_id CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st OUTER APPLY sys.dm_exec_sql_text(blocking.most_recent_sql_handle) blocking_cache --OUTER APPLY sys.dm_exec_query_plan(r.plan_handle) deqp LEFT OUTER JOIN sys.dm_os_waiting_tasks waitstats ON waitstats.session_id = r.session_id WHERE r.session_id>50 AND r.session_Id NOT IN (@@SPID) ORDER BY r.cpu_time DESC ,r.reads DESC , r.blocking_session_id DESC GO |
观察结果发现,read很少只有300,write到30万,但是执行期间一直没有变化,cpu_time持续增加,会话状态是running并且没有等待事件。
通过分析得出,执行计划查询部分和上面分析一样不存在问题,虽然表上有很多索引,但是删除数据占比表整体数据较少0.1%, 而且writes始终没变化,证明不是阻塞再索引的删除上面。那么是什么消耗cpu_time但是没有任何IO变化?
下面看被表引用的外键关系:

可以看到相当多的外键引用该表,并且是强制检查状态,其中一部分还是自引用。
最终通过临时禁用外键检测,删除,再启用的方式将3个小时没有执行完的sql优化再20分钟完成。