Run the below query to find failed/hung jobs in your SQL Server Instance:
SELECT sjh.server as 'Server Name' ,sj.name as 'Failed Job Name', sjh.message as 'Error Description', sjh.run_date, sjh.run_time as 'Job Run Time/Job Hung Time'
FROM msdb..sysjobhistory sjh
JOIN msdb..sysjobs sj on sjh.job_id = sj.job_id
JOIN (SELECT job_id, max(instance_id) maxinstanceid
WHERE run_status NOT IN (1,4)
GROUP BY job_id) a ON sjh.job_id = a.job_id AND sjh.instance_id = a.maxinstanceid
WHERE DATEDIFF(hh, CONVERT(datetime,CONVERT(VARCHAR(8),sjh.run_date),12), GETDATE()) <= 24
select job.originating_server as 'Server Name', job.Name as 'Failed Job Name', 'Warning: Job Hung for More than 1 Day.' as 'Error Description',
CONVERT(char(10), activity.run_requested_Date,112) as run_date,
datediff(minute, activity.run_requested_Date, getdate()) as Elapsed
from msdb.dbo.sysjobs_view job
inner join msdb.dbo.sysjobactivity activity on (job.job_id = activity.job_id)
where run_Requested_date is not null
and stop_execution_date is null
and datediff(hour, activity.run_requested_Date, getdate()) >= 24
NOTE: Ideally, you would want to know if a job is hung as soon as possible.
Hence, you can replace the underlined code above with the one below to reduce the time-interval for Hung Jobs from 1 day to 1 Hour:
datediff(minute, activity.run_requested_Date, getdate()) >= 60
Use a Power Shell script/Windows Scheduled Task to have this scheduled & run on multiple SQL Server Instances in a jiffy!