A four-part strategy for troubleshooting the TSM Backup/Archive client scheduler daemon on Unix hosts:
1) LOGON AND CHECK FOR A SCHEDULER DAEMON
Log on to the TSM Backup/Archive (TSM B/A) client host via SSH. To find the host's IP address, run the following in a TSM administrative client session (replace xxxx with the node name):
q node xxxx f=d
Look for the value of the TCP/IP Address: field.
If the IP address does not show up there, try this command instead:
q actlog begind=-3 endd=today msgno=0406 search=xxxx
Look for the IP address the host uses to connect to the TSM server; it appears in the ANR0406I session-start messages.
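If you save or pipe the administrative client output, the address can be pulled out with a small filter instead of by eye. This is a minimal sketch, assuming the captured output contains either a `TCP/IP Address:` line or ANR0406I lines carrying a dotted IPv4 address; `node_ip` is a hypothetical helper name, not part of TSM.

```shell
# node_ip (hypothetical helper): read captured admin-client output on stdin and
# print the first IPv4 address found on a "TCP/IP Address:" line (q node f=d)
# or an ANR0406I session-start line (q actlog msgno=0406).
node_ip() {
    awk '
        /TCP\/IP Address:/ || /ANR0406I/ {
            # match() sets RSTART/RLENGTH to the first dotted-quad on the line
            if (match($0, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
                print substr($0, RSTART, RLENGTH)
                exit
            }
        }
    '
}
```

For example, save the output of `q node xxxx f=d` to a file and run `node_ip < qnode.txt`.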
Once logged in to the host, switch to root:
sudo su -
Find out what type of Unix OS you are dealing with:
uname -a
Find out if the TSM B/A scheduler daemon is running:
Non-Linux Unix OSes:
ps -ef | grep dsm
Linux:
ps -ef | grep tsm
You should see something similar to the following output:
Non-Linux Unix OSes:
root 2608 1 0 Aug 13 ? 58:58 /opt/tivoli/tsm/client/ba/bin/dsmc schedule
Linux:
You may see multiple daemons returned by the 'ps -ef | grep tsm' command. This is OK: there should be one "master" daemon and 4-5 "child" daemons.
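The OS check and the choice of grep pattern above can be folded into one helper. A sketch assuming a POSIX shell; `sched_pattern` and `show_sched` are hypothetical names, and the dsm/tsm mapping only mirrors the commands in this step.

```shell
# sched_pattern (hypothetical): given the kernel name from "uname -s"
# (e.g. AIX, SunOS, HP-UX, Linux), print the pattern this procedure greps for.
sched_pattern() {
    case "$1" in
        Linux) echo tsm ;;   # Linux: grep for "tsm"
        *)     echo dsm ;;   # other Unix flavours: grep for "dsm"
    esac
}

# show_sched (hypothetical): list matching processes on this host,
# filtering out the grep command itself.
show_sched() {
    pat=$(sched_pattern "$(uname -s)")
    ps -ef | grep -v grep | grep "$pat"
}
```

Running `show_sched` as root prints the same lines as the manual `ps -ef | grep ...` check; empty output means no scheduler daemon is running.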
2) CHECK FOR A HUNG SCHEDULER DAEMON
View the dsm.sys configuration file to see where the dsmerror.log and dsmsched.log files are written (on AIX, replace /opt/ with /usr/):
more /opt/tivoli/tsm/client/ba/bin/dsm.sys
Find the SCHEDLOGName entry; it typically points to /var/adm/dsmsched.log.
Find the ERRORLOGName entry; it typically points to /var/adm/dsmerror.log.
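The two entries can also be read straight out of dsm.sys. This is a minimal sketch, assuming one option per line in the form `OPTION value`, comment lines starting with `*`, and log paths without spaces; `dsm_opt` is a hypothetical helper, and it does not understand abbreviated option names.

```shell
# dsm_opt FILE OPTION (hypothetical): print the value of OPTION from a
# dsm.sys-style file. Option names are compared case-insensitively, lines
# starting with "*" are skipped as comments, and only the first match is
# printed (with several SErvername stanzas, that is the first stanza setting it).
dsm_opt() {
    awk -v opt="$2" '
        /^[ \t]*\*/ { next }                             # comment line
        tolower($1) == tolower(opt) { print $2; exit }   # first match wins
    ' "$1"
}
```

For example: `cd "$(dirname "$(dsm_opt /opt/tivoli/tsm/client/ba/bin/dsm.sys SCHEDLOGNAME)")"`.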
Change to the directory those two entries point to:
cd /var/adm
Check when the log files were last updated:
ls -ltr | grep dsm
Check the host's current time:
date
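Comparing the `ls` timestamps with `date` by eye can be replaced by asking `find` whether each log changed within the "couple of hours" window. A sketch, assuming a `find` that supports `-mmin` (GNU and BSD do; it is not POSIX and may be missing on older AIX or Solaris); `recently_modified` and `check_logs` are hypothetical names and the 120-minute window is just the rule of thumb from this step.

```shell
# recently_modified FILE MINUTES (hypothetical): succeed if FILE exists and
# was modified less than MINUTES minutes ago. Needs find with -mmin.
recently_modified() {
    [ -n "$(find "$1" -mmin -"$2" 2>/dev/null)" ]
}

# check_logs (hypothetical): report both client logs in the current
# directory against a two-hour window.
check_logs() {
    for f in dsmerror.log dsmsched.log; do
        if recently_modified "$f" 120; then
            echo "$f: updated within the last 2 hours"
        else
            echo "$f: NOT updated within the last 2 hours (or missing)"
        fi
    done
}
```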
If dsmerror.log has a timestamp close to the current host time (within a couple of hours), look at its last few hundred entries to see what is going on:
tail -500 dsmerror.log
Sometimes this shows the TSM B/A client repeatedly trying, and failing, to establish a connection to the TSM server. If so, the scheduler daemon is probably hung and needs to be killed and restarted.
If no errors indicate that the daemon is hung, move on to the next check.
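The "repeatedly failing to connect" pattern can be counted rather than eyeballed. This is a sketch under two assumptions: that the failures show up as ANS1017E ("Session rejected: TCP/IP connection failure") messages, and that five or more of them in the last 500 lines is worth a closer look. Both the message ID and the threshold are illustrative; adjust `CONN_ERR_PATTERN` to whatever your dsmerror.log actually shows. `count_msgs` and `looks_hung` are hypothetical names.

```shell
# ASSUMPTION: ANS1017E marks a failed connection attempt; override as needed.
CONN_ERR_PATTERN=${CONN_ERR_PATTERN:-ANS1017E}

# count_msgs FILE LINES PATTERN (hypothetical): count lines matching PATTERN
# among the last LINES lines of FILE. Always prints a number (0 if none).
count_msgs() {
    tail -n "$2" "$1" | grep -c -- "$3" || true
}

# looks_hung FILE [LINES] [THRESHOLD] (hypothetical): succeed if the tail of
# the error log holds at least THRESHOLD (default 5) connection failures.
looks_hung() {
    n=$(count_msgs "$1" "${2:-500}" "$CONN_ERR_PATTERN")
    [ "$n" -ge "${3:-5}" ]
}
```

Usage: `looks_hung /var/adm/dsmerror.log && echo "scheduler may be hung"`.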
Check the last few entries in dsmsched.log:
tail -100 dsmsched.log
If the last few entries indicate that a backup is still running, yet their date/time stamps are old (i.e., not near the current time), the scheduler daemon is probably hung and needs to be killed and restarted.
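To put a number on "old", the stamp on the last scheduler-log line can be extracted and compared with the current time. A sketch, assuming the log lines start with a `MM/DD/YY HH:MM:SS` stamp as in the sample output in step 3 (the format depends on the client's date/time settings) and, for the age calculation only, GNU `date -d`; on AIX or Solaris compare the printed stamp with `date` by hand. `last_sched_stamp` and `sched_age_minutes` are hypothetical names.

```shell
# last_sched_stamp FILE (hypothetical): print the first two fields
# (e.g. "08/17/06 13:41:12") of the last line that has at least two fields.
last_sched_stamp() {
    awk 'NF >= 2 { stamp = $1 " " $2 } END { if (stamp != "") print stamp }' "$1"
}

# sched_age_minutes FILE (hypothetical): minutes between that stamp and now.
# ASSUMPTION: GNU date, which parses "MM/DD/YY HH:MM:SS" in local time.
sched_age_minutes() {
    stamp=$(last_sched_stamp "$1")
    [ -n "$stamp" ] || return 1
    then_s=$(date -d "$stamp" +%s) || return 1
    now_s=$(date +%s)
    echo $(( (now_s - then_s) / 60 ))
}
```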
3) KILLING/RESTARTING A HUNG SCHEDULER DAEMON
Get the process ID (PID) of the running TSM B/A scheduler daemon:
Non-Linux Unix OSes:
ps -ef | grep dsm
Linux:
ps -ef | grep tsm
You should see something similar to the following output:
Non-Linux Unix OSes:
root 2608 1 0 Aug 13 ? 58:58 /opt/tivoli/tsm/client/ba/bin/dsmc schedule
Linux:
You may see multiple daemons returned by the 'ps -ef | grep tsm' command. This is OK: there should be one "master" daemon and 4-5 "child" daemons.
Kill the TSM B/A scheduler daemon:
kill -9 2608
In this example, 2608 is the PID taken from the ps -ef output above; your PID will be different.
Be sure you are killing the correct PID!
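To reduce the risk of copying the wrong PID, the scheduler's PIDs can be picked out of the `ps -ef` output automatically. A sketch, assuming the scheduler was started as `dsmc sched` or `dsmc schedule` (other abbreviations are not matched); `sched_pids` is a hypothetical helper. The `[d]smc` bracket trick stops the awk process from matching its own command line.

```shell
# sched_pids (hypothetical): read "ps -ef" output on stdin and print the PID
# (column 2) of every "dsmc sched..." process. The regex [d]smc matches
# "dsmc" but not the literal text "[d]smc" in awk's own ps entry.
# On Linux several lines (master and children) may be printed.
sched_pids() {
    awk '/[d]smc sched/ { print $2 }'
}
```

Usage: `ps -ef | sched_pids`, then kill the PID as above after checking it. As a general Unix convention (not a TSM-specific requirement), sending a plain `kill` first and using `kill -9` only if the process is still there gives dsmc a chance to exit cleanly.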
Verify that the daemon restarted automatically (it should now have a new PID):
Non-Linux Unix OSes:
ps -ef | grep dsm
root 3512 1 0 Aug 13 ? 58:58 /opt/tivoli/tsm/client/ba/bin/dsmc schedule
Linux:
ps -ef | grep tsm
You may see multiple daemons returned by the 'ps -ef | grep tsm' command. This is OK: there should be one "master" daemon and 4-5 "child" daemons.
If no new daemon appears, start one manually as described in step 4. If you do see output similar to the above example, verify that the scheduler daemon successfully retrieved its next scheduled job from the TSM server:
tail -20 /var/adm/dsmsched.log
You should see output similar to:
08/17/06 13:41:12 Querying server for next scheduled event.
08/17/06 13:41:12 Node Name: server
08/17/06 13:41:12 Session established with server server: AIX-RS/6000
08/17/06 13:41:12 Server Version 5, Release 2, Level 2.0
08/17/06 13:41:12 Server date/time: 08/17/06 12:24:57 Last access: 08/17/06 12:16:03
08/17/06 13:41:12 --- SCHEDULEREC QUERY BEGIN
08/17/06 13:41:12 --- SCHEDULEREC QUERY END
08/17/06 13:41:12 Next operation scheduled:
08/17/06 13:41:12 ------------------------------------------------------------
08/17/06 13:41:12 Schedule Name: 0000MST
08/17/06 13:41:12 Action: Incremental
08/17/06 13:41:12 Objects:
08/17/06 13:41:12 Options:
08/17/06 13:41:12 Server Window Start: 00:00:00 on 08/18/06
08/17/06 13:41:12 ------------------------------------------------------------
08/17/06 13:41:12 Command will be executed in 11 hours and 36 minutes.
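The check above boils down to "does the tail of dsmsched.log contain a `Next operation scheduled:` block?". A minimal sketch of that check, assuming the message wording matches the sample output above; `next_sched_ok` is a hypothetical helper.

```shell
# next_sched_ok FILE [LINES] (hypothetical): succeed if the last LINES
# (default 20) lines of dsmsched.log contain "Next operation scheduled:",
# and echo the most recent "Command will be executed in ..." line, if any.
next_sched_ok() {
    recent=$(tail -n "${2:-20}" "$1") || return 1
    printf '%s\n' "$recent" | grep -q 'Next operation scheduled:' || return 1
    printf '%s\n' "$recent" | grep 'Command will be executed in' | tail -n 1
    return 0
}
```

Usage: `next_sched_ok /var/adm/dsmsched.log || echo "no next schedule retrieved yet"`.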
4) MANUALLY STARTING A SCHEDULER DAEMON
Ensure the TSM B/A client can communicate with the TSM server:
dsmc query sched
If the command completes successfully and returns no errors, the client can communicate with the TSM server; proceed to start the scheduler daemon.
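"Completes successfully and returns no errors" can be checked mechanically: a zero exit status and no `ANSnnnnE` error messages in the output. A sketch under those assumptions; `check_server_comm` is a hypothetical helper, and `DSMC` is a hypothetical override for the dsmc path (on AIX, use /usr/ instead of /opt/).

```shell
# DSMC (hypothetical override): path to the dsmc binary.
DSMC=${DSMC:-/opt/tivoli/tsm/client/ba/bin/dsmc}

# check_server_comm (hypothetical): run "dsmc query sched" and succeed only if
# it exits 0 and prints no ANSnnnnE error messages. Problems go to stderr.
check_server_comm() {
    out=$("$DSMC" query sched 2>&1)
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "dsmc query sched exited with return code $rc" >&2
        printf '%s\n' "$out" >&2
        return 1
    fi
    # Echo any error-level messages to stderr and fail if there were any.
    if printf '%s\n' "$out" | grep -E 'ANS[0-9]{4}E' >&2; then
        return 1
    fi
    return 0
}
```

Usage: `check_server_comm && echo "server reachable"`.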
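Start the TSM B/A client scheduler daemon (on AIX, replace /opt/ with /usr/). Running it under nohup keeps a hangup signal from killing it when you log out of the SSH session:
nohup /opt/tivoli/tsm/client/ba/bin/dsmc schedule >/dev/null 2>&1 &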
Ensure the TSM B/A client scheduler daemon is running:
Non-Linux Unix OSes:
ps -ef | grep -v grep | grep dsm
Linux:
ps -ef | grep -v grep | grep tsm
You should see output similar to:
root 4561 1 0 Aug 13 ? 58:58 /opt/tivoli/tsm/client/ba/bin/dsmc schedule