Monday, May 21, 2007
Well, firstly, sorry for not blogging for such a long time. I was really busy at work: Microsoft released its security patches in the 2nd week of May, and as I am in the Patch Mgmt team, I was fully involved in patching all the servers in my company via Microsoft SMS. We had so many problems at work this time around that I even missed meeting up with an old friend who was down in KL, and of course I missed Malacca Barsi as well.

Basically we had these problems:
1) We had a HUGE backlog of DDRs on one of our central servers. I had to stop all backup services to allow the backlog to clear, and of course keep a constant watch for the corrupt DDRs that were creating the backlog. The backlog was still not clearing, so I had to move all the backlogged files to a new temp folder to allow new DDRs to be processed.

2) After one of our primary servers for SMS was rebooted, there was an error on the transaction replication: "The agent is suspect. No response within last 10 minutes." I tried increasing the polling interval to get rid of the error msg but it did not work. Tried connecting to our NLB but it kept showing that I was not authorised. Even the engineer on pager duty was not able to sort this out. At last it turned out that the permissions for SMS admins had been removed. Added them back and tadaaa, it worked like a charm.

3) Well, on Monday we found out we were not getting any reports back to indicate whether our servers had been successfully patched or not. So we had to generate the report from all of our primary sites, which showed that one of our primaries was not showing any reports at all. After some troubleshooting we found that the primary site had 50gb of data in Dataldr, and most of the data being processed was from March and April (well, clearly the engineer who was supposed to keep an eye on the infrastructure of this server was not doing a good job). So it was another horrid day of moving data out (eventually I just deleted the data for March and April), and in the meantime I was transferring the data for the month of MAY from dataldr SMS\inboxes\\process folder. We actually found the root cause of the problem too: another engineer had created a script and advertisement for clients to report back full hardware inventory to SMS every day in the month of April!!!

4) And of course NLB on another primary server was down for 4 hours. (We are still checking why NLB goes down on the primary servers for SMS every time they are rebooted.)

5) The central server has around 50gb of data in dataldr that is being processed. So all hardware and software inventory was stopped on all the primaries to allow this data to be processed over the weekend.
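
For what it's worth, the "move the old stuff out so the new files can get processed" step from 1) and 3) could be scripted roughly like this. This is only a sketch in Python and not what we actually ran: the function name `move_stale_files`, the folder paths and the cutoff date are all made up for illustration, and it goes purely by each file's last-modified time.

```python
import shutil
from datetime import datetime
from pathlib import Path


def move_stale_files(inbox, holding, cutoff):
    """Move files last modified before `cutoff` from `inbox` into `holding`.

    Returns the sorted names of the files that were moved.
    """
    inbox, holding = Path(inbox), Path(holding)
    holding.mkdir(parents=True, exist_ok=True)
    moved = []
    # Snapshot the listing first so moving files does not disturb the loop.
    for entry in list(inbox.iterdir()):
        if not entry.is_file():
            continue  # leave subfolders alone
        modified = datetime.fromtimestamp(entry.stat().st_mtime)
        if modified < cutoff:
            shutil.move(str(entry), str(holding / entry.name))
            moved.append(entry.name)
    return sorted(moved)


# Hypothetical usage (both paths are placeholders, not our real layout):
# move_stale_files(r"D:\example\inbox", r"D:\example\holding",
#                  cutoff=datetime(2007, 5, 1))
```

Moving instead of deleting means the old files can still be looked at later (or thrown away once you are sure nobody needs them).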

These were just some of the problems we faced this time around. There were of course other problems too, plus loads of other nitty-gritty work that we had to do, and of course my daily operations work on top of that.

Well, it is fun at times when you can figure out what is going on, but when you can't, that's when things get really frustrating. Well, the % of successful patching via SMS increased this time around (thank GOD!). The remaining servers which failed to be patched were patched manually. We have around 2300 servers :)

Basically that's what happened over the past 2 weeks.. Next week I am gonna go and hunt for a new digital camera for myself b4 my team trip to Tioman. It will be either a Sony or a Canon.. still have not made up my mind. Okay ta!


posted by Jagjit Kaur at 12:00 AM