April 06, 2005

SQL Server issue: sysfiles1 have more records than sysfiles

A few days ago, I found a newly added log file was not displayed in
Enterprise Manager. That shocked me as we did move the databases
between production server and staging server on Mar 17 and the day
after. Obviously something went wrong.

Before the move, the staging server has the same data folder as what
of the production server three months ago. Therefore basically these
two servers are identical except that I added secondary data and log
files to certain databases. And the staging server was running on
VMWare GSX Server.

As far as I can remember, here were what we did that night,
1. stop sql server service on both servers
2. copy non-MS databases from the production server to the staging server
3. start sql server service on the staging server(I guess this is the
root of the issue)
At the moment my team leader reminder me I should copy all the
database files in the SQL Server data folder in order to eliminate the
configuration chores which I agreed. So I wonder whether I should copy
the databases from the very beginning again after deleting all the
database files on the staging server. After the second thought, also
confirmed by my team leader, we thought that copying the MS databases
would suffice since copying the whole non-MS database will take us
another two or three hours to 11:00PM.
4. stop sql server service on the staging server
5. copy MS databses including master, msdb, model, tempdb from the
production server to the staging server
6. start sql server service
7. miscellonous testing

The most tricky part is even though at the 4th step the SQL Server was
actually running with inconsistent configuration data, SQL Server
didn't complain. And at the 7th step, all the tests returned
successfully.

Now facing the issue of missing entried in Enterprise Manager, I
started out to google. It turned out nothing useful, neither does
Yahoo and MSN. The only similar case is at
http://dbforums.com/archive/68/2002/06/1/397365 which someone said
sysfiles1 had more entries than sysfiles. As I checked later, I was at
the same situation.

For the time being, the issue successfully defied all my attempts. I
tried to asked on experts-exchange, no response, csdn and Microsoft
newsgroup, people had no advice on how to fix the issue but all
reminded me that we shouldn't do this and that etc. And MS said we
were not eligible for support since our SQL Server 2000 was bought on
a global dicount scheme. Now I can only rely on myself.

I tried to repeat the issue on another SQL server with all the steps
mentioned above, nope, the issue refused to appear again. Then I
modified the status field in sysfiles1 to values other than 32784 and
32770, Enterprise Manager went into similar erratic state. But this
was not exactly the same situation, for if so the affected
databasewon't be able to attached or restored any more while affected
databse are recoverable on the production server by either attaching
or restoring even though the mismatch went on with the recovery.

I could update the status field in sysfile1 back to 32770 and 32784
then the problem will be gone but we are afraid of any unknown risk in
modification to undocumented system tables.

Just have no way out.

No comments: