SQL Dumbass

Fighting dumbasses, one query at a time…

slash?

We have a sql 2005 database server in mexico.  It has a couple dbs on it.  Backups should have been setup on it, right?  Well, no.  No backups.  No maint plan, no backup to tape.

So I go to the db, and create a maint plan.  Simple… let’s backup to the local drive, with sub folders.  Let’s then go and make the cleanup steps for all the db’s.  Simple.  Schedule it, and viola! we have backups!

Couple days or weeks go by, and I check back with the system.  No backups.  Hum… odd…

I look at the job history, and the maint job has failed each and every time.  It simply says it could not start step 1. 
I look into rights?  Is the sql agent configured correctly, running under a good account? Is the owner of the job correct?  Are the steps in the job setup right?  Linked to each other properly?  Anything…

After a while, i give up on that maint plan, and create a new one.  A simple one.  Backup 1 db only.  It fails too.  Same error. 

I finally turn to google with the error message, and come across someone that had a similar problem.  The job step has a property for the package.  It was missing a slash at the beginning of the value.  If i edited the step, and added the slash, all was well…  backups going, job succeeding, universe was righted.

Who’s the dumbass now?  yeah, it was me.

I geta call from Microsoft today

Gentleman says he is from Microsoft and has a couple questions about Microsoft databases.  He asks if I use the database…. I say yes, and in fact, i am editing a dts as we speak. 

He asks me what versions I have used.  We have both 2000 and 2005.  But nothing yet using 2008, or developers even using a CTP. 

He proceeds to ask me what i like most about the new version, SQL 2007.  No i didn’t typo the 2007.  He asked me that…. I correct him and say 2008.  He makes a crack about how funny the name is, that its 2008, when its being released in 2007… i sigh…

So, now, I am painfully aware of this individuals knowledge of the product…

He continues to ask me weird questions, and i provide apparently confusing answers.

At one point he asks me what my title is.  I respond, DBA.  He tries to guess what the letters stand for… and give a couple good tries, but fails.  As i explain the acronym, my wife cracks up in the background.  Even she knows what it stands for, and was mystified that this person had no idea. 

From there, the call went downhill.  Fun.

Lets join on two fields, wait, lets send those two fields to a functino to scrub them first, then join

Clients db is runing slow.  I hear about it thru the normal grapevine.  I look into Spotlight to see what is going on.  Sure enough, seems like its getting hit hard.  But before i can investigate more, it clears itself up. 
Few minutes later, slow again.  Clears up again.  Something up…
So start digging.  Notice a job is always running when slowness is reported.  Start asking questions.  Turns out some new code was rolled just the other day, that is called from this job. 
Lets look…
First proc i look into has an odd query.  Seems we are joining an invoice_no field between tables, but we dont know if this data has been scrubbed, so its being sent to a function to scrub it.
  WHERE  DBO.UDF_REMOVE_BADCHARS(H.INVOICE_NO) = DBO.UDF_REMOVE_BADCHARS(P.INVOICE_NO) 
Wow, nice code there… lets do a table scan… and this is done in multiple places, in multiple procs, which are all called from this job, that runs every minute… wooohooo

rollback.  rewrite.

(I know what you are saying now… how did this get by me?  well, we are not part of the code review process.  We just roll it.  ARGH!!)

I need the install key

I’m minding my business.  Working on this and that.  And behind me, near me, i hear someone asking for the install key for sql 2005.    I think to ask whats going on, but don’t and get engaged again in what i was doing.
Later that day, i hear someone else asking for the same thing.  Again, i get distracted with real work.
Later even, i overhear someone else asking for it via email and IM.  Then i finaly get an IM from someone, and in passing, they ask for a key…  I ask them why they need it?  Has your install prompted you for the key?  Im my memory, ive never put in a key… I know i have a key on the cd i have at home…  but my memory is fuzy.  ITs been a while since i installed it myself… So i ask the silly question.  Has the installation prompted you for a key?
The dumbfounded response was, “No, I have not yet tried the installation.  I just assumed…”
how many man hours spent on this today?

Change control process and roll documents

Developer creates a roll document.  Listing out all objects that need to be rolled.
When its time to roll to production, we review the document with multiple groups, assuring ourselves and our process that we have covered all the bases.  DBA takes roll document, and performs a sync between UAT environment and Production, syncing only the objects that are listed in the roll document.
Fast forward 1 day….
Production support realizes, thru a belated testing, that some functionality is not present.  Tempers flare.  Developers are blamed.  DBA’s brought in too help troubleshoot the problem….
Fast forward some time….
The roll doc that the DBA used was not the correct version.  While on the group call, the developer made last minute changes to the roll doc.  These changes never made it to the one the DBA used.  Some objects were missed. 
Yeah.  Another dollar.  Another broken process that is blamed on someone other than the creators of the problem.  Makes me want to truncate all data, and delete all users.  Fix that one…

Database log is full…hey database log is full… hey, hey… did you hear me??!!?!?

Database server has little space on 2 drives.  Database is rather large.  Backups can bee way too big.  We use Litespeed to shrink the database backups down.  Even so, backup files are large, compared to the size available.
Something happens and the log file grows way too big.  Backups are kept for 6 days, and hard drive fills up.  Of course this happens Sunday around 10am…
I spend some time removing files, freeing up space.  Enough to get the server limping to Monday.
Monday, i start freeing up space again.  Truncating the log file.  Getting the db ready for the day. 

Production support individual puts in a ticket.  Then stops by my desk.  Tells me, guess what…the log file is full, could i look into it. 

What do i do?  I snap at him.  Inform him ive been working on it all morning… and was working on it Sunday….
Then i go back to cleaning up the mess….

I love replication. I love senseless tickets.

We have a system that has some simple replication going. Used mainly for reporting.  Some data is gathered from 4 dbs, and stored in a reporting database.  Some of the things that are stored are quantities of different doc types.  These are displayed on the main menu.  For some reason, some of the menus are from the real db, while others are from the report db.  This can lead to a disconnect in the values, when replication has been turned off…

Fast forward.  System needs an upgrade of the product and db.  So replication is disabled.  Upgrade proceeds.  This takes a day plus of multiple man hours.  Luckily, the DBA is not needed.  So he goes on to other tasks. 
A day or so passes, and production support individual creates a ticket that the numbers are off… please investigate.  This finally reaches the DBA, and the DBA is reminded that replication was disabled, and never reenabled after the upgrade.  For some reason, this step is always skipped by the upgraders…
DBA reminds that the upgraders need to tell him when they are done, and replication can be enabled.
DBA reenables the replication.  Asks prod support to review to verify accurateness.  While DBA is requesting review… guess who puts in another ticket… yup… prod support guy lets us know again… numbers are off…

Oh, and this is about the 10th time this has happened.  Exact sequence.  Each time, DBA requests further knowledge of completion, so replication can be reenabled.  Never occurs.  Repeat.

Please help me… something needs to be done…but im not giving you any more information…

Production Support person creates a ticket for DBA.  Ticket tells DBA that system needs some process looked into and fixed.  Ticket is received by DBA at 5:45pm.  Ticket is a level 3 priority.  Production support person is long gone home.  Poor DBA looks at ticket, thinking to himself/herself, I wonder which DB Server this ticket refers too… which client it refers too… 
DBA happens to support 13 clients, some of which have 10+ DB servers. 
Thanks Prod Support person… this ticket will have to wait further information from you… completely counteracting the ticketing system and any hope of a timely solution from the DBA. 

Please Fix

The other day I get a frantic email from a developer. Seems that our team was asked to migrate a table from production to test yesterday and in doing so we did not preserve an identity column. As such, their load process was failing. So, rather than notify us that we had made a mistake, the developer decides to play “Junior DBA” and re-add the identity himself. And after it ran for about two and a half hours an error message appeared:

The log file for database ‘databasename’ is full. Back up the transaction log for the database to free up some log space.

At the end of the email were two words: Please fix.

Okay, we should be able to do that. Of course, we have other issues now. See, the log drive for this box was about 70Gb. He did not just fill up his log, he filled up the disk. Do you know what happens to other databases that cannot use the disk to write their logs? Yeah, I know you do. Too bad he did not. Of course, this junior DBA figured that with the database in simple recovery mode that the log would not grow, and assumed we should be able to fix things in about two minutes.

Why the hell not call us at the start? As a result, the server was out of service for a few hours as we tried to get things back in order. All the while he complains that if we had done things right to begin with, none of this would have happened. My thinking? Yeah, if we had never hired you to begin with, then none of this would have happened.

Anyway, this snowballs into more problems. Now people want to know why we are not monitoring the available free space on our disk drives. Huh? We are! We knew the drive had filled up! “Well, why didn’t you do anything?” Do what? Go to the box and start killing spids? I am not the one that kicked this thing off, you did. How about having some understanding of the basic technology you are being asked to work with? Once we saw the issue we starting working to fix it. Any idea how long it takes to undo a transaction that filled up a 70Gb drive? It takes more than a few minutes.

Get Adobe Flash player