SQL Dumbass

Fighting dumbasses, one query at a time…

Why Code Review?

A colleague stops by to tell me their web app is running slow, some days are slower than others. I know the app a little bit and also know that he is right, it is slow in spots. So, I offer to take a look. I happen to know of a few stored procedures it uses, so i execute them and return results immediately. After poking around a bit more, I decide to fire up profiler and tune the settings to make certain only my connection is being traced.

Then, I go to the page that is causing the problem. The trace shows…nothing.

No activity for almost one full minute. Then, a bunch of lines, each query taking about a half second, and there are a few dozen of them, followed by more queries except these appear to be the same statements run three times, each taking a half second. then, and only then, do i see the procs called and the final results come back.

And what are all those lines? Well, they go something like this:

SELECT last_name FROM linkedserver_connection WHERE NAME = ‘domain\person’
SELECT first_name FROM linkedserver_connection WHERE NAME = ‘domain\person’
SELECT full_name FROM linkedserver_connection WHERE NAME = ‘domain\person’

SELECT last_name FROM linkedserver_connection WHERE NAME = ‘domain\person1′
SELECT first_name FROM linkedserver_connection WHERE NAME = ‘domain\person1′
SELECT full_name FROM linkedserver_connection WHERE NAME = ‘domain\person1′

and so on. This page today was going to display a dozen people, but some days it could be a lot more. The more people, the slower. Keep in mind, that for some reason the app did not even connect to the database for almost a full minute before this crap was even executed.

So, my friend wants to know what I can do to speed things up. My response? “Fire the person that built this, hire a six year old and a chimpanzee and have them work in an agile structure for about a week.”

He did not seem to think that would help, but I do not see how it could be any worse than what they have.

Shrinkage

So there I am, trying to get back in the flow of things during a short work week, when I get an email that brings a smile to my face. It would seem that a certain developer with many, many initials after their name (MCSD for example) needed some help. Apparently there was an issue with the server preventing them from running a query. The email read:


Please truncate the log file for tempdb on (server).  We’re getting an error message:


The log file for database ‘tempdb’ is full. Back up the transaction log for the database to free up some log space.


Now, my first thought is that perhaps, just perhaps, there is a chance that my query is causing this issue. But this rocket scientist has already decided that there is no way his query is the culprit, and in fact it is a problem with tempdb. Here’s an idea…why not tell us what you were doing when you got this message? Perhaps we could help you find out where the problem is? No? Okay then.

So, we respond back, tell them that tempdb is fine, there is lots of free space (naturally, since his trx was rolled back), and to try it again. It fails. Think he would make the connection yet? Unfortunately, no. So we go back and forth.

Them – “Please truncate again.”

Us – “Sure, done.” (It’s easy to get this done quickly since his trx keeps rolling back!)

Them – “Please truncate again.”

Us – “All set.”

Them – “Please try it again, the truncate does not seem to be working.”

Me – Oh, the truncate is not working? Let me get right on that for you.

Us – “Can you tell us what you are trying to run, perhaps we can figure out what it is you are doing that is filling up the log”.

Them – “My code has not changed, it must be something else. Are you certain the log is being truncated?”

Dumbass. Let’s see…your code has not changed…but you keep filling up the tempdb log file…could it be…the data has changed? Anything? Nothing? After a few more exchanges we get:

Them – “Well, we did migrate this code from a different server, so I guess you could say that something has changed.”

Brilliant. I can see how you earned all those letters after your name. Too bad none of them contain “DBA”.

Okay, so now we know that something has changed. Oh, and as it turns out, they may or may not be using the right version of the migrated code. And certainly the data is different, but why should that matter, right?

Thankfully I have no shortage of incidents like this one to keep me amused. It must be one of the reasons I come to work every day, the entertainment value here has got to be as good as anywhere else.



Turbo Button

This is an old story and one I could have repeated many times over the years.

There was an application developer who was testing one of his queries (amazing) and while he was waiting, asked if there was anything I can do to help the query run faster that he was running.  So instead of giving the usual answer of looking at the execution plan and optimizing it, I told him he should hit the turbo button.  This would help speed up his query on the server by 50%.  After a brief 30 seconds of trying to show him the fake key strokes, I said, ‘There is no turbo button smart developer (different words were used), try performance tuning your query.

It is amazing that a developer would think he could hit a button to increase the speed of their code.  It makes you really think about what they view as reality when it comes to optimizing the code.

Is Code Review Necessary?

I was reviewing procedures from one of our dev teams today when I saw this (names to protect someone and there were a bunch more parameters, and yeah, I doubt this, as typed would compile, work with me):


CREATE PROC
@PK int
@Val1 varchar(50) OUTPUT
@Val2 datetime OUTPUT
@Val3 int OUTPUT
AS


IF EXISTS (SELECT 1 FROM Table1 WHERE pk = @PK)
BEGIN
                SELECT @Val1 = Val1 FROM Table1 WHERE PK = @PK
                SELECT @Val2 = Val2 FROM Table 1 WHERE PK = @PK
                SELECT @Val3 = Val3 FROM Table 1 WHERE PK = @PK
END
ELSE
BEGIN
                RAISERROR(‘We don’t have any rows that match that pk’)
END
 
If I could guarantee a jury of DBA’s, I know I’d be acquitted… Instead, I rewrote it and tried to educate them
 
SELECT @Val1 = Val1
                ,@val2 = Val2
                ,@val3 = Val3
FROM Table1
                WHERE PK = @PK
                And Val3< > 42
 
IF @@ROWCOUNT = 0
                RAISERROR(‘We don’t have any rows that match that pk’)
END
 
The real query went from more than a second to less than 50ms and the reads against the table went from several hundred to 2. All that and I still have to justify to developers why they send their code through for review…

Why is the systems so slow???

We have a system that stages some data.  A job fires off and picks up the data from the stage db, pushes it to the real db, performs a few edits on it, and then goes onto the next record.  Sounds simple.

Customer calls to complain that the system is too slow.  10s of hours waiting for items to reach them…

I start digging to get better details of the system.  The above is a generalization.
What happens is the job calls a proc.  This proc says, get me the top 1 item from stage db.  Then calls another proc with this ID.  This proc in turn does some insertion into multiple tables.  The last of which has a trigger.  The trigger sends this ID thru a series of edits.  Once done, there is some clean up procs called for this ID and eventually the stage data is removed.  Thus returning to the original proc.  It then says, are there any records left in the stage table?  if so, get me the top 1 item from the stage table.  And repeats.
The trigger turns out to process some 139 rules.  53 of which are rules comparing one field to be equal to a value.  the rest are comparing one field to be like a value.  There is a priority to these rules.  If you look at the history, the last rule, is the one most picked, eventually, by the system.  So, 139 rules must be traversed before it can perform an action.  139 checks on values.  And keep in mind that this is in a trigger, while we are moving data from stage to real db.  If it takes 2 minutes to process all this, and we have received 50 new items in those 2 minutes, you can imagine the backlog. 

Thus the delay.  Poor job, on occasion, was running for 9 hours straight.  Trying to get everything thru the system.
So, a little research reveals that the priority of the edits could be drastically altered, and the most picked edits moved up the priority list.  This requires some work on the operations staff.  They are duely informed.  Nothing happens. 

Last night, same process occurred.  Same slowness.  I’m just waiting for the email to complain about slowness….

Where Am I?

The following story is true. The names have been changed to protect the incompetent.

Developer asks me to redeploy the database, but to be sure and deploy it to the “other SQLDEVSERVER/INSTANCE1.” I kind of didn’t hear this at first, so I went and deployed to the server and instance specified, SQLDEVSERVER/INSTANCE1. He calls me back and says that he can’t see the changes. I check my scripts and connections and the create date on the database. All good. I let him know. Well, he says, it’s still wrong. So I redeploy. It’s still wrong. He then reminds me, “Make sure you’re deploying to the other SQLDEVSERVER/INSTANCE1.”

At this point, I pick up that key word and tricky phrase. “What do mean, the other one. There’s only one instance name per server name.”

“Oh no,” he assures me, “all you have to do is put in SQLDEVSERVER/INSTANCE1,10001 and it connects to the other copy.”

I look, and sure enough, genius has been deploying his code to two different instances and didn’t know it. I explain how, substituting port numbers for instance names, changes the instance so that he was connecting to INSTANCE0, not INSTANCE1.

It took four passes through the explanation and I’m pretty sure he still doesn’t get it.

Thanks For Nothing

Working in a dot.com as a dba. This particular dot.com was extremely concerned with getting as much code out the door as humanly possible. They weren’t terribly concerned with the quality of the code or any of the data. We were collecting over a gb of data every day in SQL Server 7.0. To say the least, the system required serious babysitting, monitoring disk space, log space, growth, etc. The DBA’s spent a considerable amount of time, figuring out how best to monitor & respond to the crazed behavior we got in the system. But that didn’t sit well with the business majors running the company (“all from top schools”).

So we got the word one day, stop spending time messing with maintenance and spend more time churning out code with the developers. Less than a week later, one night a strange spike in activity (right after a new code release that we didn’t monitor) filled the log drive. We tried recovering the database, but nothing worked. We spent three straight days in the office, first attempting to get all the data back, then restoring an older backup, then fixing the code that kept breaking the system. After three days without sleep, we had the system back online. Our glorious kid managers didn’t send a thank you or an atta-boy, but they did send an email instructing us to begin monitoring the systems, especially during and after code releases.


All three dba’s quit within three months of that incident and the company went the way of the dodo less than a year later.

Make Certain It Is DBO

What a nice email today, a developer sent along a request to have some databases migrated and included some details regarding a sql login.

It seems that these databases are part of a vendor application. And it would also seem that this vendor requires this sql account to be a sys admin (of course!)

Well, the developer then goes out of their way at the end of the email to state: “Please also make sure that this sql login is assigned dbo privileges on the databases”.

Sure thing, I would hate to skip that step and have the app cease to function for you. While I am at it, let me add the login to all of the fixed server roles, and not just sys admin, just in case it needs that as well. In the meantime, perhaps you could glance through some documentation regarding database security. After all, you were hired to work with this technology, you might as well have an understanding of what it means to be a sys admin.

Locked out

We have an upgrade to an application being deployed and as part of the upgrade there is a database piece. The app team decides (rightfully) to make certain that no one tries to connect to the database while the upgrade is happening, and they tell their users to not use any apps that connect to the database. So far, so good.

Well, one account continues to open a connection, and no one has any idea about the account. I recognize it and point towards the person on a different team, as it is a service account used by a web pool to connect to the database. Unfortunately, it cannot be turned off anytime soon, as we do not host the servers making the connection. So, what to do?

I decide to put the db in single user mode, kill all connections, and have them run the upgrade. It fails, because the upgrade opens about thirty connections to the database. Why? Good question.

I decide to switch the database to dbo use only, take it out of single user mode, and have them run the upgrade. Before the upgrade starts, that service account connects again. Turns out the account is a sys admin. Why? Good question.

We remove the service account from sys admin and try to get things going again. It reconnects. Since it is not a sys admin, and not a dbo, and not an alias, we are stumped as to how it can connect. You would think that I would know the answer as to how it is getting in, but I don’t. Suddenly, my colleague realizes that we could lock the accounts.

Lock the account! Why did i not think of that! We lock a few accounts, just to be safe, and the upgrade gets underway, only a few hours later than desired.

slash?

We have a sql 2005 database server in mexico.  It has a couple dbs on it.  Backups should have been setup on it, right?  Well, no.  No backups.  No maint plan, no backup to tape.

So I go to the db, and create a maint plan.  Simple… let’s backup to the local drive, with sub folders.  Let’s then go and make the cleanup steps for all the db’s.  Simple.  Schedule it, and viola! we have backups!

Couple days or weeks go by, and I check back with the system.  No backups.  Hum… odd…

I look at the job history, and the maint job has failed each and every time.  It simply says it could not start step 1. 
I look into rights?  Is the sql agent configured correctly, running under a good account? Is the owner of the job correct?  Are the steps in the job setup right?  Linked to each other properly?  Anything…

After a while, i give up on that maint plan, and create a new one.  A simple one.  Backup 1 db only.  It fails too.  Same error. 

I finally turn to google with the error message, and come across someone that had a similar problem.  The job step has a property for the package.  It was missing a slash at the beginning of the value.  If i edited the step, and added the slash, all was well…  backups going, job succeeding, universe was righted.

Who’s the dumbass now?  yeah, it was me.

Get Adobe Flash playerPlugin by wpburn.com wordpress themes