Rude and Passive Aggressive SQL Error Messages

First off, some people are just “rude-deaf.” It doesn’t matter what rude language or behavior you complain about; they’ll say, “No, that wasn’t rude, the error message only said ‘Fuck you and your grandmother.’”

Second, you don’t have to have an error message that says “Fuck you and your grandmother” to be rude. Most error message crimes are about being unhelpful and passive-aggressive (i.e. hostility through doing nothing, like watching someone on crutches struggle to open a door).

Incorrect syntax near ‘%.*ls’.
This message typically gets filled in with something like ‘,’ or ‘'’. A typical query is chock-full of commas. Only a passive-aggressive human would tell someone, “There is a spelling mistake in your term paper near one of the spaces.” SQL error messages tend to identify the location of a syntax error at a chunky granularity, maybe the statement, so the error could be anywhere inside a 1,000-line SQL statement. If the error could instead show the previous 100 and following 100 non-blank characters, with a pointer at where SQL first realized something went wrong, that would be helpful.
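A made-up example, assuming a plain customers table, reproduces the complaint. The doubled comma is the actual mistake, but the message won’t tell you which of the query’s many commas it objects to:

SELECT customer_id,
       first_name,,      -- the stray second comma is the real problem
       last_name
FROM   customers;
-- Msg 102, Level 15, State 1
-- Incorrect syntax near ','.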

Warning: Fatal error %d occurred at %S_DATE. Note the error and time, and contact your system administrator.
First off, the people who read this either are the administrator, or there isn’t an administrator. You might as well swap in some other imaginary figure, like god. “Fatal error. Pray to your gods, fucktard”

The type ‘%.*ls’ already exists, or you do not have permission to create it.
Oh SQL, you know why this failed. Surely there isn’t a single method called ExistsOrLacksPermissions and the implementers just can’t decide why that exception was thrown. I think this error is rude or fucked up. You decide.

Finally, be helpful.
Is it really so hard to write a suggested fix? “Permission denied. Execute a GRANT statement to grant the missing permission.”
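A minimal sketch of what I mean, with made-up object and account names:

-- Instead of a bare "permission denied", imagine:
-- "EXECUTE permission denied on 'dbo.usp_GetOrders'. A sysadmin can fix this with:"
GRANT EXECUTE ON dbo.usp_GetOrders TO web_app_user;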

Google exists. Whoever is working on MS-SQL ought to google all their own messages and put a sentence’s worth of the internet’s collective advice into each error message.

Using unit testing principles to write better integration tests

My application is a database application. There is some logic that is easy to move into an application server, where it is easy to do dependency inversion and unit testing.  That leaves untested all the code in the database: the stored procedures, triggers, tables and so on.

Compilation Tests. SQL Server allows for SET NOEXEC ON, which compiles and checks many things for a batch of SQL commands, but doesn’t execute any of it.  I have integration tests that run almost all my data layer code with NOEXEC ON, and it reveals malformed parameter lists and other issues that would normally only show up at runtime.  The tests run very fast, do not change the state of the system, and they don’t interfere with each other.  There isn’t anything to assert except that no exception was thrown.
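The shape of such a test is roughly this (the procedure names are invented; the point is the NOEXEC bracket around the calls):

SET NOEXEC ON;    -- parse and compile everything below, execute none of it

EXEC dbo.usp_SaveOrder  @OrderId = 1, @CustomerId = 2;
EXEC dbo.usp_ShipOrder  @OrderId = 1;
-- ...every other data layer call the application makes...

SET NOEXEC OFF;   -- turn execution back on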

Idempotent Tests. If your code has working transactions, then you create a root object (a store), the child objects (books in stock), etc., as if the database were empty.  Then, when you have enough objects to record a book sale, record the sale.  Assert something useful, like the number of rows affected, the size of the primary key, etc.  Then roll it all back.

These tests can run fast, but might not, depending on the particulars of the transaction.  For example, a stored procedure to calculate the payroll might be slow if the database has 100,000 employees.  A good idempotent test suite will skip over any of these slow tests.

Idempotent tests leave the database unchanged as long as your transaction code isn’t buggy.
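A rough sketch of one such test. The tables, identity columns and procedure are invented; your schema will differ:

BEGIN TRANSACTION;

-- Build up just enough data to record a sale, as if the database were empty.
INSERT INTO dbo.Store (StoreName) VALUES ('Idempotent Test Store');
DECLARE @StoreId INT;
SET @StoreId = SCOPE_IDENTITY();

INSERT INTO dbo.Book (StoreId, Title, Price) VALUES (@StoreId, 'Test Book', 9.99);
DECLARE @BookId INT;
SET @BookId = SCOPE_IDENTITY();

EXEC dbo.usp_RecordSale @BookId = @BookId, @Quantity = 1;

-- The "assert": did the sale actually land?
IF NOT EXISTS (SELECT 1 FROM dbo.Sale WHERE BookId = @BookId)
    RAISERROR ('Expected a sale row and found none.', 16, 1);

ROLLBACK TRANSACTION;   -- leave the database exactly as we found it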

Additive Integration Tests. Additive tests look a lot like the idempotent tests, except they don’t roll back.  Real-world database code fails on account of how the data looks just as often as it fails for how the code works.  Not all code failures should be fixed by writing more code.  If common sense is violated by a column containing nulls 1% of the time, then it isn’t possible to write code that does something other than fail when that column is null.  But we can’t discover that our code and our data aren’t in harmony unless we run this test.

A good additive test is mostly inserts and driven by existing, real or highly realistic data.  An example would be taking the entire inventory of books and re-adding them with slightly different names.  If a row that exists now can’t be inserted with existing code, then one of them is wrong.
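Something along these lines, again with invented names; the test is little more than an INSERT…SELECT driven by the rows already there:

-- Re-add every existing book under a slightly different title; any row that
-- can't make the round trip means the code and the data disagree somewhere.
INSERT INTO dbo.Book (StoreId, Title, Price)
SELECT StoreId,
       Title + ' (additive test)',   -- assumes the column is wide enough
       Price
FROM   dbo.Book;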

Additive tests leave the database modified, but because the transactions are mostly inserts, tests aren’t likely to be competing for the same rows and tables.  If the performance characteristics warrant it, consider running them in a transaction and rolling back.  Better databases can roll back large numbers of transactions rapidly; worse ones will take huge amounts of time to roll back, or will hang the server.

Destructive Integration Tests. Destructive tests operate on the existing data in a way that makes it unusable for other tests.  Destructive tests primarily do deletes and updates.  For example, if we imagine a test that looped through all books in inventory and recorded a sale for each, we might discover that some books fail to be sold when run through the sales code.  Either we have a code problem or a data problem, and that is something we’d like to know.
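A sketch of that loop, using the same invented inventory table and sales procedure as above, trapping and reporting each failure rather than stopping at the first one:

DECLARE @BookId INT;
DECLARE book_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT BookId FROM dbo.Book;

OPEN book_cursor;
FETCH NEXT FROM book_cursor INTO @BookId;
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        EXEC dbo.usp_RecordSale @BookId = @BookId, @Quantity = 1;
    END TRY
    BEGIN CATCH
        -- Either a code problem or a data problem; both are worth knowing about.
        PRINT 'Sale failed for BookId ' + CAST(@BookId AS VARCHAR(20))
              + ': ' + ERROR_MESSAGE();
    END CATCH;
    FETCH NEXT FROM book_cursor INTO @BookId;
END;
CLOSE book_cursor;
DEALLOCATE book_cursor;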

At the end of the test, the database may be logically correct given the transactions you’ve run, but all the books have been sold, all the employees have been fired, all the book descriptions overwritten with lorem ipsum.  Like additive tests, these can be rolled back if your database can do so without performance problems.  Otherwise,  you will want to let these transactions commit and then restore the database from a backup.

Destructive tests are the least like unit tests and the most like the integration tests your unit test text book warned you about.

All of these proposed tests, except the destructive ones, are in my opinion safe to put in a nightly build on a build server.  Obviously none of these tests are safe to run against production, and better integration testing is no substitute for refactoring, dependency inversion and writing true unit tests that are independent of the peculiarities of the persistent data store.

SQL Express on Windows Home Server

SQL Express 2008 Hates WHS. So I have Windows Home Server. Finally, I think, I can run a publicly accessible Windows 2003 box with SQL from my bedroom, and the licensing is all pretty legit. I try to install SQL 2008 from the Web Platform Installer. It barfs. I read this blog post and try again, without the Platform Installer. It appears to install, but then it fails to start. The error logs claim that the master DB is on an E: drive. I try to uninstall. The uninstaller says the attributes on the folders are different, so it refuses to uninstall. Reported the bug on MS Connect.

Ok, 1/2 GB of space on my WHS burned with no way of deleting it. I even tried zipping the folder and then deleting the program files\Microsoft SQL folder, but it says some file is in use. Sigh.

SQL 2005 Reporting Services Hates SSL certs on IIS 6 on WHS. So I try to install SQL Express. Reporting Services barfs on install (“SQL Server Setup failed to retrieve the SSL Certificate Name”) because it doesn’t like the perfectly valid SSL certificate. The web says rip out your SSL certificate and SSRS will install. Right, like I’d want to rip out my left eyeball. WHS doesn’t have a lousy selfssl cert, it has a green SSL cert from homeserver.com. Green! Do you know how much it costs to get a green certificate? So I try again without RS. This time, the SQL Native Client fails. It says it can’t find an MSI, sqlncli.msi. (Follow the link for the real fix, which is to uninstall the SNC from previous attempts.) And the install fails. I suspect it is because the installer automatically unpacks to the D:\guid folder. But the D: drive is some super weird NTFS configuration that most programs don’t know how to read or write to, unless they use the UNC or shared folders.

I finally succeeded before I had to try MSDB or MySQL. Next time, I’m going to seriously consider MySQL, probably with XAMPP, because XAMPP hasn’t sold its soul to the registry, MSI installers and other evil things.

The best SQL Stored Procedure Header EVAR

/******************************************************************************
** File:		my_stored_procedure
** Author: 		John Doe Programmer
** Copyright:		ABC Corp 2009
** Creation Date:	1/1/2009
** Description: 	Return data
** Version:		2.9
** Build:		291
** Number of bugs:	13
** Specification #:	13.123.123.12.312.3.123.234.34.5.2345.324.5.345.23.45.2345.23.45.3.45.3245.345.3.45.2345.3245.34.5.3245.2345
** 	Please refer to the requirements tracibility matrix to see how this beautifully links to the following requirements:
**      13.123.123.12.312.3.123.234.34.5.2345.324.5.345.23.45.2345.23.45.3.45.3245.344
**      13.123.123.12.312.3.123.234.34.5.2345.324.5
**      13.123.123.12.312.3.123.234.34.5.2345.324.5.345.23.45.2345.23.45.3.45.3245.342
**      13.123.123.12.312.3.128.234.34.5.2345.324.5.345.23.45.2345.23.45.3.45.3245.342
** Bug tracker ref:	Http://bugzilla/proc=12
** Temperature:		70
** Cubicle Size:		12x8
** Number of characters of text:
** Percent ratio of whitespace to letters 'e':
** List of people who think this is buggy:
** My favorite colors:
** My favorite pony:
** My boss's name:
** My boss's wife's name:
** Birthday of Dr. Codd:
** Description of what we think Dr. Codd would have to say about this stored procedure:
** Lawyers to call if you need to ask questions about the copyright notice:
** Number of colums referenced:
** Acronym list: WTF, STFU, TANSTAFL
** Abreviation list:  
** Misspellings we can't fix because there are too many dependencies on them:
** 10 ways to improve your code through comment headers:
** Phone number to call in case of emergency:
** Phone number of 911 service:
** Numer of time executed:
** Comment template:
**
** /***********
** ** File:		my_stored_procedure
** ** Author: 		John Doe Programmer
** ** Copyright:		ABC Corp 2009
** ** Creation Date:	1/1/2009
** ** Description: 	Return data
** ** --TODO: Finish copying template into all headers
**
**
** Db Objects affected list:  (this will be out of date on first edit)
** Called by: (This will be out of date, um tomorrow)             
** Return values: (This will be out of date, um today)
** Parameters: (because we should repeat source code in comments, its's good for you)
** Input					
** ----------					
** @ID  This is the ID. ID is short for identifier. Can you say 'aye dee'? I knew you could.
** @NEW_ID  This is the NEW_ID. This means "New Id".  Don't be confused by the underscores!
** Output
** -----------
** 0 or 1 or both.
** TODO: (INSERT ASCII ART HERE TO SUPPLEMENT BEAUTIFUL BOXES)
*******************************************************************************
** Change History
*******************************************************************************
** Date:		Author:				Description:
** --------		--------			-------------------------------------------
** 1-1-2009		Larry Smoe			Added header
** 1-2-2009		John Doe			Removed header
** 1-3-2009		Larry Smoe			Restored header
** 1-4-2009		John Doe			Removed !$!#$ header
** 1-5-2009		Larry Doe			Copied header to new stored procedure, will update someday with new text
** 1-6-2009		Larry Doe			Fix row of astrixes, there were 4 extra!
** 1-7-2009		John Doe			Removed all ***** ascii art
** 1-8-2009 8AM		Larry Doe			Started adding unicorn ascii art
** 1-8-2009 9AM		Larry Doe			Coffee break
** 1-8-2009 1PM		Larry Doe			Lunch break
** 1-8-2009 3PM		Larry Doe			Finished adding unicorn ascii art, golf time!
** 1-9-2009		John Doe			Removed all ***** ascii art
** 1-8-2009 8AM		Larry Doe			Added "--Get Rows" comment to all select statements
** 1-9-2009		John Doe			Removed all !#$#$ "--Get Rows" comments
*******************************************************************************/

Business Intelligence, Small, Medium and Venti

The word “business” in the field of IT is heavily overloaded with surprising meanings.  For example, the fact that a street address has a maximum length of 50 characters is sometimes called a business rule.  And so is the algorithm for calculating your life insurance premium.

“Business Intelligence” is just as surprising.  Business intelligence is using computers to gather actionable information to be used in business decisions.  Often, this works with some primitive technology, such as summing or sorting.

Small Business Intelligence
In short, small business intelligence is anything that can be achieved easily with ordinary SQL.

Finding your best customer is a matter of summing and sorting.

Finding the worst deadbeats is a matter of sorting your accounts receivable by age and size.
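Both of those really are one ordinary SELECT each, assuming garden-variety Orders and Invoices tables (invented names):

-- Best customers: sum and sort.
SELECT   CustomerId, SUM(OrderTotal) AS LifetimeValue
FROM     dbo.Orders
GROUP BY CustomerId
ORDER BY LifetimeValue DESC;

-- Worst deadbeats: receivables sorted by age and size.
SELECT   CustomerId, InvoiceId, AmountDue,
         DATEDIFF(DAY, DueDate, GETDATE()) AS DaysOverdue
FROM     dbo.Invoices
WHERE    AmountDue > 0
ORDER BY DaysOverdue DESC, AmountDue DESC;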

Ratios and Dashboards. A businessman wants to know, am I making money?  That’s easy: subtract costs from revenues.  But it doesn’t adjust for scale.  Am I making a lot of money compared to other bigger and smaller companies?  Profits/Revenue gets us a ratio that can be tracked over time, benchmarked against other similar organizations and so on.  The challenge of these ratios is that while the math is simple, assembling that much data is complicated, mostly by the fact that the more data you are moving around, the more problems you run into.  Deciding how to present the results as something other than a sea of numbers is also complicated; it is the user experience challenge of the database world.
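The arithmetic itself stays trivial. A sketch, assuming an invented Ledger table, tracking a profit margin by year:

SELECT   YEAR(PostingDate)                         AS FiscalYear,
         SUM(Revenue)                              AS Revenue,
         SUM(Revenue) - SUM(Cost)                  AS Profit,
         1.0 * (SUM(Revenue) - SUM(Cost)) / NULLIF(SUM(Revenue), 0) AS ProfitMargin
FROM     dbo.Ledger
GROUP BY YEAR(PostingDate)
ORDER BY FiscalYear;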

Ratio analysis can quickly turn into my next category, especially if you want the profit rate for every department, office, district, country, etc. in your organization, a trick that requires cross tabs and lots of them.

Business intelligence is also confused with making the above look pretty, so reporting products often call themselves Business Intelligence, even if they really are only meant to deal with small “business intelligence.”

Medium Business Intelligence
Crosstabs, also called pivots, are the next level.  If you have two discrete variables, such as product category and state, you can find out which states are enjoying the full range of your products and which ones aren’t.   If every cell looks like every other, then we don’t have much to say, except that the variables are probably not related to each other.  If the cells are very different, then you have a non-random pattern, and it probably means something, like widget X isn’t being advertised enough in Idaho.  In stats class this would have been chi-square statistics; surprisingly, chi-square statistics are never reported with cross tabs in any common database tool I’ve used.
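A crosstab in T-SQL terms is a PIVOT. A sketch against an invented Sales table, with the product categories hard-coded (which is exactly the limitation that pushes people toward cubes and MDX):

SELECT   State, [Widgets], [Gadgets], [Books]
FROM     (SELECT State, ProductCategory, SaleAmount FROM dbo.Sales) AS src
PIVOT    (SUM(SaleAmount) FOR ProductCategory IN ([Widgets], [Gadgets], [Books])) AS p
ORDER BY State;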

There are two technical challenges in working with crosstabs.  First, SQL was never meant to support them. Relational algebra (of which SQL is a variant) is all about two-dimensional tables with columns, not multidimensional cubes of data.  Hence the need for MDX, yet another SQL-like query language that concisely queries cubes to return particular crosstabs of data; after all, we don’t have a very good way of visualizing crosstabs of more than two dimensions.  MDX also has an expression language with the feel of Excel formulae.

The final thing that makes business intelligence based on cubes and crosstabs hard is the combinatorial explosion of report types.  Cubes are often searched for unusual patterns by running a succession of similar queries, sometimes at increasingly granular levels (drilling down), or in some other ad hoc pattern.  This represents more canned queries than a finite IT department can ever hope to create, so various cube browsers have been invented to allow non-database experts to craft an MDX query without realizing that they are crafting an MDX query.  Not unlike SQL query designers, this has mixed success, as MDX is more expressive than what most cube browsers are capable of illustrating.

Big Business Intelligence
Big business intelligence is mostly the domain of professional mathematicians and statisticians and includes data mining, multiple regression, and every other advanced statistical modeling technique that has ever been imagined by feverish mad professors.  The professional mathematicians are the only ones who have the credentials the business community wants to see before they will trust a mathematical model they don’t understand.

Microsoft has attempted to introduce “black box” data mining.  It is so black box that it is hard to figure out how to put data into these models and harder yet to understand what we are getting back.  This is a real tragedy because Microsoft could have figured out how to bring the power of advanced college statistics to the database world without resorting to proprietary black box algorithms.

By advanced college statistics, I mean things like ordinary least squares regression, Monte Carlo optimization and other models that can be understood without a PhD.  Right now, doing this requires using the analysis features of Excel, SAS, Mathematica, Gretl, etc.

Sadly, this level of business intelligence is likely to remain in the far future.

Embedded databases for speed and simplicity

(blog entry brought to you by time snapper.  This entry would have been lost to an electricity outage if not for my screen recorder.)

When I want a relational database, I might want something that can handle a billion transactions per second, never loses data, and can be updated by 1,000 users simultaneously without corrupting data.  If my application just needs a single small table, the overhead of a relational DB is overkill.

[Likewise for ETL scenarios: writing SQL-based ETL code in a full-blown RDBMS incurs a lot of overhead that just slows data processing down.  On the other hand, dropping back to text files isn't very good either, as you forego the power of SQL.]

In particular I want an embedded database that is:

  • free
  • works in ASP.NET & ADO.NET
  • supports all important SQL statements
  • is very fast
  • doesn’t require setting up a secure service (i.e. daemons or Windows services)
  • Support for bi-directional databinding in .NET
  • Doesn’t require an installer/COM component registration, etc. i.e. nothing that would be a barrier on a hosted account
  • Runs in medium trust
  • Secure against executable code (shouldn’t be able to call OS shell functions like one can in MS-SQL, MS-Access and the like)
  • Can zip up the data with the source code and send in an email

MS-SQL 2005, MySQL, etc

These require an install, administrator rights to set up, some non-trivial know how to configure the database and users.  Nope.  Too much work.

Text and Excel

Database drivers for text files tend to be pretty primitive, low performance and often don’t support updating.  Excel has limitations on the number of rows you can deal with.

MS-SQL SQL Compact/Mobile Etc

Doesn’t allow for hosting inside of an ASP.NET worker process. End of story.

XML

Not supported for bidirectional databinding with the default ASP.NET controls.  I’m pretty sure this works in medium trust.

MS-Access/Fox Pro

If MDAC is available, you probably can use the Microsoft.Jet.OLEDB.4.0 provider.

Berkeley DB

Berkeley DB was bought by Oracle.  Oracle’s distribution is slanted towards support of the Java world, but a .NET driver does exist.

SQLite

See my blog entry on SQLite.  Of all my options, this is the one I liked the most.  It runs well in ASP.NET, can support large numbers of users, has Visual Studio support (better in the full version than the express versions).

SQL2005, Windows 2003 and Licensing and Technological Choices

CPU. Licensing gives you a strong incentive to stick to single socket CPU designs. Multiple socket motherboards have hefty penalties for SQL 2005, as you need a license per CPU.
Memory. At the 2GB cutoff you have to move to Windows Server Standard from Web Edition. After the 4GB cutoff you have to move to Windows Server Enterprise. Similar issue with SQL 2005: after 4GB, you need to switch to Enterprise Edition. On the other hand, the free SQL Express edition merely doesn’t use the extra memory.
CAL Counting. Some CAL counting scenarios are easy. Others are mind-bogglingly complicated. So far, most people tend to search for licenses that don’t have legal or technological CAL limits because it is too hard to count users. Terminal Services and RDP servers are one area where CAL counting is enforced by license and technology, so CAL counting is important there. Everywhere else, there isn’t a technological CAL constraint. So when faced with counting users in a complex environment with some users using file shares, some using RDP occasionally, some printer users, some intranet IIS users, some internet IIS users, some people with local accounts, some with domain accounts, some with neither… I suspect most administrators throw up their hands in frustration and stop trying to count unless there is a technological constraint.
Authentication. Your choice of authentication method can trigger CAL counting. This old TechNet page implies that if you use integrated authentication and local users, you will trigger CAL counting. This means it is cheaper to write applications that avoid integrated authentication! This Oct 2000 article agrees:

“One more item of interest: Microsoft requires that you purchase a Client Access License (CAL) to enable a client to log on to a Microsoft server. Microsoft waives this requirement for Anonymous access to IIS servers, but if you authenticate a user against an account in User Manager, you need a CAL. If you plan to have many users authenticate to an IIS server, you can end up making a significant investment in CALs. If you use a non-Microsoft means of authentication, either custom or third party, you bypass this requirement but at the expense of not integrating with NTFS permissions.”

I couldn’t find anything that specifically mentioned domain accounts, but I suppose by analogy that if you use integrated authentication for your intranet application, it will trigger CAL counting.

Named SQL 2000 Instances

SQL 2005 uses SQL Browser to route traffic to the various instances on a box.

SQL 2000 uses the “SQL Server Resolution Protocol,” which listens on port 1434.  If someone forgot to open this port, you might have a hard time seeing the named instance.  Unlike SQL Browser, there is no obvious way to see if SSRP is running, except running portqry (on the client) and netstat -an (on the server). Also, it appears that SSRP dynamically determines the ports of the instances, so they might be hard to predict if you are trying to communicate through a firewall.

You can sidestep the issue by:

Statically setting the port of the instance, say to something that likely is open, like 1433. Do this using the SQL 2000 “Server Network Utility” on the server side.

You will need to update your connection string to use a COMMA to specify the port.  This is highly non-intuitive, as most protocols use a colon.

E.g.

MYSERVER\MYINSTANCE,1433

This would probably work for other ports, like port 80.

Here is the KB article I used. And this one too.

Pervasive Data Integrator: Title pending…

[Update]

I finally found Fernando Lambastida‘s blog.  I called him; he apparently is associated with Pervasive.  I told him who I was, a lowly techie in a big software consulting company that needs to evaluate/work with/play with some ETL bits.  I said that I saw the offer for a 14-day ‘free demo software‘ on his blog, and he offered to help me work with Pervasive to get a trial key.  If your calls to Pervasive are channeling you through the sales sewer pipeline, try Fernando instead; he’s a good chap.

[Okay back to what had transpired before]
So I got a job interview coming up. They want a Pervasive ETL expert. Pervasive is a relatively uncommon ETL package in a fairly narrow vertical market: ETL. So I do what seems reasonable, I download the free evaluation trial. I install it, read the documentation. I almost grok it, so I try to open a designer. The designer asks me for a license key. Hmm. I check the email. Yup, it says I have to contact them to get a key. Website says so, too. Trial keys require a phone call to Pervasive. Okay. I’m a developer, I understand that companies need to defend their IP. So I call.

I call the company and eventually get transferred to a Chris, who said that if I was interested in pitching my skills to a company who had a copy of Pervasive, I had to get a license key from that company or that company’s sales representative!!! This is very weird; he can’t possibly mean I need to illegally obtain a license key from an existing customer, does he? And Chris says under no circumstances can I be given a key.

So I call them back and ask not to be transferred to Chris, and now I get a Howard. Howard likewise said no, no demo key.

So if anything, Pervasive‘s website is highly misleading about the availability of an evaluation version.

If you came to this post researching Pervasive, I recommend that you look into something else. There are too many choices available, from ad hoc code, to RhinoETL, Jasper ETL, DTS, SSIS, Oracle Data Integrator, Informatica. Why settle for dicey customer service?

Data is the lifeblood of an organization. If the data fails to flow, you’re out a job and the company will be losing money.

Buying tools from companies that are actively hostile to developers is like loading a metaphorical gun and pointing it at your company’s head.