Unions vs Outter Joins

If you have one table of things, say bills that need to linked to a large number of subsidiary tables that have a wide variety of structures, you can:

Do pairwise joins between your fact table and your subsidiary table then union all of these together. This runs the risk of losing rows or doubling rows if you have data quality problems (having a fact described in two subsidiary tables or in no subsidiary tables).

The better solution is to have the fact table link by an outter join to all of the subsidiary tables. This will result in a bunch of columns with either a null or the description of the row in the fact table. COALESCE can be used to pick the first non-null value. This way, you never lose fact rows, you get a description picked at random if there are more than 1 descriptions in the subsidiary tables and you get a null if the description can’t be found.

Things DTS needs

1) STOP-GO A task that can be quickly set to always succeed, or always fail
2) Run Access ‘action queries’ in batches (10 or so access queries executed in a row, graphically)
3) Send table (turn an access or SQL view or table into an excel spreadsheet and send it)
4) Drop table (which always succeeds)
5) Execute Access ‘action queries’
6) Create Excel spreadsheet

Currently all these very common tasks require a big hassle.

DTS also needs to:
Make the graphical taks bigger and make them visably change when properties change, for example, the icon should have a big red circle with line through it if the task is disabled.

Droping tables in T-SQL

Ideally, we want to have a function that will drop a table if it exists, do nothing if it doesn’t exist. This can be done in T-SQL or DTS. In DTS we create a single DROP TABLE statement into SQL task and then have the exit arrow set to ‘on completion’ so the code goes on even if the table didn’t exist. This can create a lot of clutter on the DTS drawing board.

Writing pure T-SQL makes this a pain, but with a T-SQL function, at least you can shorten the code.

IF dbo.TableExists('myTable')=1 -- THEN
drop table myTable
-- END IF

The DDL for the above function is :


FUNCTION dbo.TableExists(@tableName varchar(250))
RETURNS bit
AS
BEGIN

DECLARE @OUT bit

IF EXISTS(SELECT TABLE_NAME FROM DATA.INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = LTRIM(RTRIM(@tablename)))
SET @OUT = 1
ELSE
SET @OUT = 0

RETURN @OUT
END

The database name needs to be changed for each database where the function is deployed. Also, notice I’m returning a bit and not a boolean, (which I’m not sure if it is possible), so you have to check for equality to 1 or 0 when using the function.

TSQL programming with MS-Access

Pros
Fast
Doesn’t corrupt
Can handle a lot of data (unless you are using MDBE)
Some DDL chores are easier in access than enterprise manager
Gives a MS-Access front end to SQL6.5
TSQL is a more expressive SQL dialect than JET SQL

Cons
Hard to use VBScript functions, VB functions can’t be called from inside a stored procedure.
Impossible to figure out how to change connection timeout
Impossible to figure out how to globally turn off the ‘rowcount 10000′ feature.
On account of the previous 2 problems I run almost all of my queries in Query Analyzer
Requires extra step to graphically update joined tables
Screws up the formatting of TSQL– default formatting isn’t very readable
Can’t graphically view common constructs like CASE/WHEN/THEN
Defaults to creating a table valued function instead of a stored procedure
Have to split data accross lots of tables if you are using MSDE
Heterogenous joins between local and linked tables have lousy performance (as compared to joining local tables to linked tables in an Access .mdb)
Linked tables have cumbersome names, like SERVER_DATABASE_DBO_TABLENAME, which is a pain when the “SERVER_DATABASE_DBO_” part never changes! Why couldn’t they have used a shorter alias?
Can’t see date/time a table or procedure was created (although the data can be seen in enterprise manager!)

Tricks
To graphically update the join of 2 tables, create a view and then update the view.

Visual Programming

Visual programming includes DTS, all GUI and form builders, SQL query (also called query by example or QBE).

It is faster and easier, especially for forms and SQL queries with lots of joins.

1) Code off the screen holds hard to find bugs.
2) Exact positioning is a productivity killer (automatic layout tends to make a mess)

W3 Standards

If you want to use XML & XHTML forms you will need to know:

The XML standard– XML repurposes & cleans up HTML so it can be use for arbitrary knowledge domains. No prerequisites, but to use it well, you need to set up DTDs & Schemas.
XML Schema standard– Defines what a document should look like and what the data types are.
The XSLT standard– Allows you to transform XML into different XML or into non-XML like fixed width files or HTML. Requires knowing the XPath standard
XPath standard– This is like SQL WHERE clauses, except for the data is in XML. Takes getting used to because unlike SQL WHERE clauses, the data being queried is heirarchial.
The XHTML standard– This is the set of rules to be applied to a HTML standard document so that it is tidy enough to be used as XML. It requires knowing HTML4.01
HTML4.01– This standard is huge– 380+ pages.

XSL Pains

Usually you want to format either the text inside of a set of elements or the value of an attribute in a set of elements (usually a repeating element). This poses two problems, what is the best way to identify the the set of elements and what indicates which values you want.

  1. To get the data, try : value-of select="." or try value-of select="text()" or for an attribute you use value-of select="@attributename"
    I haven’t figured out how to get the data inside an attribute value that was indicated by the XPath expression (the way value-of select=”.” works)

  2. To point at the data you want use an XPath express. First try just the bare tag name template match="myTag"
    If that doesn’t work, try absolute naming template match="/root/row/myTag"
    If that doesn’t work try relative naming, template match="row/myTag"
    If that doesn’t work try wildcard naming, template match="row/*"

  3. For some reason, I can’t get /@* to work. (Gee that expression kinda looks like a comic book curseword)

SQL Server & XML Forms

If you are in a hurry and can’t build heavyweight rich user interface forms, (like VB6.0 forms) then you can uses a combination of XSL, XHMTL, and update grams.

Tricky parts:

  1. Set up the virtual directories per BOL. Try :

    .../myVirtualDirectory?sql=SELECT * FROM myTable&root=sometagname

    If your leave out the root part, you won’t get a valid XML document back.

  2. To get the xml back transformed, you have to put it in a ‘updategram’ with the following in the root tag:

    xmlns:sql='urn:schemas-microsoft-com:xml-sql'
    sql:xsl='myStyleSheet.xsl'

  3. You have to put call your update grams like this:
    myupdategram.xml?contenttype=text/html
    If you don’t the browser ignores the META tag & treats the document as XML.

  4. The XML document you get back from a successful update is blank. If not, you get a processing instruction- which is hard to transform into a pretty XHTML document. This kind of match will show the error message, but not clean it up.

    <xsl:template match='processing-instruction()'>
    <xsl:value-of select="."/>

More Thoughts on Using XML

One use for XML is for config files– but they are plaintext and thus a bad place to put passwords. The XML file can be encrypted, but I have discovered, config files are useful because they are easy to edit (as compared to registry, which has the world’s scaries warnings, or DB which require a SQL statement to update). Encrypting the file makes them a real pain to edit. Create two config files, encrypt the one with passwords, put everything else into the other config file.

XML Things I’ve learned

  1. If you are using an XML api, no problem, you’re XML documents will be valid.
  2. If you are using string concatenation, you’ll hit these problems: XML data needs to be escaped for the & MS Access has a XML import which works nicely, but it will look like the ‘tables’ it finds in your XML file are not available for import because they are greyed out. This is actually means they are available for import, but you have to import all of them or none of them and you can’t pick and choose
  3. MS Excel will open XML files, but it picks rather awful column names, made up of the full path (\\root\myxml\mytag) MS Access will use just the tag name for a column name. Access preserves the white space from CDATA stuff.
  4. Creating a fixed width XML layout requires: a two line header, a one line footer and alternating columns holding constants that could be tricky for the other programmer to fill in. Also, one must assume that the other programmer is using unescaped ASCII. For example:
    (This looks weird because I’m too lazy to manually escape my greater thans and less thans for blogger to display them correctly)

    ^?xml version=”1.0″^
    ^table^
    ^row^^col1^^![[CDATA ]]^^/col1^^col2^^![[CDATA ]]^^/col2^^/row^
    ^row^^col1^^![[CDATA ]]^^/col1^^col2^^![[CDATA ]]^^/col2^^/row^
    ^/table^

    From this we can see that the XML file can be created by concatinating

    tag + data + tag + data, etc.

    The programmer creating the interface file need not give up the beloved fixed with layout and need only include some cryptic & variable separators inbetween the data.