Learning about Medium Trust

Remove High Powered APIs
No WMI calls, no SMO calls. The whole point of CAS is that you don't trust your users not to figure out how to call a potentially dangerous API like WMI or SMO, so you mark the application as untrusted. This makes it impossible for you (or anyone else) to upload code to your website that calls these potentially dangerous APIs.

WebPermission

My site was using a component that pulled an RSS feed from a third-party site and merged it into my page. That blew up on Medium trust, but started working again when I added:

<trust level="Medium" originUrl="http://del\.icio\.us/.*"/>

Presumably more URLs could be added to the list using the regex operator | (meaning or), e.g.

<trust level="Medium" originUrl="http://del\.icio\.us/.*|http://www\.yahoo\.com/.*"/>

Custom web.config Sections
My blowery HTTP compression component blew up on Medium trust, but worked again when I added requirePermission="false" to the appropriate place under <configSections>:

<configSections>
  <sectionGroup name="blowery.web">
    <section name="httpCompress"
      type="blowery.Web.HttpCompress.SectionHandler, blowery.Web.HttpCompress"
      requirePermission="false"
    />
  </sectionGroup>
</configSections>

Under Medium trust, you can't read most custom web.config sections directly; marking the section with requirePermission="false" as above is what makes it readable again.
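
For what it's worth, once requirePermission="false" is in place, an ordinary GetSection call works again under Medium trust. A minimal sketch (the return type is left as Object here, since it depends on the section handler):

    ' Read the blowery section registered above; works under Medium trust
    ' because the section no longer demands ConfigurationPermission.
    Dim compressSettings As Object = _
        System.Web.Configuration.WebConfigurationManager.GetSection("blowery.web/httpCompress")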

Response.Flush/Response.Close yields a SecurityPermission error
Replace with Response.End and hope for the best.

Learning about ASP.NET output caching

How it works
You add a directive, something like

<%@ OutputCache Duration="3600" VaryByParam="none" %>

The page then puts the HTML output of the page or user control into the Cache object. The next time a request comes in, the rendering engine loads a cached control instead, something like an HTML literal that gets its data from the cache.

To test your caching strategy, load your pages twice.

Gotcha: When the control comes from cache, you can't reference that control.
So all the code-behind that references the control needs to check whether the control exists (i.e. whether it is null/Nothing).
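
A minimal sketch of that check (MyCachedUC1 and its HeaderText property are hypothetical names for a user control declared on the page with an OutputCache directive):

    ' MyCachedUC1 is Nothing on requests where its output was served from cache
    If MyCachedUC1 IsNot Nothing Then
        MyCachedUC1.HeaderText = "Hello"
    End If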

Gotcha: When you dynamically load a control, it isn't the same data type if it came from cache.

myUC = LoadControl("~/MyUC.ascx")

If the control is cached, what you get back isn't of type MyUC; it is a PartialCachingControl wrapper.
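
If you need the typed instance, you have to go through the wrapper. A minimal sketch, assuming MyUC is the code-behind class of MyUC.ascx:

    Dim wrapper As PartialCachingControl = TryCast(LoadControl("~/MyUC.ascx"), PartialCachingControl)
    Dim myUC As MyUC = Nothing
    If wrapper IsNot Nothing Then
        ' CachedControl is Nothing when the output was served from cache
        myUC = TryCast(wrapper.CachedControl, MyUC)
    End If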

Gotcha: Code-behind in the user control doesn't run when the user control is loaded from cache. For example, if your control needs to register a JavaScript or CSS file (which requires modifying the parent page's <head> tag), that code won't run.
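
For instance, a Page_Load like this in the user control's code-behind never fires on cached requests, so the include is never registered (the script path and key here are hypothetical):

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs) Handles Me.Load
        ' Skipped entirely when the control's output comes from the output cache
        Page.ClientScript.RegisterClientScriptInclude("widgetScript", ResolveUrl("~/scripts/widget.js"))
    End Sub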

Gotcha: Tilde URLs stop working.
If you reference an image with something like:

    ImageUrl="~/images/mypic.png"

the first time this is rendered, it will calculate the correct relative path. If the cached output is then served to a request from, say, a subfolder, it still uses the relative path that was correct for the page that rendered it first.
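
One workaround (my own assumption, not something I have tested everywhere) is to use a root-relative path, so the cached HTML stays valid no matter which folder the later request comes from. This assumes the application is installed at the site root:

    ImageUrl="/images/mypic.png"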

Gotcha: Programmatically clearing the cache requires adding a dependency to EVERY page and user control.

(I think changing web.config will also clear the cache, but that is heavy-handed.)

General pattern for a cache-clearing system (ref. http://aspalliance.com/668):
    Create a dependency object on application start.
        HttpContext.Current.Cache.Insert("Pages", DateTime.Now, Nothing, _
        System.DateTime.MaxValue, System.TimeSpan.Zero, _
        System.Web.Caching.CacheItemPriority.NotRemovable, Nothing)

    Make all pages inherit from a custom page class, e.g.

        Public Class SmartPage
            Inherits System.Web.UI.Page
        End Class

    In that class's Load event, add this:
        Response.AddCacheItemDependency("Pages")
      
    On an admin page, put this behind a button:
       HttpContext.Current.Cache.Insert("Pages", DateTime.Now, Nothing, _
            System.DateTime.MaxValue, System.TimeSpan.Zero, _
            System.Web.Caching.CacheItemPriority.NotRemovable, _
            Nothing)

When to Use
So far, given these gotchas, output caching is best for:

Simple pages (easy to see what the dependencies are)
Almost-static pages (especially if the dynamic parts are derived from slowly changing files, like web.config)
Completely static pages, such as document-oriented pages

When the page is complex, you may need to switch to data caching, where you cache smaller objects, like a DataSet that holds a state list. Caching at the user control or page level might cause some of the gotchas I've mentioned, especially if the state list is on an otherwise highly dynamic page.
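
A minimal data-caching sketch of that state-list case (GetStateListFromDatabase and ddlStates are hypothetical names, and the one-hour expiration is arbitrary):

    ' Cache the state list once, then reuse it on later requests
    Dim states As DataTable = TryCast(Cache("StateList"), DataTable)
    If states Is Nothing Then
        states = GetStateListFromDatabase()
        Cache.Insert("StateList", states, Nothing, _
            DateTime.Now.AddHours(1), System.Web.Caching.Cache.NoSlidingExpiration)
    End If
    ddlStates.DataSource = states
    ddlStates.DataBind()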

License Mixing from a CDDL standpoint

Wow, I can't believe how long this document got. I just wanted to be able to get $25 from ordinary people who use my software and can afford to pay, to get code contributions from developers who want to use my software but don't want to pay, and to not give my code away to people who could then sell it for $24.50 and discourage anyone from buying from me. And I don't want anyone suing me because they relied on my product and suffered some dread fate.

So far dual licensing with CDDL looks the best.

Contributing new code is okay under the CDDL license; just follow the relevant rules to hold harmless, etc.

But what if I find a cool library, API, code snippet, DLL, control, JAR, etc that someone has released for free on the web? Can it be used with the open source project I’m currently working on?

Often…no!

Viral licenses such as the GPL and CDDL require careful consideration of what code you mix in with existing code. In fact, a viral license may prevent changing an existing code base from proprietary to open, depending on what licenses are already mixed in. The FSF has a long list of GPL-incompatible licenses.

For example, the Free Software Foundation says that CDDL and GPL are incompatible:

Common Development and Distribution License (CDDL)

This is a free software license. It has a copyleft with a scope that’s similar to the one in the Mozilla Public License, which makes it incompatible with the GNU GPL. This means a module covered by the GPL and a module covered by the CDDL cannot legally be linked together. We urge you not to use the CDDL for this reason.

Also unfortunate in the CDDL is its use of the term “intellectual property”.

[Sigh. The link to the essay on "intellectual property" is disheartening. Mr. Stallman is trying to say that property rights in physical and intangible things are somehow different. A more rigorous understanding of ownership demonstrates that ownership is a bundle of intangible rights, like the right to use, dispose of, or sell an object. The property right is in the intangible ability; the physical object is just a distraction. Stallman has not earned the standing to make a statement like "Economics operates here, as it often does, as a vehicle for unexamined assumptions." I might as well say nuclear physicists are a bunch of raving lunatics and whining ninnies, despite not knowing squat about nuclear physics.]

Or maybe license incompatibility isn't so bad!
LAMP applications are often developed as a single unit, yet they use the GPL, Apache, hybrid and PHP licenses respectively, which the FSF says are incompatible with each other.

Dual Licensing
Make different versions that are released under different licenses. So there might be a CDDL version and a GPL version. Contributors could contribute to the GPL branch and mix in GPL code. Alternatively, they could contribute to the CDDL version and take advantage of its superior terms.

Copyright and Dual Licensing
The choice to dual license can only be made by the original project's copyright holder.

Merging and Dual Licenses
You can't merge GPL improvements back into the dual-licensed branch of code. This is a strange restriction, because when the GPL is violated, the party that has standing is the copyright holder! Unless the chosen open source license gives users and contributors standing, I don't see how they could stop a copyright holder from periodically merging the proprietary and open source branches, unless the contributor retains rights to the contribution.

If the contributor is retaining rights to the contribution, then he could take action against merging.  If the license says that contributions are automatically granted to the original copyright holder, then merging is OK.

If the contributor didn't have the rights to his contribution in the first place (say, the contributor stole some proprietary code), then merging could pose a risk to the original copyright holder trying to sell under a proprietary license.

Trade Marks and Dual Licenses
You might want to give the open source and closed source products different names, since a trademark is a signal to the market that one company has done something to enforce quality. An open source product uses different means to create quality.

Forking and Dual Licensing
Because only the original copyright holder can release a proprietary version, forks don't normally pose a competitive risk.

Purism and Dual Licensing
Purism is a sign of obsessive-compulsive personality disorder being projected onto software licensing. Just because some GPL users have OCPD doesn't mean non-OCPD developers should strive for a perfect 'all GPL' world.

What about LGPL?
The LGPL specifically talks about allowing code mixing with dynamically linked libraries, a specific technology for mixing code. The CDDL allows for code mixing at the file level, which permits mixing licenses whether the code is statically or dynamically linked. (This may be a red herring, since some versions of the LGPL say "static or dynamic linking".)

Incentives LGPL and Dual Licensing
If I am a commercial developer and want to sell some code, I will look for open source licenses that allow me to mix free code with my proprietary code.  If I take an LGPL licensed work and put a thin layer of my code over the top, I can start selling it.  This would compete with the original copyright holder’s commercial version.

On the other hand, if I am a commercial developer, I might use the open source version as a trial version, and then buy the commercial license if I didn’t want to deal with the viral features of GPL or CDDL.

Viral to the Modifications, Viral to the Derived Works

Under the CDDL, you only need to make your modifications to the CDDL work open source. If you compose CDDL code with your own, statically or dynamically (whatever that may mean), the derived work doesn't have to be made open source.

Can you change your mind?
You can change your mind with a proprietary license, but many open source licenses are forever. Also, these licenses are long-lived and may be based on a license whose official version can change! So at the very least, you might want to pick a license that has something to say about versioning, or that doesn't refer back to the "latest and greatest" version that some foundation has tweaked.


Review: Pervasive Data Integration

I was about to review this product on the basis of their customer service, but fortunately I got a generous 48-hour trial license. It's ironic how Microsoft trusts developers to play with the full feature set of SQL Server Enterprise Edition (a product that costs upwards of $35,000) through either a three-month trial or the Developer Edition. Pervasive Data Integration costs between $10,000 and $20,000, and you don't get a database to go with it. Rumors exist about a 14-day trial license.

The tool supports a remarkably long list of data sources. If you happen to be in an organization where you have to put up with 100+ native data formats, this might be exciting. In my professional experience, this is not a good or typical integration pattern. More typically, organizations dump their native data formats to various text formats before exchanging with other organizations. Native and binary formats are too brittle (subject to breaking over time due to technological or other changes). Text, especially fixed-width layouts, was the universal data exchange format before there was XML or the like. That said, native formats tend to carry more metadata, so there is less risk of data corruption as the data goes from native to text during inter-organizational exchanges.

Also, FYI, if you do have to deal with a native data format, you might be better off using ODBC, JDBC, ADO, OLE DB or the like. Interacting with native data formats directly makes sense for applications where it is important to access specific features of the source data platform (like running PL/SQL or accessing indexes), or for performance reasons. If you have extremely high performance requirements for ETL, then ironically, you will probably end up working a lot with text.

The GUI is Java. So far I've only gotten a few JVM error messages. After more than a decade, Java apps still don't deploy very smoothly; if I had a nickel for every time I got a JVM version error, I'd be rich.

The Data Integrator uses a system of workspaces, repositories and thingies. In practice, this means the source code files are stored in an XML database layered over the file system. I got numerous error messages attempting to save files. Apparently it is not good enough to save a file; it needs to go into this XML filesystem layer.

The integration engine itself has no UI at all. The various designers have a UI, but feedback is mostly sent to log files. The choice to use error logs instead of message boxes for the design-time experience is bemusing, and echoes what I think was one of the backwards steps MS took going from DTS classic to SSIS (that is, moving feedback from the UI to error logs).

I haven't figured out how the integration engine works yet, so it's hard to say whether it uses buffers like SSIS, individual objects like DTS, or some other as-yet-undiscovered pattern.

The mapping tool is not intuitive.  The message box you get on first open is a strongly worded exhortation to study the documentation, i.e. they know the mapping tool is unintelligible.  I probably will not be able to grok this before the trial license expires.

Pervasive has fallen for the Wasabi pattern, that is, inventing a programming language for a single application. Pervasive uses RIFL, which is supposed to be some sort of VBScript rip-off. Why they didn't expose a COM interface to their API and just use real VBScript, I don't know. (Or any one of a bunch of other scripting languages with broad industry adoption; I mean, Lua, JavaScript, you name it.)


Pervasive Data Integrator: Title pending…

[Update]

I finally found Fernando Lambastida's blog. I called him; he apparently is associated with Pervasive. I told him who I was: a lowly techie in a big software consulting company that needs to evaluate/work with/play with some ETL bits. I said that I saw the offer for 14-day 'free demo software' on his blog, and he offered to help me work with Pervasive to get a trial key. If your calls to Pervasive are channeling you through the sales sewer pipeline, try Fernando instead; he's a good chap.

[Okay back to what had transpired before]
So I got a job interview coming up. They want a Pervasive ETL expert. Pervasive is a relatively uncommon ETL package in a fairly narrow vertical market: ETL. So I do what seems reasonable, I download the free evaluation trial. I install it, read the documentation. I almost grok it, so I try to open a designer. The designer asks me for a license key. Hmm. I check the email. Yup, it says I have to contact them to get a key. Website says so, too. Trial keys require a phone call to Pervasive. Okay. I’m a developer, I understand that companies need to defend their IP. So I call.

I call the company and eventually get transferred to a Chris, who said that if I was interested in pitching my skills to a company that had a copy of Pervasive, I had to get a license key from that company or that company's sales representative!!! This is very weird; he can't possibly mean I need to illegally obtain a license key from an existing customer, can he? And Chris says that under no circumstances can I be given a key.

So I call them back and ask not to be transferred to Chris, and now I get a Howard. Howard likewise said no, no demo key.

So if anything, Pervasive‘s website is highly misleading about the availability of an evaluation version.

If you came to this post researching Pervasive, I recommend that you look into something else. There are too many choices available, from ad hoc solutions to RhinoETL, Jasper ETL, DTS, SSIS, Oracle Data Integrator and Informatica; why settle for dicey customer service?

Data is the lifeblood of an organization. If the data fails to flow, you're out of a job and the company is losing money.

Buying tools from companies that are actively hostile to developers is like loading a metaphorical gun and pointing it at your company's head.

Redirecting SNAFUs in global.asax

It seemed so simple: check to see if the database is set up, and if not, redirect to an installation page. I wanted the check to be done at session start instead of on every page, to reduce database queries. Ideally, I only want this check to happen once per web.config modification, but I don't know how to hook into that event.

First I put this into global.asax at Session_Start:
Response.Redirect("~/install/installdb.aspx")

I get a permissions error in Medium trust, but not in Full trust.

Then I try
Server.TransferRequest("~/install/installdb.aspx")

And I get:
"This operation requires IIS integrated pipeline mode."

Apparently IIS5 and Cassini can’t deal with TransferRequest.

Then I try
Server.Transfer("~/install/installdb.aspx")

Finally, that works.
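
Putting it together, the Session_Start I ended up with looks roughly like this (DatabaseIsSetUp is a hypothetical helper that does the actual check):

    Sub Session_Start(ByVal sender As Object, ByVal e As EventArgs)
        If Not DatabaseIsSetUp() Then
            ' Response.Redirect blows up under Medium trust, and Server.TransferRequest
            ' needs the IIS integrated pipeline, so Server.Transfer it is.
            Server.Transfer("~/install/installdb.aspx")
        End If
    End Sub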


How hard is it to make SQL Express easy?

Goal. Make it easy for an end user, say with half as many brain cells as a mollusk, to install a DB-driven website.

Particulars. A typical database setup requires creating the database and setting up the logins and users for the anonymous web user, the ASPNET web user, and maybe an application user or role. The user then needs to run a TSQL script to install objects. Finally, the user needs to set the connection string in the web.config.

Solution so far. If the user has an existing database, user and connection string and can put it in the web.config, I can run the TSQL scripts for him and even detect and create the application user.

Speed Bump. Wouldn't it be easier if I created everything in advance, put it into App_Data, and connected to it using a user instance? Well, so one would think. It would mean you could use an mdf file without knowing the server name, the credentials, or the file location (except that it is in the usual App_Data folder).

Here is the magical connection string:

Data Source=.\SQLExpress;Integrated Security=True;User Instance=True;AttachDBFilename=|DataDirectory|calendar.mdf

Here are the magic error messages (short list, I forgot to copy some of them down)

Invalid value for key 'attachdbfilename'.

Failed to generate a user instance of SQL Server due to a failure in starting the process for the user instance. The connection will be closed.

"No connection could be made because the target machine actively refused it"

Unable to open the physical file "C:\Inetpub\foo\App_Data\aspnetdb.mdf"

Random things tried:
- Don't create the mdf file using the "Add New Item" menu in Visual Studio (create the DB the old-fashioned way with Management Studio)
- Change .\SQLExpress to the actual server name, e.g. MyBox\SQLExpress
- Change |DataDirectory| to the actual physical directory
- Grant rights to NETWORK SERVICE, LOCAL SERVICE, and MyBox\ASPNET to modify files in the App_Data folder (preferably only to the account the anonymous user is running as, not all three of them)
- Switch from IIS to the ASP.NET Development Server (or the other way around)
- Delete the files found at C:\Documents and Settings\[some user name]\Local Settings\Application Data\Microsoft\Microsoft SQL Server Data\SQLEXPRESS (this folder holds the various databases that SQL Express creates when it creates a user instance)
- Don't use Remote Desktop. When you run a user instance across Remote Desktop, it is hard to guess which user profile the various system databases will be written to.

Advice: User Instance, just say "=false".
User instances are bad. Bad bad bad. They might be okay in a Windows application that you are running on a single disconnected machine in a salt mine a mile below ground.

AttachDBFilename
This is only going to work if you have administrator rights. You typically will not give your anonymous account admin rights to the database. So right away, we can see Integrated Security=True and AttachDBFilename=... do not go together... unless you have Windows authentication set up properly. A brain-damaged mollusk doesn't know how to set up Windows authentication, less so on a hosted account where user admin tools are often crippled and incomplete (I'm thinking of the lunarpages control panel here).

Furthermore, |DataDirectory| doesn’t always resolve.

So what is left? We have a file that isn’t attached, that ADO can’t find, and we need a priori information about the SQL instance name and a priori information about the user ID, password, and database name. Sigh. Thanks Microsoft. Not a single break.

Final Solution
Half-brained mollusks will have to use conventions. First, assume the server name is "localhost"; second, the user will have to find out what the credentials are and hope they have dbo rights. Finally, the user will have to know what the database name is.
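
So the conventional connection string ends up looking something like this (the server, database name and credentials below are placeholders the user still has to fill in):

    Data Source=localhost;Initial Catalog=calendar;User ID=calendar_user;Password=ChangeMe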

Worse, the user will have to be able to edit the web.config file. I’m now going to work on a web.config generating page, so the user can enter the five magic words and get a web.config file generated for him.


Evolution of an Idea

For a few years now, I've been working on creating a social website to fix the things I didn't like about meetup.com. Here are some of my early missteps:

Using things as the manual intended
- WordPress for blogging: very successful, my most trafficked websites, although as a community I mostly get drive-by commenters.

Even if you install it, no one will come
- phpList mailing lists: useless unless you are some sort of organization that sends out announcements to a large user list.
- Forums (phpBB): these are useless until a stampede of users shows up.
- WordPress as an event manager (well, it supported user accounts, comments as RSVPs, posts as events, and categories as interest groups): utter failure. I couldn't figure out how to market it.
- MS Word documents (ugh; pre-2007 this was an exercise in futility for anything except trivial page authoring): utter waste of time, attracted no traffic to speak of.

Over-ambitious Custom Development: FilmClans, movie club software
Version 1: almost finished before my attention wandered.
Version 2: tried to port it to C# and set up an InfoCard logon. Failure on both counts.

There were probably a few too many features to pull off FilmClans in the time and attention available.

Small Enough to Succeed: Toki Pona Dictionary and Search Engine
First successful website (meaning feature-complete and providing value to at least one person).

Hardly any users, but it is a nice social website nonetheless, and it became my favorite dictionary and search engine for toki pona.

Social Animals DC
This is a calendar aggregator. I manually track down events from many organizations and merge them into a single calendar. The jury is still out on how successful it will be for other people, but I use it to keep track of events I would like to go to.

Lessons Learned

  • One-size-fits-all solutions to social websites don't work (e.g. using a blog for in-person events)
  • Pick a website design and feature set that is useful with just one person (or else it won't get off the ground, except in special cases, like when you've got a pre-existing audience)
  • Leverage existing technologies (e.g. using RSS for calendar feeds)
  • Be a barnacle (start thinking about using other websites' APIs early on, e.g. using delicious to drive my links page for Social Animals DC)
