The standard advice on new open source projects

The Linux Advice
Work on a project alone until it becomes so big and successful that other developers beat a path to your door.

This isn’t really very useful advice. It implies some extreme assumptions: no one will help on a small project, and only really big, successful one-man projects can attract community contributions.

Code yard sales
I first heard the phrase from Hanselman: “Here’s some junk code, have fun.” Because the original committer is so … uncommitted, the project isn’t likely to go far.

This model has its place, but it doesn’t help if you want a medium-sized open source project to grow and have some modest success.

Face it: the odds are low that your project will become as successful as Linux, and it’s unnecessarily lazy to give away your starting code base and abandon it.

WordPress-style Plug-in Model
If the application is designed from the get-go for extensibility in a variety of ways, from basic techniques like abstract classes and interfaces to a full-blown plug-in architecture, then people can contribute in an uncoordinated fashion, perhaps without even committing to the main trunk.

The core of the application is a component host. The lead developer’s first job is to provide the glue for the parts; the particular functions are secondary. I think this is counterintuitive, but essential. If I have an idea for a great blog engine and I write the blog engine first, I will have to retrofit it for plug-ins. If I write an application host first, albeit one optimized for plug-ins, then it will be a very different application.
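A minimal sketch of the "application host first" idea, written in Python rather than in any particular blog engine's language. The host knows nothing about blogging. It only defines an extension point (the hypothetical `Plugin` base class) and a registry, and every feature arrives as a plug-in. All names here are illustrative assumptions, not WordPress's actual API.

```python
from abc import ABC, abstractmethod


class Plugin(ABC):
    """Hypothetical extension point: every feature implements this interface."""

    name: str = "unnamed"

    @abstractmethod
    def handle(self, event: str, payload: dict) -> dict:
        """Return a (possibly modified) payload for the given event."""


class Host:
    """The component host: it only wires plug-ins together."""

    def __init__(self) -> None:
        self._plugins: list[Plugin] = []

    def register(self, plugin: Plugin) -> None:
        self._plugins.append(plugin)

    def fire(self, event: str, payload: dict) -> dict:
        # Pass the payload through each plug-in in registration order.
        for plugin in self._plugins:
            payload = plugin.handle(event, payload)
        return payload


class TitleCase(Plugin):
    name = "title-case"

    def handle(self, event: str, payload: dict) -> dict:
        if event == "before_publish":
            payload = {**payload, "title": payload["title"].title()}
        return payload


class WordCount(Plugin):
    name = "word-count"

    def handle(self, event: str, payload: dict) -> dict:
        if event == "before_publish":
            payload = {**payload, "words": len(payload["body"].split())}
        return payload


host = Host()
host.register(TitleCase())
host.register(WordCount())
post = host.fire("before_publish", {"title": "hello plug-ins", "body": "one two three"})
print(post)
```

Neither plug-in knows about the other, so two contributors could write them without coordinating. That independence is the point of the model.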

Pervasive Data Integrator: Title pending…


I finally found Fernando Lambastida’s blog. I called him; he is apparently associated with Pervasive. I told him who I was: a lowly techie at a big software consulting company who needs to evaluate/work with/play with some ETL bits. I said that I had seen the offer for a 14-day ‘free demo software’ on his blog, and he offered to help me work with Pervasive to get a trial key. If your calls to Pervasive are channeling you through the sales sewer pipeline, try Fernando instead. He’s a good chap.

[Okay, back to what had transpired before]
So I have a job interview coming up. They want a Pervasive ETL expert. Pervasive is a relatively uncommon package in a fairly narrow vertical market: ETL. So I do what seems reasonable: I download the free evaluation trial, install it, and read the documentation. I almost grok it, so I try to open a designer. The designer asks me for a license key. Hmm. I check the email. Yup, it says I have to contact them to get a key. The website says so, too. Trial keys require a phone call to Pervasive. Okay. I’m a developer, and I understand that companies need to defend their IP. So I call.

I call the company and eventually get transferred to a Chris, who said that if I was interested in pitching my skills to a company that had a copy of Pervasive, I had to get a license key from that company or from that company’s sales representative!!! This is very weird. He can’t possibly mean I need to illegally obtain a license key from an existing customer, can he? And Chris said that under no circumstances could I be given a key.

So I call them back, ask not to be transferred to Chris, and this time get a Howard. Howard likewise said no: no demo key.

So if anything, Pervasive’s website is highly misleading about the availability of an evaluation version.

If you came to this post researching Pervasive, I recommend that you look into something else. There are too many choices available: ad hoc code, RhinoETL, Jasper ETL, DTS, SSIS, Oracle DataIntegrator, Informatica. Why settle for dicey customer service?

Data is the lifeblood of an organization. If the data fails to flow, you’re out of a job and the company will be losing money.

Buying tools from companies that are actively hostile to developers is like loading a metaphorical gun and pointing it at your company’s head.

Webcast Notes: Top 5 Challenges Faced by Software Development Teams…

Recommended listening speed: 1.25

Top five software challenges:

- Invisible progress
- Inefficiency (manual tasks, expensive communication)
- Too many *unintegrated* projects (stovepipes?)
- No process (I’d add: wrong process, too much process, fake process, dysfunctional process)
- Not enough $$

What is invisible progress?

Lines of code (LOC) failed as a measure of software. Metrics are tricky to collect. Customers and managers become unhappy and unrealistic. This is related to the *estimation* problem: if you can’t measure progress, you probably can’t estimate either. The shift away from code that can be measured with LOC makes it that much harder to measure progress.

Tests for invisible progress: How much progress have we made? (software measurement) How much more progress can we make with our current resources? (software estimation)
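The two test questions correspond to two small calculations. The following is a minimal sketch under two assumptions. First, progress is counted in completed work items; this is a hypothetical unit, and story points or tasks would work the same way. Second, recent throughput predicts future throughput, which is a big assumption.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical weekly record: how many work items exist and how many are done."""
    week: int
    total_items: int
    done_items: int


def percent_complete(s: Snapshot) -> float:
    """Measurement: how much progress have we made?"""
    return 100.0 * s.done_items / s.total_items


def weeks_remaining(history: list[Snapshot]) -> float:
    """Estimation: how long to finish at the average weekly throughput so far?"""
    first, last = history[0], history[-1]
    elapsed = last.week - first.week
    if elapsed <= 0:
        raise ValueError("need at least two snapshots from different weeks")
    throughput = (last.done_items - first.done_items) / elapsed
    if throughput <= 0:
        return float("inf")  # no measurable progress, so no estimate is possible
    return (last.total_items - last.done_items) / throughput


history = [Snapshot(1, 40, 4), Snapshot(2, 42, 10), Snapshot(3, 45, 16)]
print(f"{percent_complete(history[-1]):.0f}% done")
print(f"~{weeks_remaining(history):.1f} weeks left at current throughput")
```

Note that the total keeps growing from week to week. If you track only completed items, you never see the scope creep, which is one reason progress looks invisible.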

What are Inefficiencies?

SDLC: the SDLC itself is what is inefficient (primary and secondary activities, e.g. analysis, coding, source control, builds). Bottlenecks here are more technical or physical, e.g. builds take a long time, debugging is slow.
Social bottlenecks, e.g. inter-team communication, interpersonal communication, customer-team communication.

Communication tools: in person, phone, IM, email, memos (Word documents), PowerPoint shows. All of these are unstructured communication.

What are disparate projects?

In particular, different tools: as you move through the SDLC, you make a lot of context switches. For example, here is an SDLC with many tool context switches: IDE –> Source Control (Subversion) –> Test Environment –> Communication Env (Word/Email) –> Reporting (ad hoc) –> Workflow (whiteboard)

What is a lack of discipline?

Missing, wrong or bad process.

Evidence of discipline:
- formally defined workflows
- formally defined responsibilities (specialists, not a collection of one-man teams)
- accommodating different skills (i.e. explicitly considering that there is a significant learning curve, different skill levels, etc.), i.e. training and a progressive increase in duties
- someone collecting metrics
- someone doing estimates and collecting data on estimates: metrics + predictions
- an improvement feedback loop (formally identify problems and solutions)

Problems lack of money causes:

Fewer people, fewer tools, less formal training, fewer outside experts. Can make existing problems worse.

However… some problems can’t be solved with more money. Money is a necessary but not sufficient resource.

Challenges in curing this: organizations don’t spend money on IT because it is peripheral to the organization’s main mission (running a government, selling stuff, etc.)

Generic Solutions
Invisible progress: Collect metrics! Publicize them! Pay the cost of collecting metrics.
Inefficiencies: Formalize an improvement process! (identify & solve, duh)
Disparate tools: Pick integrated tools (prefer IDE plug-ins over separate tools).
Lack of discipline: Adopt some sort of software process. This works best if there is an advocate for process on the team, preferably the project manager.
Small budget: Solve the other problems first. (spend carefully, duh)

MS Solutions
Visual Studio: plug-ins; VSTS has metrics and formalized communication (bug tracker, feature tracker, tracker = workflow, reports, even data-warehouse-style reports)
Editions encourage specialization (implies about 4 kinds of developers on a team)

MSDN = 5-user license for Team Server + 1 dev environment, ~$2,500/yr (not clear about the other 4 developers’ licenses; presumably a dev environment must be purchased separately for each)
Team System Server: price not clear (presumably $$ if more than 5 users, plus per-user costs)
Dev Env not targeting Team System = $0 – $800
Dev Env targeting Team System = $5,500 -> $11,000 (!!) each!

Despite the sticker shock compared with, say, an MSDN subscription, this may be cheap compared with Rational or other IDEs meant for huge programming teams (50+ developers).

Conclusion: Team System is for big companies. Everything up to VS200X Pro is for small teams. If you use VS200X Pro, you will still need affordable source control, collaboration tools, etc. You can use CodePlex if the project is open source (it is essentially free Team System hosting).

Getting Subversion to Work: Tips

Apache Server. Don’t change versions. Changing versions is a super-duper-ultra-advanced user scenario; mere super geniuses shouldn’t attempt it. Use the version that comes with the CollabNet Subversion installer. (The blank URL warning the installer gives you means it doesn’t want http:// in the field.) In fact, this is probably a good idea for all LAMP stack applications: it’s better to have a dozen LAMPs that work than a dozen broken apps sharing the same LAMP.

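The Passwords. Create the password file with htpasswd. The -c flag creates a new file and overwrites any existing one, so leave it off when adding more users. The -s flag stores a SHA-1 hash of the password.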

htpasswd.exe -cs passwords.txt matthew
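For the curious, the `-s` flag stores `{SHA}` followed by the Base64-encoded SHA-1 digest of the password. The minimal Python sketch below builds the same kind of line; the user name and password are placeholders. SHA-1 is weak by modern standards, so treat this as an explanation of the file format rather than a recommendation. Apache 2.4 also offers bcrypt via `htpasswd -B`.

```python
import base64
import hashlib


def htpasswd_sha_line(user: str, password: str) -> str:
    """Build an Apache htpasswd entry in the {SHA} format written by `htpasswd -s`."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return f"{user}:{{SHA}}" + base64.b64encode(digest).decode("ascii")


print(htpasswd_sha_line("matthew", "password"))
```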

The Location. The <Location> block in the Apache configuration should look something like this. Ideally you want this served over HTTPS, but getting Apache to run HTTPS is a pain.

<Location /svn>
  DAV svn
  # Forward slashes are the safe choice for Windows paths in Apache config
  SVNParentPath c:/svn

  AuthType Basic
  AuthName "Vienna Subversion Repository"
  AuthUserFile "C:/Program Files/CollabNet Subversion Server/httpd/bin/passwords.txt"

  Require valid-user
</Location>

SSL. SSL is fairly hard. Your first goal will be to get SSL working with any certificate at all. The next goal will be to get the certificate to match the URL, expiration date, and so on. Even if you get rid of the expiration and URL errors, you will be dogged by certificate authority warnings until you pay for a certificate signed by a recognized certificate authority, which isn’t worth it for this setup.

Checking Out/Checking In Code. Use a combination of AnkhSVN and TortoiseSVN.

Wow! I have to say I’m impressed with Microsoft

In the good ole days of COM, I didn’t have the platform or the audience to let anyone know that COM was a technology only a mother could love. It was so unusable and arcane that it was elitist.

Now when I blog, I get responses from project managers at Microsoft. Jeffrey Snover (a PM for PowerShell) replied to one of my posts containing my notes about batch files versus .NET code versus PowerShell. And now, after getting thoroughly ticked off by a few days’ worth of work and a couple of dollars spent trying to implement InfoCard/CardSpace, I got a response from Kim Cameron, the PM for InfoCard (a detailed one, even!)

Overview of Code Generation in the .NET World

Code generation is handy for common coding patterns, like the half dozen lines of code you write for a class property. This will save you minutes of labor.
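Here is a minimal sketch of that simplest case: a template that expands a property name and type into the usual backing-field-plus-property boilerplate. The generator is written in Python and uses the standard library's `string.Template`. The generated text is C#-style code, and the template itself is an illustrative assumption, not any tool's built-in snippet.

```python
from string import Template

PROPERTY_TEMPLATE = Template(
    """private $type _$field;
public $type $name
{
    get { return _$field; }
    set { _$field = value; }
}
"""
)


def generate_property(name: str, type_name: str) -> str:
    """Expand one C#-style property with a camelCase backing field."""
    field = name[0].lower() + name[1:]
    return PROPERTY_TEMPLATE.substitute(type=type_name, name=name, field=field)


print(generate_property("CustomerName", "string"))
```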

Code generation is really useful for metadata-driven programming. For example, if you have something that is a database, or looks like one, you may find yourself writing code to do the same half dozen operations with each table: fetch by primary key, insert, delete, update, undo an update, update within the context of a transaction, check string lengths, and so on. Dynamic code that figures out how to do this at run time, for example Fetch(table_name as object, pk_value as object) as object, means you are working with late-bound data types, and looking up the metadata on each call is potentially very expensive. It would be more efficient to generate code for each table and call Account.Fetch(pk_value as integer) as Account. And obviously either of these patterns is more efficient than writing by hand the dozens of lines of code it takes to fetch a row from a table and fill a business object.
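Here is a minimal sketch of the metadata-driven idea. It reads table metadata once, at generation time, and emits one typed class per table instead of resolving the schema on every `Fetch` call. Several things are assumptions. The metadata is a hard-coded stand-in for what a real generator would read from the system tables. The emitted code is Python rather than .NET. The `Account` columns and the `db.query_one` data-access call are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    type_name: str
    primary_key: bool = False


# Hypothetical metadata; a real generator would read this from the system tables.
SCHEMA = {
    "Account": [
        Column("account_id", "int", primary_key=True),
        Column("owner", "str"),
        Column("balance", "float"),
    ],
}


def generate_class(table: str, columns: list[Column]) -> str:
    """Emit source for one typed class with a fetch-by-primary-key method."""
    pk = next(c for c in columns if c.primary_key)
    params = ", ".join(f"{c.name}: {c.type_name}" for c in columns)
    names = ", ".join(c.name for c in columns)
    lines = [
        f"class {table}:",
        f"    def __init__(self, {params}):",
        *[f"        self.{c.name} = {c.name}" for c in columns],
        "",
        "    @classmethod",
        f"    def fetch(cls, db, {pk.name}: {pk.type_name}) -> '{table}':",
        # db.query_one is a hypothetical data-access call returning one row tuple.
        f"        row = db.query_one('SELECT {names} FROM {table} WHERE {pk.name} = ?', ({pk.name},))",
        "        return cls(*row)",
    ]
    return "\n".join(lines) + "\n"


source = generate_class("Account", SCHEMA["Account"])
print(source)

namespace: dict = {}
exec(source, namespace)  # the generated text is ordinary, compilable Python
Account = namespace["Account"]
```

The metadata lookup happens once, when the class is generated. After that, each `Account.fetch` call has a fixed, typed signature, which is the efficiency argument made above.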

Three big business object frameworks are CSLA, Subsonic, and .nettiers. CSLA and .nettiers are big frameworks with lots of fancy features; Subsonic is relatively simple. It is probably better to pick any of these than to try to write your own.

Metadata-driven code generation is also a type of object-relational mapping (ORM). There is an “ORM impedance mismatch” between how the world looks from the standpoint of a relational model and how it looks from an object-oriented one. Sometimes the code generation frameworks can translate a database schema into objects without a change in semantics, and sometimes they can’t. For example, .nettiers can’t deal with tables with composite foreign keys.

To get code generation to work, you may have to adapt the templates or change the schema. If you have the luxury of completely rewriting your schema, or if you don’t care what the database schema looks like, you might at this point decide to drop code generation and use nHibernate or the like instead. It takes the opposite approach: you write business objects and nHibernate decides how to save them to a database.
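For contrast, here is a minimal sketch of the object-first direction: start from a class and derive the table definition from it. It only illustrates the idea, in Python, using `dataclasses.fields`. nHibernate itself works through its own mapping files and conventions, none of which are reproduced here, and the type mapping below is an assumption.

```python
from dataclasses import dataclass, fields

# Assumed mapping from Python annotation types to SQL column types.
SQL_TYPES = {int: "INTEGER", str: "VARCHAR(255)", float: "REAL"}


@dataclass
class Customer:
    customer_id: int
    name: str
    credit_limit: float


def create_table_sql(cls, primary_key: str) -> str:
    """Derive a CREATE TABLE statement from a dataclass's fields."""
    columns = []
    for f in fields(cls):
        column = f"{f.name} {SQL_TYPES[f.type]}"
        if f.name == primary_key:
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {cls.__name__} ({', '.join(columns)})"


print(create_table_sql(Customer, primary_key="customer_id"))
```

Here the class is the source of truth and the schema follows from it, which is the reverse of the metadata-driven generator.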

Code generation can be done in any language, although JSP/ASP-like languages seem to be the preferred way to generate code; these template languages mix inline code into document templates. XSLT is an also-ran that is a bit hard to write. Microsoft itself uses CodeDom, which is like the HTML DOM, except that the document is a .NET source file. It also has a reputation for being hard to write.
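To make the CodeDom comparison concrete, here is a rough Python analogue; it is not CodeDom itself. Instead of pasting text into a template, you build a syntax tree node by node and then either print it as source or compile it directly. It uses the standard `ast` module and needs Python 3.9+ for `ast.unparse`. The example expression is a string-length check, one of the per-table operations listed earlier.

```python
import ast


def build_length_check(column: str, max_length: int) -> ast.Expression:
    """Build the tree for `len(<column>) <= <max_length>` node by node, CodeDom-style."""
    call = ast.Call(func=ast.Name(id="len", ctx=ast.Load()),
                    args=[ast.Name(id=column, ctx=ast.Load())],
                    keywords=[])
    compare = ast.Compare(left=call, ops=[ast.LtE()],
                          comparators=[ast.Constant(value=max_length)])
    return ast.fix_missing_locations(ast.Expression(body=compare))


tree = build_length_check("owner", 50)
print(ast.unparse(tree.body))                # the tree printed back as source text
check = compile(tree, "<generated>", "eval")  # ...or compiled without any text at all
print(eval(check, {"owner": "Matthew"}))
```

Seven nodes to express the one-liner `len(owner) <= 50` shows why tree-based generation has a reputation for being hard to write, even though it never produces malformed code.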

CodeSmith is very similar to ASP.NET, and the code templates in Subsonic are ASP.NET pages: ordinary aspx files with source code instead of HTML around the golden nugget (the inline code).

Code generation becomes part of the build process, so a final component of code generation is editing the NAnt or MSBuild files to include the generation step. Both are scripting languages with XML syntax. MSBuild is used by Visual Studio 2005, but NAnt has more features. MSBuild’s feature list is more competitive once you include:

Email Bankruptcy

Email is beginning to fail as a technology for me on a number of levels: it is overrun by the worst sorts of spam, it’s too easy a way to avoid dealing with people as humans, and it consumes too much mindspace. And then there is the volume: there are too many messages I want to ‘subscribe’ to and not enough filtering.

Email’s Failure on a Social Level. I recently read the book “Painfully Shy”, and while I probably am, I don’t want to be in pain or shy. So I’m going to try to declare email bankruptcy and switch to making phone calls whenever I can, especially with family and friends. I’ll also avoid email where it isn’t a very valuable social exchange anyhow, like dealing with customer service. (This doesn’t apply to my work situation, though, where 2/3 of my office work team is off site. In that case, a mix of in-person, phone, email, and IM communication is the way to go.)

Email’s Failure in defense against mal-mail. POP3 email is out of control, and worse yet, it has been taken over by purveyors of not just regular porn, but porn plus hate crimes. I think we are going to see a replacement soon. My domain account gets thousands of spam emails, which I have finally rerouted to a suitable black hole in a nearby galaxy.

My email is secret, but not my phone number. I’m going to change my email address to something hard to guess, rather than something logical like my name. This will make handing out my email address difficult, since non-geeks have a hard time typing email addresses. I’m thinking of using my phone number and name, since that would create a huge search space (3 thousand common names * 24 initials * 9^9 phone numbers * 12 different ways to combine the names). Also, this will let people know that they have no excuse not to call me whenever they email me.
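Multiplying out those rough figures shows how big the search space is. The numbers are the post's own estimates, not measured data.

```python
# The post's rough estimates for the size of the address search space.
common_names = 3_000
initials = 24
phone_numbers = 9 ** 9        # as estimated above
combinations = 12             # ways to combine the names and the number

search_space = common_names * initials * phone_numbers * combinations
print(f"{search_space:,} possible addresses (about {search_space:.1e})")
```

That works out to roughly 3.3 × 10^14 guesses for someone trying addresses blind.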

Aliases. Long ago I switched to using forwarding accounts primarily, to increase ISP mobility. A strict policy of never using your SMTP/POP3 email account except through a forwarding alias is also proving handy now that I’m declaring email bankruptcy.

Untrusted mail. I can’t stop using my old email address, which now receives a combination of spam and untrusted mail, so I think I will route it to Gmail, which isn’t too bad and has a good spam filter. My Yahoo account will continue to be my rarely checked account that I give to websites I suspect will spam or crapflood my mailbox.

RSS. I now use Bloglines, and I intend to keep using it, or whatever better RSS reader I can find. I’d kind of like to declare web browser bankruptcy, too. Web browsing is another way to avoid dealing with people, but on the other hand, I met most of my friends and acquaintances at events I learned about on the internet. Someday this will lead to RSS reader bankruptcy, too, but it is a step up from checking 25 sites only to find out that 20 of them haven’t changed recently and the ones that have are listing stale events.
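The saving over checking 25 sites by hand comes from letting a program compare publication dates. The minimal sketch below shows the core of what a feed reader does, using Python's standard `xml.etree.ElementTree` and `email.utils.parsedate_to_datetime` on an RSS 2.0 document. The feed content is made up. A real reader would also fetch feeds over HTTP and remember per-feed state.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example events</title>
  <item><title>Old meetup</title><pubDate>Mon, 01 Jan 2007 18:00:00 GMT</pubDate></item>
  <item><title>New meetup</title><pubDate>Fri, 01 Jun 2007 18:00:00 GMT</pubDate></item>
</channel></rss>"""


def new_items(feed_xml: str, last_checked: datetime) -> list[str]:
    """Return titles of items published after the last time this feed was read."""
    root = ET.fromstring(feed_xml)
    titles = []
    for item in root.iter("item"):
        published = parsedate_to_datetime(item.findtext("pubDate"))
        if published > last_checked:
            titles.append(item.findtext("title"))
    return titles


last_visit = datetime(2007, 3, 1, tzinfo=timezone.utc)
print(new_items(SAMPLE_FEED, last_visit))
```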

Documenting Anything

Some document generators are as smart as compilers and can figure out what is a function, what is a subroutine, etc., for all language-specific elements. Database documenters that only reference the system tables fall into this category. This documentation isn’t as exciting as it could be, because it just repeats what is usually available in a database or class browser.
Some document generators rely entirely on hand-written comments. This is a bit more tedious, because if you forget to add the comment templates, the document generator won’t notice. Examples: RoboDoc, Natural Docs
And of course some are in between: they automatically generate documentation from the structure of the class and add tagged comments when available. Examples: (mostly commercial stuff)
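The "in between" style takes only a few lines of Python to sketch. Structure comes from parsing the source with the standard `ast` module, and hand-written comments (here, docstrings) are added when present. This illustrates the approach; it is not how any particular commercial tool works.

```python
import ast

SAMPLE_SOURCE = '''
class Account:
    """A bank account."""

    def deposit(self, amount):
        """Add money to the balance."""

    def audit(self):
        pass
'''


def document(source: str) -> list[str]:
    """List every class and function found by parsing, with its docstring if any."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "def"
            doc = ast.get_docstring(node) or "(no comment; structure only)"
            lines.append(f"{kind} {node.name}: {doc}")
    return lines


print("\n".join(document(SAMPLE_SOURCE)))
```

`audit` still shows up even though nobody commented it. That is the advantage over purely comment-driven generators, which would silently skip it.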

Visual Studio for non-Visual Studio Languages

What can be done 

  • VS will color syntax if it thinks the file is a C++ file
  • You can use custom snippets for any language
  • You can use external tools to call the external compiler, etc.
  • You can use the VSS integration
  • Create file templates
  • Create project templates
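For the external-compiler route, it helps to rewrite the compiler's messages into a shape Visual Studio understands. Its Output window generally lets you double-click lines that begin with `path(line):` to jump to the file; treat that as an assumption about your VS version. The minimal sketch below does this for PHP's lint mode (`php -l`). It assumes PHP's messages look like the sample. The exact wording varies between PHP versions, so the regular expression is an assumption too.

```python
import re

# Assumed shape of a `php -l` message, e.g.
#   PHP Parse error:  syntax error, unexpected '}' in C:\site\index.php on line 12
PHP_ERROR = re.compile(
    r"^(?:PHP )?(?P<kind>Parse error|Fatal error):\s+(?P<msg>.*) in (?P<file>.+) on line (?P<line>\d+)\s*$"
)


def to_vs_format(php_output: str) -> list[str]:
    """Rewrite PHP lint messages as `file(line): error: message` lines."""
    converted = []
    for raw in php_output.splitlines():
        match = PHP_ERROR.match(raw)
        if match:
            converted.append(f"{match['file']}({match['line']}): error: {match['msg']}")
    return converted


sample = ("PHP Parse error:  syntax error, unexpected '}' in C:\\site\\index.php on line 12\n"
          "Errors parsing C:\\site\\index.php")
print("\n".join(to_vs_format(sample)))
```

Registered as an external tool with "Use Output window" checked, a wrapper like this would pipe `php -l` through the conversion.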

What can’t be done (as far as I can tell)

  • Debug non-VS languages
  • Intellisense

However, plug-ins do exist that provide IntelliSense and debugging in the VS environment.

Someday I will have to review this product, since all of my personal websites are open source applications written in PHP. Given that I spend $100+ a year on the website, $100 for a development environment that makes PHP development less frustrating might be worth it. I have very little time for writing code on my personal sites, so every second counts.

If the code snippet manager is hidden…

The code snippet manager sometimes fails to show up on the “Tools” menu of Visual Studio 2005. If that happens, go to Tools/Customize, select the Tools category, find “Snippet Manager”, and drag it out of the dialog window and up into the menu. The operation is a bit like the MS-Access menu designer, if you’ve ever used that.