2010-05-31

Wasabi - Apparently ASP/VBScript Isn't Evil Enough

I've been looking for "project management" software (I know, yuck) for some time, mainly because I want to incorporate bug/issue tracking, discussion boards, and wikis into j0rb (and into personal projects, etc.). Firstly, it's very difficult to keep track of bugs by hand, and often you forget about them before you fix them. When it comes to software development there are just too many issues, whether bugs, design decisions, or potential hiccups, to keep track of. I also want some persistent medium for discussing software design with colleagues. At j0rb we currently discuss things either through instant messaging or in person.

The problem with instant messaging is that there's no organization to it. If you forget what was discussed or what conclusions everyone came to, then you have to search through chat logs and hope to find it. If you do find it, you have to sort through intertwined discussions to find the relevant material. It's also hard to have meaningful discussions because instant messengers are generally designed for sending single sentences or short paragraphs; it's hard to communicate code or other complex ideas that require more space. The problem with talking in person is that there's no log at all: there's no searching it to refresh your memory, etc.

Open source projects have gotten by with mailing lists for a long time and it seems the most experienced still prefer them to discussion boards. I figured there must be good reason for this so I've personally taken a liking to them. However, I don't think mailing lists would work well at j0rb. It's a Windows-based shop and I generally hate Windows-based mail clients for good reason. I generally use Gmail's Web interface for personal mail, which works quite well for most E-mail needs, but I wouldn't be allowed to discuss company topics through a remotely hosted service. At least, not a free one with no guarantees about security or privacy.

I'd be happy to try mutt from Cygwin against a locally hosted server, but odds are that my colleagues would be sending HTML E-mails and wouldn't understand why that was a problem (nor why I'd choose to use a plain-text client). For these reasons, I don't think a mailing list would work particularly well at j0rb. Colleague incompatibility. :P A discussion board seems to be the next best thing (with the advantage of edits to correct mistakes, etc.).

Wikis are great for keeping track of a growing and dynamic knowledge base. I think they'd work well for documenting gotchas discovered in languages, APIs, platforms, and the like, as well as our own software's gotchas.

Anyway, I've recently been reading Joel on Software. There are a lot of really good articles that make a lot of sense. It seems Joel Spolsky is very experienced in project management and the like. The allure of fully defined specifications and feasible schedules got me interested in taking a look at FogBugz, some project management software developed by Joel's company, Fog Creek Software. I generally avoid commercial software, since as a general rule it's garbage, but I am willing to appreciate some commercial offerings (everything from Microsoft not being among them *cough*).

Anyway, it sounded like FogBugz was a pretty complete solution and I had hoped that it was better than what we have now (which can't do any of the things I mentioned above). I suggested it as an option to management, but the price seems kind of steep. Fortunately, they offer the hosted service for free to individuals, so I signed up for my personal projects to get a look at just what it offered. I was pretty disappointed. At least at first glance, the UI seems rather bloated and disorganized. I'm having a hard time figuring out where different types of information are organized. It seems all it really tracks are "cases", which can be bugs, issues, or what have you. You then use "filters" to determine which cases you see, by tons of criteria. This seems to be the way to show the cases for a particular project. It seems awkward that way, but maybe that's just because I'm not used to it.

I decided to ask the Interwebz what it thought of FogBugz. I started where I usually start: at Wikipedia. It was there that I discovered, much to my dismay, Thistle and Wasabi.

It seems that FogBugz was originally written in Classic ASP/VBScript. I can sympathize, because our main software project is mostly written in the same. What looks like an OK language from the surface surely is not. Visual Basic is bad enough, but VBScript is like a stripped-down Visual Basic with half of the features missing or mutilated. Those who have worked with VBScript for any considerable period of time know that it is the decayer of sanity. It has numerous limitations and a weak "standard library"; most non-trivial functionality comes from server "components" that do not appear to be native VBScript at all (I've always assumed they were DLLs written in C or C++, but I suppose it's possible that they're actually written in pure evil). VBScript can't do much without these components. You apparently need them for things like handling file uploads or connecting to databases: things that other languages, like PHP, Python, and Perl, can do "themselves". Probably because those languages are "open".

Apparently, Fog Creek Software wanted FogBugz to run on Linux servers, but ASP/VBScript is (officially) an abomination of IIS and the Windows operating system. I think at this point most would realize their mistake (developing with ASP in the first place) and work towards correcting it by rewriting the application in a better, cross-platform language, but it seems that Fog Creek decided instead to develop a "compiler" (converter, or whatever you want to call it) that could convert ASP into PHP so that they could run their application on Linux boxes without having to completely rewrite it.

Now, I don't know how many lines of code FogBugz was at the time or how complicated this "compiler", dubbed Thistle, was to write (it's over my head, I know that), but I know that it only took a few days of maintaining an ASP/VBScript application for me to begin begging almost daily to rewrite it in something more sane. They didn't stop there, however. They apparently realized that VBScript combined with Thistle was too limited in what it could do (light bulb, anyone?). They missed their second opportunity to change platforms and instead wrote another "compiler" that extends VBScript's functionality, adding modern features that probably should have been there from the beginning, and spits out either PHP or .NET. They call this monstrosity Wasabi. All of that trouble to extend an evil that never should have existed in the first place. I can't imagine that developing Thistle and Wasabi was faster than just rewriting their application in a cross-platform language, preferably an open one that will be around for a while, and leaving it at that.

Apparently they keep both of these tools internal. So not only did they create such evils, encouraging the persistence of VBScript-ness, but they didn't even release them for the world to benefit from (free or otherwise). WTF.

I'm somehow less enthusiastic about FogBugz now...

2010-05-18

Source Code Management -- A Minor Success Story

In my quest to become an open source [UNIX-y] programmer (or hacker :-o), I try to learn about development tools and practices that make development easier. One such category is source code management (SCM) tools, also called version control software. Essentially, they keep track of the history of changes that you make to your code[1], as well as who made each change, when they made it, and even a comment from the author explaining it. This greatly helps developers manage a project because it lets us keep track of what we've done and even what we're doing right now. It lets you undo changes that you've made easily and share changes easily between developers. There are plenty of other benefits, so don't consider this a complete list.

For the record, I've found Git to be the best SCM. At j0rb, however, we use Subversion. I learned about Subversion a year or so after college and taught myself to use it. Then I introduced j0rb to it and eventually managed to get it adopted. More recently (~past six months) I started using Git after watching Google Tech Talks on YouTube of Linus Torvalds and Randal Schwartz explaining why Git is the superior SCM and why everything else sucks. I didn't want to be "stupid and ugly" so I naturally adopted Git. Now I'd like to switch j0rb over to Git, but I mostly work with Windows-y, GUI-y programmers who are afraid of something like Git. Needless to say, they are refusing to change for now. We all know what that makes them.

Anyway, I've been working on something at j0rb for the past week or so. With Git, I could easily branch and/or commit locally as I go to separate my changes, but with Subversion branching and merging are expensive and painful, and there's no local repository to commit to. Everything is centralized, so committing would put my changes in the central repository that everyone uses, which would mean that the application my colleagues and I are working on would be broken until I'm done, preventing others from doing any work of their own.

Branching (Side Tracked)

One of the nicest things about Git's branching mechanism is that I don't need to go anywhere in the file system. When I change branches, Git switches the working directory I'm already in over to the branch I'm changing to (checking out, technically). With Subversion, a branch is really just a copy of some subtree. In order to work on the new branch, I need to check it out somewhere else on my file system (or remove my working directory and replace it with a checkout of the new branch). To demonstrate:
# With Git, it's simple. Create and checkout a new branch named 'newbranch'
# based on the master branch at the current HEAD of the branch (last commit).
bamccaig@castopulence:~/src/example$ git status
# On branch master
nothing to commit (working directory clean)
bamccaig@castopulence:~/src/example$ git checkout -b newbranch master
Switched to a new branch "newbranch"
bamccaig@castopulence:~/src/example$ git status
# On branch newbranch
nothing to commit (working directory clean)
bamccaig@castopulence:~/src/example$
As can be seen above, with one simple command Git has created a branch and I'm already on it! I didn't have to do anything else; I can just start working. And Git is fast when it comes to branching, so I didn't have to wait for anything. The new branch simply points at the same commit as master, so there was no need for expensive duplication of data. Subversion tells a different story, however:
# With Subversion, it's quite painful and it's also pretty slow. Create
# and checkout a new branch named 'newbranch' based on the repository
# trunk in the HEAD revision. Note that in my experience it's best to do
# branching in Subversion server-side. At least if you ever intend to
# merge back into the original branch...
bamccaig@castopulence:~/src/example/trunk$ svn cp -m 'Example...' \
        file:///home/bamccaig/src/example.repo/trunk \
        file:///home/bamccaig/src/example.repo/branches/newbranch

Committed revision 2.
bamccaig@castopulence:~/src/example/trunk$ svn up .. && \
        cd ../branches/newbranch
A    ../branches/newbranch
A    ../branches/newbranch/foo
A    ../branches/newbranch/bar
A    ../branches/newbranch/baz
Updated to revision 2.
bamccaig@castopulence:~/src/example/branches/newbranch$ 
Notice that Subversion basically requires me not only to type out a semi-lengthy URL (twice!) and download another complete copy of the original branch (in this case branches/newbranch, which is a copy of trunk), but also to move around in the file system. In this simple example, I had the entire tree in my working copy (from the root of the repository, including branches, tags, and trunk). Often, though, you aren't interested in all of the many branches and tags that exist, so you'll only check out the branch(es) that you're interested in. In that case, you have to type out a checkout command with what's probably a semi-lengthy URL and then a command to change into the new branch's working directory. In short, branching in Subversion is just no fun. And don't get me started on merging... :'(

Back To The Story

OK, so here I am with a lot of changes to my working copy (give or take, 15 added or modified files). I come back in to work on Monday after the weekend and start working on a separate, though related, project. When I finally get that done in mid-afternoon, I get back to my original project and rebuild and run it (something I generally do to get an idea of what state things are in and remind myself what I was working on last; again, Subversion doesn't help much when there are 15 added or modified files). To my dismay, this ASP.NET project throws a StackOverflowException immediately upon launching in Visual Studio's development Web server, which subsequently "crashes" the server since there's really no recovering from that. Unfortunately, with a StackOverflowException there is apparently no stack trace (something I discovered right then) because the stack[2] itself is in an unholy state. On to tracking down what was causing the problem. But how? A stack overflow usually means you're either calling too many nested functions (often a result of recursion) or you've allocated too much memory on the stack.
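
For illustration only (this is not the project's code), the recursive flavor of the bug looks something like this:

// A minimal sketch: unbounded recursion. Every call pushes another
// frame onto the stack until it's exhausted, at which point .NET
// throws a StackOverflowException that (since .NET 2.0) cannot be
// caught, taking the development Web server down with it.
static int Recurse(int depth)
{
    return Recurse(depth + 1); // no base case to stop the recursion
}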

Here's where the SCM comes in handy (though this particular SCM still comes up short). I tried looking at all of the changes I had made since my last commit to see if I could spot anything suspicious.
[bamccaig@j0rb:foo]$ svn diff | less -S
Unfortunately, nothing stood out. I had added some new LINQ to SQL entities to the project and added some code to work with them. Much of the new code was generated for me by Visual Studio. The code I had worked on didn't stand out as a culprit.

This is where having Cygwin installed, a UNIX-like environment for Windows, comes in handy[3]. I decided the most efficient way for me to find the problem was to undo the changes I had made, confirm that it worked, and then redo the changes bit by bit until I encountered the StackOverflowException. This way I would know where to look for problems: the last applied changes. UNIX and UNIX-like operating systems (and Cygwin, as mentioned above) have tools that make this easy. First, I generate patches[4] with the SCM, Subversion, and a little shell scripting.
# For each file modified since the last commit (assuming filenames
# without spaces), save its diff to <file>.patch, then revert it.
[bamccaig@j0rb:foo]$ for f in `svn st | grep '^M' | \
        sed -r 's/M *(.*)/\1/'`;
do
    svn diff "$f" 1> "$f.patch" && svn revert "$f";
done
For every file, foo/bar, that was modified since the last commit, I get a file, foo/bar.patch, that stores the changes made, and then I undo those changes (svn revert). For added files, the changes are irrelevant because they're basically the entire file anyway, so instead I just temporarily remove them from the Visual Studio project. I can easily get a list of which files to remove, though, using Subversion and the shell again.
[bamccaig@j0rb:foo]$ svn st | grep '^A'
With all the changes undone (there were no deleted files in my working copy) I was able to retest the code. Lo and behold, it runs fine now. This confirmed that it was indeed me that broke it (damn). That came as no surprise though because I had been working on it for close to a week without problems and without pulling changes from the central repository. I was the only one making changes.

Now comes the fun part: applying each patch one at a time and testing. It might sound tedious, but imagine how much more tedious it would be without the SCM or UNIX tools. To apply the patches from before, we use (surprise) the patch program.
[bamccaig@j0rb:foo]$ patch -p0 -ui path/to/the.patch
The -p0 option is required to leave the paths in the patches alone. patch's default behavior is to strip off the directory part, leaving only the filename, which only works if the file you're patching is in the current working directory; mine are all over the working tree. The paths in the patches happen to be correct relative to where I'm working, so the 0 says to strip nothing from them. The -u option tells patch that the patch file is in unified format, which is what Subversion's diff sub-command outputs by default; patch would likely figure this out on its own, but why waste the resources? :P The -i option specifies the patch file to apply (its argument is the path to the patch file).

Since I'm using Cygwin, with UNIX newlines[5], on a Windows-based project, however, patch is going to somewhat mangle my source code by filling my files with the wrong newline type. It's not a huge problem (the code will still work), but Subversion will see every line as a change, which will make reviewing the changes later on rather difficult. To fix this, we can use the unix2dos tool to convert the newlines back. To save myself a lot of tedious typing, I created a bash function for all of this.
# Apply a patch, convert the patched file back to DOS newlines, and
# remove the patch file to mark it as applied.
p() {
    patch="$1";
    file="$(dirname "$patch")/$(basename "$patch" .patch)";
    patch -p0 -ui "$patch" && unix2dos "$file" && rm -i "$patch";
}
This way instead of typing out that long patch ... unix2dos command line, I can just say `p path/to/file.patch'. After each patch is applied, I'd confirm that it applied properly and remove the patch file to mark which ones I had done. Then I'd refresh the Visual Studio solution, rebuild it, and run it. If there was no StackOverflowException then I'd move on to the next patch. Once again, the SCM and UNIX tools allow me to easily track my progress. The following function listed which patches I had yet to apply:
# List the remaining (unapplied) patch files in the working copy.
[bamccaig@j0rb:foo]$ c() {
    svn st | grep patch;
}
I used that list to try to apply patches in order of dependencies to avoid unrelated problems.

The Suspense Is Killing Me!!!11

So what was the bug in the code causing the StackOverflowException? I have no clue... :( After going through the above, I have seemingly applied all of the patches and added all of the new files back to the project, and it works fine. The only bug I encountered had to do with a data-layer interface that I recently tried (again, I now realize) to get fancy with. Essentially, LINQ to SQL is handled through a DataContext class that is generated for you by Visual Studio. When you query for a set of entities, they are linked to the DataContext that retrieved them. When you make changes to them, the DataContext knows, and uses those changes to generate SQL that ultimately updates the database. Oftentimes, however, the changes are coming from the client, or take place over a few layers of the application, and it's hard in some of these instances to maintain the original DataContext and the entities that are attached to it. It's particularly difficult when using serialization to communicate with a user agent. Fortunately, there is an interface to attach detached entities. Unfortunately, it requires both the modified object and the original unmodified object so that it knows what record it's dealing with. This means that something as simple as saving an entity can require an entity-specific query, and it just generally results in code bloat. To get around this, I created an interface that returns a LINQ Expression identifying the entity, and implemented it for each entity. This way, the framework that I've developed can automatically fetch the original object, reducing the bloat to a simple call:

new LinqManager().Save(entity); // INSERT or UPDATE.
new LinqManager().Delete(entity); // DELETE.
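
To make that concrete, here's a rough sketch of the kind of interface I mean. The names here (IPredicated, Customer, Id) are invented for illustration; the real project's identifiers differ.

using System;
using System.Linq.Expressions;

// Hypothetical sketch: an entity that can describe how to find its own
// original row, so the data layer can fetch it and attach the modified
// copy to a fresh DataContext without an entity-specific query at each
// call site.
public interface IPredicated<TEntity>
{
    Expression<Func<TEntity, bool>> GetIdPredicate();
}

public class Customer : IPredicated<Customer>
{
    public int Id { get; set; }

    public Expression<Func<Customer, bool>> GetIdPredicate()
    {
        int id = this.Id; // capture the key as a plain value
        return o => o.Id == id; // translates to WHERE [Id] = @p0
    }
}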

Anyway, getting even more lazy, I also added an interface that returns the record identifier (all tables of this database have INT record identifiers). This allowed me to generate a typed LINQ Expression using generics and that interface.

public Expression<Func<IEntity, bool>> DefaultIdPredicate(
        IEntity e)
{
    return o => o.GetRecordId() == e.GetRecordId();
}

It sounds good and compiles happily, but it fails hard at run time because LINQ to SQL can't translate it into SQL. While applying the above patches I eventually ran into an exception whose message spoke of this. It was then that I remembered trying this previously (which is why the above method already existed), but it failed and I reverted, leaving the method intact for a future revelation. Instead I'm stuck resorting to ugly type-casting and explicit property access, which I manually re-coded throughout the project. That is the only fix that I made to the code as I applied the changes.
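
For the record, the shape that LINQ to SQL can translate has to name a mapped property directly. Here's a hedged sketch of the difference, assuming the invented Customer entity from above and an entity variable already in scope:

// Translatable: the cast and the explicit property access give the
// query provider a mapped column to compare against a captured value,
// so it can emit WHERE [Id] = @p0.
int id = ((Customer)entity).Id;
Expression<Func<Customer, bool>> works = o => o.Id == id;

// Not translatable: GetRecordId() is an opaque method call with no
// column mapping behind it, so the provider throws at run time when
// asked to turn the expression into SQL.
Expression<Func<IEntity, bool>> fails =
        o => o.GetRecordId() == entity.GetRecordId();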

I can only hope that was the problem, though I'm not sure how it could cause a StackOverflowException, and be thankful that I had an SCM, even a poor excuse for one, and a UNIX-like environment to help me through this mess...

References
1. Source code management (version control software) tools aren't limited to tracking source code. They can actually track changes to any set of files (though it may depend on the particular tool), but as a general rule they don't work as well with binary files as they do with text files.

2. If you're unfamiliar with the call stack or stack vs. heap then ask Wikipedia. I'm normally happy to explain it, but I feel exceptionally lazy right now. I'll give you some hints though.

3. Though not as handy as running a UNIX-like operating system, such as Linux, would be. Unfortunately, I'm stuck with Windows at j0rb, but I digress..

4. Patches are essentially instructions for how to change a file automatically. They show the difference between two files, which can be used to automatically modify the original and produce the new one. I feel like I'm doing a horrible job explaining this today so I'm trying to refer to material that will do a better job explaining than I can right now. See here.

5. http://en.wikipedia.org/wiki/Newline.

2010-05-05

.NET + XPath + Namespaces (Conclusion)

In my last post, I discussed how XML namespaces were interfering with XPath expressions that were being used by an application to map XML data. Thanks to kind people on #xml on irc.freenode.net I finally made sense of it.

Within an XSLT document, which you of course know is an XML document, XPath expressions can apparently use the namespace prefixes defined in the XSLT document (I assume, then, that they can't use prefixes defined in the XML being transformed; confirmed). I think that means that if I have the following XML document:

<?xml version="1.0" encoding="utf-8"?>
<root>
    <foo:level1 xmlns:foo="http://www.bamccaig.com/foo">
        <foo:level2>Text</foo:level2>
    </foo:level1>
</root>

Then I could use an XSLT document like this to transform it:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:bar="http://www.bamccaig.com/foo">
    <xsl:template match="//bar:level2">
        <xsl:value-of select="text()" />
    </xsl:template>
</xsl:stylesheet>

Note that the XPath expression used for the <xsl:template> element's match attribute...

//bar:level2

...uses the namespace prefix defined in the XSLT document (i.e., bar), not the namespace prefix defined in the XML document (i.e., foo).

What this helped me to realize is that it doesn't matter that the XML source document that I'm querying for data with XPath has child elements in a default namespace (i.e., no prefixes), because the prefixes can come from elsewhere. It was then that the .NET API began to make sense. In a way, my .NET code is taking the place of the XSLT document, and it then makes sense to define namespace prefixes in .NET. I tried to avoid this originally because it felt like hard-coding what should be soft-coded, but now it makes sense.

In my last post, I mentioned that the XPathNavigator.Evaluate method accepted an IXmlNamespaceResolver. While the XPathNavigator is itself an IXmlNamespaceResolver, it doesn't make sense that it could resolve our namespace (since the namespace in the given example is a default namespace, with no prefix).

So just like in the XSLT, we're going to define a namespace prefix for the namespace used in the XML. We can do that with an XmlNamespaceManager (which implements IXmlNamespaceResolver). Unfortunately, and it escapes me why, we need to pass an XmlNameTable to the XmlNamespaceManager's constructor. Both XmlDocument and XPathNavigator have NameTable properties of that type, but we've already concluded that those won't help us here. Why is it required? I don't know, but it is. With that information, we can finally resolve our namespace:

var nav = ...; // Our XPathNavigator.
var nsMan = new XmlNamespaceManager(nav.NameTable);

nsMan.AddNamespace("bar", "http://www.bamccaig.com/foo");

...

var it = nav.Evaluate("/root/bar:level1/bar:level2/text()",
        nsMan) as XPathNodeIterator;

There is one other thing you might be interested in. The data contract API that we're working with (from the post, LINQ to SQL + Serialization) adds a default namespace to the serialization based on the .NET namespace of the serialized type. This happens to be of the form[1],...

http://schemas.datacontract.org/2004/07/Clr.Namespace

...where the namespace is Clr.Namespace. If you don't like that, there are various ways to control it. You can pass the Namespace named parameter to the DataContract attribute; however, since in my case the types (and attributes) are generated for me, I don't have control over what parameters are passed to the attribute. The alternative solution that I preferred (because it was easy) was to set an assembly attribute.[1]

Essentially, I opened up the Properties/AssemblyInfo.cs file that Visual Studio had generated for me automatically and added the following line(s):

// Setting the data contract namespace for the assembly...
// See http://msdn.microsoft.com/en-us/library/ms731045.aspx.
[assembly: ContractNamespace(MyNamespace.ContractNamespace.Uri,
        ClrNamespace = "MyNamespace")]

Note: You'll also need a using directive for the System.Runtime.Serialization namespace, but that should be a given. ;) The latest Visual Studio should resolve that for you if you ask it politely. Alternatively, you could fully-qualify the attribute.

Note that instead of using a string literal, which I could have done, I declared a constant field elsewhere that could be referenced (both here and earlier, when creating the XmlNamespaceManager), which makes it a lot easier to change the namespace URI in the future.
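
In case it helps, that constant is nothing fancier than something like the following sketch (the URI value here is illustrative; use whatever your assembly's contract namespace actually is):

namespace MyNamespace
{
    // One shared definition of the data contract namespace URI, so the
    // assembly attribute and the XPath code can't drift apart.
    public static class ContractNamespace
    {
        public const string Uri =
                "http://schemas.datacontract.org/2004/07/MyNamespace";
    }
}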

And that's all there is to it. Let me know if this does or doesn't work for you, or if you have any questions, concerns, or advice. :P

References

  1. http://msdn.microsoft.com/en-us/library/ms731045.aspx