Projects: Re-assessing Commitments
Jun. 4th, 2008 09:16 am
I took the scalpel called rm to much of my projects list this morning. I keep getting overwhelmed by the list and deciding to do none of it. Making the list shorter made me happy. I deleted:
- Emacs Diction/Rhyme/Thesaurus
- While it would be nice to have a working dictionary, rhyming engine, and thesaurus linked to WordNet in Emacs, I'm not about to teach myself enough elisp anytime soon to pull this off.
- Faerybriar
- This was a simple program written in Python that did automated layout of websites. Pylons and Rails do this so much better than what I achieved that there's no point in my finishing it. It did have a nice architecture system based on Conallen's Building Web Applications with UML, but it crashed a lot and I'm just not going to have time to finish it. One rule of writing software is that it should be something I want to use, and I never actually used it.
- GoX
- One of my favorite courses in college, and the big topic of my last year, was library databases. These were specialized compression and search engines designed to index huge masses of text. To give you an idea of how small we were thinking, the textbook was called Managing Gigabytes. These days I work at a company that manages terabytes and petabytes. MG databases were computationally expensive to build and compress, and they could not be changed on the fly, but if you had a very large archive of unchanging text that was accessed often, they were computationally cheap to query, and even faster than reading the uncompressed text.
GoX was an idea I had to create a separate index space that would keep a tokenized, compressed index of an XML document's structure, with the Huffman token keys built in a two-pass operation to sort by density, alongside the textual word index. By combining the two indices, you could use XPath to efficiently extract "the fifth paragraph," or "the paragraphs surrounding the word 'consilience'," and so on. But it's a huge amount of lifting for a very small payout. People still seem to think that processing power and drive speeds are fast enough not to need the esoteric solutions of MG. While I disagree, that doesn't mean I have the bandwidth to engage in a project this massive.
- ObjectiveWeb
- No point to this one. It's been done, and better.
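
For anyone curious what the GoX idea above amounts to, here is a toy sketch in Python. It is not code from the project, which never got that far. It assumes an uncompressed, in-memory pair of indices (no Huffman coding, no two-pass build), and every name in it (build_indices, nth, around_word, SAMPLE) is made up for illustration. It only shows how a structure index and a word index can be combined to answer "the Nth paragraph" and "the paragraphs around this word" without rescanning the text.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, used only for illustration.
SAMPLE = ("<doc><p>one two</p><p>three consilience four</p>"
          "<p>five</p><p>six seven</p></doc>")

def build_indices(xml_text):
    """Return (structure, words).

    structure: list of (tag, first_word_pos, end_word_pos) in document order.
    words:     dict mapping lowercased word -> list of word positions.
    """
    root = ET.fromstring(xml_text)
    structure = []
    words = defaultdict(list)
    pos = 0  # running word counter across the whole document

    def add_text(text):
        nonlocal pos
        for w in re.findall(r"\w+", text or ""):
            words[w.lower()].append(pos)
            pos += 1

    def walk(elem):
        slot = len(structure)
        structure.append(None)      # reserve a slot to keep document order
        start = pos
        add_text(elem.text)
        for child in elem:
            walk(child)
            add_text(child.tail)    # text after a child belongs to this element
        structure[slot] = (elem.tag, start, pos)

    walk(root)
    return structure, dict(words)

def spans(structure, tag):
    """Word spans of every element with the given tag."""
    return [(s, e) for t, s, e in structure if t == tag]

def nth(structure, tag, n):
    """Word span of the n-th (1-based) element with this tag."""
    return spans(structure, tag)[n - 1]

def around_word(structure, words, tag, word, context=1):
    """0-based indices of tag elements containing word, plus neighbors."""
    tag_spans = spans(structure, tag)
    hits = set()
    for p in words.get(word.lower(), []):
        for i, (s, e) in enumerate(tag_spans):
            if s <= p < e:
                lo = max(0, i - context)
                hi = min(len(tag_spans), i + context + 1)
                hits.update(range(lo, hi))
    return sorted(hits)

structure, words = build_indices(SAMPLE)
second_paragraph = nth(structure, "p", 2)
neighbors = around_word(structure, words, "p", "consilience")
```

A real MG-style version would store both indices compressed on disk and would do the span lookup with something better than a linear scan. That part is exactly the heavy lifting this sketch skips.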