Okay, this isn't a release announcement. But last night, much to my pleasant surprise, I got Narrator's intake engine to accept Word, RTF, PDF, Text, and HTML documents, convert them to a whitelisted variety of HTML, and put the result up in TinyMCE for the author to review before posting it to his series of choice.
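For the curious, "a whitelisted variety of HTML" means roughly the following. This is a minimal sketch using only Python's standard library, not Narrator's actual code: the tag list, the URL prefixes, and the names `WhitelistFilter` and `sanitize` are all assumptions made up for illustration. It only drops what isn't on the list; it does not balance tags, which is a separate job.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; a real site would tune these to its own needs.
ALLOWED_TAGS = {"p", "br", "em", "strong", "blockquote", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:", "/")
VOID_TAGS = {"br"}                  # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}  # drop these tags and everything inside


class WhitelistFilter(HTMLParser):
    """Re-emit only whitelisted tags and attributes; keep text, escaped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = ""
        for name, value in attrs:
            # Keep an attribute only if it is listed and its URL looks safe.
            if name in allowed and value and value.lower().startswith(SAFE_URL_PREFIXES):
                kept += ' %s="%s"' % (name, escape(value, quote=True))
        self.out.append("<%s%s>" % (tag, kept))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth > 0:
                self.skip_depth -= 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Character references arrive decoded, so escape them again.
            self.out.append(escape(data, quote=False))


def sanitize(markup):
    """Return markup with everything outside the whitelist stripped."""
    parser = WhitelistFilter()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

Unknown tags are removed but their text is kept, so a stray `<b>` or `<font>` from a Word export costs the author nothing but formatting. For anything facing the public, a maintained sanitizer is a safer bet than a hand-rolled one like this.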
It only took two hours, and now that it's done, it's done. It's horribly primitive, and it's slow. (Okay, it was very slow processing Sterlings: breaking the PDF into text, doing metrics to figure out the paragraph breaks, running the text through Textile and then again through Tidy to make sure everything's balanced.) I think I'll rip Tidy out, though, and replace it with Feedparser.htmlSanitize, which is even more sane.
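"Doing metrics to figure out the paragraph breaks" can be sketched like this. It's one plausible heuristic, not necessarily the one the intake engine uses: text pulled out of a PDF is hard-wrapped, so a line that comes up well short of the typical line length probably ends a paragraph. The function names and the `short_ratio` threshold are assumptions for illustration.

```python
import html
from statistics import median


def guess_paragraphs(text, short_ratio=0.75):
    """Group hard-wrapped lines into paragraphs.

    Heuristic (an assumption, not a rule): a blank line always ends a
    paragraph, and so does a line shorter than short_ratio times the
    median line length.
    """
    lines = [line.strip() for line in text.splitlines()]
    lengths = [len(line) for line in lines if line]
    if not lengths:
        return []
    typical = median(lengths)

    paragraphs, current = [], []
    for line in lines:
        if not line:
            # Blank line: close the paragraph in progress, if any.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
        if len(line) < typical * short_ratio:
            # Short line: treat it as the last line of its paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def to_html(paragraphs):
    """Wrap each paragraph in <p> tags, escaping the text."""
    return "\n".join("<p>%s</p>" % html.escape(p) for p in paragraphs)
```

A heuristic like this guesses wrong on short lines of dialogue and on poetry, which is one more reason to put the result in front of the author before it gets posted.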
I'm doing a lot of processing at the intake, but I've learned an important lesson from programming for the Palm platform: it's an output device; the more you optimize for speedy output, the happier your users will be. Input can be tedious, but after you've gotten commitment from the users, a tedious input of a single story is acceptable, especially if what you get back is washed and ready-to-wear.