[personal profile] elfs
Okay, this isn't a release announcement. But last night, much to my pleasant surprise, I got Narrator's intake engine to accept Word, RTF, PDF, plain text, and HTML documents, convert them to a whitelisted variety of HTML, and put them up in TinyMCE for the author to review before posting them to his series of choice.
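
For anyone curious what "a whitelisted variety of HTML" can mean in practice, here is a minimal sketch using only Python's standard library. It is not Narrator's actual code: the tag list, the attribute list, the rule about link schemes, and the names `WhitelistFilter` and `sanitize` are all assumptions made up for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; the post does not say which tags are allowed.
ALLOWED_TAGS = {"p", "br", "em", "strong", "blockquote", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
VOID_TAGS = {"br"}                  # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}  # drop these tags AND the text inside them
SAFE_SCHEMES = ("http://", "https://", "mailto:")  # assumed link policy


class WhitelistFilter(HTMLParser):
    """Re-emit only whitelisted tags and attributes; keep ordinary text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself but keep the text it wraps
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in allowed:
                continue
            if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
                continue  # refuses javascript: links (and relative ones)
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self._skip:
                self._skip -= 1
            return
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self._skip:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    """Return html_text with everything outside the whitelist stripped."""
    parser = WhitelistFilter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Note that this only filters; it does not balance unclosed tags, which is a separate job.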

It only took two hours, and now that it's done it's done. It's horribly primitive, and it's slow. (Okay, it was very slow processing Sterlings: breaking the PDF into text, doing metrics to figure out the paragraph breaks, running the text through Textile and then again through Tidy to make sure everything's balanced.) I think I'll rip Tidy out, though, and replace it with Feedparser.htmlSanitize, which is even more sane.
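
The "metrics" step is the least obvious part, so here is one plausible way to do it, sketched under assumptions. The idea is that text pulled out of a PDF is hard-wrapped, so a line noticeably shorter than the typical full line probably ends a paragraph. The function name `join_paragraphs`, the 90th-percentile estimate of line width, and the 0.75 cutoff are all invented for this example; the real intake code may measure something else entirely.

```python
def join_paragraphs(text, short_ratio=0.75):
    """Guess paragraph breaks in hard-wrapped text from line lengths.

    A blank line always ends a paragraph. A non-blank line shorter than
    short_ratio times the typical full line width is treated as the last
    line of its paragraph. Both rules are heuristics, not guarantees.
    """
    lines = [line.strip() for line in text.splitlines()]
    lengths = sorted(len(line) for line in lines if line)
    if not lengths:
        return []

    # Use the 90th-percentile length as the "full" width, so a few
    # unusually long lines do not skew the estimate.
    full_width = lengths[int(0.9 * (len(lengths) - 1))]

    paragraphs, current = [], []
    for line in lines:
        if not line:
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
        if len(line) < short_ratio * full_width:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

A heuristic like this will misfire on dialogue-heavy text, where many paragraphs are a single short line, which is one reason an author review step in the editor is useful afterwards.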

I'm doing a lot of processing at the intake, but I've learned an important lesson from programming for the Palm platform: the Palm is an output device, and the more you optimize for speedy output, the happier your users will be. Input can be tedious, but after you've gotten commitment from the users, a tedious input of a single story is acceptable, especially if what you get back is washed and ready-to-wear.

Date: 2010-03-27 09:11 pm (UTC)
From: [identity profile] norikos-author.livejournal.com
What are you using to convert the documents? I had real trouble finding something for either RTF or Word, I forget which (I think, counterintuitively enough, it was RTF....)

Date: 2010-03-28 05:39 am (UTC)
From: [identity profile] elfs.livejournal.com
I use UnRTF (http://www.gnu.org/software/unrtf/unrtf.html) and AntiWord (http://www.winfield.demon.nl) for the first step of conversion.
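
As a rough sketch of that first step, assuming both tools are installed and on the PATH: the extension table and the names `converter_command` and `convert` are hypothetical, not Narrator's. The sketch relies on `unrtf --html` writing HTML to standard output and on `antiword` writing plain text to standard output.

```python
import os
import subprocess

# Assumed mapping from file extension to the converter's command line.
CONVERTERS = {
    ".rtf": ["unrtf", "--html"],  # UnRTF emits HTML on stdout
    ".doc": ["antiword"],         # AntiWord emits plain text on stdout
}


def converter_command(path):
    """Build the argument list for the converter that handles this file."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in CONVERTERS:
        raise ValueError("no converter registered for %r" % ext)
    return CONVERTERS[ext] + [path]


def convert(path, timeout=60):
    """Run the converter and return whatever it printed, as text."""
    result = subprocess.run(
        converter_command(path),
        capture_output=True,
        check=True,       # raise CalledProcessError on a non-zero exit
        timeout=timeout,  # do not let one bad upload hang the intake
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Passing the arguments as a list rather than a shell string means a filename with spaces or quotes in it is handed to the tool as-is, with no shell in between.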

Elf Sternberg
