elfs | Narrator now supports PDF, Word, RTF, Text, and HTML...

Okay, this isn't a release announcement. But last night, much to my pleasant surprise, I got Narrator's intake engine to accept Word, RTF, PDF, Text, and HTML documents, convert them to a whitelisted variety of HTML, and put the result up in TinyMCE for the author to review before posting it to his series of choice.
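To give a feel for what "a whitelisted variety of HTML" means, here's a minimal sketch using only Python's standard library. This is not Narrator's actual code: the tag list, the `WhitelistFilter` class, and the `sanitize` function are all made up for illustration. It keeps a handful of tags, throws away every attribute, and drops script and style content entirely. It does not balance tags; that is a separate step.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; a real intake engine would pick its own set.
ALLOWED_TAGS = {"p", "em", "strong", "blockquote", "br"}
VOID_TAGS = {"br"}                   # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}   # tags whose text is discarded too


class WhitelistFilter(HTMLParser):
    """Re-emit only whitelisted tags; keep the text of everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attributes are deliberately dropped

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html):
    """Return html with everything outside the whitelist stripped."""
    parser = WhitelistFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Feeding it `<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>` keeps the paragraph and its text and discards the handler, the `<b>` tag, and the script.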

It only took two hours, and now that it's done, it's done. It's horribly primitive, and it's slow. (Okay, it was very slow processing Sterlings: breaking the PDF into text, doing metrics to figure out the paragraph breaks, running the text through Textile and then again through Tidy to make sure everything's balanced.) I think I'll rip Tidy out, though, and replace it with Feedparser.htmlSanitize, which is even more sane.
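The "metrics to figure out the paragraph breaks" step could look something like the sketch below. It's an assumption about the general approach, not the code Narrator runs: the function name `join_paragraphs` and the `short_ratio` threshold are invented here. The idea is that text extracted from a PDF comes out as hard-wrapped lines, so a line much shorter than the typical line probably ends a paragraph.

```python
from statistics import median


def join_paragraphs(lines, short_ratio=0.75):
    """Group hard-wrapped lines into paragraphs.

    Heuristic (an assumption, not a guarantee): a blank line always ends a
    paragraph, and so does any line shorter than short_ratio times the
    median line length.
    """
    lengths = [len(line.strip()) for line in lines if line.strip()]
    if not lengths:
        return []
    typical = median(lengths)

    paragraphs, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            # Blank line: close whatever paragraph is open.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
        if len(line) < typical * short_ratio:
            # Short line: treat it as the last line of a paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

A heuristic like this misfires on paragraphs whose last line happens to run full width, which is one reason to put the result in front of the author for review before posting.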

I'm doing a lot of processing at the intake, but I've learned an important lesson from programming for the Palm platform: it's an output device, and the more you optimize for speedy output, the happier your users will be. Input can be tedious, but after you've gotten commitment from the users, a tedious input of a single story is acceptable, especially if what you get back is washed and ready-to-wear.


From: norikos-author.livejournal.com

What are you using to convert the documents? I had real trouble finding something for either RTF or Word, I forget which (I think, counterintuitively enough, it was RTF....)

From: elfs.livejournal.com

I use UnRTF (http://www.gnu.org/software/unrtf/unrtf.html) and AntiWord (http://www.winfield.demon.nl) for the first step of conversion.
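For anyone wiring those two tools together, here is a rough sketch of a dispatcher that picks a command by file extension. The `CONVERTERS` table and both function names are hypothetical; the only things assumed about the tools themselves are that `unrtf --html FILE` writes HTML to standard output and that `antiword FILE` writes plain text to standard output.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from file extension to the command that converts it.
CONVERTERS = {
    ".rtf": ["unrtf", "--html"],  # HTML on stdout
    ".doc": ["antiword"],         # plain text on stdout
}


def converter_command(path):
    """Build the argument list for the first conversion step."""
    suffix = Path(path).suffix.lower()
    try:
        base = CONVERTERS[suffix]
    except KeyError:
        raise ValueError(f"no converter for {suffix!r}") from None
    return base + [str(path)]


def convert(path):
    """Run the converter and return whatever it printed.

    Requires the matching tool to be installed and on PATH.
    """
    result = subprocess.run(
        converter_command(path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Keeping the command table separate from the `subprocess` call makes it easy to add another format later and to check the dispatch logic without the tools installed.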

Elf Sternberg
