If I know two things...
Oct. 16th, 2006 09:56 am
One of the other things I did this weekend was write an HTML to PDF engine. The HTML it currently speaks is extremely limited and still borked, but I'm definitely on the right track. I intend it to parse only a subset of full HTML, basically that subset necessary to render my stories and not much more. And this isn't your grandma's HTML to PDF renderer, because I'm using the Scribus rendering engine.
This means that many features are radically customizable: text placement, titling and rendering, the addition of copyright notices, and so forth. Titling is done with a hand-written title plugin. I'll provide three generics (one for Bloody Beth, one for Aimee', and the base generic), and the Journal Entries require a customized titler for the date handler. Even better, the front page, interior pages, and copyright notice can all be created using Scribus Templates, so the look and feel will be as sweet as any website you've ever seen. That's important, because so many traditional HTML to PDF converters strip out much of the look and feel, much of the ambiance, of a story site.
I know two things: I know the Scribus program, and I know Python. And now Scribus's internal automation toolkit is written in Python. I can easily write a program that will suck down the stories from the site in HTML, convert them to PDF using templates I create and then name in a configuration/build file, and either render each one as its own PDF, combine story arcs into a larger "book" format, or even pack the entire series into one humongous document. (I doubt anyone wants that, though.)
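The three output modes could be planned with something like the following. This is only a sketch under my own assumptions: `plan_builds`, the mode names, and the `(arc, title)` tuples are hypothetical stand-ins for whatever the configuration/build file ends up holding, and nothing here touches Scribus yet.

```python
def plan_builds(stories, mode="single"):
    """Group stories into output documents.

    stories: list of (arc, title) tuples in reading order (hypothetical shape).
    Returns a list of (document_name, [titles]) pairs, one per PDF to render.
    """
    if mode == "single":
        # One PDF per story.
        return [(title, [title]) for _arc, title in stories]
    if mode == "arc":
        # One "book" per story arc, keeping first-seen arc order.
        arcs = {}
        for arc, title in stories:
            arcs.setdefault(arc, []).append(title)
        return list(arcs.items())
    if mode == "series":
        # Everything in one document.
        return [("series", [title for _arc, title in stories])]
    raise ValueError("unknown mode: %r" % (mode,))


# Placeholder data, not real story titles.
stories = [("arc-1", "story-a"), ("arc-1", "story-b"), ("arc-2", "story-c")]
for name, titles in plan_builds(stories, mode="arc"):
    print(name, titles)
```

Each pair it returns would then be handed to whatever routine fills the template and exports the PDF.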
There is a bug in Scribus: when linking text frames in series for overflow, frame two correctly reports the overflow it accepted from frame one, but frame three always reports its overflow as zero, which would mean the story content is complete when it isn't. The cause: when rendering pages in the Scribus engine, the autoredraw feature is turned off to save CPU and time, and that interferes with the overflow calculation. The workaround: after calling scribus.linkTextFrames(), you must call scribus.redrawAll() to force the overflow calculation to update.
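Roughly, the workaround looks like this. It's a minimal sketch, not the engine's actual code: `link_and_refresh` and its `api` parameter are names made up for illustration, and `api` is assumed to be the `scribus` module, passed in so the helper can also be tried outside Scribus with a stand-in object. It also assumes `scribus.textOverflows()` is the call used to read the overflow, which the post doesn't say.

```python
def link_and_refresh(api, from_frame, to_frame):
    """Link two text frames, then force a redraw before reading overflow.

    api: the `scribus` module when run inside Scribus (assumption: injected
         as a parameter rather than imported, so a stub can replace it).
    Returns the overflow reported for `to_frame` after the redraw.
    """
    api.linkTextFrames(from_frame, to_frame)
    # With autoredraw off, the overflow figure can be stale; redraw first.
    api.redrawAll()
    return api.textOverflows(to_frame)
```

Inside a Scribus script this would be called as `link_and_refresh(scribus, "frame2", "frame3")`, where the frame names are placeholders, and a nonzero result means another linked frame is still needed.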
And I have one major bug in my code: I don't know what to do with <blockquote> blocks yet. I want to handle them in the text stream. If I can't, the alternative is to render each one as its own linked frame with special behaviors, but that would require me to take over page-and-frame-size management duties from the Scribus engine, and I really don't want to do that. I would need to keep a parallel linked list of frames, hand-calculate how much space was needed, break blockquotes that cross page boundaries, and do a whole bunch of other things that I really shouldn't be worrying about. And I use blockquotes quite often in my stories.
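One way the in-stream approach might look on the parsing side is to tag each paragraph with a style name and leave the layout to the engine. This is a sketch using Python's standard `html.parser`, not the parser described above: `StoryParser` and the `"body"` and `"blockquote"` style names are hypothetical, and it ignores every tag except `<p>` and `<blockquote>`.

```python
from html.parser import HTMLParser


class StoryParser(HTMLParser):
    """Flatten a small HTML subset into (style, text) paragraphs."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished (style, text) pairs, in order
        self._depth = 0       # how many <blockquote> tags we are inside
        self._buffer = []     # text collected for the current paragraph

    def _flush(self):
        # Collapse whitespace and emit the paragraph, if there is any text.
        text = " ".join("".join(self._buffer).split())
        self._buffer = []
        if text:
            style = "blockquote" if self._depth else "body"
            self.paragraphs.append((style, text))

    def handle_starttag(self, tag, attrs):
        if tag in ("p", "blockquote"):
            self._flush()
        if tag == "blockquote":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in ("p", "blockquote"):
            self._flush()
        if tag == "blockquote" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # emit any trailing text


parser = StoryParser()
parser.feed("<p>One.</p><blockquote><p>Quoted.</p></blockquote><p>Two.</p>")
parser.close()
print(parser.paragraphs)
```

Each style name would then map onto a paragraph style defined in the template (indented margins for the quote, say), so the quoted text stays in the same linked frames as everything else. Whether Scribus can apply styles that way from a script is exactly the open question.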
no subject
Date: 2006-10-16 10:35 pm (UTC)
You can generate PDF from XML easily via an XSLT FOP implementation.
Free version here: http://xmlgraphics.apache.org/fop/
no subject
Date: 2006-10-17 01:28 am (UTC)
The Scribus engine is no worse than FOP, and from my experience a hell of a lot easier to learn.