Apr. 3rd, 2012

elfs: (Default)
For reasons I'm not going to go into, I need two different HTML parsers. One needs to accept almost any arbitrary HTML5 input without the concomitant JavaScript processing, and then spit out a stripped-down, whitelist-tags-and-attributes-only version for storage; the other needs to recognize the full suite, plus a completely alien set of tags into which I'll be throwing some, er, extra functionality.
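To make the first of those concrete, here is a minimal sketch of the whitelist idea. It is in Python rather than CoffeeScript, and it leans on the standard library's `html.parser`, which is a lenient tokenizer and not a spec-compliant HTML5 parser. The `ALLOWED` and `DROP_CONTENT` tables and the `sanitize` helper are hypothetical names invented for illustration, not the real whitelist.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> set of attributes allowed on that tag.
ALLOWED = {
    "p": set(),
    "em": set(),
    "strong": set(),
    "a": {"href"},
}

# Hypothetical list of elements whose entire contents are thrown away.
DROP_CONTENT = {"script", "style"}


class WhitelistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags and attributes; keep text, drop the rest."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # unknown tag: drop the tag itself, keep its text
        kept = [(k, v) for k, v in attrs if k in ALLOWED[tag] and v is not None]
        rendered = "".join(f' {k}="{escape(v, quote=True)}"' for k, v in kept)
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Character references were decoded on the way in; re-escape them.
            self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = WhitelistSanitizer()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


print(sanitize('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>'))
```

This only filters by tag and attribute name. It does not vet attribute values (a `javascript:` URL in `href` would pass straight through), and it does not repair badly nested markup the way the HTML5 tree-construction rules do.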

I need this all written in CoffeeScript.

Nobody's done anything like this before, at least not in CoffeeScript. My brain is spinning; I haven't worked with real parsers since my days at F5. Nothing like this was necessary for Isilon or IndieFlix. And, oh my gods, the HTML5 parsing standard is explicit, easy to implement, and huge.

I can use some of the existing JavaScript or Python parsers as starting points, but they're not terribly easy to extend. I'd also like to try using a parser-combinator library, because in my experience parser-combinator (PC) grammars are easier to understand. But try as I might, my head explodes when trying to grasp whatever it is I'm trying to do. Still, we'll see. After fridgemagnets, I need a bigger project.
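For a sense of why parser-combinator grammars read so easily, here is a toy sketch, again in Python rather than CoffeeScript. Every name in it (`regex`, `seq`, `many`, `fmap`, `open_tag`) is made up for illustration and belongs to no existing library. It recognizes only a well-formed opening tag with double-quoted attributes, which is a very long way from the error-tolerant state machine the HTML5 spec describes.

```python
import re

# A parser here is a function (text, pos) -> (value, new_pos), or None on failure.


def regex(pattern):
    """Match a regular expression at the current position."""
    compiled = re.compile(pattern)

    def parse(text, pos):
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None

    return parse


def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""

    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos

    return parse


def many(parser):
    """Apply a parser zero or more times, collecting the results."""

    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            if result is None or result[1] == pos:  # stop on failure or no progress
                return values, pos
            value, pos = result
            values.append(value)

    return parse


def fmap(parser, fn):
    """Transform the value of a successful parse."""

    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return fn(value), new_pos

    return parse


# The grammar itself: each rule is built from the rules before it.
name = regex(r"[A-Za-z][A-Za-z0-9-]*")
quoted = fmap(seq(regex('"'), regex(r'[^"]*'), regex('"')), lambda v: v[1])
attribute = fmap(seq(regex(r"\s+"), name, regex("="), quoted), lambda v: (v[1], v[3]))
open_tag = fmap(
    seq(regex("<"), name, many(attribute), regex(r"\s*>")),
    lambda v: (v[1], dict(v[2])),
)

print(open_tag('<a href="/x" title="hi">', 0))
```

The last five assignments are the whole grammar, and they read almost like the BNF they stand in for. Extending it with an alien tag set would mean adding more rules of the same shape, not rewriting a hand-rolled tokenizer.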

Elf Sternberg

May 2025

S M T W T F S
    123
45678910
111213141516 17
18192021222324
25262728293031

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Jun. 8th, 2025 06:55 am
Powered by Dreamwidth Studios