Dear Code Fairy:
Sep. 7th, 2010 09:28 am
I understand that your sudden fascination with Latent Dirichlet Allocation corresponds orthogonally to your idea of normalized HTML document databases with XPath, but no, you may not resurrect Project ToXIC (Terabytes of XML, Indexed, Compressed). We don't have time.
But damn, LDA is cool. So much cooler than the TF/IDF (Term Frequency-Inverse Document Frequency) stuff we were doing back in '91-'92. And I still have a copy of the MG software package somewhere. For a document store, LDA would be golden. For a text-backed forum like Usenet, it would be platinum.
Whoa, there's a paper on LDA for Tag Normalization. OMG!
Dammit, I have work to do. Must. Not. Geek. Out!