Gathatoulie

And of these shall I speak to those eager, That quality of wisdom that all the wise wish And call creative qualities And good creation of the mind The all-powerful truth Truly and that more & better ways are discovered Towards perfection --Zarathustra.

Tuesday, February 9, 2010

most singular

"The SVD algorithm preserves as much information as possible about the
relative distances between the document vectors, while collapsing them
down into a much smaller set of dimensions. In this collapse,
information is lost, and content words are superimposed on one
another.

Information loss sounds like a bad thing, but here it is a blessing.
What we are losing is noise from our original term-document matrix,
revealing similarities that were latent in the document collection.
Similar things become more similar, while dissimilar things remain
distinct. This reductive mapping is what gives LSI its seemingly
intelligent behavior of being able to correlate semantically related
terms. We are really exploiting a property of natural language, namely
that words with similar meaning tend to occur together."

-- http://www.knowledgesearch.org/lsi/lsa_explanation.htm
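
To make the quoted claim concrete, here is a minimal sketch of the idea using NumPy's `np.linalg.svd`. The four tiny "documents", the term list, and the choice of `k = 2` dimensions are all made up for illustration. This is not the code from the tutorial or from the semantic-engine project, just a toy under those assumptions.

```python
import numpy as np

# Hypothetical toy collection: "car" and "automobile" never share a document,
# but both occur alongside "engine".
terms = ["car", "automobile", "engine", "banana", "apple", "fruit"]
docs = [
    "car engine",
    "automobile engine",
    "banana fruit",
    "apple fruit",
]

# Term-document matrix: one row per term, one column per document (raw counts).
A = np.array([[doc.split().count(t) for doc in docs] for t in terms], dtype=float)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def term_similarity(vectors, a, b):
    """Cosine similarity between the rows of `vectors` belonging to terms a and b."""
    return cosine(vectors[terms.index(a)], vectors[terms.index(b)])


# Full SVD, then keep only the k largest singular values (the "collapse").
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # assumed number of latent dimensions for this toy example
term_vecs = U[:, :k] * s[:k]  # each row is a term in the reduced space

raw_sim = term_similarity(A, "car", "automobile")
lsi_sim = term_similarity(term_vecs, "car", "automobile")
lsi_cross = term_similarity(term_vecs, "car", "banana")

print("car vs automobile, raw counts:   ", round(raw_sim, 3))
print("car vs automobile, reduced space:", round(lsi_sim, 3))
print("car vs banana, reduced space:    ", round(lsi_cross, 3))
```

In the raw matrix, "car" and "automobile" have zero similarity because they never appear in the same document. After the collapse to two dimensions they land on essentially the same point, since both keep company with "engine", while "car" and "banana" stay unrelated. That is the "similar things become more similar, while dissimilar things remain distinct" behavior on a very small scale.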

That page continues with a nice tutorial on how to use SVD to make

http://www.knowledgesearch.org/lsi/tutorial.htm

Unfortunately, I haven't been able to get the corresponding code
to build with the latest Ubuntu:
http://semantic-engine.googlecode.com/

I'm contacting the developers and other project members to bug them about it.


words cut, pasted, and otherwise munged by joe corneli otherwise known as arided.