elfs: (Default)
[personal profile] elfs
Last night I realized that if two vectors in a 3-dimensional space start from the same point and are not parallel, those two vectors define a plane, and the same is true in any n-dimensional space. 3 dimensions, 10 dimensions, 40 dimensions, it doesn't matter.

Why is this important? Simple: once you've identified the plane, you can find the angle between the two vectors.
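Here is a rough sketch of that idea in Python, using Gram-Schmidt to build two perpendicular unit directions inside the plane and then reading the angle off the resulting 2-D coordinates. The function names (`dot`, `plane_coords`, `angle_in_plane`) are made up for illustration, and the ordinary Euclidean dot product is assumed.

```python
import math

def dot(u, v):
    """Ordinary Euclidean dot product (an assumption; other inner products exist)."""
    return sum(a * b for a, b in zip(u, v))

def plane_coords(u, v):
    """Return 2-D coordinates of u and v inside the plane they span."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero vectors do not define a direction")
    # First basis direction: u scaled to unit length.
    e1 = [a / norm_u for a in u]
    # Strip from v its component along e1; the remainder is perpendicular to u.
    along = dot(v, e1)
    rest = [b - along * a for a, b in zip(e1, v)]
    norm_rest = math.sqrt(dot(rest, rest))
    if norm_rest <= 1e-12 * norm_v:
        raise ValueError("vectors are parallel; they span a line, not a plane")
    # In the (e1, e2) basis, u lies on the first axis and v has two coordinates.
    return (norm_u, 0.0), (along, norm_rest)

def angle_in_plane(u, v):
    """Angle between u and v in degrees, measured inside their shared plane."""
    _, (x, y) = plane_coords(u, v)
    return math.degrees(math.atan2(y, x))

# The same code works for any number of dimensions.
u = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
v = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(angle_in_plane(u, v))
```

The number of dimensions never appears in the code. Everything reduces to dot products, which is why n stops mattering.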

If you can get two people to answer 100 yes or no questions, you can create two vectors in a 100-d space and find the angle between those vectors. The pair of people with the smallest angle between them will have the most similar responses. This is the basis of a vast number of recommendation engines.
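A minimal sketch of the survey idea, assuming yes is encoded as +1 and no as -1 (the encoding is my assumption; with 0/1 an all-no respondent would be a zero vector with no defined angle). The people and their answers are invented, and only six questions are used to keep it readable.

```python
import math
from itertools import combinations

def answer_vector(answers):
    """Encode yes/no answers as +1/-1 (an assumed encoding)."""
    return [1 if a else -1 for a in answers]

def angle_between(u, v):
    """Angle in degrees between two vectors, using the Euclidean dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against tiny floating-point overshoot outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos))

def most_similar_pair(people):
    """Return the pair of names whose answer vectors have the smallest angle."""
    vectors = {name: answer_vector(ans) for name, ans in people.items()}
    return min(
        combinations(sorted(vectors), 2),
        key=lambda pair: angle_between(vectors[pair[0]], vectors[pair[1]]),
    )

# Hypothetical respondents; True means "yes".
people = {
    "alice": [True, True, False, True, False, False],
    "bob":   [True, True, False, True, False, True],
    "carol": [False, False, True, False, True, True],
}

print(most_similar_pair(people))  # ('alice', 'bob')
```

Alice and Bob disagree on one question out of six, which works out to an angle of about 48 degrees. Alice and Carol disagree on everything, so they sit at 180 degrees.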

It was the n-dimensionality that was bugging me. I finally grasped how little that matters in the end.

Date: 2010-03-17 03:56 pm (UTC)
ext_3294: Tux (Default)
From: [identity profile] technoshaman.livejournal.com
Neat! I had no idea that "you might like" could be solved using fairly simple math. FSVO simple :)

Date: 2010-03-17 04:28 pm (UTC)
From: [identity profile] elfs.livejournal.com
Overview of recommendation engines, and how Amazon's is different (http://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf).

Amazon's is especially straightforward. It throws "you" out of the picture after you've made your purchase. Amazon doesn't care about you at all. All it cares about is the purchase relationship inside shopping carts.

In Amazon's algorithm, every object in a given shopping cart is a node, and every node has a relationship to every other node expressed as an intensity. After you buy something, Amazon looks that object up and finds the top n items that were found in other people's shopping carts next to that item, and shows them to you. If you pick up several items, Amazon can float to the top items that might have multiple relationships. This is computationally much cheaper than caring about "you."
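To make that concrete, here is a toy sketch of the cart-neighbor idea. It is not Amazon's actual code: I'm assuming the "intensity" is simply a count of how many carts two items shared, and the carts, item names, and function names are invented.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_cooccurrence(carts):
    """Map each item to a Counter of items seen in the same cart."""
    related = defaultdict(Counter)
    for cart in carts:
        # set() so a duplicated item in one cart is only counted once.
        for a, b in permutations(set(cart), 2):
            related[a][b] += 1
    return related

def recommend(related, basket, n=3):
    """Top-n items that most often shared a cart with anything in the basket."""
    scores = Counter()
    for item in basket:
        scores.update(related.get(item, {}))
    # Don't recommend what the shopper already has.
    for item in basket:
        scores.pop(item, None)
    # Highest count first; ties broken alphabetically so output is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]

# Hypothetical purchase history.
carts = [
    ["tent", "stove", "lantern"],
    ["tent", "stove"],
    ["tent", "sleeping bag"],
    ["stove", "fuel"],
    ["stove", "fuel", "lantern"],
]

related = build_cooccurrence(carts)
print(recommend(related, ["tent"], n=2))           # ['stove', 'lantern']
print(recommend(related, ["tent", "stove"], n=2))  # ['lantern', 'fuel']
```

Notice that nothing about the shopper is stored anywhere. With two items in the basket, the lantern floats to the top because it is related to both of them.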

The entire point is not to provide you with utility. It's to encourage impulse buying.

Date: 2010-03-17 05:18 pm (UTC)
From: [identity profile] en-ki.livejournal.com
Relatedly, the correlation of two variables across n samples is the dot product of the two n-dimensional vectors (once they are recentered to have a mean of zero and scaled by the standard deviation), divided by n. Equivalently, it is the cosine of the angle between the two recentered vectors.
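A quick sketch of that in Python, assuming the population standard deviation and dividing the dot product by n so the result lands in [-1, 1]. The function names are made up for illustration.

```python
import math

def standardize(xs):
    """Recenter to mean zero and scale by the (population) standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if sd == 0:
        raise ValueError("a constant variable has no defined correlation")
    return [(x - mean) / sd for x in xs]

def correlation(xs, ys):
    """Pearson correlation as a dot product of standardized vectors, over n."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

print(correlation([1, 2, 3, 4], [1, 3, 2, 4]))
```

The same number comes out if you take the cosine of the angle between the two recentered vectors, since the scaling cancels.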

Geometrizing statistics is fun:

http://en.wikipedia.org/wiki/Principal_component_analysis
http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.0020190

Technically, you need an inner product

Date: 2010-03-17 05:37 pm (UTC)
blaisepascal: (Default)
From: [personal profile] blaisepascal
The dimension of the vector space doesn't matter, but you need an inner product to define the "angle". The standard Euclidean dot product usually works, but it's not the only valid inner product. Another might make sense.

In fact, the choice of inner product is probably the place to tune the recommendation engine.
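As one possible illustration, here is a weighted inner product, where each question gets a positive weight saying how much it should count. The weights and function names are hypothetical; this is just one of many valid inner products, not a claim about what any real engine uses.

```python
import math

def weighted_inner(u, v, weights):
    """Inner product <u, v> = sum of w_i * u_i * v_i, with every w_i > 0."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))

def angle(u, v, weights=None):
    """Angle in degrees under the weighted inner product (Euclidean if None)."""
    if weights is None:
        weights = [1.0] * len(u)
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive to give a valid inner product")
    cos = weighted_inner(u, v, weights) / math.sqrt(
        weighted_inner(u, u, weights) * weighted_inner(v, v, weights)
    )
    # Clamp to guard against floating-point overshoot outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

u = [1, 1, -1, -1]
v = [1, 1, 1, 1]

print(angle(u, v))                        # unweighted: the vectors are perpendicular
print(angle(u, v, weights=[3, 3, 1, 1]))  # first two questions count triple
```

With equal weights the two respondents look unrelated (90 degrees). Once the two questions they agree on are weighted three times as heavily, the angle shrinks to about 60 degrees. Tuning the weights is one way of tuning the engine.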

Date: 2010-03-17 05:43 pm (UTC)
From: [identity profile] urox.livejournal.com
This doesn't seem like angles anymore to me when the only answers are yes or no. It seems like you could do better by just computing the percentage of answers that match between the two profiles.
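For what it's worth, the two views line up exactly if yes/no is encoded as +1/-1 (an assumed encoding): with a fraction p of matching answers, the cosine of the angle is 2p - 1, so ranking by angle and ranking by percent match give the same order. A small sketch, with invented function names and answers:

```python
def percent_match(a, b):
    """Fraction of questions on which two respondents gave the same answer."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cosine_pm1(a, b):
    """Cosine of the angle between the +1/-1 encodings of two answer lists."""
    u = [1 if x else -1 for x in a]
    v = [1 if y else -1 for y in b]
    dot = sum(p * q for p, q in zip(u, v))
    # Each +1/-1 vector has length sqrt(n), so the product of lengths is n.
    return dot / len(a)

# Hypothetical answers; True means "yes".
a = [True, True, False, True, False, False, True, False]
b = [True, False, False, True, True, False, True, False]

p = percent_match(a, b)
print(p, cosine_pm1(a, b), 2 * p - 1)
```

The angle view starts to differ from simple percent matching once answers are graded (say, 1 to 5) or questions are weighted unequally.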

But now I'll have to go read the article because I want to hear more about algorithms. :)

Date: 2010-03-19 06:26 pm (UTC)
From: [identity profile] ivolucien.livejournal.com
<nods> I'm doing something similar with my project, but in addition to the general "plane" I'm working with categorically related sets of responses as distinct entities in order to support goal specificity and human-centric search, sort and filter functionality. I'm still playing with modeling options, and the assignment of "intensities" as you say, will be a process of ongoing refinement.

Have you run across any articles or related info that you could easily point me to? I'm always looking for fuel for the creative fire. ^_^

Profile

elfs: (Default)
Elf Sternberg
