
Regression

This approach is similar to the weighted sum method, but instead of directly using the ratings of similar items it uses an approximation of those ratings based on a regression model. In practice, similarities computed using cosine or correlation measures may be misleading, in the sense that two rating vectors may be distant (in the Euclidean sense) yet have very high similarity. In that case, using the raw ratings of the ``so-called'' similar item may result in poor prediction. The basic idea is to use the same formula as the weighted sum technique, but instead of using the similar item $N$'s ``raw'' rating values $R_{u,N}$, this model uses their approximated values $R'_{u,N}$ based on a linear regression model. If we denote the respective rating vectors of the target item $i$ and the similar item $N$ by $R_i$ and $R_N$, the linear regression model can be expressed as

\begin{displaymath}\bar{ R_{N}^{'}} = \alpha {\bar R_i} + \beta + \epsilon
\end{displaymath}

The regression model parameters $\alpha$ and $\beta$ are estimated from the two rating vectors, and $\epsilon$ is the error of the regression model.
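As an illustration, the following minimal Python sketch fits $\alpha$ and $\beta$ for one pair of items. It rests on several assumptions that are not stated in the text: the parameters are obtained by an ordinary least-squares fit, the two vectors contain only ratings from users who rated both items, and the rating values are invented for the example. The function names `fit_line`, `pearson` and `euclidean` are hypothetical helpers.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares fit of y ~ alpha * x + beta (an assumed fitting method)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 entries")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x is constant, so the slope is undefined")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    return alpha, beta


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Invented ratings from four users who rated both items.
r_i = [1.0, 2.0, 3.0, 2.0]  # target item i
r_n = [3.0, 4.0, 5.0, 4.0]  # similar item N

# The vectors are perfectly correlated yet far apart in Euclidean terms.
similarity = pearson(r_i, r_n)
distance = euclidean(r_i, r_n)

# Fit R'_N = alpha * R_i + beta and compute the approximated ratings.
alpha, beta = fit_line(r_i, r_n)
r_n_approx = [alpha * r + beta for r in r_i]
```

With these invented ratings the two items are perfectly correlated even though every rating differs by two points, which is the situation the paragraph above describes. The fitted line absorbs that offset into $\beta$, so the approximated values stand in for the raw ratings of item $N$ in the weighted sum formula.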



Badrul M. Sarwar
2001-02-19