
Movie data:

We used data from our MovieLens recommender system. MovieLens is a web-based research recommender system that debuted in Fall 1997. Each week hundreds of users visit MovieLens to rate and receive recommendations for movies. The site now has over 43,000 users who have expressed opinions on 3,500+ different movies. We randomly selected enough users to obtain 100,000 ratings from the database (we only considered users who had rated 20 or more movies). We divided the database into a training set and a test set. For this purpose, we introduced a variable x that determines what fraction of the data is used for training and what fraction for testing. A value of x=0.8 indicates that $80\%$ of the data was used as the training set and $20\%$ as the test set. The data set was converted into a user-item matrix A that had 943 rows (i.e., 943 users) and 1682 columns (i.e., 1682 movies that were rated by at least one of the users).

For our experiments, we also take another factor into consideration: the sparsity level of the data set. For a data matrix, this is defined as $1-\frac{\mbox{nonzero entries}}{\mbox{total entries}}$. The sparsity level of the Movie data set is, therefore, $1-\frac{100{,}000}{943 \times 1682}$, which is 0.9369. Throughout the paper we refer to this data set as ML.
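The two quantities introduced in this section, the split variable x and the sparsity level, can be illustrated with a minimal Python sketch. The function names `sparsity_level` and `split_ratings` are hypothetical, and the split shown here shuffles individual ratings with a fixed seed, which is an assumption: the text does not say how ratings were assigned to the training and test sets.

```python
import random


def sparsity_level(num_nonzero, num_rows, num_cols):
    """Return 1 - (nonzero entries / total entries) for a rating matrix."""
    return 1.0 - num_nonzero / (num_rows * num_cols)


def split_ratings(ratings, x, seed=0):
    """Split ratings into (train, test), with a fraction x used for training.

    Assumption: ratings are shuffled individually; the seed is only for
    reproducibility of this illustration.
    """
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(x * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Sparsity for the figures quoted in the text:
# 100,000 ratings in a 943 x 1682 user-item matrix.
ml_sparsity = sparsity_level(100_000, 943, 1682)

# Toy (user, movie, rating) triples, made up purely to show the split.
toy_ratings = [(u, m, 3) for u in range(2) for m in range(5)]
train, test = split_ratings(toy_ratings, x=0.8)
print(len(train), len(test))
```

With x=0.8 the ten toy ratings are divided into eight training ratings and two test ratings, and `ml_sparsity` evaluates to roughly 0.937, in line with the value reported above.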


Badrul M. Sarwar
2001-02-19