We used data from our MovieLens recommender system. MovieLens
is a web-based research recommender system that debuted in Fall 1997. Each week
hundreds of users visit MovieLens to rate and receive recommendations for
movies. The site now has over 43,000 users who have expressed opinions on
more than 3,500 different movies. We randomly selected enough users to
obtain 100,000 ratings from the database (we only considered
users who had rated 20 or more movies). We divided the
database into a training set and a test set. For this purpose, we
introduced a variable x that determines what fraction of the data
is used for training and what fraction for testing. A value
of x = 0.8 indicates that 80% of the data was used as the training set
and 20% of the data was used as the test set. The data
set was converted into a user-item matrix A that had
943 rows (i.e., 943 users) and 1682 columns (i.e.,
1682 movies that were rated by at least one of the users).
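As a rough illustration of the split and the matrix conversion described above, the following Python sketch works on toy data. The text does not say how ratings are assigned to the two sets, so the uniform random shuffle here is an assumption, and the names `split_ratings` and `build_user_item_matrix` are hypothetical helpers, not part of MovieLens.

```python
import random


def split_ratings(ratings, x, seed=0):
    """Split (user, item, rating) triples: fraction x for training, the rest for testing.

    Assumption: ratings are shuffled uniformly at random before the cut.
    """
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(x * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def build_user_item_matrix(ratings, n_users, n_items):
    """Build a dense user-item matrix; 0 marks an unrated movie.

    Users and items are assumed to be 0-based integer indices.
    """
    matrix = [[0] * n_items for _ in range(n_users)]
    for user, item, rating in ratings:
        matrix[user][item] = rating
    return matrix


# Toy data: 4 users who each rated all 5 movies on a 1-5 scale.
toy = [(u, i, 1 + (u + i) % 5) for u in range(4) for i in range(5)]

train, test = split_ratings(toy, x=0.8)
A = build_user_item_matrix(toy, n_users=4, n_items=5)

print(len(train), len(test))  # 16 4
```

With the real data, `n_users` and `n_items` would be 943 and 1682; a sparse representation would likely be preferable at that size, since most entries are empty.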
For our experiments, we also take another factor into
consideration: the sparsity level of the data sets. For the data matrix
R, this is defined as
$1 - \frac{\text{nonzero entries}}{\text{total entries}}$.
The sparsity level of the Movie data set is, therefore,
$1 - \frac{100{,}000}{943 \times 1682}$,
which is 0.9369. Throughout
the paper we refer to this data set as ML.
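The sparsity figure can be checked with a few lines of Python. This sketch assumes that sparsity is one minus the ratio of nonzero entries to total entries, a definition that reproduces the reported value from the 100,000 ratings and the 943 by 1682 matrix; the function names are hypothetical.

```python
def sparsity_level(nonzero_entries, n_rows, n_cols):
    """Sparsity as 1 - (nonzero entries / total entries)."""
    return 1 - nonzero_entries / (n_rows * n_cols)


def matrix_sparsity(matrix):
    """Same measure, computed directly from a dense matrix where 0 means unrated."""
    nonzero = sum(1 for row in matrix for value in row if value != 0)
    return sparsity_level(nonzero, len(matrix), len(matrix[0]))


# 100,000 ratings in a 943 x 1682 user-item matrix.
ml_sparsity = sparsity_level(100_000, 943, 1682)
print(ml_sparsity)  # about 0.93695, which the text reports as 0.9369

# Small check: 2 rated cells out of 6 gives a sparsity of 4/6.
small = [[5, 0, 0], [0, 3, 0]]
small_sparsity = matrix_sparsity(small)
```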
Badrul M. Sarwar
2001-02-19