There are two types of user preferences:

  • explicit preference (also referred to as "explicit feedback"), such as a "rating" given to an item by a user.
  • implicit preference (also referred to as "implicit feedback"), such as "view" and "buy" history.

MLlib ALS provides two functions, ALS.train() and ALS.trainImplicit(), to handle these two cases, respectively. Both take RDD[Rating] as the training data input. The Rating class is defined in the Spark MLlib library as:

case class Rating(user: Int, product: Int, rating: Double)

By default, the recommendation template uses ALS.train(), which expects explicit rating values that users have given to items.

To handle implicit preference, ALS.trainImplicit() can be used. In this case, the "rating" value input to ALS is used to calculate the confidence level that the user likes the item. Higher "rating" means a stronger indication that the user likes the item.
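
To make the idea of "confidence" concrete, here is a minimal sketch of the weighting described in the implicit-feedback paper by Hu, Koren, and Volinsky, which the MLlib documentation cites as the basis for its implicit ALS variant. The object and function names below are hypothetical and are not part of MLlib or the template, and MLlib's internal computation may differ in detail. The sketch only illustrates how the "rating" value and the alpha parameter combine.

```scala
// Illustrative only: names here are hypothetical, not an MLlib API.
object ImplicitConfidence {

  // Binary preference: 1.0 if any positive interaction was observed, else 0.0.
  def preference(rating: Double): Double =
    if (rating > 0) 1.0 else 0.0

  // Confidence grows linearly with the observed value: c = 1 + alpha * r.
  def confidence(rating: Double, alpha: Double): Double =
    1.0 + alpha * rating

  def main(args: Array[String]): Unit = {
    val alpha = 1.0 // assumed value for this illustration
    // Print preference and confidence for a few example "rating" values.
    Seq(0.0, 1.0, 5.0).foreach { r =>
      println(s"rating=$r preference=${preference(r)} confidence=${confidence(r, alpha)}")
    }
  }
}
```

Under this formulation, a larger "rating" does not mean a larger predicted score by itself. It means the model trusts the observation "the user likes the item" more strongly during training.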

The following provides an example of using implicit preference.

Training with view events

For example, the more times a user has viewed an item, the higher the confidence that the user likes it. We can aggregate the number of views and use this count as the "rating" value.
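
The aggregation itself can be sketched with plain Scala collections, without Spark or the event store. The ViewCounts object and the (user id, item id) tuple representation are assumptions made for illustration only. The template works on RDDs of events rather than in-memory sequences.

```scala
// Illustrative only: a plain-collections stand-in for the per-key sum.
object ViewCounts {

  // Hypothetical stand-in for a "view" event: (user id, item id).
  type View = (String, String)

  // Count how many times each (user, item) pair occurs.
  def countViews(views: Seq[View]): Map[View, Double] =
    views
      .groupBy(identity)
      .map { case (key, group) => key -> group.size.toDouble }

  def main(args: Array[String]): Unit = {
    val views = Seq(("u1", "i1"), ("u1", "i1"), ("u1", "i2"), ("u2", "i1"))
    // Each entry is ((user id, item id), view count used as the "rating").
    countViews(views).foreach(println)
  }
}
```

Here the pair ("u1", "i1") appears twice, so its aggregated "rating" is 2.0, while the other pairs each get 1.0.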

First, we can modify DataSource.scala to count how many times each user has viewed each item:

class DataSource(val dsp: DataSourceParams)
  extends PDataSource[TrainingData,
      EmptyEvaluationInfo, Query, EmptyActualResult] {

  @transient lazy val logger = Logger[this.type]

  override
  def readTraining(sc: SparkContext): TrainingData = {
    val eventsDb = Storage.getPEvents()
    val eventsRDD: RDD[Event] = eventsDb.find(
      appId = dsp.appId,
      entityType = Some("user"),
      eventNames = Some(List("view")), // MODIFIED
      // targetEntityType is optional field of an event.
      targetEntityType = Some(Some("item")))(sc)

    val ratingsRDD: RDD[Rating] = eventsRDD.map { event =>
      try {
        val ratingValue: Double = event.event match {
          case "view" => 1.0 // MODIFIED
          case _ => throw new Exception(s"Unexpected event ${event} is read.")
        }
        // MODIFIED
        // key is (user id, item id)
        // value is the rating value, which is 1.
        ((event.entityId, event.targetEntityId.get), ratingValue)
      } catch {
        case e: Exception => {
          logger.error(s"Cannot convert ${event} to Rating. Exception: ${e}.")
          throw e
        }
      }
    }
    // MODIFIED
    // sum all values for the same user id and item id key
    .reduceByKey { case (a, b) => a + b }
    .map { case ((uid, iid), r) =>
      Rating(uid, iid, r)
    }.cache()

    new TrainingData(ratingsRDD)
  }
}

You may put the view count aggregation logic in ALSAlgorithm's train() instead, depending on your needs.

Then, we can modify ALSAlgorithm.scala to call ALS.trainImplicit() instead of ALS.train():

class ALSAlgorithm(val ap: ALSAlgorithmParams)
  extends PAlgorithm[PreparedData, ALSModel, Query, PredictedResult] {

  ...

  def train(sc: SparkContext, data: PreparedData): ALSModel = {

    ...

    // MODIFIED
    val m = ALS.trainImplicit(
      ratings = mllibRatings,
      rank = ap.rank,
      iterations = ap.numIterations,
      lambda = ap.lambda,
      blocks = -1,
      alpha = 1.0,
      seed = seed)

    new ALSModel(
      rank = m.rank,
      userFeatures = m.userFeatures,
      productFeatures = m.productFeatures,
      userStringIntMap = userStringIntMap,
      itemStringIntMap = itemStringIntMap)
  }

  ...

}

Now the recommendation engine can train a model with implicit preference events.