Listwise Recommendation Approach with Non-negative Matrix Factorization

Matrix factorization (MF) is one of the most effective categories of recommendation algorithms, making predictions based on the user-item rating matrix. Many studies reveal that the ultimate goal of recommendation is to predict correct rankings of unrated items. However, most pioneering efforts on ranking-oriented MF predict users' item rankings based on the original rating matrix, which fails to explicitly present users' preference rankings on items and thus might result in some accuracy loss. In this paper, we formulate a novel listwise user-ranking probability prediction problem for recommendations, which utilizes a user-ranking probability matrix to predict users' possible rankings of all items. For this, we present LwRec, a novel listwise ranking-oriented matrix factorization algorithm. It predicts the missing values in the user-ranking probability matrix such that each row of the final predicted matrix forms a probability distribution similar to the original one. Extensive offline experiments on two benchmark datasets against several state-of-the-art baselines demonstrate the effectiveness of our proposal.


Introduction
Conventional recommendation algorithms like collaborative filtering follow a rating-oriented paradigm. They generally learn a recommendation model from users' observed historical ratings, which they then use to predict users' ratings on unrated items. Nowadays, ranking-oriented recommender systems are receiving increasing attention from both the academic community and industry. Many studies reveal that the ultimate goal of recommendation is to predict correct rankings of unrated items, and that predicting accurate rankings is more important than predicting accurate rating scores [1,2]. Accurate prediction of ratings does not necessarily imply improvement in the ranking results.
To elaborate, let us consider items {a, b, c} with correct ratings R: {5, 4, 3} and two rating predictions P1: {3, 4, 5} and P2: {3, 2, 1}. A rating-oriented approach would prefer P1 over P2, since the predicted ratings in P1 are more accurate, being closer to the ratings in R. However, the order in P1 (a < b < c) is completely opposite to the desired order (a > b > c). In contrast, a ranking-oriented approach would prefer P2, as it predicts the correct ranking, i.e. (a > b > c). This would generate the desirable result (the correct order of items), in spite of having lower accuracy in the predicted ratings.
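This trade-off can be checked with a small script. The error and rank-agreement measures below (RMSE and the fraction of concordantly ordered item pairs) are standard illustrative choices, not taken from the paper.

```python
# Illustrative check of the example above: P1 has lower rating error than P2,
# yet only P2 recovers the correct item order.

def rmse(pred, true):
    """Root mean squared error between predicted and true ratings."""
    return (sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)) ** 0.5

def concordant_pairs(pred, true):
    """Fraction of item pairs ordered the same way in pred and true."""
    n, agree, total = len(true), 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
            if (pred[i] - pred[j]) * (true[i] - true[j]) > 0:
                agree += 1
    return agree / total

R  = [5, 4, 3]   # correct ratings of items a, b, c
P1 = [3, 4, 5]   # accurate scores, reversed order
P2 = [3, 2, 1]   # inaccurate scores, correct order

print(rmse(P1, R), concordant_pairs(P1, R))  # lower error, no pair agreement
print(rmse(P2, R), concordant_pairs(P2, R))  # higher error, full agreement
```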
Given the above argument, some pioneering ranking-oriented recommendation algorithms have been proposed. Due to the effectiveness of matrix factorization (MF) algorithms in rating-oriented recommender systems, a few ranking-oriented MF algorithms have been presented, reporting state-of-the-art results. However, most of these efforts predict users' item rankings based on rating scores, failing to explicitly present users' preference rankings on items and thus possibly resulting in some accuracy loss.
Therefore, we define a novel listwise user-ranking probability prediction problem for recommendations. We utilize the listwise user-ranking probability matrix [3] to explicitly characterize users' preferences on items. Given a set of rating scores on items, each ranking of the items is possible, where "correct" rankings (those placing higher-scored items at top positions) receive greater probabilities. Thus, for each user, the probabilities of all possible rankings can be organized into a user-ranking probability matrix, where each element represents the probability that a certain user holds a certain ranking of the items. Each row of the user-ranking probability matrix therefore consists of a user's probabilities for different item rankings, forming a distribution. Starting from the initial probabilities of users' possible rankings of their rated items, the listwise user-ranking probability prediction problem aims to predict the probabilities of users' possible rankings of all items. Meanwhile, the predictions should satisfy the requirements of probability distributions: each element in the probability matrix should be between 0 and 1, and each row should sum to 1.
Given a collection of items, there can be a very large number of possible rankings (i.e. n! rankings for n items), resulting in an extremely large ranking probability matrix and heavy computations during training. In this study, we only consider the top-k ranked items in the rankings, so the size of the matrix can be shrunk significantly; in particular, the size of the ranking probability matrix equals that of the user-item rating matrix when k = 1. Based on this matrix, we then present LwRec, a novel listwise ranking-oriented MF algorithm, which, for each user, minimizes the difference between the initial distribution over the known rankings and the final predicted distribution over all items. Considering the non-negativity of each element, we adapt non-negative MF to implement LwRec. Our experimental results on benchmark datasets demonstrate significant performance gains over state-of-the-art recommender algorithms.
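The shrinkage from n! rankings to n!/(n−k)! top-k ranking sets can be seen numerically; the values of n and k below are arbitrary examples.

```python
# Size of the ranking space with and without the top-k restriction.
from math import factorial, perm

n = 10                       # number of items (illustrative)
full = factorial(n)          # all n! rankings
for k in (1, 2, 3):
    p = perm(n, k)           # n! / (n - k)! distinct top-k ranking sets
    print(f"k={k}: {p} top-k sets vs {full} full rankings")
# With k = 1, p equals n, so the ranking probability matrix has the
# same size as the user-item rating matrix.
```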
To summarize, our contributions are as follows. (1) We define a novel listwise user-ranking probability prediction problem for recommendations. (2) We present an effective algorithm to solve the problem based on non-negative MF. (3) We achieve significant performance gains against state-of-the-art recommendation algorithms on benchmark datasets.
The rest of the paper is organized as follows. Section 2 briefly presents the related work and Section 3 describes the problem formulation. Then, we explain the LwRec approach in Section 4, followed by the experimental setup in Section 5. Finally, Section 6 presents the results and Section 7 concludes the paper.

Related Work
This section presents related work on collaborative filtering (CF) recommendation algorithms, which use only the ratings given by users to items and do not need domain knowledge. They are mainly of two types: rating-oriented and ranking-oriented. While rating-oriented algorithms predict unknown item ratings for each user, ranking-oriented algorithms predict item rankings. Both can be further categorized as memory-based or model-based.
Rating-Oriented Algorithms: Memory-based rating-oriented algorithms are either user-based CF [4], which utilizes similarities between users on the basis of available ratings, or item-based CF [5], which utilizes similarities between items. Various advanced versions of this approach have been introduced. For example, SLIM [6] directly learns from the data a sparse matrix of aggregation coefficients that are analogous to the traditional item-item similarities. FISM [7] learns the item-item similarity matrix as a product of two low-dimensional latent factor matrices. Model-based rating-oriented algorithms aim to predict ratings by learning a model from observed ratings. The traditional model of this type is matrix factorization (MF) [8], which uses dimensionality reduction to decrease the distance between the predicted and observed rating matrices. Some models based on matrix factorization are Probabilistic MF [9], Non-negative MF [10], Factorization Machines [11], Hierarchical Poisson MF [12] and LLORMA [13].
Ranking-Oriented Algorithms: EigenRank [14] is a well-known ranking-oriented memory-based CF algorithm that follows the pairwise approach. It employs a greedy aggregation method to aggregate predicted pairwise preferences of items into a total ranking. VSRank [15] represents users' pairwise preferences for items using a vector space model and utilizes the relative importance of each pairwise preference. Moreover, various model-based ranking-oriented CF algorithms have been introduced that optimize a ranking-oriented objective function. Notable algorithms of this type are CLiMF [16], CoFiRank [17], ListCF [3] and GBPR [18].

User-Ranking Probability Matrix
Considering m users and n items, for each user there are n! possible rankings of the items. Given a set of rating scores on items, each ranking is possible, where "correct" rankings (those placing higher-scored items at top positions) receive greater probabilities. The probability of an item ranking can be derived with the Plackett-Luce model [19], a permutation probability model widely used in various domains (each permutation is a ranking). Each ranking ρ can be represented as an ordered list (ρ_1, ρ_2, ..., ρ_n), where ρ_i is the item at the i-th position and each item occupies a unique position. The probability of the ranking ρ can be calculated as

$$P(\rho) = \prod_{i=1}^{n} \frac{\gamma(r_{\rho_i})}{\sum_{j=i}^{n} \gamma(r_{\rho_j})},$$

where r_{\rho_i} is the rating of item ρ_i and γ(r) = e^r. Since there are n! rankings of the items, which is a large number even for small n, direct computation is impractical. Hence, we employ the same approach as Huang et al. [3], which uses an efficient alternative introduced by Cao et al. [20]: focusing only on the top-k items of each ranking yields n!/(n−k)! distinct top-k sets. The probability of the set of rankings ρ_S whose top-k items are exactly S = (i_1, i_2, ..., i_k) can be calculated as

$$P(\rho_S) = \prod_{t=1}^{k} \frac{\gamma(r_{i_t})}{\sum_{l=t}^{n} \gamma(r_{i_l})}.$$

With m users and p = n!/(n−k)! top-k ranking sets per user, we construct the user-ranking probability matrix Θ_{m×p}, in which each row corresponds to a particular user and contains the probabilities of the p rankings. To clarify, if Prob_{u_i}(S_j) denotes the probability of ranking S_j for user u_i (where 1 ≤ j ≤ p and 1 ≤ i ≤ m), then Θ_{i,j} = Prob_{u_i}(S_j) if the ratings of all items in S_j are known, and Θ_{i,j} is unknown otherwise. In particular, when k = 1, i.e. when we consider only the top-1 item of each ranking, p = n!/(n−1)! = n, so the size of Θ equals that of the user-item rating matrix.
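A minimal sketch of the top-k Plackett-Luce probability described above, with γ(r) = e^r; the ratings and the value of k are illustrative.

```python
# Top-k Plackett-Luce probability for one user's rated items.
from itertools import permutations
from math import exp

def topk_prob(ratings, topk):
    """P(ρ_S) for the set of rankings whose top-k items are exactly `topk`
    (an ordered tuple of item indices into `ratings`)."""
    remaining = list(range(len(ratings)))
    prob = 1.0
    for item in topk:
        denom = sum(exp(ratings[j]) for j in remaining)  # γ over unplaced items
        prob *= exp(ratings[item]) / denom
        remaining.remove(item)
    return prob

ratings = [5.0, 4.0, 3.0]   # one user's ratings for items 0, 1, 2
k = 1
probs = {S: topk_prob(ratings, S) for S in permutations(range(3), k)}
print(probs)                # higher-rated items receive larger probability
print(sum(probs.values()))  # a row of Θ forms a distribution (sums to 1)
```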

Objective and Constraints
Given the matrix of known top-k probabilities Θ_{m×p}, where p = n!/(n−k)!, we aim to predict the unknown probabilities, which in turn can be used to generate recommendations. This can be achieved by using a listwise loss function and optimizing it through matrix factorization. For this, we define the following objective and two related constraints.

Objective: Using two matrices U_{m×z} and G_{z×p} whose product UG constructs the predicted probability matrix, employ a listwise loss function and matrix factorization to minimize the distance between Θ and UG.

C1: Values in UG should lie in the range 0 to 1 (as they are probabilities), i.e. 0 ≤ (UG)_{ij} ≤ 1, ∀i = 1...m and ∀j = 1...p.

C2: Each row of UG should sum to 1 (as a row contains the probabilities of the rankings for a particular user), i.e. ∑_{j=1}^{p} (UG)_{ij} = 1, ∀i = 1...m.
Definition 1 (Listwise User-Ranking Probability Prediction). Given a user-ranking probability matrix Θ, where each observed element Θ_{i,j} indicates a certain user's probability for a certain (top-k) preference ranking of her rated items, and each row of Θ forms a probability distribution, the listwise user-ranking probability prediction problem aims to predict each user's probability for her top-k preference rankings over all items, such that each row of UG also forms a probability distribution after prediction, and each user's two distributions, observed and predicted, are as similar as possible. Formally,

$$\min_{U, G} \sum_{i=1}^{m} \mathrm{diff}(\Theta_i, (UG)_i), \quad (4)$$

where diff(Θ_i, (UG)_i) is the difference between the two distributions Θ_i and (UG)_i, i.e. the i-th row of the user-ranking probability matrix before and after prediction.
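As a rough numerical sketch of this prediction problem, the following assumes KL divergence as the diff measure (the choice made in the next section) and uses standard Lee-Seung multiplicative NMF updates restricted to observed entries. Unlike the paper's Lagrange-penalty formulation, constraints C1 and C2 are enforced here by a simple post-hoc row normalization; all matrices and hyperparameters are illustrative.

```python
# Toy listwise probability prediction via masked KL-NMF (sketch, not LwRec).
import random

def dot_entry(U, G, i, j):
    """(UG)_{ij} for row-major nested lists."""
    return sum(U[i][a] * G[a][j] for a in range(len(G)))

def fit(theta, mask, z=2, iters=200, seed=0):
    rng = random.Random(seed)
    m, p = len(theta), len(theta[0])
    U = [[rng.random() + 0.1 for _ in range(z)] for _ in range(m)]
    G = [[rng.random() + 0.1 for _ in range(p)] for _ in range(z)]
    for _ in range(iters):
        # Lee-Seung multiplicative update for U, observed entries only
        for i in range(m):
            for a in range(z):
                num = sum(G[a][j] * theta[i][j] / dot_entry(U, G, i, j)
                          for j in range(p) if mask[i][j])
                den = sum(G[a][j] for j in range(p) if mask[i][j])
                if den > 0:
                    U[i][a] *= num / den
        # symmetric update for G
        for a in range(z):
            for j in range(p):
                num = sum(U[i][a] * theta[i][j] / dot_entry(U, G, i, j)
                          for i in range(m) if mask[i][j])
                den = sum(U[i][a] for i in range(m) if mask[i][j])
                if den > 0:
                    G[a][j] *= num / den
    # post-hoc projection onto C1/C2: each predicted row is a distribution
    pred = [[dot_entry(U, G, i, j) for j in range(p)] for i in range(m)]
    return [[v / sum(row) for v in row] for row in pred]

# toy Θ with k = 1 (rows are distributions over 3 "items");
# mask marks observed entries (1) vs unknown (0)
theta = [[0.7, 0.2, 0.1],
         [0.1, 0.3, 0.6],
         [0.6, 0.3, 0.0]]
mask  = [[1, 1, 1], [1, 1, 1], [1, 1, 0]]
pred = fit(theta, mask)
print(pred)   # non-negative rows, each summing to 1
```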

Prediction Method
In this section, we present LwRec to solve our listwise user-ranking probability prediction problem. We use the Kullback-Leibler divergence [21], a commonly used measure of the difference between probability distributions, to compute diff(Θ_i, (UG)_i). In LwRec, we utilize non-negative matrix factorization (MF) [10] to implement our proposed algorithm, which generates non-negative elements for U and G; thus the elements of the predicted user-ranking probability matrix UG are all non-negative. In order to satisfy constraint C2, we introduce into the objective function a collection of Lagrange penalty terms for ∑_{j=1}^{p} (UG)_{ij} = 1, where i = 1...m. In standard Lagrange methods, the coefficients of the penalty terms can be either positive or negative, but in non-negative MF all parameters have to be non-negative. We therefore introduce two non-negative vectors α and β and use (α_i − β_i), which can take either sign, as the coefficient of the i-th penalty term in our loss function. Moreover, satisfying constraint C2 together with the non-negativity of UG also satisfies constraint C1 (i.e. values in UG lie in the range 0 to 1). Our loss function can be formulated as

$$L(U, G, \alpha, \beta) = \sum_{i=1}^{m} \sum_{j=1}^{p} \Theta_{ij} \log \frac{\Theta_{ij}}{(UG)_{ij}} + \sum_{i=1}^{m} (\alpha_i - \beta_i) \Big( \sum_{j=1}^{p} (UG)_{ij} - 1 \Big) + \lambda_1 \|U\|^2 + \lambda_2 \|G\|^2. \quad (5)$$

In Equation 5, the first term represents the main optimization objective from Equation 4, the divergence between UG and the observed probability matrix Θ. The second term is the weighted cumulative Lagrange penalty for constraint C2. The last two terms are the l2-norms of U and G to avoid overfitting, where λ_1 and λ_2 are the respective coefficients. Expanding the log in L(U, G, α, β) and using the fact that each row of Θ sums to 1, the function can be reformulated as

$$L(U, G, \alpha, \beta) = -\sum_{i=1}^{m} \sum_{j=1}^{p} \Theta_{ij} \log (UG)_{ij} + \sum_{i=1}^{m} (\alpha_i - \beta_i) \Big( \sum_{j=1}^{p} (UG)_{ij} - 1 \Big) + \lambda_1 \|U\|^2 + \lambda_2 \|G\|^2 + C, \quad (6)$$

where C collects terms independent of U and G. To minimize the loss function using gradient descent, we compute its gradients with respect to the variables U, G, α and β and derive the following updates:

$$U \leftarrow U - \eta_u \frac{\partial L}{\partial U}, \quad G \leftarrow G - \eta_g \frac{\partial L}{\partial G}, \quad \alpha \leftarrow \alpha - \eta_\alpha \frac{\partial L}{\partial \alpha}, \quad \beta \leftarrow \beta - \eta_\beta \frac{\partial L}{\partial \beta}, \quad (7)$$

where η_u, η_g, η_α and η_β are the step sizes. Now, following non-negative matrix factorization [10], we write each gradient as the difference of two non-negative parts, ∂L/∂θ = ∇⁺ − ∇⁻, and choose the step size η_θ = θ/∇⁺ elementwise. Substituting these step sizes into the update formulas in Equation 7 turns each additive update into a multiplicative rule θ ← θ · (∇⁻/∇⁺), which preserves the non-negativity of U, G, α and β.

Experimental Setup

Baselines. We compare LwRec with the following baselines.

1. CF: CF [22] calculates the similarity between users and ranks the items according to the predicted ratings for each user.

2. MF: The user matrix U and item matrix I are optimized in MF [8] to minimize the difference between their product UI^T and the rating matrix R; UI^T regenerates the rating matrix to predict unknown ratings.

3. EigenRank: EigenRank [14] is a pairwise ranking-oriented algorithm that employs a greedy aggregation method to aggregate the predicted pairwise preferences of items into a total ranking.

4. ListRankMF: ListRankMF [23] minimizes a loss function representing the uncertainty between the training lists and the output lists produced by an MF ranking model.

5. FISM: Factored Item Similarity Models (FISM) [7] learn the item-item similarity matrix as a product of two low-dimensional latent factor matrices. FISMrmse computes the loss with a squared error loss function, while FISMauc uses a ranking-error-based loss function.

6. LLORMA: Local Low-Rank Matrix Approximation (LLORMA) [13] approximates the observed rating matrix as a weighted sum of local low-rank matrices.

7. ListCF: ListCF [3], a ranking-oriented CF algorithm, predicts the item order for a user based on similar users' probability distributions over item permutations.

Evaluation Metrics.
We use the standard ranking accuracy metric normalized discounted cumulative gain (NDCG@1-10) [24], which can handle multiple levels of relevance, to evaluate the item rankings generated by LwRec and the baselines. The statistical significance of observed differences between the performance of two runs is tested using a two-tailed paired t-test; in the result tables, strong significance corresponds to α = 0.01 and weak significance to α = 0.05.
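For reference, a minimal sketch of NDCG@k on graded relevance; the exponential gain and log2 discount follow the common formulation of the metric, and the example relevance lists are made up.

```python
# NDCG@k: DCG of the system's ranking normalized by the ideal DCG.
from math import log2

def dcg_at_k(rels, k):
    """Discounted cumulative gain over the first k relevance grades."""
    return sum((2 ** r - 1) / log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(ranked_rels, k):
    ideal = dcg_at_k(sorted(ranked_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal if ideal > 0 else 0.0

# relevance grades (e.g. ratings) of items in the order a system ranked them
print(ndcg_at_k([3, 5, 4, 1], 3))   # imperfect order, NDCG < 1
print(ndcg_at_k([5, 4, 3, 1], 3))   # ideal order, NDCG = 1.0
```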

Results
In Table 1, we can see that LwRec outperforms the comparison partners on all metrics (NDCG@1 to 10) for MovieLens-100K as well as MovieLens-1M. ListCF is the second best, followed by LLORMA and FISMrmse, for both datasets. For MovieLens-100K, EigenRank and ListRankMF have comparable performance, followed by MF and FISMauc. For MovieLens-1M, ListRankMF performs better than FISMauc, followed by MF.
We also calculate the statistical significance of LwRec against ListCF, our best-performing comparison algorithm. The results for MovieLens-100K show weak to strong statistical significance for most metrics, and for MovieLens-1M the results have strong statistical significance in almost all cases.

Conclusion
In this paper, we defined a novel listwise user-ranking probability prediction problem. We then described LwRec, a listwise recommendation algorithm that solves the problem by minimizing a listwise loss function using non-negative matrix factorization. Our experimental results on benchmark datasets show significant performance gains of LwRec over state-of-the-art recommender algorithms.
In this study, we have experimented with top-k item rankings for k = 1. In the future, we would like to explore the effect of using higher values of k. Moreover, we have fixed the latent dimension of the matrices U and G to 10; it would be interesting to see how the results change when this dimension is varied.