Recommendation Systems Blog Post
Recommendation systems handle the problem of information overload that users normally encounter by providing them with personalized, exclusive content and service recommendations. Recently, various approaches for building recommendation systems have been developed, which can utilize either collaborative filtering, content-based filtering or hybrid filtering.
Collaborative Filtering
Collaborative filtering is the most mature and most commonly implemented of these techniques. It recommends items by identifying other users with similar taste and using their opinions to recommend items to the active user.
The collaborative filtering technique works by building a database (a user-item matrix) of users' preferences for items. It then matches users with similar interests and preferences by calculating similarities between their profiles [43]. Such users form a group called a neighborhood. A user receives recommendations for items that he has not rated before but that users in his neighborhood have already rated positively.
The output of collaborative filtering (CF) can be either a prediction or a recommendation. A prediction is a numerical value, Rij, expressing the predicted score of item j for user i, while a recommendation is a list of the top N items that the user is expected to like the most.
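To make the difference between a prediction and a recommendation concrete, here is a minimal sketch in Python. The rating matrix, the function names and the choice of weighting every other user by cosine similarity (instead of picking a fixed-size neighborhood) are all assumptions made for illustration, not something prescribed by the text above.

```python
import numpy as np

# Hypothetical user-item matrix (rows: users, columns: items); 0 = not rated.
R = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 3, 1, 0],
    [1, 2, 0, 5, 4],
    [5, 5, 4, 0, 1],
], dtype=float)


def cosine(u, v):
    """Cosine similarity restricted to the items both users have rated."""
    both = (u > 0) & (v > 0)
    if not both.any():
        return 0.0
    return float(u[both] @ v[both] / (np.linalg.norm(u[both]) * np.linalg.norm(v[both])))


def predict(R, i, j):
    """Prediction R_ij: similarity-weighted mean of other users' ratings of item j."""
    sims = np.array([cosine(R[i], R[k]) if k != i else 0.0 for k in range(len(R))])
    weights = sims * (R[:, j] > 0)  # only users who actually rated item j count
    if weights.sum() == 0:
        return 0.0
    return float(weights @ R[:, j] / weights.sum())


def top_n(R, i, n=2):
    """Recommendation: the n items user i has not rated, ordered by predicted score."""
    unrated = np.flatnonzero(R[i] == 0)
    ranked = sorted(unrated, key=lambda j: predict(R, i, j), reverse=True)
    return [int(j) for j in ranked[:n]]


print(predict(R, 0, 2))  # a single predicted score for user 0 and item 2
print(top_n(R, 0))       # a ranked list of item indices for user 0
```

The first call returns one number (the prediction), the second a ranked list (the recommendation). A production system would typically also subtract each user's mean rating and restrict the sum to the k most similar users.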
Content-Based Filtering
Content-based techniques match content resources to user characteristics. They normally base their predictions on the user's own information and, unlike collaborative techniques, ignore contributions from other users.
Despite the success of these two filtering techniques, several limitations have been identified. Some of the problems associated with content-based filtering techniques are limited content analysis, overspecialization and sparsity of data [12]. Collaborative approaches, in turn, exhibit “cold-start”, sparsity and scalability problems. These problems usually reduce the quality of recommendations. Hybrid filtering, described below, was proposed to mitigate some of them.
Content-based filtering (CBF) techniques make recommendations by learning an underlying model with either statistical analysis or machine learning. They do not need the profiles of other users, since those do not influence the recommendation. Also, if the user profile changes, a CBF technique can still adjust its recommendations within a very short time. The major disadvantage is the dependence on item metadata: the technique needs in-depth knowledge and a rich description of item features, as well as a well-organized user profile, before it can make recommendations.
Hybrid Filtering
Hybrid filtering combines two or more filtering techniques in different ways in order to increase the accuracy and performance of recommender systems, harnessing their strengths while leveling out their corresponding weaknesses.
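One simple way to combine two techniques is a weighted blend of their scores. The sketch below is only an illustration: the function name hybrid_score, the weight alpha and the fallback rule for users without collaborative data are assumptions, and other hybrid designs (switching, cascading, feature combination) exist.

```python
def hybrid_score(cf_score, cb_score, alpha=0.7):
    """Blend a collaborative score and a content-based score.

    cf_score may be None when no collaborative estimate exists (for example
    for a brand-new user); in that case the content-based score is used alone.
    alpha is a hypothetical tuning weight between 0 and 1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    if cf_score is None:
        return cb_score
    return alpha * cf_score + (1.0 - alpha) * cb_score


print(hybrid_score(4.0, 2.0, alpha=0.5))  # equal weight on both techniques
print(hybrid_score(None, 2.0))            # cold start: content-based only
```

The fallback shows the idea of leveling out weaknesses: where the collaborative part has nothing to say, the content-based part still produces a score.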
Developing a recommendation system involves the following processes:
Information collection phase
Explicit feedback
Implicit feedback
Learning phase
Recommendation phase
Examples
Collaborative Filtering
Let’s say you want to build a mini Netflix platform with six different movies and five users. The users have already watched and rated some of these films on a scale from one to five (five if they loved it; one if they hated it). You can now use a collaborative filtering system to decide which films to recommend. You write down the ratings in a table with the columns corresponding to the films and the rows to the users. In mathematics, this grid-like structure with numerical entries is called a matrix.
This matrix presents the ratings of five users for a hypothetical streaming platform with six films.
Because not every person has seen all six films, many fields are empty. This is where recommendation algorithms struggle most: they have to draw the most accurate conclusions possible from sparse data. For example, to give User 1 a recommendation, you could try to pick another user with similar taste. But how do you determine this similarity? To define how far apart or close together the preferences of two people are, you can fall back on linear algebra. For your mini Netflix system, you could assign each user a list of numbers holding their ratings (called a vector). That gives you five vectors, one for each user, in a six-dimensional space (with one dimension for each film). One way to measure the similarity of two vectors is the angle between them: the smaller the angle, the more alike the tastes. The cosine of this angle is called cosine similarity.
To mathematically determine the similarity in taste between two people, each user can be treated as a vector, and the similarity can be measured as the angle between the two vectors. In the example above, User 1 and User 2, as well as User 1 and User 3, can be compared because they rated some of the same films. User 1 and User 2 both rated Oppenheimer well. User 1 and User 3, on the other hand, came up with different results for Interstellar and Indiana Jones. To calculate the angle between two vectors, you take their scalar product and divide it by the product of the two vector lengths; the result is the cosine of the angle. Doing this for the above example shows that the angle between User 1 and User 2 is smaller than that between User 1 and User 3. In short, User 1 and User 2 seem to have more similar taste than User 1 and User 3. Because User 2 liked Barbie and User 1 hasn’t seen this film yet, you can suggest it to User 1.
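The calculation can be sketched in a few lines of Python. The ratings below are invented for illustration and are not the values from the matrix pictured above; they only mirror what the text says (User 1 and User 2 both rate Oppenheimer highly, User 1 and User 3 disagree on Interstellar and Indiana Jones, User 2 likes Barbie). The vectors cover only the four films named so far, and treating an unrated film as 0 is a simplifying assumption.

```python
import math

FILMS = ["Oppenheimer", "Interstellar", "Indiana Jones", "Barbie"]

# Invented ratings on the 1-5 scale; 0 marks a film the user has not rated.
ratings = {
    "User 1": [5, 4, 2, 0],
    "User 2": [5, 0, 0, 4],
    "User 3": [0, 1, 5, 0],
}


def cosine_similarity(a, b):
    """Scalar product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)


sim_12 = cosine_similarity(ratings["User 1"], ratings["User 2"])
sim_13 = cosine_similarity(ratings["User 1"], ratings["User 3"])

# A larger cosine means a smaller angle, i.e. more similar taste.
angle_12 = math.degrees(math.acos(sim_12))
angle_13 = math.degrees(math.acos(sim_13))

# Suggest films the closest user rated 4 or higher that User 1 has not rated.
nearest = "User 2" if sim_12 > sim_13 else "User 3"
suggestions = [
    film
    for film, mine, theirs in zip(FILMS, ratings["User 1"], ratings[nearest])
    if mine == 0 and theirs >= 4
]

print(f"angle(User 1, User 2) = {angle_12:.1f} degrees")
print(f"angle(User 1, User 3) = {angle_13:.1f} degrees")
print("Suggested to User 1:", suggestions)
```

With these invented numbers, User 2 ends up closer to User 1 than User 3 does, so Barbie is the suggestion.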
The most serious disadvantage of collaborative filter systems is insufficient data. That’s why Netflix often asks you to rate content you have already seen when you register. But even that approach has its pitfalls: just because I liked Oppenheimer does not mean I like all historical films, as may be the case for another Oppenheimer fan. In addition, some platforms use other data—such as age, gender or online behavior—to filter for similar interests. For example, some providers track how long users look at certain content or the other websites people visit. This information results in an immense, ever changing matrix that grows with each new user or product. For optimal results, you have to constantly reevaluate the matrix. This task pushes the computing capacity limits even of large companies, such as Netflix and Amazon.
To spot patterns in this massive amount of data, companies use common methods from linear algebra, such as singular value decomposition or principal component analysis. The idea is to express the matrix as a product of simpler matrices—similar to the prime factorization of a number. The simpler matrices also contain information about user preferences, which is more easily accessible. With this approach, one can approximate nonessential information corresponding to small numerical values in the matrices by zero. Multiplying the approximated simple matrices back together yields a new matrix that is similar to the original but has a much simpler shape. A computer can better process it to issue recommendations.
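As a rough sketch of this idea, the snippet below factors a small rating matrix with NumPy's singular value decomposition, sets the smallest singular values to zero and multiplies the factors back together. The matrix is invented and assumed to be completely filled in, and the number of retained values k = 2 is an arbitrary choice; real systems have to deal with missing entries and far larger matrices.

```python
import numpy as np

# Invented, fully filled-in rating matrix: 5 users (rows) x 6 films (columns).
R = np.array([
    [5, 4, 1, 1, 5, 2],
    [4, 5, 1, 2, 4, 1],
    [1, 1, 5, 4, 1, 5],
    [2, 1, 4, 5, 2, 4],
    [5, 5, 2, 1, 4, 1],
], dtype=float)

# Factor R into simpler matrices: R = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep the k largest singular values and approximate the rest by zero.
k = 2
s_truncated = s.copy()
s_truncated[k:] = 0.0

# Multiply the approximated factors back together.
R_approx = U @ np.diag(s_truncated) @ Vt

print("singular values:", np.round(s, 2))
print("approximation error:", round(float(np.linalg.norm(R - R_approx)), 3))
```

The approximated matrix has the same shape as the original but rank at most k, so it can be stored and processed as two thin factors instead of one large table.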
Content-Based Filtering
Instead of just linking users to each other, you can also link products with other products. Amazon introduced such a system in 2003. To build a mini Netflix platform on this principle, you would reverse your table: the rows would correspond to the films and the columns to the users. To fill in a missing rating, for example “How will User 1 like Barbie?”, look for similar films. For instance, if Oppenheimer and Dune were rated similarly to Barbie by other users, these films could be considered similar.
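Here is a minimal sketch of that reversed table in Python. The ratings are invented, and estimating the missing rating as a similarity-weighted average of User 1's other ratings is one common choice assumed here for illustration, not a description of Amazon's actual system.

```python
import numpy as np

FILMS = ["Oppenheimer", "Dune", "Barbie", "Indiana Jones"]

# Reversed table: rows are films, columns are Users 1-5; 0 = not rated.
# All values are invented for illustration.
R = np.array([
    [5, 5, 4, 0, 2],  # Oppenheimer
    [4, 0, 5, 4, 1],  # Dune
    [0, 4, 5, 4, 1],  # Barbie (User 1 has not rated it)
    [2, 0, 1, 2, 5],  # Indiana Jones
], dtype=float)


def cosine(a, b):
    """Cosine similarity of two film rows; 0.0 if either row is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def predict_item_based(R, film, user):
    """Estimate a missing rating from the user's ratings of similar films."""
    others = np.arange(R.shape[1]) != user  # compare films via the other users
    weighted_sum = total_similarity = 0.0
    for other_film in range(R.shape[0]):
        rating = R[other_film, user]
        if other_film == film or rating == 0:
            continue
        sim = cosine(R[film, others], R[other_film, others])
        weighted_sum += sim * rating
        total_similarity += sim
    return weighted_sum / total_similarity if total_similarity else 0.0


barbie = FILMS.index("Barbie")
prediction = predict_item_based(R, barbie, user=0)
print(f"Estimated rating of Barbie by User 1: {prediction:.2f}")
```

Because Oppenheimer and Dune were rated much like Barbie by the other users in this toy table, User 1's high ratings for those two films pull the estimate up.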
Amazon has found success with this system and continued to develop it. The relationship between products is central: running shoes are often associated with sportswear and water bottles, for instance. Combining this with other approaches leads to even more powerful predictions on the website.
While the collaborative approach relies on a lot of user data, content-based recommendations focus on the products being recommended. One can categorize movies by genre, directors, actors, length, and so on. This step is partially automated. By comparing a user’s preferences for relevant categories, recommendations can quickly be made. If a person sees the science fiction film Interstellar and then watches Barbie, co-starring actor Ryan Gosling, a content-based system might recommend Blade Runner 2049, a science-fiction film with Gosling. You can also use the cosine similarity here to compare content you have already seen with other products. The advantage of this method is that users do not need to rate anything explicitly. It is more important to characterize the products properly, a task that algorithms can take over.
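The Gosling example can be sketched as follows. The three features and the 0/1 encoding are a deliberately tiny, hand-made assumption; a real catalog would carry many more attributes, usually extracted automatically. The user profile is simply the sum of the feature vectors of the films already watched.

```python
import math

# Hypothetical feature order for every vector below.
FEATURES = ["science fiction", "Ryan Gosling", "historical drama"]

catalog = {
    "Interstellar":      [1, 0, 0],
    "Barbie":            [0, 1, 0],
    "Blade Runner 2049": [1, 1, 0],
    "Dune":              [1, 0, 0],
    "Oppenheimer":       [0, 0, 1],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(watched, catalog):
    """Rank unwatched films by similarity to the summed profile of watched films."""
    profile = [sum(column) for column in zip(*(catalog[title] for title in watched))]
    scores = {
        title: cosine_similarity(profile, features)
        for title, features in catalog.items()
        if title not in watched
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = recommend(["Interstellar", "Barbie"], catalog)
for title, score in ranked:
    print(f"{title}: {score:.2f}")
```

No ratings appear anywhere in this sketch: the ranking comes purely from what the user watched and how the films are described.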
Summary
Most recommendation algorithms now use hybrid approaches composed of collaborative and content-based systems. Netflix makes recommendations based on user behavior and similarity to other users, but it also takes into account preferences in terms of genre, actors, year of release and other attributes. Additionally, the platform evaluates what time you prefer to use it, how long you like to do so and which device you employ.