5 Models for Engineering Personalized Digital Experiences (Part 2 – Spotify and Pinterest)
This article is the second of a three-part series on algorithms and predictive models for creating personalized digital experiences. Part 1 includes the introduction and case studies from Amazon and Netflix, while Part 3 provides a case study from Facebook.
In the first part of this series, we examined the algorithms and predictive models Amazon and Netflix use to create personalized homepages with dynamic content.
Now in Part 2, we’ll tackle how Spotify employs algorithms to craft the perfect playlists and how Pinterest uses machine learning for its home feed, mainly in the words of the people responsible for designing these systems.
Ever wonder how Spotify crafts those Discover Weekly playlists that feel handcrafted for your musical tastes? Well, it’s actually quite complicated.
Spotify’s approach mainly includes collaborative filtering (similar to Amazon and Netflix), convolutional neural networks, natural language processing (to both scan music blogs to build micro-genres and analyze the contents of playlists), and outlier detection.
Let’s start from the beginning, though.
Spotify maintains giant datasets of users, their listening habits, and clusters of micro-genres that define the types of songs listened to.
On first pass, a user’s behavior populates their individual user profile that is defined by the micro-genres of music they listen to and the proportion of each genre.
When populating a user profile, Spotify employs outlier detection to determine whether the songs listened to are part of a normal pattern of behavior. Through outlier detection, Spotify can tell when songs are aberrations (for example, when someone else plays DJ with your phone at a party) and makes sure they don’t carry weight in your user profile.
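To make the idea concrete, here is a minimal sketch of one way aberrant plays could be kept out of a taste profile: genres that account for only a tiny share of someone's listening are dropped before the proportions are computed. The function name, the 5% threshold, and the genre labels are all assumptions for illustration, not Spotify's actual method.

```python
from collections import Counter

def build_taste_profile(plays, min_share=0.05):
    """Return genre proportions, ignoring genres below min_share of all plays.

    `plays` is a list of genre labels, one per listened track (hypothetical input).
    """
    counts = Counter(plays)
    total = sum(counts.values())
    # Keep only genres that make up a meaningful share of listening.
    kept = {g: c for g, c in counts.items() if c / total >= min_share}
    kept_total = sum(kept.values())
    # Renormalize so the remaining proportions sum to 1.
    return {g: c / kept_total for g, c in kept.items()}

# Two plays of an unusual genre at a party should not shape the profile.
plays = ["indie folk"] * 60 + ["chillwave"] * 38 + ["death metal"] * 2
profile = build_taste_profile(plays)
```

A production system would likely look at more than raw share (time of day, device, session context), but the principle is the same: decide what counts as normal before updating the profile.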
To determine genres that comprise a user’s profile, Spotify uses natural language processing to scan music news and blogs for how particular artists are mentioned and defined — with the commonalities used to connect artists and draw dotted lines between genres.
Once a user profile is populated, a collaborative filtering algorithm finds users that are similar to each other based on listening patterns. Users with similar tastes are used to develop recommendations for the playlist, as opposed to basing recommendations on the songs themselves. Think of this feature like Amazon’s “Customers Who Viewed This Item Also Viewed” recommendations and the other recommended song areas of Spotify.
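A toy version of user-based collaborative filtering might look like the sketch below. It measures how similar two listeners are with cosine similarity over play counts, then scores songs the target user has not heard by the similarity of the people who played them. The data, names, and scoring rule are assumptions chosen for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(target, listens, top_n=3):
    """Score unheard songs by the similarity of the users who played them."""
    scores = {}
    for user, songs in listens.items():
        if user == target:
            continue
        sim = cosine(listens[target], songs)
        if sim <= 0:
            continue  # users with no overlap contribute nothing
        for song, count in songs.items():
            if song not in listens[target]:
                scores[song] = scores.get(song, 0.0) + sim * count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical play counts per user.
listens = {
    "ana": {"song_a": 5, "song_b": 3},
    "ben": {"song_a": 4, "song_b": 2, "song_c": 6},
    "cy": {"song_d": 7},
}
recs = recommend("ana", listens)
```

Here "ben" shares two songs with "ana", so his other song is recommended, while "cy" has nothing in common and is ignored.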
Spotify adds to collaborative filtering with ensemble methods, a family of machine learning techniques that combine the outputs of several models, to further refine recommendations and build those weekly playlists. Given that there are multiple ways for people to access songs, ensemble methods help rank and prioritize song selections.
Now, what if a user does not have much listening behavior to process? This is a common problem for many algorithms and predictive models.
For Spotify, this problem surfaces in two ways: when a user is new, or when a song is obscure or new.
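One of the simplest ensembles is a weighted blend of scores from several recommenders. The sketch below assumes two hypothetical models ("collaborative" and "audio") and made-up weights; it is meant only to show the shape of the idea, not how Spotify combines its models.

```python
def blend_scores(model_scores, weights):
    """Combine per-model song scores into one ranking via a weighted sum."""
    combined = {}
    for model, scores in model_scores.items():
        for song, score in scores.items():
            combined[song] = combined.get(song, 0.0) + weights[model] * score
    # Highest combined score first.
    return sorted(combined, key=combined.get, reverse=True)

# Hypothetical scores from two separate models.
model_scores = {
    "collaborative": {"song_a": 0.9, "song_b": 0.4},
    "audio": {"song_b": 0.8, "song_c": 0.7},
}
weights = {"collaborative": 0.6, "audio": 0.4}
ranking = blend_scores(model_scores, weights)
```

Notice that "song_b" ends up first even though neither model ranked it highest on its own weighted contribution; agreement between models is what pushes it up.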
And to address these issues, Spotify uses convolutional neural networks to analyze the acoustics and structure of songs themselves and determine similarities.
Spotify also applies natural language processing (again) to songs to isolate similarities through Word2Vec. Word2Vec allows Spotify to take playlists and translate each song into a vector representation that can be used to match similar songs — in the same way natural language processing can find connections between synonyms without knowing the definition of every word.
Finally, to actually build a user’s Discover Weekly playlist, Spotify takes all of these data inputs and recommendations and cross-references them with other playlists across the channel.
A recent Quartz article describes how taste profiles and playlists are used to generate Discover Weekly, “It gives extra weight to the company’s own playlists and those with more followers. Then it attempts to fill in the blanks between your listening habits and those with similar tastes. In the simplest terms, if Spotify notices that two of your favorite songs tend to appear on playlists along with a third song you haven’t heard before, it will suggest the new song to you.”
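The "simplest terms" version in that quote can be sketched in a few lines. The function below counts how often an unheard song shares a playlist with at least two of a user's favorites. It treats the favorites list as everything the user has heard and ignores the extra weighting for Spotify's own playlists and follower counts, so it is a simplification built on assumed data.

```python
from collections import Counter

def suggest_from_playlists(favorites, playlists, top_n=1):
    """Rank unheard songs by how often they share a playlist with 2+ favorites."""
    favorites = set(favorites)
    counts = Counter()
    for playlist in playlists:
        tracks = set(playlist)
        # Only playlists containing at least two favorites count as evidence.
        if len(tracks & favorites) >= 2:
            counts.update(tracks - favorites)
    return [song for song, _ in counts.most_common(top_n)]

# Hypothetical playlists from other users.
playlists = [
    ["song_a", "song_b", "song_x"],
    ["song_a", "song_b", "song_x", "song_y"],
    ["song_a", "song_z"],
]
suggestions = suggest_from_playlists(["song_a", "song_b"], playlists)
```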
If you’re interested in digging deeper into Spotify’s Discover Weekly playlists, Spotify engineers also shared many of the technical details in a presentation earlier this year.
The home feed on Pinterest is centered around creating a personalized experience by surfacing content that each user would be most interested in seeing — a shift away from the original chronological home feed.
The feed displays content based on people and boards a user follows, interests, and recommendations — among other inputs that will be discussed below.
The core feature of the smart feed is its ability to present users with content they haven’t already seen, in an order that puts the most relevant content first.
To achieve an accurate predictive model, Pinterest breaks down the smart feed into four individual parts: pinnability, the smart feed worker, the smart feed content generator, and the smart feed service.
Pinterest engineers describe Pinnability as, “the collective name of the machine learning models developed to help Pinners find the best content in their home feed.”
Pinnability is the part of the technology that powers the smart feed: it produces the relevance score that estimates how likely a user is to interact with a Pin. Pins with high relevance scores are then prioritized and shown at the top of a user’s home feed.
To accurately predict how likely a Pinner is to interact with a Pin, Pinterest uses machine learning models including logistic regression, support vector machines, gradient boosted decision trees, and convolutional neural networks.
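Of the models listed, logistic regression is the easiest to show in a few lines. The sketch below turns a handful of features into a probability of interaction and sorts Pins by it. The feature names, weights, and bias are invented for illustration; a real model would learn its weights from training data.

```python
import math

def relevance_score(features, weights, bias=0.0):
    """Logistic-regression style probability that a Pinner interacts with a Pin."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical feature weights; a real model would learn these from data.
weights = {"pin_popularity": 1.2, "pin_freshness": 0.5, "past_interactions": 2.0}

candidate_pins = {
    "pin_1": {"pin_popularity": 0.9, "pin_freshness": 0.2, "past_interactions": 0.8},
    "pin_2": {"pin_popularity": 0.3, "pin_freshness": 0.9, "past_interactions": 0.1},
}
scores = {
    pin: relevance_score(features, weights, bias=-1.5)
    for pin, features in candidate_pins.items()
}
# Highest predicted relevance goes to the top of the feed.
feed_order = sorted(scores, key=scores.get, reverse=True)
```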
The three major components of pinnability are training instance generation, model generation, and home feed serving.
Training instance generation takes into consideration a broad spectrum of inputs to filter all available content down to what is most applicable to each user, based on the “positive actions” and “negative actions” taken on each Pin by that user.
Yunsong Guo, a software engineer on the Pinterest recommendations team, further defines the mechanics of pinnability in writing:
“Our unique data set contains abundant human-curated content, so that Pin, board and user dynamics provide informative features for accurate Pinnability prediction. These features fall into three general categories: Pin features, Pinner features and interaction features:
- Pin features capture the intrinsic quality of a Pin, such as historical popularity, Pin freshness and likelihood of spam. Visual features from Convolutional Neural Networks (CNN) are also included.
- Pinner features are about the particulars of a user, such as how active the Pinner is, gender and board status.
- Interaction features represent the Pinner’s past interaction with Pins of a similar type.”
Pinnability model generation serves as the next level of filtration for what populates a user’s feed. Candidate models are evaluated offline using area under the ROC curve (AUC), and the best performers move on to online A/B tests.
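For readers unfamiliar with the metric, area under the ROC curve can be read as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The sketch below computes it directly from that definition on made-up labels and scores; it is a small illustrative implementation, not Pinterest's evaluation code.

```python
def auc(labels, scores):
    """Area under the ROC curve: P(random positive is scored above random negative)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# 1 = the Pinner took a positive action, 0 = a negative action (hypothetical data).
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.8]
model_auc = auc(labels, scores)
```

A score of 0.5 means the model ranks no better than chance, while 1.0 means every positive example outranks every negative one.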
Guo also explains how pinnability populates the smart feed, “When a new Pin is repinned, smart feed worker sends a request to the Pinnability servers for the relevance scores between the repinned Pin and all the people following the repinning Pinner or board. It then inserts the Pins with the scores to the pool that contains all followed Pins. PFY Pins are inserted into the PFY pool with the Pinnability relevance score in a similar fashion.
When a user logs on or refreshes home feed, smart feed content generator materializes the new content from the various pools while respecting the relevance scores within each pool, and the smart feed service renders the Pinner’s home feed that prioritizes the relevance scores.”
The smart feed worker
Given the sorting mechanism and predictive nature of pinnability, the smart feed worker determines what should appear in a user’s feed and what should be stored in a queue for later.
The worker manages all new incoming content, sorting it by source and Pinnability score.
Chris Pinchak from Pinterest further explains the smart feed worker by writing, “Incoming Pins are currently obtained from three separate sources: repins made by followed users, related Pins, and Pins from followed interests. Each is scored by the worker and then inserted into a pool for that particular type of pin. Each pool is a priority queue sorted on score and belongs to a single user. Newly added Pins mix with those added before, allowing the highest quality Pins to be accessible over time at the front of the queue.”
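The per-user, per-source priority queues Pinchak describes map naturally onto a heap. Below is a minimal in-memory sketch using Python's `heapq`; the class name, method names, and source labels are hypothetical, and the storage Pinterest actually uses is not described here.

```python
import heapq
from collections import defaultdict

class PinPools:
    """One score-ordered pool per (user, source) pair, sketched with heapq."""

    def __init__(self):
        self._pools = defaultdict(list)

    def insert(self, user, source, pin_id, score):
        # heapq is a min-heap, so store the negated score to pop the best Pin first.
        heapq.heappush(self._pools[(user, source)], (-score, pin_id))

    def pop_best(self, user, source, n=1):
        """Remove and return up to n highest-scored Pins from one pool."""
        pool = self._pools[(user, source)]
        best = []
        while pool and len(best) < n:
            neg_score, pin_id = heapq.heappop(pool)
            best.append((pin_id, -neg_score))
        return best

pools = PinPools()
pools.insert("user_1", "repins", "pin_a", 0.42)
pools.insert("user_1", "repins", "pin_b", 0.91)
pools.insert("user_1", "related", "pin_c", 0.77)
top = pools.pop_best("user_1", "repins", n=2)
```

Because new Pins are pushed into the same heap as older ones, a high-scoring older Pin can still surface ahead of a weaker new one, which matches the behavior described in the quote.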
The smart feed content generator
The smart feed content generator establishes what counts as new content for the smart feed. In other words, when a user accesses their home feed, the content generator determines which Pins are new since their last visit.
As Pinchak states, “The generator decides the quantity, composition, and arrangement of new Pins to return in response to this request.”
Since pinnability and the smart feed worker determine the best content to serve, the generator determines the best order for that content for each user.
Pins are selected for a user’s feed in chunks, and once selected, they are removed from the pools so they cannot be included again.
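A rough sketch of that chunking step: take a fixed number of top Pins from each pool, remove them so they cannot be picked again, and arrange the result. The per-source quota and the final sort by score are assumptions made for the example; the actual rules for quantity, composition, and arrangement are not spelled out in the source.

```python
import heapq

def generate_chunk(pools, quota):
    """Pull up to quota[source] top-scored Pins from each pool and remove them."""
    chunk = []
    for source, limit in quota.items():
        pool = pools.get(source, [])
        for _ in range(min(limit, len(pool))):
            neg_score, pin_id = heapq.heappop(pool)  # popped Pins leave the pool
            chunk.append((pin_id, -neg_score))
    # Arrange the chunk so the most relevant Pins come first (an assumption).
    chunk.sort(key=lambda item: item[1], reverse=True)
    return [pin_id for pin_id, _ in chunk]

# Hypothetical pools for one user, stored as (negated score, pin id) heaps.
pools = {
    "repins": [(-0.9, "pin_a"), (-0.4, "pin_b")],
    "related": [(-0.7, "pin_c")],
}
for pool in pools.values():
    heapq.heapify(pool)

new_pins = generate_chunk(pools, quota={"repins": 1, "related": 1})
```

After the call, "pin_b" is still waiting in the repins pool for a later visit, while the two selected Pins are gone from their pools.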
The smart feed service
The final piece of Pinterest’s smart feed is the service. The smart feed service combines new Pins returned by the content generator with those that previously appeared in the home feed, known as the materialized feed.
The materialized feed represents a frozen view of the feed as it was the last time the user viewed it. Because the user most likely did not interact much with this older content, it is deprioritized. However, it serves an important role in filling the smart feed, broadening what a user sees, and helping the smart feed continue to learn user preferences.
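In code, the combining step could be as simple as placing the new Pins ahead of the saved feed and then saving the result as the next frozen view. This is a minimal sketch with hypothetical names, and the de-duplication is an assumption rather than something the source describes.

```python
def render_home_feed(new_pins, materialized_feed):
    """Place new Pins ahead of the previously seen feed, skipping duplicates."""
    feed = list(new_pins)
    seen = set(feed)
    for pin_id in materialized_feed:
        if pin_id not in seen:
            feed.append(pin_id)
            seen.add(pin_id)
    return feed

# The frozen view from the user's last visit (hypothetical Pin ids).
materialized_feed = ["pin_x", "pin_y"]
feed = render_home_feed(["pin_a", "pin_c"], materialized_feed)
# The rendered feed becomes the materialized feed for the next visit.
materialized_feed = feed
```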