Hulkshare Recommendation Algorithm Challenge

Modern media platforms rely heavily on data gathered from their users' activities to recommend content that suits them. But if users never say whether they liked a piece of content, how can a platform accurately guess which content is enjoyable and provide the best possible service?

As a free internet radio service, Hulkshare faces this issue directly: the vast majority of its listeners never indicate their preferences about the songs they've listened to. As with all recommender systems in our daily lives, this makes it difficult to determine which content a user might be interested in. Hulkshare is therefore hosting a competition whose goal is to determine whether a song is "liked" by listeners without requiring any direct feedback from the user. The ratio of a song's "likes" to its total view count can be a good indicator of whether or not a song is liked. However, the deeper goal is to investigate whether it is possible to identify the underlying features that affect whether a user likes or dislikes a song. As a result, competitors are expected to use additional information to create their algorithms.

An example of how song recommender systems work: if you and John both like similar songs, it's quite probable that each of you will like the songs the other one likes.

The key information for this task is contained in users' "sessions." When users log in to Hulkshare, they start a new session in which they can listen to songs. A session can include X different songs, as well as the same song repeated X times. This means that each session is a distinct stream of songs that can be further investigated to develop machine learning algorithms. To accelerate the process, competitors are given tabular data about these sessions rather than song files. The important information lies not only in the combination of songs, but also in a few key data features, such as:
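As an illustration of what such a tabular session stream might look like, here is a minimal sketch in Python. All field names (`song_id`, `song_duration_s`, `listened_s`) are assumptions for illustration only; the actual competition schema may differ.

```python
# Hypothetical sketch: one session represented as an ordered stream of plays.
# Field names are assumptions, not the competition's actual schema.
session = [
    {"song_id": "s1", "song_duration_s": 210, "listened_s": 195},
    {"song_id": "s2", "song_duration_s": 180, "listened_s": 30},   # mostly skipped
    {"song_id": "s1", "song_duration_s": 210, "listened_s": 210},  # same song repeated
]

# A session distinguishes total plays from distinct songs:
distinct_songs = {play["song_id"] for play in session}
print(len(session), len(distinct_songs))  # 3 plays, 2 distinct songs
```

The same song can appear several times in one stream, which is why plays and distinct songs are counted separately.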

  • Ratio of session duration to song duration: If a user spends half of a session listening to one song, it is plausible that they liked it.

  • Frame-by-frame timeline of a session: If a user skips 80% of a song during a session, it is plausible that they did not like it. The opposite can be argued for songs that were skipped backwards (indicating that the user wants to listen to a part again).

  • Number of sessions for a song (past 6 months): If a relatively high number of sessions include a song, it is plausible that the song is widely liked. Or it was quite controversial, who knows?
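The three signals above can be sketched as simple feature functions. This is a hedged illustration, not the competition's actual feature pipeline; the play-record fields (`song_id`, `song_duration_s`, `listened_s`) are hypothetical.

```python
# Illustrative feature computation for the three signals above.
# All field names are assumptions about the tabular session data.

def listen_ratio(play):
    """Fraction of the song's duration actually listened to (signal 1)."""
    return play["listened_s"] / play["song_duration_s"]

def skip_fraction(play):
    """Fraction of the song skipped forward (signal 2)."""
    return max(0.0, 1.0 - play["listened_s"] / play["song_duration_s"])

def session_count(sessions, song_id):
    """Number of sessions that include the song (signal 3)."""
    return sum(any(p["song_id"] == song_id for p in s) for s in sessions)

sessions = [
    [{"song_id": "s1", "song_duration_s": 200, "listened_s": 180}],
    [{"song_id": "s1", "song_duration_s": 200, "listened_s": 40},
     {"song_id": "s2", "song_duration_s": 240, "listened_s": 240}],
]
print(round(listen_ratio(sessions[0][0]), 2))   # 0.9  -> mostly listened
print(round(skip_fraction(sessions[1][0]), 2))  # 0.8  -> mostly skipped
print(session_count(sessions, "s1"))            # 2    -> appears in 2 sessions
```

A real solution would aggregate such per-play features over all of a song's sessions before feeding them to a model.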

Competitors are expected to develop algorithms that estimate the number of likes a song has by combining these features. They will then use their estimates to calculate the ratio of likes to views (referred to as the "target"), a proxy for whether or not the song was enjoyed. The host will then compare these predictions against the ratio calculated from its own like-button system, and the competition winner will be determined by how accurately they predict the target values of the given songs. Keep an eye on our website to see how we rank on the leaderboard!
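The "target" described above reduces to a simple ratio once like estimates are available. A minimal sketch, assuming the like estimates come from some upstream model (the numbers here are placeholders, not real data):

```python
# Minimal sketch of the competition "target": estimated likes over views.
# The inputs would come from a model built on the session features;
# the values used below are placeholders for illustration.

def target_ratio(estimated_likes, view_count):
    """Proxy for whether a song was enjoyed: likes / views."""
    if view_count == 0:
        return 0.0  # guard against songs with no recorded views
    return estimated_likes / view_count

print(target_ratio(120, 480))  # 0.25
```

Submissions would then be scored on how close these predicted ratios come to the ratios the host computes from its actual like-button data.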
