
Song Info Being Deleted

I've noticed that, quite often with this website, all the different genres, styles, and other related information for various songs will, at random points, be completely deleted. Can I ask why this is, and honestly, can I ask that you please stop? It's pretty irritating if you want to check up on a song's style and for no reason all the info is gone.


Richard, are you referring to user-contributed attributes (genres, styles, moods, themes) for songs?

RL

Hi, yes I was.

We are not specifically going in and deleting info about musical styles and genres.

If you could provide some examples maybe we could try to track down the issue.

RL

So, take this Depeche Mode album for example (https://www.allmusic.com/album/the-singles-8698-mw0000044407). Here, when you click the individual songs on the track listing, their attributes are all missing. However, I had viewed these songs a few months ago and I remember the attributes being there (and in fact checking on the Wayback Machine backs this up). Similarly, I believe your listing for Fleetwood Mac's Rumours is having the same issue.

(Sorry for assuming that you were specifically deleting them, btw. I was just feeling annoyed by it and needed to raise the issue.)

RL

So have you been able to track down the issue?

Yes and no. It gets complicated, but we're working toward a solution.

Our "Song" data is pretty convoluted. From our data provider we receive tracks attached to albums. These are each individually listed as the song title, the performer, and the composer(s).

From this info, we create a "Song" entity (which is the text string of the title + the composers) and we also create a "Performance" entity (the now-created Song ID + performer names). So "Go Your Own Way" written by Lindsey Buckingham is a "Song" and "Go Your Own Way" performed by Fleetwood Mac is a "Performance."
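To make the Song/Performance split concrete, here is a minimal Python sketch. The `Track` shape, the `song_key`/`performance_key` helpers, and the separator characters are all assumptions for illustration; the post doesn't describe AllMusic's actual ID format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """Hypothetical shape of one track row from the data provider."""
    title: str
    performer: str
    composers: tuple[str, ...]


def song_key(track: Track) -> str:
    # A "Song" is the title text plus the composer(s); the "|" and "+"
    # separators are an assumption made for this sketch.
    return track.title + "|" + "+".join(track.composers)


def performance_key(track: Track) -> str:
    # A "Performance" is the derived Song key plus the performer name.
    return song_key(track) + "@" + track.performer


# The same recording listed on two different albums.
rumours_cut = Track("Go Your Own Way", "Fleetwood Mac", ("Lindsey Buckingham",))
hits_cut = Track("Go Your Own Way", "Fleetwood Mac", ("Lindsey Buckingham",))

print(song_key(rumours_cut))  # Go Your Own Way|Lindsey Buckingham
print(performance_key(rumours_cut) == performance_key(hits_cut))  # True
```

Because both keys are derived from text, any change to a title, composer, or performer string yields a different key.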

Then we try to merge all of the instances of the Song together, and also merge all of the Performances of that Song together.

So whether you click on "Go Your Own Way" from Rumours or from Fleetwood Mac's Greatest Hits, you still arrive at a page that has as much info as possible.
https://www.allmusic.com/album/rumours-mw0000193833#trackListing
https://www.allmusic.com/album/greatest-hits-warner-bros--mw0000652803#trackListing
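The merge step described above amounts to grouping every album appearance under one derived key. A rough sketch, again with hypothetical helper names and sample rows rather than real provider data:

```python
from collections import defaultdict

# Hypothetical rows: (album, title, performer, composers).
tracks = [
    ("Rumours", "Go Your Own Way", "Fleetwood Mac", ("Lindsey Buckingham",)),
    ("Greatest Hits", "Go Your Own Way", "Fleetwood Mac", ("Lindsey Buckingham",)),
    ("Rumours", "Dreams", "Fleetwood Mac", ("Stevie Nicks",)),
]


def performance_key(title, performer, composers):
    # Song part (title + composers) followed by the performer.
    return (title, composers, performer)


def merge_performances(rows):
    """Group every album appearance of a track under one Performance key."""
    pages = defaultdict(list)
    for album, title, performer, composers in rows:
        pages[performance_key(title, performer, composers)].append(album)
    return dict(pages)


pages = merge_performances(tracks)
for (title, _composers, performer), albums in pages.items():
    print(f"{title} / {performer}: {albums}")
```

Both appearances of "Go Your Own Way" land on one key, so one page can collect the info attached to either.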

Now.

What we've discovered is that in our Performance ID creation process, something hiccuped and now we have two different performance IDs for "Go Your Own Way."

There is a new ID that doesn't have genres, styles, moods and themes: https://www.allmusic.com/song/go-your-own-way-mt0000587701#genreStyles

and an older ID that still retains that song info: https://www.allmusic.com/song/go-your-own-way-mt0005540011#genreStyles

The good news is that the info is still available to us in our system. Nothing is deleted.
The bad news is that we have a lot of data to clean up so it presents properly again.

So we're currently working to figure out what interrupted our usual data-building process, and how to properly merge these diverged Performances together (without losing any of the info).
Multiply this by tens of millions of tracks and you can see the project we have in front of us.
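One way to merge two diverged Performance IDs "without losing any of the info" is to union their descriptor sets and leave a redirect behind. This is a hedged sketch, not the team's actual procedure: `merge_diverged`, `resolve`, and the descriptor labels are hypothetical, the two IDs are the ones linked above, and which ID survives is an arbitrary choice here.

```python
# Hypothetical descriptor store: Performance ID -> set of descriptor labels.
# The IDs come from the post; the labels are placeholders, not real data.
descriptors = {
    "mt0005540011": {"genre:placeholder", "mood:placeholder"},  # older ID
    "mt0000587701": set(),  # newer split ID, shown without descriptors
}
redirects = {}


def merge_diverged(store, redirects, keep_id, drop_id):
    """Fold drop_id into keep_id without discarding any descriptors."""
    # Set union keeps everything either ID had; nothing is deleted.
    store[keep_id] = store.get(keep_id, set()) | store.pop(drop_id, set())
    # Old links (e.g. from album track listings) are redirected, not broken.
    redirects[drop_id] = keep_id
    return store[keep_id]


def resolve(redirects, perf_id):
    """Follow redirects until reaching a surviving Performance ID (assumes no cycles)."""
    while perf_id in redirects:
        perf_id = redirects[perf_id]
    return perf_id


merged = merge_diverged(descriptors, redirects, "mt0000587701", "mt0005540011")
print(sorted(merged))  # ['genre:placeholder', 'mood:placeholder']
print(resolve(redirects, "mt0005540011"))  # mt0000587701
```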

Thanks for the heads-up.

I'll keep you posted on our progress.

RL

Great! Thank you!

We've been able to merge the different song and performance entries that had created the blank records.

https://www.allmusic.com/song/stripped-mt0006111845

We're still doing some cleanup work but most of the songs that were missing descriptors have been recovered.

Thanks.

RL

Great! You madlads actually did it! Thanks!

RL

One thing I want to add: I don't believe this is the first time this has happened. I think the website as a whole has had a habit of this kind of splitting happening at random times over the past few years.

RL

Hi Zac, so did you take into account what I had said about this not being the first time it's happened?

Hi Richard,

This was the first time we've had to do a mass patching and correction of the data in this way.
We don't really have a method of tracking back and seeing when data may or may not have become incorrect in the past.

I will say that the database is being constantly built, revised, added to, deleted from, and changed.

It is certainly possible that if we had one track called "Stairway to Heavn" and 970 tracks titled "Stairway to Heaven," and the data team corrected the typo, the single instance would be absorbed into the correct mass that is "Stairway to Heaven." Any data elements associated with the incorrect "Stairway to Heavn" could then be orphaned.

Similar things can happen if we have duplicate albums in the database. If the data team deletes the duplicate entry, some metadata that was associated with the removed album can be lost.
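In both cases (a corrected typo or a deleted duplicate album), the orphaning risk comes down to whether attached data is re-pointed before the old entry goes away. A toy illustration with placeholder data; `correct_title` and `fresh_store` are hypothetical helpers, not part of any real system:

```python
def correct_title(store, wrong, right, migrate):
    """Absorb a misspelled entry into the correct one.

    Returns the descriptors left with no entry pointing at them.
    """
    attached = store.pop(wrong, set())
    if migrate:
        # Re-point the data before the variant disappears.
        store.setdefault(right, set()).update(attached)
        return set()
    # Without that step, whatever hung off the typo is orphaned.
    return attached


def fresh_store():
    # Placeholder data: one descriptor attached only to the typo entry.
    return {"Stairway to Heavn": {"mood:placeholder"}, "Stairway to Heaven": set()}


lost = correct_title(fresh_store(), "Stairway to Heavn", "Stairway to Heaven", migrate=False)
print(lost)  # {'mood:placeholder'}

kept_store = fresh_store()
correct_title(kept_store, "Stairway to Heavn", "Stairway to Heaven", migrate=True)
print(kept_store)  # {'Stairway to Heaven': {'mood:placeholder'}}
```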

We'll keep an eye on it and watch for any systemic splits in the future.

Thanks for bringing this to our attention.

RL

Hi, it's me again.

Unfortunately, I believe another systemic split may have occurred. At the very least, a lot of The Cure's songs now have their attributes missing.

I hope not. It was a real pain to get things synced back up again.

I've poked around and am not seeing a split in Cure songs (at least in my initial investigation).

Can you please let me know of some examples where you're seeing Cure songs that should have attributes but have gone missing?

Thanks.

Thanks for the examples and your patience. We've figured out the culprit and it's a "good news/bad news" situation.

A refresher on our complex process:

Our "Song" data is pretty convoluted. From our data provider we receive tracks attached to albums. These are each individually listed as the song title, the performer, and the composer(s).

From this info, we create a "Song" entity (which is the text string of the title + the composers) and we also create a "Performance" entity (the now-created Song ID + performer names). So "Boys Don't Cry" written by Michael Dempsey, Robert Smith, & Lol Tolhurst is a "Song" and "Boys Don't Cry" performed by The Cure is a "Performance."

Along these same lines, "Boys Don't Cry" performed by Grant-Lee Buffalo would be the same "Song," but a different "Performance."

We link these song descriptors (genres, styles, moods, themes) to the Performance ID.

In this situation, it looks like our data team discovered that there were two Name IDs in the system for Lol Tolhurst.

If you take a look at the web archive page for this song, and click on Lol Tolhurst, you'll see his name ID was mn0001776536
https://web.archive.org/web/20220101215038/https://www.allmusic.com/song/boys-dont-cry-mt0033378214

At some point, the duplicate ID was discovered and merged into the current ID MN0000117572.

This is good, because it cleans up duplicate names and streamlines the data.

This is also bad because at this point, the Song ID for "Boys Don't Cry" has changed. The composer IDs for Robert Smith + Michael Dempsey + Lol Tolhurst used to be MN0001033109MN0000885775MN0001776536 and now they are MN0001033109MN0000885775MN0000117572.

This creates a new Song ID which creates a new Performance ID and therefore the link between the old Performance ID and the Descriptor IDs is severed.
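The chain reaction can be reproduced in a few lines. This sketch assumes, as the post implies, that the Song ID is built from the title plus the concatenated composer Name IDs and the Performance ID from the Song ID plus the performer; the helpers and the descriptor label are hypothetical, while the Name IDs are the ones quoted above.

```python
# Duplicate Name ID for Lol Tolhurst -> the ID it was merged into (from the post).
NAME_MERGES = {"MN0001776536": "MN0000117572"}


def song_id(title, composer_ids):
    # Assumed: title plus the concatenated composer Name IDs, in the order
    # the post lists them. The real system's ID format is not public.
    return title + ":" + "".join(composer_ids)


def performance_id(sid, performer):
    # Assumed: the derived Song ID plus the performer name.
    return sid + "@" + performer


old_composers = ["MN0001033109", "MN0000885775", "MN0001776536"]
new_composers = [NAME_MERGES.get(c, c) for c in old_composers]

old_perf = performance_id(song_id("Boys Don't Cry", old_composers), "The Cure")
new_perf = performance_id(song_id("Boys Don't Cry", new_composers), "The Cure")

# Descriptors were linked to the old Performance ID (placeholder label).
descriptors = {old_perf: {"mood:placeholder"}}

print(old_perf == new_perf)     # False: the name merge changed the key
print(new_perf in descriptors)  # False: the new page finds nothing
```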

-----------------------------

Bottom line: it is an unfortunate side effect of our process to try to programmatically determine songs and performances.

I hate to lose this kind of information, so maybe we can pull together a project to try to resurrect some of these orphaned descriptors and apply them when a new Song/Performance is created.
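A possible shape for such a project, offered only as a sketch: if Performance keys are kept as structured parts rather than opaque strings, a record of Name ID merges is enough to rebuild each orphaned key and fold its descriptors onto the current one. `PerfKey`, `canonical`, and `resurrect` are hypothetical names, and the sketch assumes each merge maps directly to the surviving ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfKey:
    """Hypothetical structured Performance key, kept as parts so it can be rebuilt."""
    title: str
    composer_ids: tuple[str, ...]
    performer: str


def canonical(key: PerfKey, merges: dict[str, str]) -> PerfKey:
    # Swap any merged-away Name ID for the ID that survived.
    # Assumes merges map directly to the survivor (no chains).
    ids = tuple(merges.get(c, c) for c in key.composer_ids)
    return PerfKey(key.title, ids, key.performer)


def resurrect(descriptors: dict[PerfKey, set[str]],
              merges: dict[str, str]) -> dict[PerfKey, set[str]]:
    """Re-key every descriptor set onto its canonical Performance, unioning on collision."""
    rebuilt: dict[PerfKey, set[str]] = {}
    for key, labels in descriptors.items():
        rebuilt.setdefault(canonical(key, merges), set()).update(labels)
    return rebuilt


merges = {"MN0001776536": "MN0000117572"}
orphan = PerfKey("Boys Don't Cry", ("MN0001033109", "MN0000885775", "MN0001776536"), "The Cure")
current = PerfKey("Boys Don't Cry", ("MN0001033109", "MN0000885775", "MN0000117572"), "The Cure")

rebuilt = resurrect({orphan: {"mood:placeholder"}, current: set()}, merges)
print(rebuilt[current])  # {'mood:placeholder'}
```

If merges can chain (A merged into B, later B into C), `canonical` would need to follow the map until it reaches an ID with no further merge.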