I've been on Last.fm since 2006, and most of the music I've listened to in the last 15 years is listed there. I built a website off my terrible music taste, and I would definitely describe myself as a bit of a statistics nerd.

That would kinda explain why this summer I got really annoyed when I discovered that both Last.fm and Spotify were "scrobbling" what I was listening to, creating duplicate entries. I noticed it by accident (I don't even remember what I was doing), but I realised it had been going on for a few months.

I fixed it by manually deleting the duplicated entries. It took a while, but it was manageable: there were only a few hundred.


Today I saw a tweet by a friend claiming that "once I enjoy a song or album, I listen to it for months and nothing else".

Mood.

I wanted to find something that would highlight how similar my listening patterns are, so I started digging into my Last.fm account. As soon as I clicked on one of my most-listened tracks, my eye fell on the duplicated timestamps.

Damn

And these dated from way before the issue I fixed this summer, so other apps were likely misbehaving too.


I googled a bit for solutions, but apparently there isn't much out there. I found a web app meant to fix the problem, but, alas (and I don't remember where I read it), around 2016 Last.fm removed the endpoint to delete scrobbles from its public API, and since then deleting has only been possible on the website itself.

The tool did help me realise I had a bit of a problem, though: not even halfway through my history, it had already identified 10k duplicates. Not something I could deal with manually.


The obvious solution was to let it go.
I'm a bit more stubborn than that.

I found a nice little bookmarklet on GitHub, https://github.com/shevchenkoartem/lastfm-smart-deduper, which can be run from the URL bar or from the browser console to automatically remove duplicated scrobbles from a single page of the library. I chose the balanced version because I didn't want any false positives: after all, I often listen to the same song on loop, and unless the timestamps are almost the same, it's just me being me.
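To make the "almost the same timestamp" rule concrete, here is a deliberately simplified sketch of the idea. It is not the bookmarklet's code (its minified source, used later in this post, has more rules than this); the `findDuplicates` name, the scrobble shape `{ artist, track, date }` and the 1-minute default threshold are all assumptions for illustration.

```javascript
// Hypothetical, simplified dedup rule: a scrobble is a duplicate if the same
// track was last kept at most `maxGapMinutes` earlier. Only the first
// scrobble of each burst is kept; songs on loop (bigger gaps) survive.
function findDuplicates(scrobbles, maxGapMinutes = 1) {
  // Oldest first, so the first scrobble of each burst is the one kept
  const sorted = [...scrobbles].sort((a, b) => a.date - b.date);
  const lastKept = new Map(); // "artist - track" -> date of the last kept scrobble
  const duplicates = [];

  for (const s of sorted) {
    const key = `${s.artist} - ${s.track}`;
    const previous = lastKept.get(key);
    const gapMinutes = previous === undefined ? Infinity : (s.date - previous) / 60000;
    if (gapMinutes <= maxGapMinutes) {
      duplicates.push(s);
    } else {
      lastKept.set(key, s.date);
    }
  }
  return duplicates;
}

// Made-up example data
const t = (hhmm) => new Date(`2021-11-14T${hhmm}:00Z`);
const scrobbles = [
  { artist: 'Artist', track: 'Song', date: t('21:00') },
  { artist: 'Artist', track: 'Song', date: t('21:00') }, // same timestamp: duplicate
  { artist: 'Artist', track: 'Song', date: t('21:05') }, // on loop: kept
];
console.log(findDuplicates(scrobbles).length); // 1
```

The threshold is the whole trade-off: a bigger window catches more double scrobbles but starts eating legitimate repeats of short songs.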

Having to go through 5.5k pages made running it by hand not really feasible, but it was a start.


Cypress

Here comes Cypress.

Assuming you are familiar with Node and JavaScript, getting started with Cypress should be fairly straightforward. Running:

> npx cypress open

is enough to create a folder structure and a few examples.

I removed all the boilerplate tests and created a new one.

describe('Last.fm', () => {
  it('delete duplicate scrobbles', () => {
    // In order to delete my scrobbles I need to be logged in
    cy.visit('https://last.fm/login')
    // who doesn't love cookie management
    cy.get('#onetrust-accept-btn-handler').click()
    // let's login
    cy.get('#id_username_or_email').type('cedmax')
    cy.get('#id_password').type(Cypress.env('PASSWORD'))
    cy.get('#login .btn-primary').click({ force: true })
    // I'm IN, that wasn't too bad   
  })
})

Once logged in, I needed to go to the last page of my library, run the script to dedupe the scrobbles in the context of the page and then move to the previous page.
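Walking the pages backwards has a useful property: deleting a scrobble shifts every older scrobble one slot towards page 1. The small simulation below (plain strings stand in for scrobbles, and `processPages` is a hypothetical helper, not part of the script) shows that going forwards, items that slide onto an already-visited page are never checked, while going backwards each deletion only shifts pages that have already been processed.

```javascript
// Hypothetical simulation of a paginated list, newest first: page p shows
// items [(p - 1) * pageSize, p * pageSize), and deleting an item shifts every
// later (older) item one slot towards page 1.
function processPages(items, pageSize, isDuplicate, order) {
  const list = [...items];
  const seen = new Set();
  const lastPage = Math.ceil(list.length / pageSize);
  const pages = Array.from({ length: lastPage }, (_, i) => i + 1);
  if (order === 'backwards') pages.reverse();

  for (const page of pages) {
    const start = (page - 1) * pageSize;
    // Snapshot of what the browser would render for this page
    const onPage = list.slice(start, start + pageSize);
    for (const item of onPage) {
      seen.add(item);
      // Deleting removes the item from the live list, shifting the rest
      if (isDuplicate(item)) list.splice(list.indexOf(item), 1);
    }
  }
  return { seen, remaining: list };
}

const items = ['dup1', 'a', 'b', 'dup2', 'c', 'd', 'dup3', 'e', 'f'];
const isDup = (item) => item.startsWith('dup');

console.log(processPages(items, 3, isDup, 'forwards').remaining.join(' '));
// a b dup2 c d e f  (dup2 slid onto page 1 after it was processed)
console.log(processPages(items, 3, isDup, 'backwards').remaining.join(' '));
// a b c d e f
```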

Something along the lines of:

// LAST_PAGE is the number of the oldest page of my library (~5.5k)
let counter = LAST_PAGE

while (counter) {
  // let's go to the page and give it a bit of rest before injecting the script
  cy.visit(`https://www.last.fm/user/cedmax/library?page=${counter}`)
  // now let's execute the bookmarklet code in the context of the page.
  // that's probably the only complex thing about cypress and most
  // browser automation tools: there are 2 contexts of execution – the test 
  // and the browser. In order to run code in the browser most libs offer
  // some method such as the one provided by Cypress
  // The try catch is just in case.
  cy.window().then(win => {
    win.eval('try{(function(){function a(a){const b=document.createElement("button");return b.innerText=l,b.onclick=a,b.style.background="#D92323",b.style.color="white",b.style.padding="5px",b}function b(b,c,d,e){return{songId:b,date:c,isDuplicate:!1,isForcedDeleted:!1,isRemainedOriginal:!1,extraDeleteButton:null,get hasFinalState(){return this.isForcedDeleted||this.isRemainedOriginal},deleteMe:function(b){if(this.isDuplicate=!0,e.style.background="#F2CDCF",!b){const b=e.querySelectorAll("button"),c=b[b.length-1];c.innerText!==l&&(this.extraDeleteButton=a(function(){d.click()}),e.appendChild(this.extraDeleteButton))}else(function(){d.click()})(),this.isForcedDeleted=!0,null!=this.extraDeleteButton&&(e.removeChild(this.extraDeleteButton),this.extraDeleteButton=null)},forcedDeleteIfDuplicated:function(){this.isDuplicate&&!this.isForcedDeleted&&this.deleteMe(!0)},remainMe:function(){e.style.background="#DDFFD9",this.isRemainedOriginal=!0}}}function c(a){k.push(a),k.length>e&&k.shift()}function d(){for(let a=0;a<k.length-1;++a){const b=k[a];for(let c=a+1;c<k.length;++c){const d=k[c];if(b.hasFinalState&&d.hasFinalState)continue;const e=Math.abs(d.date-b.date)/1e3/60,i=e<=g;i&&(b.forcedDeleteIfDuplicated(),d.forcedDeleteIfDuplicated());const j=b.songId===d.songId,l=c===a+1;j&&(e<=f||l&&h)&&(!d.isForcedDeleted&&(!d.isDuplicate||i)&&d.deleteMe(i),!b.isRemainedOriginal&&!b.isDuplicate&&b.remainMe())}}}const e=4,f=15,g=1,h=!1,k=[],l="Delete",m=document.querySelectorAll(".chartlist-row");for(let a=m.length-1;0<=a;--a){const f=m[a],g=f.querySelector(".chartlist-name").querySelector("a").innerText,h=f.querySelector(".chartlist-artist").querySelector("a").innerText,i=f.querySelector(".chartlist-timestamp").querySelector("span").getAttribute("title"),j=new Date(i.replace("pm"," pm").replace("am"," am")),l=f.querySelector(".more-item--delete"),n=b(g+" - "+h,j,l,f);c(n),(k.length===e||0===a)&&d()}})();}catch(e){}')
  })
  counter--
}
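The "two contexts" idea can be reproduced outside Cypress with Node's built-in `vm` module. This is only an analogy (Cypress doesn't use `vm` for this): the same code string gives different results depending on which global scope evaluates it, which is why the bookmarklet goes through `win.eval` to see the page's `document`. The `pageGlobals` object is a made-up stand-in for the browser window.

```javascript
const vm = require('vm');

// Made-up stand-in for the page's window, with its own "document" global
const pageGlobals = vm.createContext({
  document: { title: 'Last.fm library' },
});

// The same snippet, as a string, like the bookmarklet
const snippet = 'typeof document === "undefined" ? "no document here" : document.title';

// Evaluated in the test runner's own (Node) scope: there is no document
console.log(eval(snippet)); // no document here

// Evaluated inside the "page" context: it sees the page's globals
console.log(vm.runInContext(snippet, pageGlobals)); // Last.fm library
```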

It all seemed good, but there were three critical problems with this:

  1. Some third-party scripts on the library page were throwing errors, which failed the test (Cypress is still a testing tool after all, and that's a healthy default behaviour).
  2. The browser would unload the page to go to the next one before the calls to the deletion endpoint had completed, so some of them got cancelled.
  3. Cypress was failing after roughly 20 pages (and 5.5k / 20 = A LOT OF TIME).

Fixing the third party scripts errors

The solution to the first problem was trivial, and I found it in the official documentation:

Cypress.on('uncaught:exception', (err, runnable) => {
  // returning false here prevents Cypress from
  // failing the test
  return false
})

Adding these few lines at the top of my script made sure that uncaught exceptions were ignored and didn't fail the test.
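Returning `false` unconditionally swallows every error the page throws, including ones that might matter. A narrower, hypothetical alternative is to ignore only errors whose message matches a known list. The `ignoreErrorsMatching` name and the patterns below are placeholders, not the errors the Last.fm page actually throws. Because the handler is a plain function, it can be checked outside Cypress before being registered.

```javascript
// Hypothetical factory: builds an 'uncaught:exception' handler that only
// swallows errors whose message contains one of the given substrings.
function ignoreErrorsMatching(patterns) {
  return (err) => {
    if (patterns.some((pattern) => err.message.includes(pattern))) {
      // Returning false tells Cypress not to fail the test
      return false;
    }
    // Returning nothing lets Cypress fail the test as usual
    return undefined;
  };
}

// Placeholder patterns: replace them with the errors you actually see
const handler = ignoreErrorsMatching(['Script error', 'analytics']);

// In the spec file it would be registered with:
// Cypress.on('uncaught:exception', handler)

console.log(handler(new Error('analytics is not defined'))); // false
console.log(handler(new Error('Cannot read properties of null'))); // undefined
```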


Waiting for the delete calls

This one was a little trickier. At first I tried an arbitrary wait, but honestly it didn't guarantee anything and felt like the wrong approach anyway: it could be too short for all the calls to finish, or end up waiting too long on an already long execution.

So I dug deeper and discovered that Cypress can intercept network requests out of the box, and you can wait for the actual call to happen.

cy.intercept('/user/cedmax/library/delete').as('deleteCall')

// [...]

cy.wait('@deleteCall')

The problem is that cy.wait('@deleteCall') waits for a single matching call, and with a variable number of calls per page there's a very real chance that some of them get cancelled on page unload, because the script moves on too soon.

Luckily, there's a plugin for exactly this need: https://github.com/bahmutov/cypress-network-idle

import 'cypress-network-idle'

// [...]

cy.waitForNetworkIdle('/user/cedmax/library/delete', 500)

This makes sure there are no pending calls to the delete endpoint (and that none has happened for 500ms) before moving on to the next page.
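Conceptually, "network idle" means that nothing matching is in flight and nothing has started or finished for a given time. The sketch below shows that idea in plain Node. It is not the plugin's implementation: `NetworkTracker` and the three fake delete calls are hypothetical, and the 100ms quiet window is shorter than the 500ms used above just to keep the example quick.

```javascript
// Hypothetical tracker: counts in-flight requests and remembers when the
// last one started or finished.
class NetworkTracker {
  constructor() {
    this.pending = 0;
    this.lastActivity = Date.now();
  }

  start() {
    this.pending += 1;
    this.lastActivity = Date.now();
  }

  finish() {
    this.pending -= 1;
    this.lastActivity = Date.now();
  }

  // Resolves once no request is pending and nothing has happened for idleMs
  waitForIdle(idleMs, pollMs = 10) {
    return new Promise((resolve) => {
      const check = () => {
        if (this.pending === 0 && Date.now() - this.lastActivity >= idleMs) {
          resolve();
        } else {
          setTimeout(check, pollMs);
        }
      };
      check();
    });
  }
}

// Three fake delete calls, fired 30ms apart, each taking 50ms
function fakeDeleteCalls(tracker, completed) {
  [0, 30, 60].forEach((delay, i) => {
    setTimeout(() => {
      tracker.start();
      setTimeout(() => {
        completed.push(i);
        tracker.finish();
      }, 50);
    }, delay);
  });
}

async function main() {
  const tracker = new NetworkTracker();
  const completed = [];
  fakeDeleteCalls(tracker, completed);
  await tracker.waitForIdle(100);
  console.log(`completed before moving on: ${completed.length}`); // completed before moving on: 3
  return completed;
}

main();
```

Waiting for only the first call would have moved on after about 50ms, with two calls still in flight.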


Browser crashing

Brave, which I chose to filter out some ads and marketing calls, was struggling to get past 20-30 pages without showing me the "Aw, Snap!" folder of death.

I googled a bit and found a couple of solutions that got me to ~500 pages before the next crash.

The first was to edit the cypress.json file and add:

{
  "numTestsKeptInMemory": 0
}

This is "the number of tests for which snapshots and command data are kept in memory". The default is 50, which shouldn't make much difference since there's only one test, but that test is a bulky one with hundreds of commands, and setting this number to 0 helped with memory consumption.

But it was the second trick that really made a difference.

By default, running Cypress tests in a real browser (as opposed to headless mode) opens the app with a nice UI that includes a log of the commands.

The log grows over time, and after a while the DOM becomes too big for Brave to handle. Clicking the arrow to fold that section solved the problem.

After ~500 pages the failure would happen again, but that's definitely more manageable than re-running it every 20, and the CLI was quite useful to see where the script had stopped anyway.
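If each run survives roughly 500 pages, the restarts can also be planned up front. The hypothetical `planRuns` helper below (not part of the original script) splits the pages into descending batches, taking 5.5k as 5500. Using it would mean changing the loop in the final script to stop at `to` instead of running down to 0, which is untested.

```javascript
// Hypothetical planner: splits pages lastPage..1 into runs of at most
// pagesPerRun pages, walking backwards like the script does.
function planRuns(lastPage, pagesPerRun) {
  const runs = [];
  for (let from = lastPage; from >= 1; from -= pagesPerRun) {
    runs.push({ from, to: Math.max(1, from - pagesPerRun + 1) });
  }
  return runs;
}

const runs = planRuns(5500, 500);
console.log(runs.length); // 11
console.log(runs[0]); // { from: 5500, to: 5001 }
console.log(runs[runs.length - 1]); // { from: 500, to: 1 }

// For comparison, crashing every ~20 pages
console.log(planRuns(5500, 20).length); // 275
```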


And so this is the final script:

import 'cypress-network-idle'

Cypress.on('uncaught:exception', (err, runnable) => {
  // returning false here prevents Cypress from
  // failing the test
  return false
})

// the page to start from, going backwards towards page 1
let counter = 2312

describe('Last.fm', () => {
  it('delete duplicate scrobbles', () => {
    cy.viewport(1000, 1500)
    cy.visit('https://last.fm/login')
    cy.get('#onetrust-accept-btn-handler').click()
    cy.get('#id_username_or_email').type('cedmax')
    cy.get('#id_password').type(Cypress.env('PASSWORD'))
    cy.get('#login .btn-primary').click({ force: true })
    while (counter) {
      cy.visit(`https://www.last.fm/user/cedmax/library?page=${counter}`)
      cy.window().then(win => {
        win.eval(
          'try{(function(){function a(a){const b=document.createElement("button");return b.innerText=l,b.onclick=a,b.style.background="#D92323",b.style.color="white",b.style.padding="5px",b}function b(b,c,d,e){return{songId:b,date:c,isDuplicate:!1,isForcedDeleted:!1,isRemainedOriginal:!1,extraDeleteButton:null,get hasFinalState(){return this.isForcedDeleted||this.isRemainedOriginal},deleteMe:function(b){if(this.isDuplicate=!0,e.style.background="#F2CDCF",!b){const b=e.querySelectorAll("button"),c=b[b.length-1];c.innerText!==l&&(this.extraDeleteButton=a(function(){d.click()}),e.appendChild(this.extraDeleteButton))}else(function(){d.click()})(),this.isForcedDeleted=!0,null!=this.extraDeleteButton&&(e.removeChild(this.extraDeleteButton),this.extraDeleteButton=null)},forcedDeleteIfDuplicated:function(){this.isDuplicate&&!this.isForcedDeleted&&this.deleteMe(!0)},remainMe:function(){e.style.background="#DDFFD9",this.isRemainedOriginal=!0}}}function c(a){k.push(a),k.length>e&&k.shift()}function d(){for(let a=0;a<k.length-1;++a){const b=k[a];for(let c=a+1;c<k.length;++c){const d=k[c];if(b.hasFinalState&&d.hasFinalState)continue;const e=Math.abs(d.date-b.date)/1e3/60,i=e<=g;i&&(b.forcedDeleteIfDuplicated(),d.forcedDeleteIfDuplicated());const j=b.songId===d.songId,l=c===a+1;j&&(e<=f||l&&h)&&(!d.isForcedDeleted&&(!d.isDuplicate||i)&&d.deleteMe(i),!b.isRemainedOriginal&&!b.isDuplicate&&b.remainMe())}}}const e=4,f=15,g=1,h=!1,k=[],l="Delete",m=document.querySelectorAll(".chartlist-row");for(let a=m.length-1;0<=a;--a){const f=m[a],g=f.querySelector(".chartlist-name").querySelector("a").innerText,h=f.querySelector(".chartlist-artist").querySelector("a").innerText,i=f.querySelector(".chartlist-timestamp").querySelector("span").getAttribute("title"),j=new Date(i.replace("pm"," pm").replace("am"," am")),l=f.querySelector(".more-item--delete"),n=b(g+" - "+h,j,l,f);c(n),(k.length===e||0===a)&&d()}})();}catch(e){}'
        )
      })
      cy.waitForNetworkIdle('/user/cedmax/library/delete', 500)
      counter--
    }
  })
})

NB: I only tested this on macOS Big Sur, Node v12.14.0, Cypress v8.7.0 and cypress-network-idle v1.3.3, so there's no guarantee it works with a different setup.