Wednesday, April 30, 2008

Duplicate songs on iTunes

Cav just stuck up a post where he was trying to figure out the chances of playing the same song twice in a session when you have a library of 1641 songs. I thought I'd have a shot at solving it, although my probability math isn't perfect either :-P. His question was particularly aimed at iTunes, but for our purposes we're going to assume that song selection is purely random (iTunes biases things somewhat in reality).

Obviously, for a 1 song session, the chances of playing the same song twice are zero. In a two song session, they are 1 in 1641, and in a 1,642 song session, the chances hit 100%. A 1,641 song session has an extremely small chance of playing each song exactly once. Those are some good reality checks on the probability. But how do you work it out for other session lengths?

Well, let's say I want to figure out the probability for a session of length n. We'll assume I know the probability for a session of length n-1, which we'll call p, and which is some value between 0 (no chance at all) and 1 (100% chance). I picture it something like this:


0....................0.5.....................1
ppppppppxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

This is kind of a random-o-meter. 'p' is the probability that we've already had a duplicate, and 'x' is what's left over. So, here's how I think of it: We're throwing a random dart at the meter. If it hits 'p', we had a duplicate even before we hit the current song. If we hit an x, we MIGHT still have a duplicate, because the current song might be the duplicate. What are the chances of that?

Well, we know that we've already played n-1 songs, and none of them were duplicates (or our dart would have landed on p). There are 1,641 songs to choose from, and there's an (n-1)/1641 chance that this time, we will pick a duplicate. So, if we land in the x region, then (n-1) in 1641 times, we still get a duplicate. So the probability is:

p: The chance that we have already played a duplicate
PLUS
The chance of hitting the x region (1-p) TIMES the chance of playing a duplicate (n-1)/1641

We already know the values for n=1 and n=2, but now we have some rules for figuring out additional values:

1: 0
2: 0.06%
3: 0.18%
4: 0.36%
5: 0.61%
10: 2.7%
20: 10.97%
30: 23.41%
50: 52.96%
100: 95.4%
200: 99.99997%

I hope that was enlightening.

5 comments:

Cavan said...

Thanks, dude.

Anonymous said...

"in a 1,642 song session, the chances hit 100%. A 1,641 song session has an extremely small chance of playing each song exactly once."

The first part here sounds like you're talking about a combination; the second like a permutation.

If you are indeed permuting (that is, a song may be played twice before every other song has been played), the chance of hearing the same song twice in a 1,642-song session isn't 100%.

If you're doing a combination, a 1,641-song session is guaranteed to play each song once by definition.

And you're kind of fuzzy about whether you're talking about repeating one particular song or repeating any song. This post didn't really make a lot of sense to me.

Shana said...

Oh.

Adam said...

Peter, I was referring to the odds that you would hear any of your songs twice while using Party Shuffle in iTunes. It's neither a permutation nor a combination.

It might help if you think of it using the alphabet--I'm building a word with random letters, with duplicated permitted--but what are the chances of actually having a duplicate pair of letters anywhere in the word? A single-character word has a 0% chance of having the same letter occur twice. There are 676 (26*26) two-character words, and 26 of them are the same letter twice, so those odds are 1/26. A 26 character word has a small chance of containing every letter exactly once, and all the other possibilities contain duplicates. A 27 character word must, by definition, contain some letter twice.

Adam said...

change "duplicated permitted" to "duplication permitted." Bah!