Hipsters' Choice [Code]

Someone asked me which of the badges was the “Hipsters’ Choice” that was assigned to people who didn’t select one online.

It actually isn’t quite that simple. Hipsters’ Choice was an incremental process, and out of laziness I’ll just post the code I used in lieu of an explanation:

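#!/usr/bin/env python3

import csv
import random
import sys

# Arguments: the input CSV path, then the output CSV path.
# The last column of each row holds the badge choice; it is empty for
# people who didn't select one online.
with open(sys.argv[1], newline="") as f:
    reader = csv.reader(f)
    rows = list(reader)

def get_candidate_keys(d):
    # Return the badges that currently have the fewest people.
    lengths = [len(d[k]) for k in d]
    min_length = min(lengths)
    return [k for k in d if len(d[k]) == min_length]

# Group the rows that already have a badge by that badge.
assigned = [r for r in rows if r[-1]]
keys = set(r[-1] for r in assigned)
kl = {}
for k in keys:
    kl[k] = [r for r in rows if r[-1] == k]

# Hand each remaining person, in random order, one of the currently
# least-popular badges, breaking ties at random.
unassigned = [r for r in rows if not r[-1]]
random.shuffle(unassigned)
while unassigned:
    row = unassigned.pop()
    candidates = get_candidate_keys(kl)
    choice = random.choice(candidates)
    row[-1] = choice
    kl[choice].append(row)

# Write the rows back out, grouped by badge.
with open(sys.argv[2], "w", newline="") as f:
    writer = csv.writer(f)
    for k in kl:
        writer.writerows(kl[k])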

Yes, I know that the asymptotic performance of this implementation is less than ideal (it re-checks every badge for each person it assigns); I just don’t care. It got the job done at bleary-eyed-o’clock and still runs in a tiny fraction of a second for this year’s data. I’m cringing at some of the less-than-idiomatic parts looking back at it now, but eh, whatever. This is how the sausage got made. (Mmm, meat…) =D
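
For anyone who'd rather see the idea on toy data than read the script, here is a minimal sketch of the same incremental process. It is not the code that was actually run: it tracks only a count per badge instead of the CSV rows, and the function name `hipsters_choice`, the badge names, and the numbers are all made up for illustration.

```python
import random


def hipsters_choice(counts, n_unassigned, rng=random):
    """Give each unassigned person one of the currently least-picked badges.

    `counts` maps badge name -> number of people who already chose it.
    Returns a new dict with the updated counts.
    """
    counts = dict(counts)  # don't mutate the caller's dict
    for _ in range(n_unassigned):
        fewest = min(counts.values())
        # Every badge tied for least popular is a candidate.
        candidates = [badge for badge, n in counts.items() if n == fewest]
        counts[rng.choice(candidates)] += 1
    return counts


# Hypothetical badge names and numbers, purely for illustration.
online = {"robot": 5, "squid": 2, "toaster": 1}
print(hipsters_choice(online, 3))
```

Whichever way the random tie-break goes, this example ends at robot 5, squid 3, toaster 3: the least-popular badges get topped up until they're level, so there never was a single "Hipsters' Choice" badge.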