21/12/2011

The Super Present-Selector, Part 2: Variables



To follow up on Part 1, which I realise might make the people involved sound like terribly ungrateful kiddies:
Well, he is really ungrateful and difficult!

- J/K. But no, seriously, he is really picky. Some help is needed. And choosing Christmas presents is probably not the most important thing in the world, but it's the type of problem that we have here that interests me, rather than its magnitude.


How has all of this gone down so far?


Obviously, I mentioned this project to my "boyf", to get a bit of feedback.


He isn't totally keen. He doesn't like the idea that we (our thoughts? opinions? judgement?) might be reduced to a formula where the values of pre-decided variables are simply plugged in to give a yes/no (or percentage, whatever) outcome.


He apparently doesn't realise that this is what I've been doing in an informal way already, for the last two years we've been living together, in my efforts to procure furnishings, clothes and kitchen utensils that he doesn't turn his nose up at.


I know what he means. How could we possibly take into account the interaction of all of the variables in, I don't know, a lampshade -- its size and shape and level of shininess, which in that particular colour renders it just the perfect level of kitsch to be acceptable, whereas something-very-similar-but-slightly-different would be a flop?


Maybe we can't -- that's the whole question.


An important point: These variables are not picked automatically. It would actually be so much more elegant, beautiful, perfect, if they were. But I don't have a machine that I can just give a duvet cover to and it'll examine it and pick out variables I would never have even thought of using: size of inside hems; width of stitching; whether the factory that made it is in Turkey or Bangladesh. That kind of machine doesn't yet exist. 
Ideally, I would think of as many as possible of these variables myself -- as many as possible! even the seemingly irrelevant ones! -- but with the real-world constraints of time, manpower, and not-going-absolutely-crazy-ness, that's not going to happen either.


This means that there is already some human, non-automatic intervention involved - I'll have to pick out the variables I think might be likely candidates. The machine's just going to remember the values of all of them a lot better than I would be able to, and hopefully find some patterns that I might not have spotted.


I'm calling it a "machine" because I think that's a lot more meaningful for non-specialists, but what we're basically talking about here is an algorithm. Or, in fact, a comparison of different algorithms -- it was a class assignment to experiment with the data-mining software Weka that inspired me to actually think about turning this into a project.


Weka has a whole bunch of algorithms you can play around with and try out on your data sets, to find out which might be the most suitable for a classification task such as "Good present or not good present?". One I'm particularly excited about using is "Decision Tree" - it'll make an actual decision tree from your pre-classified data! I think that's pretty cool.
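To give a feel for what "making a tree from pre-classified data" means, here is a rough sketch in Python. It is not Weka's decision tree learner, just the simplest possible version: a one-level tree (a "decision stump") that picks the single variable that best separates yes from no on the training examples. The variables, values and labels are all invented for illustration.

```python
from collections import Counter

# Hypothetical pre-classified examples: (variables, label).
# Variable names, values and labels are invented for illustration.
EXAMPLES = [
    ({"colour": "red", "material": "wool", "patterned": "yes"}, "no"),
    ({"colour": "grey", "material": "wool", "patterned": "no"}, "yes"),
    ({"colour": "grey", "material": "fleece", "patterned": "no"}, "yes"),
    ({"colour": "red", "material": "fleece", "patterned": "no"}, "yes"),
    ({"colour": "blue", "material": "wool", "patterned": "yes"}, "no"),
    ({"colour": "blue", "material": "fleece", "patterned": "no"}, "yes"),
]


def majority_label(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def build_stump(examples, variable):
    """Map each value of `variable` to the majority label seen with it."""
    by_value = {}
    for variables, label in examples:
        by_value.setdefault(variables[variable], []).append(label)
    return {value: majority_label(labels) for value, labels in by_value.items()}


def stump_accuracy(examples, variable):
    """Fraction of training examples the stump for `variable` gets right."""
    stump = build_stump(examples, variable)
    hits = sum(stump[variables[variable]] == label for variables, label in examples)
    return hits / len(examples)


def best_stump(examples):
    """Pick the single variable whose stump fits the training data best."""
    variables = examples[0][0].keys()
    best = max(variables, key=lambda v: stump_accuracy(examples, v))
    return best, build_stump(examples, best)


variable, stump = best_stump(EXAMPLES)
print(variable, stump)
```

A real tree learner keeps splitting the data on further variables below each branch and usually scores splits with something like information gain rather than raw accuracy, so treat this as a toy.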




So, why hot water bottle covers?


Gripping stuff.
Before I get stuck into comparing things like pairs of boots (what am I gonna do, measure the angle of the toe?) I thought I'd better start with something "simple" - that is, with easy-to-measure relevant variables.


Some ideas I had: t-shirts? (still a bit complicated), duvet covers, paper folders... basically, something where at least one of the major factors (e.g. shape) is fairly limited in variability.


In fact, I think hot water bottle covers are still not ideal for my first experimental run -- I'll explain why in an upcoming post (spoiler: it's about pattern). I have the next two days to choose which product I actually want to try this out with and get a load of examples of it, before I go off to the land of no internet connection.




The method


The basic plan is to give a list of products (pictures) to the choosy-person (who I'm going to start calling "Subject A"), and ask them to class them as "yes/no" depending on whether they like them or not. I'm thinking, let's keep this binary for the moment. We can always try something more fine-grained later.
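Once the yes/no answers exist, they need to get into Weka somehow. Weka reads its own ARFF text format, so one option is to write the judgements out like this. A minimal sketch: the relation name, the attributes (price_eur, colour, liked) and the rows are all hypothetical, and the real list of variables is still to be decided.

```python
# Hypothetical judgements from Subject A; names and values are invented.
ATTRIBUTES = [
    ("price_eur", "numeric"),
    ("colour", ["red", "grey", "blue"]),
    ("liked", ["yes", "no"]),  # the binary class we want to predict
]

ROWS = [
    (12.5, "red", "no"),
    (9.0, "grey", "yes"),
    (15.0, "blue", "yes"),
]


def to_arff(relation, attributes, rows):
    """Render rows as ARFF text, with the class as the last attribute."""
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, list):
            # Nominal attribute: list the allowed values in braces.
            kind = "{" + ",".join(kind) + "}"
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


arff_text = to_arff("presents", ATTRIBUTES, ROWS)
print(arff_text)
```

Putting the class last is a convention rather than a requirement; Weka lets you choose which attribute is the class, but last is what it tends to assume.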


Whether or not they will see some of the variables (price, brand, etc.) is something I haven't decided on. There are obvious psychological implications to this. At the same time, these are "realistic" - they are variables that we're aware of when we're making our own shopping choices. This is a question to come back to later.


The next week or so will be prime data-collection time: a few days in a house with not much else to do than eat and drink till bursting, go to mass, open presents and play with data.


In fact, it'll be a few days in a house with members of Subject A's immediate family, which means: more data collection!


I could: ask Subject A's sister what her judgement of his preferences is (i.e. does she think he would like/not like each item). That way, I can have something to compare the machine's performance to: ok, it's 60% accurate, but is that better than a mother's judgement? A brother's? A girlfriend's?
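The comparison itself is simple arithmetic: for each guesser (machine or human), count how often their guess matches what Subject A actually said. A small sketch, with answers invented purely for illustration:

```python
def accuracy(predictions, truth):
    """Share of items where the prediction matches Subject A's own answer."""
    if len(predictions) != len(truth):
        raise ValueError("need one prediction per item")
    hits = sum(p == t for p, t in zip(predictions, truth))
    return hits / len(truth)


# Invented answers for five items, purely for illustration.
subject_a = ["yes", "no", "no", "yes", "no"]
guesses = {
    "machine": ["yes", "no", "yes", "yes", "yes"],
    "sister": ["yes", "no", "no", "no", "no"],
}

scores = {who: accuracy(answers, subject_a) for who, answers in guesses.items()}
for who, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{who}: {score:.0%}")
```

For the comparison to be fair, the machine should be scored on items it wasn't trained on, and everyone should be guessing about the same set of items.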




Teaching/Learning


In the first post I mentioned two different projects: machine learning (using the data on someone's past choices to work out an algorithm equivalent to their decision process) and machine-aided human learning (using this data and algorithm to help another person learn someone's taste). 


Since the second of these involves all sorts of methodological questions (what would be the best way to do this exercise? Would it be better to be consciously aware of the variables involved, or not?) that's something I'm going to leave aside for now. Basically: let's make the algorithm to represent preferences for now, and if it can be useful for human-learning later, that's something else.
