21/12/2011

The Super Present-Selector, Part 2: Variables



To follow up Part 1, which I realise might make the people involved sound like terribly ungrateful kiddies:
Well, he is really ungrateful and difficult!

- J/K. But no, seriously, he is really picky. Some help is needed. And choosing Christmas presents is probably not the most important thing in the world, but it's the type of problem we have here that interests me, rather than its magnitude.


How has all of this gone down so far?


Obviously, I mentioned this project to my "boyf", to get a bit of feedback.


He isn't totally keen. He doesn't like the idea that we (our thoughts? opinions? judgement?) might be reduced to a formula where the values of pre-decided variables are simply plugged in to give a yes/no (or percentage, whatever) outcome.


He apparently doesn't realise that this is what I've been doing in an informal way already, for the last two years we've been living together, in my efforts to procure furnishings, clothes and kitchen utensils that he doesn't turn his nose up at.


I know what he means. How could we possibly take into account the interaction of all of the variables in, I don't know, a lampshade -- its size and shape and level of shininess, which in that particular colour renders it just the perfect level of kitsch to be acceptable, whereas something-very-similar-but-slightly-different would be raté (a flop)?


Maybe we can't -- that's the whole question.


An important point: These variables are not picked automatically. It would actually be so much more elegant, beautiful, perfect, if they were. But I don't have a machine that I can just give a duvet cover to and it'll examine it and pick out variables I would never have even thought of using: size of inside hems; width of stitching; whether the factory that made it is in Turkey or Bangladesh. That kind of machine doesn't yet exist. 
Ideally, I would think of as many as possible of these variables myself -- as many as possible! even the seemingly irrelevant ones! -- but with the real-world constraints of time, manpower, and not-going-absolutely-crazy-ness, that's not going to happen either.


This means that there is already some human, non-automatic intervention involved - I'll have to pick out the variables I think might be likely candidates. The machine's just going to remember the values of all of them a lot better than I would be able to, and hopefully find some patterns that I might not have spotted.


I'm calling it a "machine" because I think that's a lot more meaningful for non-specialists, but what we're basically talking about here is an algorithm. Or, in fact, a comparison of different algorithms -- it was a class assignment to experiment with the data-mining software Weka that inspired me to actually think about turning this into a project.


Weka has a whole bunch of algorithms you can play around with and try out on your data sets, to find out which might be the most suitable for a classification task such as "Good present or not good present?". One I'm particularly excited about using is a decision tree learner (in Weka, J48, its implementation of C4.5): it'll make an actual decision tree from your pre-classified data! I think that's pretty cool.
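To make the decision-tree idea a bit more concrete, here is a minimal sketch of how such a tree can be grown from pre-classified examples. It is not Weka's J48, which implements C4.5 with numeric splits and pruning. Instead it is a much simpler ID3-style learner for nominal features: at each node it picks the feature with the highest information gain. The feature names (`pattern`, `stripe_widths`, `colours`) and the five labelled covers are hypothetical, made up for illustration, and are not real answers from anyone.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of yes/no labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """How much splitting the rows on `feature` reduces entropy."""
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, features):
    """Leaf = a label; internal node = (feature, {value: subtree})."""
    if len(set(labels)) == 1:
        return labels[0]  # pure node: every example agrees
    if not features:
        return Counter(labels).most_common(1)[0][0]  # no features left: majority vote
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return (best, branches)

def classify(tree, item, default="no"):
    """Walk the tree; fall back to `default` for a value never seen in training."""
    while isinstance(tree, tuple):
        feature, branches = tree
        if item[feature] not in branches:
            return default
        tree = branches[item[feature]]
    return tree

# Hypothetical, made-up examples (not real data from Subject A)
rows = [
    {"pattern": "stripes", "stripe_widths": "varying", "colours": "many"},
    {"pattern": "stripes", "stripe_widths": "even", "colours": "two"},
    {"pattern": "plain", "stripe_widths": "none", "colours": "one"},
    {"pattern": "stripes", "stripe_widths": "varying", "colours": "two"},
    {"pattern": "plain", "stripe_widths": "none", "colours": "many"},
]
labels = ["no", "yes", "yes", "no", "yes"]

tree = build_tree(rows, labels, ["pattern", "stripe_widths", "colours"])
print(tree)
print(classify(tree, {"pattern": "stripes", "stripe_widths": "even", "colours": "many"}))  # → yes
```

On this toy data the tree splits on `stripe_widths` alone, because that one feature already separates the yeses from the nos perfectly. Real data would be messier.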




So, why hot water bottle covers?


Gripping stuff.
Before I get stuck into comparing things like pairs of boots (what am I gonna do, measure the angle of the toe?), I thought I'd better start with something "simple": that is, something with easy-to-measure relevant variables.


Some ideas I had: t-shirts? (still a bit complicated), duvet covers, paper folders... basically, something where at least one of the major factors (e.g. shape) is fairly limited in variability.


In fact, I think hot water bottle covers are still not ideal for my first experimental run -- I'll explain why in an upcoming post (spoiler: it's about pattern). I have the next two days to choose which product I actually want to try this out with and collect a load of examples of it, before I go off to the land of no internet connection.




The method


The basic plan is to give a list of products (pictures) to the choosy person (who I'm going to start calling "Subject A"), and ask them to classify each one as "yes" or "no" depending on whether they like it or not. I'm thinking, let's keep this binary for the moment. We can always try something more fine-grained later.


Whether or not they will see some of the variables (price, brand, etc.) is something I haven't decided on. There are obvious psychological implications to this. At the same time, these are "realistic" - they are variables that we're aware of when we're making our own shopping choices. This is a question to come back to later.
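Weka reads its data sets from ARFF files. Each file has a `@RELATION` line, then one `@ATTRIBUTE` line per variable (either numeric or a list of nominal values), then the rows under `@DATA`. Here is a minimal sketch of turning Subject A's yes/no answers into that format. The attribute names and the two rows are hypothetical. The yes/no answer goes in the last column, which is where the Weka Explorer takes the class from by default. Price is recorded as a column whether or not Subject A gets to see it, so that question can be settled later.

```python
def to_arff(relation, attributes, rows):
    """Render labelled items as the text of a Weka ARFF file.

    attributes: list of (name, spec) pairs; spec is "NUMERIC" or a list of nominal values.
    rows: tuples whose values are in the same order as `attributes`.
    """
    lines = [f"@RELATION {relation}", ""]
    for name, spec in attributes:
        # Nominal attributes list their allowed values in braces, e.g. {yes,no}
        kind = spec if isinstance(spec, str) else "{" + ",".join(spec) + "}"
        lines.append(f"@ATTRIBUTE {name} {kind}")
    lines += ["", "@DATA"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"

# Hypothetical attributes and answers, for illustration only
attributes = [
    ("price", "NUMERIC"),
    ("pattern", ["plain", "stripes"]),
    ("stripe_widths", ["none", "even", "varying"]),
    ("liked", ["yes", "no"]),  # Subject A's answer, kept as the last column (the class)
]
rows = [
    (12.99, "stripes", "varying", "no"),
    (9.5, "stripes", "even", "yes"),
]
arff = to_arff("hot_water_bottle_covers", attributes, rows)
print(arff)
```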


The next week or so will be prime data-collection time: a few days in a house with not much else to do than eat and drink till bursting, go to mass, open presents and play with data.


In fact, it'll be a few days in a house with members of Subject A's immediate family, which means: more data collection!


I could ask Subject A's sister for her judgement of his preferences (i.e. does she think he would like or not like each item?). That way, I can have something to compare the machine's performance to: ok, it's 60% accurate, but is that better than a mother's judgement? A brother's? A girlfriend's?
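Once Subject A has given their own yes/no answers, scoring the machine and each family member comes down to counting matches. Below is a minimal sketch with hypothetical answers for five items. A lazy "always say no" predictor is included as a baseline, because a percentage on its own says little: any judge, human or machine, should at least beat that.

```python
def accuracy(predictions, truth):
    """Fraction of items where a predictor's yes/no matches Subject A's own answer."""
    if len(predictions) != len(truth):
        raise ValueError("predictions and truth must cover the same items")
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)

def compare(truth, **predictors):
    """Score every predictor against the truth, best first."""
    scores = [(name, accuracy(preds, truth)) for name, preds in predictors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical answers for five items (made up for illustration)
truth = ["yes", "no", "no", "yes", "no"]  # Subject A's own yes/no
results = compare(
    truth,
    machine=["yes", "no", "yes", "yes", "no"],
    sister=["yes", "yes", "no", "no", "no"],
    always_no=["no"] * 5,  # baseline: never recommend anything
)
for name, score in results:
    print(f"{name}: {score:.0%}")
```

With these made-up answers the machine scores 80%, while the sister and the always-say-no baseline both score 60%. That is why the baseline is worth having next to any headline figure.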




Teaching/Learning


In the first post I mentioned two different projects: machine learning (using the data on someone's past choices to work out an algorithm equivalent to their decision process) and machine-aided human learning (using this data and algorithm to help another person learn someone's taste). 


Since the second of these involves all sorts of methodological questions (what would be the best way to do this exercise? Would it be better to be consciously aware of the variables involved, or not?), I'm going to leave it aside for now. Basically: let's build an algorithm that represents preferences first; whether it can be useful for human learning is a question for later.

The Super Present-Selector, Part 1: Background

I'm in the process of constructing a machine I'm provisionally titling the Super Present Selector. I thought I'd blog as I go along. Firstly, some background: What is it? How did this come about?




Because my mum wants to know what to get me for Christmas. Because my mum wants to know what to get my boyfriend for Christmas.



Because my mum has a department store voucher that needs using up, and suggests buying me clothes with it. Because I think this is a terrible idea.


Because she asks: "Any kind not to get? Any colours not to choose? Plain colours or stripes? Smart or casual? If dresses or skirts, short, mid or long? Fitted or flared?"


Because I draft an email reply advising her strongly against the idea, but add in, just in case (because I fear I'm not too good at the advising strongly thing): "work stuff: size 8-10, like most shades of grey, skinny trousers but muscular calves make this difficult, don't like most stripes I've seen in the shops lately, never get anything from Jane Norman I'm pretty sure it's made for curvy people".


Because I don't send that email, because I'm worried I sound like a dick, and I'm still not sure those rules are watertight enough. I just tell her on the phone that the homeware department (where they sell overpriced bottles of fancy alcohol) is a much safer bet.


Because she asks again what to get for my boyfriend.


Because I could have taken one look and told her that the hot water bottle with a striped cover she got him last year was not going to be a favourite, but I can't find the words to explain exactly why.


Because I am definitely *for* the idea of getting people surprise gifts (surely it's not really a gift if you choose it yourself? they just pay for it, is all, and choosing presents is half of the fun), and anyway I don't have time to go picking out presents, but it would be a shame for someone else to spend money and goodwill on things that won't be appreciated.




Because, actually, maybe I could find the words to explain the problem with the hot water bottle cover: it has stripes of different colours and thicknesses, making up the kind of design you see in high-street home furnishings when they do their version of "minimalist" or "modern". "Bright modern stripes". And although the polyester-lined fleece material is one of the most insulating I've seen in bottle-cover technology, the aesthetic considerations here trump everything else. But how can I tell her all this without simply saying "it's ugly" or "he'll think it's ugly"?




And actually this other one, striped in simple pink and white, which I've just found on the website of the same shop, would have been easily more successful. 


What could possibly be a solution to all this heartbreak and confusion and timewasting?


Obviously: A Super Present-Selector. A machine that contains all the rules affecting present-pleasing probability. A machine into which candidate items can be plugged (so my Mum can go out shopping, see something, think "ooh, that looks nice, I wonder whether X would like it...", and plug it in), and the machine will say: "This item contains colours of more than 3.6 points* of chromatic difference, in a stripe formation of varying widths. This item is from shop Y, and is NOT from the range "classic". Other users have tagged this design as "bright modern stripes". The probability of this item bringing pleasure and joy to person X is: 13%"
*This is a scale I just made up
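A machine that answers "13%" needs a model that produces probabilities rather than a bare yes/no. One simple option is a naive Bayes classifier, used here for illustration rather than the decision tree from Part 2, and not necessarily what the project will end up using. It multiplies per-feature frequencies learned from past yes/no answers, assuming (naively) that features are independent given the answer. It adds one to every count so that a value never seen before doesn't zero everything out. The feature names and the four training items are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, how often each feature value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature, label) -> Counter of values
    feature_values = defaultdict(set)    # feature -> distinct values seen in training
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[(feature, label)][value] += 1
            feature_values[feature].add(value)
    return class_counts, value_counts, feature_values

def prob_liked(model, item, positive="yes"):
    """P(positive | item) under the naive conditional-independence assumption."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(label)
        for feature, value in item.items():
            if feature not in feature_values:
                continue  # ignore features never seen in training
            k = len(feature_values[feature])
            # Add-one (Laplace) smoothing: unseen values get a small, non-zero likelihood
            score *= (value_counts[(feature, label)][value] + 1) / (count + k)
        scores[label] = score
    return scores.get(positive, 0.0) / sum(scores.values())

# Hypothetical, made-up training answers
rows = [
    {"pattern": "stripes", "widths": "varying"},
    {"pattern": "stripes", "widths": "even"},
    {"pattern": "plain", "widths": "none"},
    {"pattern": "stripes", "widths": "varying"},
]
labels = ["no", "yes", "yes", "no"]

model = train_naive_bayes(rows, labels)
p = prob_liked(model, {"pattern": "stripes", "widths": "varying"})
print(f"Probability of bringing pleasure and joy: {p:.0%}")  # → 18%
```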


And in this situation, because my Mum is not a massive risk-taker, she would probably put the item down. Even better: because she is quite diligent and careful, and because she likes finding out how things work, she would probably continue through the shop, plugging in other interesting-looking items to see the results. She might even start systematically plugging in *all* the items in the shop, in order to work out for herself what the machine's rules are. And then, maybe, she will learn and internalise all of these rules herself and she won't even need the machine any more: she will have "learnt" what this difficult-to-please person's tastes in home furnishings are.


Wouldn't that be amazing?


So, that's what the idea of the Super Present Selector is. 

