In just one case I see there, it's possible people weren't being jackasses. Given that HPMOR is written by "Less Wrong" on fanfiction.net, and the author's notes point to the website pretty regularly, people that are new might not know who the author is and have put that down. I'd still bet on jackasses, though.
I'm seriously hoping I didn't screw up and put any alpha in with my numerics so as to contribute to your rage.
Data cleaning & formatting is *always* the worst part.
> One person gave their home country as "Australia Germ"
I bet they were trying to write 'Australia Germany'. At least, I hope they were.
I hope so, too, but I'm leaving the possibility open that they meant Austria Germany.
You are my hero.
Are the original individual answers (of those who didn't mark a preference for them to remain confidential) going to be made available for the general public to do their own statistics on?
Also, regular expressions can help you get through that website formatting thing.
Yes, but I was planning on giving out only the cleaned-up version so that other people trying to work with the data wouldn't have to repeat my frustration. Can you think of a reason that would be a bad idea?
No attempted SQL injection attacks? I am disappoint.
Alternatively, the attack was successful and the 1150 entries Scott analyzed were only the ones entered before the attack (or after, depending on the method).
Why not put some form validation into the survey? If you want a number, only accept a number, etc.
Because it's done using Google Forms. Writing and hosting a web application for this would be *far* too big a job.
Thank you for putting yourself through this *again* having already done it once. Now I really hope I avoided all of these mistakes :)
A smarter kind of stupid, eh?
2012-11-27 07:59 am (UTC)
Thanks for taking a bullet for the rest of us. An occasional fun part of cleaning up data files is that you get to include amusing footnotes in your article, e.g. "One respondent who answered several questions with obscenities was excluded from analysis." Or, from an actual publication, "A male subject completely misunderstood instructions and his data were omitted from analysis."
"5/10/2012" and "6/7/2012" were probably entered as "5-10" and "6-7". I had one survey with a confidence interval question come back full of dates; now I always make sure to have two blanks. Although I would've expected "10-30" to become October 30th.
Almost all these can be prevented with good validation in the form processing, whether client-side or server-side.
It might be possible to do this without developing a full webapp, using some easy-to-use SaaS that is more sophisticated than Google Forms: SurveyMonkey, Wufoo, etc.
And there are plenty of software engineers in the LW crowd -- I'll bet you can find a volunteer next year.
Edited at 2012-11-27 08:42 am (UTC)
ROFL. That is quite hilarious, although I sympathise with the frustration!
Edit: I hope I didn't do any of those. The embarrassing thing is that I can very easily imagine doing so, in that if someone asked you that question colloquially, a non-numeric answer might well be most useful.
In fact, I'm giving Scott lots of gratuitous speculation on how to avoid this sort of answer because I think it might be useful, but I'm trying not to victim-blame: (a) some of it is only useful if you can recommend how to make a form with more validation, and (b) the form WAS clear, so even if Scott can alleviate the problem, it's not his fault that it occurred.
In fact, an interesting question might be: is there any way of phrasing the form so it's more obvious that it will be computer-parsed, so that even people who have already been told find it intuitive that writing English answers won't help? (Another question: could anyone who knows regexes help tidy the data up?)
Edited at 2012-11-27 10:46 am (UTC)
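On the regex question, one conservative approach is to rescue a free-text answer only when it contains exactly one number and to send everything else to a human. This is a minimal sketch; `extract_number` and its "exactly one number" rule are hypothetical choices, not a description of how the survey was actually cleaned.

```python
import re
from typing import Optional

# Hypothetical rule: keep an answer only if it contains exactly one number.
# Answers with no digits ("thirty") or several ("20 or 30") return None
# so a person can look at them instead of the script guessing.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_number(answer: str) -> Optional[float]:
    found = NUMBER.findall(answer)
    if len(found) != 1:
        return None
    return float(found[0])


answers = ["30", "about 30", "30 years", "thirty", "20 or 30"]
print([extract_number(a) for a in answers])  # [30.0, 30.0, 30.0, None, None]
```

Being strict here means things like "1,000" also come back as None; that costs some manual work but avoids quietly turning it into 1.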
Ugh, some people...
I have been known to claim my first language is maths; but I'm not sure how "silence" is a language at all...
(I didn't do your survey because I don't really read LW)
Dang it, I forgot to put "logic" as a language!
...gah, this is a site where the average IQ is like 130 to 140, brilliant people in neuroscience and artificial intelligence
Are the IQs self-reported?
I think it's mostly self-reported results of random online IQ tests.
There was a survey? And I missed it? Aw, darn: I love filling in surveys! Though I probably would have made a mess of it, so it's as well I didn't know about it until this post.
I will now indulge in some smug, triumphalist religious-believer gloating over the foibles of self-identified smart people, being fed up to the back teeth with one too many self-identified smart people condescending to announce that smart people don't fall for that religion stuff, only dumb hicks do.
Yes, I am a dumb hick, but that's not the point.
This reminds me of my job 2-3 years ago. Although I was dealing with professionals being paid to fill out a form as their primary work product...
Basically the only answer is to make ill-formed data impossible, by forcing numeric/date responses when appropriate, and using drop-downs, checkboxes, and radio buttons for category responses. Free text is basically only worth it when you want to find out what "other" meant.
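The "make ill-formed data impossible" idea amounts to rejecting an answer at submission time unless it is a number in a sensible range or one of a fixed set of options. Below is a minimal sketch of those two checks, assuming whatever survey tool is used can run server-side code like this (Google Forms is not assumed to). The function names, the 0 to 120 range, and the country list are all hypothetical.

```python
from typing import Iterable, Optional


def validate_number(raw: str, low: float, high: float) -> Optional[float]:
    """Accept only a plain number inside [low, high]; reject everything else."""
    try:
        value = float(raw.strip())
    except ValueError:
        return None  # "thirty", "silence", "a bit", ...
    if not (low <= value <= high):
        return None  # also rejects nan, since comparisons with nan are False
    return value


def validate_choice(raw: str, options: Iterable[str]) -> Optional[str]:
    """Accept only one of the fixed options, as a drop-down would."""
    return raw if raw in set(options) else None


COUNTRIES = ["Australia", "Austria", "Germany"]  # hypothetical option list

print(validate_number("30", 0, 120))                 # 30.0
print(validate_number("thirty", 0, 120))             # None
print(validate_choice("Australia Germ", COUNTRIES))  # None
```

In a real form the rejection would be shown to the respondent so they can correct it, rather than the answer being dropped afterwards.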
"Two people refused to answer any question denominated in "feet" because they claimed not to know what feet were"
That one are easy! "Feet" is the thingies what you wears your shoes on!
Re: form filling - oh, yes. Working in a local government (education) office, we get the grant forms, the course application forms, the employment application forms, all kinds of forms as filled in by the public. Invariably, people get confused and put answers on the wrong lines, fill in the wrong sections, leave sections blank that need to be filled in, or are so terrified of these being "official forms" that they ask us to fill them in for them (true story: an art teacher asked me to fill in an application form for her, seeing as how I was the clerical officer and knew how these things worked).
Our national government in their wisdom (bless their little hearts) decided this year that instead of live persons handing out forms and taking them back and processing them, all in the name of greater efficiency and cost-cutting, they would put student grant application forms online.
That worked about as well as you'd expect: there's a two-month delay in processing the applications and getting the grant awards paid, and they've actually had to take on extra staff to speed up the process.
Yes, indeedy, as we could have told them if they'd asked. But nobody listens to the lowly paper-pushers at the coalface. (This is another reason why I'm so sceptical of the previous post quoting the libertarian stating her preference for the "smart, agenty physicist versus public health official": I've seen these snazzy new ideas being put forward by consultants and outside experts, but nobody ever asks for input from those of us who deal with the public and know about the likely pitfalls.)
God, that made my evening. Stay classy, LW!