In just one case I see there, it's possible people weren't being jackasses. Given that HPMOR is written by "Less Wrong" on fanfiction.net, and the author's notes point to the website pretty regularly, people that are new might not know who the author is and have put that down. I'd still bet on jackasses, though.
I'm seriously hoping I didn't screw up and put any alpha in with my numerics so as to contribute to your rage.
Data cleaning & formatting is *always* the worst part.
> One person gave their home country as "Australia Germ"
I bet they were trying to write 'Australia Germany'. At least, I hope they were.
I hope so, too, but I'm leaving the possibility open that they meant Austria Germany.
You are my hero.
Are the original individual answers (of those who didn't mark a preference for them to remain confidential) going to be made available for the general public to do their own statistics on?
Also, Regular Expressions can help you get through that website formatting thing.
Yes, but I was planning on giving out only the cleaned-up version so that other people trying to work with the data wouldn't have to repeat my frustration. Can you think of a reason that would be a bad idea?
No attempted SQL injection attacks? I am disappoint.
Alternatively, the attack was successful and the 1150 entries Scott analyzed were only the ones submitted before the entry carrying the attack (or after it, depending on the method).
Why not put some form validation into the survey? If you want a number, only accept a number, etc.
Because it's done using Google Forms. Writing and hosting a web application for this would be *far* too big a job.
Thank you for putting yourself through this *again* having already done it once. Now I really hope I avoided all of these mistakes :)
A smarter kind of stupid, eh?
2012-11-27 07:59 am (UTC)
Thanks for taking a bullet for the rest of us. An occasional fun part of cleaning up data files is that you get to include amusing footnotes in your article, e.g. "One respondent who answered several questions with obscenities was excluded from analysis." Or, from an actual publication, "A male subject completely misunderstood instructions and his data were omitted from analysis."
"5/10/2012" and "6/7/2012" were probably entered as "5-10" and "6-7". I had one survey with a confidence interval question come back full of dates; now I always make sure to have two blanks. Although I would've expected "10-30" to become October 30th.
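If that guess is right, the damage is at least partly reversible during cleanup. Here is a minimal sketch, assuming the spreadsheet turned "5-10" into a month/day/year string like "5/10/2012"; the function name `recover_range` and the rule of rejecting cells whose first number is larger than the second are my own assumptions, not anything the survey actually used.

```python
import re

# Matches date-looking cells such as "5/10/2012" (assumed month/day/year layout).
DATE_LIKE = re.compile(r"^\s*(\d{1,2})/(\d{1,2})/(\d{4})\s*$")


def recover_range(cell):
    """Return (low, high) if the cell looks like a date-coerced range, else None."""
    match = DATE_LIKE.match(cell)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    # A confidence interval should run from low to high; anything else is
    # left for manual review rather than guessed at.
    if low > high:
        return None
    return (low, high)


for cell in ["5/10/2012", "6/7/2012", "42"]:
    print(cell, "->", recover_range(cell))
```

This only recovers ranges the spreadsheet could read as a valid date, so something like "10-30" that did not get converted would need separate handling.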
Almost all these can be prevented with good validation in the form processing, whether client-side or server-side.
It might be possible to do this without developing a full webapp, using some easy-to-use SaaS that is more sophisticated than Google Forms: SurveyMonkey, Wufoo, etc.
And there are plenty of software engineers in the LW crowd -- I'll bet you can find a volunteer next year.
Edited at 2012-11-27 08:42 am (UTC)
ROFL. That is quite hilarious, although I sympathise with the frustration!
Edit: I hope I didn't do any of those. The embarrassing thing is that I can very easily imagine doing so, in that if someone asked you that question colloquially, a non-numeric answer might well be most useful.
In fact, I'm trying to avoid victim-blaming by giving Scott lots of gratuitous speculation on how to avoid this sort of answer, because I think that might be useful, but also, (a) some of it is only useful if you can recommend how to make a form with more validation and (b) it WAS clear, so even if Scott can alleviate the problem, it's not his fault that it occurred.
In fact, an interesting question might be, is there any way of phrasing the form so it's more obviously going to be computer parsed, so even if people have been already told, it's intuitive to them that writing English answers won't help? (Another question would be, would anyone who knows regexes help tidy the data up?)
Edited at 2012-11-27 10:46 am (UTC)
Ugh, some people...
I have been known to claim my first language is maths; but I'm not sure how "silence" is a language at all...
(I didn't do your survey because I don't really read LW)
Dang it, I forgot to put "logic" as a language!
...gah, this is a site where the average IQ is like 130 to 140, brilliant people in neuroscience and artificial intelligence
Are the IQs self-reported?
I think it's mostly self-reported results of random online IQ tests.
There was a survey? And I missed it? Aw, darn: I love filling in surveys! Though I probably would have made a mess of it, so it's as well I didn't know about it until this post.
I will now indulge in some smug triumphalist religious believer gloating over the foibles of self-identified smart people, being fed-up to the back teeth of one too many self-identified smart person condescending to announce that smart people don't fall for that religion stuff, only dumb hicks.
Yes, I am a dumb hick, but that's not the point.
This reminds me of my job 2-3 years ago. Although I was dealing with professionals being paid to fill out a form as their primary work product...
Basically the only answer is to make ill-formed data impossible, by forcing numeric/date responses when appropriate, and using drop-downs, checkboxes, and radio buttons for category responses. Free text is basically only worth it when you want to find out what "other" meant.
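To make that concrete, here is a rough sketch of the kind of check a form backend could run before accepting a response. Everything named here (`parse_number`, `parse_choice`, `ALLOWED_COUNTRIES`, the 0 to 250 bounds) is hypothetical and only for illustration; Google Forms or any other survey tool would express the same rules through its own settings.

```python
# Hypothetical list standing in for a drop-down's options.
ALLOWED_COUNTRIES = {"Australia", "Austria", "Germany"}


def parse_number(raw, low, high):
    """Return the response as a float if it is a number within [low, high], else None."""
    try:
        value = float(raw.strip())
    except ValueError:
        return None
    # Also rejects "inf" and "nan", since neither passes the range comparison.
    if not (low <= value <= high):
        return None
    return value


def parse_choice(raw, allowed):
    """Return the trimmed response if it is one of the allowed options, else None."""
    text = raw.strip()
    return text if text in allowed else None


print(parse_number("140", 0, 250))
print(parse_number("about 140", 0, 250))
print(parse_choice("Australia Germ", ALLOWED_COUNTRIES))
```

Anything that comes back as `None` gets bounced to the respondent instead of landing in the data file, which is the whole point of doing it at entry time rather than at cleanup time.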
"Two people refused to answer any question denominated in "feet" because they claimed not to know what feet were"
That one are easy! "Feet" is the thingies what you wears your shoes on!
Re: form filling - oh, yes. Working in a local government (education) office, we get the grant forms, the applications for courses forms, the applications for employment forms, all kinds of forms as filled in by the public. Invariably, people get confused and put answers on the wrong lines, fill in the wrong sections, leave sections blank that need to be filled in, or are completely terrified of these being "official forms" to the point they ask us to fill them in for them (true story: an art teacher asked me to fill in an application form for her, seeing as how I was the clerical officer and knew how these things worked.)
Our national government in their wisdom (bless their little hearts) decided this year that instead of live persons handing out forms and taking them back and processing them, all in the name of greater efficiency and cost-cutting, they would put student grant application forms online.
That worked about as well as you'd expect: there's a two-month delay in processing the applications and getting the grants awards paid, and they've actually had to take on extra staff to speed up the process.
Yes, indeedy, as we could have told them if they'd asked. But nobody listens to the lowly paper-pushers on the coalface. (This is another reason why I'm so sceptical of the previous post quoting the libertarian stating her preference for the "smart, agenty physicist versus public health official", because I've seen these snazzy new ideas being put forward by consultants and outside experts, but nobody ever asks for input from those of us who deal with the public and know about the likely pitfalls.)
God, that made my evening. Stay classy, LW!
> One person not giving a random number for the "Is the tallest redwood tree taller or shorter than your randomly generated number" question, but still declaring that the tree was shorter.
Oops, that sounds like it might have been me...
Maybe the form just didn't accept the +∞ that random.org decided to spit out for some bizarre reason.
"1138 people (96.2%) chose the first option, and 12 people (1%) the second. "
I work with survey data for a living. (Or more precisely, part of my job is dealing with survey data, but that doesn't sound so impressive as a claim to expertise.)
I can tell you now that this level of what I call bozo-response is exceptional.
Exceptionally low, that is!
Good luck with the rest of the analysis. And with facing the brickbats you'll get when you present the results!
When dealing with the public-at-large... yeah, 5% is low. Doing Data Entry or Market Research is an amazing way to kill all faith and hope in the species >.>
Two people started the survey before it was open, even though it said in huge letters "DO NOT START SURVEY, IT IS NOT OPEN YET" and even though the first question on the survey was "Do you understand the survey is not yet open and that you should not take it?" One of these people had IQ 140. He ticked "Yes" to the "Do you understand..." question.
If that was me, I apologize. :(
Thank you for leaving me collapsed on my desk laughing. You do a wonderful rant. :)
Don't forget that at any one time, 30%* of web users are drunk, high, or otherwise impaired. I think that goes a long way to explaining why 'smart' people wrote dumb responses.
*source: figure pulled out of my ass
Thanks for this! I've had similar experiences with surveys given to Personal Genome Project participants, so I found it hilarious. Our participants are on the whole lovely, but free text fields are an invitation for weird and/or unusable responses.