Friday, April 25, 2014

DATA, DATA, DATA - Finding and Using and Giving Meaning to Data Available Online from Chrys Wu

Rough notes, as she said it. Take this with a big grain of salt. But there's lots of good info and links in here for using data that's out there. (This was a really good, content-rich session.)

Chrys Wu - What to think about when you're thinking about DATA

About me:  

DataKind - using data to advance the public good.  In New York, but you can volunteer around the world.

Altered Oceans - project won the Pulitzer Prize (Note:  I didn't win the prize, just worked on it.)

About You:

Work for NYTimes, Developer Advocate
but lots of other roles before in different organizations.

Election map, interactive graphics, technology group - about 300 people considered developers, who make things possible for people coming to the website or through mobile phones.

I'm a little bit of glue and little bit of grease.  Help others get their work done.

My role to solve problems and help people.

Collecting Data
People should know what data is out there and where to get it.
I work with national pubs; for regional, wait til Q&A.

Govt. agencies - FRED (Federal Reserve Economic Data) - Make FRED your friend - terrific trove of economic data
GeoFRED - maps
ALFRED - archive of economic data -- code for 0380 for Alaska will pinpoint Alaska
Get the Excel add-in

US Census Data - PUMS - public use microdata and IPUMS

Gives a lot of insight into what's happening in Alaska. PUMS is what the feds are supplying, a sample, to help understand trends in households.
IPUMS - U of Minn. -






National Conference of State Legislatures - tracks legislation


NGOs

World Bank - lots of data, particularly on poverty, also a good Tumblr account


The Internet Archive - SF, Friday around noon - they serve a free lunch and talk about the Internet Archive

Collect tv advertising campaigns, old newspapers, etc. 


Draw from your own well

Set up your own database.

Cleaning Data

Tabula - how to use - cracks pdfs.

School of Data -

OpenRefine (formerly Google Refine) can deal with >1 million records
Tutorials for OpenRefine - GitHub

David Huynh Full Tutorial (2009- still relevant)

Gotchas -
typos,
disambiguation (making sure these john smiths are the same or not, which Manhattan, etc)
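
A minimal sketch of how the typo and disambiguation gotchas might be caught in code, using only Python's standard library. The function names, the 0.85 similarity cutoff, and the sample names are all assumptions for illustration, not something from the session. It only flags look-alike spellings; deciding whether two John Smiths are the same person still takes a human.

```python
import difflib

def normalize(name):
    """Lowercase, drop periods and commas, and collapse extra whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())

def possible_duplicates(names, cutoff=0.85):
    """Return pairs of distinct spellings similar enough to check by hand.

    The cutoff is an arbitrary starting point (an assumption); tune it per dataset.
    """
    unique = sorted({normalize(n) for n in names})
    pairs = []
    for i, name in enumerate(unique):
        # Compare each spelling only against the ones after it, so each pair appears once.
        for match in difflib.get_close_matches(name, unique[i + 1:], n=5, cutoff=cutoff):
            pairs.append((name, match))
    return pairs

# Made-up sample names, for illustration only.
sample = ["John Smith", "Jon Smith", "john smith ", "Mary Jones"]
for a, b in possible_duplicates(sample):
    print(f"Check by hand: {a!r} vs {b!r}")
```

Exact duplicates that differ only in case or spacing collapse during normalization; the near-misses are the ones worth a phone call.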

Excel limits - watch for files with exactly
1,048,576
65,536
32,000
An exact match usually means the export ran out of room because it exceeded the limit - there's more data, and you have to go back and get it. Get on the phone and talk to people.
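
A quick way to run this check on a CSV file, sketched in Python. It assumes the three numbers above are row counts (header included) and that the data arrives as CSV; the function names are made up for this example.

```python
import csv
import io

# Counts from the notes above; treating them as row counts is an assumption.
SUSPICIOUS_ROW_COUNTS = {1_048_576, 65_536, 32_000}

def count_rows(lines):
    """Count CSV records (header included) from a file object or any iterable of lines."""
    return sum(1 for _ in csv.reader(lines))

def looks_truncated(row_count):
    """True when the row count exactly matches one of the known limits."""
    return row_count in SUSPICIOUS_ROW_COUNTS

# Tiny in-memory example; for a real file, pass open(path, newline="") instead.
demo = io.StringIO("name,amount\nA,1\nB,2\n")
rows = count_rows(demo)
print(rows, "rows; possibly truncated:", looks_truncated(rows))
```

A match is only a hint that the file was cut off, so the follow-up is still to ask the source for the full data.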


QUESTIONS
Q:  How do you clean it, who is smarter?
A:  Call them and discuss it - they like to be alerted.  ProPublica has a data store;
they've chosen not to charge for cleaned data from FOIA searches.

Q:  Have you used Gapminder?  Plug data in and create a movable graphic.
A:  One shortcoming - doesn't allow you to do annotations  - Hans Rosling -

Q:  How village people in Anchorage maintain connections to villages through food - getting village food to urban areas - how might you approach that from a data perspective? Fish and Game doesn't break things out by indigenous groups and non-
A:  Find other overlapping data - can you use place? 

Q:  How do you vet sites?
A:  Generally look at the organizations collecting the data.  Talk directly to the source of first-level collection - NGOs, govt, even campaign reporting.  Pols have to report to agencies.  Even with ProPublica, you need to check it out.  The Sunlight Foundation - they're trustworthy - take govt data and make it more usable.
Look at who the entity is, what they're collecting, and the methodology.  Watch out for orgs that take data from different sources and try to mesh it.

Q:  Good tools or sites for government contracts?
A:  USAspending.gov (from audience)

Q:  Who reviews - like peer review - your stuff?
A:  Times - trust our reporters.  Editor's job to check and challenge the reporter.  For those who 'are' the newsroom, constantly check yourself.  I talk to a lot of experts, friends in academia, and check with them.

Three ways to look for numbers:

1.  Look for outliers - what's this weird thing?
2.  Look for the numbers that don't change while everything else is changing.  Journalists are trained to look for the movement, but maybe the thing that doesn't change is the real story.
3.  Comparisons - how it looks compared to other states?
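
The first two checks in the list above can be roughed out in a few lines of Python. This is a sketch under stated assumptions, not the speaker's method: the function names, the 2-standard-deviation threshold, the 1% tolerance, and the numbers are all invented for illustration.

```python
import statistics

def outliers(values, threshold=2.0):
    """Labels whose value sits more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values.values())
    stdev = statistics.pstdev(values.values())
    if stdev == 0:
        return []  # every value is identical, so nothing stands out
    return [label for label, v in values.items() if abs(v - mean) / stdev > threshold]

def unchanged(series_by_label, tolerance=0.01):
    """Labels whose series never drifts more than `tolerance` (relative) from its first value."""
    flat = []
    for label, series in series_by_label.items():
        first = series[0]
        if first != 0 and all(abs(x - first) / abs(first) <= tolerance for x in series):
            flat.append(label)
    return flat

# Invented numbers, purely to show the calls.
by_place = {"A": 10, "B": 11, "C": 9, "D": 10, "E": 10, "F": 40}
over_time = {"x": [100, 120, 150], "y": [50, 50, 50.2]}
print("Weird thing:", outliers(by_place))
print("Not moving:", unchanged(over_time))
```

Either result is a lead, not a finding; the next step is still to ask why the number looks that way.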

Amanda Cox  - statistician for NYTimes, worked at Bureau of Labor Statistics - has a fan club of colleagues who love her.  She's spectacular.  Thinks creatively.

Q:  Find what others have done?
A:  IRE - Investigative Reporters and Editors - search  - they will come and train you free.

Q:  Work at museum and we have a big archive we'd like to share.
A:  Digital?  Look at models:
NY Public Library David Reardon
Google
British Library Photostream


Quartz


