the story of how n why i had to import a 1.6 million line CSV file into libreoffice calc (thread)
my dumbass wanted to know what neighborhoods it was safest to jog in without catching covid. so what do I do? i download the AZDHS's zip code data from https://adhsgis.maps.arcgis.com/apps/opsdashboard/index.html#/84b7f701060641ca8bd9ea0717790906
but this is what that data looked like
we got zip codes on the right, and case counts on the left. ignore the fact that half this shitty ass data is REDACTED (thank u doug ducey)
but uh, all i got here is zip codes and case counts. how am i supposed to calculate the density of covid cases when all i have is the zip code??

ah yeah, this lil guy https://www.census.gov/geographies/reference-files/time-series/geo/gazetteer-files.html
that data's nice, i'm a fan of it. apparently the GEOID column is _supposed_ to not line up exactly with ZIP codes (it's really ZCTAs, the census's approximation of ZIP codes) but every ZIP i looked at lined up real well. had to clean it up a lil bit formatting wise but who care
pop that shit into a separate spreadsheet and use the most convoluted ass LOOKUP command to reference the area and we good to go

here's that command pasted here btw lol:

=LOOKUP(A40,$'ZIP Code Area'.A:A,$'ZIP Code Area'.D:D)+LOOKUP(A40,$'ZIP Code Area'.A:A,$'ZIP Code Area'.E:E)
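
for anyone who doesn't speak spreadsheet, here's roughly what that formula is doing, as a python sketch. assumptions, loudly: columns D and E are taken to be the gazetteer's ALAND_SQMI and AWATER_SQMI (land + water area in square miles), the `zip` / `cases` field names are made up, and so is every number in the sample rows. one difference worth knowing: LOOKUP expects the search column to be sorted, while a dict lookup is an exact match either way.

```python
def build_area_index(gazetteer_rows):
    """Map ZIP (as a string) -> land + water area in square miles.

    Assumes each row has GEOID, ALAND_SQMI and AWATER_SQMI fields.
    """
    return {
        row["GEOID"].strip(): float(row["ALAND_SQMI"]) + float(row["AWATER_SQMI"])
        for row in gazetteer_rows
    }


def cases_per_sq_mile(case_rows, area_by_zip):
    """Return {zip: cases / area}, skipping rows that can't be computed."""
    density = {}
    for row in case_rows:
        zip_code = row["zip"].strip()
        area = area_by_zip.get(zip_code)
        if not area:
            continue  # ZIP missing from the area sheet, or zero area
        try:
            cases = int(row["cases"])
        except ValueError:
            continue  # redacted / non-numeric case count
        density[zip_code] = cases / area
    return density


# made-up sample rows, NOT real AZDHS or census numbers
gazetteer = [
    {"GEOID": "85001", "ALAND_SQMI": "2.0", "AWATER_SQMI": "0.5"},
    {"GEOID": "85002", "ALAND_SQMI": "4.0", "AWATER_SQMI": "0.0"},
]
cases = [
    {"zip": "85001", "cases": "10"},
    {"zip": "85002", "cases": "REDACTED"},  # hypothetical redaction marker
    {"zip": "99999", "cases": "3"},         # not in the area sheet
]

print(cases_per_sq_mile(cases, build_area_index(gazetteer)))
# {'85001': 4.0}
```
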
if u live in zip code 85363 you are probably old as hell and not on twitter but also i am so sorry
but then i was like "hmmm, the AZDHS doesn't provide data on covid cases per capita per zip code"

ok guess it's my time to shine
finding a dataset for populations by zip code is HARD. i searched for at least an hour. that shit is HIDDEN -- the US census doesn't generally delineate areas by ZIP code, they delineate by CDP (census designated places) or counties
google seemed so promising
yet alas, this is the census population for zip codes in the city of LA and nothing else
that link for the full data set? shit don't work
you do not know how many pages i scrolled through until i figured it was useless
so i found out that Google themselves got a dataset search page. for research only. https://datasetsearch.research.google.com/search?query=population%20by%20zip%20code&docid=J4KuZ00tbZazynllAAAAAA%3D%3D
lo and behold, some random ass researcher named the "US Census Bureau" uploaded all the population-by-zip-code data to fuckin Kaggle. of course. it's nowhere to be found on the official census website. but ohhhh boy it's here.
this dataset does some real silly and quirky things, like

assigning a gender to each zip code
the three genders: male, female, and #VALUE!
so i downloaded that shit and i slapped it into libreoffice calc as fast as i could. you do NOT know how excited i was
and, uh
ah. okay. gotcha.
so if I can't open this file as a spreadsheet, I guess I'll just write some code to parse it out and extract the AZ zipcodes?

nah

i write code all day give me a damn break
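
(for the record, the route being dodged here is pretty short. a minimal sketch, assuming the file has a header row, that AZ ZIP codes are the ones starting with 85 or 86, and that the ZIP column is called something like `zipcode`, which is a guess. it streams one row at a time so the 1.6 million lines never have to fit in memory.)

```python
import csv


def filter_rows_by_zip_prefix(src, dst, zip_column, prefixes=("85", "86")):
    """Copy rows from src to dst whose ZIP starts with one of `prefixes`.

    src and dst are open text file objects; zip_column is the header name
    of the ZIP column (hypothetical here, check the real file).
    Returns the number of rows kept.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:  # one row at a time, no full load
        zip_code = (row[zip_column] or "").strip()
        if zip_code.startswith(prefixes):
            writer.writerow(row)
            kept += 1
    return kept


# usage, with hypothetical file names:
# with open("population_by_zip.csv", newline="") as src, \
#         open("az_only.csv", "w", newline="") as dst:
#     print(filter_rows_by_zip_prefix(src, dst, "zipcode"))
```
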
database time :^)
so.... time to set out to figure out how to import a CSV into libreoffice base. but uh. looks like that's gonna be a stretch
so i did what any reasonable person would do, and scoured the menus for something that resembled a CSV file import

lo and behold....
yes. YES.
thank u for importing my, uh, "address book"
lol
now all that's left is to re-export this as a CSV......
tried to jump to the end of the table to see how many rows there are and I realized very quickly that I made a grave mistake
I AM FIGHTING TOOTH AND NAIL TO NOT HAVE TO WRITE ANOTHER LINE OF CODE
WE BALL
(will continue this thread after work tomorrow -- it's 3 AM here and i'm in a haze!)
You can follow @DataKinds.