the story of how n why i had to import a 1.6 million line CSV file into libreoffice calc (thread)
my dumbass wanted to know what neighborhoods it was safest to jog in without catching covid. so what do I do? i download the AZDHS& #39;s zip code data from https://adhsgis.maps.arcgis.com/apps/opsdashboard/index.html#/84b7f701060641ca8bd9ea0717790906">https://adhsgis.maps.arcgis.com/apps/opsd...
we got zip codes on the right, and case counts on the left. ignore the fact that half this shitty ass data is REDACTED (thank u doug ducey)
but uh, all i got here is zip codes and case counts. how am i supposed to calculate the density of covid cases when all i have is the zip code??
ah yeah, this lil guy https://www.census.gov/geographies/reference-files/time-series/geo/gazetteer-files.html">https://www.census.gov/geographi...
ah yeah, this lil guy https://www.census.gov/geographies/reference-files/time-series/geo/gazetteer-files.html">https://www.census.gov/geographi...
that data& #39;s nice, i& #39;m a fan of it. apparently the GEOID column is _supposed_ to not line up exactly with ZIP codes but every ZIP i looked at lined up real well. had to clean it up a lil bit formatting wise but who care
pop that shit into a separate spreadsheet and use the most convoluted ass LOOKUP command to reference the area and we good to go
here& #39;s that command pasted here btw lol:
=LOOKUP(A40,$& #39;ZIP Code Area& #39;.A:A,$& #39;ZIP Code Area& #39;.D:D)+LOOKUP(A40,$& #39;ZIP Code Area& #39;.A:A,$& #39;ZIP Code Area& #39;.E:E)
here& #39;s that command pasted here btw lol:
=LOOKUP(A40,$& #39;ZIP Code Area& #39;.A:A,$& #39;ZIP Code Area& #39;.D:D)+LOOKUP(A40,$& #39;ZIP Code Area& #39;.A:A,$& #39;ZIP Code Area& #39;.E:E)
if u live in zip code 85363 you are probably old as hell and not on twitter but also i am so sorry
but then i was like "hmmm, the AZDHS doesn& #39;t provide data on covid cases per capita per zip code"
ok guess it& #39;s my time to shine
ok guess it& #39;s my time to shine
finding a dataset for populations by zip code is HARD. i searched for at least an hour. that shit is HIDDEN -- the US census doesn& #39;t generally delineate areas by ZIP code, they delineate by CDP (census designated places) or counties
that link for the full data set? shit don& #39;t work
so i found out that Google themselves got a dataset search page. for research only. https://datasetsearch.research.google.com/search?query=population%20by%20zip%20code&docid=J4KuZ00tbZazynllAAAAAA%3D%3D">https://datasetsearch.research.google.com/search...
lo and behold, some random ass researcher named the "US Census Bureau" uploaded all the population-by-zip-code data to fuckin Kaggle. of course. it& #39;s nowhere to be found on the official census website. but ohhhh boy it& #39;s here.
(link: https://www.kaggle.com/census/us-population-by-zip-code/data?select=population_by_zip_2010.csv)">https://www.kaggle.com/census/us...
the three genders: male, female, and #VALUE!
so i downloaded that shit and i slapped it into libreoffice calc as fast as i could. you do NOT know how excited i was
so if I can& #39;t open this file as a spreadsheet, I guess I& #39;ll just write some code to parse it out and extract the AZ zipcodes?
nah
i write code all day give me a damn break
nah
i write code all day give me a damn break
so.... time to set out to figure out how to import a CSV into a libreoffice base. but uh. looks like that& #39;s gonna be a stretch
so i did what any reasonable person would do, and scoured the menus for something that resembled a CSV file import
lo and behold....
lo and behold....
now all that& #39;s left is to re-export this as a CSV......
tried to jump to the end of the table to see how many rows there are and I realized very quickly that I made a grave mistake