117M plant records after a spatial join with the map "Terrestrial Ecoregions of the World" (URL: https://www.worldwildlife.org/publications/terrestrial-ecoregions-of-the-world) by Olson et al. (2001). The spatial join was done with PostgreSQL/PostGIS and took around 6 hours.
PostGIS makes this kind of large spatial join easy by adding its own spatial functions to SQL. In this case, ST_Intersects compares the geometries of the species records and the ecoregion polygons, and returns TRUE when a record from "plantae" intersects a polygon from "ecoregions".
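To make concrete what the join tests for point records, here is a minimal Python sketch of a point-in-polygon spatial join. It is not how PostGIS evaluates ST_Intersects (PostGIS relies on the GEOS library for geometry predicates and, with a GiST index on the geometry columns, first prunes candidate pairs by bounding box). Instead it swaps in a plain ray-casting test inside a nested-loop inner join, purely for illustration. The function names, record IDs and toy polygons are hypothetical, and polygons are assumed to be simple rings without holes.

```python
from typing import Dict, List, Tuple

# Hypothetical types and names for illustration; a sketch, not PostGIS internals.
Point = Tuple[float, float]
Polygon = List[Point]  # simple ring, vertices in order, not repeated at the end


def _on_segment(p: Point, a: Point, b: Point, eps: float = 1e-12) -> bool:
    """True if p is collinear with a-b and lies within the segment's bounding box."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    if abs(cross) > eps:
        return False
    return (min(ax, bx) - eps <= px <= max(ax, bx) + eps
            and min(ay, by) - eps <= py <= max(ay, by) + eps)


def point_intersects_polygon(p: Point, poly: Polygon) -> bool:
    """Point/polygon 'intersects': the point is inside the ring or on its boundary."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        a, b = poly[i], poly[(i + 1) % n]
        if _on_segment(p, a, b):
            return True  # a point on the boundary counts as intersecting
        (ax, ay), (bx, by) = a, b
        # Ray casting: toggle on each edge crossed by a ray going right from p.
        if (ay > y) != (by > y):
            x_cross = ax + (y - ay) * (bx - ax) / (by - ay)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(records: Dict[str, Point],
                 regions: Dict[str, Polygon]) -> List[Tuple[str, str]]:
    """Nested-loop inner join: one (record, region) row per intersecting pair."""
    return [(rec_id, name)
            for rec_id, pt in records.items()
            for name, poly in regions.items()
            if point_intersects_polygon(pt, poly)]


# Toy data (hypothetical): two adjacent square "ecoregions" and four records.
regions = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
records = {"r1": (5, 5), "r2": (15, 5), "r3": (10, 5), "r4": (30, 30)}

print(spatial_join(records, regions))
```

As with an SQL inner join on ST_Intersects, a record lying on a shared border (r3) yields a row for both ecoregions, and a record outside every polygon (r4) yields no row at all.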
Even though the screen capture was taken in pgAdmin4, the operation was run from RStudio with the package "RPostgreSQL". It establishes a connection to a PostgreSQL database, so you can use either dplyr or SQL code chunks in an .Rmd file to process the data stored in the database.
So far, from a single .Rmd file, I have been using Spark to clean the data efficiently, the system console to set up the PostgreSQL/PostGIS database, and the database itself to do the spatial operations.
I have been working with GRASS GIS on the side to prepare the environmental data, but yesterday I learned I should have been using the R package rgrass7 to do that from the .Rmd as well. Anyway, the rasters I prepared will now be swallowed by PostgreSQL/PostGIS too.
You can follow @BlasBenito.