Over the next two weeks, I will tweet about this term's @uocommonlaw "Data Science for Lawyers" projects.

I start with a training assignment in which students investigated 107 revisions of the Canadian #Immigration Regulation (SOR/2002-227), each over 40k words long. (1/5)
First, they needed to use regular expressions (which search for patterns rather than keywords) from @datascience4law Lesson 3 to extract years from filenames and plot the number of revisions by year. (2/5)
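For readers curious what that step could look like, here is a minimal Python sketch. The filenames are hypothetical (the thread does not show the real naming scheme), and the pattern simply assumes each name contains one four-digit year; it is not necessarily the approach taught in Lesson 3.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical filenames; the actual naming scheme is not given in the thread.
filenames = [
    "regulation_2015-06-30.txt",
    "regulation_2015-12-18.txt",
    "regulation_2016-03-11.txt",
]

# A four-digit year starting with 19 or 20, not embedded in a longer number.
YEAR_PATTERN = re.compile(r"(?<!\d)((?:19|20)\d{2})(?!\d)")


def extract_year(filename: str) -> Optional[int]:
    """Return the first year-like token in the filename, or None if absent."""
    match = YEAR_PATTERN.search(filename)
    return int(match.group(1)) if match else None


# Count revisions per year, skipping names without a recognisable year.
years = (extract_year(name) for name in filenames)
revisions_by_year = Counter(year for year in years if year is not None)
```

The resulting counts can be handed to any plotting library as a bar chart. Note that a name containing the regulation number itself (2002-227) would match on 2002 first, so the pattern would need adjusting in that case.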
They then calculated and plotted the number of words in each revision. It turns out that #immigration regs have become longer over time. (3/5)
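A rough sketch of the word count, assuming each revision is already loaded as a plain string. The two sample texts are invented for illustration, and "word" here just means a run of letters or digits, which is one of several reasonable definitions.

```python
import re

# Invented sample texts standing in for full revisions, keyed by a label.
texts = {
    "2015": "An officer shall examine the application.",
    "2019": "An officer shall examine the application and collect biometric information.",
}


def count_words(text: str) -> int:
    """Count runs of word characters; hyphenated terms count as separate words."""
    return len(re.findall(r"\w+", text))


# Word count per revision, ready to plot against the revision date.
word_counts = {label: count_words(text) for label, text in texts.items()}
```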
But how can we tell meaningful revisions from small tweaks? By calculating textual similarity following @datascience4law Lesson 6. The resulting similarity heat map reveals three large groupings of similar texts over time, with the wording changing drastically from one grouping to the next. (4/5)
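To make the idea concrete, here is a small sketch of a pairwise similarity matrix. It uses cosine similarity on raw word counts, which is an assumption on my part: the thread does not say which measure Lesson 6 uses. The three sample texts are invented.

```python
import math
import re
from collections import Counter

# Invented sample texts standing in for three revisions.
revisions = {
    "2015": "an officer shall examine the application",
    "2016": "an officer shall examine the application",
    "2019": "an officer shall collect biometric information from the applicant",
}


def word_vector(text: str) -> Counter:
    """Bag-of-words vector: lower-cased word -> number of occurrences."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


labels = sorted(revisions)
vectors = [word_vector(revisions[label]) for label in labels]

# Square matrix of pairwise similarities; rows and columns follow `labels`.
similarity = [[cosine_similarity(u, v) for v in vectors] for u in vectors]
```

Drawn as a heat map with revisions in date order, runs of near-identical texts show up as bright square blocks along the diagonal, and a drop in similarity between neighbouring blocks marks a substantial rewrite.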
By selecting dates within each of the groupings, we can then use @CanLII to compare the wording of the selected regulations to find out what changed. The first transition, for example, is linked to the introduction of a new section on biometric information. (5/5)
I am grateful to @jdneilbouwer, who pointed me to the study of federal regulations three years ago. I have been having fun ever since.
You can follow @w_alschner.