Recently came across a quick read from @laroyo & @cawelty (2015) in AI Magazine on 7 "myths" about data collection. Their perspective is still worth thinking about b/c...

1/10
-Many of these questions are still heavily discussed (in #NLProc)
-Synergies abound with current ideas about "fair and unbiased" data collection
-NLP/ML/AI moves quickly. We don't want to reinvent the wheel over and over and over and...

2/10
In their words: "We have discovered the following myths that directly influence the practice of collecting human annotated data. Like most myths, they are based in fact but have grown well beyond it, and need to be revisited in the context of the new changing world..."

3/10
"Myth One: One Truth
Most data collection efforts assume that there is one correct interpretation for every input example."

4/10
"Myth Two: Disagreement Is Bad
To increase the quality of annotation data, disagreement among the annotators should be avoided or reduced."

5/10
"Myth Three: Detailed Guidelines Help
When specific cases continuously cause disagreement, more instructions are added to limit interpretations."

6/10
"Myth Four: One Is Enough
Most annotated examples are evaluated by one person."

7/10
"Myth Five: Experts Are Better
Human annotators with domain knowledge provide better annotated data."

8/10
"Myth Six: All Examples Are Created Equal
The mathematics of using ground truth treats every example the same; either you match the correct result or not."

9/10
"Myth Seven: Once Done, Forever Valid
Once human annotated data is collected for a task, it is used over and over with no update. New annotated data is not aligned with previous data."

(feel free to share your reactions in comments on each myth or ...un-myth?)

10/10
A final note: the myths are all linked to the fact that annotation is not a passive checking-over or an idealized process. Instead, our annotators are actively interpreting the world through their own eyes (cf. @MilagrosMiceli et al. 2020 https://dl.acm.org/doi/10.1145/3415186)

11/10
You can follow @adinamwilliams.