Annotation Hell


I hope you like my happy title for this post. After developing annotation manuals that formally explain how to code the behaviors we observe in our experimental data, the next task is to annotate the data according to those manuals. For this particular experiment, we decided to code the behavior in Microsoft Excel, although there is also specialized software, called ELAN, that synchronizes each code with a specific point in the video and audio. I mention ELAN because that's what we typically use, but below I have included a diagram of an annotated Excel spreadsheet to show what annotation means in this particular round.


The Q's and R's above stand for "questions" and "responses". These were annotated using my intelligent colleague Michelina Astle's annotation manual. To check whether we are observing a real phenomenon, we first have the same file coded by two different people. We then delete the utterances and timestamps, replace every code with a number (1 or 2), and replace blank cells with 0's. Then we submit this to a website, which calculates a statistic called Krippendorff's alpha: a measure of how far the two coders' agreement exceeds what we would expect by chance. If agreement is high, we can be confident the phenomenon is real and that we're not just annotating random noise.

Once we're sure the phenomenon is real, we go and annotate many, many files. As you can see, annotating a file takes time, typically at least an hour, because we have to read and think about each utterance and sometimes go back to the manual to determine the right code. It's tedious, but it's worth it. The product of this hard work is a coded file that is easy to import into RStudio and can be analyzed in a few minutes to look for correlations and other patterns.
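For readers curious about what the agreement website is computing, here is a minimal sketch in Python. It is not the website's code (the site isn't named here); it is the standard nominal-level Krippendorff's alpha for exactly two coders with no missing values. It assumes the recoding described above (Q → 1, R → 2, blank → 0) and treats 0 as its own category rather than as missing data. The names `recode`, `CODE_TO_NUMBER`, and `krippendorff_alpha_nominal` are hypothetical, chosen for illustration.

```python
from collections import Counter
from itertools import permutations

# Assumed recoding, following the post: "Q" -> 1, "R" -> 2, blank cell -> 0.
CODE_TO_NUMBER = {"Q": 1, "R": 2, "": 0}


def recode(labels):
    """Convert one coder's column of Q/R/blank labels to 1/2/0."""
    return [CODE_TO_NUMBER[label.strip().upper()] for label in labels]


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal codes, no missing values."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same utterances")
    if not coder_a:
        raise ValueError("need at least one coded utterance")

    # Coincidence matrix: every utterance contributes the ordered pairs
    # (a, b) and (b, a); with two coders per utterance each pair has weight 1.
    coincidences = Counter()
    for a, b in zip(coder_a, coder_b):
        coincidences[(a, b)] += 1
        coincidences[(b, a)] += 1
    n = 2 * len(coder_a)  # total number of pairable values

    # How often each code occurs across both coders.
    totals = Counter()
    for (c, _), count in coincidences.items():
        totals[c] += count

    # Observed disagreement: share of pairs whose two codes differ.
    observed = sum(cnt for (c, k), cnt in coincidences.items() if c != k) / n
    # Expected disagreement: how often codes would differ if paired at random.
    expected = sum(
        totals[c] * totals[k] for c, k in permutations(totals, 2)
    ) / (n * (n - 1))

    if expected == 0:
        return float("nan")  # alpha is undefined when only one code was ever used
    return 1 - observed / expected


if __name__ == "__main__":
    coder_a = recode(["Q", "R", "", "Q"])
    coder_b = recode(["Q", "R", "", "R"])
    print(round(krippendorff_alpha_nominal(coder_a, coder_b), 3))  # 0.667
```

In this toy example the coders disagree on one utterance out of four, which gives an alpha of about 0.67. Perfect agreement gives 1.0, and agreement no better than chance gives a value near 0.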

I'm describing the annotation process because it's what I spent nearly all my hours doing last week. I hope I've made clear that the process takes longer than it may at first seem, but that it's rewarding.

