---
title             : "Data from ManyDogs 1"
shorttitle        : "Data from ManyDogs 1"

author:
  - name: ManyDogs Project
    affiliation: ''
    email: manydogsproject@gmail.com
  - name: Julia Espinosa
    affiliation: '1'
    # role:
    #   - Conceptualization
    #   - Data curation
    #   - Formal analysis
    #   - Funding acquisition
    #   - Methodology
    #   - Project administration
    #   - Supervision
    #   - Writing - original draft
    #   - Writing - review & editing
  - name: Elizabeth Hare
    affiliation: '2'
    # role:
    #   - Conceptualization
    #   - Data curation
    #   - Formal analysis
    #   - Methodology
    #   - Project administration
    #   - Software
    #   - Validation
    #   - Writing - original draft
    #   - Writing - review & editing
  - name: Daniela Alberghina
    affiliation: '3'
    # role:
    #   - Investigation
    #   - Validation
    #   - Writing - original draft
    #   - Writing - review & editing
  - name: Bryan Mitchel Perez Valverde
    affiliation: '4'
    # role:
    #   - Investigation
    #   - Validation
    #   - Writing - original draft
    #   - Writing - review & editing
  - name: Jeffrey R. Stevens
    affiliation: '5'
    # role:
    #   - Conceptualization
    #   - Data curation
    #   - Formal analysis
    #   - Methodology
    #   - Project administration
    #   - Software
    #   - Supervision
    #   - Visualization
    #   - Writing - original draft
    #   - Writing - review & editing
    email: jeffrey.r.stevens@gmail.com
    corresponding: yes
    address: B83 East Stadium, University of Nebraska-Lincoln, Lincoln, Nebraska 68588, USA

affiliation:
  - id: '1'
    institution: Department of Human Evolutionary Biology, Harvard University, Cambridge,
      MA, USA
  - id: '2'
    institution: Dog Genetics LLC, Astoria, NY, USA
  - id: '3'
    institution: Department of Veterinary Sciences, University of Messina, Messina,
      Italy
  - id: '4'
    institution: The Graduate Center, City University of New York, New York City, New York, USA
  - id: '5'
    institution: Department of Psychology, Center for Brain, Biology & Behavior, University
      of Nebraska-Lincoln, Lincoln, Nebraska, USA

authornote: |
  

abstract: |
  The ManyDogs 1 study is the first multi-site collaborative study of dogs’ responses to human pointing. It addressed whether dogs perceive the gesture as socially communicative and are therefore more likely to follow the point when it is paired with additional social signals (ManyDogs Project et al., 2023b). Researchers from 20 research sites across eight countries collected data from 704 dogs. Here, we present not only the behavioral data on the dogs’ responses to experimental pointing conditions but also guardian responses to survey questions, including the Canine Behavioral Assessment and Research Questionnaire (C-BARQ; Hsu and Serpell, 2003). This dataset allows for assessing associations among C-BARQ measures as well as connections to the experimental task data, research site metadata, and other dog and guardian characteristic data.
  
keywords          : "Canine; Dog; Interspecies interaction; Pointing; Social communication"
# wordcount         : "X"

bibliography      : "manydogs_etal_2024.bib"
csl               : "apa7_chron.csl"

floatsintext      : yes
linenumbers       : yes
draft             : no
mask              : no

figurelist        : no
tablelist         : no
footnotelist      : no

classoption       : "man"
output            : papaja::apa6_pdf
header-includes:
   - \usepackage{pdflscape}
---

```{r setup, include = FALSE}
library(knitr)
library(kableExtra)
library(tidyverse)
library(papaja)
library(flextable)
r_refs("r-references.bib")

# Load files
data <- read_csv("manydogs_etal_2024_data.csv", show_col_types = FALSE)
data_rows <- nrow(data)
codebook <- read_csv("manydogs_etal_2024_codebook.csv", show_col_types = FALSE, col_select = c(1:5)) |> 
  mutate(description = sub("\\('", "\\(\"", description),
         description = sub("'\\)", "\"\\)", description))
# Column names must match the codebook entries exactly, in the same order
stopifnot(identical(names(data), codebook$variable_name))
```

```{r analysis-preferences}
# Seed for random number generation
set.seed(42)
knitr::opts_chunk$set(cache.extra = knitr::rand_seed)
```


## (1) Background
<!-- Provide background to the data (1000 words maximum). This might include aim(s), main question(s) intend to addressed, topics covered and related theory including literature references and acknowledgement of all major uses of the data. This section should not constitute a full literature review of the topic area, but instead help highlight the literature that informed the motivation and purpose behind collection of the data and potential for use. Where broad comments are made and a wide variety of sources are available to evidence these claims e.g., that humour can lead to laughter, or a specific measure has questionable psychometric properties, preference should be given to acknowledge sources of evidence created by communities who are currently being, or have historically been, marginalised or discriminated against. -->
ManyDogs is an international research consortium of scientists with a shared interest in the factors driving canine behavior and cognition [@ManyDogsProject.etal.2023a]. This consortium actively fosters a diverse community and formalizes a transparent and equitable process for engaging in multi-site collaborative projects related to canine behavior and cognition. In the first ManyDogs study---named ManyDogs 1 [@ManyDogsProject.etal.2023]---we investigated a question of theoretical importance in canine science: Do dogs act on human pointing signals as though they are communicative social cues? Domestic dogs (_Canis familiaris_) have become a popular animal model for investigating behavioral and cognitive evolution due to their shared ecological niche with humans and because they are plentiful, easy-to-access research subjects in many parts of the world. Unlike humans’ more closely related primate relatives (e.g., chimpanzees, _Pan troglodytes_) and laboratory-bred rodent models of behavior and cognition, dogs are embedded in the human environment, living in our homes and navigating our workplaces. Dogs have been intentionally bred to live in these spaces and interact with humans, making them a ready comparison species in which to investigate the origins of cognitive processes. Interest in their putatively innate ability to interact and cooperate with humans has made them particularly popular in comparative studies, especially as they appear to respond to human communicative cues---such as pointing---more accurately and flexibly than other species [e.g., @Brauer.etal.2006a]. Though point following behavior in dogs has been widely observed and studied over recent decades [@Miklosi.etal.1998a; @Soproni.etal.2001a; @Hare.etal.2002a; @Kaminski.Nitzschner.2013a], there is still disagreement as to the underlying motivation for the behavior. 
Do dogs respond to pointing because they interpret the gesture as socially communicative [@Hare.Tomasello.1999;@Soproni.etal.2001a;@Kaminski.Nitzschner.2013a]? Or rather, because dogs have learned to associate human pointing with food rewards [e.g., @Wynne.etal.2008a]?

To investigate this question, we used a big team science, single-study approach, modeled after other groups such as ManyBabies [@Frank.etal.2017] and ManyPrimates [@ManyPrimates.etal.2019a]. Big team science involves "endeavors in which an unusually large number of researchers — often dispersed across institutions and world regions — self-organize to pool intellectual and material resources in pursuit of a common goal" [@Coles.etal.2022]. With this approach, multiple research teams followed the same experimental protocol, sharing the high cost of behavioral data collection and striving to implement the method in an identical manner. This approach replicated the study simultaneously in different research environments and with different populations. Big team science is important in animal cognition work generally because it greatly increases sample sizes and diversity and enhances task design [@Alessandroni.etal.2024]. This approach is particularly important in canine cognition because, due to the larger and more diverse samples, big team science allows us to answer new questions previously unattainable with smaller, more homogeneous samples [@ManyDogsProject.etal.2023a]. This includes the role of breed, life history, training, and geographical location on behavior.

Under our main hypothesis, we predicted that when dogs saw a pointing gesture paired with _ostensive_ signals, such as dog-oriented eye gaze and dog-directed speech (i.e., calling the dog’s name), they would be more likely to follow the gesture than when no such ostensive cues accompanied the point. If we observed this response across dogs, the result would lend support to the idea that explicitly communicative cues help dogs understand the intention behind the gesture. Such an outcome would suggest that dogs find ostensive cues necessary for understanding pointing, similar to human children [@Behne.etal.2005a].  On the other hand, if no difference was observed in point following across the ostensive and non-ostensive conditions (pointing without additional voice or gaze cues), this outcome would suggest that dogs indiscriminately follow pointing. Such a result would suggest that dogs raised by humans may learn to associate pointing limbs with rewards and not necessarily perceive any communicative intention underlying the gesture.

In addition to testing our main hypothesis, we took the opportunity offered by multiple research teams in different sites collaborating on the same study to collect data on sources of inter-site variability that could influence the results. Often, studies by different groups produce inconsistent results [@Rodriguez.etal.2021a]. Cultural differences in scientific practice, regional differences in dog training norms, and variation in heritable traits across dog breeds have complicated replication studies conducted by isolated groups, making it difficult to pinpoint the reasons for inconsistent results. By collecting extensive and detailed information about the testing environments and subject population, we assembled a rich and robust dataset that supports the investigation of multiple influences on dogs’ behavior that were previously out of reach.


## (2) Methods
<!-- Describe the methods used for data collection including all sub-headings below as deemed relevant. There is no word limit to this section; we are looking for a carefully detailed account of the data collection methods with sufficient detail to allow replication and meaningful secondary use of the data. -->

## 2.1 Study design
<!-- **Please describe the overall study design as requested here. This section is not about surveys but general study characteristics.** -->
<!-- _Provide a clear overview of the study design. This should involve broader features, for example whether the data is cross-sectional or longitudinal, from online questionnaires or face-to-face interviews. A clear overview of the central foci and variables of the data should be presented, including differentiation of between- and within-participant dimensions wherever relevant._ -->

The ManyDogs 1 study used a cross-sectional, multi-method approach to collecting data. Dog guardians were recruited through the individual research sites’ existing databases and via their respective outreach methods (e.g., social media). Prior to participating in the behavioral tasks at a research site, guardians completed an online survey, providing basic environment and demographic information along with a validated assessment of canine temperament and behavior, the Canine Behavioral Assessment and Research Questionnaire [C-BARQ, @Hsu.Serpell.2003a]. The behavioral tasks included a short series of object-choice warm-ups that acclimated the dog to the space, followed by two experimental pointing conditions. Using a within-subjects design, a trained researcher tested each dog on two pointing cues, ostensive and non-ostensive, with the order counterbalanced across subjects. Response rates to these two styles of pointing were compared within subjects, while additional between-subject variables derived from the survey data supported investigating variability in behavior as a function of demographic and environmental factors.

## 2.2 Time of data collection
<!-- Specific time periods in which the data was collected e.g., December 2020-Febuary 2021, or 1st-15th of May 2019. Date of data collection will not be considered an evaluative criteria of the work.  -->
Data for the study were collected over 13 months, between `r as.character(month(sort(data$date)[1], label = TRUE, abbr = FALSE))` `r year(sort(data$date)[1])` and `r as.character(month(sort(data$date[!is.na(data$date)], decreasing = TRUE)[1], label = TRUE, abbr = FALSE))` `r year(sort(data$date[!is.na(data$date)], decreasing = TRUE)[1])`. Within this time window, each research site decided when to implement the protocol according to guardian and staff availability (collection dates are available in the dataset).

## 2.3 Location of data collection
<!-- List regions or countries covered by the data with as much detail as is possible e.g. Data was collected in South-East London, UK or at the University of Bologna, Italy. -->
For the main study, data were collected in 20 research sites across eight countries (Argentina, Canada, Croatia, Hungary, Italy, Poland, UK, USA) on three continents (Figure \@ref(fig:countries)). In addition, an Austrian site recorded only pilot data and is not represented in this dataset. A full list and description of research sites is available in Table \@ref(tab:sites).

```{r countries, fig.align="center", out.width="80%", fig.cap="ManyDogs 1 data presented here were collected from 20 research sites in eight countries: Argentina, Canada, Croatia, Hungary, Italy, Poland, UK, USA (dark blue). Pilot data not included in this dataset were collected from a site in Austria (light blue)."}
include_graphics("md1_countries.png")
```

```{r sites}
site <- c("Animal Health and Welfare Research Centre", 
          "Arizona Canine Cognition Center", 
          "Auburn Canine Performance Sciences", 
          "Boston Canine Cognition Center", 
          "Brown Dog Lab", 
          "Canid Behavior Research Group", 
          "Canine Cognition and Human Interaction Lab", 
          "Canine Cognition Center at Yale", 
          "Canine Companions", 
          "Canine Research Unit", 
          "Clever Dog Lab$^{*}$", 
          "Comparative Cognition Lab", 
          "Comparative Cognitive Science Lab", 
          "Consultorio Comportamentale", 
          "Department of Psychology and Individual Differences", 
          "Dog Cognition Centre", 
          "Duke Canine Cognition Center", 
          "Leader Dogs for the Blind", 
          "Social Cognition Lab", 
          "The Family Dog Project", 
          "Thinking Dog Center")
site_abbr <- c("ucs", "accc", "auburn", "bccc", "bdl", "icoc", "cchil", "yale", "cci", "crumun", "cdl", "manitoba", "urijeka", "umessina", "uwarsaw", "dcc", "duke", "ldbtdc", "queensu", "eltebuda", "tdc")
location <- c("Winchester, United Kingdom", "Tucson, AZ, USA", "Auburn, AL, USA", "Boston, MA, USA", "Providence, RI, USA", "Buenos Aires, Argentina", "Lincoln, NE, USA", "New Haven, CT, USA", "Santa Rosa, CA, USA", "St. John’s, NL, Canada", "Vienna, Austria", "Winnipeg, MB, Canada", "Rijeka, Croatia", "Messina, Italy", "Warsaw, Poland", "Portsmouth, United Kingdom", "Durham, NC, USA", "Rochester, MI, USA", "Dundalk, ON, Canada", "Budapest, Hungary", "New York City, NY, USA")
site_info <- data.frame(site, location, site_abbr)

site_latex <- site_info |> kable(booktabs = TRUE, format = "latex", escape = FALSE, linesep = "", table.envir = "table",
        caption = "Site information",
        col.names = c("Site", "Location", "Data abbreviation")
  ) |> 
  # kable_styling(latex_options = "scale_down") |> 
  column_spec(1, width = "8cm") |>
  # column_spec(3, width = "3.75cm") |>
  kableExtra::footnote(symbol = c("Clever Dog Lab participated only in the pilot data collection."))

site_html <- site_info |>
  mutate(site = sub("\\$\\^\\{\\*\\}\\$", "", site)) |> 
  flextable::flextable() |> 
  set_header_labels(site = "Site", location = "Location", site_abbr = "Data abbreviation") |> 
  fontsize(size = 10) |> 
  font(part = "all", fontname = "Times New Roman") |> 
  set_caption("Site information") |> 
  flextable::footnote(i = 11, j = 1, value = as_paragraph("Clever Dog Lab participated only in the pilot data collection.")) |> 
  font(part = "footer", fontname = "Times New Roman") |> 
  autofit()

if (knitr::is_latex_output()) {
  site_latex
} else {
  site_html
}

```

```{r}
status_count <- data |> 
  count(experiment_status) |> 
  drop_na()
```

## 2.4 Sampling, sample and data collection
<!--Describe the sample, including any basic demographic information collected, such as number of respondents, age (M, SD), educational background, socio-economic status, religion, and any other descriptive factors collected that are relevant to study design e.g., length of tenure if occupational in focus. The sampling strategy adopted should be fully detailed, including any payment or benefits offered to participants for participation. Any data that did not contribute to the final data set e.g. missing data or response rates, should also be reported here where relevant. -->
Across all sites, teams behaviorally tested `r data_rows` dogs (M:F = `r sum(data$sex == "Male", na.rm = TRUE)`:`r sum(data$sex == "Female", na.rm = TRUE)`, mean ± SD age = `r apa_num(mean(data$age, na.rm = TRUE), digits = 1)` ± `r apa_num(sd(data$age, na.rm = TRUE), digits = 1)` years [range = `r apa_num(min(data$age, na.rm = TRUE), digits = 1)`-`r apa_num(max(data$age, na.rm = TRUE), digits = 1)`]). Approximately `r apa_num(mean(data$desexed=="Yes", na.rm = TRUE) * 100, digits = 1)`% of the dogs were spayed or neutered, `r apa_num(mean(data$purebred=="Yes", na.rm = TRUE) * 100, digits = 1)`% were of single-breed ancestry (comprising `r length(unique(data$breed))` distinct breeds), `r apa_num(mean(data$owned_status=="Private home", na.rm = TRUE) * 100, digits = 1)`% lived in private homes, `r apa_num(mean(data$owned_status=="Group housing (e.g., working dog kennel)", na.rm = TRUE) * 100, digits = 1)`% lived in group/kennel housing, and `r apa_num(mean(data$owned_status=="Other", na.rm = TRUE) * 100, digits = 1)`% lived in other housing. We excluded `r status_count$n[status_count$experiment_status=="Incomplete"]` dogs because they did not complete the behavioral testing and `r status_count$n[status_count$experiment_status=="Error"]` dogs because of experimenter errors. Thus, complete behavioral data were collected from `r status_count$n[status_count$experiment_status=="Included"]` dogs, and complete survey data were collected from `r length(data$experiment_status[!is.na(data$cbarq_miscellaneous_26) & !is.na(data$cbarq_miscellaneous_27)])` dogs. Guardians identified as female (`r apa_num(mean(data$guardian_gender=="Female", na.rm = TRUE) * 100, digits = 1)`%), male (`r apa_num(mean(data$guardian_gender=="Male", na.rm = TRUE) * 100, digits = 1)`%), and nonbinary/other (`r apa_num(mean(data$guardian_gender=="Other", na.rm = TRUE) * 100, digits = 1)`%) with a modal guardian age range of `r sub(" - ", "-", slice_max(count(data, guardian_age), order_by = n)$guardian_age)` years. 
All labs that started data collection met our criteria for inclusion, so no labs were excluded.

## 2.5 Materials/Survey instruments
<!-- **Please describe all survey instruments as requested here. This should describe all materials in the survey for the complete dataset provided here (not just the MD1 dataset).** -->
<!-- _Describe the study materials, constructs measured, stimuli, number of items, participant instructions, and, if applicable, factors in the experimental design. Here, a high level of detail is expected to allow replication, and links to stored copies of the exact materials used are expected (unless under copyright or other such restriction)._ -->

The guardian survey was hosted on Qualtrics (complete survey available at <https://doi.org/10.17605/osf.io/7rwpc>). The survey included dog demographics (name, living situation, sex, neuter status, birth date, breed information, acquisition type), training information (communication style and frequency, training experience, research experience), guardian demographics (gender, age, community type), and the C-BARQ. The C-BARQ trainability scale (eight items) was presented first and was included in the pre-registered analysis of pointing [@ManyDogsProject.etal.2023]. After answering the trainability questions, guardians could decide to submit their responses or continue to complete the remaining six behavior assessment scales from the C-BARQ. If they continued, they answered questions about aggression (28 questions), fear (18 questions), separation-related behavior (9 questions), excitability (7 questions), attachment/attention-seeking (7 questions), and miscellaneous behavior problems (28 questions), including chasing, chewing, begging, pulling, urinating, defecating, barking, and licking. Most questions used a 5-point Likert scale with a Not Observed option. Some categories included open-ended questions for additional explanations of the dog’s behavior, but we did not include these responses in our dataset to protect guardian anonymity.

To facilitate replication of the methodology, the detailed experimental protocol is open-access and available with the original scientific report [@ManyDogsProject.etal.2023]. Behavioral data were collected at individual research sites, where guardians brought the dogs in for test sessions. The study was designed to take 30 minutes or less and had two stages: warm-ups and test trials. After the dogs acclimated to the testing room, they participated in a series of warm-up object-choice tasks. The first task piqued the dogs’ interest in food rewards and gauged their willingness to approach the experimenter and pick up visible food from the floor. Each dog completed two visible food placement trials. The second task built up an association between cups and food. In this task, dogs completed a minimum of three trials in rapid succession without being recalled to the start line. There were no performance requirements for the first two warm-ups, only that the dog should retrieve food and make contact with the cups, showing a willingness to engage in the task and approach the experimenter. The third and fourth warm-up tasks scaffolded the more formal trial structure and familiarized the dog with the two lateral search locations on either side of the experimenter. The third task used one cup with visible baiting at each of the lateral search locations, and dogs completed four trials, two per side in alternating order. The fourth and final warm-up used two cups and the same visible baiting procedure as in the one-cup warm-up. In the two-cup warm-up, dogs had to choose the visibly baited cup over the empty cup on four out of six consecutive trials, assessed within a sliding window, to progress to the test trials. A maximum of 20 two-cup trials was allowed. 
All warm-up tasks required two individuals: an experimenter to bait and place the cups and a handler to release the dog to make a choice and recall for subsequent trials (handlers could be either trained researchers or the dog’s guardian).
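
The two-cup warm-up criterion can be illustrated with a short sketch. This is not the project's scoring code: the function name `trials_to_criterion()` and its arguments are hypothetical, and it assumes that the criterion is checked after every trial over the most recent six trials (or fewer, early in the session), so a dog whose first four choices are correct passes on trial 4.

```r
# Hypothetical helper: returns the trial number on which the criterion is
# first met, or NA if it is not met within the allowed number of trials.
# `correct` is a logical vector of two-cup warm-up choices in trial order.
trials_to_criterion <- function(correct, window = 6, required = 4,
                                max_trials = 20) {
  correct <- as.logical(correct)
  stopifnot(!anyNA(correct))
  n <- min(length(correct), max_trials)
  for (i in seq_len(n)) {
    # Look back over the most recent `window` trials (fewer early on)
    start <- max(1, i - window + 1)
    if (sum(correct[start:i]) >= required) {
      return(i)
    }
  }
  NA_integer_
}

# Correct on trials 1, 3, 4, and 6: the criterion is met on trial 6
trials_to_criterion(c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE))
```

Under these assumptions, a dog that never reaches four correct choices within any six consecutive trials by trial 20 returns `NA`, which corresponds to not progressing to the test trials.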

After meeting the two-cup warm-up criterion, the dogs moved on to two experimental conditions and were required to complete eight trials per condition (condition order counterbalanced between subjects). In the non-ostensive condition, the experimenter looked at the floor and cleared their throat while holding a piece of food in front of their body for the dog to notice before placing the food underneath one of two cups behind a visual occluder. They then removed the occluder and moved each of the cups to one of the lateral search locations. When the cups were in place, the experimenter cleared their throat and made a contralateral momentary point to the baited cup, holding the gesture for 2 seconds before retracting their hand. The handler then released the dog to make a choice. The ostensive condition used the exact same baiting procedure and pointing gesture, but instead of clearing their throat and looking down, the experimenter used two ostensive cues to get the dog’s attention. These cues, dog-directed speech and dog-directed gaze, were modeled on previous work where dogs had followed intentional, direct cues from the experimenter [@Miklosi.etal.1998a; @Soproni.etal.2001a; @Hare.etal.2002a; @Kaminski.Nitzschner.2013a; @Tauzin.etal.2015a]. The vocal cue the experimenter gave was “[dog name], look!”, and they gazed at the dog while showing the food and giving the pointing gesture. The two test conditions were separated by a one-minute play break and re-familiarization with the testing situation. After the two experimental conditions, the dogs completed an odor control condition with a similar set-up as the ostensive condition, except no point cue was given. The control was intended to determine whether the dogs were using olfactory instead of visual cues to solve the task.


## 2.6 Quality control
<!-- **Please describe the quality control components as requested here. As it states below, describe pilot study, first session video checks, reliability checks, etc.** -->
<!-- _Please list the methods used for quality control in the production of the data. This could include pilot work, attention checks, quality checks (e.g. reliability estimates), lab logs, item non-response management, etc._ -->

Collecting high-quality data was a key objective of ManyDogs 1.  To validate the study design and analysis plan, we conducted a pilot experiment at a single site with 91 dogs. We pre-registered the pilot study at the Open Science Framework (https://osf.io/gz5pj/). The pilot data are not included in this dataset.

For the primary study presented here, we pre-registered the hypotheses, methods, and analysis plan as a registered report at _Animal Behavior and Cognition_ (https://doi.org/10.31234/osf.io/f86jq). Because this study involved multiple sites running the same protocol, we sought to ensure consistent implementation across sites. During a researcher training phase, participating sites were required to submit videos of their team performing the protocol, as well as the full set of videos from the first dog tested. Two project administrators reviewed the videos for all sites and provided feedback on each site's implementation to improve consistency across sites.

Behavioral tests were video recorded and experimenters also live-coded the dog’s responses on paper. Data were compiled across sites through a data entry survey hosted on Qualtrics. Using a survey protected the resulting data file from errors associated with multiple individuals directly editing the file. To measure inter-rater reliability of the live coding of experimental sessions, each site had a research assistant blind to the project’s focus recode a subset of sessions. This recoding resulted in an overall Cohen's kappa of 0.98 with individual sites ranging from kappa = 0.92-1.00.
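
For readers unfamiliar with the statistic, the following sketch shows how an unweighted Cohen's kappa can be computed from two coders' records of the same trials. It is an illustration only: the function `cohens_kappa()` and the toy codes are hypothetical, and we do not claim that it reproduces the exact procedure behind the values reported above.

```r
# Hypothetical helper: unweighted Cohen's kappa for two raters who coded
# the same trials. Undefined (NaN) if both raters use a single category.
cohens_kappa <- function(rater1, rater2) {
  stopifnot(length(rater1) == length(rater2), length(rater1) > 0)
  categories <- union(rater1, rater2)
  tab <- table(factor(rater1, levels = categories),
               factor(rater2, levels = categories))
  n <- sum(tab)
  p_observed <- sum(diag(tab)) / n                       # raw agreement
  p_expected <- sum(rowSums(tab) * colSums(tab)) / n^2   # chance agreement
  (p_observed - p_expected) / (1 - p_expected)
}

# Toy example: live coding vs. recoding of eight choices (made-up values)
live   <- c("left", "left", "right", "right", "left", "right", "left", "right")
recode <- c("left", "left", "right", "right", "left", "right", "right", "right")
cohens_kappa(live, recode)
```

In this toy example the coders agree on seven of eight trials (observed agreement of 0.875) and chance agreement is 0.5, giving a kappa of 0.75.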


## 2.7 Data anonymization and ethical issues
<!--Please provide clarification on the ethical approval obtained for the data collection (e.g. which Institutional Review Board). All primary data should be captured under ethical approval. Please list any steps taken to anonymise the data and indicate other issues concerning research ethics (e.g., informed consent, use of pseudonyms, etc.).-->

Each research site participating in this study obtained approval from their respective institutional ethics committee [see Table S1 of @ManyDogsProject.etal.2023]. All guardians provided informed consent to participate and were free to withdraw from the study at any time.

All identifiable information has been removed from the dataset, including replacing dog names with ID numbers.


## 2.8 Existing use of data
<!--Please list any publications or outputs that have originated from this data. This list should be exhaustive to reflect the contributions made by the data to-date.-->
The behavioral data and a portion of the guardian data collected for the ManyDogs 1 study were used and published in:

ManyDogs Project, Espinosa, J., Stevens, J.R., Alberghina, D., Barela, J., Bogese, M., Bray, E., Buchsbaum, D., Byosiere, S.-E., Cavalli, C., Dror, S., Fitzpatrick, H., Freeman, M.S., Frinton, S., Gnanadesikan, G., Guran, C.-N.A., Glover, M., Hare, B., Hare, E., Hickey, M., Horschler, D., Huber, L., Jim, H.-L., Johnston, A., Kaminski, J., Kelly, D., Kuhlmeier, V.A., Lassiter, L., MacLean, E., Ostojic, L., Pelgrim, M.H., Pellowe, S., Salomons, H., Santos, L., Silver, Z.A., Silverman, J.M., Sommese, A., Völter, C., Walsh, C.,
Worth, Y.A., Zipperling, L.M.I., Żołędziewska, B., and Zylberfuden, S. G. (2023). ManyDogs 1: A multi-lab replication study of dogs’ pointing comprehension. _Animal Behavior and Cognition_, 10(3), 232-286.
https://doi.org/10.26451/abc.10.03.03.2023

## (3) Dataset description and access
<!-- The following section should relate specifically to the data file(s) being shared. This section has no word limits. Where you have multiple versions e.g., raw and processed data, please provide the following details for each file. -->
The dataset contains `r nrow(data)` observations of `r ncol(data)` variables described in a codebook and Table \@ref(tab:displayDescription). The dataset contains variables supplied by a survey as well as experimental variables. Data provided by each dog's guardian include demographic information about the dog and guardian, responses to questions about the types and frequencies of the dog's training activities, and answers to the C-BARQ. 

In addition to the data provided by guardians, experimental variables are included in this dataset. These include information about whether the dog completed the experiment and was included in the analysis, experimental conditions, and trial-by-trial data on correct choices (choosing the cup baited with a treat).
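
As one possible use of the trial-by-trial variables, the sketch below computes each dog's proportion of correct choices per condition and the within-subject difference. The toy data frame and every column name in it (`subject_id`, `ostensive_1`, `nonostensive_1`, and so on) are hypothetical stand-ins; consult the codebook for the actual variable names and coding before adapting it.

```r
# Toy wide-format data: one row per dog, one column per trial.
# All column names are hypothetical; 1 = correct choice, 0 = incorrect.
toy <- data.frame(
  subject_id     = c("d1", "d2", "d3"),
  ostensive_1    = c(1, 0, 1),
  ostensive_2    = c(1, 1, NA),
  nonostensive_1 = c(0, 1, 1),
  nonostensive_2 = c(1, 0, 0)
)

# Hypothetical helper: row-wise proportion correct across all trial columns
# that start with `prefix`, ignoring missing trials.
proportion_correct <- function(data, prefix) {
  cols <- grep(paste0("^", prefix, "_[0-9]+$"), names(data), value = TRUE)
  unname(rowMeans(data[cols], na.rm = TRUE))
}

summary_table <- data.frame(
  subject_id   = toy$subject_id,
  ostensive    = proportion_correct(toy, "ostensive"),
  nonostensive = proportion_correct(toy, "nonostensive")
)
summary_table$difference <- summary_table$ostensive - summary_table$nonostensive
summary_table
```

Ignoring missing trials with `na.rm = TRUE` is a choice made for this sketch; an analysis may instead need to exclude dogs with incomplete conditions.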

## 3.1 Repository location
<!-- Please include a permanent identifier, such as a DOI, that points to the online location of the dataset. If this has already been accessible prior, a new DOI does not need to be made. -->
The dataset for this study is available on the Open Science Framework at <https://osf.io/7rwpc/> (DOI: [10.17605/osf.io/7rwpc](https://doi.org/10.17605/osf.io/7rwpc)) and on GitHub at <https://github.com/ManyDogsProject/md1_data>.


## 3.2 Object/file name
<!-- Please note the exact name of the file or file set in the repository e.g., Raw_Data.csv -->
The file name for the dataset is `manydogs_etal_2024_data.csv` and the codebook is `manydogs_etal_2024_codebook.csv`.
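
After downloading both files, a reader may want to confirm that the data columns line up with the codebook entries. The sketch below shows one way to do this in base R. The helper `check_codebook()` and the toy data are hypothetical; only the file names and the `variable_name` and `description` codebook columns come from this project.

```r
# Hypothetical helper: compare data column names with codebook entries.
check_codebook <- function(data, codebook) {
  codebook_names <- as.character(codebook$variable_name)
  list(
    same_order            = identical(names(data), codebook_names),
    missing_from_codebook = setdiff(names(data), codebook_names),
    missing_from_data     = setdiff(codebook_names, names(data))
  )
}

# With the real files, one would read them first, for example:
# data     <- read.csv("manydogs_etal_2024_data.csv")
# codebook <- read.csv("manydogs_etal_2024_codebook.csv")

# Toy stand-ins (made-up columns) so that the sketch runs on its own
toy_data <- data.frame(subject_id = 1:2, site = c("a", "b"), age = c(3.5, 7))
toy_codebook <- data.frame(
  variable_name = c("subject_id", "site", "age"),
  description   = c("Dog ID", "Research site", "Age in years")
)
check_codebook(toy_data, toy_codebook)
```

Reporting the mismatched names, rather than only a pass/fail result, makes it easier to see which column or codebook entry needs attention.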

## 3.3 Data type
<!-- Please describe the type of data using one or more terms e.g. primary data, secondary data, processed data, interpretation of data, or final report. -->
This dataset includes processed data from the ManyDogs 1 study. We have removed identifiable information, recoded data values for consistency, renamed and reordered columns for clarity, and combined survey data submitted by guardians via Qualtrics and behavioral data submitted by research teams via Qualtrics.


## 3.4 Format names and versions
<!-- Please note file format e.g., ASCII, CSV, SPSS, SAS, JPEG, Excel, SQL, etc., and any software required to access the file. -->
The dataset and codebook are provided in a comma-separated values (`.csv`) plain-text format. There is one version of the dataset with no anticipated additional versions, as data collection has ended.

## 3.5 Language
<!-- Language the data is stored as e.g., American English. -->
The variable names and text values are in English. Though data were collected in other languages (Croatian, Hungarian, Italian, Polish, and Spanish), the Qualtrics surveys were coded to save responses in English.

## 3.6 License
<!-- The open license under which the data has been deposited (e.g. CC0).  -->
The ManyDogs 1 dataset is available under a [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/), which allows users to share (copy and redistribute the material in any medium or format for any purpose, even commercially) and adapt (remix, transform, and build upon the material for any purpose, even commercially) this material as long as they give appropriate credit, provide a link to the license, indicate if changes were made, and do not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

## 3.7 Limits to sharing
<!-- If the data is currently under embargo, please give the length and date at which the data will be made open. Otherwise, please note any potential barriers to full sharing of the data e.g., if it contains identifying information, how gatekeeping is maintained. Please note that you will need to provide full access to the journal editors and reviewers for the purposes of peer review, in full confidentiality. -->
The dataset is freely available for download on the [Open Science Framework](https://doi.org/10.17605/osf.io/7rwpc). There are no limits to sharing beyond those described in the license.

## 3.8 Publication date
<!-- If already public, please state the date the dataset was published in the repository (dd/mm/yyyy). -->
The dataset was uploaded to the [Open Science Framework](https://doi.org/10.17605/osf.io/7rwpc) on 2024-02-06 and updated on 2024-05-02.

## 3.9 FAIR data/Codebook
<!-- Please provide details of how you have made the data conform to FAIR guidelines (Findability, Accessibility, Interoperability, and Reuse; see https://en.wikipedia.org/wiki/FAIR_data). This includes reporting any relevant meta-data and as a minimum should include the details of a data codebook to help independent parties interpret your data file. -->

This dataset is _findable_ through the persistent identifier on the Open Science Framework (DOI: [10.17605/osf.io/7rwpc](https://doi.org/10.17605/osf.io/7rwpc)), _accessible_ through free availability on the Open Science Framework and GitHub, _interoperable_ by using plain-text CSV data files, and _reusable_ with the CC BY 4.0 license. Metadata are included as a codebook here (Table \@ref(tab:displayDescription)) and with the data on the Open Science Framework and GitHub.


## (4) Reuse potential
The original data from ManyDogs 1 [@ManyDogsProject.etal.2023] focus on dog responses in the two-alternative object-choice task across warm-up, ostensive, non-ostensive, and odor control trials. In addition, that dataset includes basic demographics on the dog and guardian, as well as the mean trainability score from the C-BARQ. The current dataset adds information on dog origin and household, dog training experience, guardian communication practices, and the complete C-BARQ profile. The C-BARQ data are quite rich, with sections on training, aggression, fear, separation-related behavior, excitability, attachment and attention seeking, and miscellaneous problem behaviors. Thus, this dataset allows for assessing associations among all of the C-BARQ measures as well as connections to the experimental task data and the other dog and guardian characteristic data.

A key strength of this dataset is its diversity. The data were collected by 20 different research sites in eight countries, allowing the assessment of site effects as well as cultural differences. In addition, while most dogs are kept in private homes, the dataset also includes a subset of dogs kept in group housing at working dog facilities. Finally, breed is included, allowing the exploration of breed differences.

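```{r}
# Proportion of guardians at each site who continued to the optional C-BARQ questions
cbarq_completion <- data |> 
  count(site, continue_cbarq) |>               # guardians per site and response
  mutate(freq = prop.table(n), .by = site) |>  # within-site proportions
  filter(continue_cbarq == "Yes")
```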


One limitation of this dataset is that, though the C-BARQ training survey questions were compulsory for all guardians, the remaining questions were optional to ease the survey burden. As a result, `r sum(data$continue_cbarq == "Yes", na.rm = TRUE)` of the `r data_rows` guardians elected to continue on to the optional questions (though not all completed the survey). Importantly, the completion rate varied across research sites, ranging from `r print_num(min(cbarq_completion$freq) * 100, digits = 1)` to `r print_num(max(cbarq_completion$freq) * 100, digits = 1)`%, potentially introducing bias in responses to the optional questions across sites.
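The per-site completion rate described above can be sketched in base R on a small toy example. The values below are hypothetical and do not come from the dataset; only the column names `site` and `continue_cbarq` follow the dataset, and the object names `toy`, `completion_by_site`, and `completion_range` are illustrative.

```r
# Toy data (hypothetical values, not from the dataset): one row per guardian.
toy <- data.frame(
  site = c("A", "A", "A", "A", "B", "B", "B", "B"),
  continue_cbarq = c("Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No")
)

# Proportion of guardians per site who continued to the optional questions.
completion_by_site <- tapply(toy$continue_cbarq == "Yes", toy$site, mean)

# Range of completion rates across sites, as percentages.
completion_range <- range(completion_by_site) * 100
```

Anyone reusing the optional C-BARQ responses may want to inspect these site-level rates before pooling data, because proportions from sites with few guardians are less stable.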

Despite this limitation, this dataset provides valuable data on dog point-following behavior in the face of conflicting interpretations in the literature of this behavior as informative or associative [@Wynne.etal.2008a; @Topal.etal.2009; @Kaminski.etal.2012; @Kaminski.Nitzschner.2013a; @Wobber.Kaminski.2011]. Moreover, it provides critical large-scale data on particular methodologies used in these tasks (namely contralateral, momentary pointing), which can result in weaker following behavior in dogs [@Lyn.etal.2024]. The large sample size and the rich demographic data make this one of the most extensive and diverse researcher-collected datasets on dog behavior and cognition. Our hope is that this dataset will inspire canine scientists to strive for large sample sizes, work across research sites, and collect thorough demographic data to better characterize dog behavior in ways that improve dog welfare and the dog-human bond.

## Contribution Statement 
<!-- _Please list all contributions towards this manuscript, including the contributions of all individuals who helped to collect the data (who may also not be an author of the data paper), including their roles and affiliations at the time of data collection._ -->

The authors made the following contributions. Julia Espinosa: Conceptualization, Data curation, Formal analysis, Funding acquisition, Methodology, Project administration, Supervision, Writing - original draft, Writing - review & editing; Elizabeth Hare: Conceptualization, Data curation, Formal analysis, Methodology, Project administration, Software, Validation, Writing - original draft, Writing - review & editing; Daniela Alberghina: Investigation, Validation, Writing - original draft, Writing - review & editing; Bryan Mitchel Perez Valverde: Investigation, Validation, Writing - original draft, Writing - review & editing; Jeffrey R. Stevens: Conceptualization, Data curation, Formal analysis, Methodology, Project administration, Software, Supervision, Visualization, Writing - original draft, Writing - review & editing.

For the original ManyDogs 1 study, data were collected by: D. Alberghina, H.E.E. Alway, J.D. Barela, E.E. Bray, S.-E. Byosiere, C.M. Cavalli, L.M. Chaudoir, C. Collins-Pisano, H.J. DeBoer, L.E.L.C. Douglas, S. Dror, M.V. Dzik, B. Ferguson, L. Fisher, H.C. Fitzpatrick, M.S. Freeman, S.N. Frinton, M.K. Glover, J.E.P. Goacher, M. Golańska, M. Hickey, H.-L. Jim, D.M. Kelly, V.A. Kuhlmeier, L. Lassiter, L. Lazarowski, J. Leighton-Birch, K. Maliszewska, V. Marra, L.I. Montgomery, M.S. Murray, E.K. Nelson, L. Ostojić, S.G. Palermo, A.E. Parks Russell, M.H. Pelgrim, S.D. Pellowe, A. Reinholz, L.A. Rial, E.M. Richards, M.A. Ross, L.G. Rothkoff, H. Salomons, J.K. Sanger, A.R. Schirle, S.J. Shearer, J.M. Silverman, A. Sommese, T. Srdoc, H. St. John-Mosse, K. Vékony, Y.A. Worth, L.M.I. Zipperling, B. Żołędziewska, and S.G. Zylberfuden.

## Acknowledgments
<!-- Please add any relevant acknowledgements to anyone else who supported the project in which the data was collected, but did not work directly on the data itself. -->
We are grateful to all of the research teams and dog guardians who helped generate these data. We also thank James Serpell for allowing us to use the C-BARQ questionnaire.

## Conflict of Interest
The authors declare no conflict of interest associated with the publication of this manuscript.

## Funding statement
<!-- If the data, or the project from which it came, required funding, please provide clear detail of this here. For example, if funded by a research council this would include the year of successful receipt of funding, the name of the funding council, and the grant number. -->
We are grateful to the Big Team Science Conference for funding the article processing fee via a grant to JE.

\newpage


# References

::: {#refs custom-style="Bibliography"}
:::

\newpage
\small

``` {r, displayDescription, results="asis", echo = FALSE, message = FALSE, warning = FALSE}
### display NAs as blank
options(knitr.kable.NA = "")
### human readable column names
colnames(codebook) <- c("Category of Variable", "Variable Name", "Description",
                        "Question Text", "Possible Response Values")

### convert to kable
codebook_table_latex <- kable(codebook,
                   format = "latex", booktabs = TRUE, longtable = TRUE, linesep = "",
                   caption = "Data codebook for ManyDogs 1 study data") |> 
  kable_styling(latex_options = c("repeat_header", "scale_down"), font_size = 8) |> 
  column_spec(1, width = "1.25in") |> 
  column_spec(2, monospace = TRUE) |> 
  column_spec(3, width = "1.8in") |> 
  column_spec(4:5, width = "2.25in") |> 
  landscape(margin = "1cm")

# codebook_table_latex

codebook_table_html <- flextable::flextable(codebook) |>
  width(j = 1, 1.45) |> 
  width(j = 2, 1.6) |> 
  width(j = 3, 1.8) |> 
  width(j = 4:5, 2.25) |>
  fontsize(size = 10) |> 
  font(part = "all", fontname = "Times New Roman") |> 
  set_caption("Data description for complete ManyDogs 1 study data")

if (knitr::is_latex_output()) {
  codebook_table_latex
} else {
  codebook_table_html
}

```