
openwashdata/ploswater


ploswater

License: CC BY 4.0

The goal of ploswater is to make data on publication trends in the PLOS Water journal available.

Installation

You can install the development version of ploswater from GitHub with:

# install.packages("devtools")
devtools::install_github("openwashdata/ploswater")

The examples in this README also use the following packages:

## Run the following code in the console if you don't have the packages yet
## install.packages(c("dplyr", "knitr", "readr", "stringr", "gt", "kableExtra", "ggplot2", "lubridate"))
library(dplyr)
library(knitr)
library(readr)
library(stringr)
library(gt)
library(kableExtra)

Alternatively, you can download the individual datasets as a CSV or XLSX file from the table below.

  1. Click Download CSV. A window opens that displays the CSV in your browser.
  2. Right-click anywhere inside the window and select “Save Page As…”.
  3. Save the file in a folder of your choice.

| dataset | CSV | XLSX |
|---|---|---|
| ploswater | Download CSV | Download XLSX |
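If you downloaded the CSV instead of installing the package, you can load it with base R. The sketch below is only an illustration: it writes a two-row stand-in (values taken from the preview table) to a temporary file so that it runs on its own; with the real download you would point `csv_path` at your saved copy, for example `ploswater.csv` (a hypothetical file name). CSV files have no date type, so `publication_date` is read back as character and has to be converted with `as.Date()`.

```r
# Stand-in for the downloaded file; replace csv_path with the path to your
# saved copy (for example "ploswater.csv", a hypothetical file name).
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    doi = c("https://doi.org/10.1371/journal.pwat.0000026",
            "https://doi.org/10.1371/journal.pwat.0000058"),
    publication_year = c(2022, 2022),
    publication_date = c("2022-06-07", "2022-12-15")
  ),
  csv_path,
  row.names = FALSE
)

# Read the CSV; the dates arrive as plain character strings
ploswater_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# Restore the Date type for publication_date
ploswater_csv$publication_date <- as.Date(ploswater_csv$publication_date)

str(ploswater_csv)
```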

Data

The package provides access to data from publications in the PLOS Water Journal. This data includes information on publication types, SDGs targeted, authors, institutions, and more.

library(ploswater)

ploswater

The dataset ploswater contains data about … It has 240 observations and 1506 variables.

ploswater |> 
  select(1:10) |>  # Select first 10 columns
  head(3) |> 
  gt::gt() |>
  gt::as_raw_html()
| id | doi | title | display_name | publication_year | publication_date | language | type | type_crossref | indexed_in |
|---|---|---|---|---|---|---|---|---|---|
| https://openalex.org/W4281638009 | https://doi.org/10.1371/journal.pwat.0000026 | Water, sanitation, and women’s empowerment: A systematic review and qualitative metasynthesis | Water, sanitation, and women’s empowerment: A systematic review and qualitative metasynthesis | 2022 | 2022-06-07 | en | review | journal-article | "crossref" |
| https://openalex.org/W4311557053 | https://doi.org/10.1371/journal.pwat.0000058 | Water remains a blind spot in climate change policies | Water remains a blind spot in climate change policies | 2022 | 2022-12-15 | en | article | journal-article | "crossref" |
| https://openalex.org/W4225499001 | https://doi.org/10.1371/journal.pwat.0000007 | Operationalizing a routine wastewater monitoring laboratory for SARS-CoV-2 | Operationalizing a routine wastewater monitoring laboratory for SARS-CoV-2 | 2022 | 2022-02-15 | en | article | journal-article | "crossref" |

For an overview of the variables shown above, see the following table.

| variable_name | variable_type | description |
|---|---|---|
| id | character | Unique OpenAlex identifier |
| doi | character | DOI of the study |
| title | character | Title of the study |
| display_name | character | Display name in OpenAlex |
| publication_year | numeric | Year of publication |
| publication_date | Date | Date of publication |
| language | character | Original publication language |
| type | character | Type of scholarly work |
| type_crossref | character | Work type as specified by Crossref |
| indexed_in | character | NA |
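With 1506 variables, it can help to build an overview like the one above directly from the data. Below is a minimal base R sketch, run on a small stand-in data frame with three of the documented columns; on the real data you would pass `ploswater` instead. The helper `describe_variables()` is hypothetical and not part of the package, and it reports only the first class of each column.

```r
# Hypothetical helper: one row per column with its name and (first) class
describe_variables <- function(df) {
  data.frame(
    variable_name = names(df),
    variable_type = unname(vapply(df, function(col) class(col)[1], character(1))),
    stringsAsFactors = FALSE
  )
}

# Small stand-in with three of the documented columns
example_df <- data.frame(
  doi = "https://doi.org/10.1371/journal.pwat.0000007",
  publication_year = 2022,
  publication_date = as.Date("2022-02-15"),
  stringsAsFactors = FALSE
)

describe_variables(example_df)
```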

Example

library(ploswater)
library(dplyr)
library(ggplot2)
library(lubridate)

# Convert publication_date to Date type (a no-op if it already is one)
ploswater$publication_date <- as.Date(ploswater$publication_date)

# Floor each date to the first day of its month and count publications per month
monthly_counts <- ploswater %>%
  mutate(publication_month = floor_date(publication_date, "month")) %>%
  count(publication_month) %>%
  arrange(publication_month)

# Create bar chart of monthly publication counts
ggplot(monthly_counts, aes(x = publication_month, y = n)) +
  geom_col(fill = "steelblue") +
  labs(x = NULL,
       y = "Number of Publications",
       title = "Number of Publications per Month") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_date(date_breaks = "3 months",
               date_labels = "%b %Y")
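The monthly aggregation floors each publication date to the first day of its month and then counts rows per month. For readers without lubridate, the same idea can be sketched in base R. The dates below are the three from the preview table plus one hypothetical extra date, so the counts are illustrative only. Like `count()`, `table()` only returns months that occur in the data, so months without publications get no bar.

```r
# Publication dates: three from the preview table plus one hypothetical extra
publication_date <- as.Date(c("2022-06-07", "2022-12-15", "2022-02-15", "2022-06-20"))

# Floor each date to the first day of its month (like lubridate::floor_date(x, "month"))
publication_month <- as.Date(format(publication_date, "%Y-%m-01"))

# Count publications per month; table() sorts the months chronologically
counts <- table(publication_month)
monthly_counts <- data.frame(
  publication_month = as.Date(names(counts)),
  n = as.integer(counts)
)

monthly_counts
```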

# Create citation impact visualization
ggplot(ploswater, aes(x = publication_date, y = cited_by_count)) +
  geom_point(aes(size = fwci), alpha = 0.6) +
  scale_size_continuous(name = "Field-Weighted\nCitation Impact") +
  labs(x = "Publication Date",
       y = "Number of Citations",
       title = "Citation Impact Over Time") +
  theme_minimal()
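The point size in the citation plot maps to `fwci`, the field-weighted citation impact reported by OpenAlex. Broadly, FWCI is the ratio of the citations a work has received to the citations expected for comparable works. A simplified form, assuming "comparable" means works that share the same field, publication year, and work type, and leaving out details such as the citation window:

$$
\mathrm{FWCI}_i = \frac{c_i}{\bar{c}_{S(i)}}, \qquad \bar{c}_{S(i)} = \frac{1}{|S(i)|} \sum_{j \in S(i)} c_j
$$

Here $c_i$ is the citation count of work $i$ and $S(i)$ is its set of comparable works. A value of 1 means a work is cited as often as expected; values above 1 mean it is cited more often.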

library(gt)

# Compute summary statistics
summary_stats <- data.frame(
  Statistic = c(
    "Mean Citations",
    "Median Citations",
    "Mean FWCI",
    "Total Publications",
    "Correlation: Citations & FWCI"
  ),
  Value = c(
    mean(ploswater$cited_by_count, na.rm = TRUE),
    median(ploswater$cited_by_count, na.rm = TRUE),
    mean(ploswater$fwci, na.rm = TRUE),
    nrow(ploswater),
    cor(ploswater$cited_by_count, ploswater$fwci, use = "complete.obs")
  )
)

# Create and print the table using gt
summary_stats %>%
  gt() %>%
  tab_header(
    title = "Summary Statistics"
  ) %>%
  gt::as_raw_html()
Summary Statistics

| Statistic | Value |
|---|---|
| Mean Citations | 3.0583333 |
| Median Citations | 1.0000000 |
| Mean FWCI | 1.3637913 |
| Total Publications | 240.0000000 |
| Correlation: Citations & FWCI | 0.4791594 |
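The summary uses `na.rm = TRUE` for the means and the median and `use = "complete.obs"` for the correlation, which keeps only publications where both the citation count and FWCI are present. A small sketch with hypothetical values shows that this matches correlating the complete pairs by hand.

```r
# Hypothetical citation counts and FWCI values; one FWCI is missing
cited_by_count <- c(0, 1, 1, 3, 10)
fwci <- c(0.2, NA, 0.9, 1.5, 4.0)

# Means and medians drop NAs separately for each variable
mean(cited_by_count, na.rm = TRUE)    # 3
median(cited_by_count, na.rm = TRUE)  # 1
mean(fwci, na.rm = TRUE)              # 1.65

# use = "complete.obs" drops every pair that contains an NA ...
r_builtin <- cor(cited_by_count, fwci, use = "complete.obs")

# ... which matches correlating only the complete pairs by hand
keep <- complete.cases(cited_by_count, fwci)
r_manual <- cor(cited_by_count[keep], fwci[keep])

all.equal(r_builtin, r_manual)
```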

License

Data are available under the CC BY 4.0 license.

Citation

Please cite this package using:

citation("ploswater")
#> To cite package 'ploswater' in publications use:
#> 
#>   Dubey Y (2025). "ploswater: Data on Publications in PLOS Water
#>   Journal." doi:10.5281/zenodo.14616993
#>   <https://doi.org/10.5281/zenodo.14616993>,
#>   <https://github.com/openwashdata/ploswater>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Misc{dubey:2025,
#>     title = {ploswater: Data on Publications in PLOS Water Journal},
#>     author = {Yash Dubey},
#>     year = {2025},
#>     doi = {10.5281/zenodo.14616993},
#>     url = {https://github.com/openwashdata/ploswater},
#>     abstract = {Provides access to publishing trends from PLOS Water Journal.},
#>     version = {0.0.0.9000},
#>   }