The goal of ploswater is to provide data on publication trends in the PLOS Water Journal.
You can install the development version of ploswater from GitHub with:
# install.packages("devtools")
devtools::install_github("openwashdata/ploswater")
## Run the following code in console if you don't have the packages
## install.packages(c("dplyr", "knitr", "readr", "stringr", "gt", "kableExtra"))
library(dplyr)
library(knitr)
library(readr)
library(stringr)
library(gt)
library(kableExtra)
Alternatively, you can download the individual datasets as a CSV or XLSX file from the table below.
- Click Download CSV. A window opens that displays the CSV in your browser.
- Right-click anywhere inside the window and select “Save Page As…”.
- Save the file in a folder of your choice.
dataset | CSV | XLSX |
---|---|---|
ploswater | Download CSV | Download XLSX |
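Once a file has been saved, it can be read back into R. The following is a minimal sketch and not part of the package: the helper name read_downloaded_csv and the file name ploswater.csv are assumptions made for illustration, and base R's read.csv is used so that no extra packages are required.

```r
# Hypothetical helper (not part of ploswater): read a CSV saved from the
# download table, failing early with a clear message if the path is wrong.
read_downloaded_csv <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, call. = FALSE)
  }
  # check.names = FALSE keeps the column names exactly as they are in the file
  utils::read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
}

# Assumed file name; replace it with the location you chose when saving.
csv_path <- "ploswater.csv"
if (file.exists(csv_path)) {
  ploswater_csv <- read_downloaded_csv(csv_path)
  print(dim(ploswater_csv))  # number of rows and columns
}
```

If you prefer readr, which is loaded above, readr::read_csv(csv_path) should work in the same place.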
The package provides access to data on publications in the PLOS Water Journal, including publication types, the Sustainable Development Goals (SDGs) targeted, authors, institutions, and more.
library(ploswater)
The dataset ploswater contains data about … It has 240 observations and 1506 variables.
ploswater |>
  select(1:10) |> # Select first 10 columns
  head(3) |>
  gt::gt() |>
  gt::as_raw_html()
id | doi | title | display_name | publication_year | publication_date | language | type | type_crossref | indexed_in |
---|---|---|---|---|---|---|---|---|---|
https://openalex.org/W4281638009 | https://doi.org/10.1371/journal.pwat.0000026 | Water, sanitation, and women’s empowerment: A systematic review and qualitative metasynthesis | Water, sanitation, and women’s empowerment: A systematic review and qualitative metasynthesis | 2022 | 2022-06-07 | en | review | journal-article | "crossref" |
https://openalex.org/W4311557053 | https://doi.org/10.1371/journal.pwat.0000058 | Water remains a blind spot in climate change policies | Water remains a blind spot in climate change policies | 2022 | 2022-12-15 | en | article | journal-article | "crossref" |
https://openalex.org/W4225499001 | https://doi.org/10.1371/journal.pwat.0000007 | Operationalizing a routine wastewater monitoring laboratory for SARS-CoV-2 | Operationalizing a routine wastewater monitoring laboratory for SARS-CoV-2 | 2022 | 2022-02-15 | en | article | journal-article | "crossref" |
For an overview of the variable names, see the following table.
variable_name | variable_type | description |
---|---|---|
id | character | Unique Identifier |
doi | character | DOI of the study |
title | character | Title of the study |
display_name | character | Displayed Name in OpenAlex |
publication_year | numeric | Year of publication |
publication_date | Date | Date of publication |
language | character | Original publication language |
type | character | Indicates the type of scholarly work |
type_crossref | character | The work type as specified by Crossref |
indexed_in | character | Sources in which the work is indexed (e.g. crossref) |
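With 1506 variables, searching column names by pattern is often easier than scanning them by eye. The sketch below is an illustration only: find_columns is a hypothetical helper, and toy is a small stand-in data frame that borrows a few of the column names documented above so the example runs without the package. With the package loaded, pass ploswater instead of toy.

```r
# Hypothetical helper: return the column names that match a regular expression.
find_columns <- function(data, pattern) {
  grep(pattern, names(data), value = TRUE, ignore.case = TRUE)
}

# Toy stand-in (made-up values) using a few of the documented column names.
toy <- data.frame(
  id = "W1",
  publication_year = 2022,
  publication_date = as.Date("2022-06-07"),
  type = "review",
  type_crossref = "journal-article"
)

find_columns(toy, "^publication")  # columns whose names start with "publication"
find_columns(toy, "type")          # columns whose names contain "type"
```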
library(ploswater)
library(dplyr)
library(ggplot2)
library(lubridate)
# Convert publication_date to Date type
ploswater$publication_date <- as.Date(ploswater$publication_date)
# Create year-month column and count publications
monthly_counts <- ploswater %>%
  mutate(publication_month = floor_date(publication_date, "month")) %>%
  count(publication_month) %>%
  arrange(publication_month)
# Create bar chart
ggplot(monthly_counts, aes(x = publication_month, y = n)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(x = "",
       y = "Number of Publications",
       title = "Number of Publications per Month") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_date(date_breaks = "3 month",
               date_labels = "%b %Y")
# Create citation impact visualization
ggplot(ploswater, aes(x = publication_date, y = cited_by_count)) +
  geom_point(aes(size = fwci), alpha = 0.6) +
  scale_size_continuous(name = "Field-Weighted\nCitation Impact") +
  labs(x = "Publication Date",
       y = "Number of Citations",
       title = "Citation Impact Over Time") +
  theme_minimal()
library(gt)
# Compute summary statistics
summary_stats <- data.frame(
  Statistic = c(
    "Mean Citations",
    "Median Citations",
    "Mean FWCI",
    "Total Publications",
    "Correlation: Citations & FWCI"
  ),
  Value = c(
    mean(ploswater$cited_by_count, na.rm = TRUE),
    median(ploswater$cited_by_count, na.rm = TRUE),
    mean(ploswater$fwci, na.rm = TRUE),
    round(nrow(ploswater), 0),
    cor(ploswater$cited_by_count, ploswater$fwci, use = "complete.obs")
  )
)
# Create and print the table using gt
summary_stats %>%
  gt() %>%
  tab_header(
    title = "Summary Statistics"
  ) %>%
  gt::as_raw_html()
Summary Statistics
Statistic | Value |
---|---|
Mean Citations | 3.0583333 |
Median Citations | 1.0000000 |
Mean FWCI | 1.3637913 |
Total Publications | 240.0000000 |
Correlation: Citations & FWCI | 0.4791594 |
Data are available as CC-BY.
Please cite this package using:
citation("ploswater")
#> To cite package 'ploswater' in publications use:
#>
#> Dubey Y (2025). "ploswater: Data on Publications in PLOS Water
#> Journal." doi:10.5281/zenodo.14616993
#> <https://doi.org/10.5281/zenodo.14616993>,
#> <https://github.com/openwashdata/ploswater>.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Misc{dubey:2025,
#> title = {ploswater: Data on Publications in PLOS Water Journal},
#> author = {Yash Dubey},
#> year = {2025},
#> doi = {10.5281/zenodo.14616993},
#> url = {https://github.com/openwashdata/ploswater},
#> abstract = {Provides access to publishing trends from PLOS Water Journal.},
#> version = {0.0.0.9000},
#> }