title | tags | authors | affiliations | date | bibliography | breaks | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Empirical: A scientific software library for research, education, and public engagement |
|
|
|
13 February 2024 |
paper.bib |
false |
Empirical is a C++ library designed to promote open science and facilitate the development of scientific software that is efficient, reliable, and easily distributable to researchers and non-experts alike. Specifically, the library sets out to fulfill the following goals:
- Utility: Empirical tools streamline common scientific computing tasks such as configuration, end-to-end data management, and mathematical manipulations.
- Efficiency: Empirical implements general-purpose data structures and algorithms that emphasize computational efficiency to support scientific computing workloads.
- Reliability: Empirical provides sophisticated debug-mode instrumentation including audited memory management and safety-checked versions of standard library containers.
- Distributability: Empirical is highly portable, uses common data formats, and facilitates compile-to-web app development with object-oriented bindings for Emscripten/WebAssembly GUI elements, all with the goal of building broadly accessible scientific software.
High quality open-science tools improve code quality, scientific rigor, and ease of replication or extension for scientific software. Empirical's debugging suite combats C++ programming pitfalls, such as iterator invalidation, memory leakage, and out-of-bounds indexing. Throughout, library design achieves both performance and safety through compile-time toggling of checks for undefined or incorrect behavior.
Unfortunately, in practice, scientific software is often difficult to obtain, install, or use. Modern web-based interfaces give computational research the potential to better embody open science objectives by empowering easier and more complete access [@woelfle2011open]. Empirical leverages modern web technology to provide browser-based interactive interfaces for C++ source code.
Empirical components are subjected to structured code review, unit testing with coverage tracking, and other best practices detailed in our documentation. Effort invested into optimization of the library's utilities enables developer-users to more easily produce safe and efficient software, especially for new developers. We provide a template project that streamlines laying out crosscompilation boilerplate.
As an example of Emprical's utility, the library provides a configuration framework that includes utilities to
- create documented configuration parameters with default values in a single line of C++ code,
- adjust parameters via configuration files, command line flags, URL query parameters, or in-browser GUIs,
- perform on-the-fly configuration adjustments, and
- support independent configuration subsystems.
High-quality software needs a robust, inclusive, and diverse community of users and contributors. Our development practices reflect this priority.
Educational editions of scientific software promote classroom learning and citizen science. The Emscripten compiler enables an existing native codebase to additionally compile to the web [@zakai2011emscripten]. Browser-based delivery can yield particularly effective public-facing apps due to easy access and compelling interfaces.
Empirical amplifies Emscripten by fleshing out its interface for interaction with browser elements.
DOM elements are bound to corresponding C++ objects (e.g., emp::Button
manages a <button>
and emp::Canvas
manages a <canvas>
) and are easily manipulated from within C++.
Empirical also packages collections of prefabricated web widgets (e.g., configuration managers or collapsible data displays).
These tools simplify generating a mobile-friendly, web-based GUI.
A live demo of Empirical widgets, presented alongside their source C++ code, is available here.
WebAssembly's runtime efficiency --- achieving 50% to 90% of native performance [@jangda2019not] --- has driven adoption in web development [@haas2017bringing] and enabled new possibilities for browser-based scientific computation. For example, Avida-ED leverages WebAssembly to incorporate sophisticated agent-based evolution models into classroom activities.
More broadly, Empirical provides optimized tools for performance-critical tasks.
For example, emp::BitArray
and emp::BitVector
are faster drop-in replacements for their standard library equivalents (std::bitset
and std::vector<bool>
) with extensive additional functionality.
More fundamentally, Empirical's header-only design prioritizes ease of use and runtime performance, albeit at the cost of longer compilation times.
Although performant, C++'s permissiveness to out-of-bounds indexing or memory management errors can undermine the validity of generated data and analyses.
Standard library vendors --- like libstdc++
, libc++
, and stl
--- provide some runtime safety features, but these are incomplete and poorly documented1.
Empirical supplements vendor offerings with debug mode stand-ins for standard library containers and even raw pointers that can identify memory leaks and invalid memory access.
Developers typically compensate for C++'s missing guardrails with external toolchains like Valgrind, GDB, and sanitizers. Although mature, such tooling suffers substantial limitations1, particularly for WASM compiled with Emscripten. Although Emscripten provides some sanitizer support and other debugging features, Empirical's safety features offset remaining limitations, such as the lack of a steppable debugger.
Empirical remains under active development. Current priorities include web-friendly refinements (e.g., file management, rich text handling) and additional step-by-step tutorials for new users. That said, Empirical has largely converged to API stability, and releases are archived on Zenodo for those who depend on them [@empirical_2020].
Empirical already underlies major projects within digital evolution, artificial life, and genetic programming. To benefit the broader scientific software and open science community, we look forward to welcoming new collaborations and supporting a wider collection of end-users.
Several projects pursue objectives related to Empirical's.
RepastHPC, accessible at https://repast.github.io/, is a C++ modeling framework targeted to high-performance computing [@collier2013parallel; @north2013complex]. A Java-based counterpart, Repast Simphony, provides interactive GUI support.
Boost C++ Libraries, available at https://www.boost.org/, implement a broad portfolio of software components. However, Boost lacks tools for web-based GUI, configuration management, or data management tailored to scientific software.
Emscripten provides cross-compilation from C++ to WebAssembly and available at https://emscripten.org/ [@zakai2011emscripten]. Empirical furnishes a complementary high-level interface to Emscripten intrinsics.
Cheerp, another C++ to WebAssembly compiler, is available at https://leaningtech.com/cheerp/. Like Emscripten, Cheerp provides primarily low-level APIs for browser interaction.
- AAGOS [@gillespie2018changing]: model to test impact of environmental change on genetic architecture evolution.
- Conduit [@moreno2022best]: library for best-effort communication in high-performance computing.
- DISHTINY [@moreno2019toward]: agent-based model to study major transitions in evolution.
- ecology in evolutionary computation explorer [@dolson2018evological]: interactive visualization of ecological interaction networks in evolutionary computation.
- Symbulation [@vostinar2017suicide]: agent-based model for evolution of parasitism, mutualism, and commensalism.
- SignalGP [@Lalejini2018-GECCO; @moreno2021signalgp]: an event-driven genetic programming substrate.
- PhylotrackPy [@dolson2024phylotrack]: a phylogeny-tracking tool for agent-based evolution, closely integrated with Empirical codebase.
- Model of cancer evolution on an oxygen gradient.
This research was supported in part by NSF grants DEB-1655715 and DBI-0939454, by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1424871, by Michigan State University through the computational resources provided by the Institute for Cyber-Enabled Research, and by the Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship, a Schmidt Futures program. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Footnotes
-
For example, neither GCC 10.3 nor Clang 12.0.0 detect
std::vector
iterator invalidation when appending to astd::vector
happens to fall within existing allocated buffer space (GCC live example; Clang live example). Clang 12.0.0's sanitizers also fail to detect this iterator invalidation (live example). ↩ ↩2