Merge pull request #42 from vshiv18/main
Python bindings
ishmeals authored Jul 2, 2024
2 parents d623835 + f3e3a7b commit 52b5bb9
Showing 8 changed files with 190 additions and 5 deletions.
4 changes: 4 additions & 0 deletions .gitmodules
@@ -0,0 +1,4 @@
[submodule "extern/pybind11"]
path = extern/pybind11
url = ../../pybind/pybind11
branch = stable
36 changes: 32 additions & 4 deletions README.md
@@ -10,15 +10,17 @@
## What is the Digest library?
- a `C++` library that supports various sub-sampling schemes for $k$-mers in DNA sequences.
- The `Digest` library uses the rolling hash function from [ntHash](https://github.com/bcgsc/ntHash) to order the $k$-mers in a window.

- a set of Python bindings that allow the user to run functions from the C++ library in Python.

## How to install and build into your project?
<img width="600" alt="image2" src="https://github.com/oma219/digest/assets/32006908/7cea427e-c22a-4271-a234-a2aafeb45413">

### Step 1: Install library

After cloning from GitHub, we use the [Meson](https://mesonbuild.com) build system to install the library.
- `PREFIX` is the absolute path where the library files (`*.h` and `*.a` files) will be installed.
- **IMPORTANT**: `PREFIX` should not be the root directory of the `Digest/` repo to avoid any issues with installation.
- **IMPORTANT**: `PREFIX` should not be the root directory of the `digest/` repo to avoid any issues with installation.
- We suggest using `--prefix=$(pwd)/build` from within the root directory of the `digest/` repo.
- These commands generate `include` and `lib` folders in the `PREFIX` folder.

```bash
@@ -36,10 +38,10 @@ If your coding project uses `Meson` to build the executable(s), you can include

#### (b) Using `g++`:

To use Digest in your C++ project, you just need to include the header files (`*.h`) and library file (`*.a`) that were installed in the first step. Assuming that `install/` is the directory you installed them in, here is how you can compile.
To use Digest in your C++ project, you just need to include the header files (`*.h`) and library file (`*.a`) that were installed in the first step. Assuming that `build/` is the directory you installed them in, here is how you can compile.

```bash
g++ -std=c++17 -o main main.cpp -I install/include/ -L install/lib -lnthash
g++ -std=c++17 -o main main.cpp -I build/include/ -L build/lib -lnthash
```

## Detailed Look at Example Usage (2 ways):
@@ -75,6 +77,32 @@ std::vector<std::pair<size_t, size_t>> output;
digester.roll_minimizer(100, output);
```
## Python binding support
The library includes Python bindings for each sub-sampling scheme. To install the Python module, first install the library with `meson` (see above for detailed instructions), then install the module with `pip`. For this setup, the `meson` prefix must be `--prefix=$DIGEST_REPO/build`, where `$DIGEST_REPO` is the root of the cloned repo, because `setup.py` looks for the headers in `build/include` and the library in `build/lib`:
```bash
meson setup --prefix=$(pwd)/build --buildtype=release build
meson install -C build
pip install .
```
Alternatively, copy the `lib` and `include` directories from an earlier `meson` installation into a directory named `build` at the root of the repo, then run `pip install .`.
We recommend installing into a conda or Python virtual environment.
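Before running `pip install .`, it can help to confirm that the `build/` layout is where `setup.py` expects it. The sketch below is a hypothetical helper (not part of the repo). It only checks the paths implied by `setup.py` (`include_dirs=['build/include', ...]`, `library_dirs=['build/lib']`) and by the `#include <digest/...>` lines in `pybind/digest_utils.hpp`:

```python
from pathlib import Path

# Headers that pybind/digest_utils.hpp includes as <digest/...>; setup.py adds
# build/include to the include path, so they should sit under build/include/digest/.
EXPECTED_HEADERS = ("window_minimizer.hpp", "syncmer.hpp", "mod_minimizer.hpp")


def missing_build_files(repo_root):
    """Return the expected build/ paths that do not exist yet (hypothetical helper)."""
    build = Path(repo_root) / "build"
    expected = [build / "include" / "digest" / name for name in EXPECTED_HEADERS]
    expected.append(build / "lib")  # setup.py passes library_dirs=['build/lib']
    return [str(p) for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = missing_build_files(".")  # run from the repo root
    if missing:
        print("Run the meson install step first; missing:")
        for path in missing:
            print("  " + path)
    else:
        print("build/ layout looks ready for `pip install .`")
```

This only checks that the files exist; it does not confirm that they came from a compatible `meson` build.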
Once installed, you can import and use the Digest library in Python:
```python
>>> from Digest import window_minimizer, syncmer, modimizer
>>> window_minimizer('ACGTACGTAGCTAGCTAGCTAGCTGATTACATACTGTATGCAAGCTAGCTGATCGATCGTAGCTAGTGATGCTAGCTAC', k=5, w=11)
[4, 5, 16, 19, 21, 26, 27, 35, 39, 49, 57, 63, 68]
>>> modimizer('ACGTACGTAGCTAGCTAGCTAGCTGATTACATACTGTATGCAAGCTAGCTGATCGATCGTAGCTAGTGATGCTAGCTAC', k=5, mod=5)
[23, 34, 38, 40, 62, 67]
>>> syncmer('ACGTACGTAGCTAGCTAGCTAGCTGATTACATACTGTATGCAAGCTAGCTGATCGATCGTAGCTAGTGATGCTAGCTAC', k=5, w=15)
[0, 3, 4, 5, 7, 12, 13, 27, 35, 49]
>>> modimizer('ATCGTGCATCA', k=4, mod=2, include_hash=True)
[(0, 1122099596), (2, 249346952), (4, 227670418), (7, 123749036)]
>>> seq = 'ACGTACGTAGCTAGCTAGCTAGCTGATTACATACTGTATGCAAGCTAGCTGATCGATCGTAGCTAGTGATGCTAGCTAC'
>>> [seq[p:p+5] for p in window_minimizer(seq, k=5, w=11)]
['ACGTA', 'CGTAG', 'AGCTA', 'TAGCT', 'GCTGA', 'TTACA', 'TACAT', 'GTATG', 'GCAAG', 'TGATC', 'CGTAG', 'TAGTG', 'ATGCT']
```
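Any window minimizer scheme should satisfy the window guarantee: every run of `w` consecutive $k$-mer start positions contains at least one selected position. The hypothetical checker below tests that property on a list of positions. It does not call Digest and makes no assumption about the hash function or tie-breaking. Here it is applied to the `window_minimizer(seq, k=5, w=11)` output shown above:

```python
def covers_every_window(positions, n_kmers, w):
    """Return True if every window of w consecutive k-mer starts
    (indices 0 .. n_kmers - 1) contains at least one selected position."""
    if n_kmers < w:
        return True  # no complete window exists, so nothing to check
    chosen = sorted(set(positions))
    if not chosen:
        return False
    # A window [i, i + w - 1] is empty only if it lies before the first pick,
    # after the last pick, or inside a gap of more than w between two picks.
    if chosen[0] > w - 1 or chosen[-1] < n_kmers - w:
        return False
    return all(b - a <= w for a, b in zip(chosen, chosen[1:]))


seq = 'ACGTACGTAGCTAGCTAGCTAGCTGATTACATACTGTATGCAAGCTAGCTGATCGATCGTAGCTAGTGATGCTAGCTAC'
k, w = 5, 11
picks = [4, 5, 16, 19, 21, 26, 27, 35, 39, 49, 57, 63, 68]  # README output above
print(covers_every_window(picks, len(seq) - k + 1, w))  # prints True
```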
<!---
# Implementation
1 change: 1 addition & 0 deletions extern/pybind11
Submodule pybind11 added at 01ab93
3 changes: 2 additions & 1 deletion meson.build
@@ -7,6 +7,7 @@ project(
'optimization=3',
'werror=true',
'warning_level=3',
'buildtype=release'
# 'b_sanitize=thread',
],
)
@@ -81,4 +82,4 @@ if get_option('buildtype') != 'release'
else
warning('Documentation disabled without doxygen')
endif
endif
endif
15 changes: 15 additions & 0 deletions pybind/bindings.cpp
@@ -0,0 +1,15 @@
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <digest_utils.hpp>

namespace py = pybind11;

PYBIND11_MODULE(Digest, m) {
m.doc() = "bindings for Digest";
m.def("window_minimizer", &window_minimizer, "A function that runs window minimizer digestion",
py::arg("seq"), py::arg("k") = 31, py::arg("w") = 11, py::arg("include_hash") = false);
m.def("modimizer", &modimizer, "A function that runs mod-minimizer digestion",
py::arg("seq"), py::arg("k") = 31, py::arg("mod") = 100, py::arg("include_hash") = false);
m.def("syncmer", &syncmer, "A function that runs syncmer digestion",
py::arg("seq"), py::arg("k") = 31, py::arg("w") = 11, py::arg("include_hash") = false);
}
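Each bound function returns the `std::variant` defined in `pybind/digest_utils.hpp`. With `pybind11/stl.h` included, pybind11 converts whichever alternative is held, so Python receives a list of ints when `include_hash=False` and a list of `(position, hash)` tuples when `include_hash=True` (matching the README examples). The hypothetical helper below, which is not part of the `Digest` module, accepts either shape and recovers the sampled $k$-mers:

```python
def kmers_from_result(seq, result, k):
    """Map a Digest binding result to (position, k-mer, hash) triples.

    `result` is either a list of positions (include_hash=False) or a list of
    (position, hash) tuples (include_hash=True); hash is None in the first case.
    """
    triples = []
    for item in result:
        if isinstance(item, tuple):
            pos, h = item
        else:
            pos, h = item, None
        triples.append((pos, seq[pos:pos + k], h))
    return triples


# Output of modimizer('ATCGTGCATCA', k=4, mod=2, include_hash=True) from the README
hashed = [(0, 1122099596), (2, 249346952), (4, 227670418), (7, 123749036)]
for pos, kmer, h in kmers_from_result('ATCGTGCATCA', hashed, k=4):
    print(pos, kmer, h)
```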
50 changes: 50 additions & 0 deletions pybind/digest_utils.hpp
@@ -0,0 +1,50 @@
#include <digest/window_minimizer.hpp>
#include <digest/syncmer.hpp>
#include <digest/mod_minimizer.hpp>
#include <variant>

std::variant<std::vector<uint32_t>, std::vector<std::pair<uint32_t, uint32_t>>> window_minimizer (
const std::string &seq, unsigned k, unsigned large_window, bool include_hash=false) {
digest::WindowMin<digest::BadCharPolicy::SKIPOVER, digest::ds::Adaptive> digester (seq, k, large_window);
if (include_hash) {
std::vector<std::pair<uint32_t, uint32_t>> output;
digester.roll_minimizer(seq.length(), output);
return output;
}
else {
std::vector<uint32_t> output;
digester.roll_minimizer(seq.length(), output);
return output;
}
}
//std::vector<std::pair<size_t, size_t>> output;

std::variant<std::vector<uint32_t>, std::vector<std::pair<uint32_t, uint32_t>>> modimizer (
const std::string &seq, unsigned k, uint32_t mod, bool include_hash=false) {
digest::ModMin<digest::BadCharPolicy::SKIPOVER> digester (seq, k, mod);
if (include_hash) {
std::vector<std::pair<uint32_t, uint32_t>> output;
digester.roll_minimizer(seq.length(), output);
return output;
}
else {
std::vector<uint32_t> output;
digester.roll_minimizer(seq.length(), output);
return output;
}
}

std::variant<std::vector<uint32_t>, std::vector<std::pair<uint32_t, uint32_t>>> syncmer (
const std::string &seq, unsigned k, unsigned large_window, bool include_hash=false) {
digest::Syncmer<digest::BadCharPolicy::WRITEOVER, digest::ds::Adaptive> digester (seq, k, large_window);
if (include_hash) {
std::vector<std::pair<uint32_t, uint32_t>> output;
digester.roll_minimizer(seq.length(), output);
return output;
}
else {
std::vector<uint32_t> output;
digester.roll_minimizer(seq.length(), output);
return output;
}
}
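As a rough mental model of `modimizer`, the sketch below assumes that `digest::ModMin` keeps every $k$-mer whose hash is congruent to 0 modulo `mod`. The README's `include_hash=True` example, where every returned hash is even for `mod=2`, is consistent with that assumption. This is a toy, not Digest's code. `zlib.crc32` stands in for ntHash, so the selected positions will not match Digest's output, and skipping $k$-mers that contain non-ACGT characters only loosely mirrors what `BadCharPolicy::SKIPOVER` appears to do:

```python
import zlib


def toy_modimizer(seq, k, mod, include_hash=False):
    """Toy mod-sampling: keep k-mers whose (stand-in) hash is divisible by mod."""
    out = []
    for pos in range(len(seq) - k + 1):
        kmer = seq[pos:pos + k]
        if not set(kmer) <= set("ACGT"):
            continue  # assumption: skip k-mers containing non-ACGT characters
        h = zlib.crc32(kmer.encode())  # stand-in for ntHash (assumption)
        if h % mod == 0:
            out.append((pos, h) if include_hash else pos)
    return out


print(toy_modimizer("ACGTNACGTACGT", k=4, mod=2, include_hash=True))
```

Larger values of `mod` keep fewer $k$-mers on average, which is the density knob the binding's `mod` argument exposes.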
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -0,0 +1,3 @@
[build-system]
requires = ["setuptools", "pybind11", "ninja", "meson"]
build-backend = "setuptools.build_meta"
83 changes: 83 additions & 0 deletions setup.py
@@ -0,0 +1,83 @@
### Adapted from Uncalled setup.py by Sam Kovaka

from setuptools import setup, find_packages, Extension
from setuptools.command.build_ext import build_ext
import os
import subprocess
import sys

try:
from pybind11.setup_helpers import Pybind11Extension, ParallelCompile, naive_recompile
ParallelCompile("NPY_NUM_BUILD_JOBS", default=4, needs_recompile=naive_recompile).install()
except ImportError:
from setuptools import Extension as Pybind11Extension

ROOT_DIR = os.getcwd()

class get_pybind_include(object):
"""Helper class to determine the pybind11 include path
The purpose of this class is to postpone importing pybind11
until it is actually installed, so that the ``get_include()``
method can be invoked. """

def __str__(self):
import pybind11
from pybind11.setup_helpers import Pybind11Extension
return pybind11.get_include()

digest = Pybind11Extension(
"Digest",
sources = ['pybind/bindings.cpp'],
include_dirs = [
'build/include',
'pybind',
get_pybind_include()
],
library_dirs=['build/lib'],
define_macros = [("PYBIND", None)],
extra_compile_args=['-std=c++17', '-fPIC']
)

# class MesonBuildExt(build_ext):
# def run(self):
# # Check for Meson and Ninja installation
# try:
# subprocess.check_output(['meson', '--version'])
# except FileNotFoundError:
# raise RuntimeError("Meson must be installed to build the extensions")

# try:
# subprocess.check_output(['ninja', '--version'])
# except FileNotFoundError:
# raise RuntimeError("Ninja must be installed to build the extensions")

# def build_extension(self):
# build_temp = os.path.abspath(self.build_temp)
# # ext_fullpath = self.get_ext_fullpath(ext.name)
# # ext_dir = os.path.abspath(os.path.dirname(ext_fullpath))
# ext_dir = os.path.abspath(os.path.join(ROOT_DIR, 'build'))
# meson_build_dir = os.path.join(build_temp, 'meson_build')

# # Create build directory if it doesn't exist
# if not os.path.exists(meson_build_dir):
# os.makedirs(meson_build_dir)

# meson_args = [
# 'meson', 'setup', '--prefix', ext_dir,
# '--buildtype=release', meson_build_dir
# ]

# subprocess.check_call(meson_args)
# subprocess.check_call(['meson', 'install', '-C', meson_build_dir])

setup(
name = 'Digest',
version = '0.2',
python_requires=">=3.8",
setup_requires=['setuptools', 'pybind11>=2.6.0', 'meson','ninja'],
install_requires=['pybind11>=2.6.0'],
packages=find_packages(),
ext_modules = [digest],
include_package_data=True,
# cmdclass={'build_ext': MesonBuildExt},
)
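The commented-out `MesonBuildExt.run()` starts by verifying that Meson and Ninja are installed. A minimal sketch of that check follows. It uses a hypothetical helper name and swaps in `shutil.which` for the original `subprocess.check_output([tool, '--version'])` calls:

```python
import shutil


def missing_build_tools(tools=("meson", "ninja")):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_build_tools()
print("missing build tools:", ", ".join(missing) if missing else "none")
```

`shutil.which` only confirms that an executable is on `PATH` and avoids spawning a process. The original `--version` call also proves that the binary actually runs, which can matter inside unusual virtual environments.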
