- Hong Kong
- https://x.com/Andrew_WXY
DSci@cli
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
Open-source scientific and technical publishing system built on Pandoc.
A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks.
Search and browse documents and data; find the people and companies you look for.
All files of thecleverprogrammer.com
A curated list of Polars talks, tools, examples & articles. Contributions welcome!
Data validation using Python type hints
Personal data science and machine learning toolbox
Free Data Engineering course!
Convert PDF to markdown + JSON quickly with high accuracy
OCR, layout analysis, reading order, table recognition in 90+ languages
ACLED v5 (1997-2014) Conflict Dataset (http://www.acleddata.com/data/version-5-data-1997-2014) Visualization
An open source DevOps tool for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI artifact.
Easily build, backtest and deploy your algo in just a few lines of code. Trade stocks, cryptos, and forex across exchanges w/ one package.
Introduction to the Command Line for Genomics
Data Cleaning with OpenRefine for Ecologists
Label, clean and enrich text datasets with LLMs.
TerminusDB is a distributed database with a collaboration model
An Open-Source Package for Textual Adversarial Attack.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Python library for building highly effective data science workflows
Synthetic data generators for tabular and time-series data