Stars

DSci@cli

116 repositories

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

Rust 31,413 2,046 Updated Jan 17, 2025

Open-source scientific and technical publishing system built on Pandoc.

JavaScript 4,100 332 Updated Jan 17, 2025

A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks.

Python 1,570 177 Updated Sep 9, 2024

Search and browse documents and data; find the people and companies you look for.

JavaScript 4 Updated Jun 12, 2023

All files of thecleverprogrammer.com

Jupyter Notebook 109 180 Updated May 19, 2024

A curated list of Polars talks, tools, examples & articles. Contributions welcome!

799 28 Updated Jan 10, 2025

Data validation using Python type hints

Python 22,051 1,972 Updated Jan 17, 2025

🎯 Personal data science and machine learning toolbox

Python 364 76 Updated Feb 4, 2020

Breaking Into Data Handbook

339 47 Updated Jun 29, 2024

Free Data Engineering course!

Jupyter Notebook 27,784 5,820 Updated Jan 17, 2025

qpdf: A content-preserving PDF document transformer

C++ 3,649 287 Updated Jan 5, 2025

🌊 Online machine learning in Python

Python 5,165 554 Updated Dec 6, 2024

Financial datasets for LLMs 🧪

Python 293 40 Updated May 27, 2024

Simple VTXXX-compatible Linux terminal emulator

Python 668 106 Updated Dec 11, 2024

Convert PDF to markdown + JSON quickly with high accuracy

Python 19,367 1,153 Updated Jan 17, 2025

OCR, layout analysis, reading order, table recognition in 90+ languages

Python 15,510 1,001 Updated Jan 16, 2025

ACLED v5 (1997-2014) Conflict Dataset (http://www.acleddata.com/data/version-5-data-1997-2014) Visualization

JavaScript 1 Updated Apr 12, 2023

An open source DevOps tool for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI artifact.

Go 609 62 Updated Jan 16, 2025

🚀 💸 Easily build, backtest and deploy your algo in just a few lines of code. Trade stocks, cryptos, and forex across exchanges w/ one package.

Python 2,201 278 Updated Dec 30, 2024

Introduction to the Command Line for Genomics

65 189 Updated Jan 9, 2025

Data Cleaning with OpenRefine for Ecologists

26 112 Updated Jan 14, 2025

OpenRefine for Social Science Data

23 46 Updated Jan 14, 2025

Label, clean and enrich text datasets with LLMs.

Python 2,139 152 Updated Jan 16, 2025

TerminusDB is a distributed database with a collaboration model

Prolog 2,818 111 Updated Nov 4, 2024

An Open-Source Package for Textual Adversarial Attack.

Python 703 127 Updated Jul 20, 2023

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Python 38,322 14,542 Updated Jan 17, 2025

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 2,150 163 Updated Jan 9, 2025

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

Jupyter Notebook 1,979 133 Updated Jan 13, 2025

Python library for building highly effective data science workflows

Python 949 74 Updated Jul 20, 2023

Synthetic data generators for tabular and time-series data

Jupyter Notebook 1,479 243 Updated Dec 10, 2024