signalhunter / juicer Public


ripgrep but for gzip-compressed files over http

Files: .gitignore, README.md, go.mod, go.sum, main.go, scan.go, utils.go


Juicer

It's ripgrep but for Gzip-compressed files over HTTP!

This tool was primarily designed to scan through the Common Crawl dataset for URLs without spending a fortune on AWS.

Features:

Extremely fast regex engine (Intel Hyperscan)
Scan through terabytes of data without writing it to disk
Concurrent scanning of multiple files

TODO:

Client/server for handing out scanning tasks
Zstandard support? (for IA WARCs)


Releases 4

v1.3.0-youtube Latest


Languages

Go 100.0%