Document scraper for getting invoices automagically as PDF (useful for taxes or a DMS)
🏠 Homepage
- npm >=9.1.2
- node >=18.12.1
All settings can be changed via CLI arguments or environment variables (even when using Docker); an example `.env` file is shown below the settings table.
| Setting | Description | Default value |
| --- | --- | --- |
| AMAZON_USERNAME | Your Amazon username | null |
| AMAZON_PASSWORD | Your Amazon password | null |
| AMAZON_TLD | Amazon top-level domain | de |
| AMAZON_YEAR_FILTER | Only extracts invoices from this year (e.g. 2023) | 2023 |
| AMAZON_PAGE_FILTER | Only extracts invoices from this page (e.g. 2) | null |
| ONLY_NEW | Tracks already scraped documents and starts a new run at the last scraped one | true |
| FILE_DESTINATION_FOLDER | Destination path for all scraped documents | ./documents/ |
| FILE_FALLBACK_EXTENSION | Fallback extension when no extension can be determined | .pdf |
| DEBUG | Debug flag (sets the log level to DEBUG) | false |
| SUBFOLDER_FOR_PAGES | Creates subfolders for every scraped page/plugin | false |
| LOG_PATH | Sets the log path | ./logs/ |
| LOG_LEVEL | Log level (see [winston logging levels](https://github.com/winstonjs/winston#logging-levels)) | info |
| RECURRING | Flag for executing the script periodically. Needs `RECURRING_PATTERN` to be set. Defaults to true when using the Docker container | false |
| RECURRING_PATTERN | Cron pattern for periodic execution. Needs `RECURRING` set to true | */30 * * * * |
| TZ | Timezone used for Docker environments | Europe/Berlin |
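For reference, a minimal `.env` might look like this (placeholder values; all keys are taken from the table above):

```sh
AMAZON_USERNAME=you@example.com
AMAZON_PASSWORD=your-password
AMAZON_TLD=de
AMAZON_YEAR_FILTER=2023
FILE_DESTINATION_FOLDER=./documents/
LOG_LEVEL=info
```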
⚠️ Attention: There is no need to install this locally. Just use `npx`.

🔨 Make sure you have an `.env` file (with the variables from above) present in the working directory, or use the appropriate CLI arguments.

🚑 If you want to use an `.env` file, make sure you use [env-cmd](https://www.npmjs.com/package/env-cmd).
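A minimal sketch of such an invocation (assuming a `.env` file in the current working directory, filled with the variables from the table above):

```sh
npx env-cmd npx docudigger scrape all
```

`env-cmd` loads the variables from `.env` into the environment before spawning the `docudigger` process.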
```sh
$ npx docudigger COMMAND
running command...
$ npx docudigger (--version)
@disane-dev/docudigger/2.0.2 linux-x64 node-v18.16.1
$ npx docudigger --help [COMMAND]
USAGE
  $ docudigger COMMAND
```
Scrapes all websites periodically (default for Docker environments).
```sh
USAGE
  $ npx docudigger scrape all [--json] [--logLevel trace|debug|info|warn|error] [-d] [-l <value>] [-c <value> -r]

FLAGS
  -c, --recurringCron=<value>  [default: * * * * *] Cron pattern to execute periodically
  -d, --debug
  -l, --logPath=<value>        [default: ./logs/] Log path
  -r, --recurring
      --logLevel=<option>      [default: info] Specify level for logging.
                               <options: trace|debug|info|warn|error>

GLOBAL FLAGS
  --json  Format output as json.

DESCRIPTION
  Scrapes all websites periodically

EXAMPLES
  $ docudigger scrape all
```
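For example, to scrape every 30 minutes with debug logging (the cron pattern mirrors the `RECURRING_PATTERN` default from the settings table):

```sh
npx docudigger scrape all -r -c "*/30 * * * *" --logLevel debug
```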
Used to get invoices from Amazon.
```sh
USAGE
  $ npx docudigger scrape amazon -u <value> -p <value> [--json] [--logLevel trace|debug|info|warn|error] [-d] [-l <value>] [-c <value> -r] [--fileDestinationFolder <value>] [--fileFallbackExentension <value>] [-t <value>] [--yearFilter <value>] [--pageFilter <value>] [--onlyNew]

FLAGS
  -c, --recurringCron=<value>           [default: * * * * *] Cron pattern to execute periodically
  -d, --debug
  -l, --logPath=<value>                 [default: ./logs/] Log path
  -p, --password=<value>                (required) Password
  -r, --recurring
  -t, --tld=<value>                     [default: de] Amazon top level domain
  -u, --username=<value>                (required) Username
      --fileDestinationFolder=<value>   [default: ./data/] Destination path for scraped documents
      --fileFallbackExentension=<value> [default: .pdf] Fallback extension when no extension can be determined
      --logLevel=<option>               [default: info] Specify level for logging.
                                        <options: trace|debug|info|warn|error>
      --onlyNew                         Gets only new invoices
      --pageFilter=<value>              Filters a page
      --yearFilter=<value>              Filters a year

GLOBAL FLAGS
  --json  Format output as json.

DESCRIPTION
  Used to get invoices from Amazon
  Scrapes Amazon invoices

EXAMPLES
  $ docudigger scrape amazon
```
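A fuller invocation with placeholder credentials, fetching only new invoices from amazon.de for 2023:

```sh
npx docudigger scrape amazon -u 'you@example.com' -p 'your-password' -t de --yearFilter 2023 --onlyNew
```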
```sh
docker run \
  -e AMAZON_USERNAME='[YOUR MAIL]' \
  -e AMAZON_PASSWORD='[YOUR PW]' \
  -e AMAZON_TLD='de' \
  -e AMAZON_YEAR_FILTER='2020' \
  -e AMAZON_PAGE_FILTER='1' \
  -e LOG_LEVEL='info' \
  -v "C:/temp/docudigger/:/home/node/docudigger" \
  ghcr.io/disane87/docudigger
```
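If you prefer Docker Compose, an equivalent (untested) sketch of the `docker run` command above might look like this:

```yaml
services:
  docudigger:
    image: ghcr.io/disane87/docudigger
    environment:
      AMAZON_USERNAME: "[YOUR MAIL]"
      AMAZON_PASSWORD: "[YOUR PW]"
      AMAZON_TLD: "de"
      AMAZON_YEAR_FILTER: "2020"
      AMAZON_PAGE_FILTER: "1"
      LOG_LEVEL: "info"
    volumes:
      - ./docudigger:/home/node/docudigger
```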
```sh
npm install
# Change the created .env to your needs
npm run start
```
👤 Marco Franke
- Website: http://byte-style.de
- Github: @Disane87
- LinkedIn: @marco-franke-799399136
Contributions, issues and feature requests are welcome!
Feel free to check the issues page. You can also take a look at the contributing guide.
Give a ⭐️ if this project helped you!
This README was generated with ❤️ by readme-md-generator