Updated for stability #180

Open
wants to merge 18 commits into
base: master
from
39 changes: 27 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
@@ -10,14 +10,16 @@ prior to deletion. In fact you can actually turn off deletion all together and j
about Shreddit) but this will increase how long it takes the script to run as it will be going over all of your messages
every run.

## Important New Changes (as of Dec 2016)

Due to deprecation of the PRAW 3.x library, Shreddit is using PRAW 4. This requires that OAuth be used to authenticate.
Thankfully, however, it is much easier than in previous versions. If you are upgrading, [please review the usage section
to ensure that you have set up credentials correctly.](#configuring-credentials)
I added some changes: the command now runs to completion (i.e., it's fire and forget). I also took the liberty
of making `batch_size` configurable in the `shreddit.yml` file. However, that option is largely unnecessary because
I fixed the API limit error that used to break Shreddit. The exception is now handled gracefully, and the process
continues until all comments and posts have been considered for deletion.

## Pip Installation

I personally recommend using the manual instructions. This is a forked branch of Shreddit containing fixes that are not
in the main branch, so your mileage may vary if you use pip to install it.

`pip install -U shreddit` will install the package and its dependencies, and it will add a `shreddit` command line
utility to your PATH. This is typically either run in a virtualenv or using administrative privileges for global
installation.
@@ -29,6 +31,9 @@ installation.
3. Run `python setup.py install` to install the package and the `shreddit` command line utility. This is typically
either run in a virtualenv or using administrative privileges for global installation.

Note: The original author pinned some of the package versions in the requirements. I found that most of the resulting
errors are resolved by running `pip install <package-name> --upgrade`.

## Usage

After installing the `shreddit` command line utility, the first step is setting up the tool's configuration files.
@@ -57,7 +62,7 @@ client ID and secret, follow these steps (taken from
[PRAW documentation](http://praw.readthedocs.io/en/latest/getting_started/authentication.html#script-application)):

1. Open your Reddit application preferences by clicking [here](https://www.reddit.com/prefs/apps/).
2. Add a new application. It doesn't matter what it's named, but calling it "shreddit" makes it easier to remember.
2. Add a new application. It doesn't matter what it's named, but calling it "shreddit" makes it easier to remember. The button will probably say something about being a developer; don't worry, it's fine.
3. Select "script".
4. Redirect URL does not matter for script applications, so enter something like http://127.0.0.1:8080
5. Once created, you should see the name of your application followed by a 14-character string. Enter this 14 character
@@ -124,6 +129,10 @@ optional arguments:

## For Windows users

I highly recommend installing WSL and using the manual installation instructions.

Or (from the original author):

1. Make sure you have Python installed.
[Click here for the Python download page](https://www.python.org/downloads/).
- **Note:** Install either `python 2.x` or `python 3.x`, not both.
@@ -133,13 +142,19 @@ optional arguments:

## Caveats

- Certain limitations in the Reddit API and the PRAW library make it difficult to delete more than 1,000 comments.
While deleting >1000 comments is planned, it is necessary right now to rerun the program until they are all deleted.

- We are relying on the Reddit admins' word that edits are not stored. Deleted posts are still stored in the database;
they are merely inaccessible to the public.

## Donate
- Uses a plaintext configuration by default; it might be nice to add some command line parameters for the authentication.

- If you make changes to "shreddit.py", re-running "python setup.py install" will not update the `shreddit` command with
the new changes. You must directly replace the file in the python libraries.

- The original author has not updated their repository in about 7 years (since 2016), and many of the pinned package
versions in "requirements.txt" no longer work. Most of these can be fixed by running
`pip install <package> --upgrade`.

- After July 1st, 2023, I have no idea whether this will continue working without paying for API access. I think there
is a severely hampered "free" tier of the API, so it might take longer to run after that date, or it might not run at all.

A few people have asked about donating so here's a Bitcoin address, any amount is appreciated, spread amongst recent
contributors and if there's enough interest a web service may be on the horizon! `1PbeYK7FonjVmgWxf4ieKmvwtomZR1K1Qu`
- The subreddit whitelist option seems to be case sensitive. I'd like to change this later.
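
The case-sensitivity caveat above could be addressed by normalizing names before comparing them. The following is only a minimal sketch, assuming the whitelist is a plain list of subreddit names; `is_whitelisted` is a hypothetical helper and not a function that exists in Shreddit.

```python
def is_whitelisted(subreddit_name, whitelist):
    """Return True if subreddit_name matches a whitelist entry, ignoring case.

    Hypothetical helper for illustration; not part of Shreddit.
    """
    # casefold() is a stricter lower() intended for caseless matching.
    normalized = {name.casefold() for name in whitelist}
    return subreddit_name.casefold() in normalized


print(is_whitelisted("AskReddit", ["askreddit", "Python"]))  # True
print(is_whitelisted("pics", ["askreddit", "Python"]))       # False
```

Normalizing both sides keeps the configuration file unchanged, so existing whitelists would continue to work.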
20 changes: 3 additions & 17 deletions requirements.txt
@@ -1,16 +1,3 @@
<<<<<<< HEAD
arrow==0.9.0
decorator==4.0.10
praw==4.2.0
PyYAML==3.12
requests==2.12.1
six==1.10.0
backports-abc==0.4
tornado==4.3
update-checker==0.15
wheel==0.24.0
appdirs==1.4.3
=======
appdirs==1.4.3
arrow==0.10.0
backports-abc==0.5
@@ -20,11 +7,10 @@ idna==2.5
praw==5.0.0
prawcore==0.11.0
python-dateutil==2.6.0
PyYAML==3.12
requests==2.18.1
PyYAML
requests
shreddit==6.0.7
six==1.10.0
tornado==4.5.1
update-checker==0.16
urllib3==1.21.1
>>>>>>> 772df35c68a6782b2bd733801458023545977166
urllib3
1 change: 1 addition & 0 deletions shreddit.yml.example
@@ -77,6 +77,7 @@ wordlist: []

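# Batch size
# This controls how many deleted items in one pass trigger a cooldown and another pass.
batch_size: 1000

# Batch cooldown
# This controls how long (in seconds) to wait between each set of batch_size deletions.
batch_cooldown: 10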

# vim: syntax=yaml ts=2
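
How the `batch_size` and `batch_cooldown` options above interact can be sketched roughly as follows. This is an illustration under assumptions, not Shreddit's actual code: `shred_in_batches` and `delete_batch` are hypothetical names, and `delete_batch` stands in for one pass that deletes some items and reports how many it handled.

```python
import time


def shred_in_batches(delete_batch, batch_size=1000, batch_cooldown=10, sleep=time.sleep):
    """Call delete_batch() repeatedly until a pass handles fewer than batch_size items.

    delete_batch is a hypothetical callable that performs one deletion pass
    and returns the number of items it handled.
    """
    total = 0
    while True:
        deleted = delete_batch()
        total += deleted
        if deleted < batch_size:
            # A short pass suggests nothing is left to fetch, so stop.
            return total
        # A full pass suggests more items remain; cool down before the next one.
        sleep(batch_cooldown)


# Simulate three passes without actually sleeping.
remaining = [1000, 1000, 250]
print(shred_in_batches(lambda: remaining.pop(0), sleep=lambda seconds: None))  # 2250
```

The `sleep` parameter is injected only so the loop can be exercised without real delays.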
45 changes: 29 additions & 16 deletions shreddit/shredder.py
@@ -18,12 +18,14 @@ class Shredder(object):
    """This class stores state for configuration, API objects, logging, etc. It exposes a shred() method that
    application code can call to start it.
    """
    _batch_size = 1000

    def __init__(self, config, user):
        logging.basicConfig()
        self._logger = logging.getLogger("shreddit")
        self._logger.setLevel(level=logging.DEBUG if config.get("verbose", True) else logging.INFO)
        self.__dict__.update({"_{}".format(k): config[k] for k in config})

        self._batch_size = config.get("batch_size", 1000)
        self._user = user
        self._connect()

@@ -67,7 +69,7 @@ def __init__(self, config, user):
    def shred(self):
        deleted = self._remove_things(self._build_iterator())
        self._logger.info("Finished deleting {} items. ".format(deleted))
        if deleted >= 1000:
        if deleted >= self._batch_size:
            # This user has more than 1000 items to handle, which angers the gods of the Reddit API. So chill for a
            # while and do it again.
            self._logger.info("Waiting {} seconds and continuing...".format(self._batch_cooldown))
@@ -128,20 +130,31 @@ def _remove_comment(self, comment):
            comment.edit(replacement_text)

    def _remove(self, item):
        if self._keep_a_copy and self._save_directory:
            self._save_item(item)
        if not self._trial_run:
            if self._clear_vote:
                try:
                    item.clear_vote()
                except BadRequest:
                    self._logger.debug("Couldn't clear vote on {item}".format(item=item))
        if isinstance(item, Submission):
            self._remove_submission(item)
        elif isinstance(item, Comment):
            self._remove_comment(item)
        if not self._trial_run:
            item.delete()
        while 1:


Suggested change
while 1:
MAX_RETRIES_COUNT = 5
for retries_count in range(MAX_RETRIES_COUNT):

Author


Considering how long the script takes to run, limiting max retries is not ideal. Add an option to run indefinitely if you're really trying to change it. I understand the distaste for while 1 but the real experience is that it will not run to completion if you have a lot of posts because of the rate limiting.

            try:
                if self._keep_a_copy and self._save_directory:
                    self._save_item(item)
                if not self._trial_run:
                    if self._clear_vote:
                        try:
                            item.clear_vote()
                        except BadRequest:
                            self._logger.debug(f"Couldn't clear vote on {item}")
                if isinstance(item, Submission):
                    self._remove_submission(item)
                elif isinstance(item, Comment):
                    self._remove_comment(item)
                if not self._trial_run:
                    item.delete()
                break
            except BadRequest as e:
                self._logger.debug(
                    '''Encountered a problem with the API,
                    probably ratelimiting thanks to bad admins'''
                )
                self._logger.error(f"Exception: {e}")
                self._logger.info(f"Waiting {self._batch_cooldown} seconds")
                time.sleep(self._batch_cooldown)


Suggested change
                time.sleep(self._batch_cooldown)
                time.sleep(self._batch_cooldown)
        else:
            self._logger.error(
                "Max retry attempts reached. Unable to complete the operation"
            )
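
The two positions in this review thread (a capped retry count versus retrying until the run completes) could be reconciled with an optional cap. The following is a rough sketch only; `retry`, `operation`, `retry_on`, and `max_retries` are hypothetical names that do not appear in this PR, and the caller would pass in whichever exception class signals rate limiting.

```python
import itertools
import time


def retry(operation, retry_on, cooldown=10, max_retries=None, sleep=time.sleep):
    """Run operation() until it succeeds, pausing cooldown seconds after each failure.

    max_retries=None retries indefinitely; an integer caps the number of attempts.
    """
    # itertools.count() never ends, which gives the "run indefinitely" behavior.
    attempts = itertools.count() if max_retries is None else range(max_retries)
    for _ in attempts:
        try:
            return operation()
        except retry_on:
            sleep(cooldown)
    # Only reached when a finite cap is exhausted.
    raise RuntimeError("Max retry attempts reached. Unable to complete the operation")
```

With `max_retries=None` as the default, the behavior matches the `while 1` loop in this PR, while a configured integer gives the bounded behavior from the suggested change.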


def _remove_things(self, items):
self._logger.info("Loading items to delete...")