This project uses the free tier of GitHub Actions to run a web scraper. The YML files controlling the scraping are located under .github/workflows. The parsed content is saved to rss_feeds_combined.txt and nyt.txt.
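One of the YML files is pasted below as an example. It runs at minute 7 of every 12th hour (00:07 and 12:07 UTC) and can also be started manually through workflow_dispatch.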

    name: Use the standard Python library to parse RSS Feeds
    on:
      workflow_dispatch:
      schedule:
        - cron: '7 */12 * * *'
    jobs:
      scheduled:
        runs-on: ubuntu-latest
        steps:
          - name: Check out this repo
            uses: actions/checkout@v4
            with:
              fetch-depth: 0
          - name: Commit and push if it changed
            run: |
              wget -O npr.xml https://feeds.npr.org/1001/rss.xml
              wget -O arstechnica.xml https://feeds.arstechnica.com/arstechnica/index
              wget -O wgrznews.xml https://www.wgrz.com/feeds/syndication/rss/news/local
              python3 pythonstdlibraryrss.py
              git config user.name "Automated"
              git config user.email "[email protected]"
              git add -A
              timestamp=$(date -u)
              git commit -m "Latest data: ${timestamp}" || exit 0
              git push
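
The workflow hands the downloaded XML files to pythonstdlibraryrss.py, which is not reproduced here. The following is only a minimal sketch of what a standard-library parser for those feeds could look like, not the actual script. The function names extract_items and combine_feeds and the tab-separated output layout are assumptions made for illustration, and the sketch assumes RSS 2.0 feeds whose entries are plain <item> elements with <title> and <link> children.

```python
import xml.etree.ElementTree as ET


def extract_items(root):
    """Return (title, link) pairs for every <item> element under root.

    Hypothetical helper: missing <title> or <link> children become "".
    """
    pairs = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        pairs.append((title, link))
    return pairs


def combine_feeds(xml_paths, out_path):
    """Parse each feed file and write one 'title<TAB>link' line per item.

    Hypothetical helper: the output layout is an assumption, not the
    format the project actually uses. Returns the number of lines written.
    """
    lines = []
    for path in xml_paths:
        root = ET.parse(path).getroot()
        for title, link in extract_items(root):
            lines.append(f"{title}\t{link}")
    with open(out_path, "w", encoding="utf-8") as out:
        out.writelines(line + "\n" for line in lines)
    return len(lines)
```

Under these assumptions, calling combine_feeds(["npr.xml", "arstechnica.xml", "wgrznews.xml"], "rss_feeds_combined.txt") after the wget lines would regenerate the combined file on every run. Because the workflow's git commit is followed by || exit 0, a run whose output is identical to the previous one ends without creating a commit.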