jeffthomasweb/GitHubActionsWebScraping

This project uses the free tier of GitHub Actions to run a web scraper. The YML files controlling the scraping are located under .github/workflows. The parsed content is saved to rss_feeds_combined.txt and nyt.txt.

One of the YML files is pasted below as an example.

name: Use the standard Python library to parse RSS Feeds

on:
  workflow_dispatch:
  schedule:
    - cron:  '7 */12 * * *'
jobs:
  scheduled:
    runs-on: ubuntu-latest
    steps:
    - name: Check out this repo
      uses: actions/checkout@v4
      with:
        fetch-depth: 0
    - name: Commit and push if it changed
      run: |
        wget -O npr.xml https://feeds.npr.org/1001/rss.xml
        wget -O arstechnica.xml https://feeds.arstechnica.com/arstechnica/index
        wget -O wgrznews.xml https://www.wgrz.com/feeds/syndication/rss/news/local
        python3 pythonstdlibraryrss.py
        git config user.name "Automated"
        git config user.email "[email protected]"
        git add -A
        timestamp=$(date -u)
        git commit -m "Latest data: ${timestamp}" || exit 0
        git push
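
The workflow above downloads three feeds and then runs pythonstdlibraryrss.py, which is not reproduced here. As a rough illustration of what a standard-library RSS parser could look like, here is a minimal sketch. It assumes the feeds are RSS 2.0 documents with `<item>` elements containing `<title>` and `<link>`. The function names `extract_items` and `combine_feeds` and the tab-separated output format are hypothetical, not taken from the repository's script.

```python
import xml.etree.ElementTree as ET


def extract_items(root):
    """Return (title, link) pairs for every <item> under an RSS root element."""
    items = []
    for item in root.iter("item"):
        # findtext returns None when the child element is missing
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        items.append((title, link))
    return items


def combine_feeds(paths, out_path):
    """Parse each feed file and write one 'title<TAB>link' line per item."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            root = ET.parse(path).getroot()
            for title, link in extract_items(root):
                out.write(f"{title}\t{link}\n")


if __name__ == "__main__":
    # File names match the wget targets and output file named in this README;
    # the line format written here is an assumption.
    combine_feeds(
        ["npr.xml", "arstechnica.xml", "wgrznews.xml"],
        "rss_feeds_combined.txt",
    )
```

Because the output file is rewritten on every run, `git commit` in the workflow only creates a commit when the feed contents have actually changed.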
