GitHubActionsWebScraping (jeffthomasweb/GitHubActionsWebScraping)

This project uses the free tier of GitHub Actions to run a web scraper. The YML files controlling the scraping are located under .github/workflows. The parsed content is saved to rss_feeds_combined.txt and nyt.txt.

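One of the YML files is pasted below as an example. It runs on manual dispatch and on a schedule (minute 7 of every 12th hour, i.e. 00:07 and 12:07 UTC), downloads three RSS feeds with wget, runs pythonstdlibraryrss.py, and commits and pushes the result if anything changed.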

name: Use the standard Python library to parse RSS Feeds

on:
  workflow_dispatch:
  schedule:
    - cron:  '7 */12 * * *'
jobs:
  scheduled:
    runs-on: ubuntu-latest
    steps:
    - name: Check out this repo
      uses: actions/checkout@v4
      with:
        fetch-depth: 0
    - name: Commit and push if it changed
      run: |
        wget -O npr.xml https://feeds.npr.org/1001/rss.xml
        wget -O arstechnica.xml https://feeds.arstechnica.com/arstechnica/index
        wget -O wgrznews.xml https://www.wgrz.com/feeds/syndication/rss/news/local
        python3 pythonstdlibraryrss.py
        git config user.name "Automated"
        git config user.email "[email protected]"
        git add -A
        timestamp=$(date -u)
        git commit -m "Latest data: ${timestamp}" || exit 0
        git push
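
The contents of pythonstdlibraryrss.py are not shown here, so the following is only a minimal sketch of what a standard-library RSS parser for this workflow might look like. It assumes the downloaded feeds are RSS 2.0 documents with `<item>` elements; the function names `parse_rss_items` and `combine_feeds`, and the `title | link` output format, are hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Feed files written by the workflow's wget commands.
FEED_FILES = ["npr.xml", "arstechnica.xml", "wgrznews.xml"]


def parse_rss_items(xml_data):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if title:
            items.append((title, link))
    return items


def combine_feeds(feed_paths, output_path):
    """Write one 'title | link' line per item (hypothetical format); return the item count."""
    lines = []
    for path in map(Path, feed_paths):
        if not path.exists():
            continue  # feed was never downloaded
        try:
            parsed = parse_rss_items(path.read_bytes())
        except ET.ParseError:
            continue  # empty or malformed download, e.g. after a failed wget
        lines.extend(f"{title} | {link}" for title, link in parsed)
    Path(output_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    count = combine_feeds(FEED_FILES, "rss_feeds_combined.txt")
    print(f"Wrote {count} items")
```

Reading the files as bytes lets the XML parser honor each feed's own encoding declaration. Skipping unreadable feeds rather than raising keeps one failed download from blocking the commit step for the other feeds.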

Files in this repository (1,880 commits):

.github/workflows
LICENSE
README.md
arstechnica.xml
npr.xml
nyt.txt
output.txt
pythonstdlibraryrss.py
requirements.txt
rss_feeds_combined.txt
scrape.py
wgrznews.xml
