pyallris - an ALLRIS-Scraper in python

This scraper is work in progress and builds on the information found in this wiki document: ScrapingAllrisHowto

It's based on the XML output which needs to be enabled for it to work. Unfortunately not everything is accessible via the XML output.

Installation / Requirements

The best way to make it work is to install virtualenv and create a virtual environment for the scraper to run in:

mkdir scraper
cd scraper
virtualenv .
source bin/activate

Then clone the repo:

git clone https://github.com/mrtopf/pyallris.git

and develop it:

cd pyallris
python setup.py develop

This will install all requirements as well.

After that you can look into sitzungen.py to check how it's supposed to work. There are also some experiments in the experiments/ folder.

Ubuntu 12.04 installation

Install python 2.7 and some libs:

sudo apt-get install python2.7-dev
sudo apt-get install libxml2-dev libxslt-dev

Install the mongo dbms:

sudo apt-get install mongodb

Install pip for the virtualenv:

sudo apt-get install python-pip
sudo pip-2.7 install virtualenv

Use VirtualEnvironment: Navigate to a folder where the project root folder will be and run:

virtualenv-2.7 .
source bin/activate

Clone the git project as a subdirectory of the virtualenv folder:

git clone https://github.com/mrtopf/pyallris.git

Initialize the bootstrap git submodule:

git submodule init
git submodule update

Initialize:

python setup.py develop

Now you can run the scrapers with:

python sitzungen.py 
python meetings.py 
python persons.py

Notice

Please note that the URLs in use are right now hard coded for the ALLRIS in Aachen. This might change soon though.

Name		Name	Last commit message	Last commit date
Latest commit History 105 Commits
bootstrap @ 949c7cc		bootstrap @ 949c7cc
osm-import		osm-import
pyallris		pyallris
.gitignore		.gitignore
.gitmodules		.gitmodules
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pyallris - an ALLRIS-Scraper in python

Installation / Requirements

Ubuntu 12.04 installation

Notice

About

Releases

Packages

Contributors 2

Languages

comlounge/pyallris

Folders and files

Latest commit

History

Repository files navigation

pyallris - an ALLRIS-Scraper in python

Installation / Requirements

Ubuntu 12.04 installation

Notice

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages