README

The purpose of this set of scripts is to:

  • Retrieve a set of tweets from Twitter, given a query and a target language
  • Detect the tweet most likely to be interesting, using a Naive Bayes classifier

One still needs to create the training data, which can (and should) be done by manually classifying tweets that have already been retrieved.

This git repo contains the following files:

  • data.yaml: stores the trained data.
  • naive_bayes_retweet.py: given data.yaml and a file listing the candidate tweets, it picks the tweet with the highest Naive Bayes score and retweets it from the Twitter account associated with the API keys in twitter_conf.yaml.
  • requirements: lists the required packages.
  • search_tweets.py: returns Twitter search results for a given query and target language.
  • train.py: given the true and false sets of tweets, it performs n-gram tokenization and stores the results in data.yaml.
  • twitter_conf.yaml: stores the Twitter API keys (needed to run the scripts).
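
To give a rough idea of how n-gram tokenization and Naive Bayes scoring fit together, here is a minimal sketch. It is not the code in train.py or naive_bayes_retweet.py: the function names, the use of word-level bigrams, the add-one smoothing, and the in-memory dict (standing in for data.yaml) are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(text, n=2):
    """Split a tweet into word n-grams (word-level bigrams are an assumption)."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def train(true_tweets, false_tweets, n=2):
    """Count n-grams per class. This dict is a hypothetical stand-in for data.yaml."""
    return {
        "true": Counter(g for t in true_tweets for g in ngrams(t, n)),
        "false": Counter(g for t in false_tweets for g in ngrams(t, n)),
        "n_true": len(true_tweets),
        "n_false": len(false_tweets),
    }


def score(tweet, model, n=2):
    """Log-odds that a tweet is interesting; assumes both classes are non-empty."""
    vocab = len(set(model["true"]) | set(model["false"]))
    total_true = sum(model["true"].values())
    total_false = sum(model["false"].values())
    # Start from the prior log-odds of the two classes.
    result = math.log(model["n_true"] / model["n_false"])
    for g in ngrams(tweet, n):
        # Add-one (Laplace) smoothing keeps unseen n-grams from zeroing the product.
        p_true = (model["true"][g] + 1) / (total_true + vocab)
        p_false = (model["false"][g] + 1) / (total_false + vocab)
        result += math.log(p_true / p_false)
    return result


def most_interesting(tweets, model, n=2):
    """Return the candidate tweet with the highest Naive Bayes score."""
    return max(tweets, key=lambda t: score(t, model, n))


if __name__ == "__main__":
    # Toy, made-up training sets; real ones come from manual classification.
    model = train(
        ["new python release today", "python release notes published"],
        ["buy cheap followers now", "cheap followers for sale"],
    )
    candidates = ["cheap followers here", "python release is out"]
    print(most_interesting(candidates, model))
```

Working with log-odds rather than raw probabilities avoids numerical underflow when a tweet contains many n-grams, and a positive score simply means the tweet looks more like the "true" set than the "false" set.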