Friday, August 31, 2012

Downloading Tweets Using Twitter's API & Python

In my last post I showed how to get a list of all your Twitter followers and followees (that's a word, right?) using Python.  I have changed that code, renamed it contact_grab.py, and written a new tool called tweet_grab.py that downloads the tweets of any user found via the contact_grab.py tool.  The command takes an argument specifying how many days of tweets to download, counting back from when the command is run.

Download tweet_grab.py

For example, let's say I want to download the tweets from all my contacts over the last 7 days.  To do this I would use the following commands:

contact_grab.py gptreb
tweet_grab.py gptreb 7

contact_grab.py creates a file with the details of all my Twitter contacts.  tweet_grab.py then uses this file to gather all of their tweets over a period of time.  The command line output is a short summary of what has been returned, just to indicate the progress of the script.  The real output is a JSON formatted file that contains every tweet for the specified period.  I have only just started using JSON files and I am really starting to like them.  In the past I've struggled with parsing, traversing, and searching XML files, but JSON in Python is simple to use and seems more compact.  For future projects I will probably choose JSON over XML, though I only ever deal with small projects and it may be a whole different story on something more complex.
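
To give an idea of why I like it, reading the output back into Python only takes a few lines.  This is just a sketch that assumes the file holds a list of standard Twitter status objects, and the filename is a placeholder for whatever tweet_grab.py actually writes.

import json

#load the JSON file written by tweet_grab.py (placeholder filename)
#and print when each tweet was posted along with its text
with open('gptreb/tweets.json') as f:
    tweets = json.load(f)

for tweet in tweets:
    print tweet['created_at'], tweet['text']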

After listening to an episode of The Engineering Commons about software, I've made more of an effort in this project to handle exceptions and improve my commenting.  Don't get me wrong, I've still got a long way to go, but for an electronics guy who hadn't seen Python a year ago I'm not doing too badly.

My next step is to take the output file containing the tweets and do some further processing on it to derive some metrics.  I don't think that will be too difficult though; getting the data was the hard part.
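
For example, something as simple as counting tweets per contact should only be a few lines on top of the sketch above (again assuming a list of standard status objects and the same placeholder filename).

import json
from collections import Counter

#count how many tweets each contact posted over the captured period
#(placeholder filename and assumed format, as in the sketch above)
with open('gptreb/tweets.json') as f:
    tweets = json.load(f)

tweets_per_user = Counter(t['user']['screen_name'] for t in tweets)
for name, count in tweets_per_user.most_common(10):
    print name, count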

Note: If you're reading this after March 2013, this script won't work.  From that point on Twitter requires all access to its API to be authenticated using OAuth, and this script doesn't support it.  There are ways to do it, but as this is really just for a bit of fun, I don't know if I want to invest the time to get authentication working.  I may revisit it in six months if I feel like it.
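
If you did want authenticated access, a third-party library such as tweepy can take care of the OAuth side of things.  I haven't used it in this script, so treat the following as a rough sketch only, with the four keys being placeholders you'd generate on Twitter's developer site.

import tweepy

#authenticate with OAuth; the four keys are placeholders from
#Twitter's developer site
auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')
api = tweepy.API(auth)

#pull the ten most recent tweets from a user to check it all works
for status in api.user_timeline(screen_name='gptreb', count=10):
    print status.text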

getting tweets from the linux command line
A sample of the command line


JSON file containing tweet information
A sample of the output JSON file

Monday, August 20, 2012

Review Your Followers Using the Twitter API and Python

Over time, the list of people that I follow on Twitter has slowly grown and it's at the point where I need to go through and clean house to improve the quality of my timeline.  I know some of the accounts are inactive and some are companies I followed back who have since unfollowed me.  To make this job a bit easier I decided to put together a tool that shows my relationship to each user I'm connected to and the time they last updated their status.  There's probably already something out there to do this, but by doing it myself I get a chance to sharpen my Python skills.

twitter_scalpal code

Sample output showing the least active users first


The tool is a simple Python script that takes the Twitter screen name of a target user as input and then retrieves who they follow and who follows them.  For each user found it displays three things.

How the two users are related to each other

          If the target user and the found user follow each other, 'linked to' is displayed.
          If the target user only follows the found user, 'following' is displayed.
          If the target user is only followed by the found user, 'followed by' is displayed.

The user's screen name

The time that the user last updated their status

          The time of the last update, in local time.
          If the user is protected, 'protected' is displayed.
          If the user hasn't tweeted, 'no tweets' is displayed.


The script uses parts of the Twitter API that don't require authentication.  This means you can analyse any user that isn't protected, but you won't be able to retrieve data on found users that are protected.  Since this is for my own use I'm not too worried about implementing authentication, as I only follow one protected user.  It shouldn't be too hard to adapt the script to use it though.  The code is a little rough around the edges and may fail under certain circumstances, but generally it does the job.

To make the results easier to analyse they are sorted by the time that the user last updated their status, with least active accounts at the top.

As the Twitter API is rate limited, the script may take a while to complete.  A delay of 24 seconds is added after each request to avoid being blacklisted.  Since contacts are looked up 20 at a time, that works out to a rough guess of 1.25 seconds per follower and followee once the requests themselves are included.

The output showing most active, protected, and users without tweets


The script acquiring data


#! /usr/bin/env python
# -*- coding: utf-8 -*-

# twitter_scalpal.py
#
# given a Twitter user name, display their
# followers and who they follow along with
# how they are related and the last time
# their status was updated


import sys
import argparse
import os
import shutil
import urllib2
import json
import math
import time
import email.utils
import datetime


#define constants
time_delay = 24    #time between requests to the twitter API
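#Note: cursor=-1 asks for only the first page of IDs; the Twitter
#ids endpoints return up to 5000 IDs per page, so accounts with more
#contacts than that would need proper cursor handling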
followerURL = "https://api.twitter.com" + \
              "/1/followers/ids.json?cursor=-1&screen_name="
friendURL = "https://api.twitter.com/1/friends/ids.json?cursor=-1&screen_name="
lookupURL = "https://api.twitter.com/1/users/lookup.json?user_id="
entityURL = "&include_entities=true"

#Create command line parser
parser = argparse.ArgumentParser(description='Analyse twitter users')
parser.add_argument('user', type=str, help='twitter user to analyse')
args = parser.parse_args()

userName = args.user


#delete previous data for user and create a location for new data
shutil.rmtree(args.user,ignore_errors=True)
try:
    os.mkdir(args.user)
except OSError:
    print "Couldn't create a directory for the user data"
    sys.exit(1)
    

#Get followers
print ""
print "getting followers"

try:
    followers_response = urllib2.urlopen((followerURL + userName)).read()
except urllib2.HTTPError, e:
    print "HTTP Error " + str(e.code) + ". Check if user exists"
    sys.exit(1)
except urllib2.URLError, e:
    print "URL Error " + str(e.args)
    sys.exit(1)

time.sleep(time_delay)


#Parse follower response and print follower count
followdat = json.loads(followers_response)
followers = followdat['ids']
print str(len(followers)) + " found"
print ""


#write followers to file
try:
    with open("./" + args.user + '/followers.json','wb') as f:
        f.write(followers_response)
except IOError:
    print "Couldn't write followers to file"
    sys.exit(1)


#Get friends
print "getting friends"

try:
    friends_response = urllib2.urlopen((friendURL + userName)).read()
except urllib2.HTTPError, e:
    print "HTTP Error " + str(e.code) + ". Check if user exists"
    sys.exit(1)
except urllib2.URLError, e:
    print "URL Error " + str(e.args)
    sys.exit(1)

time.sleep(time_delay)


#Parse friend response and print friend count
frienddat = json.loads(friends_response)
friends = frienddat['ids']
print str(len(friends)) + " found"
print ""


#write friends to file
try:
    with open("./" + args.user + '/friends.json','wb') as f:
        f.write(friends_response)
except IOError:
    print "Couldn't write followers to file"
    sys.exit(1)


#calculate number of unique contacts
contacts = list(set(followers + friends))
num_of_contacts = len(contacts)
print "calculating unique contacts"
print str(num_of_contacts) + " found"
print ""


#initialise variable
all_contacts = []


#calculate the number of contact requests to make (20 at a time)
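#(adding 19 before the integer division rounds the result up, so a
#partial final batch of fewer than 20 contacts still gets a request)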
num_of_contact_requests = (num_of_contacts-1+20)/20


#lookup user information
for i in range(num_of_contact_requests):


    #assemble a URL of 20 contacts to lookup in a single request 
    namerequest = ""
    namerequest = str(contacts[i*20])    #first contact to lookup
    for j in range(1,20):    #add remaining contacts.
        index = i*20 + j
        if(index < num_of_contacts):    #prevent out of bounds access
            namerequest = namerequest + "," + str(contacts[index])
    idurl = lookupURL + namerequest + entityURL


    #lookup contact information    
    print "getting contact details " + str(i+1)
    try:
        contacts_response = urllib2.urlopen((idurl)).read()
    except urllib2.HTTPError, e:
        print "HTTP Error " + str(e.code) + ". Check if user exists"
        sys.exit(1)
    except urllib2.URLError, e:
        print "URL Error " + str(e.args)
        sys.exit(1)

    time.sleep(time_delay)


    #parse contacts and add them to a list of all contacts
    contactdat = json.loads(contacts_response)
    for single_contact in contactdat:
        all_contacts.append(single_contact)


#collect and format data for display
output_list = []
for single_contact in all_contacts:


    #Assemble a string for the last status date.
    #Can be 'no tweets' or 'protected'
    is_contact_protected = single_contact['protected']
    status_count = single_contact['statuses_count']
    if ((is_contact_protected == False) and (status_count > 0)):


        #The following converts the UTC time to local time
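        #(email.utils.parsedate_tz copes with the '+0000' offset in
        # Twitter's 'created_at' string, mktime_tz turns that into a
        # UTC epoch, and fromtimestamp renders it in local time)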
        created_at = single_contact['status']['created_at']
        parsed_date = email.utils.parsedate_tz(created_at)
        date_string = datetime.datetime.fromtimestamp(
                      email.utils.mktime_tz(parsed_date)).strftime(
                      '%Y-%m-%d %H:%M:%S')

    elif (is_contact_protected == True):
        date_string = "protected"

    elif (single_contact['statuses_count'] == 0):
        date_string = "no tweets"


    #Create a space padded string of the contact's screen name
    contact_name = single_contact['screen_name']
    contact_name = contact_name.ljust(17)


    #Create a string that describes the contact's relationship
    relationship_string = "following    " 
    if single_contact['id'] in followers:
        relationship_string = "followed by  "
        if single_contact['id'] in friends:
            relationship_string = "linked to    "


    #put the strings in a tuple and add it to a list
    contact_details = (date_string, relationship_string, contact_name)
    output_list.append(contact_details)

     
#sort the output list.  List sorted by first
#element of the tuple, the date string
output_list.sort()


#print the results
print ""
for contact_strings in output_list:
    print contact_strings[1] + contact_strings[2] + contact_strings[0]


#write contacts to file
try:
    with open("./" + args.user + '/contacts.json','wb') as f:
        f.write(json.dumps(all_contacts))
except IOError:
    print "Couldn't write contacts to file"
    sys.exit(1)

Thursday, August 9, 2012

Troubleshooting AVR Interrupt Timing

My efforts towards making a temperature data logger ran into a bit of a wall during the week and it took me a while to figure out why.  The problem was related to the button used to start and stop data logging.  Most of the time it would work but occasionally it just wouldn't register the button press.  My error was a bit subtle but blindingly obvious once spotted.  The problem, to quote Doc Brown, was that I wasn't thinking fourth dimensionally.

  1: //Problem code fragment
  2: 
  3: if (BIT_GET(flags, START_STOP_LOG) && !BIT_GET(driveStatus, STA_NOINIT)) {
  4:         if (BIT_GET(flags, LOGGING)) {
  5:                 BIT_CLEAR(flags, LOGGING);
  6:         } else {
  7:                 BIT_SET(flags, LOGGING);
  8:         }
  9: }
 10: BIT_CLEAR(flags, START_STOP_LOG);


Shown above is the code I was having trouble with.  It's part of the main function that handles the button press event.  The software design uses a timer based interrupt service routine (ISR) that's responsible for updating the SD card status and de-bouncing the input buttons.  It updates the SD card status every 10 ms and scans the buttons every 100 ms.  Once it de-bounces a button it sets the START_STOP_LOG flag to indicate that a button press has occurred.  The main routine handles the flag and either sets or clears the LOGGING flag.  I try to keep the code in an ISR to a minimum and leave it to handle time critical tasks and setting flags; all of the other work is then done in the main routine.  I should point out that this is only skeleton code to let me get the program flow right.  It doesn't actually do anything yet.

The problem with the way I was going about it was that if the ISR ran and detected a button press just after the first if statement had been evaluated, nothing would happen.  Once the ISR finished, line 10 would immediately clear the flag to say the event had been handled, when in reality it hadn't.  It's a bit like programming with multiple threads, where variables can seemingly change on their own.  The fix, however, is pretty straightforward and is shown below.

  1: //Corrected code fragment
  2: 
  3: if (BIT_GET(flags, START_STOP_LOG)) {
  4:         if (!BIT_GET(driveStatus, STA_NOINIT)) {
  5:                 if (BIT_GET(flags, LOGGING)) {
  6:                         BIT_CLEAR(flags, LOGGING);
  7:                 } else {
  8:                         BIT_SET(flags, LOGGING);
  9:                 }
 10:         }
 11:         BIT_CLEAR(flags, START_STOP_LOG);
 12: }

I've moved the clearing of the flag into the if statement so that it can only happen when the flag is set.  That if statement is also responsible for handling the button press: if the drive is initialized it starts or stops the logging and then clears the flag; if the drive isn't initialized it just clears the flag, as you can't log anything when the drive isn't ready.  By doing things this way the ISR can't jump in and set the flag before it's cleared, because the only way to reach line 11 is if the flag is already set and the event has already been handled.  It now works a treat, registering every button press.

BIT_SET, BIT_GET, and BIT_CLEAR are preprocessor macros to increase code readability, while START_STOP_LOG, STA_NOINIT, and LOGGING are preprocessor defines.