Monday, August 20, 2012

Review Your Followers Using the Twitter API and Python

Over time, the list of people I follow on Twitter has slowly grown, and it's at a point where I need to go through and clean house to improve the quality of my timeline.  I know some of the accounts are inactive and some are companies I followed back that have since unfollowed me.  To make this job a bit easier I decided to put together a tool that shows my relation to the users I'm connected to, and the time that they last updated their status.  There's probably already something out there to do this, but by doing it myself I get a chance to sharpen my Python skills.

[Link: twitter_scalpal code]

[Screenshot: sample output showing the least active users first]


The tool is a simple Python script that takes the Twitter screen name of a target user as input and then retrieves who they follow and who follows them.  For each user found it displays three things:

How the users are related to each other

          If the target user and the found user follow each other, 'linked to' is displayed.
          If the target user only follows the found user, 'following' is displayed.
          If the target user is only followed by the found user, 'followed by' is displayed.

The user's screen name

The time that the user last updated their status

          The time of the last update is shown in local time.
          If the user is protected, 'protected' is displayed.
          If the user hasn't tweeted, 'no tweets' is displayed.
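The relationship rules above boil down to membership tests against two ID sets. A minimal sketch (the function name and the example IDs are made up for illustration):

```python
def relationship(user_id, followers, friends):
    """Classify a contact relative to the target user.

    followers: IDs of accounts that follow the target user.
    friends:   IDs of accounts the target user follows.
    """
    if user_id in followers and user_id in friends:
        return "linked to"
    if user_id in friends:
        return "following"
    if user_id in followers:
        return "followed by"
    return "unrelated"

# Example with made-up IDs
followers = {11, 22, 33}
friends = {22, 44}

print(relationship(22, followers, friends))  # linked to
print(relationship(44, followers, friends))  # following
print(relationship(11, followers, friends))  # followed by
```

The script itself expresses the same rules with nested ifs over the follower and friend lists.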


The script uses the parts of the Twitter API that don't require authentication.  This means you can analyse any user who isn't protected, but you will be unable to retrieve data on found users that are protected.  As this is for my own use I'm not too worried about implementing authentication, since I only follow one protected user.  It shouldn't be too hard to adapt the script to use authentication.  The code is a little rough around the edges and may fail under certain circumstances, but generally it does the job.

To make the results easier to analyse, they are sorted by the time that the user last updated their status, with the least active accounts at the top.
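The sort works because Twitter's `created_at` timestamps are converted to `%Y-%m-%d %H:%M:%S` strings, which order chronologically when sorted lexicographically. A sketch of the conversion the script uses (the helper name and the dates are made up; `fromtimestamp` gives local time, but the ordering holds in any time zone):

```python
import email.utils
import datetime

def to_sort_key(created_at):
    # Parse Twitter's created_at, e.g. "Mon Aug 20 09:15:32 +0000 2012",
    # and render it as a local-time string that sorts chronologically
    parsed = email.utils.parsedate_tz(created_at)
    return datetime.datetime.fromtimestamp(
        email.utils.mktime_tz(parsed)).strftime('%Y-%m-%d %H:%M:%S')

dates = [
    "Mon Aug 20 09:15:32 +0000 2012",
    "Sat Jul 02 12:00:00 +0000 2011",
]
keys = sorted(to_sort_key(d) for d in dates)
print(keys)  # the 2011 date sorts first, so the least active user leads
```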

As the Twitter API is rate limited, the script may take a while to complete.  A delay of 24 seconds is added after each request to prevent being blacklisted.  Since contacts are looked up 20 at a time, that works out to roughly 1.2 seconds per unique follower and followee, plus the two initial requests.
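A back-of-the-envelope estimate of the run time, assuming the 24-second delay and 20-user lookup batches described above (the function name and example counts are made up; the worst case assumes no overlap between followers and friends):

```python
def estimated_seconds(num_followers, num_friends, delay=24, batch=20):
    # Two requests fetch the follower and friend ID lists, then the
    # unique contacts are looked up in batches of 20
    contacts = num_followers + num_friends
    lookups = (contacts + batch - 1) // batch  # ceiling division
    return (2 + lookups) * delay

print(estimated_seconds(150, 100))  # (2 + 13) * 24 = 360 seconds
```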

[Screenshot: the output showing the most active users, protected users, and users without tweets]


[Screenshot: the script acquiring data]


#! /usr/bin/env python
# -*- coding: utf-8 -*-

# twitter_scalpal.py
#
# given a Twitter user name, display their
# followers and who they follow along with
# how they are related and the last time
# their status was updated


import sys
import argparse
import os
import shutil
import urllib2
import json
import math
import time
import email.utils
import datetime


#define constants
time_delay = 24    #time between requests to the Twitter API
followerURL = "https://api.twitter.com" + \
              "/1/followers/ids.json?cursor=-1&screen_name="
friendURL = "https://api.twitter.com/1/friends/ids.json?cursor=-1&screen_name="
lookupURL = "https://api.twitter.com/1/users/lookup.json?user_id="
entityURL = "&include_entities=true"

#Create command line parser
parser = argparse.ArgumentParser(description='Analyse twitter users')
parser.add_argument('user', type=str, help='twitter user to analyse')
args = parser.parse_args()

userName = args.user


#delete previous data for the user and create a location for new data
shutil.rmtree(args.user, ignore_errors=True)
try:
    os.mkdir(args.user)
except OSError:
    print "Couldn't create a directory for the user data"
    sys.exit(1)
    

#Get followers
print ""
print "getting followers"

try:
    followers_response = urllib2.urlopen(followerURL + userName).read()
except urllib2.HTTPError, e:
    print "HTTP Error " + str(e.code) + ". Check if user exists"
    sys.exit(1)
except urllib2.URLError, e:
    print "URL Error " + str(e.args)
    sys.exit(1)

time.sleep(time_delay)


#Parse follower response and print follower count
followdat = json.loads(followers_response)
followers = followdat['ids']
print str(len(followers)) + " found"
print ""


#write followers to file
try:
    with open("./" + args.user + '/followers.json','wb') as f:
        f.write(followers_response)
except IOError:
    print "Couldn't write followers to file"
    sys.exit(1)


#Get friends
print "getting friends"

try:
    friends_response = urllib2.urlopen(friendURL + userName).read()
except urllib2.HTTPError, e:
    print "HTTP Error " + str(e.code) + ". Check if user exists"
    sys.exit(1)
except urllib2.URLError, e:
    print "URL Error " + str(e.args)
    sys.exit(1)

time.sleep(time_delay)


#Parse friend response and print friend count
frienddat = json.loads(friends_response)
friends = frienddat['ids']
print str(len(friends)) + " found"
print ""


#write friends to file
try:
    with open("./" + args.user + '/friends.json','wb') as f:
        f.write(friends_response)
except IOError:
    print "Couldn't write friends to file"
    sys.exit(1)


#calculate number of unique contacts
contacts = list(set(followers + friends))
num_of_contacts = len(contacts)
print "calculating unique contacts"
print str(num_of_contacts) + " found"
print ""


#initialise variable
all_contacts = []


#calculate the number of lookup requests to make (20 users at a time, rounded up)
num_of_contact_requests = (num_of_contacts + 19) / 20


#lookup user information
for i in range(num_of_contact_requests):


    #assemble a URL of 20 contacts to look up in a single request
    namerequest = str(contacts[i*20])    #first contact to look up
    for j in range(1, 20):    #add the remaining contacts
        index = i*20 + j
        if index < num_of_contacts:    #prevent out-of-bounds access
            namerequest = namerequest + "," + str(contacts[index])
    idurl = lookupURL + namerequest + entityURL


    #lookup contact information    
    print "getting contact details " + str(i+1)
    try:
        contacts_response = urllib2.urlopen(idurl).read()
    except urllib2.HTTPError, e:
        print "HTTP Error " + str(e.code) + ". Check if user exists"
        sys.exit(1)
    except urllib2.URLError, e:
        print "URL Error " + str(e.args)
        sys.exit(1)

    time.sleep(time_delay)


    #parse contacts and add them to a list of all contacts
    contactdat = json.loads(contacts_response)
    for single_contact in contactdat:
        all_contacts.append(single_contact)


#collect and format data for display
output_list = []
for single_contact in all_contacts:


    #Assemble a string for the last status date.
    #Can be 'no tweets' or 'protected'
    is_contact_protected = single_contact['protected']
    status_count = single_contact['statuses_count']
    if not is_contact_protected and status_count > 0:


        #The following converts the UTC time to local time
        created_at = single_contact['status']['created_at']
        parsed_date = email.utils.parsedate_tz(created_at)
        date_string = datetime.datetime.fromtimestamp(
                      email.utils.mktime_tz(parsed_date)).strftime(
                      '%Y-%m-%d %H:%M:%S')

    elif is_contact_protected:
        date_string = "protected"

    else:    #status_count == 0
        date_string = "no tweets"


    #Create a space-padded string of the contact's screen name
    contact_name = single_contact['screen_name'].ljust(17)


    #Create a string that describes the contact's relationship
    if single_contact['id'] in followers:
        if single_contact['id'] in friends:
            relationship_string = "linked to    "
        else:
            relationship_string = "followed by  "
    else:
        relationship_string = "following    "


    #put the strings in a tuple and add it to a list
    contact_details = (date_string, relationship_string, contact_name)
    output_list.append(contact_details)

     
#sort the output list.  List sorted by first
#element of the tuple, the date string
output_list.sort()


#print the results
print ""
for contact_strings in output_list:
    print contact_strings[1] + contact_strings[2] + contact_strings[0]





#write contacts to file
try:
    with open("./" + args.user + '/contacts.json','wb') as f:
        f.write(json.dumps(all_contacts))
except IOError:
    print "Couldn't write contacts to file"
    sys.exit(1)

2 comments:

  1. When I use this with python.exe the terminal opens for a moment and closes almost immediately (too quick for me to read the info) and nothing else happens, so it is not working for me. I'll try and check this out...

    1. I'll assume you're using Windows from the python.exe bit. I've had this trouble before too. I think if you run the program from a command line window that is already open you should be fine.

      I think however you may run into trouble with this as Twitter have since changed how you can connect to them. Their API requires you to authenticate and I didn't include this feature in the code. I was just having a bit of fun and I think this would have been a lot of effort for not much reward.
