Monday, March 21, 2011

What Size Is A European Sham?

Pyrkon 2011 Calendar


Calendar



Code

This code helped me automate most of the work. Last year I used GData directly, but this year I got so frustrated that I decided I needed something simpler. The tradeoff is that the events are not described in as much detail as they used to be. This year I'm not making the mistake of not keeping the code, so here it is. There are basically three steps:
  
1. Scrape the website to gather information.

2. Build a shell command that calls googlecl, a command-line interface that provides some ability to use Google services from the command line.

3. Run the command in the terminal. Keep retrying if it fails for any reason.

There were some issues:

Sometimes starting times were off by an hour or two: for example, by 1 hour on Friday, 1 hour on Saturday, and 2 hours on Sunday. WTF? I noticed it happened on calendars that have numbers in their names, but most of them do, so I might just be imagining it.
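One thing that may be worth checking for the extra hour on Sunday (a guess, nothing verified here): in the EU, clocks move forward one hour on the last Sunday of March, and the script places the event on 25-27 March 2011. The short sketch below, with a hypothetical helper `last_sunday_of_march`, checks whether the switch fell on the Sunday of the event.

```python
import datetime


def last_sunday_of_march(year):
    """Return the date of the last Sunday in March (hypothetical helper)."""
    day = datetime.date(year, 3, 31)
    # weekday(): Monday == 0 ... Sunday == 6; step back until we hit a Sunday
    while day.weekday() != 6:
        day -= datetime.timedelta(days=1)
    return day


switch = last_sunday_of_march(2011)
print(switch)  # 2011-03-27
```

That would account for at most one hour of difference between Sunday and the other two days; it says nothing about why Friday and Saturday were off as well.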

#!/usr/bin/python
# -*- coding: utf8 -*-
from BeautifulSoup import BeautifulSoup
import urllib2
import re
import subprocess

# Download the page
# You may want to save the page in the browser and use a local copy
# for example: 'file:///home/daniel/Pobrane/pyrkon.html'
page = urllib2.urlopen('http://www.pyrkon.pl/2011/index.php?go2=program')
soup = BeautifulSoup(page)
# Find the div with the content
content = soup.find('div', id='content')
# Get all the divs inside it
divs = content.findAll('div')

# Set the starting index in case you want to resume in the middle after an interruption
start_from = 0
i = 0
l = len(divs) - start_from

for div in divs[start_from:]:
    # Title and lecturer are easy
    tytul = div.contents[1].b.string
    prowadzacy = div.contents[1].i.string
    # I can never understand when I need to decode/encode from/to utf-8.
    # This was done by trial and error.

    # Madafaking newlines are contents too, so
    # div.contents[2] == u'\n'

    # Place
    # (the label text, here 'place:', has to match the page's own wording)
    miejsce = re.search(r'^<b>place:</b> (?P<miejsce>.+?)<br />', div.contents[3].renderContents(), re.M).group('miejsce')
    miejsce = miejsce.decode('utf-8')

    # Show some progress information
    i += 1
    print '[%d/%d] %s: %s' % (i, l, miejsce, tytul)

    # Event start time
    # (assumes the day abbreviation is followed by HH:MM, by analogy with the duration line)
    czas = re.search(r'^<b>date:</b> (?P<day>Fri|Sat|nd) (?P<hour>\d+):(?P<minute>\d{2})', div.contents[3].renderContents(), re.M)
    day, hour, minute = czas.group('day', 'hour', 'minute')
    hour = int(hour)
    minute = int(minute)

    # Convert the name of the day to the day of the month
    if day == 'Fri':
        day = 25
    elif day == 'Sat':
        day = 26
    elif day == 'nd':  # 'nd' = niedziela (Sunday)
        day = 27
    else:
        raise ValueError('Bad day')

    # How long it lasts
    dlugosc = re.search(r'^<b>duration:</b> (?P<hours>\d+):(?P<minutes>\d{2})h<br />', div.contents[3].renderContents(), re.M)
    hours, minutes = dlugosc.group('hours', 'minutes')
    hours = int(hours)
    minutes = int(minutes)

    # Build the shell command for googlecl - google command line interface (available at code.google.com)
    # It uses the "Quick Add" syntax
    command = "google calendar add --cal='%s' '%s - %s on %d/03/2011 %d:%02d for %d minutes in %s'" % (miejsce, tytul, prowadzacy, day, hour, minute, hours * 60 + minutes, miejsce)

    # Keep calling the shell command until it succeeds
    # Sometimes it throws gdata.service.RequestError with 302 status and reason 'Redirect received, but redirects_remaining <= 0'
    return_code = 1
    while return_code != 0:
        print command
        return_code = subprocess.call(command, shell=True)
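The loop above retries forever and passes titles through the shell, so a title containing a quote character would break the command. Here is a minimal sketch of an alternative, assuming googlecl accepts the same `--cal=` option and Quick Add text when they are given as separate arguments. `build_args` and `call_with_retries` are hypothetical helper names, and the retry limit and delay are arbitrary choices.

```python
import subprocess
import time


def build_args(calendar, text):
    """Build the googlecl call as an argument list (no shell, so no quoting issues)."""
    # Assumption: same 'google calendar add --cal=...' invocation as the script above
    return ['google', 'calendar', 'add', '--cal=%s' % calendar, text]


def call_with_retries(args, attempts=5, delay=2.0, run=subprocess.call):
    """Run the command until it returns 0; give up after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        if run(args) == 0:
            return attempt  # number of tries it took
        time.sleep(delay)   # brief pause before the next try
    raise RuntimeError('command failed %d times: %r' % (attempts, args))
```

For example, `call_with_retries(build_args(miejsce, quick_add_text))` could stand in for the `while` loop, where `quick_add_text` is a hypothetical variable holding the Quick Add sentence without the surrounding quotes.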


