How to do well at Google Code Jam: Python Code Jam Wrapper

In the later rounds of the code jam, I like to have all of the pesky file input/output stuff sorted, so I can just concentrate on solving the problems. Here I will share the current state of my python wrapper for Code Jam.

It provides the following functionality:

By command line arguments (-t [default], -s <number> or -l) specify that I want to use file A-test.in, A-small.attempt0.in or A-large.in
By command line arguments (-r [default] or -v) specify that I want to run my code on the .in file, creating a similar file called .out, or I want to check that my code would turn the .in file to the .out file that already exists.
I can specify the command line argument -m <number>, and it will automatically do multiprocessing with that number of processes, handing each case on a single process.

My general use case will be as follows:

Copy the sample in and out data into my directory (as A-test.in and A-test.out)
Open a python file I have already created containing
def reader(inFile):
return 0

def solver(case):
return 0

if __name__ == "__main__":

from GCJ import GCJ

GCJ(reader, solver, "/Users/lpebody/gcj/setup/", "a").run()
Write the reader code and the solver code.
Run the script to test on the test data (using python a.py -v -t)
If this shows issues in my method, goto 3.
Download small data (as a-small-attempt0.in)
Run the script on the small data (using python a.py -s 0)
Assuming no issues, upload the output.
If issues arose (exceptions, took too long, output was wrong) return to 3 (incrementing the number in 7 when necessary). Otherwise revel in my new found points!
(Think for a while about whether my code is good enough to work on the large data set) - optional.
If I think it is already good enough, jump forward to step 14.
Rewrite the slow bits of code
Check that it is working on the test data (using python a.py -v -t) and the small data (using python a.py -v -s <number>). If it isn't, goto 12.
Once I am sure that it will work, think about whether I need multiprocessing.
Download large data (as a-large.in)
Run the script on the large data (using python a.py -l)
Upload the output
(Worry for less than two and half hours on whether the output is correct.) - optional

Note: the checking currently does exact string checking, which is likely to fail for (e.g.) floats. I do intend at some point to alter this code to check floats up to a specified tolerance.

In order to use the wrapper, you need to provide two functions, a reader and a solver.

The reader takes a single parameter, called inFile, and should return a data structure containing all of the information needed to solve the next case in the data set. inFile is a wrapper around python's own file class that currently provides the following methods:

getInt() - read a line containing one integer, and return the integer
getInts() - read a line containing whitespace separated integers, and return the integers
getFloat() and getFloats() - the same as getInt() and getInts() but with floats
getWords() - return the result of running python's split() on the next line of text
readline() - return the next line of text

For example, for the problem Diamond Inheritance from 2012 Round 1C, the reader function could read:

def reader(inFile):

return [[z - 1 for z in inFile.getInts()[1:]] for k in xrange(inFile.getInts())]

The solver is fed in the data structure that the reader created. It should return the solution. The output file will just consist of Case numbers, and whatever the solver returns turned into a string. So again, for Diamond Inheritance, the solver function could read:

def solver(dependents):

N = len(dependents)

for i in xrange(N):

dep = [False] * N

q = [i]

while (len(q)):

for j in dependents[q.pop()]:

if (dep[j]):

return "Yes"

dep[j] = True

q.append(j)

return "No"

Then, finally, to wrap it all together,

if __name__ == "__main__":

from GCJ import GCJ

GCJ(reader, solver, "/Users/lpebody/gcj/setup/", "a").run()

The third and fourth directories are whatever directory your input and output data should be in, and the letter (A, B, C, D, E or F) describing the problem within the contest.

Without any further ado, here is my wrapper code.

from getopt import getopt, GetoptError

from multiprocessing import Pool, Queue, Manager

from os import getpid

from sys import argv, stderr

from time import strftime

class FileWrapper:

def __init__(self, file):

self.file = file

def getInt(self):

return int(self.file.readline())

def getInts(self):

return [int(z) for z in self.file.readline().split()]

def getFloat(self):

return float(self.file.readline())

def getFloats(self):

return [float(z) for z in self.file.readline().split()]

def getWords(self):

return self.file.readline().split()

def readline(self):

return self.file.readline().strip()

def close(self):

self.file.close

class GCJ:

"""

Wrapper for a lot of functionality that is useful when trying to solve a

Google Code jam question.

For the case of Problem A of the 2008 Qualification Round, example code

would be:

def parse(inFile):

N = int(inFile.readline())

searchEngines = [inFile.readline().strip() for z in xrange(N)]

Q = int(inFile.readline())

queries = [inFile.readline().strip() for z in xrange(Q)]

return [searchEngines, queries]

def solve([searchEngines, queries]):

N = len(searchEngines)

dict = {searchEngines[i]: i for i in xrange(N)}

qs = [dict[k] for k in queries]

used = [False] * N

numberused = 0

resets = 0

for k in qs:

if not used[k]:

if numberused == N - 1:

resets += 1

numberused = 0

used = [False] * N

used[k] = True

numberused += 1

return str(resets)

if __name__ == "__main__":

from GCJ import GCJ

GCJ(parse, solve, "c:\\temp", "a").run()

This would give a script with command line options -t, -s, -l, -r, -v, -m:

-t (default)

input file = "c:\\temp\\a-test.in"

output file = "c:\\temp\\a-test.out"

error file = "c:\\temp\\a-test.err"

(the directory "c:\\temp" and the character "a" at the start of the

filenames come from arguments to GCJ in the source code)

-s <number>

input file = "c:\\temp\\a-small-attempt<number>.in"

output file = "c:\\temp\\a-small-attempt<number>.out"

error file = "c:\\temp\\a-small-attempt<number>.err"

-l

input file = "c:\\temp\\a-large.in"

output file = "c:\\temp\\a-large.out"

error file = "c:\\temp\\a-large.err"

-r (default)

read the input file. Read the first line to get the number of cases N

included in the input. Run parse(infile) N times to get N case objects,

and then run solve(case object) on each of them. Output the strings

returned from solve, prepended with "Case #<case #>: ". Put output in

output file, overwriting it if it exists.

-v

Do the same as above, but check if the contents of the output file now

are as they would be otherwise. If it is, say "OK". Otherwise, output all

differences, and write the output into the error file, overwriting it if

it exists.

-m <number>

Set up a pool of <number> worker threads, and put all of the problems

(along with their problem numbers) on a queue. Each thread, when finished

with a problem, will take the next case from the queue.

"""

def __init__(self, reader, solver, directory, question):

self.reader = reader

self.solver = solver

self.question = question

print >> stderr, "GCJ Wrapper initiated."

print >> stderr, "Reading command line arguments"

try:

opts, args = getopt(argv[1:],

"vs:tlm:",

["validate", "small", "test", "large", "multi"])

except GetoptError, err:

print str(err)

exit(2)

datatype = "test"

self.job = "run"

self.multi = 1

for o,a in opts:

if o in ("-s", "--small"):

datatype = "small-attempt" + str(int(a))

elif o in ("-l", "--large"):

datatype = "large"

elif o in ("-v", "--validate"):

self.job = "validate"

elif o in ("-m", "--multi"):

self.multi = int(a)

filepref = directory + "/" + question + "-" + datatype

self.infile = filepref + ".in"

self.outfile = filepref + ".out"

self.errfile = filepref + ".err"

if (self.job == "run"):

print >> stderr, "Creating " + self.outfile + " from " + self.infile

else:

print >> stderr, "Testing that " + self.outfile + " would create " + self.infile

print >> stderr, "Storing output in " + self.errfile + " otherwise"

def run(self):

data = self.read()

if (self.multi == 1):

answers = self.process(data)

else:

answers = self.multiprocess(data, self.multi)

text = "".join(["Case #%d: %s\n" % (k + 1, answers[k]) for k in xrange(len(answers))])

if False:

print text

self.output(text)

def read(self):

inh = FileWrapper(open(self.infile, 'r'))

N = inh.getInt()

data = [self.reader(inh) for k in xrange(N)]

inh.close()

return data

def process(self, data):

N = len(data)

answers = [None] * N

for k in xrange(N):

print "%s:Working on Case %d" % (strftime("%X"), k + 1)

answers[k] = self.solver(data[k])

print "%s:Dealt with Case %d: %s" % (strftime("%X"), k + 1, answers[k])

return answers

def multiprocess(self, data, numprocs):

N = len(data)

manager = Manager()

queue = manager.Queue()

data = [[k + 1, data[k], self.solver, queue] for k in xrange(N)]

pool = Pool(numprocs)

pool.map_async(multiCase, data, 1)

answers = [None] * N

workerPids = []

workingOn = []

numTodo = N

numDoing = 0

numDone = 0

for k in xrange(2 * N):

report = queue.get()

pid = report[0]

if pid in workerPids:

idx = workerPids.index(pid)

else:

idx = len(workerPids)

workerPids += [pid]

workingOn += [None]

if report[1] == 0:

workingOn[idx] = report[2]

numTodo -= 1

numDoing += 1

else:

workingOn[idx] = None

answers[report[2] - 1] = report[3]

numDoing -= 1

numDone += 1

print "%s:then %s now %s soon %s | %s" % (strftime("%X"), clean(numDone), clean(numDoing),

clean(numTodo), " ".join([clean(z) for z in workingOn]))

return answers

def output(self, text):

if self.job == "run":

open(self.outfile, 'w').write(text)

else:

text2 = open(self.outfile, 'r').read()

if (text.strip() == text2.strip()):

print "OK"

else:

print "Not OK"

open(self.errfile, 'w').write(text)

print "new output written to %s" % self.errfile

print "Differences:"

text = text.strip().split("\n")

text2 = text2.strip().split("\n")

for k in xrange(min(len(text), len(text2))):

if text[k] != text2[k]:

print "Line %d: '%s' vs '%s'" % (k + 1, text[k], text2[k])

if len(text) != len(text2):

print "Files have different number of lines: %d vs %d" % (len(text), len(text2))

def clean(r):

return " " if (r == None) else ("%3d" % r)

def multiCase(inputList):

[caseNumber, case, solver, queue] = inputList

queue.put([getpid(), 0, caseNumber])

answer = solver(case)

queue.put([getpid(), 1, caseNumber, answer])

How to do well at Google Code Jam

Saturday, May 12, 2012

Python Code Jam Wrapper

No comments:

Post a Comment