It provides the following functionality:
- By command line arguments (-t [default], -s <number> or -l) specify that I want to use file A-test.in, A-small.attempt0.in or A-large.in
- By command line arguments (-r [default] or -v) specify that I want to run my code on the .in file, creating a similar file called .out, or I want to check that my code would turn the .in file to the .out file that already exists.
- I can specify the command line argument -m <number>, and it will automatically do multiprocessing with that number of processes, handing each case on a single process.
My general use case will be as follows:
- Copy the sample in and out data into my directory (as A-test.in and A-test.out)
- Open a python file I have already created containing
def reader(inFile):
return 0
def solver(case):
return 0if __name__ == "__main__":from GCJ import GCJGCJ(reader, solver, "/Users/lpebody/gcj/setup/", "a").run() - Write the reader code and the solver code.
- Run the script to test on the test data (using python a.py -v -t)
- If this shows issues in my method, goto 3.
- Download small data (as a-small-attempt0.in)
- Run the script on the small data (using python a.py -s 0)
- Assuming no issues, upload the output.
- If issues arose (exceptions, took too long, output was wrong) return to 3 (incrementing the number in 7 when necessary). Otherwise revel in my new found points!
- (Think for a while about whether my code is good enough to work on the large data set) - optional.
- If I think it is already good enough, jump forward to step 14.
- Rewrite the slow bits of code
- Check that it is working on the test data (using python a.py -v -t) and the small data (using python a.py -v -s <number>). If it isn't, goto 12.
- Once I am sure that it will work, think about whether I need multiprocessing.
- Download large data (as a-large.in)
- Run the script on the large data (using python a.py -l)
- Upload the output
- (Worry for less than two and half hours on whether the output is correct.) - optional
Note: the checking currently does exact string checking, which is likely to fail for (e.g.) floats. I do intend at some point to alter this code to check floats up to a specified tolerance.
The reader takes a single parameter, called inFile, and should return a data structure containing all of the information needed to solve the next case in the data set. inFile is a wrapper around python's own file class that currently provides the following methods:
- getInt() - read a line containing one integer, and return the integer
- getInts() - read a line containing whitespace separated integers, and return the integers
- getFloat() and getFloats() - the same as getInt() and getInts() but with floats
- getWords() - return the result of running python's split() on the next line of text
- readline() - return the next line of text
For example, for the problem Diamond Inheritance from 2012 Round 1C, the reader function could read:
def reader(inFile):
return [[z - 1 for z in inFile.getInts()[1:]] for k in xrange(inFile.getInts())]
The solver is fed in the data structure that the reader created. It should return the solution. The output file will just consist of Case numbers, and whatever the solver returns turned into a string. So again, for Diamond Inheritance, the solver function could read:
def solver(dependents):
N = len(dependents)
for i in xrange(N):
dep = [False] * N
q = [i]
while (len(q)):
for j in dependents[q.pop()]:
if (dep[j]):
return "Yes"
dep[j] = True
q.append(j)
return "No"
Then, finally, to wrap it all together,
if __name__ == "__main__":
from GCJ import GCJ
GCJ(reader, solver, "/Users/lpebody/gcj/setup/", "a").run()
The third and fourth directories are whatever directory your input and output data should be in, and the letter (A, B, C, D, E or F) describing the problem within the contest.
Without any further ado, here is my wrapper code.
from getopt import getopt, GetoptError
from multiprocessing import Pool, Queue, Manager
from os import getpid
from sys import argv, stderr
from time import strftime
class FileWrapper:
def __init__(self, file):
self.file = file
def getInt(self):
return int(self.file.readline())
def getInts(self):
return [int(z) for z in self.file.readline().split()]
def getFloat(self):
return float(self.file.readline())
def getFloats(self):
return [float(z) for z in self.file.readline().split()]
def getWords(self):
return self.file.readline().split()
def readline(self):
return self.file.readline().strip()
def close(self):
self.file.close
class GCJ:
"""
Wrapper for a lot of functionality that is useful when trying to solve a
Google Code jam question.
For the case of Problem A of the 2008 Qualification Round, example code
would be:
def parse(inFile):
N = int(inFile.readline())
searchEngines = [inFile.readline().strip() for z in xrange(N)]
Q = int(inFile.readline())
queries = [inFile.readline().strip() for z in xrange(Q)]
return [searchEngines, queries]
def solve([searchEngines, queries]):
N = len(searchEngines)
dict = {searchEngines[i]: i for i in xrange(N)}
qs = [dict[k] for k in queries]
used = [False] * N
numberused = 0
resets = 0
for k in qs:
if not used[k]:
if numberused == N - 1:
resets += 1
numberused = 0
used = [False] * N
used[k] = True
numberused += 1
return str(resets)
if __name__ == "__main__":
from GCJ import GCJ
GCJ(parse, solve, "c:\\temp", "a").run()
This would give a script with command line options -t, -s, -l, -r, -v, -m:
-t (default)
input file = "c:\\temp\\a-test.in"
output file = "c:\\temp\\a-test.out"
error file = "c:\\temp\\a-test.err"
(the directory "c:\\temp" and the character "a" at the start of the
filenames come from arguments to GCJ in the source code)
-s <number>
input file = "c:\\temp\\a-small-attempt<number>.in"
output file = "c:\\temp\\a-small-attempt<number>.out"
error file = "c:\\temp\\a-small-attempt<number>.err"
-l
input file = "c:\\temp\\a-large.in"
output file = "c:\\temp\\a-large.out"
error file = "c:\\temp\\a-large.err"
-r (default)
read the input file. Read the first line to get the number of cases N
included in the input. Run parse(infile) N times to get N case objects,
and then run solve(case object) on each of them. Output the strings
returned from solve, prepended with "Case #<case #>: ". Put output in
output file, overwriting it if it exists.
-v
Do the same as above, but check if the contents of the output file now
are as they would be otherwise. If it is, say "OK". Otherwise, output all
differences, and write the output into the error file, overwriting it if
it exists.
-m <number>
Set up a pool of <number> worker threads, and put all of the problems
(along with their problem numbers) on a queue. Each thread, when finished
with a problem, will take the next case from the queue.
"""
def __init__(self, reader, solver, directory, question):
self.reader = reader
self.solver = solver
self.question = question
print >> stderr, "GCJ Wrapper initiated."
print >> stderr, "Reading command line arguments"
try:
opts, args = getopt(argv[1:],
"vs:tlm:",
["validate", "small", "test", "large", "multi"])
except GetoptError, err:
print str(err)
exit(2)
datatype = "test"
self.job = "run"
self.multi = 1
for o,a in opts:
if o in ("-s", "--small"):
datatype = "small-attempt" + str(int(a))
elif o in ("-l", "--large"):
datatype = "large"
elif o in ("-v", "--validate"):
self.job = "validate"
elif o in ("-m", "--multi"):
self.multi = int(a)
filepref = directory + "/" + question + "-" + datatype
self.infile = filepref + ".in"
self.outfile = filepref + ".out"
self.errfile = filepref + ".err"
if (self.job == "run"):
print >> stderr, "Creating " + self.outfile + " from " + self.infile
else:
print >> stderr, "Testing that " + self.outfile + " would create " + self.infile
print >> stderr, "Storing output in " + self.errfile + " otherwise"
def run(self):
data = self.read()
if (self.multi == 1):
answers = self.process(data)
else:
answers = self.multiprocess(data, self.multi)
text = "".join(["Case #%d: %s\n" % (k + 1, answers[k]) for k in xrange(len(answers))])
if False:
print text
self.output(text)
def read(self):
inh = FileWrapper(open(self.infile, 'r'))
N = inh.getInt()
data = [self.reader(inh) for k in xrange(N)]
inh.close()
return data
def process(self, data):
N = len(data)
answers = [None] * N
for k in xrange(N):
print "%s:Working on Case %d" % (strftime("%X"), k + 1)
answers[k] = self.solver(data[k])
print "%s:Dealt with Case %d: %s" % (strftime("%X"), k + 1, answers[k])
return answers
def multiprocess(self, data, numprocs):
N = len(data)
manager = Manager()
queue = manager.Queue()
data = [[k + 1, data[k], self.solver, queue] for k in xrange(N)]
pool = Pool(numprocs)
pool.map_async(multiCase, data, 1)
answers = [None] * N
workerPids = []
workingOn = []
numTodo = N
numDoing = 0
numDone = 0
for k in xrange(2 * N):
report = queue.get()
pid = report[0]
if pid in workerPids:
idx = workerPids.index(pid)
else:
idx = len(workerPids)
workerPids += [pid]
workingOn += [None]
if report[1] == 0:
workingOn[idx] = report[2]
numTodo -= 1
numDoing += 1
else:
workingOn[idx] = None
answers[report[2] - 1] = report[3]
numDoing -= 1
numDone += 1
print "%s:then %s now %s soon %s | %s" % (strftime("%X"), clean(numDone), clean(numDoing),
clean(numTodo), " ".join([clean(z) for z in workingOn]))
return answers
def output(self, text):
if self.job == "run":
open(self.outfile, 'w').write(text)
else:
text2 = open(self.outfile, 'r').read()
if (text.strip() == text2.strip()):
print "OK"
else:
print "Not OK"
open(self.errfile, 'w').write(text)
print "new output written to %s" % self.errfile
print "Differences:"
text = text.strip().split("\n")
text2 = text2.strip().split("\n")
for k in xrange(min(len(text), len(text2))):
if text[k] != text2[k]:
print "Line %d: '%s' vs '%s'" % (k + 1, text[k], text2[k])
if len(text) != len(text2):
print "Files have different number of lines: %d vs %d" % (len(text), len(text2))
def clean(r):
return " " if (r == None) else ("%3d" % r)
def multiCase(inputList):
[caseNumber, case, solver, queue] = inputList
queue.put([getpid(), 0, caseNumber])
answer = solver(case)
queue.put([getpid(), 1, caseNumber, answer])
No comments:
Post a Comment