Wrong output in ARFF file
I am using a Python script and writing the results (calculated using nltk)
to an ARFF file. The information that needs to go into the ARFF file is
letters and words (nothing numerical). However, whenever I run my script I
get an ARFF file containing only zeros, like this:
0,0.0,0.0,0
This is the part of my code that writes to the ARFF file:
for fileid in corpus.fileids():
    cat = str(fileid.split('/')[0])
    text = corpus.words(fileid)
    text2 = corpus.raw(fileid)
    text3 = ngrams(text2, 3)
    text4 = ngrams(text2, 4)
    lijst = [frequencycount(text, freq),
             frequencycount(text3, chartrigramfreq),
             frequencycount(text4, chartetragramfreq)]
    merged = list(itertools.chain.from_iterable(lijst))
    merged2 = ','.join(merged)
    filet.write("%s\n" % merged2)
    counter += 1
    print counter, fileid, time()-tijd
filet.close()
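For reference, here is a minimal, self-contained sketch of one check that might help narrow this down. It is only a guess at a possible cause, not a confirmed diagnosis: `frequencycount` is not shown above, so the version below is a hypothetical stand-in that counts occurrences and returns them as strings, and `char_ngrams` is a hypothetical helper that mimics the tuple output of nltk's `ngrams` without needing nltk. Under those assumptions, n-grams that are tuples of characters never match features stored as plain strings, so every count comes out as zero.

```python
from collections import Counter


def char_ngrams(text, n):
    """Hypothetical helper: character n-grams as tuples (the shape nltk's ngrams yields)."""
    return list(zip(*(text[i:] for i in range(n))))


def frequencycount(items, features):
    """Hypothetical stand-in for the real function: count each feature in items."""
    counts = Counter(items)
    return [str(counts[feature]) for feature in features]


raw = "banana"
trigrams = char_ngrams(raw, 3)

# Assumed feature list: trigrams stored as plain strings.
string_features = ["ban", "ana"]

# Tuples such as ('b', 'a', 'n') never equal the string "ban", so all counts are 0.
mismatched = frequencycount(trigrams, string_features)

# Joining each tuple into a string first makes the lookup match.
joined = ["".join(gram) for gram in trigrams]
matched = frequencycount(joined, string_features)

print(",".join(mismatched))  # 0,0
print(",".join(matched))     # 1,2
```

Even when the lookup matches, a function like this still writes counts rather than letters. If the letters and words themselves should end up in the file, the values to join would be something like `joined` rather than the output of `frequencycount`.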