@jeffcasavant

I had this issue writing a WARC record to a file:

[jeff@lamarzocco warcs]$ ./cleanwarc.py  in.warc.gz filtered.warc.gz
Traceback (most recent call last):
  File "./cleanwarc.py", line 86, in <module>
    main()
  File "./cleanwarc.py", line 82, in main
    filter_warc(args.infile, args.outfile)
  File "./cleanwarc.py", line 61, in filter_warc
    output_warc.write_record(record)
  File "/usr/lib/python2.7/site-packages/warc/warc.py", line 268, in write_record
    warc_record.write_to(self.fileobj)
  File "/usr/lib/python2.7/site-packages/warc/warc.py", line 161, in write_to
    f.write(self.payload)
  File "/usr/lib/python2.7/site-packages/warc/gzip2.py", line 71, in write
    BaseGzipFile.write(self, data)
  File "/usr/lib/python2.7/gzip.py", line 240, in write
    if len(data) > 0:
AttributeError: FilePart instance has no attribute '__len__'

I added a __len__ method to FilePart to fix this, but then got this error:

Traceback (most recent call last):
  File "./cleanwarc.py", line 90, in <module>
    main()
  File "./cleanwarc.py", line 86, in main
    filter_warc(args.infile, args.outfile)
  File "./cleanwarc.py", line 65, in filter_warc
    output_warc.write_record(record)
  File "/usr/lib/python2.7/site-packages/warc/warc.py", line 268, in write_record
    warc_record.write_to(self.fileobj)
  File "/usr/lib/python2.7/site-packages/warc/warc.py", line 161, in write_to
    f.write(self.payload)
  File "/usr/lib/python2.7/site-packages/warc/gzip2.py", line 71, in write
    BaseGzipFile.write(self, data)
  File "/usr/lib/python2.7/gzip.py", line 241, in write
    self.fileobj.write(self.compress.compress(data))
TypeError: must be string or read-only buffer, not instance

This PR fixes both issues by passing the buf attribute of the FilePart (rather than the whole FilePart) to gzip.
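
To illustrate the idea outside the library, here is a minimal sketch written for Python 3 rather than the Python 2.7 shown in the tracebacks. `PayloadPart`, `write_payload`, `roundtrip`, and `raw_write_fails` are hypothetical names, not part of warc, and the sketch assumes `buf` already holds the complete payload bytes. It only shows why gzip rejects a wrapper object and accepts the bytes inside it; it is not the actual patch.

```python
import gzip
import io


class PayloadPart(object):
    """Hypothetical stand-in for warc's FilePart; `buf` holds the raw bytes."""

    def __init__(self, data):
        self.buf = data


def write_payload(gz, payload):
    """Write a payload to a gzip stream, unwrapping objects that carry `buf`."""
    # gzip needs a bytes-like object, so pass the underlying buffer
    # instead of the wrapper instance. Plain bytes pass through unchanged.
    data = getattr(payload, "buf", payload)
    gz.write(data)


def roundtrip(payload):
    """Compress a payload into memory and return the decompressed bytes."""
    sink = io.BytesIO()
    with gzip.GzipFile(fileobj=sink, mode="wb") as gz:
        write_payload(gz, payload)
    return gzip.decompress(sink.getvalue())


def raw_write_fails(payload):
    """Return True if handing the wrapper itself to gzip raises TypeError."""
    sink = io.BytesIO()
    gz = gzip.GzipFile(fileobj=sink, mode="wb")
    try:
        gz.write(payload)
    except TypeError:
        return True
    finally:
        gz.close()
    return False
```

With these definitions, `roundtrip(PayloadPart(b"hello warc"))` gives back the original bytes, while `raw_write_fails(PayloadPart(b"hello warc"))` reports that writing the wrapper directly is rejected, which mirrors the second traceback above.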

@jeffcasavant changed the title from "Fix" to "Fix WARC writing bug" on Jun 10, 2016
@wolfgangmeyers

This would be great to merge. Is the project abandoned?

@jeffcasavant
Author

@wolfgangmeyers I guess? This has seen no attention since I submitted it, getting on a year ago. Figured it would be a no-brainer 😛 Who's the maintainer?
