[Cialug] Algorithm; Cutting Up A File
Matthew Nuzum
matthew.nuzum at canonical.com
Tue Dec 12 17:29:39 CST 2006
On Tue, 2006-12-12 at 16:52 -0600, Todd Walton wrote:
> Hey scripters,
>
> I'm having trouble concocting an algorithm to cut up a text file into
> blocks. I'm going to have text files that have three distinct blocks
> of information in them, and each block will be marked in some way. By
> HTML style tags, I suppose. For example:
>
> What I can assume about these files is that each will have three
> pre-defined blocks of text, enclosed by HTML style tags. The tags are
> on their own line. There may or may not be text outside of these
> three blocks. There may or may not be blank lines between the blocks.
> The blocks may or may not be in a given order. Etc.
>
> How can I read in the file's contents, take out the text between the
> tags (but not the tags!), and write that text to a file? I begin with
> datafile.txt, I run the script, and I end up with
> datafile-description.txt, datafile-procedure.txt, and
> datafile-reference.txt. Here's what I have so far:
Try this, which uses a regex instead. It could be made more robust by
having the regex handle blocks that appear out of order, but it works
with your example:
#!/usr/bin/python
import re
data = """<description>
This is a data file. It holds data.
</description>
<procedure>
1. Read the file.
2. Ponder meaning of existence.
3. Write new file.
</procedure>
<reference>
/usr/dict/datafile
</reference>"""
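# Each named group grabs one block, tags included.
# re.S lets '.' match newlines so a block can span several lines.
regex = ('(?P<desc><description>.*</description>).*'
         '(?P<proc><procedure>.*</procedure>).*'
         '(?P<ref><reference>.*</reference>)')
m = re.search(regex, data, re.S)
print(m.group('desc'))
print(m.group('proc'))
print(m.group('ref'))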
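For what it's worth, here is a rough sketch of how the out-of-order case
and the file-writing step might look. It searches for each tag on its own
rather than with one big pattern, and it captures only the text between
the tags. It assumes Python 3 and that each tag appears at most once per
file; the names extract_blocks, output_name, and split_file are made up
for illustration and are not from the script above.

```python
import os
import re

# Tag names taken from the example data; adjust as needed.
TAGS = ("description", "procedure", "reference")


def extract_blocks(text, tags=TAGS):
    """Return {tag: inner text} for every tag found, in any order."""
    blocks = {}
    for tag in tags:
        # One search per tag, so block order and stray text do not matter.
        # The non-greedy (.*?) keeps only what sits between the two tags,
        # and the optional \n on each side drops the tag lines' newlines.
        pattern = r"<%s>\n?(.*?)\n?</%s>" % (tag, tag)
        m = re.search(pattern, text, re.S)
        if m:
            blocks[tag] = m.group(1)
    return blocks


def output_name(path, tag):
    """Build e.g. datafile-description.txt from datafile.txt."""
    base, ext = os.path.splitext(path)
    return "%s-%s%s" % (base, tag, ext)


def split_file(path):
    """Write each block found in path to its own file (hypothetical helper)."""
    with open(path) as f:
        blocks = extract_blocks(f.read())
    for tag, body in blocks.items():
        with open(output_name(path, tag), "w") as out:
            out.write(body + "\n")
```

Calling split_file('datafile.txt') would then write
datafile-description.txt, datafile-procedure.txt, and
datafile-reference.txt for whichever blocks are present. A missing block
is simply skipped rather than raising an error, which may or may not be
what you want.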
--
Matthew Nuzum
newz2000 on freenode