[Cialug] Algorithm; Cutting Up A File
Daniel A. Ramaley
daniel.ramaley at DRAKE.EDU
Wed Dec 13 09:47:00 CST 2006
This sounds like a homework-type problem, but it is a little interesting
nevertheless. Below is the best I can do in Perl. It reads the file from
standard input or, because it uses the <> operator, from a file named on
the command line. The script doesn't care which tags you use in the input
or what order they are in; a datafile-TAG.txt file is created for each
block all the same. It should be easy to translate this to any language
that has decent regex support.
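#!/usr/bin/perl
use strict;
use warnings;
# Slurp the whole input into one string
$/ = undef;
my $input = <>;
# Each match is one block: $1 is the tag name, $2 is the text between the tags
while ($input =~ m'^<([^/>][^>]*)>\n(.*?)^</\1>$'gms) {
    my ($tag, $text) = ($1, $2);
    # Write the block to its own file, e.g. datafile-description.txt
    open my $output, '>', "datafile-$tag.txt" or die "Cannot write datafile-$tag.txt: $!";
    print {$output} $text;
    close $output or die "Cannot close datafile-$tag.txt: $!";
}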
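For comparison, here is a rough line-by-line sketch, closer in spirit to
the pseudocode quoted below. As I read that pseudocode, its inner loop
writes each line before testing it, so the closing tag would end up in
the output file; the version here tests first. The name split_blocks and
its two arguments are made up for this sketch, and unlike the regex
above it tolerates trailing whitespace after a tag. Treat it as one
possible approach under those assumptions, not a definitive answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one tagged file into "<prefix>-<tag>.txt" files, one line at a time.
# split_blocks() and its arguments are hypothetical names for this sketch.
sub split_blocks {
    my ($path, $prefix) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";

    my ($tag, $out);    # name of the block we are inside, and its output handle
    my @written;        # tag names in the order they were found
    while (my $line = <$in>) {
        if (defined $tag) {
            if ($line =~ m{^</\Q$tag\E>\s*$}) {
                # Closing tag: finish the block without writing the tag itself.
                close $out or die "Cannot close $prefix-$tag.txt: $!";
                undef $tag;
            }
            else {
                print {$out} $line;
            }
        }
        elsif ($line =~ m{^<([^/>][^>]*)>\s*$}) {
            # Opening tag on its own line: start a new output file.
            $tag = $1;
            open $out, '>', "$prefix-$tag.txt"
                or die "Cannot write $prefix-$tag.txt: $!";
            push @written, $tag;
        }
        # Any other line is outside every block and is ignored.
    }
    warn "Block <$tag> was never closed\n" if defined $tag;
    close $in;
    return @written;
}

# Usage: perl split.pl datafile.txt
if (@ARGV) {
    my $path = shift @ARGV;
    (my $prefix = $path) =~ s/\.[^.\/]*$//;    # datafile.txt -> datafile
    print "wrote $prefix-$_.txt\n" for split_blocks($path, $prefix);
}
```

Because only one state variable tracks the current block, the order of
the blocks in the input does not matter, and text or blank lines between
blocks are skipped.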
On Tuesday 12 December 2006 16:52, Todd Walton wrote:
>Hey scripters,
>
>I'm having trouble concocting an algorithm to cut up a text file into
>blocks. I'm going to have text files that have three distinct blocks
>of information in them, and each block will be marked in some way. By
>HTML style tags, I suppose. For example:
>
>~/filez> cat datafile.txt
><description>
>This is a data file. It holds data.
></description>
>
><procedure>
>1. Read the file.
>2. Ponder meaning of existence.
>3. Write new file.
></procedure>
>
><reference>
>/usr/dict/datafile
></reference>
>
>~/filez> _
>
>What I can assume about these files is that each will have three
>pre-defined blocks of text, enclosed by HTML style tags. The tags are
>on their own line. There may or may not be text outside of these
>three blocks. There may or may not be blank lines between the blocks.
> The blocks may or may not be in a given order. Etc.
>
>How can I read in the file's contents, take out the text between the
>tags (but not the tags!), and write that text to a file? I begin with
>datafile.txt, I run the script, and I end up with
>datafile-description.txt, datafile-procedure.txt, and
>datafile-reference.txt. Here's what I have so far:
>
>while datafile.position != end
>  # The block for description.
>  strLine = datafile.readNextLine
>  if strLine contains "<description>" then
>    until strLine = "</description>"
>      strLine = datafile.readNextLine
>      write strLine to datafile-description.txt
>    end until
>  end if
>
>  # The block for procedure. (same as for description)
>  # The block for reference. (same as for description)
>end while
>
>So, the script runs through the text file line by line, until it finds
>the opening description tag and then, starting with the next line,
>writes it all out to a new file until it comes to the end-description
>tag. Same for the other two. Will this work? If the blocks are out
>of order in the datafile will this still work? Should I change
>something?
>
>-todd
--
------------------------------------------------------------------------
Dan Ramaley Dial Center 118, Drake University
Network Programmer/Analyst 2407 Carpenter Ave
+1 515 271-4540 Des Moines IA 50311 USA