[Cialug] Algorithm; Cutting Up A File
Tim Wilson
tim_linux at wilson-home.com
Tue Dec 12 18:31:59 CST 2006
I don't have much experience with Perl, but I cobbled together this
script:
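#!/usr/bin/perl
use strict;
use warnings;

# Split foo1.txt into foo1-<tag>.txt, one output file per tagged block.
open(my $in, '<', 'foo1.txt') or die "Can't open foo1.txt: $!";
while (my $line = <$in>) {
    # Each opening tag sits alone on its own line; capture its name.
    if ($line =~ /^\s*<(description|procedure|reference)>\s*$/) {
        my_proc_file($in, $1);
    }
}
close($in);
exit(0);

# Copy lines from $in to foo1-$block.txt until the matching closing tag.
sub my_proc_file {
    my ($in, $block) = @_;
    my $destfile = "foo1-" . $block . ".txt";
    open(my $out, '>', $destfile) or die "Can't open $destfile: $!";
    while (my $line = <$in>) {
        # Stop at the closing tag without writing it out.
        last if $line =~ m{^\s*</\Q$block\E>\s*$};
        print {$out} $line;
    }
    close($out) or die "Can't close $destfile: $!";
}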
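The script above hard-codes foo1.txt. As a rough sketch only, the same idea can take the file name as an argument, so that datafile.txt yields datafile-description.txt, datafile-procedure.txt, and datafile-reference.txt. The helper name split_blocks and the command-line usage are my own assumptions, not anything from the original request, and it assumes the input name ends in .txt and that blocks are never nested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): split one data file into
# <base>-<tag>.txt files and return the list of files written.
sub split_blocks {
    my ($path) = @_;
    (my $base = $path) =~ s/\.txt$//;    # datafile.txt -> datafile
    my @written;
    my $out;                             # defined only while inside a block
    my $current = '';                    # name of the block being copied

    open(my $in, '<', $path) or die "Can't open $path: $!";
    while (my $line = <$in>) {
        if (!$out && $line =~ /^\s*<(description|procedure|reference)>\s*$/) {
            # Opening tag: start a new output file, skip the tag itself.
            $current = $1;
            my $dest = "$base-$current.txt";
            open($out, '>', $dest) or die "Can't open $dest: $!";
            push @written, $dest;
        } elsif ($out && $line =~ m{^\s*</\Q$current\E>\s*$}) {
            # Matching closing tag: finish this block, skip the tag.
            close($out) or die "Can't close output: $!";
            undef $out;
        } elsif ($out) {
            print {$out} $line;          # body line inside a block
        }
        # Anything outside a block is ignored.
    }
    close($out) if $out;                 # tolerate an unterminated last block
    close($in);
    return @written;
}

# Assumed usage: perl split_blocks.pl datafile.txt [more files...]
split_blocks($_) for @ARGV;
```

Because each opening tag is handled whenever it turns up, the order of the blocks in the file should not matter, and neither should blank lines or stray text between them.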
On 12/12/06, Todd Walton <tdwalton at gmail.com> wrote:
> Hey scripters,
>
> I'm having trouble concocting an algorithm to cut up a text file into
> blocks. I'm going to have text files that have three distinct blocks
> of information in them, and each block will be marked in some way. By
> HTML style tags, I suppose. For example:
>
> ~/filez> cat datafile.txt
> <description>
> This is a data file. It holds data.
> </description>
>
> <procedure>
> 1. Read the file.
> 2. Ponder meaning of existence.
> 3. Write new file.
> </procedure>
>
> <reference>
> /usr/dict/datafile
> </reference>
>
> ~/filez> _
>
> What I can assume about these files is that each will have three
> pre-defined blocks of text, enclosed by HTML style tags. The tags are
> on their own line. There may or may not be text outside of these
> three blocks. There may or may not be blank lines between the blocks.
> The blocks may or may not be in a given order. Etc.
>
> How can I read in the file's contents, take out the text between the
> tags (but not the tags!), and write that text to a file? I begin with
> datafile.txt, I run the script, and I end up with
> datafile-description.txt, datafile-procedure.txt, and
> datafile-reference.txt. Here's what I have so far:
>
> while datafile.position != end
>     # The block for description.
>     strLine = datafile.readNextLine
>     if strLine contains "<description>" then
>         until strLine = "</description>"
>             strLine = datafile.readNextLine
>             write strLine to datafile-description.txt
>         end until
>     end if
>
>     # The block for procedure. (same as for description)
>     # The block for reference. (same as for description)
> end while
>
> So, the script runs through the text file line by line, until it finds
> the opening description tag and then, starting with the next line,
> writes it all out to a new file until it comes to the end-description
> tag. Same for the other two. Will this work? If the blocks are out
> of order in the datafile will this still work? Should I change
> something?
>
> -todd
--
Tim