[Cialug] multiline regular expression

Dave Weis djweis at internetsolver.com
Thu Jun 28 08:10:36 CDT 2007


Morris Dovey wrote:
> Dave Weis wrote:
> | I need to parse a text file that contains two lines per record in
> | this format:
> |      324     NOR  05/17/07 10:21A    0000000   999-999-9999       COLUMBUS OH 1      .9        .0700        .0000        .0000
> |
> |       Y    CURRENT         00:00:54  0000000   999-999-9999 DES MOINES IA
> |
> | The file also contains similar-looking header lines, such as:
> |     LATA   USGE-GP  DATE   TIME     DEST-CITY --------DESTINATION--------   #RECS  MINUTES  AMOUNT-1     AMOUNT-2 VOL-AMT
> |      ANI   STATUS          ACT-DUR  ORIG-CITY --------ORIGINATION--------           MISC-1               MISC-2 VOL-COD
> |
> | I'll be using Java's regexp library, but if anyone can point me to a
> | pattern in any regexp flavor, I'll convert it myself.
> 
> Hmm. It'd help to know more about the problem context - I'd be
> inclined to do the parsing in C using something like
> http://www.iedu.com/mrd/c/tokenize.c and, depending on the size of the
> file, something like http://www.iedu.com/mrd/c/tokfile.c  to tokenize
> all the lines in the file in one shot...

The file is a long-distance bill. It's a couple hundred thousand lines
now and will get longer every month.
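
Given that size, one option is to stream the file and pair up the two
lines of each record as they go by, rather than holding everything in
memory. Below is a minimal sketch in Java of that idea. The class and
method names (RecordPairer, forEachRecord) are made up for illustration,
and the two "shape" patterns are assumptions based only on the sample
record above: a first line starts with a number, a code, and an MM/DD/YY
date; a second line has an HH:MM:SS duration as its third field. Header
and blank lines match neither pattern and are skipped.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.BiConsumer;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class RecordPairer {
    // Assumed shape of a record's first line: number, code, MM/DD/YY date.
    private static final Pattern FIRST =
        Pattern.compile("\\s*\\d+\\s+\\S+\\s+\\d{2}/\\d{2}/\\d{2}\\s");
    // Assumed shape of a record's second line: two tokens, then HH:MM:SS.
    private static final Pattern SECOND =
        Pattern.compile("\\s*\\S+\\s+\\S+\\s+\\d{2}:\\d{2}:\\d{2}\\s");

    // Calls handler once per (first line, second line) pair; returns the pair count.
    static int forEachRecord(Stream<String> lines, BiConsumer<String, String> handler) {
        int count = 0;
        String first = null;
        Iterator<String> it = lines.iterator();
        while (it.hasNext()) {
            String line = it.next();
            if (FIRST.matcher(line).lookingAt()) {
                first = line;                       // remember until its partner shows up
            } else if (first != null && SECOND.matcher(line).lookingAt()) {
                handler.accept(first, line);
                first = null;
                count++;
            }
            // anything else (headers, blank lines) is ignored
        }
        return count;
    }

    public static void main(String[] args) {
        String sample =
            "    LATA   USGE-GP  DATE   TIME     DEST-CITY\n"
          + "     324     NOR  05/17/07 10:21A    0000000   999-999-9999"
          + "       COLUMBUS OH 1      .9        .0700        .0000        .0000\n"
          + "\n"
          + "      Y    CURRENT         00:00:54  0000000   999-999-9999 DES MOINES IA\n";
        int n = forEachRecord(Arrays.stream(sample.split("\n")),
            (a, b) -> System.out.println(a.trim() + " | " + b.trim()));
        System.out.println(n + " record(s)");
    }
}
```

For the real file, the stream could come from java.nio.file.Files.lines,
which reads lazily, so memory use should stay flat as the bill grows.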

The parsed records will end up in a PostgreSQL database for rating and billing.
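
To get the individual fields out before loading them, a single
multiline pattern may be enough. The following is a rough sketch in
Java, not a tested production parser. The group names (lata, usage,
destCity, and so on) are guesses from the column headers, the class
name CallRecordRegex is invented for the example, and the field shapes
are assumed from the one sample record. It uses [ \t] rather than \s
between fields so that a match can never run across a line break by
accident, and it allows blank lines between the two halves of a record.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CallRecordRegex {
    private static final String GAP = "[ \\t]+";        // spaces or tabs, never a newline
    private static final String NUM = "\\d*\\.?\\d+";   // e.g. .9, .0700, 12 (assumed format)

    // First line of a record. Group names are guesses, not confirmed meanings.
    private static final String LINE1 = "[ \\t]*(?<lata>\\d+)" + GAP + "(?<usage>\\S+)"
        + GAP + "(?<date>\\d{2}/\\d{2}/\\d{2})" + GAP + "(?<time>\\d{1,2}:\\d{2}[AP])"
        + GAP + "(?<destPrefix>\\d+)" + GAP + "(?<destNumber>\\d{3}-\\d{3}-\\d{4})"
        + GAP + "(?<destCity>.+?)" + GAP + "(?<recs>\\d+)"
        + GAP + "(?<minutes>" + NUM + ")" + GAP + "(?<amount1>" + NUM + ")"
        + GAP + "(?<amount2>" + NUM + ")" + GAP + "(?<volAmt>" + NUM + ")[ \\t]*";

    // Second line of a record.
    private static final String LINE2 = "[ \\t]*(?<flag>\\S+)" + GAP + "(?<status>\\S+)"
        + GAP + "(?<duration>\\d{2}:\\d{2}:\\d{2})" + GAP + "(?<origPrefix>\\d+)"
        + GAP + "(?<origNumber>\\d{3}-\\d{3}-\\d{4})" + GAP + "(?<origCity>.+?)[ \\t]*";

    // One line break, then any number of blank lines.
    private static final String BREAK = "\\r?\\n(?:[ \\t]*\\r?\\n)*";

    static final Pattern RECORD =
        Pattern.compile("^" + LINE1 + BREAK + LINE2 + "$", Pattern.MULTILINE);

    static final String[] FIELDS = {
        "lata", "usage", "date", "time", "destPrefix", "destNumber", "destCity",
        "recs", "minutes", "amount1", "amount2", "volAmt",
        "flag", "status", "duration", "origPrefix", "origNumber", "origCity"
    };

    // Returns one field map per two-line record found in the text.
    static List<Map<String, String>> parse(CharSequence text) {
        List<Map<String, String>> records = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            Map<String, String> rec = new LinkedHashMap<>();
            for (String field : FIELDS) {
                rec.put(field, m.group(field));
            }
            records.add(rec);
        }
        return records;
    }

    public static void main(String[] args) {
        String sample =
            "     324     NOR  05/17/07 10:21A    0000000   999-999-9999"
          + "       COLUMBUS OH 1      .9        .0700        .0000        .0000\n"
          + "\n"
          + "      Y    CURRENT         00:00:54  0000000   999-999-9999 DES MOINES IA\n";
        for (Map<String, String> rec : parse(sample)) {
            System.out.println(rec);
        }
    }
}
```

Header lines never match because the pattern insists on a leading
number and a date. Each map could then feed a batched insert, though
the table layout is left open here.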

dave


