[Cialug] OT - Python Question
Stuart Thiessen
thiessenstuart at aol.com
Thu Aug 13 09:17:29 CDT 2009
I worked on it last night and finally figured out a solution. Your
comment, plus several comments from others, pointed to the encoding
issue, so I did a hex dump of the export file to find out what the
codes were and then used a replace (data.replace(u'\u____', u'\x__'))
to swap each one for its equivalent in the lower ASCII set. That did
the trick and got rid of the strange symbols. For some reason, RTF was
not reading the Unicode curly quotes, etc., so it was simpler to
replace them all with normal quotes, etc.
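
For anyone finding this in the archive later, here is a minimal sketch of
that replacement step. It assumes the export has already been decoded to a
Unicode string; the mapping table and the name asciify_punctuation are
made up for illustration and are not the code from my app. The code points
that actually matter are whatever your own hex dump shows. The last lines
reproduce the symptom from my original question: the UTF-8 bytes of a curly
apostrophe read back as CP1252.

```python
# Hypothetical mapping: typographic characters -> plain ASCII equivalents.
PUNCTUATION_MAP = {
    u'\u2018': u"'",    # left single quotation mark
    u'\u2019': u"'",    # right single quotation mark (curly apostrophe)
    u'\u201c': u'"',    # left double quotation mark
    u'\u201d': u'"',    # right double quotation mark
    u'\u2013': u'-',    # en dash
    u'\u2014': u'--',   # em dash
    u'\u2026': u'...',  # horizontal ellipsis
}


def asciify_punctuation(text):
    """Replace each mapped typographic character with its ASCII stand-in."""
    for fancy, plain in PUNCTUATION_MAP.items():
        text = text.replace(fancy, plain)
    return text


# Reproducing the symptom: U+2019 encoded as UTF-8, then decoded as CP1252,
# yields three unrelated characters instead of one apostrophe.
raw = u'\u2019'.encode('utf-8')
mojibake = raw.decode('cp1252')

cleaned = asciify_punctuation(u'\u201cdon\u2019t\u201d')
```

Decoding the export with the right codec first (UTF-8 in this sketch, which
is an assumption about the file) matters more than the mapping itself; the
replace only works on correctly decoded text.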
I did try pyrtf (an unmaintained RTF solution for Python) but it kept
crashing Word or OpenOffice. So I just downloaded the spec for RTF and
then took one of our files, converted it to RTF and tried to find the
codes that I needed for this specific implementation. Not as flexible,
but it works for the purpose it was needed for ... for now. :)
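
An alternative to flattening the quotes, sketched here under assumptions: RTF
is nominally a 7-bit format, and the spec describes a \uN escape (N is a
signed 16-bit decimal number, followed by a fallback character) for anything
outside ASCII. The helper names rtf_escape and make_rtf are hypothetical, the
wrapper is a bare-bones document rather than what Word itself emits, and I
have not tested it against every reader. Written for Python 3.

```python
import struct


def rtf_escape(text):
    """Escape a Unicode string for use inside an RTF document body."""
    out = []
    for ch in text:
        if ch in u'\\{}':
            # Backslash and braces are RTF syntax and must be escaped.
            out.append(u'\\' + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Emit one \uN escape per UTF-16 code unit, as a signed 16-bit
            # decimal, followed by '?' as the fallback for old readers.
            data = ch.encode('utf-16-le')
            for (unit,) in struct.iter_unpack('<h', data):
                out.append(u'\\u%d?' % unit)
    return u''.join(out)


def make_rtf(body_text):
    """Wrap escaped text in a minimal RTF document (illustrative only)."""
    return u'{\\rtf1\\ansi\\uc1 ' + rtf_escape(body_text) + u'\\par}'


document = make_rtf(u'\u201cdon\u2019t\u201d')
```

The trade-off is the same one I mention above: hand-written RTF like this
covers only the control words you bother to implement, but there is less to
go wrong than with a full library.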
Thanks!
Stuart
On Aug 12, 2009, at 21:17, Matthew Nuzum wrote:
> On Wed, Aug 12, 2009 at 5:29 PM, Stuart Thiessen <thiessenstuart at aol.com> wrote:
>> I have a Python question that is bugging me and I've tried to google
>> for a solution, but not having much success. Rather than bothering
>> everyone with all the details, if you do python programming and you
>> think you could let me "pick your brain" for some ideas on how to
>> solve it, I'd appreciate it. In short, it has to do with an RTF file
>> that my Python app is creating based on exported data from another
>> application. It keeps replacing quotes with ’. I've seen what to do
>> for HTML files, but not for this specific situation.
>
> Ah, go ahead and post details to the list if you can. It's not such a
> high traffic list that a few more emails will annoy anyone.
>
> I suspect your problem is character encoding related but I'm not sure
> and I've never worked with RTF before. However I've seen something
> just like this happen when converting between CP1252 and the more
> standard UTF or ISO8859-1 charsets.
>
> If that is the case Python has a wonderful solution called
> BeautifulSoup which can do a lot of automagical charset conversion
> with the UnicodeDammit class.
> http://www.crummy.com/software/BeautifulSoup/documentation.html
>
> --
> Matthew Nuzum
> newz2000 on freenode, skype, linkedin, identi.ca and twitter