wxPython and Unicode and widgets

wxPython and Unicode and widgets

Donn Ingle
Hi,
 I just re-read the page at http://wiki.wxpython.org/UnicodeBuild and I am
embarking on the i18n of my little app.

 It is mentioned that (in the Unicode build) string args to a wxPython
object (I assume all widgets) will be decoded to Unicode internally.
 I am wondering how UnicodeDecodeErrors are handled. Is there some way to tell
wxPython to use the 'ignore' or 'replace' flags so that there are no
errors?

 It sounds to me like I have to put a conversion function between any potential
input and the wx widget. I mean (again for the unicode build):

1. Bad way:
msg = "Just trust me"
wx.MessageBox( msg, "", wx.OK )

2. Good way:
msg = "Could be a fatal string"
u_msg = var2unicode(msg) # handles error and uses ignore.
wx.MessageBox( u_msg, "", wx.OK )

That seems like a lot of overhead if you have plenty of widgets all over the
place. Not to mention encoding/decoding the returned values to suit the build.

I also assume that if the user has the ANSI build, the function would
have to be called var2wxPython() (and wxPython2var()), and it would do its
best to make sure that what comes in goes out in the right format for the
build (while handling errors too).

Is this the right kind of idea? Or is there some better way?
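A minimal sketch of the helper Donn describes, written for Python 3 (where str is always Unicode text) rather than the Python 2 used in this thread. The name var2unicode comes from the post; its signature and the utf-8/"replace" defaults are assumptions, and the wx call is left out so the sketch runs on its own.

```python
# Hypothetical helper: only the name var2unicode comes from the post above.
def var2unicode(value, encoding="utf-8", errors="replace"):
    """Return `value` as text (str), decoding bytes with `encoding`.

    errors="replace" substitutes U+FFFD for undecodable bytes instead of
    raising UnicodeDecodeError; errors="ignore" would silently drop them.
    """
    if isinstance(value, str):
        return value                      # already text, nothing to do
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding, errors)
    return str(value)                     # numbers, None, etc.


print(var2unicode(b"caf\xe9"))            # invalid UTF-8: ends in U+FFFD
print(var2unicode(b"caf\xe9", "cp1252"))  # correct encoding: café
```

Note that the helper can only guess when the caller does not know the encoding; the replies below argue for decoding where the encoding is actually known.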

\d


Re: wxPython and Unicode and widgets

Chris Mellon
On Dec 19, 2007 12:39 PM, Donn Ingle <[hidden email]> wrote:

> Hi,
>  I just re-read the page at http://wiki.wxpython.org/UnicodeBuild and I am
> embarking on the i18n of my little app.
>
>  It is mentioned that (in the Unicode build) string args to a wxPython
> object ( I assume all widgets ) will be decoded to Unicode internally.
>  I am wondering how DecodeErrors are handled. Is there some way to tell
> wxPython to use the 'ignore' or 'replace' flags so that there are no
> errors?
>

Did you just try it and see what happens?

>  It sounds to me like I have to create a function between a potential input
> and the wx Widget. I mean (again for unicode build):
>
> 1. Bad way:
> msg = "Just trust me"
> wx.MessageBox( msg, "", wx.OK )
>
> 2. Good way:
> msg = "Could be a fatal string"
> u_msg = var2unicode(msg) # handles error and uses ignore.
> wx.MessageBox( u_msg, "", wx.OK )
>
> That seems like a lot of overhead if you have plenty of widgets all over the
> place. Not to mention encoding/decoding the returned values as per the build.
>
The same rules apply here as everywhere else you need to deal
with string data. By far the best solution is to decode from (and encode
to) byte strings at the borders of your application and use only unicode
internally. Input that comes to you as bytes (usually via the network or
files; in a unicode build, values from widgets will already be unicode
objects) should be decoded into a unicode object right there, where you
know the encoding.
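A minimal Python 3 sketch of "decode at the borders". The three function names are hypothetical, and in-memory io.BytesIO streams stand in for a real file or socket whose encoding (cp1252 here) is assumed to be known.

```python
import io


def read_text(stream, encoding):
    """Input border: bytes come in, text (str) goes out."""
    return stream.read().decode(encoding)


def process(text):
    """Interior: works only on str, never on bytes."""
    return text.upper()


def write_text(stream, text, encoding):
    """Output border: text goes in, bytes go out."""
    stream.write(text.encode(encoding))


incoming = io.BytesIO(b"caf\xe9")   # 'café' as cp1252 bytes
outgoing = io.BytesIO()

text = read_text(incoming, "cp1252")
write_text(outgoing, process(text), "utf-8")
print(outgoing.getvalue())
```

Only read_text and write_text ever mention an encoding; everything in between can assume it is handling text.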

> I also assume that if the user has the ansi build that the function would
> have to be called var2wxPython() (and wxPython2var() ) and it would do its
> best to make sure that what comes in, goes out in the right format for the
> build (while handling errors too).
>
> Is this the right kind of idea? Or is there some better way?
>


Re: wxPython and Unicode and widgets

Chris Barker - NOAA Federal
Chris Mellon wrote:
> On Dec 19, 2007 12:39 PM, Donn Ingle <[hidden email]> wrote:

 > The same rules apply as apply everywhere else that you need to deal
 > with string data. By far the best solution is to decode to/from
 > strings at the borders of your application and use only unicode
 > internally.

To make that clear with your example:

2. "Right" way:
msg = u"Could be a fatal string"
wx.MessageBox( msg, "", wx.OK )

so pass only unicode objects to wxPython.

Where did your "could be a fatal string" come from? Wherever you get it
is where you should handle the decoding, as Chris (the other one) said.

>> I also assume that if the user has the ansi build

You're really better off just making your app a unicode app, and not
supporting the ANSI build at all.

-Chris


--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[hidden email]


Re: wxPython and Unicode and widgets

Robin Dunn
Christopher Barker wrote:

> You're really better off just making your app a unicode app, and not
> supporting the ANSI build at all.

Especially since starting in wx 2.9 there will no longer be an ANSI
build, and in Python 3.0 there will no longer be the separate string and
unicode classes, the string class will be essentially what the unicode
class is today, and a new non string-like class will be available for
collections of bytes, such as when reading/writing from/to a file.
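For readers on Python 3, a small illustration of that split (not part of the original thread): str plays the role of today's unicode type, bytes is the new byte class, and mixing the two raises an error instead of converting silently.

```python
# Python 3: str holds text, bytes holds raw octets.
text = "abc\u0153"            # 'abcœ', a str (Python 2's unicode type)
data = text.encode("utf-8")    # bytes, e.g. what gets written to a file

try:
    text + data                # Python 3 refuses to guess an encoding
except TypeError as exc:
    print("refused:", exc)

print(type(text).__name__, type(data).__name__)   # str bytes
```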


--
Robin Dunn
Software Craftsman
http://wxPython.org  Java give you jitters?  Relax with wxPython!


Re: wxPython and Unicode and widgets

Donn Ingle
In reply to this post by Chris Barker - NOAA Federal
In reply to all:
> msg = u"Could be a fatal string"
> wx.MessageBox( msg, "", wx.OK )
>
> so pass only unicode objects to wxPython.
>
> Where did your "could be a fatal string" come from? when you get that is
> when you should handle the decoding, as Chris (the other one) said.
I am opening font files and they can have any kind of weird family names and
file names. I need to pass those strings into various places in the app.

But I take the hint : convert everything to Unicode. It's a little tricky
when I don't know the encoding of the strings within font family names, but
I'll use ignore (at a minimum) to get around that and decode everything
with UTF-8.
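Because "ignore" silently throws bytes away, one alternative is to try a strict UTF-8 decode first and fall back only if that fails. This is a Python 3 sketch; decode_unknown and the candidate list are assumptions, not anything the font formats guarantee. Bytes in another encoding can occasionally happen to be valid UTF-8, so a successful strict decode is a good hint, not proof.

```python
# Hypothetical fallback chain for strings of unknown origin (Python 3).
# latin-1 maps every byte to a code point, so it never fails and acts as a
# lossless last resort (the result may simply look wrong).
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def decode_unknown(raw, candidates=CANDIDATES):
    """Return (text, encoding_used) for the first candidate that works."""
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Only reached if latin-1 is not among the candidates.
    return raw.decode("utf-8", "replace"), "utf-8/replace"


print(decode_unknown(b"Caf\xe9 Sans"))
```

Returning the encoding that worked makes it easy to log which font names needed a fallback.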

Thanks,
\d


Re: wxPython and Unicode and widgets

jmf-2
In reply to this post by Donn Ingle
Christopher Barker wrote:

> To make that clear with your example:
>
> 2. "Right" way:
> msg = u"Could be a fatal string"
> wx.MessageBox( msg, "", wx.OK )
>
> so pass only unicode objects to wxPython.
>
> Where did your "could be a fatal string" come from? when you get that is
> when you should handle the decoding, as Chris (the other one) said.

----

Basically you are right, but in my mind you are forgetting one important effect,
which comes from the Python side: Python "speaks" iso-8859-1 better than
cp1252 (the Windows platform). This is especially true for str <--> unicode
conversions.

If one works in pure <str> mode, e.g. cp1252 on a Windows platform with the
wxPython ANSI build, then there is a proper cp1252-ANSI mapping, which avoids
some annoying side effects.

A char like "œ" is not defined in the iso-8859-1 table. It is defined in
iso-8859-15; unfortunately, in that encoding the position of the "€" symbol
does not match its position in cp1252, which makes the situation even worse.
(I think this is probably one of the reasons why iso-8859-15 is not used
much.)

Today, I can summarize the situation like this: as long as both ANSI and unicode
builds are available, I prefer to work with the ANSI build.

I'm not too pessimistic: as far as I understand, wxWidgets 3.0 will (or should)
solve these issues (ANSI/unicode unification). For us Python users, Python 3
will also bring a lot of changes.

 >>> s = 'abcœ'
 >>> print s
abcœ
 >>> u = u'abcœ'
 >>> print u
abc?
 >>> u
u'abc\x9c'
 >>> u = u'abcœ€'
 >>> u
u'abc\x9c\x80'
 >>> print u
abc??
 >>> u = unicode('abcœ€', 'iso-8859-1')
 >>> print u
abc??
 >>> u
u'abc\x9c\x80'
 >>> u = unicode('abcœ€', 'cp1252')
 >>> print u
abcœ€
 >>> u
u'abc\u0153\u20ac'
 >>> s = 'abcœ€'
 >>> print s
abcœ€
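For comparison, the same characters examined in Python 3 (not jmf's psi shell). The byte values follow the codec tables shipped with Python: cp1252 puts "œ" and "€" at 0x9C and 0x80, iso-8859-15 at 0xBD and 0xA4, and iso-8859-1 has neither, so decoding cp1252 bytes as iso-8859-1 quietly yields the control characters u'\x9c\x80' seen above.

```python
s = "abc\u0153\u20ac"               # 'abcœ€'

cp1252 = s.encode("cp1252")         # Windows Western European code page
latin9 = s.encode("iso-8859-15")    # has œ and €, at different positions

try:
    s.encode("iso-8859-1")          # neither character exists in latin-1
except UnicodeEncodeError as exc:
    print("latin-1 cannot encode:", exc.object[exc.start:exc.end])

# Decoding cp1252 bytes as latin-1 does not fail; it silently produces the
# C1 control characters U+009C and U+0080.
misread = cp1252.decode("iso-8859-1")
print(repr(misread))
```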

Jean-Michel Fauth, Switzerland


Re: Re: wxPython and Unicode and widgets

Karsten Hilbert
In reply to this post by Donn Ingle
On Thu, Dec 20, 2007 at 01:59:11AM +0200, Donn Ingle wrote:

> I am opening font files and they can have any kind of weird family names and
> file names. I need to pass those strings into various places in the app.

For file names use sys.getfilesystemencoding().
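A Python 3 sketch of that advice (the thread itself predates these helpers): os.fsencode and os.fsdecode apply the file system encoding for you. The font file name is hypothetical, and the surrogateescape round trip applies to POSIX systems.

```python
import os
import sys

# os.fsencode/os.fsdecode use sys.getfilesystemencoding() together with the
# error handler from sys.getfilesystemencodeerrors() (Python 3.6+).
print("filesystem encoding:", sys.getfilesystemencoding(),
      "errors:", sys.getfilesystemencodeerrors())

name = "Caf\u00e9 Sans.ttf"        # hypothetical font file name
as_bytes = os.fsencode(name)       # str -> bytes, as handed to the OS
as_text = os.fsdecode(as_bytes)    # bytes -> str, as returned by the OS

if os.name == "posix":
    # With the default surrogateescape handler, bytes that are not valid in
    # the filesystem encoding still round-trip instead of raising.
    odd = os.fsdecode(b"caf\xe9.ttf")
    assert os.fsencode(odd) == b"caf\xe9.ttf"
```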

Data *inside* the font files really *should* have an
implicit or explicit declaration of encoding somewhere.

Implicit might be

- whatever encoding the author of the font file happened to
  use in his locale
- a fixed encoding that the font file specs might have put down

But the world isn't perfect so it may still be tricky. I am
at times dealing with files

a) which tell me they are in encoding 1
b) actually are in encoding 2
c) all the while the specs mandate encoding 3

Hah !

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346


Re: Re: wxPython and Unicode and widgets

Donn Ingle
Karsten Hilbert wrote:

try:
 > a) which tell me are in encoding 1
 > b) actually are in encoding 2
 > c) all the while the specs mandate encoding 3
except:
 raise GetOffMyPlanet

I hear you :D

\d


Re: wxPython and Unicode and widgets

Chris Mellon
In reply to this post by jmf-2
On Dec 20, 2007 4:39 AM, jmf <[hidden email]> wrote:

> Christopher Barker wrote:
>
> To make that clear with your example:
>
> 2. "Right" way:
> msg = u"Could be a fatal string"
> wx.MessageBox( msg, "", wx.OK )
>
> so pass only unicode objects to wxPython.
>
> Where did your "could be a fatal string" come from? when you get that is
> when you should handle the decoding, as Chris (the other one) said.
>
> ----
>
> Basically you are right. But you forget, in my mind, one important effect
> which is coming from the Python side. Python "speaks" better iso-8859-1 than
> cp1252 (win platform). This is especially true for the str <--> unicode
> conversions.
>

Sorry, but this just isn't true. Python "speaks" both of them perfectly well.

> If one works in a pure <str>-type mode, eg cp1252, on a win platform, using the
> wxPython ansi build, then there is a proper cp1252-ANSI mapping and it avoids
> some annoying side effects.
>

"str" is a sequence of bytes. The default conversion to use when
converting to unicode is ascii, but one thing it absolutely is not is
cp1252. If you're getting input as cp1252 and you need to worry about
conversions, then you need to manage them explicitly.

> A char like "œ" is not defined in an iso-8859-1 table. However, it is defined
> in iso-8859-15. Unfortunately, in the latter case, the position of the "€"
> symbol does not correspond to "€" position in cp1252, which makes the situation
> even worse. (I think this is probably one of the reasons explaining why the
> iso-8859-15 is not so much used.)
>
> Today, I can summarize the situation like this, as long as both ansi and unicode
> builds are available, I prefer to work with the ansi build.
>

All using ansi is doing is pushing your problem out of sight. The
problem is in your application logic (you aren't converting correctly
between encodings), not in either wx or in Python.

> I'm not so pessimistic, as far as I understand, wxWidgets 3.0 will/should solve
> these issues (ansi/unicode unification). For us, Python users, the Python 3
> will also imply a lot of changes.
>
>  >>> s = 'abcœ'
>  >>> print s
> abcœ
>  >>> u = u'abcœ'
>  >>> print u
> abc?
>  >>> u
> u'abc\x9c'
>  >>> u = u'abcœ€'
>  >>> u
> u'abc\x9c\x80'
>  >>> print u
> abc??
>  >>> u = unicode('abcœ€', 'iso-8859-1')
>  >>> print u
> abc??
>  >>> u
> u'abc\x9c\x80'
>  >>> u = unicode('abcœ€', 'cp1252')
>  >>> print u
> abcœ€
>  >>> u
> u'abc\u0153\u20ac'
>  >>> s = 'abcœ€'
>  >>> print s
> abcœ€
>

Don't make the mistake of thinking that the encoding of the string
literals in your source code has anything to do with the encoding of
the string objects in your program. This can be kind of complicated so
it's not surprising that it confuses people.

What's happening here is that you're pasting bytes in cp1252. The
terminal you're pasting into (PyShell?) renders these bytes according
to its default encoding but what Python sees is the raw bytes. When
you print, what python is doing is sending raw bytes (in the ascii
encoding, unless you override it manually) to your terminal, which
displays them however it knows best. You need to make sure that the
correct bytes get both in and out of your application.

I tried to show a DOS terminal session, but I can't actually copy and
paste non-ascii characters from the terminal. Thanks, Microsoft :P


Re: wxPython and Unicode and widgets

Robin Dunn
Chris Mellon wrote:

> On Dec 20, 2007 4:39 AM, jmf <[hidden email]> wrote:
>> Christopher Barker wrote:
>>
>> To make that clear with your example:
>>
>> 2. "Right" way:
>> msg = u"Could be a fatal string"
>> wx.MessageBox( msg, "", wx.OK )
>>
>> so pass only unicode objects to wxPython.
>>
>> Where did your "could be a fatal string" come from? when you get that is
>> when you should handle the decoding, as Chris (the other one) said.
>>
>> ----
>>
>> Basically you are right. But you forget, in my mind, one important effect
>> which is coming from the Python side. Python "speaks" better iso-8859-1 than
>> cp1252 (win platform). This is especially true for the str <--> unicode
>> conversions.
>>
>
> Sorry, but this just isn't true. Python "speaks" both of them perfectly well.
>
>> If one works in a pure <str>-type mode, eg cp1252, on a win platform, using the
>> wxPython ansi build, then there is a proper cp1252-ANSI mapping and it avoids
>> some annoying side effects.
>>
>
> "str" is a sequence of bytes. The default conversion to use when
> converting to unicode is ascii,

Not always.  Python's default can be changed from the site.py file.  And
for automatic conversions done in wxPython (passing a string to a
wxString parameter in a Unicode build, or passing a Unicode object in an
ANSI build), then if sys.getdefaultencoding() is still "ascii" then
wxPython will use locale.getdefaultlocale()[1] for the encoding
conversions.  Doing it this way means that when the programmer needs to
deal with strings that are not strictly ascii then most of the time
wxPython will Do The Right Thing with the conversion because it will use
the current system locale's default encoding.

For the curious here is the actual code for deciding what encoding to use:


default = _sys.getdefaultencoding()
if default == 'ascii':
     import locale
     import codecs
     try:
         if hasattr(locale, 'getpreferredencoding'):
             default = locale.getpreferredencoding()
         else:
             default = locale.getdefaultlocale()[1]
         codecs.lookup(default)
     except (ValueError, LookupError, TypeError):
         default = _sys.getdefaultencoding()
     del locale
     del codecs
if default:
     wx.SetDefaultPyEncoding(default)
del default


You can find out what conversion encoding wxPython is using with
wx.GetDefaultPyEncoding, and you can change it if you want with
wx.SetDefaultPyEncoding.
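The selection logic can also be restated as a standalone function, which is handy for checking what a given machine would pick. This is a sketch, not wxPython code: pick_conversion_encoding is a hypothetical name, the getdefaultlocale() branch is dropped because getpreferredencoding() always exists in Python 3, and the wx.SetDefaultPyEncoding call is left out so it runs without wx.

```python
import codecs
import locale


def pick_conversion_encoding(default):
    """Standalone restatement of the selection logic quoted above.

    `default` plays the role of sys.getdefaultencoding(); in Python 3 that
    is always 'utf-8', so the locale branch only runs if 'ascii' is passed.
    """
    if default == "ascii":
        try:
            candidate = locale.getpreferredencoding(False)
            codecs.lookup(candidate)     # raises LookupError if unknown
            default = candidate
        except (ValueError, LookupError, TypeError):
            pass                         # keep 'ascii', like the original
    return default


print(pick_conversion_encoding("ascii"))
```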



Re: wxPython and Unicode and widgets

jmf-2
In reply to this post by Donn Ingle
 > Chris Mellon

 > What's happening here is that you're pasting bytes in cp1252. The
 > terminal you're pasting into (PyShell?) renders these bytes according
 > to its default encoding but what Python sees is the raw bytes.

The shell outputs I posted in my previous message are coming from
my Python shell, psi, http://spinecho.ifrance.com/psi.html

Why is it working so smoothly? Because in psi, the sys.stdout
pseudo file has an internal mechanism which will convert any
<unicode> to a <str> using the locale encoding and the 'replace'
flag (from there the question marks).
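A rough Python 3 approximation of the behaviour jmf describes (this is not psi's actual code): an io.TextIOWrapper over a byte buffer, encoding with a fixed code page and errors="replace". cp850 is chosen here only because it lacks both "œ" and "€". On Python 3.7+ a real console stream can be given similar behaviour with sys.stdout.reconfigure(errors="replace").

```python
import io

raw = io.BytesIO()                   # stands in for the console
out = io.TextIOWrapper(raw, encoding="cp850", errors="replace")

out.write("abc\u0153\u20ac")         # 'abcœ€': neither exists in cp850
out.flush()
console_bytes = raw.getvalue()       # unencodable characters became '?'
print(console_bytes)
```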

If I attempt to reproduce these commands in PyShell, the shell
will raise UnicodeEncodeError/UnicodeDecodeError, which is not wrong
either.

I do not claim my approach is better than any other one;
I am simply quite satisfied with it.

 > I tried to show a DOS terminal session, but I can't actually copy and
 > paste non-ascii characters from the terminal. Thanks, Microsoft :P

Well, here this is not a question of ascii/non-ascii. The DOS box is
using (for historical reasons?) the code pages cp850 for Western European
systems and cp437 for US platforms.

I can't check my examples there and show the result: neither the "€" nor
the "œ" char/glyph exists in cp850.

Jean-Michel Fauth, Switzerland


Re: wxPython and Unicode and widgets

Chris Mellon
On Dec 20, 2007 3:01 PM, jmf <[hidden email]> wrote:

>  > Chris Mellon
>
>  > What's happening here is that you're pasting bytes in cp1252. The
>  > terminal you're pasting into (PyShell?) renders these bytes according
>  > to its default encoding but what Python sees is the raw bytes.
>
> The shell outputs I posted in my previous message are coming from
> my Python shell, psi, http://spinecho.ifrance.com/psi.html
>
> Why is it working so smoothly? Because in psi, the sys.stdout
> pseudo file has an internal mechanism which will convert any
> <unicode> to a <str> using the locale encoding and the 'replace'
> flag (from there the question marks).
>
> If I attempt to reproduce these commands in PyShell, the shell
> will raise UnicodeEncode/DecodeError. Which is not wrong, too.
>
> I do not claim my approach is good or better than one another.
> I am quite satisfied with it.
>

I'm not sure how that is relevant to the discussion, then - you've got
a customized environment that suppresses errors, but that's obviously
not the best general case solution, and it's got nothing to do with
ascii vs unicode mode wxPython or support for cp1250 in python.

>  > I tried to show a DOS terminal session, but I can't actually copy and
>  > paste non-ascii characters from the terminal. Thanks, Microsoft :P
>
> Well, here this is not a question of ascii/non-ascii. The DOS box is
> using (historical reason?) the code pages cp850 for Western European
> systems and cp437 for US platform.
>

My terminal seems to be using cp1250, or at least it's consistent with
my tests in that direction. Regardless, my point is that while it can
and does display non-ascii characters, it won't copy them to the
clipboard.


Re: wxPython and Unicode and widgets

jmf-2
In reply to this post by Donn Ingle
Hi Chris (Mellon)

 > I'm not sure how that is relevant to the discussion, then - you've got
 > a customized environment that suppresses errors, but that's obviously
 > not the best general case solution, and it's got nothing to do with
 > ascii vs unicode mode wxPython or support for cp1250 in python.

Either you don't get the point or I did not explain it correctly. The problem
here is not only a wxPython ANSI/unicode problem. It lies in the fact that
if a developer has to use the unicode build, she/he probably has to deal
with all this unicode encoding/decoding stuff. I have also tried to show that
the solution proposed by Chris Barker, using u'   ' strings, is not a
panacea. From this point of view, I think it fits very well with the concerns
of the original message posted by Donn Ingle.

I visit enough French fora to know that this Python unicode encoding/decoding
stuff is a nightmare for a lot of developers. (I, too, sometimes have to think
twice in this area.)

 > My terminal seems to be using cp1250, ...

cp1250, alias windows-1250, corresponds to the setting for Central and Eastern
European languages.

Jean-Michel Fauth, Switzerland


Re: wxPython and Unicode and widgets

Andrea Gavana
In reply to this post by Robin Dunn
Hi All,

On Dec 20, 2007 7:52 PM, Robin Dunn wrote:

> Chris Mellon wrote:
> > On Dec 20, 2007 4:39 AM, jmf <[hidden email]> wrote:
> >> Christopher Barker wrote:
> >>
> >> To make that clear with your example:
> >>
> >> 2. "Right" way:
> >> msg = u"Could be a fatal string"
> >> wx.MessageBox( msg, "", wx.OK )
> >>
> >> so pass only unicode objects to wxPython.
> >>
> >> Where did your "could be a fatal string" come from? when you get that is
> >> when you should handle the decoding, as Chris (the other one) said.
> >>
> >> ----
> >>
> >> Basically you are right. But you forget, in my mind, one important effect
> >> which is coming from the Python side. Python "speaks" better iso-8859-1 than
> >> cp1252 (win platform). This is especially true for the str <--> unicode
> >> conversions.
> >>
> >
> > Sorry, but this just isn't true. Python "speaks" both of them perfectly well.
> >
> >> If one works in a pure <str>-type mode, eg cp1252, on a win platform, using the
> >> wxPython ansi build, then there is a proper cp1252-ANSI mapping and it avoids
> >> some annoying side effects.
> >>
> >
> > "str" is a sequence of bytes. The default conversion to use when
> > converting to unicode is ascii,
>
> Not always.  Python's default can be changed from the site.py file.  And
> for automatic conversions done in wxPython (passing a string to a
> wxString parameter in a Unicode build, or passing a Unicode object in an
> ANSI build) then if sys.getdefaultencoding() is still "ascii" then
> wxPython will use locale.getdefaultlocale()[1] for the encoding
> conversions.  Doing it this way means that when the programmer needs to
> deal with strings that are not strictly ascii then most of the time
> wxPython will Do The Right Thing with the conversion because it will use
> the current system locale's default encoding.
>
> For the curious here is the actual code for deciding what encoding to use:
>
>
> default = _sys.getdefaultencoding()
> if default == 'ascii':
>     import locale
>     import codecs
>     try:
>         if hasattr(locale, 'getpreferredencoding'):
>             default = locale.getpreferredencoding()
>         else:
>             default = locale.getdefaultlocale()[1]
>         codecs.lookup(default)
>     except (ValueError, LookupError, TypeError):
>         default = _sys.getdefaultencoding()
>     del locale
>     del codecs
> if default:
>     wx.SetDefaultPyEncoding(default)
> del default
>
>
> You can find out what conversion encoding wxPython is using with
> wx.GetDefaultPyEncoding, and you can change it if you want with
> wx.SetDefaultPyEncoding.

Now that we are close to abandoning ANSI builds (as far as I understand,
which makes me less than happy anyway), there are a couple of things
that astonish me a bit, while being ironically sad (or sadly ironic?):

- It is very difficult (impossible?) to set up an encoding which will
support *all* the possible characters in the known world languages. I
used utf-8 for GUI2Exe but I remember I read it could fail anyway in
some occasions (but my memory could fail here);

- It should be enough to put something like:

# -*- coding: utf-8 -*-

at the beginning of a script to force
Python/wxPython/numpy/matplotlib/whatever site-package you want to
transparently encode/decode everything without developer
intervention. If I wish to distribute my application in China, Russia
or Germany, my opinion is that I should not waste more than an eye
blink time to think about encodings.
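One point that may help: the coding line only tells Python how to decode the script's own source text (PEP 263). It does not change the encoding used for files, sockets, databases, or the console. A Python 3 sketch of the difference, using compile() on bytes so that the cookie is honoured; the variable names are illustrative only.

```python
# The cookie governs how the *source* is decoded...
source = b"# -*- coding: latin-1 -*-\nword = 'caf\xe9'\n"
namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)
word = namespace["word"]                 # 'café', decoded via the cookie

# ...but data read at run time is unaffected by it.
runtime_bytes = b"caf\xe9"               # e.g. read from a latin-1 file
try:
    runtime_bytes.decode("utf-8")        # no cookie applies here
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
word_from_file = runtime_bytes.decode("latin-1")   # explicit, known encoding
```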

All this stuff about sys.getdefaultencoding(),
wx.GetDefaultPyEncoding(), # -*- coding: whatever -*-,
locale.setlocale(), codecs, BOMs, is extremely confusing if you are
not a Python guru (which I am not). There are many resources on the
web to read about it, but sometimes they just help in increasing the
confusion.

I am curious to see what will happen when I start moving my
database-based app to unicode (remembering the GUI2Exe encoding
nightmare, God Save Andrea) :-D

Andrea.

"Imagination Is The Only Weapon In The War Against Reality."
http://xoomer.alice.it/infinity77/


Re: wxPython and Unicode and widgets

Andrea Gavana
By the way, sorry for my lack of politeness... I just wanted to wish
everyone a Merry Christmas and Happy New 2008 :-D . See you the 8th of
January!

Andrea.

"Imagination Is The Only Weapon In The War Against Reality."
http://xoomer.alice.it/infinity77/


Re: wxPython and Unicode and widgets

Chris Mellon
In reply to this post by jmf-2
On Dec 21, 2007 3:54 AM, jmf <[hidden email]> wrote:

> Hi Chris (Mellon)
>
>  > I'm not sure how that is relevant to the discussion, then - you've got
>  > a customized environment that suppresses errors, but that's obviously
>  > not the best general case solution, and it's got nothing to do with
>  > ascii vs unicode mode wxPython or support for cp1250 in python.
>
> You don't get the point or I did not explain it correctly. The problem
> here is not only a wxPython ansi/unicode problem. It lies in the fact, that
> if a developer has to use the unicode build, she/he has probably to deal
> with all this unicode encoding/decoding stuff. I have also tried to show, that
> the solution as proposed by Chris Barker, using u'   ' strings is not the
> panacea. From this point of view, I think it fits very well with the concerns
> of the original message posted by Donn Ingle.
>

If you can use an ANSI build without trouble, then you won't have any
*more* trouble using a unicode build. If you have encoding issues in
unicode builds that you don't have in ANSI builds, your application still
has the errors - it's just working by happenstance.

If you need to support multiple languages and locales and encodings,
then it doesn't matter what build you use - you have to deal with
these issues one way or another. Using unicode literals is not a
panacea, but it's an important step if you're supporting a non-ascii
encoding.

> I'm visiting enough French fora to know, this Python unicode encoding/decoding
> stuff is a nightmare for a lot of developers. (I, too, have sometimes to think
> twice on this field).
>

Once you understand the fundamental difference between unicode and raw
bytes, it's really not that difficult to understand, and
encoding/decoding isn't that complicated. Figuring out what encoding
you need, and all the other issues related to i18n are another story,
of course.

>  > My terminal seems to be using cp1250, ...
>
> cp1250, alias windows-cp1250, corresponds to the setting for Central and Eastern
> Europe languages.
>

My terminal displays characters encoded using cp1250 as expected. This
may just be due to overlap - I'm not an expert on windows code pages -
or it may do something more complicated. The DOS terminal's behavior in
this regard doesn't seem to be well documented.


Re: wxPython and Unicode and widgets

Karsten Hilbert
In reply to this post by jmf-2
On Fri, Dec 21, 2007 at 10:54:05AM +0100, jmf wrote:

> The problem
> here is not only a wxPython ansi/unicode problem. It lies in the fact, that
> if a developer has to use the unicode build, she/he has probably to deal
> with all this unicode encoding/decoding stuff.

See, this if..then is the main symptom of the
misunderstanding. There is no if-then about having to deal
with encodings.

The basic fact to realize is that:

        "there is no such thing as plain-text"

One *always* has to deal with encodings.

However, in tightly bounded environments one may get away
with a great deal by letting Python/wxPython/C/gettext/locale
do their thing. Under such circumstances it may appear easier
to use an ANSI build.

The big but comes here: in most cases the "tightly bounded"
environment eventually isn't that tightly bounded. The point
is that one needs to take control of all data sources and
sinks: files, filesystem, network, databases, console,
pipes, ... Some of those will be pre-determined in a
controlled environment (such as the console and the
filesystem). Others are not (network, database).

In other words: we never control the encoding of all sources ...

Karsten


Re: wxPython and Unicode and widgets

Chris Barker - NOAA Federal
In reply to this post by Chris Mellon
Here's how I think about it:

No matter how you slice it, and whether you are dealing with ANSI or
Unicode, you NEED to know the encoding your string of bytes is in, and
you need to deal with that appropriately. Period, end of story.

Unicode can appear to cause additional problems because Unicode
encodings can hold many, many more code points, and thus when you try
to translate from Unicode to an ANSI encoding, it is quite possible to
have a code point that can not be translated. However, this problem is
eliminated if you use unicode everywhere, AND you know what encoding
you're dealing with.

Where I find Python a bit frustrating is that I don't think you can tell
it, at the application level, to use the "replace" or "ignore" flag by
default with encode(). I'd often rather get garbage than an exception in
my apps!
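As far as I know this is still the case in Python 3: str.encode() has no global default for the error handler, so it is passed on each call, or set per stream (e.g. io.TextIOWrapper(errors=...)); codecs.register_error can add custom handlers. A sketch of the built-in handlers on the characters from earlier in the thread:

```python
s = "abc\u0153\u20ac"                                # 'abcœ€'

replaced = s.encode("ascii", "replace")              # b'abc??'
ignored = s.encode("ascii", "ignore")                # b'abc'
xml_refs = s.encode("ascii", "xmlcharrefreplace")    # b'abc&#339;&#8364;'
escaped = s.encode("ascii", "backslashreplace")      # b'abc\\u0153\\u20ac'
```

"replace" gives the garbage-instead-of-exception behaviour; "xmlcharrefreplace" and "backslashreplace" keep enough information to recover the original characters.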

Anyway, if you use Unicode everywhere in your app, then you "only" have
to deal with these issues on I/O -- you need to know the encoding of
data you're reading in, and you need to provide the needed encoding for
data you're writing out. You need to do that right for ANSI too, so you
haven't lost anything.

Which brings up another point touched on in this thread -- "only I/O".
It's very handy to use "print" with python, but that's I/O, and most
terminals don't seem to support unicode, or python doesn't know what
encoding to send to the terminal, so, again -- more pain.

So why bother with Unicode at all, if you still have "which encoding"
confusion? Two reasons:

-- Unicode can hold more code points (theoretically all of them for
all languages), so you can have multiple languages supported within one
document, which can be a nice feature.

-- It's the way the computing world is going, so you're going to have to
deal with it anyway. If you start getting unicode data as input, you're
going to be a whole lot better off if you're using Unicode internally,
otherwise you WILL lose data (at best) or crash your app (at worst) if
you get data that can't be represented in ANSI.
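A Python 3 sketch of the first point: a single str can mix scripts that no single 8-bit code page covers. cp1252 is used as the representative "ANSI" code page; the sample text is arbitrary.

```python
mixed = "Zürich / Москва / 東京"     # Latin, Cyrillic and CJK in one str

utf8 = mixed.encode("utf-8")         # UTF-8 can represent all of it
roundtrip = utf8.decode("utf-8")

try:
    mixed.encode("cp1252")           # Western European code page
    lost = False
except UnicodeEncodeError:
    lost = True                      # Cyrillic and CJK have no cp1252 slot
```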

We're dealing with this right now in a Web app that gets data from a lot
of sources, and it's using libs that aren't fully unicode. It's all too
easy for it to accept input from the browser, save it in the database,
then crash when you try to edit it again -- aarrgg!!

Chris Mellon wrote:
 > Once you understand the fundamental difference between unicode and raw
 > bytes, it's really not that difficult to understand, and
 > encoding/decoding isn't that complicated.

Here's a good start to understanding Unicode:
http://www.joelonsoftware.com/articles/Unicode.html

And here's one that's Python-specific:
http://boodebr.org/main/python/all-about-python-and-unicode

-Chris




Re: wxPython and Unicode and widgets

Chris Barker - NOAA Federal
In reply to this post by Andrea Gavana
Andrea Gavana wrote:
> - It is very difficult (impossible?) to setup an encoding which will
> support *all* the possible characters in the known world languages. I
> used utf-8 for GUI2Exe but I remember I read it could fail anyway in
> some occasions (but my memory could fail here);

I'm curious, because that's certainly the goal of Unicode, and utf-8 is
supposed to support it. Besides, where could data come from that can't
be encoded as utf-8?
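For what it's worth, in Python 3 UTF-8 can encode every Unicode scalar value, including characters beyond U+FFFF. The one thing it rejects by default is a lone surrogate, which is not a character on its own; such strings can arise from error handlers like surrogateescape or from mis-paired UTF-16 data. A sketch:

```python
emoji = "\U0001F600"                  # outside the Basic Multilingual Plane
encoded = emoji.encode("utf-8")       # 4 bytes
top = chr(0x10FFFF).encode("utf-8")   # highest code point, also 4 bytes

lone = "\ud800"                       # a high surrogate with no partner
try:
    lone.encode("utf-8")
    lone_ok = True
except UnicodeEncodeError:
    lone_ok = False                   # strict UTF-8 refuses it

# The surrogatepass handler writes it out anyway (not valid UTF-8).
passed = lone.encode("utf-8", "surrogatepass")
```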

There is a Python issue that still confuses me. Internally, Python can
use either UCS-2 or UCS-4, depending on how it is compiled. It's my
understanding that UCS-2 can hold most, but not all, of the code points,
so what happens when you try to use one that it can't hold?
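In a narrow (UCS-2) build, characters above U+FFFF were stored as UTF-16 surrogate pairs, so len(u'\U0001F600') came out as 2. Since Python 3.3 (PEP 393) CPython has no narrow/wide distinction and every str indexes whole code points. A Python 3 sketch:

```python
ch = "\U0001F600"
code_points = len(ch)                          # 1 on Python 3.3+
pair = ch.encode("utf-16-le")                  # surrogate pair D83D DE00
utf16_units = len(pair) // 2                   # 2 sixteen-bit units
```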

And what does wx use internally in Unicode builds?

-Chris



