|
|
SleepingWolf
Paranoid (IV) InmateFrom: Insane since: Jul 2006
|
posted 12-01-2007 01:41
MySQL tables from my blog are full of characters I didn't ask for - such as couldnâ??t instead of couldn't.
Preliminary research indicates this results from cutting and pasting text from MS Word into the WordPress blog.
The blog text appears fine, but the tables show the crap as does the .sql backup file which is exported from myphpadmin.
MySQL is set to english utf - so it shouldn't be a coding issue. I can't use search and replace either because the replaced characters are not consistent....the same characters can replace a dash or a quote, and sometimes multiple characters replace one character.
Any ideas? This is not a problem now, but it would be a disaster if I needed to restore my site from a backup.
Nature & Travel Photography
Visit the Sleeping Wolves
|
hyperbole
Paranoid (IV) InmateFrom: Madison, Indiana Insane since: Aug 2000
|
posted 12-02-2007 06:04
Using MS Word to generate text is not a good idea. MircoSoft deliberately puts non-printable cahracters into the content of any word document to keep other companies from reading and writing Word documents.
If you want to create text which you will paste into your database use Notepad, WritePad or Write and be sure to store the file as a Text, ASCII, or Unicode Document. Don't use Text MS-DOS format
.
-- not necessarily stoned... just beautiful.
|
Tyberius Prime
Maniac (V) Mad Scientist with FinglongersFrom: Germany Insane since: Sep 2001
|
posted 12-02-2007 12:15
In the feature, copy text into notepad before adding it to Wordpress.
As for your tables: if you can't do search and replace your're
pretty much down to "fix by hand".
As a coder, you could whip up a dictionary approach that makes
an educated guess at what character to use... but it's not going to
be 100% reliable.
Are you sure your're opening your exports as unicode - perhaps
the trouble is simply your phpmyadmin declaring the data to be
non-unicode, and that leading to these strange characters in the browser...
So long,
->Tyberius Prime
|
SleepingWolf
Paranoid (IV) InmateFrom: Insane since: Jul 2006
|
posted 12-02-2007 18:05
Thanks guys - yep, don't worry I will be cutting and pasting from Word into notepad first , and then from notepad to WordPress. I was aware of the junk Word adds to pure text but just didn't think of the implications.
As for the restore - I did find this plugin - I'll keep it for that sad day when I need to restore the database.
I'm going to install some form of WAMP and test it locally beforehand. Worst case scenario will be search and replace.
http://g30rg3x.com/utf8-database-converter/
Honestly, I don't like the idea of the plugin, would rather do it manually, but I will test it and hang on to it.
This is quite a lesson learned. The sad part, is that I've many times seen what MSWord will do to characters when using Outlook, which is why I invariably paste to notepad first and then Outlook when using Word text. Did not practice what I preach for my blog though.
quote:
Tyberius Prime said:
.... and that leading to these strange characters in the browser...So long,->Tyberius Prime
TP: The browser is clean. It's the backup .sql text file which is dirty - and not only the exported data, the tables themselves are dirty.
If I go into the WordPress post, delete a quotation mark coming from MS Word paste, retype the quotation mark, and then refresh the mysql table, I will see the extended characters disappear from the tables - confirming this is a MS artifact. As far as I can tell, everything else is ok - both WP and the Export show UTF.
If I were to restore the blog now, from the .sql, then the browser would display all the crap.
Nature & Travel Photography
Visit the Sleeping Wolves
(Edited by SleepingWolf on 12-02-2007 18:10)
|
SleepingWolf
Paranoid (IV) InmateFrom: Insane since: Jul 2006
|
posted 03-29-2008 18:32
The day has come where I need to restore my database.
I solved the strange character problem, so I'm posting this an update.
The solution is very simple, open the .sql backup in notepad and save in ANSI, not UTF.
That's all there is to it. Checked in Xampp. Perfect. Should work live as well.
Nature & Travel Photography
Main Entrance
|
hyperbole
Paranoid (IV) InmateFrom: Madison, Indiana Insane since: Aug 2000
|
posted 03-31-2008 18:37
I'm glad to hear you got it fixed.
.
-- not necessarily stoned... just beautiful.
|
SleepingWolf
Paranoid (IV) InmateFrom: Insane since: Jul 2006
|
posted 04-03-2008 01:57
Yep..the only detail I left out, when I did the import I had to change the default UTF to latin..
Nature & Travel Photography
Main Entrance
|
Stackelberg
Bipolar (III) InmateFrom: Insane since: Jul 2006
|
posted 05-16-2008 01:55
Hey - I'm new to SQL so please forgive any seemingly idiotic comments. If you change the .sql format to ANSI and restore, will future entries support the special characters? Right now I think I have the same problem that you did, but I'm not entirely sure: I need to display odd characters, é. etc., as entered through a simple CMS form, but the database will not spit them out properly.
Thanks
|
hyperbole
Paranoid (IV) InmateFrom: Madison, Indiana Insane since: Aug 2000
|
posted 05-17-2008 00:24
Stakelberg,
The answer to your question is "It depends". If all the spcial charactes you're using are in the first 256, it may work properly. From what you said it sounds as if they are not because you're getting back corrupted characters when you pull the information out of the db. So, you probably need to be storing the info as utf-8.
.
-- not necessarily stoned... just beautiful.
|
Stackelberg
Bipolar (III) InmateFrom: Insane since: Jul 2006
|
posted 05-17-2008 19:53
So is that just a matter of changing the collation for specific tables/fields/etc. to utf-8? Sorry if this seems obvious, but again, I'm a n00b
|
hyperbole
Paranoid (IV) InmateFrom: Madison, Indiana Insane since: Aug 2000
|
posted 05-18-2008 21:11
You need to set the colation for the table before you load the data into. Depending on how you load the data, you may also need to set the collation for the database connection. I believe you can set the character set for the entire database. You may also set the colation on a table by table basis. However, if you set each table to a different character set, you need to be very carefule when you load the data into the table. It's very easy to end up with corrupted data with different character sets for each table.
Don't worry about asking questions with seemingly obvious answers. We were all noobs once.
.
-- not necessarily stoned... just beautiful.
(Edited by hyperbole on 05-18-2008 21:12)
|
Stackelberg
Bipolar (III) InmateFrom: Insane since: Jul 2006
|
posted 06-06-2008 04:34
Hi again,
I've changed the database to utf-8 and imported the database data. Unfortunately, I'm still getting weird characters spat out in the output xHTML. Any ideas? The database is involved to permit the use of a basic CMS I've put together.
Thanks
|
hyperbole
Paranoid (IV) InmateFrom: Madison, Indiana Insane since: Aug 2000
|
posted 06-06-2008 15:39
How are you putting the data into the database? Converting the database to utf-8 with bad data already in it will not convert the bad data to utf-8. You need to be using a program that can convert the data you have to utf-8 as it inserts it into the database. The same is true for extraction. If the database is handing utf-8 to a program that is not expecting utf-8 you will see 'wierd characters' in the output.
.
-- not necessarily stoned... just beautiful.
|
Stackelberg
Bipolar (III) InmateFrom: Insane since: Jul 2006
|
posted 06-19-2008 21:44
K so I've been playing with this for a while, and I still can't seem to get it right. Here's what I'm doing:
I've got a database table field with some characters in it. I'm just using a basic PHP-driven SQL query to output the content of that field to an xHTML page with <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Then, I run this query through phpMyAdmin:
ALTER DATABASE db_biotron COLLATE utf8_general_ci;
ALTER TABLE tbl_about COLLATE utf8_general_ci;
And I ensure that the field is utf8_general_ci as well.
So I've set everything I can find to utf8_general_ci - the database, the table, the field. As mentioned above, if the data is corrupt, the 'weird characters' won't be corrected. So I go into the field via phpMyAdmin and re-enter the data, and I enter an "é" in the field.
Then I go back to the xHTML and it still outputs as two question marks in FF, and some sort of bizarre character in IE.
Any ideas??
Thanks everyone
(Edited by Stackelberg on 06-19-2008 21:45)
|
hyperbole
Paranoid (IV) InmateFrom: Madison, Indiana Insane since: Aug 2000
|
posted 06-21-2008 00:24
Sorry, too many variables. I've played a little with phpAdmin, but I'm not really and expert. I'm not sure how it would modify a character it's sticking into the table.
It looks as if you're working to eliminate as many misbehaving elements as possible. That''s a good thing. I think at this point I would write a small PHP script to display a string in utf-8 and see what the output looks like. It would also be useful to decompose the utf-8 string into binary or hex values.
Then write a PHP script to stick a utf-8 string into the database and modify the previous script to get the string from the database and display it.
.
-- not necessarily stoned... just beautiful.
|
SleepingWolf
Paranoid (IV) InmateFrom: Insane since: Jul 2006
|
posted 09-26-2008 20:27
I'm kinda late on this thread but I'll share my practical experience. My 2 cents.
If the database has crap in it, strange characters, they should be cleaned up in Notepad (search and replace) or try the "import as ANSI" when Importing. I also use special characters such as french accents and doing that preserved them. I got exactly what I needed - these characters are part of ANSI which should not be confused with ASCII.
Once the database is clean, make sure not to use software like Word that gives you binary junk. I think copying from a browser is ok, but generally I either type directly in my CMS or cut and paste from notepad to ensure i have pure text.
As for the database itself, if you are using a CMS like Wordpress, just stick to its defaults. If you are custom developing your own application, some variation of UTF should be fine (I guess).
Nature & Travel Photography
Main Entrance
|
hyperbole
Paranoid (IV) InmateFrom: Madison, Indiana Insane since: Aug 2000
|
posted 09-29-2008 23:18
Thanks for the update SleepingWolf. If you have the character-encoding in the browser set to the same encoding you're using in the database, you should be able to copy from the browser to the database.
.
-- not necessarily stoned... just beautiful.
|