A blog by Devendra Tewari
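The following are some Windows-1252 character codes I’ve seen in legacy HTML documents created in Word. Windows-1252 matches ISO 8859-1 except in the range 0x80 to 0x9F, where ISO 8859-1 has control codes and Windows-1252 has printable characters such as curly quotes and dashes. The replacement suggestions convert each code to plain ASCII characters available on the keyboard. Alternatively, you may want to use HTML character entities.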
\x85
(ellipsis …) replace with three dots ...
\x91
(open curly quote ‘) replace with normal quote '
\x92
(close curly quote ’) replace with normal quote '
\x93
(open curly double quote “) replace with normal quote "
\x94
(close curly double quote ”) replace with normal quote "
\x96
(ndash –) replace with normal -
\x97
(mdash —) replace with normal -
\xa0
(NBSP) replace with normal space
\xa9
(copyright symbol ©) replace with (C)
\xad
(soft hyphen) remove it
\xae
(registered ®) replace with (R)
\xb7
(middle dot ·) replace with an asterisk *
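If you prefer the HTML character entity route mentioned at the top, here is a minimal Python sketch. It assumes the input really is Windows-1252 encoded; the function name `cp1252_to_entities` is made up for this example. It relies on the standard library's `html.entities.codepoint2name` table and falls back to a numeric entity when no named one exists.

```python
from html.entities import codepoint2name


def cp1252_to_entities(data: bytes) -> str:
    """Decode Windows-1252 bytes and write non-ASCII characters as HTML entities.

    Hypothetical helper for illustration. Python's cp1252 codec raises
    UnicodeDecodeError for the few byte values that Windows-1252 leaves
    unassigned (for example 0x81).
    """
    text = data.decode("cp1252")
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # named entity, e.g. &hellip;
        else:
            out.append(f"&#{code};")  # numeric fallback
    return "".join(out)


print(cp1252_to_entities(b"\x93Hello\x94 \x85 \xa9 2010"))
# prints: &ldquo;Hello&rdquo; &hellip; &copy; 2010
```

ASCII characters such as < and & are left alone on purpose, since the input is assumed to be HTML markup already.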
You can use your favorite editor to search for the escape sequences above as regular expressions, and replace them.
Alternatively, use GNU sed to automate the search and replace in a file:
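gsed -i 's/\x94/"/g' file
This replaces all occurrences of the character with code 0x94 with a plain double quote. gsed is GNU sed on macOS. -i updates the file in place. If nothing gets replaced in a UTF-8 locale, try prefixing the command with LC_ALL=C so that sed treats the file as raw bytes.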
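The sed command handles one code at a time. To apply the whole list in one pass, something like the following Python sketch could work. The names `REPLACEMENTS`, `to_ascii` and `clean_file` are my own, and the script assumes the file still holds raw Windows-1252 bytes. Do not run it on a file that is already UTF-8, because these byte values also occur inside multi-byte UTF-8 sequences and would be mangled.

```python
from pathlib import Path

# Windows-1252 byte value -> ASCII replacement, following the list in this post.
REPLACEMENTS = {
    0x85: b"...",  # ellipsis
    0x91: b"'",    # open curly quote
    0x92: b"'",    # close curly quote
    0x93: b'"',    # open curly double quote
    0x94: b'"',    # close curly double quote
    0x96: b"-",    # ndash
    0x97: b"-",    # mdash
    0xA0: b" ",    # NBSP
    0xA9: b"(C)",  # copyright
    0xAD: b"",     # soft hyphen, removed
    0xAE: b"(R)",  # registered
    0xB7: b"*",    # middle dot
}


def to_ascii(data: bytes) -> bytes:
    """Replace the listed byte values; every other byte is copied as-is."""
    return b"".join(REPLACEMENTS.get(byte, bytes([byte])) for byte in data)


def clean_file(path: str) -> None:
    """Rewrite the file in place, similar in spirit to sed -i."""
    p = Path(path)
    p.write_bytes(to_ascii(p.read_bytes()))


print(to_ascii(b"\x93Hello\x94 \x96 it\x92s \xa9 2010\x85").decode("ascii"))
# prints: "Hello" - it's (C) 2010...
```

Bytes above 0x7F that are not in the table (accented letters, for instance) are left untouched, so the result is only pure ASCII if the input contained nothing else.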