Windows-1252 charset codes
The following are some Windows-1252 (ISO 8859-1) charset codes I’ve seen in legacy HTML documents created in Word. The replacement suggestions are for conversion to UTF-8 characters available on the keyboard. Alternatively, you may want to use HTML character entities.
\x85 (ellipsis …) replace with three dots …
\x91 (open curly quote ‘) replace with normal quote '
\x92 (close curly quote ’) replace with normal quote '
\x93 (open curly double quote “) replace with normal quote "
\x94 (close curly double quote ”) replace with normal quote "
\x96 (ndash –) replace with normal -
\x97 (mdash —) replace with normal -
\xa0 (NBSP) replace with normal space
\xa9 (copyright symbol ©) replace with (C)
\xad (soft hyphen) remove it
\xae (registered ®) replace with (R)
\xb7 (dot ·) replace with an asterix *
You can use your favorite editor to search using regular expression sequence above, and replace.
Alternatively, use GNU sed to automate file search and replace
gsed -i "s/\x94/'/g" file
This replaces all occurences of character with code 0x94 with a quote.
gsed is GNU sed on macOS.
-i does in place update to the file.