Thu, 29 Mar 2007

Maybe This Will Help Someone Else

If someone sends you an “HTML” mail from Outlook, even Tidy will run away screaming unless you strip out some of the gunk manually before trying to fix it.

If the mail is Quoted-Printable, you have to decode it first [maybe with this (web service) or this (sed script)], and you probably have even more work to do if the original document used a non-Western encoding. Not tested.
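For the decoding step, here is a rough sketch in plain awk. It is not the sed script mentioned above; the function name qp_decode is made up for illustration, it assumes Unix line endings, and it only turns =XX escapes back into raw bytes, so it does nothing about the character-encoding problem.

```shell
#!/bin/sh
# Hypothetical helper: decode Quoted-Printable from stdin to stdout.
# Assumptions: Unix line endings, no =00 (NUL) escapes in the input.
qp_decode() {
  LC_ALL=C awk '
    # Value of a single hex digit (caller has already validated it).
    function hexval(c) { return index("0123456789ABCDEF", toupper(c)) - 1 }
    {
      line = $0
      soft = 0
      # A trailing "=" is a soft line break: drop it and join with the next line.
      if (line ~ /=$/) { soft = 1; line = substr(line, 1, length(line) - 1) }
      out = ""
      while ((i = index(line, "=")) > 0) {
        hex = substr(line, i + 1, 2)
        if (hex ~ /^[0-9A-Fa-f][0-9A-Fa-f]$/) {
          # Valid escape: emit the byte it stands for.
          byte = hexval(substr(hex, 1, 1)) * 16 + hexval(substr(hex, 2, 1))
          out = out substr(line, 1, i - 1) sprintf("%c", byte)
          line = substr(line, i + 3)
        } else {
          # Stray "=": pass it through untouched.
          out = out substr(line, 1, i)
          line = substr(line, i + 1)
        }
      }
      printf "%s%s", out line, (soft ? "" : "\n")
    }'
}

# Small demonstration on a made-up sample.
printf 'Total =3D 100=\n percent\n' | qp_decode
# prints: Total = 100 percent
```

Pipe the saved message body through qp_decode before handing it to sed and Tidy.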

sed -e 's/<o:p>/<p>/g' | sed -e 's/<\/o:p>/<\/p>/g' | /usr/local/bin/tidy -c

broken into two sed invocations for readability’s (hah!) sake…
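If you do this often, the pipeline can be wrapped in a couple of shell functions. This is only a sketch: the names fix_office_tags and clean_office_html are invented here, the file argument is whatever you saved the message as, and the wrapper assumes tidy is somewhere on your PATH rather than at a fixed location.

```shell
#!/bin/sh
# Hypothetical helper: rewrite Outlook's <o:p> tags as plain <p> tags.
# No backslashes before < and >, since GNU sed treats \< and \> as
# word-boundary anchors rather than literal angle brackets.
fix_office_tags() {
  sed -e 's/<o:p>/<p>/g' -e 's/<\/o:p>/<\/p>/g'
}

# Hypothetical wrapper: clean a saved message ($1), then run Tidy if installed.
clean_office_html() {
  if command -v tidy >/dev/null 2>&1; then
    fix_office_tags < "$1" | tidy -c
  else
    # No tidy available: just emit the sed-cleaned markup.
    fix_office_tags < "$1"
  fi
}

# Small demonstration of the sed step on a made-up line of Outlook markup.
printf '<p class=MsoNormal>Hi<o:p></o:p></p>\n' | fix_office_tags
# prints: <p class=MsoNormal>Hi<p></p></p>
```

The sed step leaves nested paragraphs behind, which is exactly the sort of mess Tidy is then expected to sort out.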

Of course, it’s all very brute-force, but usually good enough for government work.

:: 12:13



