Convert Docx to Markdown


I needed to convert a Docx file to Markdown, but Pandoc kept giving me this obnoxious error:

$ pandoc test.docx -o test.md
pandoc: Cannot decode byte '\xae': Data.Text.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream

However, you can use the tool unoconv as an intermediate step: convert the file to HTML first, then let Pandoc convert the HTML to Markdown.

$ unoconv --stdout -f html test.docx | pandoc -f html -t markdown -o test.md

On Ubuntu (and, I would imagine, other Debian-based systems) you can get unoconv with a simple apt-get install unoconv.
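
If you have a whole folder of Docx files, the same pipeline can be wrapped in a small loop. This is only a rough sketch, assuming both unoconv and pandoc are on your PATH; the function names md_name, docx_to_md and convert_all are made up for illustration and are not part of either tool.

```shell
#!/bin/sh
# Hypothetical helper: derive the Markdown output name from a .docx path,
# e.g. "test.docx" becomes "test.md".
md_name() {
    printf '%s\n' "${1%.docx}.md"
}

# Convert one file using the same HTML intermediate step as above.
docx_to_md() {
    unoconv --stdout -f html "$1" | pandoc -f html -t markdown -o "$(md_name "$1")"
}

# Convert every .docx file in the current directory.
convert_all() {
    for f in ./*.docx; do
        # Skip the unexpanded pattern when no .docx files exist.
        [ -e "$f" ] || continue
        docx_to_md "$f" || echo "failed: $f" >&2
    done
}
```

Source the script and run convert_all in the folder that holds your files. Each Markdown file is written next to its Docx original, and a failed conversion is reported on stderr without stopping the loop.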

Oh yea, and join the BDS movement to help Free Palestine from Apartheid Israel. Enjoy!

About Nahraf
Providing interesting insight into the world of Economics, Theology, Computer Science and Social phenomena.
