Convert Docx to Markdown


I needed to convert a Docx file to Markdown, but Pandoc kept giving me this obnoxious error:

$ pandoc test.docx -o test.md
pandoc: Cannot decode byte '\xae': Data.Text.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream

However, you can use unoconv as an intermediate step: convert the file to HTML first, then pipe that into Pandoc to get Markdown.

$ unoconv --stdout -f html test.docx | pandoc -f html -t markdown -o test.md

On Ubuntu (and, I would imagine, other Debian-based systems) you can get unoconv with a simple apt-get install unoconv.
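If you have more than one file to convert, the pipeline above can be wrapped in a small shell function. This is only a rough sketch: the names md_name and docx2md are made up for illustration, and it assumes every input filename ends in .docx. It uses the same unoconv and pandoc options as the command above, nothing more.

```shell
#!/bin/sh
# Hypothetical helper: derive the Markdown filename from a .docx filename
# (assumes the name ends in ".docx").
md_name() {
    printf '%s\n' "${1%.docx}.md"
}

# Hypothetical wrapper around the two-step unoconv | pandoc pipeline.
docx2md() {
    # Bail out early if either tool is not installed.
    for tool in unoconv pandoc; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "missing required tool: $tool" >&2
            return 1
        }
    done

    # Convert each file to HTML on stdout, then let pandoc write the Markdown.
    # Note: in plain sh only pandoc's exit status is checked here.
    for f in "$@"; do
        unoconv --stdout -f html "$f" \
            | pandoc -f html -t markdown -o "$(md_name "$f")" || return 1
    done
}
```

After sourcing this, running docx2md test.docx should write test.md next to the original, and docx2md *.docx should handle a whole directory.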

Oh yea, and join the BDS movement to help Free Palestine from Apartheid Israel. Enjoy!