A Ruby gem to liberate content from Microsoft Word documents
A Ruby gem to liberate content from the jail that is Word documents
Our default content publishing workflow is terribly broken. We've all been trained to make paper, yet today, content authored once is more commonly consumed in multiple formats, and rarely, if ever, does it embody physical form. Put another way, our go-to content authoring workflow has remained relatively unchanged since it was conceived in the early 80s. I'm asked regularly by government employees (knowledge workers who fire up a desktop word processor as the first step to any project) for an automated pipeline to convert Microsoft Word documents to Markdown, the lingua franca of the internet, but as my recent foray into building just such a converter proves, it's not that simple. Markdown isn't just an alternative format. Markdown forces you to write for the web.
gem install word-to-markdown
file = WordToMarkdown.new("/path/to/document.docx")
=> <WordToMarkdown path="/path/to/document.docx">

file.to_s
=> "# Test\n\n This is a test"

file.document.tree
=> <Nokogiri Document>
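As a rough sketch of how the calls above might be used in a script, the helper below writes the converted Markdown next to the source document. The names `markdown_path_for` and `convert_to_file`, and the injectable `converter:` parameter, are hypothetical and not part of the gem; only `WordToMarkdown.new(path).to_s` comes from the usage shown above.

```ruby
# Hypothetical helper: derive the output path by swapping the extension for .md.
def markdown_path_for(docx_path)
  File.join(File.dirname(docx_path), "#{File.basename(docx_path, ".*")}.md")
end

# Hypothetical helper: convert one document and write the Markdown beside it.
# `converter` is any callable taking a path and returning a Markdown string.
# When omitted, it falls back to the gem call shown in the usage above,
# which needs the gem and LibreOffice to be installed.
def convert_to_file(docx_path, converter: nil)
  converter ||= lambda do |path|
    require "word-to-markdown"
    WordToMarkdown.new(path).to_s
  end

  out_path = markdown_path_for(docx_path)
  File.write(out_path, converter.call(docx_path))
  out_path
end
```

Calling `convert_to_file("/path/to/document.docx")` would then write `/path/to/document.md`. Passing the converter in as a callable is only a convenience for trying the helper without LibreOffice installed.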
$ w2m path/to/document.docx
Outputs the resulting Markdown to stdout.
Word-to-markdown requires soffice, a command-line interface to LibreOffice that works on Linux, Mac, and Windows. To install soffice, see the LibreOffice documentation.
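Before running a conversion, it can help to confirm that soffice is reachable. The following is a minimal sketch, assuming soffice is expected on the `PATH`; the `which` helper is hypothetical and not part of the gem, and the gem itself may locate LibreOffice in other places (for example, inside an application bundle), so a miss here does not necessarily mean conversion will fail.

```ruby
# Hypothetical helper: return the full path of the first executable named
# `cmd` found in the given PATH-style string, or nil if none is found.
def which(cmd, path_env = ENV.fetch("PATH", ""))
  # On Windows, PATHEXT lists executable extensions (.EXE, .BAT, ...).
  exts = ENV["PATHEXT"] ? ENV["PATHEXT"].split(";") : [""]
  path_env.split(File::PATH_SEPARATOR).each do |dir|
    exts.each do |ext|
      candidate = File.join(dir, "#{cmd}#{ext}")
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end
  nil
end

soffice = which("soffice")
if soffice
  puts "Found soffice at #{soffice}"
else
  warn "soffice not found on PATH; see the LibreOffice documentation to install it"
end
```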
script/cibuild
First, create the Gemfile.lock by installing the dependencies:
bundle install
Everything you need to run the executable locally:
docker-compose build
docker-compose run --rm app bundle exec w2m --help
docker-compose run --rm app bundle exec w2m test/fixtures/em.docx
Word-to-markdown-server contains a lightweight server for converting Word documents as a service. A live version runs at word2md.com.