Extracting Text From PDF Files

Hey

I like using PDFs: they work well and (usually) look the same on every computer. One of their advantages is that the text is embedded within the file, so you can extract it and manipulate it. Although many PDF readers let you copy and paste text, what’s quicker than an Automator workflow that extracts a PDF’s text straight into a text file?

The process takes only a few steps, so read carefully.

1) Open Automator, found in the Applications folder.

2) In Automator, drag the “Get Selected Finder Items” action into the workflow, followed by “Extract PDF Text”. I recommend changing the output option to Rich Text.

3) Save the workflow, either as an application or as a workflow file.

4) Test it. Drag a PDF file onto the saved Automator file and let it do its work; after a short time you will have a text file containing the extracted text.
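If you would rather script the job than build a workflow, the steps above can be approximated in Python. This is a minimal sketch, not a description of how Automator’s “Extract PDF Text” action works internally. It assumes Python 3.9 or later and the third-party pypdf package (installed with pip install pypdf); the function names text_path_for, extract_pdf_text and main are hypothetical.

```python
import sys
from pathlib import Path


def text_path_for(pdf_path: Path) -> Path:
    """Return the .txt path written beside the PDF (report.pdf -> report.txt)."""
    return pdf_path.with_suffix(".txt")


def extract_pdf_text(pdf_path: Path) -> Path:
    """Extract the text layer of one PDF into a plain-text file beside it."""
    # Assumption: the third-party pypdf package is installed. It is imported
    # here so the rest of the script still loads without it.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # Scanned, image-only pages have no text layer and yield empty strings.
    pages = [page.extract_text() or "" for page in reader.pages]
    out_path = text_path_for(pdf_path)
    # write_text replaces an existing file of the same name, so rerunning
    # the script does not leave duplicate text files behind.
    out_path.write_text("\n\n".join(pages), encoding="utf-8")
    return out_path


def main(args: list[str]) -> int:
    """Convert every .pdf path in args; return how many were converted."""
    converted = 0
    for arg in args:
        pdf_path = Path(arg)
        if pdf_path.suffix.lower() != ".pdf":
            print(f"skipping {pdf_path}: not a PDF", file=sys.stderr)
            continue
        print(extract_pdf_text(pdf_path))
        converted += 1
    return converted


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a name such as extract_pdf_text.py, it can be run as python3 extract_pdf_text.py *.pdf, and each report.pdf gets a report.txt beside it. Like the Automator action, it can only recover text that is actually embedded in the PDF, so a scanned document made of page images will produce an empty file.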

The output file will contain every piece of text in the original PDF. As a result, it may include stray bits of text, but it won’t take long to remove the parts you don’t need.
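Some of that tidying can be automated. The sketch below assumes the stray bits are lone page numbers (lines such as “12” or “Page 12”) and runs of blank lines. That is a hypothetical heuristic, not something the Automator action provides, and the function name tidy_extracted_text is illustrative. It works on plain-text output, not on the Rich Text (.rtf) option.

```python
import re

# Hypothetical heuristic: a line holding nothing but a page number,
# such as "12" or "Page 12", is treated as layout debris.
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d+\s*$", re.IGNORECASE)


def tidy_extracted_text(text: str) -> str:
    """Drop bare page-number lines and squeeze runs of blank lines to one."""
    kept: list[str] = []
    for line in text.splitlines():
        if PAGE_NUMBER.match(line):
            continue  # a lone page number or "Page N" line
        is_blank = not line.strip()
        if is_blank and kept and kept[-1] == "":
            continue  # the previous kept line is already blank
        kept.append(line.rstrip())  # whitespace-only lines become ""
    cleaned = "\n".join(kept).strip("\n")
    return cleaned + "\n" if cleaned else ""
```

Because the pattern matches any line made up only of digits, it would also drop a legitimate line such as a year standing on its own, so adjust PAGE_NUMBER to suit your documents before running it over a batch of files.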


If you want to keep up with the latest posts from Mac Tricks And Tips, I recommend subscribing to the RSS feed.

10 Responses to “Extracting Text From PDF Files”

  1.

    For some reason, I don’t have the “Extract PDF Text” option. My OS is Tiger. Does that have something to do with it?

    Comment By Sam on December 11th, at 11:49 pm

  2.

    It might. I haven’t got Tiger to test it on.

    Comment By admin on December 12th, at 3:20 pm

  3.

    I also have Tiger and also don’t have “Extract PDF Text”.

    Comment By nobody on December 13th, at 2:04 am

  4.

    Is there a way to convert a PDF to Pages or MS Word using Automator?

    Comment By venkatesh Prasad J B on January 26th, at 7:55 pm

  5.

    I’m not sure off the top of my head; try a Google search for it.

    Comment By admin on January 26th, at 9:18 pm

  6.

    Hey, indeed.

    Thanks for that terrifically useful tip. It works like a charm. I just converted over a thousand PDF files to text files in about 5 minutes. Outstanding!

    My default set-up in Automator (with Leopard OSX 10.5.5, using just the two steps you suggest) created duplicate text files for each PDF. I then checked the “Replace Existing Files” box in the “Extract PDF Text” step. That was counter-intuitive but did the trick. After that, I only got one text file per PDF and I was away.

    Hurrah.

    Comment By mollivan_jon on February 26th, at 6:44 pm

  7.

    Any idea how to save the text without all the returns?

    When the text is in columns in the PDF, all the line endings are saved as returns instead of flowing correctly…

    Comment By Bob Ericsson on February 26th, at 3:12 pm

  8.

    Thank you so much!

    Comment By Liz on March 11th, at 7:09 am

  9.

    It would be great if they had said which f***ing version of OS X they used. I don’t have “Extract PDF Text” in my OS X 10.4...

    Comment By Smart Ass on March 25th, at 5:05 pm

  10.

    It did not work for me; I get a blank file.

    I am running OS X v. 10.5.8

    Comment By Jill on August 8th, at 9:41 pm
