<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Stefan Sobek Blog &#187; pdf</title>
	<atom:link href="http://www.sobek.info/blog/tag/pdf/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.sobek.info/blog</link>
	<description>Writing about IT, Software Engineering, sports and other stuff</description>
	<lastBuildDate>Mon, 24 Jan 2011 12:50:08 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>OCR under Linux</title>
		<link>http://www.sobek.info/blog/2010/02/08/ocr-under-linux/</link>
		<comments>http://www.sobek.info/blog/2010/02/08/ocr-under-linux/#comments</comments>
		<pubDate>Mon, 08 Feb 2010 09:04:40 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Bash]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Scripting]]></category>
		<category><![CDATA[ocr]]></category>
		<category><![CDATA[pdf]]></category>
		<category><![CDATA[tif]]></category>

		<guid isPermaLink="false">http://www.sobek.info/blog/?p=60</guid>
		<description><![CDATA[Imagine you have a PDF file that you want to run OCR on, and you want your Linux PC to do the job automatically, e.g. for every file in a folder. 
I chose tesseract-ocr as the OCR program. It is easy to install and use, and good enough for my needs. Unfortunately it takes only TIFF as input, so that [...]]]></description>
			<content:encoded><![CDATA[<p>Imagine you have a PDF file that you want to run OCR on, and you want your Linux PC to do the job automatically, e.g. for every file in a folder.</p>
<p>I chose tesseract-ocr as the OCR program. It is easy to install and use, and good enough for my needs. Unfortunately it takes only TIFF as input (at the time of writing), so we have to convert the PDF to TIFF first.</p>
<p><strong>To create a TIFF file from the PDF with Ghostscript:</strong></p>

<div class="wp_codebox"><table width="100%" ><tr id="p603"><td class="code" id="p60code3"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">gs</span> <span style="color: #660033;">-q</span> <span style="color: #660033;">-r300</span> <span style="color: #660033;">-dBATCH</span> <span style="color: #660033;">-dNOPAUSE</span> <span style="color: #660033;">-sDEVICE</span>=tiff24nc <span style="color: #660033;">-sOutputFile</span>=Dokument2.tif Dokument1.pdf</pre></td></tr></table></div>

<p><strong>Now run OCR on the TIFF file with tesseract-ocr (you may have to install it first):</strong></p>

<div class="wp_codebox"><table width="100%" ><tr id="p604"><td class="code" id="p60code4"><pre class="bash" style="font-family:monospace;">tesseract Dokument2.tif doc <span style="color: #660033;">-l</span> deu</pre></td></tr></table></div>

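<p>-l deu selects German (&#8220;Deutsch&#8221;) as the recognition language. tesseract appends .txt to the output name it is given, so the recognized text ends up in a .txt file.<br />
-r300 in the Ghostscript call sets the resolution to 300 DPI.</p>
<p>From here it is easy to write a script that automatically checks a folder for new files and runs OCR on them.</p>
<p>For example, you can put both commands into a shell script and start that script from a cron job. Maybe I will write an example here later.</p>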
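<p>As a rough sketch of what such a script could look like: the function names ocr_pdf and ocr_folder, the default language and the rule "a PDF with a .txt file next to it is already done" are assumptions made for this illustration, not a tested setup. It only chains the two commands shown above.</p>

```shell
#!/bin/sh
# Sketch only: ocr_pdf and ocr_folder are hypothetical helper names.

# Convert one PDF to TIFF with Ghostscript, then run tesseract on the TIFF.
# tesseract appends ".txt" to the output base name it is given.
ocr_pdf() {
    pdf=$1
    lang=$2
    base=${pdf%.pdf}
    gs -q -r300 -dBATCH -dNOPAUSE -sDEVICE=tiff24nc -sOutputFile="$base.tif" "$pdf" || return 1
    tesseract "$base.tif" "$base" -l "$lang" || return 1
    # The intermediate TIFF is no longer needed once the text exists.
    rm -f "$base.tif"
}

# Process every PDF in a folder that has no matching .txt file yet.
# Usage: ocr_folder DIR [LANG]   (LANG defaults to deu)
ocr_folder() {
    dir=$1
    lang=${2:-deu}
    for pdf in "$dir"/*.pdf; do
        # Skip the unexpanded pattern when the folder contains no PDFs.
        [ -e "$pdf" ] || continue
        # A .txt file next to the PDF means it was already processed.
        [ -e "${pdf%.pdf}.txt" ] || ocr_pdf "$pdf" "$lang"
    done
}
```

<p>Saved as a script that ends with a call such as ocr_folder "$1", this could be started from a cron job with the folder to watch as its argument. Files that are still being copied into the folder when the job runs are not handled by this sketch.</p>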
]]></content:encoded>
			<wfw:commentRss>http://www.sobek.info/blog/2010/02/08/ocr-under-linux/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Create a PDF from man pages</title>
		<link>http://www.sobek.info/blog/2010/01/16/create-a-pdf-from-manpages/</link>
		<comments>http://www.sobek.info/blog/2010/01/16/create-a-pdf-from-manpages/#comments</comments>
		<pubDate>Sat, 16 Jan 2010 16:01:30 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Bash]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[man]]></category>
		<category><![CDATA[pdf]]></category>
		<category><![CDATA[Shell]]></category>

		<guid isPermaLink="false">http://www.sobek.info/blog/?p=222</guid>
		<description><![CDATA[If you want to print out or save information from man pages, simply type:

man -t man &#124; ps2pdf - &#62; man.pdf

]]></description>
			<content:encoded><![CDATA[<p>If you want to print out or save information from man pages, simply type:</p>

<div class="wp_codebox"><table width="100%" ><tr id="p2226"><td class="code" id="p222code6"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">man</span> <span style="color: #660033;">-t</span> <span style="color: #c20cb9; font-weight: bold;">man</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">ps2pdf</span> - <span style="color: #000000; font-weight: bold;">&gt;</span> man.pdf</pre></td></tr></table></div>
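<p>The same pipeline can be wrapped in a small function so it works for any man page. This is only a sketch: the name man2pdf and its optional second argument are made up for this example. man -t produces PostScript, and ps2pdf reads it from standard input when the input name is a dash.</p>

```shell
# Sketch only: man2pdf is a hypothetical helper name.
# Usage: man2pdf PAGE [OUTPUT.pdf]
man2pdf() {
    page=$1
    # Default output name: the page name with a .pdf suffix.
    out=${2:-$page.pdf}
    # man -t formats the page as PostScript; ps2pdf converts stdin to the named PDF.
    man -t "$page" | ps2pdf - "$out"
}
```

<p>For example, man2pdf ls would write ls.pdf into the current directory, provided man and ps2pdf (part of Ghostscript) are installed.</p>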

]]></content:encoded>
			<wfw:commentRss>http://www.sobek.info/blog/2010/01/16/create-a-pdf-from-manpages/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

