Question for y’all: Book scanning

realsocialskills said:

I’m trying to figure out a good way to scan books.

Having electronic copies of books can be really helpful for people with executive dysfunction. An electronic copy can be the difference between being able to read something and not being able to read it.

Problem is, most books are only available in print, and scanning print books is fairly difficult. (Especially if they are paperbacks).

I don’t know much about good ways to scan books. I’m hoping some of y’all do.

Standard flatbed scanners are not very good for scanning books — it’s hard to capture the whole page, and easy to damage the book. 

I’ve heard that if you’re willing to destroy the book, you can scan quickly by unbinding it and using an automated document scanner. I haven’t tried it, and I’m worried about losing pages. (Also, many/most of the books I need to scan are out-of-print library books that I can’t destroy, and they’re not easy to obtain.)

There are scanners specifically designed for scanning books, but they are very expensive. (Thousands of dollars).

Some people have created DIY instructions for building a book scanner, but that looks hard to build and still fairly expensive, and it’s hard for me to tell how viable that strategy is.

There are also very good iPhone apps for scanning (I use Scanner Pro). Unfortunately, it’s hard to hold the book and the phone in the right position, and it’s not very practical for scanning more than a few pages.

Do any of y’all have experience with scanning whole books (or whole libraries)? What works? What doesn’t work? 

signed-me-again said:

To be honest, scanning them can only go so far. My inclination would be to try to get them typed. Low-tech enough. High-labor – but I’d work on that for free, and I doubt I’m the only one. Getting pictures high enough definition for an ebook might actually be harder than finding people willing to type, and the ebook wouldn’t be screen-readable anyway.

For the typing process, you’ll need a stand. In fact, you’ll need a stand for scanning, too. The stand would look something like this.

When I was scanning at my local synagogue, we had an even cheaper stand–we borrowed a pulpit from the sanctuary and I used a paperweight sandbag to hold the book open. The scanning itself was done with an iPad using a normal photo app. But I was only doing cover/title/copyright pages, and it was exhausting work. You really want a good stand for trying to scan or type an entire book.

I wish you luck in getting the books digitized. Out-of-print paperbacks are a sad, sad thing.

realsocialskills said:

Thank you for the suggestions; I think for my purposes a different approach would probably be better.

I have some scans of paperback books that I find very useable, so I know that it’s possible to accomplish. I just don’t know how to do it.

When I need OCR from scans, I’ve found that Abbyy FineReader OCR software works well for that purpose.

I want to preserve formatting; I don’t just want the text. I can see the formatting, and it’s often important.

Especially for things like out of print Hebrew-English prayer books. (And I think I would have a lot of trouble finding someone who is both capable of typing Hebrew with vowels and willing to do so for free).

I definitely think a stand is needed; I’m not sure one that holds the book all the way open will work for paperbacks though. Have you found that it does?

periegesisvoid said:

Most normal copier-style scanners + Adobe can do OCR

realsocialskills said:

OCR isn’t the issue; I have really good OCR software. I use Abbyy FineReader for text recognition. (It supports more languages than Adobe and has a few other advantages.)

The issue is getting good images in the first place. Flatbed scanners aren’t very good for scanning books. They work ok for hardbacks if the text isn’t too close to the margins. But for paperbacks or books with text close to the margins, it’s really hard to get images that contain all the text. It’s often completely impossible to capture all of the text without damaging the book.

mxbees said:

So. This sort of thing is kind of my job (i.e., digitization and/or digital preservation). The two machines I’ve used for scanning rare books have been the treventus and indus ones. You’ll note that the treventus is similar to the DIY solution you linked to.

I do want to say that both were used to digitize rare books (these are old books in delicate condition with few copies in existence). The treventus is more expensive and much more gentle. But the indus scanner was good and gentle too.

I think trying to follow the design principles behind the indus might be a good way for you to attempt this.

You could look into an overhead scanner like this Fujitsu one, which is still quite expensive at $800 USD. But there are cheaper models. I can’t vouch for any of these scanners, but I think it’s the most promising direction for you to look.

Additionally, one part of the indus scanner solved the problem you mention about gutters, and you could add it to this setup: get a clear pane of glass that is reasonably heavy. This way, you use the glass to squash the book flat without doing a great deal of damage (you might hurt the spine somewhat, but it’ll depend on how heavy the glass is).

One thing to look out for, since you want to ensure you have a high-quality image, is making sure you get a scanner capable of scanning at 600 dpi (which the cheap one I linked to does not; it says it’s 5 megapixels, but this isn’t what you want).
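To make the megapixel vs. dpi point concrete, here is a rough back-of-the-envelope sketch. The 6 x 9 inch page size and the 4:3 sensor shape are assumptions for illustration (they are not the specs of any particular scanner), and the function names are made up for this example. It assumes the page exactly fills the frame, which is the best case.

```python
def effective_dpi(megapixels, page_w_in, page_h_in, aspect=4 / 3):
    """Estimate pixels per inch when one photo exactly covers one page.

    Assumes a sensor with the given long:short aspect ratio (4:3 here is
    an assumption) and that the page's long side lines up with the
    sensor's long side.
    """
    total_pixels = megapixels * 1_000_000
    short_px = (total_pixels / aspect) ** 0.5
    long_px = short_px * aspect
    page_long = max(page_w_in, page_h_in)
    page_short = min(page_w_in, page_h_in)
    # The tighter of the two directions limits the usable resolution.
    return min(long_px / page_long, short_px / page_short)


def megapixels_needed(dpi, page_w_in, page_h_in):
    """Megapixels required to cover the page at the target dpi."""
    return (dpi * page_w_in) * (dpi * page_h_in) / 1_000_000


if __name__ == "__main__":
    # Hypothetical 6 x 9 inch paperback page.
    print(round(effective_dpi(5, 6, 9)))   # roughly 287 dpi from 5 megapixels
    print(megapixels_needed(600, 6, 9))    # 19.44 megapixels for 600 dpi
```

Under those assumptions, a 5 megapixel camera gives a bit under 300 dpi on a single paperback page, and less in practice, because the frame usually includes margins or a full two-page spread. Hitting 600 dpi on the same page would take roughly 19 megapixels.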

Another alternative to getting an overhead scanner is to get a high-quality digital camera and a tripod. This way you can mount the camera on the tripod and point it down at your book. This has the benefit of being somewhat portable (depending), as long as you have something to ensure the book is flat (like the pane of glass). This solution also means that you can use the camera for more than one thing (especially if you can’t find a scanner that is an actual scanner… a lot of the overhead ‘scanners’ I’m seeing in my Amazon search are actually just digital cameras mounted on a tripod. And not particularly good ones).

Um… This is long enough for now. Feel free to ask more questions (I love talking about this stuff, btw).

realsocialskills said:

Thank you!