As far as programming projects go, this one is rather special: apart from computer programming, it also involves electronic hardware design and implementation, some mechanical design and engineering, and - as it turned out - a lot of additional learning...
The idea behind the TextEye is to create a small, portable device, built from open source hardware and software, which can act as a reading aid for people with visual impairments of any kind. It should be relatively cheap to build, and it should also function without having to rely on mobile internet access.
In the end, the user should be able to point the TextEye in the direction of a sign, a menu card or something similar, press a button and - after a few seconds of processing - hear the text spoken by the device.
From the technical side, we first need a hardware platform that can provide the following functionality:

- a camera input for taking still pictures
- enough processing power to run the text recognition locally, without mobile internet access
- audio output for the spoken text
- a button (or similar control) to trigger the workflow
- portable, battery-powered operation
The selected hardware should have a custom casing for secure transportation and easy usage. This has to be designed and built as well.
On the software side, we need software that covers the following functions, either by combining different programs as part of a workflow, or by integrating everything into one monolithic application:

- taking a still picture with the camera
- optional image pre-processing to improve the picture quality for text recognition
- optical character recognition (OCR) to extract the text from the picture
- text-to-speech conversion for the audio output
- a control program that ties these steps together into an automatic workflow
As you can see, this is not a trivial thing to do.
The final hardware and software combination which makes up the TextEye should mainly realize the following workflow:

1. The user points the device at the text and presses the button.
2. The device takes a still picture with the camera.
3. The OCR software extracts the text from the picture.
4. The extracted text is converted to speech and played back over the audio output.
This is the basic operation of the TextEye. Apart from that, additional features like volume control for the speech output, integrated lighting for better pictures and similar things can be added, either as additional workflow loops (e.g. for volume control) or as part of the main workflow (like turning on an additional light before taking the picture and turning it off afterwards).
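The basic workflow can be sketched in a few lines of Python. Note that fswebcam is the only tool the project has named so far; Tesseract for OCR and eSpeak for speech output are assumptions in this sketch, picked as common open source options on the Raspberry Pi:

```python
# Sketch of the basic TextEye workflow: capture -> OCR -> speech.
# fswebcam is the capture tool used in the project; Tesseract and
# eSpeak are illustrative stand-ins for the OCR and TTS steps.
import shutil
import subprocess

def build_pipeline(image="capture.jpg", text_base="capture"):
    """Return the three commands of the basic workflow as argument lists."""
    return [
        ["fswebcam", "--no-banner", "-r", "1280x720", image],  # take a still picture
        ["tesseract", image, text_base],                       # OCR -> capture.txt
        ["espeak", "-f", text_base + ".txt"],                  # speak the recognized text
    ]

def run_pipeline():
    """Run the workflow once, skipping out early if a tool is missing."""
    for cmd in build_pipeline():
        if shutil.which(cmd[0]) is None:
            print("missing tool:", cmd[0])
            return
        subprocess.run(cmd, check=True)

# On the actual device, a button press would trigger run_pipeline().
```

On the real device this loop would be wrapped in button handling and error reporting, but the three-step core stays the same.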
If you wonder how I came up with the idea for the TextEye in the first place, here is the story:
Back at the end of February 2016 I watched a new episode of Adafruit's "From the Desk of LadyAda", which I had been following for some months (by subscribing to the Adafruit YouTube channel). This time Limor ("LadyAda") presented the Raspberry Pi Zero contest, which was created as a cooperation between Adafruit and Hackaday (specifically Hackaday.io). You can watch the video below:
In the following days, several follow-up videos were broadcast, showing how to implement some simple features and projects using the Raspberry Pi Zero. At that time, the Raspberry Pi Zero (version 1.0) had only come out roughly two months before. Since the earlier Raspberry Pi models A, A+, B, B+ and 2 had already become popular within the "maker" community all around the world, the smaller and cheaper Zero board was already in high demand. So high, in fact, that the Raspberry Pi Foundation could not keep up with production, and board vendors were constantly out of stock.
The Raspberry Pi Zero contest presented a decent chance of getting one of the 10 Raspberry Pi Zero boards by simply coming up with a project idea using this board and documenting it properly.
While I thought about possible project options, pondering other Raspberry Pi projects I had seen earlier, one idea suddenly popped up in my mind: what about a mobile device with a small, single character Braille display that would allow blind people to read text from ebooks, the news or other sources on the go?
I did a quick web search, followed by some thinking. In the end, I came to the conclusion that designing and building a usable Braille display - which needs mechanical as well as electronic components - is pretty tricky, and I did not feel I could do it within a reasonable timeframe.
Then I remembered two other things:
Since I also knew that you could connect a USB webcam to a Raspberry Pi and take still pictures from it using the open source "fswebcam" software, and that it was possible to add some additional hardware for audio output to the Raspberry Pi Zero, suddenly everything seemed to fall into place - and the idea for the TextEye was born:
It was clear that this was not going to be easy to implement, even with the main hardware and software components already being available, but it seemed doable given enough time.
And for the Raspberry Pi Zero contest, I only needed to document the idea properly. It was not necessary to produce a working prototype or final product for this.
While documenting the idea, doing the related research and getting hold of the necessary hardware and software did not take too long, working on the actual implementation took a lot of time. Due to another big project at my daytime job that used up a lot of my energy and time, I did not often get around to working on the TextEye.
I did a good deal of initial testing of the software components, trying out a manually operated workflow before diving into the development of control software to implement an automatic workflow.
When I realized that the OCR software struggled with test pictures taken by the USB webcam, I did a lot of additional testing to see whether the picture quality could be improved - either by using different parameter settings when taking the pictures, or by adding further image pre-processing steps before the text recognition.
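Two typical pre-processing steps that can help OCR are grayscale conversion and binarization (turning every pixel pure black or white). In practice a library like Pillow or a tool like ImageMagick would do this; the sketch below just illustrates the idea on plain lists of RGB tuples standing in for pixel data:

```python
# Illustrative image pre-processing for OCR: grayscale + threshold.
# Pixels are modeled as plain (r, g, b) tuples for simplicity.

def to_grayscale(pixels):
    """Convert RGB tuples to luminance values (ITU-R BT.601 weights)."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]

def binarize(gray, threshold=128):
    """Map each gray value to pure black (0) or white (255)."""
    return [255 if v >= threshold else 0 for v in gray]

# Example: one dark and one light pixel
pixels = [(10, 10, 10), (240, 240, 240)]
print(binarize(to_grayscale(pixels)))  # -> [0, 255]
```

Whether a fixed threshold like this actually helps depends heavily on the lighting in the captured picture, which is exactly why so much testing with different settings was needed.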
Progress stayed slow, both due to a lack of time as well as seemingly growing OCR problems.
Nevertheless, I went on working on the project, as the idea was simply too good to just drop it.
The work I put in resulted in winning one of the Raspberry Pi Zero boards - which was a nice success.
Shortly after the Raspberry Pi Zero contest ended, Sophi, one of the Hackaday.io community managers, contacted me. She encouraged me to enter the project into the Hackaday Prize 2016 contest.
After some thinking, I accepted and added the project to the list of entries for the Hackaday Prize 2016. I also entered it into the "Assistive Technology" sub-contest later on in the same year.
Ongoing testing and prototyping, along with documentation on the project page, led to additional recognition and successes along the way, e.g. a bit of seed money which I spent on additional components and tools for the project.
While the TextEye project was selected as one of the top 100 projects and thus was in the running for the main prize of the Hackaday Prize 2016, it did not end up on top of the list - there were just too many other great projects, and a lot of them were much further along or even completely finished before the end of the competition.
Nevertheless, it was great to be participating in all of that.
Right now, I have a simple software prototype running, and I need to test the breadboard prototype with it. I have also started to write a more sophisticated version of the software, with a more modular design that should make it easier to have additional people work on different areas of the software in the future.
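One possible shape for such a modular design (the class and method names here are illustrative, not the project's actual code) is to put each workflow stage behind a small interface, so the camera, OCR and speech backends can be swapped or developed independently:

```python
# Illustrative modular structure: each workflow stage behind an interface.
from abc import ABC, abstractmethod

class Camera(ABC):
    @abstractmethod
    def capture(self) -> str:
        """Take a picture and return the path of the image file."""

class Ocr(ABC):
    @abstractmethod
    def recognize(self, image_path: str) -> str:
        """Extract and return the text found in the image."""

class Speaker(ABC):
    @abstractmethod
    def say(self, text: str) -> None:
        """Speak the given text over the audio output."""

class TextEye:
    """Ties the three stages together into the basic workflow."""
    def __init__(self, camera: Camera, ocr: Ocr, speaker: Speaker):
        self.camera, self.ocr, self.speaker = camera, ocr, speaker

    def run_once(self) -> None:
        """One pass of the workflow: capture, recognize, speak."""
        image = self.camera.capture()
        text = self.ocr.recognize(image)
        self.speaker.say(text)
```

With this structure, someone working on better image pre-processing only touches the `Ocr` implementation, while the button handling and audio side stay untouched.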
For the next parts of the article series about this project, I am planning to give you an overview of the software and provide some insight into the development, pointing out some problems I found and the solutions I came up with.
So if you'd like to know more about that, keep coming back to this site and consider adding the RSS feed of this site to your favourite RSS reader - this will keep you informed about new articles.
You can get the RSS feed by simply adding the URL to the ACP site blog to the subscription list of your reader.
Bye for now...