I have been working on a robot that uses my ARTOS real-time operating system for robot and embedded controllers. The original idea was to have a single microcomputer do everything for the robot: control all of the motors and handle sound generation, sensors, speech, and even vision. I really didn’t want to design my own microcomputer (actually I did, but I realized it would take too long and become a project of its own), so I went ahead and looked at several COTS microcontrollers.
I wanted a controller based on an ARM9 or better, with motor control, sound generation, and everything else I would need for my robot already built in. I looked at solutions from Gumstix, Tin Can Tools, Virtual Cogs, and Cogent.
The Gumstix controllers are inexpensive, have Xscale processors and lots of peripherals specific to robotics. They would be a good choice.
The Cogent boards have nice processors, and I love their SOM SODIMM design, but they don’t offer much in the way of peripherals set up for robotics and they are a little pricey in single quantity.
And finally there is Virtual Cogs. They have an ARM7-based robot motor controller called the Robot COG and a separate ARM9-based processor board called the COG. I like having the separate motor control, mostly to isolate those high-current circuits from the main processor.
So I decided to go with the Virtual Cogs solution this time. The decision is based on a combination of price, features, speed, and flexibility. Of course, as I get further along I may change my mind and go with the Gumstix, or I may decide to design my own interface board and use one of Cogent’s controllers. We’ll just have to see how it goes.
The next step was to install my ARTOS operating system and test it on the COG board. I also had to test the functionality of all of the peripherals. I will leave setting up the Robot COG for another entry.
The installation went very well, and of course I found a few things that I needed to change. But once I was done, it was nice having the board display some graphics, play back a WAV file, and still respond to commands typed through a serial terminal.
The next step is to install some robot applications. I want several tasks running at once on this board.
The first is a survival task. I wrote a task that constantly monitors the battery capacity and how far the robot is from its recharging station. No matter what it is doing, when the battery capacity falls below a certain point, the robot will abort all tasks and make its way back to the charger to recharge its battery. A simple survival routine.
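The core decision in that survival task can be sketched in a few lines. This is Python rather than ARTOS C, and the per-meter battery cost and safety margin are invented numbers for illustration, not measurements from the real robot:

```python
def survival_check(battery_pct, distance_to_charger_m,
                   cost_pct_per_m=0.5, safety_margin_pct=10.0):
    """Decide whether the robot must abort and head for the charger.

    The robot needs enough charge to cover the trip home plus a
    safety margin. Both parameters here are made-up illustrative
    values, not tuned constants from the actual robot.
    """
    needed = distance_to_charger_m * cost_pct_per_m + safety_margin_pct
    return battery_pct <= needed

# Plenty of charge and close to home: keep working.
print(survival_check(80.0, 5.0))    # False
# Low battery far from home: abort everything and recharge.
print(survival_check(15.0, 20.0))   # True
```

In the real system this check would run as its own ARTOS task, polling the battery monitor on a timer and preempting the other tasks when it fires.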
In the future I will modify the survival routine to allow exceptions. It would not make any sense to have the robot abort a task such as carrying someone from a burning building just to go recharge its batteries and end up getting consumed in the fire anyway. It kind of sounds like I will be implementing Asimov’s Three Laws of Robotics.
The second task is the motor control program to allow the robot to move, which we will discuss in another entry.
Then comes a series of tasks to handle sensor input and processing.
The first one got me thinking about sensor input in general and how to handle sensory input like sound for voice recognition, video for visual response, and even things like touch, smell, or text processing.
I have been thinking about this off and on for years and I have come up with what I think is a good solution. I like to call it “Pattern Matching”. I know that this phrase has its own connotations but I have my own idea of what it means which we’ll get into shortly.
But let me tell you where my thinking on this came from.
When we developed the Gemini robot in the 1980s we wanted speech recognition to be built in. The idea of typing all of the commands for controlling the robot didn’t seem practical, and the movie robots all interacted with humans by voice, so we needed speech recognition. And we wanted the best kind: speaker-independent, continuous speech recognition.
Of course, the only systems available back then ran on mainframes and took a lot of processing power. So we had to settle for something that would run on a microprocessor with reasonable performance. We chose to implement speaker-dependent phrase recognition.
The way it worked was to sample sound for a fixed period of time, in our case 3 seconds. During that sample we would count zero crossings of the audio waveform over shorter periods called bins, and store these counts as a template.
Then whenever the robot heard a sound it would look through the stored templates for a match. To get it to respond to specific commands each speaker had to train the robot to the sounds and then tell it what they meant.
So to teach the robot to stop on command, we wrote a routine that had the robot ask the user to say the word ‘stop’ 3 times. Each time, the robot would store a template of the sound it heard. Then it would average the templates together to get a more generalized template. Now the robot knew what the word ‘stop’ sounded like and what it meant. We went through the same steps for a couple dozen commands, so training the robot to your voice took a little bit of time. But once done, the robot was pretty darn accurate in recognizing those particular words or phrases.
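A rough sketch of that scheme — count zero crossings per bin, then average three training passes into one template. The original was 6502 assembly working on 3 seconds of real audio; this Python version uses a toy waveform and an arbitrary bin count just to show the mechanics:

```python
def zero_crossing_template(samples, bins):
    """Split the samples into equal bins and count zero crossings
    (sign changes) in each bin. Returns one count per bin."""
    n = len(samples) // bins
    template = []
    for b in range(bins):
        chunk = samples[b * n:(b + 1) * n]
        crossings = sum(
            1 for a, c in zip(chunk, chunk[1:]) if (a < 0) != (c < 0)
        )
        template.append(crossings)
    return template

def average_templates(templates):
    """Average several training passes, bin by bin, into one
    generalized template for the word."""
    return [sum(col) / len(col) for col in zip(*templates)]

# Three 'training passes' of a toy signal standing in for the
# user saying 'stop' three times.
passes = [
    zero_crossing_template([1, -1, 1, -1, 1, 1, 1, 1], bins=2),
    zero_crossing_template([1, -1, 1, -1, -1, 1, 1, 1], bins=2),
    zero_crossing_template([1, -1, -1, -1, 1, 1, 1, 1], bins=2),
]
stop_template = average_templates(passes)
```

Recognition is then just comparing a fresh template against the stored, averaged ones and taking the closest.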
This is what I mean when I say “pattern matching”. You break any data set into smaller sets or patterns, and then compare them to others to see if the patterns match.
And I found that this technique is used in many areas. Fingerprint matching is done the same way. Instead of trying to match every line and whorl in two fingerprints, you just sample at several points on the fingerprint and see if they match. If you sample enough points, then you can have pretty high confidence that they are the same fingerprint.
Facial recognition works the same way. You look at specific features on the face, such as the length of the nose, the spacing of the eyes, lip length, etc., and see if they match. You don’t have to do a pixel-by-pixel comparison to get a pretty good match. And the more points you compare, the higher the probability of matching a face to a stored database.
So based on this, I have broken pattern matching into a set of steps that work whether you are doing speech recognition, visual, touch, or smell processing.
The steps are: parse, pattern, search, and match.
Parsing is similar to text parsing. To parse text, you first use whitespace to break the text into words. To parse sound, you would look for breaks in the audio to separate words and phrases. To isolate objects in a visual field you would look for edges or other changes in the data.
The next step is to find the patterns in the data. For a segment of audio data you might count zero crossings (roughly, frequency data) in every quarter-second bin; for more resolution you might want tenth-second bins. And you may want to look at amplitude, or loudness. For visual data you might look at specific points in the data set and compare brightness, contrast, and color.
Now that you have a pattern, you search through a database of previously stored patterns to see if any match with a high level of confidence. The search could take place on another processor or core.
Finally, you have your answer: a pretty good match, a not-so-good match, or no match at all.
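The four steps above can be sketched end to end in Python. To keep it self-contained this toy version works on text, where parsing is just whitespace splitting; the feature (word lengths), the sample phrases, and the distance threshold are all arbitrary choices for illustration:

```python
def parse(text):
    """Parse: split the raw data on whitespace, the text analog of
    breaks in audio or edges in a visual field."""
    return text.split()

def pattern(words):
    """Pattern: reduce the parsed data to a small feature vector,
    here simply the length of each word."""
    return [len(w) for w in words]

def search(p, database):
    """Search: find the stored pattern closest to p. In a real
    system this step could run on another processor or core."""
    def distance(q):
        if len(q) != len(p):
            return float("inf")
        return sum(abs(a - b) for a, b in zip(p, q))
    label, stored = min(database, key=lambda item: distance(item[1]))
    return label, distance(stored)

def match(label, dist, threshold=2):
    """Match: accept the best candidate only if it is close enough,
    otherwise report no match."""
    return label if dist <= threshold else None

database = [("stop", [4]), ("come here", [4, 4]), ("go charge", [2, 6])]
p = pattern(parse("go charge"))
label, dist = search(p, database)
print(match(label, dist))   # go charge
```

The same skeleton applies to the audio case: swap the whitespace parser for silence detection and the word-length feature for zero-crossing bins, and the search and match steps stay unchanged.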
I wrote another blog entry about pattern matching in “Multiple Cores and AI”. I think that pattern matching is the key to a form of artificial intelligence and possibly of learning by machines.
Down the road I will write more about tying this into a functional robot that can do what Gemini could do so many years ago.
I plan to release the source code for this robot at some point. Of course it will use the ARTOS operating system.
But if you can’t wait, or want to see the source code for speech recognition from the Gemini robot, you can download it from this site. Or you can buy a CD with the complete designs for Gemini, including video of Gemini in action. Be warned: the source code is all 6502 assembly language, though with plenty of comments.