Meet Budd-E, our AI-powered office robot
Advancements in AI techniques, especially in machine learning and perception, have greatly enhanced the capabilities of robots, leading to new developments in areas such as autonomous vehicles, robotic surgery, and service robots. In particular, we view the recent emergence of LLM services as a potential breakthrough for human-machine interaction in these areas. Budd-E (see Figure 1), our proof-of-concept office service robot, represents a humble yet intriguing application of these advances. More concretely, Budd-E is a four-wheel-drive service robot that can be controlled over Wi-Fi from any mobile device or computer. On top of the traditional manual controls, much like a small remote-control car, we have extended Budd-E with a number of AI-powered capabilities, leveraging services ranging from Amazon Polly (AWS) to GPT-4 and GPT-4 Vision (OpenAI), that let users interact with and control it in entirely new ways, as we'll show next.
Execute arbitrary tasks
Users can type complex tasks as unstructured text in the UI, which Budd-E understands using GPT-4 and then executes. This opens up a completely new way to control Budd-E: simply telling it what to do. As long as the given task can be broken down into steps supported by the underlying hardware (e.g. LEDs for light, a buzzer for sound, etc.), Budd-E is happy to oblige! Below we showcase a few examples; after them, we sketch how such a task could be turned into hardware commands:
- Input: Move forward half a meter, look around in all directions, then return to your original position.
move_look.mp4
- Input: Move forward half a meter, turn right, then move back half a meter.
move_turn_back.mp4
- Input: Move around in a circle (half a meter).
mov_circle.mp4
- Input: Make two circles without moving.
full_circle_stationary.mp4
- Input: Make an SOS sound signal in Morse code.
sos_sound.mp4
- Input: Turn your lights on in Christmas fashion.
lights_xmas.mp4
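To make the idea concrete, here is a minimal sketch of how an unstructured task could be turned into executable steps. It is not Budd-E's actual code: the `Robot` driver class, its methods, and the step schema in the prompt are hypothetical placeholders standing in for the real hardware interface; only the GPT-4 call reflects the OpenAI Python SDK.

```python
import json
from openai import OpenAI  # OpenAI Python SDK

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


class Robot:
    """Hypothetical low-level driver; Budd-E's real interface may differ."""

    def move(self, distance_m): ...
    def turn(self, degrees): ...
    def set_leds(self, color): ...
    def beep(self, duration_s): ...


SYSTEM_PROMPT = (
    "You translate natural-language tasks into a JSON list of steps. "
    "Allowed actions: move (distance_m), turn (degrees), "
    "set_leds (color), beep (duration_s). "
    'Respond with JSON only, e.g. {"steps": [{"action": "move", "distance_m": 0.5}]}.'
)


def execute_task(robot: Robot, task: str) -> None:
    """Ask GPT-4 to break a task into steps, then run them on the robot."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": task},
        ],
    )
    plan = json.loads(response.choices[0].message.content)
    for step in plan["steps"]:
        action = step.pop("action")
        getattr(robot, action)(**step)  # dispatch to the matching driver method


execute_task(Robot(), "Move forward half a meter, turn right, then move back half a meter.")
```

The key design choice is to constrain the model to a small, whitelisted set of actions that map one-to-one onto driver methods, so anything outside that set fails fast instead of reaching the hardware.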
Describe scene
Budd-E can also describe what it sees through its camera, both in text and in audio (for accessibility), with a remarkably accurate eye for detail, by leveraging OpenAI's Vision API (see Figure 2):
The output reads as follows:
“A relaxed German Shepherd puppy lies on a tile floor in a cozy room with a green exercise ball, a pair of small decorative mushrooms, and a photo on a cabinet in the background.”
That is indeed a one-and-a-half-year-old German Shepherd puppy, relaxed yet slightly alert to Budd-E's presence.
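For reference, a call to the Vision API for this use case can be as small as the sketch below. The model name, prompt, and the `camera_frame.jpg` path are illustrative assumptions, not Budd-E's actual pipeline.

```python
import base64
from openai import OpenAI  # OpenAI Python SDK

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def describe_frame(image_path: str) -> str:
    """Send a camera frame to the Vision API and return a short scene description."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # vision-capable model available at the time of writing
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this scene in one sentence."},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                    },
                ],
            }
        ],
        max_tokens=100,
    )
    return response.choices[0].message.content


print(describe_frame("camera_frame.jpg"))  # hypothetical frame captured from Budd-E's camera
```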
Text-to-Speech
Users can type any English text to Budd-E through the interface, which it will then convert to audio using Amazon Polly and play through its USB speaker for everyone to hear. This capability has various fun office applications, such as announcing that it's lunch time, but it is also especially useful in real-world scenarios such as alerting people to evacuate a building in the event of a hazard, as shown in the video below.
Input: Alert! There is a fire hazard in this area. Please evacuate through the exit door.
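Generating that audio with Amazon Polly takes only a few lines. The sketch below uses boto3 with an arbitrary English voice; how Budd-E actually plays the resulting file over its USB speaker is left out.

```python
import boto3  # AWS SDK for Python

polly = boto3.client("polly")  # assumes AWS credentials are configured


def speak(text: str, out_path: str = "speech.mp3") -> str:
    """Synthesize English text with Amazon Polly and save it as an MP3 for playback."""
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId="Joanna",  # any English Polly voice works here
    )
    with open(out_path, "wb") as f:
        f.write(response["AudioStream"].read())
    return out_path  # the robot can then play this file on its USB speaker


speak("Alert! There is a fire hazard in this area. Please evacuate through the exit door.")
```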
Closing summary
Applications of recently emerged LLM services have primarily focused on how users interact with software and information, while less attention has gone to the potential breakthroughs they enable when interfacing with hardware itself, e.g. in robotics. Curious as we are, we have learned a lot and had tons of fun building Budd-E while exploring the many possibilities of combining AI with robotics. From computer vision to speech and interpreting human text, we're very excited to extend Budd-E in new creative ways, adding more brains and more muscle down the road. Keep an eye out for our upcoming articles, where we'll dive deeper into the technical intricacies of this journey.
Interested in hearing more about our approach? Don't hesitate to reach out at hello@panenco.com.