News 2020

July 2020

Live-Streamed Game Collects Sounds To Help Train Home-Based Artificial Intelligence

Virginia Alvino Young

From yawning to closing the fridge door, a lot of sounds occur within the home. Such sounds could be useful for home-based artificial intelligence applications, but training that AI requires a robust and diverse set of samples. A video game developed by Carnegie Mellon University researchers leverages live streaming to collect sound donations from players that will populate an open-source database.

"The methods for developing machine-learning-based interaction have become so accessible that now it's about collecting the right kind of data to create devices that can do more than just listen to what we say," said Nikolas Martelaro, an assistant professor in the School of Computer Science's Human-Computer Interaction Institute (HCII). "We want these devices to use all the sounds in our environment to act."

"This data could be used to create extremely useful technologies," said Jessica Hammer, the Thomas and Lydia Moran Assistant Professor of Learning Science in the HCII and the Entertainment Technology Center. "For example, if AI can detect a loud thud coming from my daughter's room, it could wake me up. It can notify me if my dryer sounds different and I need to change the lint trap, or it can create an alert if it hears someone who can't stop coughing."

Hammer and her team developed the game, "Rolling Rhapsody," specifically to be played on the live-streaming platform Twitch. The streamer controls a ball, which they must roll around to collect treasure scattered about a pirate stronghold. Viewers contribute to the game by collecting sounds from their homes using a mobile app.

"When they submit sounds, they are donating them to the database for researchers to use, but those sounds are also used as a part of the game on the live stream, incentivizing viewers with rewards and recognition for collecting many sounds or unique sounds," Hammer said.

"Twitch reaches populations that might not otherwise be engaged in this kind of social good," said Hammer of the platform, which has become even more popular during the current novel coronavirus pandemic. "And Twitch is already used for donations, which is the case with charity streams. We thought we could leverage that strong culture of generosity to say, 'It's OK if you don't have any money. You can donate sounds from your home to help science.'"

Hammer noted that following successful live play-testing, a broader field test will be conducted later this summer. "We can use this as a proof of concept for a new kind of game experience that can result in ethical data collection from the home," Hammer said.

Privacy is paramount, and all players and viewers must opt in and provide consent to upload sounds. Additional privacy measures have also been taken, including opportunities for viewers to redact sound files that may have accidentally captured something personal. They can delete submissions, choose to store sounds locally and withdraw their consent at any time. "We can collect data in a way that's fun and feels good for everybody involved," Hammer said.

If the research team finds that their method collects different data than traditional crowd-sourced experiences, they can begin to study how this technique generalizes to other kinds of problems. "This research doesn't have to be limited to gathering audio data for the home. A simple extension is gathering other kinds of audio data. Then you can use the same game, just change the kinds of challenges you give the players," Hammer said.
The research is sponsored by Philips Healthcare and Bosch, and is part of Polyphonic — a larger project that includes an application for sound labeling and validation, and an interface where researchers can view and download sounds.
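The kind of home-sound model such a database could feed is, broadly speaking, an audio-event classifier. The sketch below is a minimal, hypothetical recipe for that: log-mel spectrograms of short clips fed to a small convolutional network. The label set, 16 kHz sample rate and file handling are illustrative assumptions, not details of the Polyphonic project.

```python
# Hypothetical sketch of a sound-event classifier that donated home clips could
# help train. Labels, sample rate and preprocessing are illustrative assumptions.
import torch
import torch.nn as nn
import torchaudio

LABELS = ["door_close", "cough", "dryer", "silence"]  # placeholder label set

# Convert a mono waveform into a log-mel spectrogram "image" the CNN can consume.
to_mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)
to_db = torchaudio.transforms.AmplitudeToDB()

def featurize(path: str) -> torch.Tensor:
    waveform, sr = torchaudio.load(path)                        # (channels, samples)
    waveform = torchaudio.functional.resample(waveform, sr, 16000)
    mono = waveform.mean(dim=0, keepdim=True)                   # collapse to mono
    return to_db(to_mel(mono))                                  # (1, n_mels, frames)

# A deliberately small CNN: the point is the pipeline, not the architecture.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, len(LABELS)),
)

def train_step(batch: torch.Tensor, targets: torch.Tensor,
               optimizer: torch.optim.Optimizer) -> float:
    """One gradient step on a batch of spectrograms and integer class labels."""
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(batch), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```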

CMU Team Trains Autonomous Drones Using Cross-Modal Simulated Data

Virginia Alvino Young

To fly autonomously, drones need to understand what they perceive in the environment and make decisions based on that information. A novel method developed by Carnegie Mellon University researchers allows drones to learn perception and action separately. The two-stage approach overcomes the "simulation-to-reality gap," and creates a way to safely deploy drones trained entirely on simulated data into real-world course navigation.

"Typically drones trained on even the best photorealistic simulated data will fail in the real world because the lighting, colors and textures are still too different to translate," said Rogerio Bonatti, a doctoral student in the School of Computer Science's Robotics Institute. "Our perception module is trained with two modalities to increase robustness against environmental variabilities."

The first modality that helps train the drone's perception is image. The researchers used a photorealistic simulator to create an environment that included the drone, a soccer field, and red square gates raised off the ground and positioned randomly to create a track. They then built a large dataset of simulated images from thousands of randomly generated drone and gate configurations.

The second modality needed for perception is knowing the gates' position and orientation in space, which the researchers accomplished using the dataset of simulated images.

Teaching the model using multiple modalities reinforces a robust representation of the drone's experience, meaning it can understand the essence of the field and gates in a way that translates from simulation to reality. Compressing images to have fewer pixels aids this process. Learning from a low-dimensional representation allows the model to see through the visual noise in the real world and identify the gates.

With perception learned, researchers deploy the drone within the simulation so it can learn its control policy — or how to physically move. In this case, it learns which velocity to apply as it navigates the course and encounters each gate. Because it's a simulated environment, a program can calculate the drone's optimal trajectory before deployment. This method provides an advantage over manually supervised learning using an expert operator, since real-world learning can be dangerous, time-consuming and expensive.

The drone learns to navigate the course by going through training steps dictated by the researchers. Bonatti said he challenges specific agilities and directions the drone will need in the real world. "I make the drone turn to the left and to the right in different track shapes, which get harder as I add more noise. The robot is not learning to recreate going through any specific track. Rather, by strategically directing the simulated drone, it's learning all of the elements and types of movements to race autonomously," Bonatti said.

Bonatti wants to push current technology to approach a human's ability to interpret environmental cues. "Most of the work on autonomous drone racing so far has focused on engineering a system augmented with extra sensors and software with the sole aim of speed. Instead, we aimed to create a computational fabric, inspired by the function of a human brain, to map visual information to the correct control actions going through a latent representation," Bonatti said.

But drone racing is just one possibility for this type of learning. The method of separating perception and control could be applied to many different tasks for artificial intelligence such as driving or cooking. While this model relies on images and positions to teach perception, other modalities like sounds and shapes could be used for efforts like identifying cars, wildlife or objects.

Contributing researchers to this work include Carnegie Mellon's Sebastian Scherer, and Ratnesh Madaan, Vibhav Vineet and Ashish Kapoor of the Microsoft Corporation. The paper, "Learning Visuomotor Policies for Aerial Navigation Using Cross-Modal Representations," has been accepted to the International Conference on Intelligent Robots and Systems (IROS) 2020. The paper's code is open-sourced and available for other researchers.
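To make the two-stage idea concrete, here is a minimal sketch: an image encoder trained against both modalities (image reconstruction and gate pose), and a separate control policy trained on the frozen low-dimensional code. The module sizes, losses and the 4-dimensional pose and velocity vectors are assumptions for clarity, not the paper's architecture; the authors' open-source code is the authoritative reference.

```python
# Illustrative sketch of the two-stage, cross-modal training described above.
# Dimensions, losses and names are assumptions, not the paper's code.
import torch
import torch.nn as nn

LATENT_DIM = 10  # a deliberately low-dimensional representation

# Stage 1: perception. The encoder's latent code must both reconstruct the image
# and predict the gate's pose, tying the two modalities to one compact code.
encoder = nn.Sequential(
    nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
    nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, LATENT_DIM),
)
decoder_img = nn.Sequential(nn.Linear(LATENT_DIM, 64 * 8 * 8), nn.ReLU(),
                            nn.Unflatten(1, (64, 8, 8)),
                            nn.ConvTranspose2d(64, 3, 8, stride=8), nn.Sigmoid())
decoder_pose = nn.Linear(LATENT_DIM, 4)  # e.g. gate (x, y, z, yaw); an assumption

def perception_loss(img: torch.Tensor, gate_pose: torch.Tensor) -> torch.Tensor:
    z = encoder(img)
    recon = decoder_img(z)
    target = nn.functional.interpolate(img, recon.shape[-2:])  # match render size
    return (nn.functional.mse_loss(recon, target)
            + nn.functional.mse_loss(decoder_pose(z), gate_pose))

# Stage 2: control. With perception frozen, a small policy maps the latent code
# to a velocity command, supervised by the simulator's optimal trajectory.
policy = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, 4))

def control_loss(img: torch.Tensor, expert_velocity: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():                 # perception is not updated in stage 2
        z = encoder(img)
    return nn.functional.mse_loss(policy(z), expert_velocity)
```

Because only the compact latent code crosses from stage 1 to stage 2, visual noise that differs between simulation and the real world has less room to leak into the control policy.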

Which Way to the Fridge? Common Sense Helps Robots Navigate

Byron Spice

A robot travelling from point A to point B is more efficient if it understands that point A is the living room couch and point B is a refrigerator, even if it's in an unfamiliar place. That's the common sense idea behind a "semantic" navigation system developed by Carnegie Mellon University and Facebook AI Research (FAIR).

That navigation system, called SemExp, last month won the Habitat ObjectNav Challenge during the virtual Computer Vision and Pattern Recognition conference, edging a team from Samsung Research China. It was the second consecutive first-place finish for the CMU team in the annual challenge.

SemExp, or Goal-Oriented Semantic Exploration, uses machine learning to train a robot to recognize objects — knowing the difference between a kitchen table and an end table, for instance — and to understand where in a home such objects are likely to be found. This enables the system to think strategically about how to search for something, said Devendra S. Chaplot, a Ph.D. student in CMU's Machine Learning Department.

"Common sense says that if you're looking for a refrigerator, you'd better go to the kitchen," Chaplot said. Classical robotic navigation systems, by contrast, explore a space by building a map showing obstacles. The robot eventually gets to where it needs to go, but the route can be circuitous.

Previous attempts to use machine learning to train semantic navigation systems have been hampered because they tend to memorize objects and their locations in specific environments. Not only are these environments complex, but the system often has difficulty generalizing what it has learned to different environments.

Chaplot — working with FAIR's Dhiraj Gandhi, along with Abhinav Gupta, associate professor in the Robotics Institute, and Ruslan Salakhutdinov, professor in the Machine Learning Department — sidestepped that problem by making SemExp a modular system.

The system uses its semantic insights to determine the best places to look for a specific object, Chaplot said. "Once you decide where to go, you can just use classical planning to get you there."

This modular approach turns out to be efficient in several ways. The learning process can concentrate on relationships between objects and room layouts, rather than also learning route planning. The semantic reasoning determines the most efficient search strategy. Finally, classical navigation planning gets the robot where it needs to go as quickly as possible.

Semantic navigation ultimately will make it easier for people to interact with robots, enabling them to simply tell the robot to fetch an item in a particular place, or give it directions such as "go to the second door on the left."
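As a rough sketch of that modular split, the code below separates the two roles: a learned semantic layer proposes where to look for the target object, and a classical A* planner handles getting there. The grid representation, function names and co-occurrence prior are illustrative assumptions, not the SemExp code.

```python
# Rough sketch of the modular idea above: semantics choose WHERE to search,
# classical planning decides HOW to get there. All names are illustrative.
from typing import Tuple
import heapq

import numpy as np

def choose_search_goal(semantic_map: np.ndarray, target_class: int,
                       likelihood: np.ndarray) -> Tuple[int, int]:
    """Pick the map cell to explore next.

    semantic_map: (C, H, W) per-class occupancy built from the robot's detections.
    likelihood:   (C, C) learned prior that target_class co-occurs with each
                  observed class (e.g. refrigerator with kitchen).
    """
    # If the target has already been observed somewhere, go straight to it.
    observed = np.argwhere(semantic_map[target_class] > 0.5)
    if len(observed) > 0:
        return tuple(observed[0])
    # Otherwise score each cell by how strongly nearby classes predict the target.
    score = np.tensordot(likelihood[target_class], semantic_map, axes=1)  # (H, W)
    return tuple(np.unravel_index(np.argmax(score), score.shape))

def plan_path(obstacles: np.ndarray, start: Tuple[int, int],
              goal: Tuple[int, int]):
    """Classical A* on the obstacle grid; nothing learned happens here."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(h(start), 0, start, [start])]
    seen = set()
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = node[0] + dr, node[1] + dc
            if (0 <= r < obstacles.shape[0] and 0 <= c < obstacles.shape[1]
                    and obstacles[r, c] == 0 and (r, c) not in seen):
                heapq.heappush(frontier, (cost + 1 + h((r, c)), cost + 1,
                                          (r, c), path + [(r, c)]))
    return None  # goal unreachable
```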

What Mourning the "Death" of a Robot Looks Like on Social Media

Virginia Alvino Young

"RIP Oppy" sounds like a condolence for a human, or at least a pet. But it's actually a phrase that was shared on social media about NASA's Opportunity rover project, which ceased communications from Mars in 2018. Elizabeth Carter is a project scientist in the Robotics Institute in Carnegie Mellon University's School of Computer Science. "I saw the social media response to Opportunity's mission officially ending, and people were posting all over Facebook and Twitter about how sad they were, and I was surprised by how similar it seemed to when a celebrity passes away," she said. To determine if average users could tell the difference between tweets about robots, humans, animals and objects, her research team presented a user group with samples of deidentified tweets about various "deaths." These included people like Mac Miller, animals like Grumpy Cat and robots like Opportunity and Jibo. The researchers found that people often had a difficult time discerning the subject type of robot-related tweets, especially when it came to Opportunity. "Oppy" tweets were mistaken for being about a human 63% of the time. The pronoun "you" was used in more than half of the sampled tweets about Opportunity. Among those, 72% were directed at the rover, with others directed at NASA and its scientists. Since Opportunity landed on Mars in 2004, many people have learned about the rover in school and followed its research findings. Carter speculates that, as was the case with her, the project inspired many people around the world. "It's nice that so many people cared so much about a research project that they took to social media to respond to its completion," she said. Carter said it illustrates the importance of educational programs and public outreach for science projects. Carter said there has been a lot of research in lab studies about how people anthropomorphize robots, but since many people don't have robots in their homes, there hasn't been much opportunity to see how people respond outside the lab. "It's hard to study these types of things out in the world, and this was a unique opportunity to at least see how people talk about robots in these circumstances," she said. "Death of a Robot: Social Media Reactions and Language Usage When a Robot Stops Operating" is co-written by Samantha Reig, Xiang Zhi Tan, Gierad Laput, Stephanie Rosenthal, and Aaron Steinfeld, all of Carnegie Mellon University. The paper was presented earlier this year at the ACM/IEEE International Conference on Human-Robot Interaction.

Transparent, Reflective Objects Now Within Grasp of Robots

Byron Spice

Kitchen robots are a popular vision of the future, but if a robot of today tries to grasp a kitchen staple such as a clear measuring cup or a shiny knife, it likely won't be able to. Transparent and reflective objects are the things of robot nightmares.

Roboticists at Carnegie Mellon University, however, report success with a new technique they've developed for teaching robots to pick up these troublesome objects. The technique doesn't require fancy sensors, exhaustive training or human guidance, but relies primarily on a color camera. The researchers will present this new system during this summer's International Conference on Robotics and Automation virtual conference.

David Held, an assistant professor in CMU's Robotics Institute, said depth cameras, which shine infrared light on an object to determine its shape, work well for identifying opaque objects. But infrared light passes right through clear objects and scatters off reflective surfaces. Thus, depth cameras can't calculate an accurate shape, resulting in largely flat or hole-riddled shapes for transparent and reflective objects.

But a color camera can see transparent and reflective objects as well as opaque ones. So CMU scientists developed a color camera system to recognize shapes based on color. A standard camera can't measure shapes like a depth camera, but the researchers nevertheless were able to train the new system to imitate the depth system and implicitly infer shape to grasp objects. They did so using depth camera images of opaque objects paired with color images of those same objects.

Once trained, the color camera system was applied to transparent and shiny objects. Based on those images, along with whatever scant information a depth camera could provide, the system could grasp these challenging objects with a high degree of success.

"We do sometimes miss," Held acknowledged, "but for the most part it did a pretty good job, much better than any previous system for grasping transparent or reflective objects."

The system can't pick up transparent or reflective objects as efficiently as opaque objects, said Thomas Weng, a Ph.D. student in robotics. But it is far more successful than depth camera systems alone. And the multimodal transfer learning used to train the system was so effective that the color system proved almost as good as the depth camera system at picking up opaque objects.

"Our system not only can pick up individual transparent and reflective objects, but it can also grasp such objects in cluttered piles," he added.

Other attempts at robotic grasping of transparent objects have relied on training systems based on exhaustively repeated attempted grasps — on the order of 800,000 attempts — or on expensive human labeling of objects.

The CMU system uses a commercial RGB-D camera that's capable of both color images (RGB) and depth images (D). The system can use this single sensor to sort through recyclables or other collections of objects — some opaque, some transparent, some reflective.

In addition to Held and Weng, the research team included Oliver Kroemer, assistant professor of robotics at CMU; Amith Pallankize, a senior at BITS Pilani in India; and Yimin Tang, a senior at ShanghaiTech. The National Science Foundation, Sony Corporation, the Office of Naval Research, Efort Intelligent Equipment Co. and ShanghaiTech supported this research.
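The paired-data supervision described above can be sketched in a few lines: a small network takes the color image plus the often-broken raw depth and is trained to reproduce the depth camera's reliable readings on opaque objects; at test time it fills in depth where the sensor fails on clear or shiny surfaces. The architecture, names and loss below are assumptions for illustration, not the authors' system.

```python
# Minimal sketch of the paired-data idea above: predict usable depth from color
# (plus raw, often-broken depth), supervised only where the depth camera was
# trustworthy on OPAQUE objects. Names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class DepthCompletionNet(nn.Module):
    """Tiny encoder-decoder: RGB (3 ch) + raw depth (1 ch) -> refined depth (1 ch)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, rgb: torch.Tensor, raw_depth: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([rgb, raw_depth], dim=1))

model = DepthCompletionNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(rgb, raw_depth, good_depth, valid_mask) -> float:
    """Supervise only at pixels the depth camera measured reliably (opaque areas)."""
    pred = model(rgb, raw_depth)
    loss = ((pred - good_depth).abs() * valid_mask).sum() / valid_mask.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# At test time on transparent or shiny objects, the predicted depth stands in
# for the sensor's missing readings before a standard grasp planner is run.
```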

Video Game Teaches Productive Civil Discourse and Overcoming Tribalism

Virginia Alvino Young

How can students learn to make their civil discourse more productive? One Carnegie Mellon University researcher proposes an AI-powered video game. The educational system, targeted toward high schoolers, adapts to students' specific values and can be used to measure — and in some cases reduce — the impact of bias.

"Activities like debate club often reinforce the objective of 'winning' conversations rather than fostering democratic goals. Social media further polarizes, reinforcing information bubbles, biases and tribalism. It makes it hard to move forward," said Nicholas Diana, who recently earned his Ph.D. from the School of Computer Science's Human-Computer Interaction Institute. He added that productive civil discourse is a skill that needs to be taught, modeled and practiced just like any other core subject.

In his novel computer game, Persuasion Invasion, solo players encounter pacifist alien invaders that don't conquer through war, but instead sow discord and division. When the community can't agree on small issues, the aliens strike. Players assume the role of government agents who must unite communities.

The game aims to provide students with a better understanding of the values that shape both their own beliefs and those of others, with opportunities to practice overcoming bias. "It is not about finding middle ground, changing beliefs or confusing civility with politeness. It's about finding common ground based on shared values," Diana said.

During gameplay, players must choose the most persuasive argument for a particular computer-generated character. If the human player is biased, they'll select the argument that aligns with their own values rather than the other player's.

Before gameplay, players take the Moral Foundations Theory Questionnaire, which rates how much a player values things like fairness, loyalty and empathy. With an estimate of each player's values, the game can infer their beliefs. It also uses machine learning and natural language processing to evaluate the values displayed in the text of each argument, and can predict which options will appeal most to players. "This allows instruction to adapt to player values. We can remind players that 'this option might be most appealing to you, but remember you're trying to appeal to the other person.'"

To help understand the other characters, players can use tools such as "Social Media Snoop," which allows them to see that character's most recent status. "Conversation Reset" lets players step back to a point of agreement and try again.

"Games are effective because they are immersive, and individual play may reduce the negative effects of social pressure or shyness," Diana said. "Players are free from social consequences and social influence. They can practice in a safe environment where they're not afraid to say the wrong thing, which builds confidence."

The researchers deployed Persuasion Invasion in a study with high school social studies and English classes. More than a quarter of students continued to play outside class, and 82% thought it was an effective way to practice what they learned. Data showed that students improved their abilities to reduce tribalism and identify values. By defining and naming discourse moves, the skills became highly transferable. Diana found that skills advanced with practice. "Students said that they used these techniques on their parents, and they appreciated that they could really apply what they learned to their own lives," he said.
Diana said he'd like to explore extending the game beyond high school students, and that it may be useful for all ages to play. "Beyond direct applications in civic technology, I believe this new kind of value-adaptive instruction also has implications for any system designed to mitigate bias or augment human reasoning."
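The value-matching step described above can be illustrated with a toy sketch: score each candidate argument's value profile against the listening character's profile rather than the player's own, and flag the "trap" option a biased player is likely to pick. The five-foundation vectors and cosine scoring below are illustrative assumptions, not the game's actual model.

```python
# Toy sketch of value-adaptive argument selection. Foundation names and the
# cosine-similarity scoring are illustrative assumptions, not the game's model.
import numpy as np

FOUNDATIONS = ["care", "fairness", "loyalty", "authority", "sanctity"]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def rank_arguments(argument_values, listener_profile, player_profile):
    """Return (argument, appeal-to-listener, is-the-player's-own-bias) tuples,
    sorted by how persuasive each argument should be for the LISTENER."""
    own_favorite = max(argument_values,
                       key=lambda k: cosine(argument_values[k], player_profile))
    ranked = sorted(argument_values.items(),
                    key=lambda kv: cosine(kv[1], listener_profile), reverse=True)
    return [(name, cosine(vec, listener_profile), name == own_favorite)
            for name, vec in ranked]

# Example: profiles from a questionnaire, argument vectors from an NLP value tagger.
player = np.array([0.9, 0.8, 0.2, 0.1, 0.1])     # cares most about care/fairness
listener = np.array([0.2, 0.3, 0.9, 0.8, 0.4])   # responds to loyalty/authority
arguments = {
    "protects the vulnerable": np.array([0.9, 0.6, 0.1, 0.1, 0.1]),
    "honors our community's traditions": np.array([0.1, 0.2, 0.9, 0.7, 0.5]),
}
for name, appeal, is_own_bias in rank_arguments(arguments, listener, player):
    note = "  <- your own favorite, not theirs" if is_own_bias else ""
    print(f"{appeal:.2f}  {name}{note}")
```

In a setup like this, the system can tell when a player keeps choosing the argument closest to their own profile and prompt them, as the article describes, to consider the other person's values instead.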

Bonatti Receives Microsoft Research Dissertation Grant

Byron Spice

Rogerio Bonatti, a Ph.D. candidate in the Robotics Institute, is one of 10 students across North America who will receive Microsoft Research Dissertation Grants to support research for their Ph.D. thesis. Bonatti, who expects to complete his dissertation next year, has focused his research at the intersection of machine learning theory and motion planning. His dissertation is "Active Vision: Autonomous Aerial Cinematography With Learned Artistic Decision-Making." "I create methods for robust robot intelligence in real-world settings," Bonatti said. "My work has been deployed for multiple applications, ranging from autonomous cinematography with aerial vehicles all the way to drone racing." He interned at Microsoft Research last summer. He studied mechatronics engineering at the University of São Paulo, Brazil, and spent a year at Cornell University before beginning his graduate studies at CMU in 2016. Now in its fourth year, the Microsoft Research Dissertation Grant offers up to $25,000 to support the research of students nearing the completion of doctoral degrees at North American universities who are underrepresented in the field of computing. About 230 students submitted proposals this year, the most competitive group yet.  

CMU-Q Launches Virtual Computer Science Discovery Workshop

Carnegie Mellon University in Qatar (CMU-Q) has launched its first virtual outreach program for high school students, using an online version of its popular Mindcraft program created in response to the COVID-19 pandemic.

"We have re-invented our outreach strategy in the wake of the COVID-19 pandemic," said Khaled Harras, computer science program director at CMU-Q and co-director of the Hamad Bin Jassim Center for K-12 Computer Science Education (HBJ Center). "For Mindcraft Virtual, we revamped one of our workshops to become more online-friendly, and we fully developed a new robotics programming workshop specifically for the virtual experience."

A collaboration between CMU-Q and the Jassim and Hamad Bin Jassim Charitable Foundation, the HBJ Center introduces school-aged children in Qatar to concepts and career paths in computer science. Mindcraft, one of the center's initiatives, was created for high school students with a wide range of computer science backgrounds, from those with no experience at all to those who have strong skills in programming. It has introduced nearly 5,000 students to computer science since it began in 2016.

"Computer science is essential to many fields of study, as well as many existing and emerging industries," said Saeed Al-Hajri, the foundation's CEO. "With the pandemic and the move to remote work, we are seeing now more than ever how computing skills are critical for the next generation."

Harras sees computer science as a crucial enabling catalyst for nearly every discipline, including those in the sciences, humanities, engineering and medicine. "Computational thinking is the new math of the 21st century," he said. "We hope that programs like Mindcraft will help young generations recognize this so they can remain competitive in future markets and economies."

"We feel the conceptual portion of Mindcraft is especially important," said Nour Tabet, outreach coordinator and the facilitator of both the in-person and online workshops. "A lot of computer science work is about thinking through problems. The robotics section is fun and practical, but the conceptual activities really give students a key perspective on computer science."

When CMU-Q moved to remote teaching because of the pandemic, in-person Mindcraft workshops were also suspended. To go virtual, the Mindcraft team carefully assessed various online tools and the format of the experience. After months of research, writing and testing, they launched the first Mindcraft Virtual for 12 students in late June.

Tabet invited two students who had attended previous in-person workshops so they could compare the experiences. "I was so pleased that these students loved the virtual version. They of course missed being in person with the other students, but they really enjoyed the activities," she said.

Mindcraft Virtual will continue throughout the summer, and Harras and Tabet anticipate the online sessions will continue until the pandemic restrictions are lifted. This milestone also opens the door for greater future impact and reach, no longer limited by physical presence at the workshop.

Harras believes the Mindcraft experience is particularly welcome now, as pandemic restrictions are preventing many people from travelling. "Mindcraft Virtual turned out better than I had personally imagined," he said. "I thought the long online hours would be tiring, but the students wanted more. One student shared with us that 'this was my best 2 days in a long time, especially during this pandemic situation.' It was heartwarming."

To learn more about Mindcraft Virtual, visit the project's webpage.

Kaess Wins Inaugural RSS Test of Time Award

Byron Spice

Michael Kaess, associate research professor in the Robotics Institute, and Frank Dellaert, a Ph.D. alumnus of the School of Computer Science and a professor at Georgia Tech, are winners of the inaugural Robotics: Science and Systems (RSS) Test of Time Award.

The award recognizes the highest-impact papers presented at the RSS conference from at least 10 years ago — papers that changed thinking, identified new problems, or pioneered new approaches to robotic design and problem solving.

Kaess and Dellaert were cited for a pair of papers from 2005 and 2006 — one by Dellaert and one co-authored by both — concerning simultaneous localization and mapping (SLAM). This is a method widely used in autonomous robots for constructing or updating a map of an unknown environment while keeping track of the robot's location within the map.

They will present an award keynote during this year's RSS 2020 conference, which is being held virtually. Their talk, "Factor Graphs: Exploiting Structure in Robotics," will be at 1 p.m. EDT on Tuesday, July 14, and will be followed by a Test of Time panel session on the topic.

Kaess received his Ph.D. in computer science at Georgia Tech with Dellaert as his adviser. He joined the Robotics Institute, where he is director of the Robot Perception Lab, in 2013 after working as a research scientist at MIT. His research is focused on mobile robot autonomy.
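For readers unfamiliar with the factor-graph view of SLAM that the keynote title refers to, here is a toy example: every odometry or loop-closure measurement becomes a factor constraining a pair of poses, and the trajectory estimate is the least-squares solution of all factors at once. The sketch is 1D and linear purely for clarity; real systems built on this idea solve the nonlinear 2D and 3D versions, often incrementally.

```python
# Toy, purely illustrative factor-graph example: estimate 1D robot positions
# x0..x3 from noisy odometry factors and one loop-closure factor via linear
# least squares. Real SLAM solves the nonlinear 2D/3D version of this problem.
import numpy as np

# Each factor constrains a pair of poses: (i, j, measured displacement, sigma).
factors = [
    (0, 1, 1.1, 0.1),   # odometry: x1 - x0 ~ 1.1
    (1, 2, 0.9, 0.1),   # odometry: x2 - x1 ~ 0.9
    (2, 3, 1.2, 0.1),   # odometry: x3 - x2 ~ 1.2
    (0, 3, 3.0, 0.05),  # loop closure: x3 - x0 ~ 3.0 (more precise)
]
n = 4  # number of poses

# Build the weighted linear system A x = b, plus a prior fixing x0 = 0 so the
# whole trajectory is anchored rather than free to slide along the line.
rows, b = [], []
prior = np.zeros(n); prior[0] = 1.0
rows.append(prior / 1e-3); b.append(0.0)
for i, j, meas, sigma in factors:
    row = np.zeros(n)
    row[j], row[i] = 1.0, -1.0
    rows.append(row / sigma)      # each row is one factor, weighted by 1/sigma
    b.append(meas / sigma)

A = np.vstack(rows)
x, *_ = np.linalg.lstsq(A, np.array(b), rcond=None)
print("estimated positions:", np.round(x, 3))
```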

New System Combines Smartphone Videos To Create 4D Visualizations

Byron Spice

Researchers at Carnegie Mellon University have demonstrated that they can combine iPhone videos shot "in the wild" by separate cameras to create 4D visualizations that allow viewers to watch action from various angles, or even erase people or objects that temporarily block sight lines.

Imagine a visualization of a wedding reception, where dancers can be seen from as many angles as there were cameras, and the tipsy guest who walked in front of the bridal party is nowhere to be seen.

The videos can be shot independently from a variety of vantage points, as might occur at a wedding or birthday celebration, said Aayush Bansal, a Ph.D. student in CMU's Robotics Institute. It also is possible to record actors in one setting and then insert them into another, he added.

"We are only limited by the number of cameras," Bansal said, with no upper limit on how many video feeds can be used.

Bansal and his colleagues presented their 4D visualization method at the Computer Vision and Pattern Recognition virtual conference last month.

"Virtualized reality" is nothing new, but in the past it has been restricted to studio setups, such as CMU's Panoptic Studio, which boasts more than 500 video cameras embedded in its geodesic walls. Fusing visual information of real-world scenes shot from multiple, independent, handheld cameras into a single comprehensive model that can reconstruct a dynamic 3D scene simply hasn't been possible.

Bansal and his colleagues worked around that limitation by using convolutional neural nets (CNNs), a type of deep learning program that has proven adept at analyzing visual data. They found that scene-specific CNNs could be used to compose different parts of the scene.

The CMU researchers demonstrated their method using up to 15 iPhones to capture a variety of scenes — dances, martial arts demonstrations and even flamingos at the National Aviary in Pittsburgh.

"The point of using iPhones was to show that anyone can use this system," Bansal said. "The world is our studio."

The method also unlocks a host of potential applications in the movie industry and consumer devices, particularly as the popularity of virtual reality headsets continues to grow.

Though the method doesn't necessarily capture scenes in full 3D detail, the system can limit playback angles so incompletely reconstructed areas are not visible and the illusion of 3D imagery is not shattered.

In addition to Bansal, the research team included Robotics Institute faculty members Yaser Sheikh, Deva Ramanan and Srinivasa Narasimhan. The team also included Minh Vo, a former Ph.D. student who now works at Facebook Reality Lab. The National Science Foundation, Office of Naval Research and Qualcomm supported this research.
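To give a flavor of what "scene-specific" means here, the heavily simplified sketch below fits one small network to the frames captured for a single scene, mapping a camera pose and timestamp to an image, and then queries it from a viewpoint none of the phones occupied. This is not the authors' architecture; the tiny resolution, pose parameterization and plain MLP are assumptions chosen only to illustrate the general idea of fitting a model per scene rather than training one model for all scenes.

```python
# Heavily simplified, illustrative sketch of a scene-specific view-synthesis
# model: fit one network to one scene's captured frames, mapping
# (camera pose, time) -> image, then render viewpoints between the phones.
# Not the authors' method; resolution and pose encoding are assumptions.
import torch
import torch.nn as nn

IMG_H, IMG_W = 90, 160   # tiny render resolution, just for the sketch

class SceneModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Input: 6-DoF camera pose (x, y, z, roll, pitch, yaw) + timestamp.
        self.mlp = nn.Sequential(
            nn.Linear(7, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, IMG_H * IMG_W * 3), nn.Sigmoid(),
        )

    def forward(self, pose_time: torch.Tensor) -> torch.Tensor:
        return self.mlp(pose_time).view(-1, 3, IMG_H, IMG_W)

model = SceneModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def fit_step(pose_time_batch: torch.Tensor, frame_batch: torch.Tensor) -> float:
    """One training step on frames gathered from all phones of this one scene."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(pose_time_batch), frame_batch)
    loss.backward()
    optimizer.step()
    return loss.item()

# After fitting, render a virtual camera path: poses the phones never occupied.
with torch.no_grad():
    novel_view = model(torch.tensor([[1.0, 0.5, 1.6, 0.0, 0.0, 0.3, 2.5]]))
```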