Kayla Papakie | Tuesday, September 6, 2022
Rashmi Vinayak, an assistant professor in Carnegie Mellon University's Computer Science Department, earned a 2022 Meta Research Award for work on silent data corruptions at large scales. As part of the award, she will receive $50,000 to devise solutions for identifying data errors in large-scale computing systems.
Silent data corruption occurs when computational errors in data go undetected and then surface as application-level problems. Vinayak's work will develop tools inspired by coding theory that efficiently detect silent data corruption, increasing the reliability of hyperscale computing systems like those used by Meta.
"We take it for granted that if we ask a computer to perform a computation, we get the correct answer," Vinayak said. "But there are certain operating conditions where the answer may not be correct. This is a problem because most applications assume what the computer returns back is correct."
Data errors can occur for a number of reasons, including changes in manufacturing technology and the wear and tear of aging computing units. Companies the size of Meta depend on massive computing infrastructures, and billions of users rely on those systems to be correct and efficient. Current processes for identifying data errors might involve running a computation multiple times until an outlier is recognized, but this approach is inefficient at such a large scale. Vinayak's project aims to find a solution that increases reliability while remaining efficient for such massive computing systems.
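To make the repeated-computation approach concrete, here is a minimal Python sketch of the general idea. It is an illustration only, not Vinayak's method or Meta's tooling. The names `run_with_redundancy` and `flaky_add` are hypothetical, and the fault is simulated. The sketch assumes results are hashable values that can be compared for equality.

```python
from collections import Counter


def run_with_redundancy(compute, args, runs=3):
    """Run compute(*args) `runs` times and compare the results.

    Returns (majority_result, suspected_corruption), where the flag is
    True if any run disagreed with the most common result.
    """
    results = [compute(*args) for _ in range(runs)]
    majority, count = Counter(results).most_common(1)[0]
    return majority, count < runs


# Hypothetical faulty unit: silently returns a wrong sum on its second call.
calls = {"n": 0}


def flaky_add(a, b):
    calls["n"] += 1
    return a + b + (1 if calls["n"] == 2 else 0)


value, suspect = run_with_redundancy(flaky_add, (2, 3))
print(value, suspect)
```

Every computation here runs three times, so the cost of detection is roughly triple the cost of the work itself, which is the inefficiency that becomes prohibitive across a fleet of machines.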
Meta established its Research Award to foster collaborative approaches from industry and academia that tackle large-scale data errors. Projects range from architectural solutions and fleetwide testing strategies to simulation and manufacturing approaches. An important aspect of this award is the collaboration that it enables between Vinayak's research group and Meta.
"They have real data, which is crucial when doing research in order to make an impact on real-world problems," she said.
Aaron Aupperlee | 412-268-9068 | aaupperlee@cmu.edu