Superintelligence — AI systems that surpass human intelligence — could be just a decade away, raising urgent questions about how to ensure their safety and alignment with human values, according to OpenAI.
These AI systems could be hugely beneficial, but more sophisticated systems are also less predictable. Misinterpretation of human values or intent, algorithmic bias and security vulnerabilities are all causes for concern, especially in highly advanced AI systems, where the potential consequences are far greater.
Yu Meng, assistant professor of computer science at the University of Virginia School of Engineering and Applied Science, will soon undertake a project to help prepare for this future.
Enhancing AI Training for a Safer, Smarter Future
While AI models are rapidly progressing, training these models continues to pose challenges. Meng’s research will aim to advance the training methodologies researchers rely on, making them more effective for superintelligent systems.
Training AI effectively is crucial for ensuring that these systems can learn, adapt and operate in ways that minimize risks and maximize benefits for society.
"Our research tackles the pressing challenges of AI alignment as we approach the era of superintelligence,” said Meng. “We aim to innovate new training methods essential for building superintelligent systems that are both safe and beneficial.”
“Beyond advancing AI capabilities, we are dedicated to ensuring that rapidly evolving AI systems remain aligned with human values and intentions. By doing so, we are helping to shape a future where superintelligent AI serves to enhance humanity rather than threatens it.”
Helping AI Learn from Imperfect and Evolving Data
Part of the project will investigate how to help AI learn from data that is incomplete, ambiguous or inconsistently labeled. Real-world human supervision is often imperfect, and AI systems cannot always infer the intended signal from flawed examples. Meng plans to test techniques that help models identify which information matters despite the noise, ultimately making AI models more reliable, less biased and more accurate.
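To make the idea concrete, here is a minimal, hypothetical sketch of one common noise-robust training heuristic, the "small-loss trick," in which a model updates only on the examples it currently finds most plausible. The function name, keep ratio and toy data are illustrative assumptions, not details of Meng's project.

```python
# Hypothetical sketch: the "small-loss trick" for noisy labels.
# Assumption: examples the model assigns low loss are more likely
# to be labeled correctly, so each step trains only on those.
import torch
import torch.nn as nn

def small_loss_step(model, optimizer, x, y, keep_ratio=0.8):
    """One update that keeps only the keep_ratio fraction of the
    batch with the smallest per-example loss (illustrative value)."""
    per_example = nn.functional.cross_entropy(model(x), y, reduction="none")
    k = max(1, int(keep_ratio * len(y)))
    keep = torch.topk(per_example, k, largest=False).indices  # lowest-loss examples
    loss = per_example[keep].mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy demonstration on random data with a linear classifier.
model = nn.Linear(10, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randint(0, 3, (32,))
print(small_loss_step(model, optimizer, x, y))
```

The design intuition is that mislabeled examples tend to produce stubbornly high losses, so ignoring the worst-fitting fraction of each batch can keep noise from steering training; it is one of several heuristics a study like this might evaluate.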
Additionally, the project will seek to understand how AI training should change as human knowledge grows. With information constantly evolving, anticipating how training methodologies will need to adapt is essential. Training methods that fail to keep pace could leave future systems stunted, producing unreliable models and more errors.
A New Take on Train the Trainer
Future AI models may also be able to train other AI; at some point, they could become advanced enough to provide guidance comparable to a human's. To better understand this potential, the team will compare the effectiveness of training data created by humans with training data generated by AI.
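As a rough illustration of the idea, the sketch below trains a stronger "student" network on labels generated by a weaker "teacher" model standing in for a human or AI supervisor, then checks how well the student matches a hidden ground truth. The architectures, data and hyperparameters are toy assumptions for illustration, not part of Meng's planned experiments.

```python
# Hypothetical sketch: a student model trained on AI-generated labels.
# The teacher is a weak stand-in for a supervisor; everything here
# (models, data, step counts) is an illustrative assumption.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(256, 20)
y_true = (x @ torch.randn(20, 2)).argmax(dim=1)   # hidden ground truth

teacher = nn.Linear(20, 2)                        # weak, imperfect supervisor
with torch.no_grad():
    y_teacher = teacher(x).argmax(dim=1)          # AI-generated training labels

student = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-2)
for _ in range(200):                              # fit the student to the teacher
    loss = nn.functional.cross_entropy(student(x), y_teacher)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.no_grad():
    pred = student(x).argmax(dim=1)
print("agreement with teacher:", (pred == y_teacher).float().mean().item())
print("accuracy on ground truth:", (pred == y_true).float().mean().item())
```

Comparing the two printed numbers captures the kind of question such a study might ask: whether a model trained on imperfect AI-generated supervision merely imitates its supervisor or generalizes beyond it.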
Meng’s work is supported by a $200,000 Superalignment Fast Grant from OpenAI, a competitive program that funds research on aligning next-generation superhuman AI systems. This year, only 50 of the 2,700 proposals submitted were funded.