U.S. Patent for Breakthrough Machine Learning Data Preprocessing Methods
CARY, N.C., January 7, 2025 — SAS Institute Inc. today announced the grant of U.S. Patent No. 12,190,219 to Dr. Joseph Nyangon and Dr. Ruth Akintunde, entitled Systems and methods for outlier detection and feature transformation in machine learning model training. Issued by the U.S. Patent and Trademark Office (USPTO), the patent covers a novel end-to-end automated data preprocessing framework designed to significantly enhance the accuracy, stability, and scalability of machine learning models.
[Photo: Dr. Nyangon delivering a keynote address at the 2024 Utility Analytics Week in Chicago.]
The patented technology introduces adaptive outlier removal that dynamically identifies and filters anomalous data points before feature transformation. By integrating automated data sanitization with seamless feature engineering, the framework converts raw, noisy datasets into machine-ready features through a streamlined pipeline compatible with existing enterprise ML platforms. Designed for scale, the architecture supports terabyte-level datasets with minimal manual tuning. Empirical benchmarks demonstrate up to a 15% reduction in prediction error rates compared with traditional preprocessing approaches.
Data quality remains a persistent challenge for organizations deploying advanced analytics in sectors such as finance, healthcare, manufacturing, energy, telecommunications, and the Internet of Things. This invention lowers barriers to adoption by automating critical data-cleaning steps, accelerating deployment timelines, reducing data-science resource demands, and enabling more trustworthy insights.
“Automating outlier detection and feature transformation empowers data scientists to focus on innovation rather than manual data cleanup,” said Dr. Nyangon, lead inventor. Dr. Akintunde, co-inventor, added that the collaboration produced a flexible, scalable framework that will help teams derive maximum value from their data.
Key Features
Adaptive Outlier Removal: Dynamically filters noise and anomalies based on configurable thresholds, ensuring cleaner training datasets.
Seamless Feature Transformation: Converts sanitized data into machine-ready features using a streamlined pipeline that integrates directly with existing ML platforms.
Enterprise-Scale Architecture: Deployable as a scalable software component capable of handling terabyte-scale datasets with minimal manual tuning.
Enhanced Predictive Reliability: Empirical benchmarks demonstrate up to a 15% reduction in prediction error rates when compared to traditional preprocessing methods.
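For illustration only, the general idea behind the first two features (filter outliers against a configurable threshold, then transform the cleaned data into model-ready features) can be sketched in a few lines of Python. This is not the patented method, whose details are not described in this release. The interquartile-range rule, the z-score scaling, the function names, and the sample values are all assumptions chosen for the example.

```python
from statistics import mean, pstdev, quantiles


def remove_outliers(values, k=1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR].

    The multiplier k is the configurable threshold (assumed rule, not
    taken from the patent). Requires at least two data points.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no spread to scale by.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def preprocess(values, k=1.5):
    """Run outlier removal first, then feature transformation."""
    return standardize(remove_outliers(values, k))


# Hypothetical sensor readings with one obviously anomalous value.
raw = [10.0, 11.0, 9.0, 10.0, 12.0, 11.0, 250.0]
clean = remove_outliers(raw)
features = preprocess(raw)
```

Running the filter before the scaling step matters in this sketch: if the anomalous reading were left in, it would inflate the mean and standard deviation and compress the remaining values toward zero.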
Industry Impact
Data quality remains a critical barrier for organizations adopting advanced analytics in finance, healthcare, energy, and other sectors, where machine learning practitioners face persistent challenges from noisy and inconsistent data. By automating critical data-cleaning steps, this patented technology is intended to lower the barrier to entry for organizations seeking to deploy high-fidelity predictive models. Early implementations are expected to accelerate deployment timelines, reduce data-science resource requirements, and yield more trustworthy insights.
Areas of Application
This patented technology can be deployed across multiple sectors to improve data quality and model performance, including:
Finance: Fraud detection and risk modeling
Healthcare: Diagnostic analytics and patient-outcome prediction
Manufacturing: Quality control and predictive maintenance
Energy: Electricity demand forecasting and grid optimization
Telecommunications: Network anomaly detection
Internet of Things: Sensor-data preprocessing
Quotes
“Automating outlier detection and feature transformation empowers data scientists to focus on innovation rather than manual data cleanup,” said Dr. Nyangon, Lead Inventor. “This patent represents a major milestone in making machine learning more accessible and reliable by tackling data imperfections at the source. It embodies our vision of making advanced analytics accessible and reliable for all organizations.”
“Collaborating with Joe to architect a flexible, scalable framework was a highlight of our R&D efforts,” added Dr. Ruth Akintunde, Co-Inventor. “I’m proud that our work will help teams derive maximum value from their data.”
Future Outlook
Building on this milestone, Dr. Nyangon and his team plan to extend the methodology to support real-time data streams and automated anomaly alerts. Target applications include electricity demand forecasting, fraud detection, financial risk modeling, and telecommunications network anomaly detection.
Learn more about U.S. Patent No. 12,190,219 at uspto.gov.