Machine learning, evolutionary computing and big data are specialties of Alberto Cano, Ph.D., assistant professor in the Department of Computer Science.
What are you working on right now?
I’m working on machine learning algorithms for real-time, high-speed, drifting data streams. The goal is to classify data on the fly and design classification models that self-adapt to changes in the data properties over time. The challenge is to develop classifiers that are both accurate and fast, and that quickly adapt to changes in the data. Applications include credit card fraud detection and email spam filtering. Malicious users continuously change the way they try to get to you, and our classification systems must adapt accordingly.
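To make the idea of classifying on the fly and adapting to drift concrete, here is a minimal sketch in Python. It is not the method used in the research described here. It assumes a simple online logistic regression, a test-then-train loop, and a crude sliding-window error check that resets the model when recent errors spike. All names (`OnlineLogisticClassifier`, `run_stream`, `synthetic_stream`) and parameter values are hypothetical, and the stream is synthetic, with the labeling rule flipped halfway through to imitate a drift.

```python
import math
import random
from collections import deque


class OnlineLogisticClassifier:
    """Logistic regression updated one instance at a time with SGD (illustrative only)."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def learn(self, x, y):
        # Gradient step on the log-loss for a single labeled instance.
        error = self.predict_proba(x) - y
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error


def run_stream(stream, n_features, window=200, drift_threshold=0.4):
    """Test-then-train loop: predict each instance first, then learn from it.

    Assumption: drift is flagged when the error rate over the last `window`
    instances exceeds `drift_threshold`; the model is then rebuilt from scratch.
    """
    model = OnlineLogisticClassifier(n_features)
    recent_errors = deque(maxlen=window)
    correct_flags = []
    resets = 0
    for x, y in stream:
        wrong = int(model.predict(x) != y)  # test first
        correct_flags.append(1 - wrong)
        recent_errors.append(wrong)
        if len(recent_errors) == window and sum(recent_errors) / window > drift_threshold:
            model = OnlineLogisticClassifier(n_features)  # discard the stale model
            recent_errors.clear()
            resets += 1
        model.learn(x, y)  # then train
    return correct_flags, resets


def synthetic_stream(n, drift_at, seed=0):
    """Yield (features, label) pairs; the labeling rule flips at index `drift_at`."""
    rng = random.Random(seed)
    for i in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        label = int(x[0] + x[1] > 0)
        if i >= drift_at:
            label = 1 - label  # simulated concept drift
        yield x, label


if __name__ == "__main__":
    flags, resets = run_stream(synthetic_stream(4000, drift_at=2000), n_features=2)
    print("accuracy before drift:", sum(flags[1500:2000]) / 500)
    print("accuracy after recovery:", sum(flags[3500:]) / 500)
    print("model resets:", resets)
```

Because the model is updated after every instance, it may recover from the flip on its own before the window check fires. The reset is only a fallback for cases where the old model is too far from the new concept.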
What’s the biggest challenge in making algorithms that can self-adapt to big data?
One big challenge is making them run fast enough. Data streams may comprise millions of data instances per second, and we cannot afford a delay or bottleneck in the classification. Who wants to wait a few minutes to verify that a credit card transaction wasn’t fraudulent? Therefore, we need to use high-performance computing architectures to deploy the algorithms and scale them to big data. We’re using GPUs [graphics processing units] and the MapReduce programming model on the Hadoop and Spark computing frameworks to scale out our algorithms.
What do you hope to achieve with these high-speed algorithms?
My job as a machine learning scientist is to develop the algorithms and make them available to the scientific community. The ultimate goal is to transfer the know-how to industry and society to solve real-world problems. My current project is funded by Hamilton Beach Brands Inc., and we’re working on applying these algorithms to business intelligence: integrating machine learning into business decision-making to improve operations.
What attracted you to this aspect of machine learning?
The number of open issues that await a solution. This is a relatively novel area and there are countless research questions to be addressed. It’ll keep me busy for the next five years! How can we understand, explain and learn from drifts in real-time data? Is there a way to detect if my system is being attacked by an adversary injecting malicious data into my inputs? How could I build a robust classifier that would detect and prevent that?
Who else from VCU is working with you on this project?
I’m working very closely with Dr. Bartosz Krawczyk and my Ph.D. student Martha Roseberry on this project. It’s been a fruitful collaboration and we’ve published several papers in top-tier conferences and journals that are getting the attention of the research community. Our complementary expertise is crucial to integrating the diverse ideas that will improve the performance and accuracy of these classification models.
Fun fact: I was born and raised in Spain, so I am Spanish-Spanish.