Finding the needles in 'big data' haystacks

Mar. 7—A seemingly bottomless ocean of "big data" has flooded our world. Bits and bytes are pouring in from sources ranging from satellites and MRI scans to massive computer simulations and seismic-sensor networks, from security cameras to smartphones, from genome sequencing of SARS-CoV-2 to COVID-19 test results, from social networks to texts zipping from phone to phone.

Making sense of this ever-increasing racket is vital to national security, economic stability, individual health and practically every branch of science — and the job is getting easier, thanks to the SmartTensors artificial intelligence tool we have developed at Los Alamos National Laboratory.

Without any human guidance, this technology sifts through millions of millions of bytes, or terabytes, of diverse data to find the hidden patterns and features that make the data understandable, revealing its underlying processes or causes. SmartTensors also can identify just how many features are needed to make sense of enormous, multidimensional datasets.

Finding that optimal number reduces a massive dataset to a scale that's manageable for computers to process and for subject-matter experts to analyze. The features that SmartTensors extracts are explainable, understandable chunks of data.

What makes a face?

Take, for example, facial recognition algorithms, which rely on large datasets. A face is a collection of features, some essential and some that matter less: noses, eyes, eyebrows, ears, mouths, cheeks, foreheads, jawlines, hairlines and chins. Pointed at a large number of photos of faces, SmartTensors can isolate the features that matter most for recognizing faces. It also can determine how many of those features, the optimal number, are required to do the job accurately and reliably. Perhaps only specific shapes of eyes, noses and mouths are needed for facial recognition, or perhaps it is useful to group together all the faces that have oval eyes and slim noses.
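If you are curious what that looks like in practice, here is a minimal sketch, assuming Python with the open-source scikit-learn library and its small public Olivetti faces dataset (neither is part of SmartTensors). It uses non-negative matrix factorization to pull face "parts" out of the photos and a simple reconstruction-error scan to gauge how many features are enough; SmartTensors' own factorizations, and its automatic way of choosing that number, are more sophisticated than this heuristic.

# Sketch: extract face "parts" with non-negative matrix factorization (NMF)
# and scan a few candidate feature counts. Illustration only, not SmartTensors.
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import NMF

# 400 grayscale face photos, each flattened to a 4,096-pixel row
# (downloads a small public dataset the first time it runs).
faces = fetch_olivetti_faces(shuffle=True, random_state=0).data

errors = {}
for k in (5, 10, 20, 40):  # candidate numbers of hidden features
    model = NMF(n_components=k, init="nndsvda", max_iter=300, random_state=0)
    usage = model.fit_transform(faces)        # how strongly each face uses each feature
    errors[k] = model.reconstruction_err_     # how well k features rebuild the photos

for k, err in errors.items():
    print(f"{k:>3} features -> reconstruction error {err:.1f}")

# Each row of model.components_ is a learned facial "part" (an eye, nose or mouth
# region); reshaping a row to 64 x 64 pixels and plotting it makes that visible.

The error keeps shrinking as features are added; the practical question, and the one SmartTensors answers automatically, is where the payoff levels off.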

In other datasets, the features needed to represent the whole might not be so obvious. Very large collections of data, measured in billions of millions of bytes (petabytes), typically are made up of unknown features obscured by a torrent of less useful information and noise.

Vast datasets, such as COVID-19 test results or readings from earthquake sensors, are built entirely from things we can observe directly. But in big-data analytics, it is difficult to link these observables to the underlying processes that drive the system's behavior and generate the data. These hidden processes, or features, are not directly observable, and they are confusingly mixed with one another, with unimportant features and with noise.
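As a toy illustration of that mixing, the short sketch below builds "observed" data as blends of a few hidden processes plus noise; the three processes, the 20 sensors and the noise level are all invented for the example (plain Python with NumPy, nothing from SmartTensors). The blended, noisy rows are the only thing an analyst would ever get to see.

import numpy as np

rng = np.random.default_rng(0)

# Three hidden processes we never observe directly (all invented for illustration).
hidden = np.vstack([
    np.sin(np.linspace(0, 8, 500)) ** 2,            # a smooth periodic process
    np.exp(-np.linspace(0, 5, 500)),                 # a slowly decaying process
    (rng.random(500) < 0.02).astype(float),          # rare spike events
])

# Each of 20 "sensors" records its own blend of the hidden processes, plus noise.
mixing = rng.random((20, 3))
observed = mixing @ hidden + 0.05 * rng.normal(size=(20, 500))

print(observed.shape)  # (20, 500): the mixed, noisy data is all we have to work with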

Cocktail party problem

The problem is similar to picking out individual voices at a noisy cocktail party using a set of microphones that record all the chatter. How do you isolate one or more conversations while people are moving around and talking over one another? The number of hidden features here is the number of individual voices, each with its own characteristics, such as pitch and tone. Once that's determined, it's easier to follow a conversational thread or a particular person.
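A classic way to attack this in code is blind source separation. The sketch below, assuming Python with scikit-learn, mixes three made-up "voices" into four simulated microphone recordings and then unmixes them with independent component analysis (FastICA). SmartTensors itself is built on non-negative matrix and tensor factorizations rather than ICA, so treat this as an illustration of recovering hidden sources, not as the Laboratory's method.

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
t = np.linspace(0, 8, 2000)

# Three "voices" (simple stand-in signals for speakers at the party).
voices = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t)), rng.laplace(size=t.size)]

# Four microphones, each hearing its own mixture of the three voices.
mic_gains = rng.random((3, 4))
recordings = voices @ mic_gains              # shape (2000, 4): what we actually record

# Assume we already know (or have estimated) that there are 3 hidden sources.
ica = FastICA(n_components=3, random_state=0)
recovered = ica.fit_transform(recordings)    # shape (2000, 3): the separated "voices"

print(recordings.shape, recovered.shape)

The recovered signals come back in arbitrary order and scale, which is typical of blind separation; matching them to particular speakers is a second step.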

Similarly, to sort out the important information in a dataset, SmartTensors organizes the information into a data cube, or tensor, with three or more dimensions. Each dimension is a particular category of information within the data. In the cocktail party example, the pitch of a voice might be one dimension, its tonal qualities another, its volume a third, and so on. If you think of the data cube as a stack of many small cubes, each small cube holds information about some or all of the features in the data. Representing the data as a tensor allows fast processing as the AI churns through it all.
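To make the data-cube idea concrete, here is a minimal sketch that builds a small three-way tensor and factors it into three hidden features, with one factor vector per dimension per feature. It assumes Python with the open-source TensorLy library and its non-negative PARAFAC routine, and it assumes the number of features (three) up front; finding that number automatically from the data is the part SmartTensors adds, and its factorization methods differ from this off-the-shelf routine.

import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

rng = np.random.default_rng(2)

# A toy data cube: 10 speakers x 6 pitch bands x 50 time windows (all invented).
cube = tl.tensor(rng.random((10, 6, 50)))

# Factor the cube into 3 hidden features; each feature gets one vector per dimension.
cp = non_negative_parafac(cube, rank=3, n_iter_max=200)
weights, factors = cp
speakers, pitches, times = factors
print(speakers.shape, pitches.shape, times.shape)   # (10, 3) (6, 3) (50, 3)

# How closely do just 3 features reproduce the full cube?
error = tl.norm(cube - tl.cp_to_tensor(cp)) / tl.norm(cube)
print(f"relative reconstruction error: {error:.2f}")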

As you might expect, we've applied SmartTensors to more important problems than separating individual conversations at a cocktail party. SmartTensors is helping us understand climate processes, watershed mechanisms, hidden geothermal resources, carbon sequestration processes, chemical reactions, protein structures, pharmaceutical molecules, cancerous mutations in human genomes, and more. In a world swimming in big data, this kind of tool just might help us all keep our heads above water.

Boian Alexandrov is an AI expert and principal investigator on the SmartTensors project in the Physics and Chemistry of Materials group at Los Alamos National Laboratory. Velimir "Monty" Vesselinov is an expert in machine learning, data analytics and model diagnostics in the Computational Earth Science group at Los Alamos, and also a principal investigator on the project. SmartTensors was funded by the Laboratory Directed Research and Development (LDRD) program at Los Alamos. For more information, visit the SmartTensors website.