Fortunately for Sarah Connor, Skynet in the Terminator movies didn’t have the technology we have now. If it had, the T-800 would have had no problem telling one Sarah Connor from another. Researchers have figured out a way to use machine learning to make AIs far better at “disambiguating” different people with the same name.
Traditionally, AI has had difficulty telling apart different people with similar identifying information. If you have three John Smiths on record, how does the system know whether new data about a John Smith relates to one of those already on record or to an entirely new John Smith not yet registered with the system?
Solving that problem is the challenge that computer scientists Murat Dundar and Mohammad al Hasan, and doctoral student Baichuan Zhang, all of Indiana University-Purdue University Indianapolis, set themselves. They claim to have improved on older methods by using machine learning and a variety of sources to work out who’s who across different data sets.
“We can teach the computer to recognize names and disambiguate information accumulated from a variety of sources — Facebook, Twitter and blog posts, public records and other documents — by collecting features such as Facebook friends and keywords from people’s posts using the identical algorithm,” said Hasan (via Phys). “Our proposed method is scaleable and will be able to group records belonging to a unique person even if thousands of people have the same name, an extremely complicated task.”
That is impressive in its own right, but the algorithm that Hasan and his co-developers have created can also work with a non-exhaustive dataset. That is, it does not need to know in advance every person who shares a name, and it can adapt to new information that expands the list of possible people. As new records are fed into the system, it can recognize whether they belong to people it already knows or to people it has never encountered before.
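To make the idea of a non-exhaustive dataset concrete, here is a minimal sketch in Python. It is not the researchers’ algorithm: a simple overlap score and a fixed cut-off stand in for whatever model they actually use, and the names NameRegistry, Person, add_record and the 0.2 threshold are all assumptions made for illustration. Each incoming record either joins the most similar known person or, if nothing on file is close enough, becomes a new one.

```python
from dataclasses import dataclass, field


def jaccard(a: set, b: set) -> float:
    """Share of items two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Person:
    """One known individual: the pooled features of every record assigned to them."""
    features: set = field(default_factory=set)
    record_count: int = 0


class NameRegistry:
    """Hypothetical registry for records that all carry the same name."""

    def __init__(self, threshold: float = 0.2):
        # Assumed cut-off for illustration, not a value from the research.
        self.threshold = threshold
        self.people = []

    def add_record(self, features: set) -> int:
        """Assign a record to the best-matching person, or register a new one.

        Returns the index of the person the record ended up with.
        """
        best_index, best_score = -1, 0.0
        for index, person in enumerate(self.people):
            score = jaccard(features, person.features)
            if score > best_score:
                best_index, best_score = index, score

        if best_index == -1 or best_score < self.threshold:
            # Nothing on file is similar enough: treat this as a new person.
            self.people.append(Person())
            best_index = len(self.people) - 1

        # Fold the record's features into the chosen person's profile.
        person = self.people[best_index]
        person.features |= features
        person.record_count += 1
        return best_index
```

In this sketch, two records that share friends end up under the same index, while a record with nothing in common with anyone on file is given a fresh index, so the list of people grows only when the data calls for it.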
It does this by looking at particular information that identifies individuals and shows how they are connected. It looks at friends and associations, the way people talk or write (their diction and syntax), and the groups they belong to, such as organizations or audiences at certain events.
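As a rough illustration of how such signals could be combined, the Python sketch below scores two profiles on their overlap in friends, keywords and groups. Everything in it is an assumption rather than the researchers’ method: the weights are made up, the profiles are invented sample data, and shared keywords are only a crude stand-in for diction and syntax.

```python
def jaccard(a: set, b: set) -> float:
    """Share of items two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Assumed weights for illustration only; they sum to 1 so scores stay in [0, 1].
WEIGHTS = {"friends": 0.5, "keywords": 0.3, "groups": 0.2}


def profile_similarity(p: dict, q: dict, weights: dict = WEIGHTS) -> float:
    """Weighted overlap between two profiles across each kind of feature."""
    return sum(
        weight * jaccard(set(p.get(kind, ())), set(q.get(kind, ())))
        for kind, weight in weights.items()
    )


# Invented sample profiles, all filed under the same name.
smith_a = {
    "friends": {"ana", "raj", "li"},
    "keywords": {"marathon", "training"},
    "groups": {"city runners"},
}
smith_b = {
    "friends": {"ana", "raj", "tom"},
    "keywords": {"marathon", "shoes"},
    "groups": {"city runners"},
}
smith_c = {
    "friends": {"zoe"},
    "keywords": {"quantum", "lattice"},
    "groups": {"physics society"},
}

# smith_a and smith_b overlap on every kind of feature; smith_c shares nothing.
print(profile_similarity(smith_a, smith_b) > profile_similarity(smith_a, smith_c))
```

The first two profiles share friends, a keyword and a group, so they score well above the third, which has nothing in common with either. A real system would need a far richer model of writing style than a bag of shared keywords.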
Moving forward, the developers plan to keep refining their algorithms to improve the system’s accuracy and its ability to scale. They see big applications for it in the future, from automated database management for large organizations to recognizing potentially dangerous individuals in a security context.