The Internal Revenue Service is collecting a lot more than taxes this year--it's also acquiring a huge volume of personal information on taxpayers' digital activities, from eBay auctions to Facebook posts and, for the first time ever, credit card and e-payment transaction records, as it expands its search for tax cheats to places it's never gone before.
The IRS, under heavy pressure to help Washington out of its budget quagmire by chasing down an estimated $300 billion in revenue lost to evasions and errors each year, will start using "robo-audits" of tax forms and third-party data that it hopes will help close this so-called "tax gap." But the agency reveals little about how it will employ its vast new network-scanning powers.
Tax lawyers and watchdogs are concerned about the sweeping changes being implemented with little public discussion or clear guidelines, and Congressional staff sources say the IRS use of "big data" will be a key issue when the next IRS chief comes to the Senate for approval. Acting commissioner Steven T. Miller replaced Douglas Shulman last November.
"It's well-known in the tax community, but not many people outside of it are aware of this big expansion of data and computer use," says Edward Zelinsky, a tax law expert and professor at Benjamin N. Cardozo School of Law and Yale Law School. "I am sure people will be concerned about the use of personal information on databases in government, and those concerns are well-taken. It's appropriate to watch it carefully. There should be safeguards." He adds that taxpayers should know that whatever people do and say electronically can and will be used against them in IRS enforcement.
IRS's big data tracking. Consumers are already familiar with Internet "cookies" that track their movements and send them targeted ads that follow them to different websites. The IRS has brought in private industry experts to employ similar digital tracking--but with the added advantage of access to Social Security numbers, health records, credit card transactions and many other privileged forms of information that marketers don't see.
"Private industry would be envious if they knew what our models are," boasted Dean Silverman, the agency's high-tech top gun who heads a group recruited from the private sector to update the IRS, in a comment reported in trade publications. The IRS did not respond to a request for an interview.
In trade presentations and public documents, the agency has said it will use a massively parallel computer system that can analyze data from different networks to find irregularities and suspicious activities.
Much of the work already has been automated to process and analyze electronic tax returns in current "robo-audits" that flag unusual behavior patterns. With IRS audit staff reduced by budget cuts this year, the agency will be forced to rely on computer-generated audits more than ever.
The agency declined to comment on how it will use its new technology. But agency officials have been outlining plans at industry conferences, working with IBM, EMC and other private-sector specialists. In presentations, officials have said they may use the big data for:
-- Charting and analyzing social media such as Facebook
-- Targeting audits by matching tax filings to social media or electronic payments
-- Tracking individual Internet addresses and emailing patterns
-- Sorting data in 32,000 categories of metadata and 1 million unique "attributes"
-- Machine learning across "neural" networks
-- Statistical and agent-based modeling
-- Relationship analysis based on Social Security numbers and other personal identifiers
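The IRS has not described how any of these techniques work in practice, so the following is only a minimal sketch of what "matching tax filings to electronic payments" could look like in principle. The `Filing` record, the `flag_mismatches` function, the 10 percent tolerance and all figures are hypothetical and are not drawn from any IRS system.

```python
from dataclasses import dataclass


@dataclass
class Filing:
    taxpayer_id: str         # hypothetical identifier, not a real ID format
    reported_income: float   # income declared on the return


def flag_mismatches(filings, third_party_totals, tolerance=0.10):
    """Return IDs whose third-party payment totals exceed reported
    income by more than the given tolerance (an assumed threshold)."""
    flagged = []
    for filing in filings:
        observed = third_party_totals.get(filing.taxpayer_id, 0.0)
        if observed > filing.reported_income * (1 + tolerance):
            flagged.append(filing.taxpayer_id)
    return flagged


# Made-up example data for illustration only.
filings = [Filing("A", 50_000.0), Filing("B", 40_000.0)]
payments = {"A": 52_000.0, "B": 70_000.0}
print(flag_mismatches(filings, payments))
```

In this toy example only taxpayer "B" is flagged, because the payment total is well above the declared income. A mismatch of this kind would be a prompt for closer review, not proof of evasion.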
Officials have said much of the data will be used only for research. The agency's economic forecasts and data are a key part of Washington's budget infrastructure. Former commissioner Douglas Shulman said in an IRS statement that the technology will employ "billions of pieces of data" to target enforcement and to "detect and combat noncompliance."
U.S. Tax Court records show that information gathered from Facebook and eBay postings has been used by the IRS in defending tax challenges. Privacy advocates at the Electronic Frontier Foundation obtained the IRS's 38-page manual, used to train auditors to search Internet addresses, Facebook postings and other social media to back audit enforcements, through a Freedom of Information Act disclosure and published it.
In practice, the third-party data has been used only if the irregular returns merit more attention. In one much-cited example, IRS officials talk about prisoners who were filing false claims for energy tax credits for window replacements.
The agency, wary of public opinion about invasive audit practices, has pulled back from using so-called "social audits," which, for example, might single out horse-racing enthusiasts or sailboaters for special attention. But by screening existing data for one million unique attributes, the agency can quietly create a DNA-like code to understand the economic behavior of any individual.
The IRS last year used a profiling test model to study 1,500 tax preparers with histories of reporting deficiencies and managed to recover $200 million. It cited the experience as proof that its data analysis works. Early this year, however, a new set of rules it developed for tax preparers was thrown out by a federal court, which said the agency had overstepped its mandate. The IRS would not comment on whether the rules were based on its new screening tools.
Lots of computing power, for what? The agency's computers can now load all U.S. tax returns in just 10 hours, compared with the four months it took just eight years ago, Jeff Butler, IRS director of research databases, told the IBM TechAmerica conference last November. That leaves a lot of time for other uses. The IRS says it expects 80 percent of its tax returns to be filed electronically this year. That makes a total of 250 million returns filed, with $2 trillion in revenue.
But processing those returns uses only a fraction of the agency's computing power. An entire year of tax returns amounts to 15 terabytes, or a little over 1 percent of the IRS storage of 1.2 petabytes (a petabyte is one quadrillion bytes of information), based on public data from IRS presentations. The agency has expanded its data capacity by 1,000 percent in the past six years.
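As a rough check on that storage share, the arithmetic can be sketched as follows. The 15-terabyte and 1.2-petabyte figures are the ones quoted from IRS presentations. The decimal conversion (1 petabyte = 1,000 terabytes) is an assumption, and the result shifts slightly under binary units or with rounding in the source figures.

```python
# Figures as quoted in the article; the unit convention is an assumption.
TB_PER_PB = 1000  # assumed decimal units: 1 petabyte = 1,000 terabytes

annual_returns_tb = 15   # one year of tax returns, in terabytes
total_storage_pb = 1.2   # reported IRS storage, in petabytes


def share_of_storage(used_tb: float, total_pb: float) -> float:
    """Return used_tb as a percentage of total_pb."""
    return used_tb / (total_pb * TB_PER_PB) * 100


share = share_of_storage(annual_returns_tb, total_storage_pb)
print(f"One year of returns uses about {share:.2f}% of storage")
```

On these assumptions a full year of returns takes up about 1.25 percent of the reported capacity, which leaves nearly 99 percent for other data.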
It also recently assembled $350 million in high-tech tools to audit, track and analyze what people do on the Internet. The agency has used social media and other third-party sources in the past, but it has now increased its capability to do so from its own growing database of networks.
Congressional staffers on the House Ways and Means Committee and the Joint Committee on Taxation, both of which oversee the IRS, say they have been occupied by more pressing issues related to the budget crisis, and Congress gave the tax officials leeway to use technology to solve the growing problem of identity theft. But they said they will look at the possibility of errors in robo-audits as well as the storage of data on millions of taxpayers.
The IRS is guarded about how its audits are triggered, tax experts say, because too much information on what it does might help tax cheats. Major accounting firms have been given little information on the changes and were reluctant to comment, although some said privately that they are aware of the new IRS tools but that it is too early to tell how they will be used. Taxpayer advocacy groups also say they are waiting to see how the IRS manages its technology upgrade, and are holding out hope that it will make taxes more fair and efficient and force tax evaders to pay their share of the overall burden.
While many applaud the effort to update government technology with private-sector tools, they say the agency needs to conform to higher standards.
"I don't really see strong legal regulation in place to manage something of this magnitude," says Paul Schwartz, University of California law professor and co-director of the Berkeley Center for Law & Technology. The IRS is working with the same kind of oversight and rules that were developed in the paper tax-return era, says Schwartz. But with the technology it now has, the agency can "see into people's lives" as never before.
Tax returns are like narratives of how people spent their money, and tax audits have been guided by "reasonable" interpretations of allowable credits and deductions by the IRS agents who manage audits. "Social media can make people testify against themselves," Schwartz says. "They provide a counter-narrative." He cites as an example a businessperson going to Florida for five meetings over a week who also visits family in Miami. A casual Google+ posting to friends online about "visiting my mother in Florida" could paint a different picture than the deduction taken on the tax form.
"It will be interesting to see what the IRS does with all of their new tools. They will have to be very careful," says Schwartz. So, too, will taxpayers.