Making Big Data Better
Government CIO Magazine
Big Data is currently the hot topic in information and technology management. But before an agency can analyze and process Big Data, it has to collect it. For federal regulatory and enforcement agencies, the collection process must be done in ways that are least disruptive yet still accurate and timely. Agencies such as the Food and Drug Administration (FDA) and the Departments of Justice (DOJ) and the Treasury—whose regulatory and enforcement missions involve partners and stakeholders across government, industry and the public—have done just that. These agencies have shown that successful data collection can happen by focusing on core business processes, adopting recognized data collection and transmission standards, and leveraging collaborative governance frameworks.
Data collection is a business problem
Successful data collection starts with focusing on the business processes the agency drives, contributes to or affects, and on the data integral to those processes. Rather than collecting a vast array of information, collection should focus on the specific analyses those processes require. Agencies should also understand whether a regulatory or enforcement data collection is already a subset of the data industry produces for its own purposes or is an added burden. As illustrated below, thoughtful, focused collection of data that industry already produces is likely to pose fewer data integrity and quality challenges.
FDA has invested heavily in data collection and management strategies that expedite the review and approval of new pharmaceutical and medical device products. The FDA website provides information to help industry, the medical and public health communities, and the public electronically prepare and submit documents such as product and product labeling applications or adverse event reports. For example, FDA’s Center for Devices and Radiological Health (CDRH) encourages manufacturers to use data and terminology standards in pre-market submissions and post-market reports for medical devices, noting that product review occurs more quickly when manufacturers use the CDRH-published standards. Standardized electronic applications have increased data quality and timeliness and contributed to reductions in total CDRH review time and approval backlogs.
Similarly, the federal agencies that collectively supervise the nation’s financial institutions collaborated via the Federal Financial Institutions Examination Council to develop common reporting requirements for financial institutions. Examples include the Uniform Bank Performance Report, which allows analysis of how bank management decisions and economic conditions impact the institutions’ stability, and the quarterly Consolidated Report of Condition and Income, generally referred to as the “Call Report,” which provides data regarding a bank’s financial condition and the results of its operations. The results include more consistency in the regulatory process and more timely access to financial institutions’ information for regulators, the public and other stakeholders.
Developing and adopting data standards
Adopting widely accepted data standards is key to effective data collection: standards foster the common vocabulary and formats that make preparing and submitting reports to regulatory agencies easier. Standards also facilitate rapid data retrieval, analysis and distillation, and contribute to improved data quality and integrity. However, agencies should not develop data standards unilaterally; instead, they should use standards that already exist whenever possible.
FDA, for example, relies heavily on electronic health and research information sharing standards developed and managed by industry-led non-profit organizations such as Health Level Seven International (HL7) and the Clinical Data Interchange Standards Consortium (CDISC). FDA’s commitment to data standards is exemplified in a position statement released in September: “FDA envisions a semantically interoperable and sustainable submission environment that serves both regulated clinical research and health care. To this end, FDA will continue to research and evaluate, with its stakeholders, potential new approaches to current and emerging data standards. FDA does not foresee the replacement of CDISC standards for study data and will not implement new approaches without public input on the cost and utility of those approaches.” One example of FDA’s adoption of HL7 standards is CDRH’s electronic Medical Device Reporting program, which reports device issues to public health, patient safety and quality improvement organizations. Leveraging the HL7 standards, health care providers can create HL7 ICSR-compliant (Individual Case Safety Report) adverse event submissions themselves or use FDA’s eSubmitter application.
A broad-based example of standards adoption is the National Information Exchange Model (NIEM). NIEM is a set of data-sharing processes and tools used to describe data in consistent, clear ways so that data sets can be exchanged across entities within a specific community of interest (e.g., law enforcement). Used today across many disciplines, including health, finance and public safety, NIEM was developed initially by DOJ and the Department of Homeland Security to share law enforcement information across federal, state, local and tribal law enforcement agencies. The NIEM website (https://www.niem.gov) provides numerous examples of how NIEM has been used at all levels of government. For example, DOJ used NIEM to build the data definitions and the data publication, search and retrieval standards for the National Data Exchange System (NDEx), a law enforcement records repository managed by the FBI but used by law enforcement agencies nationwide. The NDEx system dramatically increases the ability of law enforcement agencies to coordinate investigations across jurisdictions and to understand emerging patterns of crime, while preserving the individual case systems of each participating agency.
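In practice, a NIEM exchange is an XML document whose elements are drawn from the shared NIEM Core vocabulary plus community- or agency-specific extensions defined in an Information Exchange Package Documentation (IEPD). The sketch below illustrates that layering in Python; the namespace URIs, element names and identifier values are simplified stand-ins for illustration, not taken from an actual NIEM release.

```python
import xml.etree.ElementTree as ET

# Illustrative stand-ins for the NIEM Core namespace and an agency-specific
# extension namespace; a real exchange uses the namespaces its IEPD defines.
NC = "http://example.org/niem-core"
EXT = "http://example.org/agency-extension"
ET.register_namespace("nc", NC)
ET.register_namespace("ext", EXT)

def build_incident_exchange(incident_id: str, person_name: str) -> str:
    """Assemble a minimal NIEM-style exchange document that mixes
    extension elements (the exchange root) with shared core elements."""
    root = ET.Element(f"{{{EXT}}}IncidentReport")
    ident = ET.SubElement(root, f"{{{NC}}}ActivityIdentification")
    ET.SubElement(ident, f"{{{NC}}}IdentificationID").text = incident_id
    person = ET.SubElement(root, f"{{{NC}}}Person")
    ET.SubElement(person, f"{{{NC}}}PersonFullName").text = person_name
    return ET.tostring(root, encoding="unicode")

doc = build_incident_exchange("2013-000123", "Jane Q. Public")
```

A real exchange would also be validated against the IEPD’s schemas; the point here is simply that a shared core vocabulary and an agency extension combine in one well-defined document that any participating agency can parse.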
Leveraging collaborative governance
Agency data collection efforts are most effective when the objectives and standards are developed and adopted through governance processes built on trust and collaboration and, whenever possible, interagency and industry-wide participation. Regardless of statutory mandates, data quality, integrity and timeliness suffer when agencies fail to win cooperation from the entities from which data are sought. The most effective way to achieve this “buy-in” is to leverage governance structures that either are independent or include substantial participation from key stakeholders.
An interesting example of governance is the model being used to develop a global legal entity identifier (LEI) for financial institutions. The Treasury’s Office of Financial Research is leading this initiative to create the means to uniquely identify companies participating in global financial markets and the resulting financial linkages between firms. Doing so will help private firms and government regulators better understand financial system risks. The LEI is being developed through an international public-private cooperative involving the Group of 20 finance ministers and central bank governors from the world’s largest economies and a Private Sector Preparatory Group, which ultimately will ensure all parties implement the LEI consistently. Although still under development, the LEI process is a model because it recognizes the need to involve all key stakeholders, which increases the likelihood that data collected via LEI will be meaningful, accurate and timely.
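The identifier itself, specified in ISO 17442, is a 20-character alphanumeric code whose last two characters are check digits computed with the ISO 7064 MOD 97-10 scheme (the same family used for IBAN numbers), so any party can verify an LEI was transcribed correctly before using it. A minimal validation sketch in Python; the sample prefix below is fabricated for illustration, not an issued LEI.

```python
def lei_check_digits(prefix18: str) -> str:
    """Compute the two ISO 7064 MOD 97-10 check digits for the first
    18 characters of an LEI (letters A-Z map to 10-35, digits to themselves)."""
    expanded = "".join(str(int(ch, 36)) for ch in prefix18.upper())
    return f"{98 - int(expanded + '00') % 97:02d}"

def lei_is_valid(lei: str) -> bool:
    """A full 20-character LEI is valid when its expanded digit string,
    taken as one integer modulo 97, equals 1."""
    lei = lei.strip().upper()
    if len(lei) != 20 or not lei.isalnum():
        return False
    return int("".join(str(int(ch, 36)) for ch in lei)) % 97 == 1

# Fabricated 18-character prefix, completed with its computed check digits:
sample = "EXAMPLE000000000AB" + lei_check_digits("EXAMPLE000000000AB")
```

The check digits catch single-character transcription errors, which matters when the same identifier must line up across regulators, exchanges and counterparties worldwide.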
Governance also was key to improving law enforcement information sharing after September 11, 2001. NIEM and information systems such as NDEx were developed and adopted successfully through existing governance and collaboration bodies, such as the Criminal Justice Information System Advisory Policy Board and the Global Justice Information Sharing Initiative. Working with DOJ, these bodies mobilized the national law enforcement community to support changes in information sharing policy and protocols, the development and maturation of new data standards (e.g., NIEM), and investments in new information sharing systems and infrastructure (e.g., NDEx).
The old adage is “garbage in, garbage out.” Data processing tools and technology are important to managing the massive data sets federal agencies collect. But if the information fed into those tools is not relevant, accurate or timely, the resulting agency analyses and actions are unlikely to be well-founded. Agencies that rely on data collected from other stakeholders need to work collaboratively with those stakeholders to identify meaningful data that can be collected in the least burdensome manner, using standards agreed to by all parties, and to leverage independent, interdisciplinary governance bodies that ensure agency stakeholders are involved in decisions about regulatory and enforcement data collections.