Best Practices and Recommendations to Promote Data Sharing in Government (from a TECHNOLOGY perspective)

Craig Parisot
Jan 11, 2017

On December 15th, ATA, LLC, along with Data Community DC and the City Innovate Foundation, hosted the Data Sharing Forum. The forum was held in conjunction with the Federal Data in Action Summit, sponsored by the White House Office of Science and Technology Policy (OSTP) and the National Technical Information Service (NTIS). We focused on bringing together Subject Matter Experts (SMEs) from the data technology and Government policy fields. The SMEs spoke openly about the roadblocks, issues, solutions, and successes they have encountered while working with federal, commercial, regional, and state policies and technology systems. The discussions covered common topics, ranging from making data discoverable through proper metadata and cataloging to how effective policy should be established to force the organizational and cultural changes needed to improve data sharing and access. Ultimately, the common threads in the discussions, the overall format of the forum, and the participation of every panelist and audience member produced excellent takeaways and a positive outlook on the future of data sharing and access within the Government.

This series of blog posts will detail recommendations and best practices from two panels: one examined data sharing from a technology perspective and the other from a policy perspective.

First, I’ll focus on recommendations and best practices from the Technology Panel.

Standardization of data (going back to redo all data versus using APIs to translate it into a common language)

Standardizing all pre-existing data into one common language would be labor-intensive, taxing, and costly. A better approach is to develop and deploy representational state transfer (REST) APIs between pre-existing servers and databases. These APIs act as a translation layer that presents the pre-existing data in a common language for cross-domain communication and new application services, without requiring the stored data to be rewritten. "You can't reinvent the engine while it's still flying." The core issue is how individual organizations standardize their data. Each one maintains its own governance, records data in its own language, and does not account for cross-domain accessibility. This is more a standardization issue than a technical problem, because data translators can be developed and deployed. Going forward, data should be uniformly standardized. Until that happens, the focus should be on developing translators that convert all data into a common language.
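
A translation layer of this kind can be sketched in a few lines. The sketch below is an illustration only, not a production design. It assumes each source system can hand back records as Python dictionaries. The agency names, field names, and mapping tables (FIELD_MAPS, NATIVE_RECORDS) are all hypothetical. A small WSGI application built only on the standard library returns each agency's records under one shared vocabulary at a REST-style URL such as /agency_b, and it never modifies the underlying data.

```python
import json
from wsgiref.simple_server import make_server

# Hypothetical mappings: each agency keeps its native column names, and the
# translation layer maps them onto one shared (common-language) vocabulary.
FIELD_MAPS = {
    "agency_a": {"zip_cd": "postal_code", "pop_est": "population"},
    "agency_b": {"ZIP": "postal_code", "POPULATION_2015": "population"},
}

# Hypothetical records, stored as-is in each agency's native format.
NATIVE_RECORDS = {
    "agency_a": [{"zip_cd": "22102", "pop_est": 25000}],
    "agency_b": [{"ZIP": "20001", "POPULATION_2015": 40000}],
}


def to_common_schema(record, field_map):
    """Rename native fields to common names; unmapped fields pass through unchanged."""
    return {field_map.get(key, key): value for key, value in record.items()}


def app(environ, start_response):
    """WSGI app: GET /<agency> returns that agency's records in the common schema."""
    agency = environ.get("PATH_INFO", "/").strip("/")
    headers = [("Content-Type", "application/json")]
    if agency not in FIELD_MAPS:
        start_response("404 Not Found", headers)
        return [json.dumps({"error": "unknown source"}).encode("utf-8")]
    translated = [to_common_schema(r, FIELD_MAPS[agency]) for r in NATIVE_RECORDS[agency]]
    start_response("200 OK", headers)
    return [json.dumps(translated).encode("utf-8")]


def serve(port=8000):
    """Run the translation API on localhost (blocks until interrupted)."""
    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

To try it locally, call serve() and request http://localhost:8000/agency_b. The response should contain the agency_b record with its fields renamed to postal_code and population. The mapping lives in data rather than in code, so a new source can be connected by writing one more mapping table, with no need to reformat its stored data.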

The need for data discoverability

Data becomes easily discoverable when it is properly documented and catalogued with metadata that uses common language in its descriptors and tags. The organization that maintains governance of a dataset should avoid jargon and any other language that keeps analysts from discovering the data. Data must be clearly labeled and defined during cataloguing. Each entry should include a clear description of the dataset, the organization that maintains governance of the data (i.e. the data owner), who collected the data, and the original questions that led to its collection. Analysts, IT professionals, and data scientists should also submit feedback reports on datasets. These reports promote refinement and cleaning of the data and keep dataset descriptions accurate and of high quality. Finally, all data should be shared through the data community's open-source spaces so that analysts, data scientists, and multiple organizations can view and make use of it.
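
To make the recommended descriptors concrete, the sketch below models a catalog record and a plain keyword search over it. It is a minimal illustration under stated assumptions. The class, its field names, and the example entry (including the "City Department of Buildings") are hypothetical and do not follow any specific government metadata standard. The search is a simple case-insensitive substring match.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogEntry:
    """One catalog record carrying the descriptors recommended above.
    Field names are illustrative, not a formal metadata standard."""
    title: str
    description: str               # clear, jargon-free summary of the dataset
    data_owner: str                # organization that maintains governance
    collected_by: str              # who actually collected the data
    original_questions: List[str]  # questions that led to the collection
    tags: List[str] = field(default_factory=list)  # plain-language keywords

    def searchable_text(self) -> str:
        """Join every descriptor into one lowercase string for matching."""
        parts = [self.title, self.description, self.data_owner, self.collected_by]
        parts += self.original_questions + self.tags
        return " ".join(parts).lower()


def search(catalog: List[CatalogEntry], query: str) -> List[CatalogEntry]:
    """Return entries whose descriptors contain every query term as a substring."""
    terms = query.lower().split()
    return [e for e in catalog if all(t in e.searchable_text() for t in terms)]


# Hypothetical example entry.
catalog = [
    CatalogEntry(
        title="Residential building permits, 2016",
        description="Permits issued for new homes and renovations, by ZIP code.",
        data_owner="City Department of Buildings",
        collected_by="Permit intake staff",
        original_questions=["Where is new housing being built?"],
        tags=["housing", "construction", "permits"],
    ),
]
```

With this entry, search(catalog, "new housing") finds the permit dataset even though "housing" does not appear in its title. That is the payoff of recording plain-language tags and the original questions alongside the description.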

Completing tasks quickly and efficiently

Building on existing enterprise systems with tools that developers already know lowers cost, because it eliminates the need to design a new system. The Agile method supports rapid deployment and testing. It also helps ensure that a product's capabilities and direction match what the customer needs for its mission. A DevOps team helps implement products quickly and efficiently. The DevOps flow for deploying new toolsets and apps into the environment is quick and agile because it avoids bureaucratic slowdowns and can use containers to move applications from the DevOps environment into the enterprise environment. It also shortens the time needed to stand up a test environment and to go from concept to deployment.

Using the Agile method, integrating existing enterprise systems, and having a DevOps team together enable rapid deployment while cutting both time and cost.

Documentation of Data to ensure its usefulness and value

All data should be viewed through the lens of data neutrality, the idea that all data is useful, because it is almost impossible to determine data's value in advance. In addition, organizations may not use the data they maintain to its full potential, because funding is allocated only to mission-essential tasks. Giving innovators, data scientists, and industry access to the data allows it to be analyzed and put to use in projects the organization would not fund. Therefore, documentation of all data should be clear, written in common language, properly catalogued, accurate, and automated. IT professionals and data scientists should continually refine and clean the data based on recommendations and feedback from other data scientists, analysts, and users. This will improve the data's discoverability and increase its value.
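
One way to automate part of this documentation is to generate a basic data dictionary directly from a file. The sketch below assumes CSV input and uses only the standard library. The function names are hypothetical, and the type guessing is deliberately crude. It only tries integers, then decimals, and otherwise falls back to text. The output is a starting point that a data owner would still describe in plain language.

```python
import csv
import io


def guess_type(values):
    """Guess 'integer', 'decimal', or 'text' from a column's non-empty values."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if not values:
        return "unknown"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    return "text"


def profile_csv(text):
    """Build a simple data dictionary: one entry per column with a guessed type,
    the number of empty values, and up to three example values."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    dictionary = []
    for name in reader.fieldnames or []:
        raw = [row[name] or "" for row in rows]  # short rows yield None -> ""
        non_empty = [v for v in raw if v.strip()]
        dictionary.append({
            "column": name,
            "type": guess_type(non_empty),
            "missing": len(raw) - len(non_empty),
            "examples": non_empty[:3],
        })
    return dictionary
```

For example, suppose profile_csv runs on a small permits file whose date column has one blank cell. It would report that column as text with one missing value, which flags the gap for the cleaning and feedback loop described above.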

Time to take on the biggest technology roadblocks and difficulties

The location, policies, and governance of the data are the most difficult roadblocks. Retrieving data is difficult because the data is often inaccessible where it is stored. Agencies format their data however they choose, which leaves little chance of common access or discoverability. The governance and culture surrounding the data cause problems as well. If the culture and policies do not permit data sharing, then there is no data to share.

The next post will focus on recommendations and best practices from the Policy Panel.

About the Author: Craig Parisot is the CEO of Advanced Technology Applications (ATA, LLC) of McLean, Virginia. The company focuses on full-stack data science engineering and provides strategy, infrastructure, analytic, and security solutions across multiple industry sectors for commercial and government clients. Craig is also the organizer of the Full Stack Data Science Meetup, run in collaboration with Data Community DC, and an angel and growth-stage technology investor.