DataOps: Toward an Incremental Data Process

Date: July 2, 2021

Data science projects are reported to fail at rates as high as 85%, despite their importance to the business. Integrating data analytics into core Information Technology (IT) capabilities can be elusive and daunting.

“If we consider IT projects two-dimensional with requirements versus implementation, data projects are three-dimensional. The third dimension is needed to uncover data gems, even though the requirements don’t know where the gems are and what they look like,” says Xiaolin Li, director of Data Engineering and Analytics at T-Rex.

The concept of Data Operations (DataOps) was introduced in 2014, and in recent years it has grown into an emerging branch of IT operations. The key components of DataOps include:

  • Information Architecture: Understanding data context and usage environment; developing data taxonomy and methodology for incremental data analysis.
  • Data Integration and Automation: Integrating data results for continuous delivery; automating process flow and validating output through iterative measurement and feedback.
  • Data Governance: Maintaining a collaborative data catalog with end-to-end data support from scientists, analysts, engineers, and clients; incorporating data knowledge and analysis into Agile development.
  • Data Security: Integrating security into each stage of the data lifecycle; addressing the need to protect a wide variety of data workloads.

In T-Rex’s 2020 Census Technical Integration (TI) implementation, a recent national data analytics project, the team adopted a DataOps-based approach and achieved the project goal. The project was the first of its kind because of the new response data collection. At the start of the project, the team faced technical challenges such as data evaluation and performance enhancement. Those issues were resolved through a data-centric, iterative, and highly cooperative process. When the 2020 Census ended last year, the team finished the complex, multi-faceted analysis of 300 million national responses critical to the mission.

Throughout the project, the team treated data as a software component and integrated it into the coding cycles. Test data were treated like code: built and released bi-weekly. In each cycle, the data were adjusted to follow the evolving requirements and calibrated to improve model accuracy. Daily integration tests, which covered both code and data, were fully automated so that team members could address issues early. “The DataOps recipe was a key contributing factor to the project’s success,” Xiaolin Li points out. “We created the analytics model incrementally, built and deployed it automatically, and reviewed it constantly.”
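To make the idea of treating test data like code more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the team's actual implementation: the `DataRelease` class, the `REQUIRED_FIELDS` schema, the field names, and the `count_large_households` function are all hypothetical. It shows a versioned data snapshot with a checksum, a schema check on the data, and a single integration check that tests the data first and then the code against it.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRelease:
    """A versioned snapshot of test data, released alongside code (hypothetical)."""
    version: str
    records: tuple  # tuple of dicts, one per response

    def checksum(self) -> str:
        # Stable hash of the contents so a release can be pinned and compared.
        payload = json.dumps(list(self.records), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"response_id": str, "household_size": int}


def validate(release):
    """Return a list of readable problems; an empty list means the data passed."""
    problems = []
    seen_ids = set()
    for i, rec in enumerate(release.records):
        for name, typ in REQUIRED_FIELDS.items():
            if not isinstance(rec.get(name), typ):
                problems.append(
                    f"record {i}: field '{name}' missing or not {typ.__name__}"
                )
        rid = rec.get("response_id")
        if rid is not None:
            if rid in seen_ids:
                problems.append(f"record {i}: duplicate response_id {rid!r}")
            seen_ids.add(rid)
    return problems


def count_large_households(records, threshold=5):
    """Stand-in for the analytics code under test (hypothetical)."""
    return sum(1 for r in records if r["household_size"] >= threshold)


def integration_check(release, expected_large):
    """Check the data first, then run the code against that data."""
    problems = validate(release)
    if problems:
        return False, problems
    actual = count_large_households(release.records)
    if actual != expected_large:
        return False, [f"expected {expected_large} large households, got {actual}"]
    return True, []
```

A scheduled job could call `integration_check` on the current release and fail the build when it returns `False`, so that data problems and code problems surface through the same feedback loop.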

Global data volume is predicted to grow 23% annually, according to a recent IDC report.1 Those who understand data better will gain a competitive edge and be better positioned for project success. DataOps is expected to play an ever larger role in the data industry moving forward.

Learn more about T-Rex’s Data Engineering and Analytics capability.

1  International Data Corporation, “Data Creation and Replication Will Grow at a Faster Rate than Installed Storage Capacity, According to the IDC Global DataSphere and StorageSphere,” March 24, 2021, https://www.idc.com/getdoc.jsp?containerId=prUS47560321

 

