DevOps for Data and your Data Project

DevOps for Data

DevOps is one of the hottest trends in the software industry, and successful DevOps implementation is the goal of most progressive IT organizations (see chart below, courtesy of Google Trends). DevOps (short for development and operations) is a set of automated software practices that combine software development (Dev), testing, and IT operations (Ops) to shorten the software development life cycle while delivering features, fixes, and updates frequently and in alignment with the business’s objectives.

DevOps is typically cross-functional (people from different IT-related business units) and uses different software tools. These tools usually fit into one or more of the following categories:

Coding – code development and review, source code management tools, code merging
Building – continuous integration tools (like Jenkins), build status
Testing – continuous testing tools (like QuerySurge, Selenium, Cucumber, JMeter) that provide feedback on business risks
Packaging – artifact repository, application pre-deployment staging
Releasing – change management, release approvals, release automation
Configuring – infrastructure configuration and management, infrastructure as code tools
Monitoring – applications performance monitoring, end-user experience

While we’re at it, let’s add a few more terms to the DevOps movement:

Continuous Integration (CI). Continuous Integration is about automating build and test processes to make sure the resulting software is in a good state, ideally every time a developer changes code. CI helps development teams avoid integration issues where the software works on individual developers’ machines, but it fails when all developers combine their code.

Continuous Delivery (CD). Continuous Delivery goes one step further to automate a software release, which typically involves packaging the software for deployment in a production-like environment. The goal of CD is to make sure the software is always ready to go to production, even if the team decides not to do it for business reasons.

Continuous Deployment (also CD). Continuous Deployment goes one step further than Continuous Delivery. With this practice, every change that passes all stages of your production pipeline is released to your customers. There’s no human intervention, and only a failed test will prevent a new change from being deployed to production.

Continuous Testing. One of the hottest buzz terms in the testing world, continuous testing is the process of executing automated tests as part of the delivery pipeline to obtain immediate feedback on the business risks associated with a release candidate. Continuous testing cannot be implemented without test automation.
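
The gating behavior described in these terms can be illustrated with a minimal sketch. It is not tied to any particular CI/CD tool: the stage names and the run_pipeline function are hypothetical, and each stage is assumed to be a callable that returns True on success. Stages run in order, and a single failure blocks the release.

```python
from typing import Callable, List, Tuple

# A stage is a (name, callable) pair; the callable returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.

    Returns (deployed, log). deployed is True only if every stage passed,
    which mirrors continuous deployment: only a failed stage blocks release.
    """
    log: List[str] = []
    for name, step in stages:
        passed = step()
        log.append(f"{name}: {'passed' if passed else 'FAILED'}")
        if not passed:
            # Later stages are not executed once one stage fails.
            log.append("deploy: skipped")
            return False, log
    log.append("deploy: released")
    return True, log


if __name__ == "__main__":
    deployed, log = run_pipeline([
        ("build", lambda: True),
        ("unit tests", lambda: True),
        ("data validation", lambda: False),  # simulated failing data test
    ])
    print(deployed)
    print("\n".join(log))
```

In a continuous delivery setup, the final "released" step would instead wait for a manual approval. The automated gates before it stay the same.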

DevOps principles demand strong interdepartmental communication and rely heavily on automation tools.

There are 6 primary goals of DevOps:

Increase the speed of development and release processes
Make builds more reliable
Shorten turnaround for new features and bug fixes
Improve scalability of applications and infrastructure
Increase security by automating compliance practices
Improve collaboration throughout the development lifecycle

The movement to bring DevOps-style automation to data has now grown stronger. These practices, often referred to as DevOps for Data or DataOps, apply DevOps tools and techniques to data. Data is growing geometrically, and automating the development, deployment, and validation/testing of data is becoming more critical as businesses implement BI & Analytics to make sense of their data and leverage it for competitive advantage.

According to Gartner:

“DataOps is a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization. The goal of DataOps is to deliver value faster by creating predictable delivery and change management of data, data models and related artifacts. DataOps uses technology to automate the design, deployment and management of data delivery with appropriate levels of governance, and it uses metadata to improve the usability and value of data in a dynamic environment.”

Why DevOps and DevOps for Data?

Enterprise companies are adopting DevOps concepts to speed up deployment, thereby providing both internal and external customers with more builds, more features and better quality faster than ever. The only way to accomplish more and faster builds is to automate the entire process, which is where DevOps comes in.

IDC Survey: Primary Driver for Scaling DevOps

Why DevOps for Data and Testing?

Development teams need to validate the data in an ETL process when the process completes. Operations teams need to validate the new data every day. And both teams need to collaborate on this. How can you solve this?

The Continuous Testing Process

The Development team builds and runs unit tests as ETL code is developed, for immediate testing as code is committed, catching issues in the ETL code quickly and reducing remediation costs.

The QA team designs and executes tests during the development cycle to provide the development team with quick feedback on each ETL build deployed, helping to pinpoint where defects appear in the code.

The Operations team executes tests automatically — after ETL execution — on a regular daily cycle.
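
A minimal sketch of the kind of post-ETL check each of these teams might automate is shown below. It uses Python’s built-in sqlite3 module with two in-memory tables standing in for the source and target data stores. The table names (src_orders, tgt_orders) and the functions validate_load and build_demo_db are assumptions made for illustration, not part of any specific product.

```python
import sqlite3


def validate_load(conn: sqlite3.Connection, source: str, target: str) -> dict:
    """Compare a source table with its ETL target.

    Table names are interpolated into the SQL, so pass trusted identifiers only.
    """
    cur = conn.cursor()
    source_rows = cur.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
    target_rows = cur.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    # Rows in the source with no exact match in the target (lost or altered).
    missing = cur.execute(
        f"SELECT * FROM {source} EXCEPT SELECT * FROM {target}"
    ).fetchall()
    # Rows in the target with no exact match in the source.
    unexpected = cur.execute(
        f"SELECT * FROM {target} EXCEPT SELECT * FROM {source}"
    ).fetchall()
    return {
        "source_rows": source_rows,
        "target_rows": target_rows,
        "missing": missing,
        "unexpected": unexpected,
        # EXCEPT ignores duplicates, so row counts are compared as well.
        "passed": source_rows == target_rows and not missing and not unexpected,
    }


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one deliberately mismatched row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_orders (id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE tgt_orders (id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO src_orders VALUES (?, ?)",
        [(1, 10.0), (2, 20.5), (3, 7.25)],
    )
    conn.executemany(
        "INSERT INTO tgt_orders VALUES (?, ?)",
        [(1, 10.0), (2, 20.5), (3, 7.0)],  # amount for id 3 was altered
    )
    return conn


if __name__ == "__main__":
    report = validate_load(build_demo_db(), "src_orders", "tgt_orders")
    print(report)
```

The same function could run as a unit test at commit time, as a QA check on each ETL build, or on a daily schedule after ETL execution. Only the trigger changes.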

“Tests in DataOps have a role in both the Value and Innovation Pipelines. In the Value Pipeline, tests monitor the data values flowing through the data factory to catch anomalies or flag data values outside statistical norms. In the Innovation Pipeline, tests validate new analytics before deploying them”

- DataKitchen
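
As a rough illustration of a Value Pipeline test that flags “data values outside statistical norms,” the sketch below compares new values against a historical baseline. The function name flag_outliers and the three-standard-deviation threshold are assumptions chosen for the example. Real pipelines may use different statistics or thresholds.

```python
from statistics import mean, stdev
from typing import List, Sequence


def flag_outliers(
    history: Sequence[float], new_values: Sequence[float], k: float = 3.0
) -> List[float]:
    """Return the new values lying more than k standard deviations
    from the mean of the historical baseline.

    history must contain at least two values so that the sample
    standard deviation is defined.
    """
    mu = mean(history)
    sigma = stdev(history)
    lower, upper = mu - k * sigma, mu + k * sigma
    return [v for v in new_values if not (lower <= v <= upper)]


if __name__ == "__main__":
    baseline = [100, 102, 98, 101, 99]   # e.g. daily row counts from past loads
    todays_values = [101, 97, 250, 104]  # 250 is far outside the baseline
    print(flag_outliers(baseline, todays_values))
```

With this baseline the mean is 100 and the sample standard deviation is about 1.58, so only 250 falls outside the accepted band.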

Comparing DevOps to DataOps process, from DataKitchen

DevOps for Data and Test Automation Solutions

There are many commercial solutions in the data validation and ETL testing space (see our article on ETL Testing). But only a subset provides a full API for integration into a CI/CD pipeline. A robust data testing solution should be able to automate the data validation of source and target data stores with full DevOps functionality for continuous testing.

A robust DevOps for Data solution should also integrate with virtually all DevOps solutions in the marketplace. A small subset of vendors (e.g., QuerySurge) provide full RESTful and command-line APIs that let you create and modify source and target test queries, connections to data stores, tests associated with an execution suite, and new staging tables from various data connections, and customize flow controls based on run results.

Other benefits should include:

Data quality at speed. Validate up to 100% of all data up to 1,000x faster than traditional testing.
Test automation. Automate your data testing, from kicking off tests, to performing the validation, to emailing the results and updating your test management system.
Testing across platforms, whether a Big Data lake, Data Warehouse, traditional database, NoSQL document store, BI reports, flat files, JSON files, SOAP or RESTful web services, XML, mainframe files, or any other data store.
DevOps for Data and Continuous Testing functionality. Integration with Data Integration/ETL solutions, Build/Configuration solutions, and QA/Test Management solutions through full RESTful and CLI APIs.
Analysis of data, with analytics dashboards and data intelligence reports.

The Executive Office & Critical Data

About QuerySurge

QuerySurge is the leading data validation and testing solution in the market. QuerySurge automates the data validation and ETL testing of Big Data, Data Warehouses, Business Intelligence Reports and Enterprise Applications with full DevOps functionality for continuous testing.

With QuerySurge DevOps for Data:

Testers can dynamically generate, execute, and update tests and data stores utilizing API calls
Teams have access to 100+ API calls with hundreds of different properties
Testers can try QuerySurge’s RESTful API functions in Swagger to see what results they return before they use them live in their code.
QuerySurge integrates with virtually all DevOps solutions in the marketplace, including all Data Integration/ETL solutions, Build/Configuration solutions, and QA/ Test Management solutions

About QuerySurge and Swagger

Swagger is an open source Interface Description Language (IDL) for describing RESTful APIs expressed using JSON. Swagger is used together with a set of open-source software tools to design, build, document, and use RESTful web services. Swagger includes automated documentation, code generation (into many programming languages), and test-case generation.

And Swagger is embedded in QuerySurge.

What does all that mean? It means that as the QuerySurge team adds functions to the RESTful API, the Swagger documentation is generated automatically, so the API is self-documenting.

It also means you can test out QuerySurge’s RESTful API functions in Swagger to see what results they return before you use them live in your code. Pretty cool stuff.

QuerySurge DevOps for Data’s RESTful API features include the ability to create and modify:

source and target test queries
connections to data stores
tests associated with an execution suite
new staging tables from various data connections
custom flow controls based on run results

Below are some sample Use Cases that can be utilized in a DataOps pipeline with QuerySurge DevOps for Data:

Use Case Example #1
QuerySurge tests are automatically initiated after ETL execution completes and conditional logic is applied based on specific results of those executions.
Use Case Example #2
When new environments are deployed, new connections are automatically created in QuerySurge and tests are duplicated to test against this environment.

The Use Cases are practically endless as QuerySurge DevOps for Data provides the flexibility to integrate your data testing process into your existing DevOps implementation.
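
The flow in Use Case Example #1 can be sketched generically as follows. This is an illustration only and does not use QuerySurge’s actual API: run_suite is a hypothetical callable standing in for whatever triggers a test suite (for example, a REST or command-line call), and SuiteResult, after_etl, and the returned action names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SuiteResult:
    """Outcome of one test-suite execution (hypothetical shape)."""
    passed: int
    failed: int


def after_etl(
    etl_succeeded: bool,
    run_suite: Callable[[], SuiteResult],
    max_failures: int = 0,
) -> str:
    """Decide the next pipeline action once the ETL job has finished.

    run_suite stands in for the call that executes the data tests.
    """
    if not etl_succeeded:
        # No point validating data from a load that did not complete.
        return "skip-tests"
    result = run_suite()
    if result.failed == 0:
        return "promote"
    if result.failed <= max_failures:
        # Tolerated number of failures: continue, but surface a warning.
        return "promote-with-warning"
    return "halt"


if __name__ == "__main__":
    # Simulated suite run: 48 tests passed, 2 failed.
    action = after_etl(True, lambda: SuiteResult(passed=48, failed=2), max_failures=1)
    print(action)
```

A scheduler or CI server would map each returned action to its own follow-up step, such as promoting the build, notifying the team, or stopping downstream jobs.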

We are now in a full DevOps for Data movement to validate and automate the testing of the data pipeline. We’re just at the beginning of this movement but we are all-in on building the right solution into QuerySurge to automate the continuous testing process.

Have any questions or want to give QuerySurge DevOps for Data a spin? Let us know if we can assist you in any way. Please send questions or comments to [email protected]

Download a full version of QuerySurge DevOps for Data

Download a full version of QuerySurge DevOps for Data and experience all of the powerful features that will transform your testing process.

With QuerySurge DevOps for Data:

choose between a RESTful API or a Command-Line API
dynamically generate, execute, and update tests and data stores utilizing API calls
access 60+ API calls with almost 100 different properties
integrate with virtually all DevOps solutions in the marketplace

Register Here

Get Certified as a DevOps-for-Data Tester now!

Learn, Earn and Inform.

Learn everything about DevOps and DataOps from our self-paced training portal
Earn your digital badge from a renowned digital credential network and
Inform your social media community of your new certification

Our latest certification is Certified DevOps for Data Testers. Gain a fundamental understanding of DevOps processes and objectives. Acquire knowledge on how to implement both DevOps and DevOps-for-Data within an organization. Become familiar with DevOps tools and learn to design, create and execute a CI/CD pipeline.


The training and certification exam are free for customers and partners! We also run quarterly promotions that make them free for prospects and for members of certain LinkedIn groups (like the QuerySurge Group).

Learn More

Schedule a Private Demo of automated ETL Testing Leader QuerySurge

Schedule a private virtual meeting with our experts and your team. Receive a live, guided, interactive tour of QuerySurge’s features and workflows. Get answers to any questions specific to QuerySurge and your company, your industry or your architecture.

Schedule it now