DevOps for Data and your Data Project
Everything you need to know about DevOps for Data and how to successfully automate the data validation & testing of your DataOps pipeline
DevOps for Data
DevOps is one of the hottest trends in the software industry, and successful DevOps implementation is the goal of most progressive IT organizations (see chart below, courtesy of Google Trends). DevOps (short for development and operations) is a set of automated software practices that combine software development (Dev), testing and IT operations (Ops) to shorten the software development life cycle while delivering features, fixes, and updates frequently in alignment with the business's objectives.
DevOps is typically cross-functional (people from different IT-related business units) and uses different software tools. These tools usually fit into one or more of the following categories:
- Coding – code development and review, source code management tools, code merging
- Building – continuous integration tools (like Jenkins), build status
- Testing – continuous testing tools (like QuerySurge, Selenium, Cucumber, JMeter) that provide feedback on business risks
- Packaging – artifact repository, application pre-deployment staging
- Releasing – change management, release approvals, release automation
- Configuring – infrastructure configuration and management, infrastructure as code tools
- Monitoring – application performance monitoring, end-user experience
While we’re at it, let’s add a few more terms to the DevOps movement:
Continuous Integration (CI). Continuous Integration is about automating build and test processes to make sure the resulting software is in a good state, ideally every time a developer changes code. CI helps development teams avoid integration issues where the software works on individual developers’ machines, but it fails when all developers combine their code.
Continuous Delivery (CD). Continuous Delivery goes one step further to automate a software release, which typically involves packaging the software for deployment in a production-like environment. The goal of CD is to make sure the software is always ready to go to production, even if the team decides not to do it for business reasons.
Continuous Deployment (also CD). Continuous deployment goes one step further than continuous delivery. With this practice, every change that passes all stages of your production pipeline is released to your customers. There’s no human intervention, and only a failed test will prevent a new change from being deployed to production.
Continuous Testing. One of the hottest buzz terms in the testing world, continuous testing is the process of executing automated tests as part of the delivery pipeline to obtain immediate feedback on the business risks associated with a release candidate. Continuous testing cannot be implemented without test automation.
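To make the gating idea behind these definitions concrete, here is a minimal Python sketch of a pipeline that runs its stages in order and stops at the first failure. The stage names and the `run_pipeline` helper are hypothetical, chosen only for illustration; a real CI/CD tool expresses the same idea in its own pipeline configuration.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a check that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first one that fails.

    Returns (released, names_of_stages_that_passed).
    """
    passed: List[str] = []
    for name, check in stages:
        if not check():
            # A failed stage blocks everything after it, including the release.
            return False, passed
        passed.append(name)
    return True, passed


# Hypothetical stages; the lambdas stand in for real build and test steps.
stages: List[Stage] = [
    ("build", lambda: True),
    ("unit tests", lambda: True),
    ("data validation tests", lambda: False),  # simulated failing test
    ("deploy", lambda: True),
]

released, passed = run_pipeline(stages)
print(released, passed)  # False ['build', 'unit tests']
```

Because the simulated data validation stage fails, the deploy stage never runs, which is the behavior continuous deployment relies on.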
DevOps principles demand strong interdepartmental communication and rely heavily on automation tools.
There are six primary goals of DevOps:
- Faster development and release processes
- More reliable builds
- Shorter turnaround for new features and bug fixes
- Greater scalability of applications and infrastructure
- Increased security through automated compliance practices
- Improved collaboration throughout the development lifecycle
The movement to bring a DevOps-style automated process to data has also grown stronger. These practices are often referred to as DevOps for Data or DataOps, and they apply DevOps tools and techniques to data. Data volumes are growing rapidly, and automating the development, deployment and validation/testing of data is becoming more critical as businesses implement BI & Analytics to make sense of their data and to turn it into a competitive advantage.
According to Gartner:
“DataOps is a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization. The goal of DataOps is to deliver value faster by creating predictable delivery and change management of data, data models and related artifacts. DataOps uses technology to automate the design, deployment and management of data delivery with appropriate levels of governance, and it uses metadata to improve the usability and value of data in a dynamic environment.”
Why DevOps and DevOps for Data?
Enterprise companies are adopting DevOps concepts to speed up deployment, thereby providing both internal and external customers with more builds, more features and better quality faster than ever. The only way to accomplish more and faster builds is to automate the entire process, which is where DevOps comes in.
Why DevOps for Data and Testing?
Development teams need to validate data in an ETL process when the process is completed. Operations teams need to validate the new data every day. And both teams need to collaborate on this. How can you solve this?
The Continuous Testing Process
The Development team builds and runs unit tests as ETL code is developed, for immediate testing as code is committed, catching issues in the ETL code quickly and reducing remediation costs.
The QA team designs and executes tests during the development cycle to provide the development team with quick feedback on each ETL build deployed, helping to pinpoint where defects appear in the code.
The Operations team executes tests automatically — after ETL execution — on a regular daily cycle.
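As an illustration of the development team's part of this process, the sketch below unit-tests a small ETL transformation with Python's standard `unittest` module. The `normalize_customer` function and its fields are hypothetical stand-ins for real ETL code; the point is only that such tests can run automatically on every commit.

```python
import unittest
from typing import Dict


def normalize_customer(row: Dict[str, str]) -> Dict[str, object]:
    """Hypothetical ETL transform: cast the id, trim the name, upper-case the country."""
    return {
        "id": int(row["id"]),
        "name": row["name"].strip(),
        "country": row["country"].strip().upper(),
    }


class NormalizeCustomerTest(unittest.TestCase):
    def test_trims_and_normalizes_fields(self) -> None:
        raw = {"id": "42", "name": "  Ada Lovelace ", "country": "gb "}
        expected = {"id": 42, "name": "Ada Lovelace", "country": "GB"}
        self.assertEqual(normalize_customer(raw), expected)

    def test_rejects_non_numeric_id(self) -> None:
        # int("n/a") raises ValueError, so bad ids fail fast instead of loading.
        with self.assertRaises(ValueError):
            normalize_customer({"id": "n/a", "name": "x", "country": "us"})


def run_tests() -> bool:
    """Run the suite programmatically, as a CI job might, and report success."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeCustomerTest)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

A CI job could call `run_tests()` (or simply run `python -m unittest`) and fail the build when it returns `False`.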
“Tests in DataOps have a role in both the Value and Innovation Pipelines. In the Value Pipeline, tests monitor the data values flowing through the data factory to catch anomalies or flag data values outside statistical norms. In the Innovation Pipeline, tests validate new analytics before deploying them.”
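One simple way to flag "data values outside statistical norms" is a standard-deviation check against recent history. The following Python sketch is an assumption about how such a test might look, not a description of any particular product; the `find_outliers` name and the three-sigma default are illustrative choices.

```python
from statistics import mean, stdev
from typing import List


def find_outliers(
    history: List[float], new_values: List[float], max_sigma: float = 3.0
) -> List[float]:
    """Return the new values lying more than max_sigma sample standard
    deviations from the historical mean.

    history must contain at least two values, because stdev() needs them.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly constant history: any different value is an anomaly.
        return [v for v in new_values if v != mu]
    return [v for v in new_values if abs(v - mu) / sigma > max_sigma]


# Hypothetical daily order totals from earlier loads, then today's batch.
history = [100.0, 102.0, 98.0, 101.0, 99.0]
today = [101.0, 250.0]
print(find_outliers(history, today))  # [250.0]
```

In practice the threshold, and whether a mean-based rule is appropriate at all, depends on how the data is distributed.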
DevOps for Data and Test Automation Solutions
There are many commercial solutions in the data validation and ETL testing space (see article on ETL Testing here). But only a subset provides a full API for integration into a CI/CD pipeline. A robust data testing solution should be able to automate the data validation of source and target data stores with full DevOps functionality for continuous testing.
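To show what validating source against target data can mean at its simplest, here is a hedged Python sketch that runs one query against each store and reports the rows that differ. It uses two in-memory SQLite databases as stand-ins for a source system and a data warehouse; the table names, column names and the `compare_source_to_target` helper are all hypothetical.

```python
import sqlite3
from collections import Counter
from typing import Dict, List, Tuple


def compare_source_to_target(
    source: sqlite3.Connection,
    source_query: str,
    target: sqlite3.Connection,
    target_query: str,
) -> Dict[str, List[tuple]]:
    """Compare two query results row by row, ignoring row order.

    Counter (a multiset) is used so duplicate rows are counted, not collapsed.
    """
    src = Counter(source.execute(source_query).fetchall())
    tgt = Counter(target.execute(target_query).fetchall())
    return {
        "missing_in_target": list((src - tgt).elements()),
        "unexpected_in_target": list((tgt - src).elements()),
    }


def build_demo_stores() -> Tuple[sqlite3.Connection, sqlite3.Connection]:
    """Create a hypothetical source table and a target table with a defect."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER)")
    source.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(1, 1000), (2, 2550), (3, 799)]
    )
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE fact_orders (order_id INTEGER, amount_cents INTEGER)")
    # Order 2 was loaded with the wrong amount and order 3 was dropped.
    target.executemany("INSERT INTO fact_orders VALUES (?, ?)", [(1, 1000), (2, 2500)])
    return source, target


source, target = build_demo_stores()
diff = compare_source_to_target(
    source, "SELECT id, amount_cents FROM orders",
    target, "SELECT order_id, amount_cents FROM fact_orders",
)
print(diff)
```

Pulling full result sets into memory only works for small tables; dedicated tools exist precisely because real source and target stores are far larger and live on different platforms.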
And a robust DevOps for Data solution should integrate with virtually all DevOps solutions in the marketplace. A small subset of vendors (e.g., QuerySurge) provide full RESTful and command-line APIs that give you the ability to create and modify source and target test queries, connections to data stores, tests associated with an execution suite, and new staging tables from various data connections, and to customize flow controls based on run results.
Other benefits should include:
- Data quality at speed. Validate up to 100% of all data up to 1,000x faster than traditional testing.
- Test automation. Automate your data testing from the kickoff of tests to performing the validation to automated emailing of the results and updating your test management system.
- Test across platforms, whether a Big Data lake, Data Warehouse, traditional database, NoSQL document store, BI reports, flat files, JSON files, SOAP or RESTful web services, XML, mainframe files, or any other data store.
- DevOps for Data and Continuous Testing functionality. Integration with Data Integration/ETL solutions, Build/Configuration solutions, and QA/Test Management solutions. Some vendors (e.g., QuerySurge) provide full RESTful and CLI APIs that give you the ability to create and modify source and target test queries, connections to data stores, tests associated with an execution suite, and new staging tables from various data connections, and to customize flow controls based on run results.
- Analysis of data, with analytics dashboards and data intelligence reports.
About QuerySurge
QuerySurge is the leading data validation and testing solution in the market. QuerySurge automates the data validation and ETL testing of Big Data, Data Warehouses, Business Intelligence Reports and Enterprise Applications with full DevOps functionality for continuous testing.
With QuerySurge DevOps for Data:
- Testers can dynamically generate, execute, and update tests and data stores utilizing API calls
- Teams have access to 100+ API calls with hundreds of different properties
- Testers can try QuerySurge’s RESTful API functions in Swagger to see what results they return before they use them live in their code.
- QuerySurge integrates with virtually all DevOps solutions in the marketplace, including all Data Integration/ETL solutions, Build/Configuration solutions, and QA/Test Management solutions
About QuerySurge and Swagger
Swagger is an open source Interface Description Language (IDL) for describing RESTful APIs expressed using JSON. Swagger is used together with a set of open-source software tools to design, build, document, and use RESTful web services. Swagger includes automated documentation, code generation (into many programming languages), and test-case generation.
And Swagger is embedded in QuerySurge.
What does all that mean? It means that as the QuerySurge team adds functions to the RESTful API, the Swagger documentation is generated automatically. So the API is self-documenting.
It also means you can test out QuerySurge’s RESTful API functions in Swagger to see what results they return before you use them live in your code. Pretty cool stuff.
QuerySurge DevOps for Data’s RESTful API features include the ability to create and modify:
- source and target test queries
- connections to data stores
- tests associated with an execution suite
- new staging tables from various data connections
- custom flow controls based on run results
Below are some sample Use Cases that can be utilized in a DataOps pipeline with QuerySurge DevOps for Data:
- Use Case Example #1: QuerySurge tests are automatically initiated after ETL execution completes, and conditional logic is applied based on specific results of those executions.
- Use Case Example #2: When new environments are deployed, new connections are automatically created in QuerySurge and tests are duplicated to test against this environment.
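The logic behind these two use cases can be sketched in plain Python. Everything below is hypothetical and does not correspond to the QuerySurge API: `SuiteResult`, `next_action`, `DataTest` and `clone_tests_for_environment` are illustrative names, and in a real pipeline the inputs and outputs would go through the vendor's RESTful or command-line API.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import List


class Action(Enum):
    PROMOTE = "promote build"
    NOTIFY = "notify data team"
    HALT = "halt pipeline"


@dataclass(frozen=True)
class SuiteResult:
    """Hypothetical summary of one test-suite run after an ETL execution."""
    passed: int
    failed: int
    errored: int


def next_action(result: SuiteResult) -> Action:
    """Use case #1: choose the next pipeline step from the run results."""
    if result.errored > 0 or (result.passed + result.failed) == 0:
        # Tests that could not run, or no tests at all, tell us nothing.
        return Action.HALT
    if result.failed > 0:
        return Action.NOTIFY
    return Action.PROMOTE


@dataclass(frozen=True)
class DataTest:
    """Hypothetical test definition: a query bound to a named connection."""
    name: str
    connection: str
    query: str


def clone_tests_for_environment(
    tests: List[DataTest], connection: str, label: str
) -> List[DataTest]:
    """Use case #2: copy existing tests so they target a new environment."""
    return [
        replace(t, name=f"{t.name} ({label})", connection=connection)
        for t in tests
    ]


base = [DataTest("row counts", "dw-dev", "SELECT COUNT(*) FROM fact_orders")]
staging = clone_tests_for_environment(base, "dw-staging", "staging")
print(staging[0].name, "->", staging[0].connection)
print(next_action(SuiteResult(passed=10, failed=0, errored=0)))
```

Keeping the decision rule in a small pure function like `next_action` makes the flow control itself easy to unit-test before it is wired into a pipeline.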
The Use Cases are practically endless as QuerySurge DevOps for Data provides the flexibility to integrate your data testing process into your existing DevOps implementation.
We are now in a full DevOps for Data movement to validate and automate the testing of the data pipeline. We’re just at the beginning of this movement but we are all-in on building the right solution into QuerySurge to automate the continuous testing process.
Have any questions or want to give QuerySurge DevOps for Data a spin? Let us know if we can assist you in any way. Please send questions or comments to [email protected]
Download a full version of QuerySurge DevOps for Data
Download a full version of QuerySurge DevOps for Data and experience all of the powerful features that will transform your testing process.
Get Certified as a DevOps-for-Data Tester now!
Learn, Earn and Inform.
- Learn everything about DevOps and DataOps from our self-paced training portal
- Earn your digital badge from a renowned digital credential network and
- Inform your social media community of your new certification
Our latest certification is Certified DevOps for Data Testers. Gain a fundamental understanding of DevOps processes and objectives. Acquire knowledge on how to implement both DevOps and DevOps-for-Data within an organization. Become familiar with DevOps tools and learn to design, create and execute a CI/CD pipeline.
And the training and cert exam are free for customers and partners! We also run quarterly promotions that make them free for prospects and members of certain LinkedIn groups (like the QuerySurge Group).