White Paper
Operationalizing Data Trust:
A Comprehensive Analysis of Automated Validation
in the Databricks Lakehouse Architecture
1. The Imperative of Data Integrity in the Lakehouse Era
The ascent of the Data Lakehouse architecture, epitomized by the Databricks platform, represents a watershed moment in enterprise data management. By converging the flexibility of data lakes with the performance and governance of data warehouses, organizations have unlocked unprecedented capabilities in artificial intelligence, machine learning, and real-time analytics. However, this architectural unification introduces a complex set of challenges regarding data integrity. As data volume, velocity, and variety explode, the traditional manual methods of validation - often described as "stare and compare" - have become dangerously obsolete. In this landscape, integrating QuerySurge, a purpose-built automated data testing solution, with Databricks offers a robust mechanism to enforce quality. This report provides an exhaustive examination of how QuerySurge complements Databricks, with a specific focus on validating the Medallion Architecture (Bronze, Silver, Gold layers) and automating Quality Gates within a DataOps CI/CD pipeline.
1.1 The Erosion of Trust in Modern Data Systems
The fundamental promise of the Lakehouse is to serve as a "Single Source of Truth." Yet, without rigorous, automated validation, this promise often devolves into a "Data Swamp." The consequences of data defects are severe: financial reports may be misstated, recommendation engines may serve irrelevant content, and predictive maintenance models may fail to detect critical equipment faults.
In traditional data warehousing, the "Schema-on-Write" paradigm enforced a level of structure at the point of ingestion. If data did not match the schema, the load failed. The Lakehouse, particularly in its initial ingestion layers (Bronze), often employs "Schema-on-Read" or flexible ingestion to accommodate the chaotic nature of big data streams (IoT logs, clickstream data, JSON dumps). While this flexibility is a feature, it is also a vector for defects. Schema drift, truncated files, and silent data corruption can propagate unnoticed through the pipeline until they manifest in a boardroom dashboard.
QuerySurge addresses this "Trust Gap" by introducing an independent validation layer. Unlike internal consistency checks (such as Delta Live Tables Expectations, which are part of the pipeline logic), QuerySurge acts as an external auditor. It verifies data independently of the transformation logic, ensuring that the data in the Lakehouse accurately reflects the source systems and business rules. This distinction is critical: verifying that the code did what the developer intended is different from verifying that the data is correct relative to the business reality.
1.2 The Strategic Role of Independent Validation
The concept of "Black Box" testing is central to QuerySurge’s value proposition in a Databricks environment. In software engineering, it is axiomatic that the developer who writes the code should not be the sole tester of that code. The same principle applies to data engineering. If a Data Engineer writes a PySpark transformation in Databricks to calculate "Net Sales," and then writes a unit test using the same logic to verify it, they have only proved that their code is consistent with itself, not that it is factually correct.
QuerySurge allows Quality Assurance (QA) professionals and Data Stewards to define validation rules that are completely decoupled from the implementation logic. Connecting via JDBC to both the upstream Source (e.g., an SAP ERP, a Mainframe, or a Kafka stream) and the downstream Target (Databricks Delta Tables), QuerySurge compares the data point-for-point. This approach catches logic errors, transformation bugs, and ingestion failures that internal checks might miss. Furthermore, by automating this process via APIs, organizations can move from reactive "fire-fighting" to proactive "DataOps," where quality is continuously monitored.
2. Architectural Synergy: Integrating QuerySurge with Databricks
To understand the validation capabilities, one must first analyze the architectural interplay between QuerySurge and Databricks. QuerySurge is not a SaaS tool that resides in the Databricks cloud; rather, it is a distributed software platform deployed within the customer's infrastructure (on-premise or private cloud). This architecture ensures data sovereignty and security, a critical requirement for regulated industries like finance and healthcare.
2.1 The Hub-and-Spoke Validation Architecture
QuerySurge operates on a distributed "Hub-and-Spoke" model designed for scalability and parallel processing.
2.1.1 The Application Server (The Hub)
The core of the system is the Application Server. This component acts as the orchestrator. It manages the repository of test assets ("QueryPairs"), handles user authentication and role-based access control (RBAC), manages test scheduling, and aggregates results. The App Server does not execute the data comparisons itself; instead, it delegates work to the Agents. This separation of concerns prevents the management interface from becoming a bottleneck during high-volume testing.
2.1.2 The Agents (The Spokes)
The Agents are the execution engines of QuerySurge. These are lightweight Java applications that can be deployed on multiple virtual machines or Docker containers. When a validation scenario is triggered, the App Server dispatches QueryPairs to available Agents.
- Execution Flow: The Agent receives the SQL for the Source and the SQL for the Target. It opens simultaneous JDBC connections to both systems.
- Data Retrieval: The Agent sends the queries to the respective systems. For Databricks, the query is executed on the Databricks Compute Cluster (or SQL Warehouse), leveraging the massive parallel processing power of Spark.
- Comparison: The results of the queries are streamed back to the Agent. The Agent then performs the comparison algorithm (e.g., row-by-row matching, primary key matching) and reports the status (Pass/Fail) back to the Hub.
This architecture is significant for Big Data validation because it allows for horizontal scaling. If the validation workload increases (e.g., verifying 100 tables simultaneously), an organization can simply spin up more Agents to parallelize the throughput without altering the core infrastructure.
2.2 Connectivity Mechanics: The JDBC Bridge
The lifeline between QuerySurge and Databricks is the JDBC (Java Database Connectivity) standard. While Databricks is built on Apache Spark, it exposes a SQL interface that allows it to behave like a traditional relational database for external tools.
2.2.1 Driver Configuration
To enable connectivity, the specific Databricks JDBC Driver must be deployed to every QuerySurge Agent. The recommended driver is the DatabricksJDBC4x.jar.
- Driver Class: The configuration uses the class com.databricks.client.jdbc.Driver. It is imperative to avoid the legacy Simba drivers (com.simba.spark...) to ensure compatibility with modern Databricks features such as the Photon execution engine and Unity Catalog.
2.2.2 The Connection String
The JDBC URL defines the parameters of the connection. A correctly configured URL ensures not only connectivity but also performance and security.
- Format: jdbc:databricks://<ServerHostname>:<Port>/<HTTPPath>;transportMode=http;SSL=1;AuthMech=<Type>;
- Server Hostname: This points to the Databricks Workspace control plane.
- HTTP Path: This routes the request to the specific Compute Cluster or SQL Warehouse. Using a dedicated SQL Warehouse for validation is often recommended over an interactive cluster to ensure resource isolation and consistent performance.
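For illustration, a fully assembled URL for personal-access-token authentication against a hypothetical SQL Warehouse (hostname, warehouse ID, and token values are placeholders, not real endpoints):

```
jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default;transportMode=http;SSL=1;AuthMech=3;httpPath=/sql/1.0/warehouses/abc123def456;UID=token;PWD=<personal-access-token>
```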
2.3 Security and Authentication Models
In enterprise environments, managing credentials securely is paramount. QuerySurge supports modern authentication flows that align with Azure and AWS security best practices.
2.3.1 Service Principal Authentication
For automated pipelines (CI/CD), relying on human user credentials (username/password) is fragile and insecure. Passwords expire, and employees leave organizations. QuerySurge supports Azure Active Directory (Entra ID) Service Principals.
- Implementation: The JDBC string utilizes AuthMech=11 (for Azure AD). The OAuth2ClientId and OAuth2Secret are passed as parameters.
- Benefit: This treats QuerySurge as a distinct "non-human identity" in the Databricks workspace. Administrators can grant this Service Principal read-only access to specific tables in the Unity Catalog, enforcing the principle of least privilege. The pipeline can then run autonomously without fear of token expiration breaking the build.
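A hedged example of such a connection string for OAuth machine-to-machine authentication (hostname, warehouse path, and credential values are placeholders):

```
jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default;transportMode=http;SSL=1;AuthMech=11;Auth_Flow=1;OAuth2ClientId=<application-id>;OAuth2Secret=<client-secret>;httpPath=/sql/1.0/warehouses/abc123def456
```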
2.3.2 Encryption in Transit
Given that Agents may reside in a different VPC or on-premise network than the Databricks control plane, data privacy during transit is non-negotiable. The SSL=1 parameter in the connection string enforces TLS 1.2+ encryption for all data moving between Databricks and the QuerySurge Agent. This ensures that sensitive PII or financial data extracted for validation cannot be intercepted.
QuerySurge Hub-and-Spoke Architecture Components
| Component | Role / Description | Notes |
|---|---|---|
| Application Server (Hub) | Orchestrates all test assets, schedules runs, manages RBAC, aggregates results. | Does not run comparisons; delegates to Agents. |
| Agents (Spokes) | Execute QueryPairs: run SQL on Source and Databricks, retrieve data, compare results. | Scales horizontally; supports parallel validation. |
| JDBC Bridge | Connectivity layer that enables QuerySurge to talk to Databricks clusters and SQL Warehouses. | Uses the DatabricksJDBC4x.jar driver. |
| Security Layer | Auth via service principals, OAuth, encryption. | TLS enforced via SSL=1. |
3. Validating the Medallion Architecture: A Layer-by-Layer Strategy
The "Medallion Architecture" is the standard design pattern for Databricks Lakehouses. It organizes data into three distinct layers of increasing quality: Bronze (Raw), Silver (Cleansed), and Gold (Curated). Each transition represents a transformation step where errors can be introduced. QuerySurge provides a validation strategy tailored to the specific risks of each layer.
3.1 The Bronze Layer: Ensuring Ingestion Integrity
The Bronze layer acts as the landing zone. Data here should match the source system exactly. It is often stored in efficient columnar formats like Parquet or Delta.
3.1.1 Completeness Validation (Row Counts)
The most fundamental question at this stage is: "Did we get everything?" A common failure mode in ETL is the silent dropping of records due to network timeouts or buffer overflows.
- QuerySurge Approach: Create a "QueryPair" that compares the record count of the Source (e.g., an Oracle TRANSACTIONS table) with the Bronze Delta table.
- Example SQL:
- Source: SELECT COUNT(*) FROM SALES_TRANSACTIONS WHERE TRX_DATE = '2023-10-27'
- Target: SELECT COUNT(*) FROM bronze_sales_transactions WHERE transaction_date = '2023-10-27'
- Automation: This simple test serves as a crucial "smoke test." If the counts differ by even one record, the pipeline should be halted immediately. This prevents the downstream processing of incomplete data.
3.1.2 Schema Drift Detection
In the dynamic world of Big Data, upstream systems often change without warning. A new column might be added to a JSON payload, or a data type might change from Integer to String. If the Bronze ingestion job isn't configured to handle this (or if it handles it poorly), data is lost.
- QuerySurge Approach: QuerySurge can validate metadata. By querying the INFORMATION_SCHEMA of both the source and Databricks, QuerySurge can verify that column names, data types, and precisions match the expected definition. If a column is missing in the Bronze layer that exists in the Source, QuerySurge raises a "Schema Drift" alert.
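On the Databricks side, a hedged sketch of such a metadata probe against Unity Catalog's INFORMATION_SCHEMA (the schema and table names are assumptions); QuerySurge can compare this result set against the equivalent catalog query on the source:

```sql
-- Target-side metadata snapshot: column names, types, and order
SELECT column_name, data_type, ordinal_position
FROM system.information_schema.columns
WHERE table_schema = 'bronze'
  AND table_name   = 'bronze_sales_transactions'
ORDER BY ordinal_position;
```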
3.2 The Silver Layer: Validating Hygiene and Transformation
The Silver layer is where the messy raw data becomes "Enterprise Quality." Transformations include deduplication, handling missing values, standardizing formats (e.g., state codes), and joining disparate datasets. This complexity makes it the most error-prone layer.
3.2.1 Deduplication Verification
Bronze data often contains duplicates (e.g., the same log entry sent twice). The Silver layer is responsible for filtering these out.
- QuerySurge Approach: A "Duplicate Key" test validates that the uniqueness constraint holds true.
- Target SQL: a duplicate-key probe against the Silver table (a minimal sketch follows this list).
- Validation Logic: If the query returns any rows, the test fails; an empty result confirms that the deduplication logic in the Databricks notebook (e.g., dropDuplicates()) functioned correctly.
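A minimal sketch of the duplicate-key probe, assuming a Silver table silver_customers keyed on customer_id (both names are illustrative):

```sql
-- Returns one row per business key that appears more than once; an empty
-- result set means the uniqueness constraint holds
SELECT customer_id, COUNT(*) AS occurrences
FROM silver_customers
GROUP BY customer_id
HAVING COUNT(*) > 1;
```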
3.2.2 Transformation Logic Testing
If the business rule states "If Customer Region is NULL, set to 'Unknown'", QuerySurge verifies this independently.
- Target SQL: SELECT COUNT(*) FROM silver_customers WHERE region IS NULL
- If the result > 0, the transformation failed.
- Joins: Silver tables often result from joining multiple Bronze tables (e.g., Orders + Customers). QuerySurge allows the QA engineer to write a standard SQL join between the Bronze tables and compare the output against the Silver table. This verifies that the JOIN logic (Left, Inner, Outer) in the ETL code was implemented correctly and didn't result in Cartesian products or data loss.
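As a hedged illustration of such a join validation (table and column names are assumptions), the Source side of the QueryPair rebuilds the expected Silver output directly from the Bronze tables, while the Target side reads the Silver table the ETL job produced:

```sql
-- Source side: reconstruct the expected Silver rows from the Bronze inputs
SELECT o.order_id, o.amount, c.customer_name
FROM bronze_orders o
LEFT JOIN bronze_customers c
  ON o.customer_id = c.customer_id;

-- Target side: read the Silver table written by the ETL job
SELECT order_id, amount, customer_name
FROM silver_orders;
```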
3.3 The Gold Layer: Validating Business Logic and Aggregation
The Gold layer powers dashboards and ML models. Data here is often denormalized and aggregated (e.g., "Monthly Sales by Store"). Validation here is about mathematical accuracy.
3.3.1 Aggregate Balancing
QuerySurge verifies that the sum of the parts equals the whole.
- Strategy: Compare the SUM(Sales_Amount) from the Silver detailed transactions against the Total_Sales in the Gold summary table.
- Comparison Queries:
- Query A (Silver): SELECT Store_ID, SUM(Amount) FROM silver_transactions GROUP BY Store_ID
- Query B (Gold): SELECT Store_ID, Total_Sales FROM gold_store_monthly_summary
- Comparison: QuerySurge joins these results on Store_ID and checks if the values match. This catches aggregation errors, grouping set errors, and rounding issues.
3.3.2 Regression Testing
When ETL code is refactored (e.g., optimizing a Spark job for performance), there is a risk of introducing subtle bugs. QuerySurge allows for "Regression Testing" by comparing the Gold layer output of the new pipeline against the Gold layer output of the old pipeline (or a backup). This ensures that performance optimizations do not come at the cost of data accuracy.
Validation Objectives by Medallion Layer
| Layer | Validation Focus | Key Risks | QuerySurge Checks |
|---|---|---|---|
| Bronze (Raw) | Completeness, schema fidelity | Schema drift, missing records, ingestion corruption | Row counts, schema metadata validation |
| Silver (Cleansed) | Deduplication, hygiene, transformations, joins | Duplicates, null handling failures, incorrect joins | Duplicate checks, business-rule tests, join validation |
| Gold (Curated) | Aggregations, business logic accuracy | Wrong totals, grouping errors, regression impacts | Aggregate balancing, regression testing |
4. The Challenge of Semi-Structured Data: JSON and Arrays
One of the defining features of Databricks is its ability to handle semi-structured data like JSON, Avro, and Parquet. Traditional SQL testing tools struggle with nested arrays and hierarchical structures. QuerySurge, leveraging the full power of Spark SQL via JDBC, handles this complexity natively.
4.1 Flattening Nested Structures for Validation
Consider a typical use case: A source system sends JSON logs where a single "Order" object contains an array of "Items."
JSON Structure:
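For instance, such a payload might look like the following (field names are illustrative assumptions):

```json
{
  "order_id": 1001,
  "customer_id": "C-889",
  "items": [
    { "sku": "A-1", "qty": 2, "price": 9.99 },
    { "sku": "B-7", "qty": 1, "price": 24.50 }
  ]
}
```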
In the Bronze table, this might be stored as a string or a Struct/Array type. To validate this against a flat Source table (where Order and Items might be joined), the validator needs to "explode" the array.
4.1.1 The EXPLODE Strategy
QuerySurge users can write valid Spark SQL in their test queries to flatten this data on the fly during the validation read.
- QuerySurge SQL Example: a flattening query using Spark SQL's explode (a sketch follows this list).
- Mechanism: When the QuerySurge Agent sends this query to Databricks, Spark executes the explode function, transforming the single JSON row into multiple rows (one per item). This flattened result set is then returned to the Agent.
- Comparison: The Agent can now compare this result set row-for-row against a standard SQL query from the Source database (e.g., SELECT * FROM ORDER_ITEMS). This capability is essential for validating NoSQL-to-Relational or JSON-to-Table migrations without writing complex custom scripts.
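A minimal sketch of such a flattening query, assuming a Bronze table bronze_orders with an items array column (all names are illustrative):

```sql
-- Flatten one order row into one row per item at validation-read time
SELECT o.order_id,
       item.sku,
       item.qty,
       item.price
FROM bronze_orders o
LATERAL VIEW explode(o.items) exploded AS item;
```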
4.2 Handling JSON Strings with from_json
Often, data arrives in Databricks as a raw string column containing JSON. QuerySurge can parse this during the test execution.
- SQL Example: a from_json parse applied inside the test query (a sketch follows this list).
- This allows QuerySurge to validate specific fields deep within a JSON document without requiring the data to be fully modeled in the database first. This is particularly useful for "Shift Left" testing, where QA wants to validate raw landing data immediately upon arrival.
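A minimal sketch, assuming the payload lands in a string column raw_payload of a table bronze_raw_events (both names, and the schema literal, are assumptions):

```sql
-- Parse the raw JSON string at read time, then project nested fields for comparison
SELECT parsed.order_id,
       parsed.customer
FROM (
  SELECT from_json(raw_payload, 'order_id BIGINT, customer STRING') AS parsed
  FROM bronze_raw_events
) t;
```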
4.3 Validating Schema Evolution
Databricks features like "Auto Loader" allow the schema to evolve automatically (adding new columns as they appear in the source). This poses a challenge for rigid testing tools. QuerySurge addresses this through Column Threshold testing.
- Configurable Strictness: A QuerySurge test can be configured to "Ignore New Columns." If the source has 5 extra columns that the target doesn't (or vice versa), the test can be set to issue a Warning rather than a Failure.
- Benefit: This prevents brittle tests that break every time a developer adds a non-critical field, while still enforcing the integrity of the core "Key Business Elements" (KBEs) that must match exactly.
5. Operationalizing Trust: DataOps and CI/CD Integration
The manual execution of these tests is insufficient for modern agile teams. The goal is Continuous Testing within a DataOps framework. QuerySurge is designed to be embedded directly into the CI/CD pipeline, acting as an automated "Quality Gate."
5.1 The Concept of Quality Gates
A Quality Gate is a checkpoint in the automated pipeline.
- Bronze Gate: After ingestion, run basic row counts. If Pass -> Trigger Silver Job. If Fail -> Stop Pipeline and Alert.
- Silver Gate: After cleaning, run duplicate checks and null checks. If Pass -> Trigger Gold Job.
- Gold Gate: After aggregation, run full regression tests. If Pass -> Publish to Tableau Server.
This automated governance ensures that bad data is quarantined early, preventing the "Garbage In, Garbage Out" phenomenon from reaching decision-makers.
5.2 The DevOps API
QuerySurge exposes a rich RESTful API (and a CLI wrapper) that allows external orchestration tools to drive the testing process. Key API capabilities include:
- runScenario: Triggers a pre-defined set of tests.
- getScenarioOutcome: Polls for the status (Pass/Fail) and detailed metrics.
- updateConnection: Dynamically changes the JDBC URL (e.g., pointing the tests from the "Dev" cluster to the "Staging" cluster during a promotion).
5.3 Integration Walkthrough: Azure DevOps
Azure DevOps is the standard CI/CD tool for many Azure Databricks shops. QuerySurge provides a native extension to simplify this integration.
5.3.1 Pipeline Implementation (YAML)
The integration typically involves three steps in the Azure Pipeline YAML:
- Run Test Suite: This task initiates the validation job on the QuerySurge server. It is non-blocking (asynchronous).
- Get Results: This task waits for the tests to finish and retrieves the outcome.
- Gate Logic (Blocking): The task can be configured to "Fail the Build" if the QuerySurge result is "Failed." This effectively stops the deployment pipeline: if the tests fail, the new Databricks notebooks are not promoted to Production, or the data pipeline is halted.
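A hedged YAML sketch of these three steps; the task identifiers and input names below are illustrative assumptions, not the extension's documented schema:

```yaml
# Illustrative sketch only: task names and inputs are assumptions,
# not the documented schema of the QuerySurge Azure DevOps extension.
steps:
  - task: QuerySurgeRunScenario@1          # Step 1: trigger the suite (asynchronous)
    inputs:
      serverUrl: 'https://querysurge.internal.example.com'   # hypothetical host
      scenarioName: 'Silver Layer Validation'
  - task: QuerySurgeGetResults@1           # Step 2: wait for completion, retrieve outcome
    inputs:
      scenarioName: 'Silver Layer Validation'
      failBuildOnFailure: true             # Step 3: gate logic, fail the build on "Failed"
```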
5.4 Integration Walkthrough: Jenkins
For organizations using Jenkins, the integration is often achieved via the QuerySurge CLI or REST API within a Shell Script or Groovy script.
5.4.1 Jenkinsfile Logic
Such a script makes the Jenkins build status dependent on data quality (a sketch follows). It allows "Nightly Build" concepts to be applied to data: every night, the pipeline runs, validates, and generates a report. If the data is bad, the team wakes up to a red build and an automated Jira ticket.
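A minimal Jenkinsfile sketch under those assumptions; the REST endpoint paths and response handling are illustrative stand-ins for the runScenario and getScenarioOutcome capabilities described above:

```groovy
// Illustrative sketch only: endpoint paths and the response format are
// assumptions for demonstration, not the documented QuerySurge API surface.
pipeline {
    agent any
    stages {
        stage('QuerySurge Quality Gate') {
            steps {
                script {
                    def qsHost = 'https://querysurge.internal.example.com' // hypothetical host
                    // Trigger the pre-defined scenario (cf. runScenario)
                    sh "curl -sf -X POST '${qsHost}/api/runScenario?name=SilverGate'"
                    // Poll for the outcome (cf. getScenarioOutcome) and gate the build on it
                    def outcome = sh(returnStdout: true,
                                     script: "curl -sf '${qsHost}/api/getScenarioOutcome?name=SilverGate'").trim()
                    if (outcome != 'Passed') {
                        error("QuerySurge validation failed: ${outcome}") // red build halts promotion
                    }
                }
            }
        }
    }
}
```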
CI/CD Quality Gates in the Medallion Architecture
| Gate | Trigger | Required Validation | Pipeline Action |
|---|---|---|---|
| Bronze Gate | After raw ingestion | Row counts, schema checks | Promote to Silver if pass |
| Silver Gate | After cleansing | Deduplication, null rules, join logic | Promote to Gold if pass |
| Gold Gate | After aggregation | Balancing, regression tests | Release to BI/ML if pass |
Automation Interfaces for CI/CD
| Interface | Capabilities | Typical Use |
|---|---|---|
| REST API | Run scenarios, poll results, update connections | Databricks Jobs, Jenkins pipelines |
| CLI | Shell-based execution for automation | Jenkins, cron jobs |
| Azure DevOps Extension | First-class QuerySurge integration | YAML pipelines, gated deployments |
6. Performance Engineering: Extract vs. Pushdown
A common concern in Big Data testing is performance. Moving billions of rows for validation seems counter-intuitive. QuerySurge addresses this through architectural choices and testing strategies.
6.1 The Extraction Architecture
It is important to clarify that QuerySurge works by extracting the result sets of the Source and Target queries back to its Agents for comparison. It does not push the comparison logic down into Databricks itself (unlike some tools that create temporary "diff" tables inside the warehouse).
- Implication: If you try to run SELECT * FROM Big_Table on a 10-billion row table, the network transfer will be the bottleneck.
- Optimization: The "Pushdown" happens in the Query Design.
- Instead of SELECT *, the test should utilize the compute power of Databricks to aggregate the data before sending it to QuerySurge.
- Example: SELECT Region, Product, SUM(Sales) FROM Big_Table GROUP BY Region, Product.
- Databricks (Spark) executes this massive aggregation efficiently. QuerySurge only receives the aggregated summary (e.g., 10,000 rows). This allows validation of the entire dataset's logical integrity without moving petabytes of data over the network.
6.2 Agent Scaling and Partitioning
For cases where row-level detail is required (e.g., regulatory audits), QuerySurge supports partitioning.
- Technique: Split the validation into chunks.
- Test 1: Validate Jan 2023.
- Test 2: Validate Feb 2023.
- Test 3: Validate Mar 2023.
- Parallelism: These tests can be assigned to different Agents. If you have 10 Agents, you can run 10 months of validation in parallel, effectively achieving 10x throughput. This horizontal scaling allows QuerySurge to keep pace with the massive data volumes typical of Databricks environments.
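A minimal sketch of one such partitioned slice, assuming a transaction_date column (all names are illustrative); each month's QueryPair applies the same predicate on both the Source and Target sides:

```sql
-- Test 1 (Jan 2023): both sides of the QueryPair restrict to the same slice
SELECT order_id, store_id, amount
FROM silver_transactions
WHERE transaction_date >= '2023-01-01'
  AND transaction_date <  '2023-02-01';
```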
Performance Approaches for Large Datasets
| Method | Description | Advantage |
|---|---|---|
| Aggregation Pushdown | Let Databricks aggregate before returning data. | Reduces data movement; faster tests. |
| Partitioned Testing | Split validation into time-based or logical segments. | Parallelizable and scalable. |
| Multiple Agents | Scale out the QuerySurge execution layer. | Linear throughput improvements. |
7. Reporting, Compliance, and Business Value
The ultimate output of QuerySurge is not just a "Pass" or "Fail," but "Data Intelligence."
7.1 Compliance and Audit Trails
In industries like Banking (BCBS 239) and Pharma (21 CFR Part 11), the proof of validation is mandatory. QuerySurge automatically logs every test execution.
- The Audit Log: It records Who ran the test, When it ran, What SQL was executed, and the Exact Outcome.
- Immutable History: This history cannot be altered, providing a secure chain of evidence for auditors. An auditor can log in and see that on November 12th, the "Risk Aggregation" table was validated against the "Trade Source" system and matched 100%.
7.2 Ready for Analytics
QuerySurge’s "Ready for Analytics" feature exposes its internal database to BI tools.
- The Data Quality Dashboard: Organizations can build a Power BI or Tableau dashboard that connects to QuerySurge.
- Metrics: This dashboard can display trends like "Data Reliability Score per Week," "Defect Rate by Source System," or "Validation Coverage."
- Strategic Value: This elevates Data Quality from a technical issue to a business metric. Executives can see a "Health Score" for their data assets alongside their financial KPIs.
Audit and Reporting Capabilities
| Feature | Purpose | Impact |
|---|---|---|
| Immutable Audit History | Tracks who ran tests, how, and when. | Required for regulated industries. |
| Data Quality Dashboards | Visualizes trends in defects and coverage. | Makes data quality a business metric. |
| Ready for Analytics | Exposes QuerySurge test results for BI tools. | Enables enterprise-wide visibility. |
8. Conclusion: The Path to Trusted Data
The integration of QuerySurge with the Databricks Lakehouse Platform offers a comprehensive solution to the "Data Trust Gap." By automating the validation of the Medallion Architecture—from the raw ingestion of the Bronze layer to the complex transformations of the Silver layer and the high-value aggregations of the Gold layer—QuerySurge ensures that the data driving the enterprise is accurate, complete, and compliant.
Furthermore, by embedding this validation into the CI/CD pipeline via robust APIs, organizations can operationalize a true DataOps culture. Quality is no longer an afterthought or a manual bottleneck; it is an automated gatekeeper. Whether flattening complex JSON arrays, detecting subtle schema drift, or scaling to validate billions of rows, QuerySurge provides the necessary tooling to secure the data pipeline. In doing so, it allows organizations to fully leverage the power of Databricks with the confidence that their insights are built on a foundation of verifiable truth.