White Paper
Operationalizing Data Trust:
A Comprehensive Analysis of Automated Validation
in the Databricks Lakehouse Architecture
1. The Imperative of Data Integrity in the Lakehouse Era
The ascent of the Data Lakehouse architecture, epitomized by the Databricks platform, represents a watershed moment in enterprise data management. By converging the flexibility of data lakes with the performance and governance of data warehouses, organizations have unlocked unprecedented capabilities in artificial intelligence, machine learning, and real-time analytics. However, this architectural unification introduces a complex set of challenges regarding data integrity. As data volume, velocity, and variety explode, the traditional manual methods of validation - often described as "stare and compare" - have become dangerously obsolete. In this landscape, integrating QuerySurge, a purpose-built automated data testing solution, with Databricks offers a robust mechanism to enforce quality. This report provides an exhaustive examination of how QuerySurge complements Databricks, with a specific focus on validating the Medallion Architecture (Bronze, Silver, Gold layers) and automating Quality Gates within a DataOps CI/CD pipeline.
1.1 The Erosion of Trust in Modern Data Systems
The fundamental promise of the Lakehouse is to serve as a "Single Source of Truth." Yet, without rigorous, automated validation, this promise often devolves into a "Data Swamp." The consequences of data defects are severe: financial reports may be misstated, recommendation engines may serve irrelevant content, and predictive maintenance models may fail to detect critical equipment faults.
In traditional data warehousing, the "Schema-on-Write" paradigm enforced a level of structure at the point of ingestion. If data did not match the schema, the load failed. The Lakehouse, particularly in its initial ingestion layers (Bronze), often employs "Schema-on-Read" or flexible ingestion to accommodate the chaotic nature of big data streams (IoT logs, clickstream data, JSON dumps). While this flexibility is a feature, it is also a vector for defects. Schema drift, truncated files, and silent data corruption can propagate unnoticed through the pipeline until they manifest in a boardroom dashboard.
QuerySurge addresses this "Trust Gap" by introducing an independent validation layer. Unlike internal consistency checks (such as Delta Live Tables Expectations, which are part of the pipeline logic), QuerySurge acts as an external auditor. It verifies data independently of the transformation logic, ensuring that the data in the Lakehouse accurately reflects the source systems and business rules. This distinction is critical: verifying that the code did what the developer intended is different from verifying that the data is correct relative to the business reality.
1.2 The Strategic Role of Independent Validation
The concept of "Black Box" testing is central to QuerySurge’s value proposition in a Databricks environment. In software engineering, it is axiomatic that the developer who writes the code should not be the sole tester of that code. The same principle applies to data engineering. If a Data Engineer writes a PySpark transformation in Databricks to calculate "Net Sales," and then writes a unit test using the same logic to verify it, they have only proved that their code is consistent with itself, not that it is factually correct.
QuerySurge allows Quality Assurance (QA) professionals and Data Stewards to define validation rules that are completely decoupled from the implementation logic. Connecting via JDBC to both the upstream Source (e.g., an SAP ERP, a Mainframe, or a Kafka stream) and the downstream Target (Databricks Delta Tables), QuerySurge compares the data point-for-point. This approach catches logic errors, transformation bugs, and ingestion failures that internal checks might miss. Furthermore, by automating this process via APIs, organizations can move from reactive "fire-fighting" to proactive "DataOps," where quality is continuously monitored.
2. Architectural Synergy: Integrating QuerySurge with Databricks
To understand the validation capabilities, one must first analyze the architectural interplay between QuerySurge and Databricks. QuerySurge is not a SaaS tool that resides in the Databricks cloud; rather, it is a distributed software platform deployed within the customer's infrastructure (on-premise or private cloud). This architecture ensures data sovereignty and security, a critical requirement for regulated industries like finance and healthcare.
2.1 The Hub-and-Spoke Validation Architecture
QuerySurge operates on a distributed "Hub-and-Spoke" model designed for scalability and parallel processing.
2.1.1 The Application Server (The Hub)
The core of the system is the Application Server. This component acts as the orchestrator. It manages the repository of test assets ("QueryPairs"), handles user authentication and role-based access control (RBAC), manages test scheduling, and aggregates results. The App Server does not execute the data comparisons itself; instead, it delegates work to the Agents. This separation of concerns prevents the management interface from becoming a bottleneck during high-volume testing.
2.1.2 The Agents (The Spokes)
The Agents are the execution engines of QuerySurge. These are lightweight Java applications that can be deployed on multiple virtual machines or Docker containers. When a validation scenario is triggered, the App Server dispatches QueryPairs to available Agents.
- Execution Flow: The Agent receives the SQL for the Source and the SQL for the Target. It opens simultaneous JDBC connections to both systems.
- Data Retrieval: The Agent sends the queries to the respective systems. For Databricks, the query is executed on the Databricks Compute Cluster (or SQL Warehouse), leveraging the massive parallel processing power of Spark.
- Comparison: The results of the queries are streamed back to the Agent. The Agent then performs the comparison algorithm (e.g., row-by-row matching, primary key matching) and reports the status (Pass/Fail) back to the Hub.
This architecture is significant for Big Data validation because it allows for horizontal scaling. If the validation workload increases (e.g., verifying 100 tables simultaneously), an organization can simply spin up more Agents to parallelize the throughput without altering the core infrastructure.
2.2 Connectivity Mechanics: The JDBC Bridge
The lifeline between QuerySurge and Databricks is the JDBC (Java Database Connectivity) standard. While Databricks is built on Apache Spark, it exposes a SQL interface that allows it to behave like a traditional relational database for external tools.
2.2.1 Driver Configuration
To enable connectivity, the specific Databricks JDBC Driver must be deployed to every QuerySurge Agent. The recommended driver is the DatabricksJDBC4x.jar.
- Driver Class: The configuration uses the class com.databricks.client.jdbc.Driver. It is imperative to avoid the legacy Simba drivers (com.simba.spark...) to ensure compatibility with modern Databricks features such as the Photon execution engine and Unity Catalog.
2.2.2 The Connection String
The JDBC URL defines the parameters of the connection. A correctly configured URL ensures not only connectivity but also performance and security.
- Format: jdbc:databricks://<ServerHostname>:<Port>/<HTTPPath>;transportMode=http;SSL=1;AuthMech=<Type>;
- Server Hostname: This points to the Databricks Workspace control plane.
- HTTP Path: This routes the request to the specific Compute Cluster or SQL Warehouse. Using a dedicated SQL Warehouse for validation is often recommended over an interactive cluster to ensure resource isolation and consistent performance.
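For illustration, a fully assembled URL for personal-access-token authentication against a hypothetical SQL Warehouse (hostname, warehouse ID, and token values are placeholders, not real endpoints):

```
jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default;transportMode=http;SSL=1;AuthMech=3;httpPath=/sql/1.0/warehouses/abc123def456;UID=token;PWD=<personal-access-token>
```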
2.3 Security and Authentication Models
In enterprise environments, managing credentials securely is paramount. QuerySurge supports modern authentication flows that align with Azure and AWS security best practices.
2.3.1 Service Principal Authentication
For automated pipelines (CI/CD), relying on human user credentials (username/password) is fragile and insecure. Passwords expire, and employees leave organizations. QuerySurge supports Azure Active Directory (Entra ID) Service Principals.
- Implementation: The JDBC string utilizes AuthMech=11 (for Azure AD). The OAuth2ClientId and OAuth2Secret are passed as parameters.
- Benefit: This treats QuerySurge as a distinct "non-human identity" in the Databricks workspace. Administrators can grant this Service Principal read-only access to specific tables in the Unity Catalog, enforcing the principle of least privilege. The pipeline can then run autonomously without fear of token expiration breaking the build.
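A hedged example of such a connection string for OAuth machine-to-machine authentication (hostname, warehouse path, and credential values are placeholders):

```
jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default;transportMode=http;SSL=1;AuthMech=11;Auth_Flow=1;OAuth2ClientId=<application-id>;OAuth2Secret=<client-secret>;httpPath=/sql/1.0/warehouses/abc123def456
```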
2.3.2 Encryption in Transit
Given that Agents may reside in a different VPC or on-premise network than the Databricks control plane, data privacy during transit is non-negotiable. The SSL=1 parameter in the connection string enforces TLS 1.2+ encryption for all data moving between Databricks and the QuerySurge Agent. This ensures that sensitive PII or financial data extracted for validation cannot be intercepted.
QuerySurge Hub-and-Spoke Architecture Components
| Component | Role / Description | Notes |
|---|---|---|
| Application Server (Hub) | Orchestrates all test assets, schedules runs, manages RBAC, aggregates results. | Does not run comparisons; delegates to Agents. |
| Agents (Spokes) | Execute QueryPairs: run SQL on Source and Databricks, retrieve data, compare results. | Scales horizontally; supports parallel validation. |
| JDBC Bridge | Connectivity layer that enables QuerySurge to talk to Databricks clusters and SQL Warehouses. | Uses the DatabricksJDBC4x.jar driver. |
| Security Layer | Auth via service principals, OAuth, encryption. | TLS enforced via SSL=1. |
3. Validating the Medallion Architecture: A Layer-by-Layer Strategy
The "Medallion Architecture" is the standard design pattern for Databricks Lakehouses. It organizes data into three distinct layers of increasing quality: Bronze (Raw), Silver (Cleansed), and Gold (Curated). Each transition represents a transformation step where errors can be introduced. QuerySurge provides a validation strategy tailored to the specific risks of each layer.
3.1 The Bronze Layer: Ensuring Ingestion Integrity
The Bronze layer acts as the landing zone. Data here should match the source system exactly. It is often stored in efficient columnar formats like Parquet or Delta.
3.1.1 Completeness Validation (Row Counts)
The most fundamental question at this stage is: "Did we get everything?" A common failure mode in ETL is the silent dropping of records due to network timeouts or buffer overflows.
- QuerySurge Approach: Create a "QueryPair" that compares the record count of the Source (e.g., an Oracle TRANSACTIONS table) with the Bronze Delta table.
- Example SQL:
- Source: SELECT COUNT(*) FROM SALES_TRANSACTIONS WHERE TRX_DATE = '2023-10-27'
- Target: SELECT COUNT(*) FROM bronze_sales_transactions WHERE transaction_date = '2023-10-27'
- Automation: This simple test serves as a crucial "smoke test." If the counts differ by even one record, the pipeline should be halted immediately. This prevents the downstream processing of incomplete data.
3.1.2 Schema Drift Detection
In the dynamic world of Big Data, upstream systems often change without warning. A new column might be added to a JSON payload, or a data type might change from Integer to String. If the Bronze ingestion job isn't configured to handle this (or if it handles it poorly), data is lost.
- QuerySurge Approach: QuerySurge can validate metadata. By querying the INFORMATION_SCHEMA of both the source and Databricks, QuerySurge can verify that column names, data types, and precisions match the expected definition. If a column is missing in the Bronze layer that exists in the Source, QuerySurge raises a "Schema Drift" alert.
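On the Databricks side, a hedged sketch of such a metadata probe against Unity Catalog's INFORMATION_SCHEMA (the schema and table names are assumptions); QuerySurge can compare this result set against the equivalent catalog query on the source:

```sql
-- Target-side metadata snapshot: column names, types, and order
SELECT column_name, data_type, ordinal_position
FROM system.information_schema.columns
WHERE table_schema = 'bronze'
  AND table_name   = 'bronze_sales_transactions'
ORDER BY ordinal_position;
```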
3.2 The Silver Layer: Validating Hygiene and Transformation
The Silver layer is where the messy raw data becomes "Enterprise Quality." Transformations include deduplication, handling missing values, standardizing formats (e.g., state codes), and joining disparate datasets. This complexity makes it the most error-prone layer.
3.2.1 Deduplication Verification
Bronze data often contains duplicates (e.g., the same log entry sent twice). The Silver layer is responsible for filtering these out.
- QuerySurge Approach: A "Duplicate Key" test validates that the uniqueness constraint holds true.
- Target SQL: a duplicate-key probe against the Silver table (a minimal sketch follows this list).
- Validation Logic: If the query returns any rows, the test fails; an empty result confirms that the deduplication logic in the Databricks notebook (e.g., dropDuplicates()) functioned correctly.
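A minimal sketch of the duplicate-key probe, assuming a Silver table silver_customers keyed on customer_id (both names are illustrative):

```sql
-- Returns one row per business key that appears more than once; an empty
-- result set means the uniqueness constraint holds
SELECT customer_id, COUNT(*) AS occurrences
FROM silver_customers
GROUP BY customer_id
HAVING COUNT(*) > 1;
```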
3.2.2 Transformation Logic Testing
If the business rule states "If Customer Region is NULL, set to 'Unknown'", QuerySurge verifies this independently.
- Target SQL: SELECT COUNT(*) FROM silver_customers WHERE region IS NULL
- If the result > 0, the transformation failed.
- Joins: Silver tables often result from joining multiple Bronze tables (e.g., Orders + Customers). QuerySurge allows the QA engineer to write a standard SQL join between the Bronze tables and compare the output against the Silver table. This verifies that the JOIN logic (Left, Inner, Outer) in the ETL code was implemented correctly and didn't result in Cartesian products or data loss.
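As a hedged illustration of such a join validation (table and column names are assumptions), the Source side of the QueryPair rebuilds the expected Silver output directly from the Bronze tables, while the Target side reads the Silver table the ETL job produced:

```sql
-- Source side: reconstruct the expected Silver rows from the Bronze inputs
SELECT o.order_id, o.amount, c.customer_name
FROM bronze_orders o
LEFT JOIN bronze_customers c
  ON o.customer_id = c.customer_id;

-- Target side: read the Silver table written by the ETL job
SELECT order_id, amount, customer_name
FROM silver_orders;
```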
3.3 The Gold Layer: Validating Business Logic and Aggregation
The Gold layer powers dashboards and ML models. Data here is often denormalized and aggregated (e.g., "Monthly Sales by Store"). Validation here is about mathematical accuracy.
3.3.1 Aggregate Balancing
QuerySurge verifies that the sum of the parts equals the whole.
- Strategy: Compare the SUM(Sales_Amount) from the Silver detailed transactions against the Total_Sales in the Gold summary table.
- Comparison Queries:
- Query A (Silver): SELECT Store_ID, SUM(Amount) FROM silver_transactions GROUP BY Store_ID
- Query B (Gold): SELECT Store_ID, Total_Sales FROM gold_store_monthly_summary
- Comparison: QuerySurge joins these results on Store_ID and checks if the values match. This catches aggregation errors, grouping set errors, and rounding issues.
3.3.2 Regression Testing
When ETL code is refactored (e.g., optimizing a Spark job for performance), there is a risk of introducing subtle bugs. QuerySurge allows for "Regression Testing" by comparing the Gold layer output of the new pipeline against the Gold layer output of the old pipeline (or a backup). This ensures that performance optimizations do not come at the cost of data accuracy.
Validation Objectives by Medallion Layer
| Layer | Validation Focus | Key Risks | QuerySurge Checks |
|---|---|---|---|
| Bronze (Raw) | Completeness, schema fidelity | Schema drift, missing records, ingestion corruption | Row counts, schema metadata validation |
| Silver (Cleansed) | Deduplication, hygiene, transformations, joins | Duplicates, null handling failures, incorrect joins | Duplicate checks, business-rule tests, join validation |
| Gold (Curated) | Aggregations, business logic accuracy | Wrong totals, grouping errors, regression impacts | Aggregate balancing, regression testing |
4. The Challenge of Semi-Structured Data: JSON and Arrays
One of the defining features of Databricks is its ability to handle semi-structured data like JSON, Avro, and Parquet. Traditional SQL testing tools struggle with nested arrays and hierarchical structures. QuerySurge, leveraging the full power of Spark SQL via JDBC, handles this complexity natively.
4.1 Flattening Nested Structures for Validation
Consider a typical use case: A source system sends JSON logs where a single "Order" object contains an array of "Items."
JSON Structure:
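For instance, such a payload might look like the following (field names are illustrative assumptions):

```json
{
  "order_id": 1001,
  "customer_id": "C-889",
  "items": [
    { "sku": "A-1", "qty": 2, "price": 9.99 },
    { "sku": "B-7", "qty": 1, "price": 24.50 }
  ]
}
```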
In the Bronze table, this might be stored as a string or a Struct/Array type. To validate this against a flat Source table (where Order and Items might be joined), the validator needs to "explode" the array.
4.1.1 The EXPLODE Strategy
QuerySurge users can write valid Spark SQL in their test queries to flatten this data on the fly during the validation read.
- QuerySurge SQL Example: a flattening query using Spark SQL's explode (a sketch follows this list).
- Mechanism: When the QuerySurge Agent sends this query to Databricks, Spark executes the explode function, transforming the single JSON row into multiple rows (one per item). This flattened result set is then returned to the Agent.
- Comparison: The Agent can now compare this result set row-for-row against a standard SQL query from the Source database (e.g., SELECT * FROM ORDER_ITEMS). This capability is essential for validating NoSQL-to-Relational or JSON-to-Table migrations without writing complex custom scripts.
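A minimal sketch of such a flattening query, assuming a Bronze table bronze_orders with an items array column (all names are illustrative):

```sql
-- Flatten one order row into one row per item at validation-read time
SELECT o.order_id,
       item.sku,
       item.qty,
       item.price
FROM bronze_orders o
LATERAL VIEW explode(o.items) exploded AS item;
```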
4.2 Handling JSON Strings with from_json
Often, data arrives in Databricks as a raw string column containing JSON. QuerySurge can parse this during the test execution.
- SQL Example: a from_json parse applied inside the test query (a sketch follows this list).
- This allows QuerySurge to validate specific fields deep within a JSON document without requiring the data to be fully modeled in the database first. This is particularly useful for "Shift Left" testing, where QA wants to validate raw landing data immediately upon arrival.
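A minimal sketch, assuming the payload lands in a string column raw_payload of a table bronze_raw_events (both names, and the schema literal, are assumptions):

```sql
-- Parse the raw JSON string at read time, then project nested fields for comparison
SELECT parsed.order_id,
       parsed.customer
FROM (
  SELECT from_json(raw_payload, 'order_id BIGINT, customer STRING') AS parsed
  FROM bronze_raw_events
) t;
```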
4.3 Validating Schema Evolution
Databricks features like "Auto Loader" allow the schema to evolve automatically (adding new columns as they appear in the source). This poses a challenge for rigid testing tools. QuerySurge addresses this through Column Threshold testing.
- Configurable Strictness: A QuerySurge test can be configured to "Ignore New Columns." If the source has 5 extra columns that the target doesn't (or vice versa), the test can be set to issue a Warning rather than a Failure.
- Benefit: This prevents brittle tests that break every time a developer adds a non-critical field, while still enforcing the integrity of the core "Key Business Elements" (KBEs) that must match exactly.
5. Operationalizing Trust: DataOps and CI/CD Integration
The manual execution of these tests is insufficient for modern agile teams. The goal is Continuous Testing within a DataOps framework. QuerySurge is designed to be embedded directly into the CI/CD pipeline, acting as an automated "Quality Gate."
5.1 The Concept of Quality Gates
A Quality Gate is a checkpoint in the automated pipeline.
- Bronze Gate: After ingestion, run basic row counts. If Pass -> Trigger Silver Job. If Fail -> Stop Pipeline and Alert.
- Silver Gate: After cleaning, run duplicate checks and null checks. If Pass -> Trigger Gold Job.
- Gold Gate: After aggregation, run full regression tests. If Pass -> Publish to Tableau Server.
This automated governance ensures that bad data is quarantined early, preventing the "Garbage In, Garbage Out" phenomenon from reaching decision-makers.
5.2 The DevOps API
QuerySurge exposes a rich RESTful API (and a CLI wrapper) that allows external orchestration tools to drive the testing process. Key API capabilities include:
- runScenario: Triggers a pre-defined set of tests.
- getScenarioOutcome: Polls for the status (Pass/Fail) and detailed metrics.
- updateConnection: Dynamically changes the JDBC URL (e.g., pointing the tests from the "Dev" cluster to the "Staging" cluster during a promotion).
5.3 Integration Walkthrough: Azure DevOps
Azure DevOps is the standard CI/CD tool for many Azure Databricks shops. QuerySurge provides a native extension to simplify this integration.
5.3.1 Pipeline Implementation (YAML)
The integration typically involves three steps in the Azure Pipeline YAML:
- Run Test Suite: This task initiates the validation job on the QuerySurge server. It is non-blocking (asynchronous).
- Get Results: This task waits for the tests to finish and retrieves the outcome.
- Gate Logic (Blocking): The task can be configured to "Fail the Build" if the QuerySurge result is "Failed." This effectively stops the deployment pipeline: if the tests fail, the new Databricks notebooks are not promoted to Production, or the data pipeline is halted.
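A hedged YAML sketch of these three steps; the task identifiers and input names below are illustrative assumptions, not the extension's documented schema:

```yaml
# Illustrative sketch only: task names and inputs are assumptions,
# not the documented schema of the QuerySurge Azure DevOps extension.
steps:
  - task: QuerySurgeRunScenario@1          # Step 1: trigger the suite (asynchronous)
    inputs:
      serverUrl: 'https://querysurge.internal.example.com'   # hypothetical host
      scenarioName: 'Silver Layer Validation'
  - task: QuerySurgeGetResults@1           # Step 2: wait for completion, retrieve outcome
    inputs:
      scenarioName: 'Silver Layer Validation'
      failBuildOnFailure: true             # Step 3: gate logic, fail the build on "Failed"
```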
5.4 Integration Walkthrough: Jenkins
For organizations using Jenkins, the integration is often achieved via the QuerySurge CLI or REST API within a Shell Script or Groovy script.
5.4.1 Jenkinsfile Logic
Such a script makes the Jenkins build status dependent on data quality (a sketch follows). It allows "Nightly Build" concepts to be applied to data: every night, the pipeline runs, validates, and generates a report. If the data is bad, the team wakes up to a red build and an automated Jira ticket.
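A minimal Jenkinsfile sketch under those assumptions; the REST endpoint paths and response handling are illustrative stand-ins for the runScenario and getScenarioOutcome capabilities described above:

```groovy
// Illustrative sketch only: endpoint paths and the response format are
// assumptions for demonstration, not the documented QuerySurge API surface.
pipeline {
    agent any
    stages {
        stage('QuerySurge Quality Gate') {
            steps {
                script {
                    def qsHost = 'https://querysurge.internal.example.com' // hypothetical host
                    // Trigger the pre-defined scenario (cf. runScenario)
                    sh "curl -sf -X POST '${qsHost}/api/runScenario?name=SilverGate'"
                    // Poll for the outcome (cf. getScenarioOutcome) and gate the build on it
                    def outcome = sh(returnStdout: true,
                                     script: "curl -sf '${qsHost}/api/getScenarioOutcome?name=SilverGate'").trim()
                    if (outcome != 'Passed') {
                        error("QuerySurge validation failed: ${outcome}") // red build halts promotion
                    }
                }
            }
        }
    }
}
```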
CI/CD Quality Gates in the Medallion Architecture
| Gate | Trigger | Required Validation | Pipeline Action |
|---|---|---|---|
| Bronze Gate | After raw ingestion | Row counts, schema checks | Promote to Silver if pass |
| Silver Gate | After cleansing | Deduplication, null rules, join logic | Promote to Gold if pass |
| Gold Gate | After aggregation | Balancing, regression tests | Release to BI/ML if pass |
Automation Interfaces for CI/CD
| Interface | Capabilities | Typical Use |
|---|---|---|
| REST API | Run scenarios, poll results, update connections | Databricks Jobs, Jenkins pipelines |
| CLI | Shell-based execution for automation | Jenkins, cron jobs |
| Azure DevOps Extension | First-class QuerySurge integration | YAML pipelines, gated deployments |
6. Performance Engineering: Extract vs. Pushdown
A common concern in Big Data testing is performance. Moving billions of rows for validation seems counter-intuitive. QuerySurge addresses this through architectural choices and testing strategies.
6.1 The Extraction Architecture
It is important to clarify that QuerySurge works by extracting the result sets of the Source and Target queries back to its Agents for comparison. It does not push the comparison logic down into Databricks itself (unlike some tools that create temporary "diff" tables inside the warehouse).
- Implication: If you try to run SELECT * FROM Big_Table on a 10-billion row table, the network transfer will be the bottleneck.
- Optimization: The "Pushdown" happens in the Query Design.
- Instead of SELECT *, the test should utilize the compute power of Databricks to aggregate the data before sending it to QuerySurge.
- Example: SELECT Region, Product, SUM(Sales) FROM Big_Table GROUP BY Region, Product.
- Databricks (Spark) executes this massive aggregation efficiently. QuerySurge only receives the aggregated summary (e.g., 10,000 rows). This allows validation of the entire dataset's logical integrity without moving petabytes of data over the network.
6.2 Agent Scaling and Partitioning
For cases where row-level detail is required (e.g., regulatory audits), QuerySurge supports partitioning.
- Technique: Split the validation into chunks.
- Test 1: Validate Jan 2023.
- Test 2: Validate Feb 2023.
- Test 3: Validate Mar 2023.
- Parallelism: These tests can be assigned to different Agents. If you have 10 Agents, you can run 10 months of validation in parallel, effectively achieving 10x throughput. This horizontal scaling allows QuerySurge to keep pace with the massive data volumes typical of Databricks environments.
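A minimal sketch of one such partitioned slice, assuming a transaction_date column (all names are illustrative); each month's QueryPair applies the same predicate on both the Source and Target sides:

```sql
-- Test 1 (Jan 2023): both sides of the QueryPair restrict to the same slice
SELECT order_id, store_id, amount
FROM silver_transactions
WHERE transaction_date >= '2023-01-01'
  AND transaction_date <  '2023-02-01';
```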
Performance Approaches for Large Datasets
| Method | Description | Advantage |
|---|---|---|
| Aggregation Pushdown | Let Databricks aggregate before returning data. | Reduces data movement; faster tests. |
| Partitioned Testing | Split validation into time-based or logical segments. | Parallelizable and scalable. |
| Multiple Agents | Scale out the QuerySurge execution layer. | Linear throughput improvements. |
7. Reporting, Compliance, and Business Value
The ultimate output of QuerySurge is not just a "Pass" or "Fail," but "Data Intelligence."
7.1 Compliance and Audit Trails
In industries like Banking (BCBS 239) and Pharma (21 CFR Part 11), the proof of validation is mandatory. QuerySurge automatically logs every test execution.
- The Audit Log: It records Who ran the test, When it ran, What SQL was executed, and the Exact Outcome.
- Immutable History: This history cannot be altered, providing a secure chain of evidence for auditors. An auditor can log in and see that on November 12th, the "Risk Aggregation" table was validated against the "Trade Source" system and matched 100%.
7.2 Ready for Analytics
QuerySurge’s "Ready for Analytics" feature exposes its internal database to BI tools.
- The Data Quality Dashboard: Organizations can build a Power BI or Tableau dashboard that connects to QuerySurge.
- Metrics: This dashboard can display trends like "Data Reliability Score per Week," "Defect Rate by Source System," or "Validation Coverage."
- Strategic Value: This elevates Data Quality from a technical issue to a business metric. Executives can see a "Health Score" for their data assets alongside their financial KPIs.
Audit and Reporting Capabilities
| Feature | Purpose | Impact |
|---|---|---|
| Immutable Audit History | Tracks who ran tests, how, and when. | Required for regulated industries. |
| Data Quality Dashboards | Visualizes trends in defects and coverage. | Makes data quality a business metric. |
| Ready for Analytics | Exposes QuerySurge test results for BI tools. | Enables enterprise-wide visibility. |
8. Conclusion: The Path to Trusted Data
The integration of QuerySurge with the Databricks Lakehouse Platform offers a comprehensive solution to the "Data Trust Gap." By automating the validation of the Medallion Architecture—from the raw ingestion of the Bronze layer to the complex transformations of the Silver layer and the high-value aggregations of the Gold layer—QuerySurge ensures that the data driving the enterprise is accurate, complete, and compliant.
Furthermore, by embedding this validation into the CI/CD pipeline via robust APIs, organizations can operationalize a true DataOps culture. Quality is no longer an afterthought or a manual bottleneck; it is an automated gatekeeper. Whether flattening complex JSON arrays, detecting subtle schema drift, or scaling to validate billions of rows, QuerySurge provides the necessary tooling to secure the data pipeline. In doing so, it allows organizations to fully leverage the power of Databricks with the confidence that their insights are built on a foundation of verifiable truth.