Observe

Reliability measures are quality and health metrics that help ensure the data is trustworthy and usable. They relate to dimensions such as completeness, accuracy, consistency, timeliness, and validity, and consist of the following:

● Freshness: Indicates how recently the data in an asset was updated.

● Volume: Computes the total row count in the data.

● Schema: Computes the total number of columns in the data.

● Duplicates: Counts the number of rows that are identical based on either a primary key or composite keys defined by the user.
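
The Volume, Schema, and Duplicates measures listed above can be illustrated with a minimal sketch. This is not how PDQ computes them internally; the function name `reliability_summary`, the list-of-dicts row format, and the choice to count only the extra copies of a repeated key as duplicates are all assumptions made for illustration.

```python
from collections import Counter


def reliability_summary(rows, key_columns):
    """Compute illustrative volume, schema, and duplicate counts.

    rows: list of dicts, one dict per record (assumed format).
    key_columns: column names forming the primary or composite key.
    """
    # Volume: total row count.
    volume = len(rows)

    # Schema: total number of distinct columns seen across all rows.
    columns = set()
    for row in rows:
        columns.update(row.keys())
    schema = len(columns)

    # Duplicates: rows sharing the same key values. Assumption: every
    # occurrence after the first one counts as a duplicate.
    key_counts = Counter(
        tuple(row.get(col) for col in key_columns) for row in rows
    )
    duplicates = sum(count - 1 for count in key_counts.values() if count > 1)

    return {"volume": volume, "schema": schema, "duplicates": duplicates}


rows = [
    {"id": 1, "region": "EU", "amount": 10},
    {"id": 2, "region": "EU", "amount": 20},
    {"id": 2, "region": "EU", "amount": 20},
]
summary = reliability_summary(rows, key_columns=["id", "region"])
print(summary)
```

Here the composite key is (`id`, `region`), so the third row is counted as one duplicate of the second.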

Freshness

In Freshness, the trend report gives an overview of how up to date the data is over various time periods.

The freshness measure is active by default for all assets and provides timeliness information for the configured dataset. If a user wants to know when the dataset was last updated or how often it is updated, this measure offers a comprehensive historical view of the dataset's trends.

Data freshness is often essential when evaluating dataset quality before making crucial business decisions. In the example, the system captures freshness data and visualizes the trend. Additionally, the user can set a manual threshold to be notified when the freshness value exceeds a certain period.
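
As a rough sketch of the manual threshold idea, the check below compares the age of the last update against a maximum allowed age. The function `check_freshness` and its parameters are hypothetical and are not part of PDQ; in the platform the threshold is configured by the user rather than written in code.

```python
from datetime import datetime, timedelta, timezone


def check_freshness(last_updated, now, max_age):
    """Return the data age and whether it exceeds the manual threshold.

    last_updated: when the dataset was last updated.
    now: the time of the check (passed in so the result is reproducible).
    max_age: hypothetical manual threshold, as a timedelta.
    """
    age = now - last_updated
    return {"age": age, "stale": age > max_age}


last_updated = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)

result = check_freshness(last_updated, now, max_age=timedelta(hours=24))
if result["stale"]:
    print(f"Freshness alert: data is {result['age']} old")
```

In this example the data is 30 hours old, which is beyond the 24-hour threshold, so a notification would be raised.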

Volume

In Volume, the trend report shows if there are any changes or deviations in the total row count across a time period.

The Volume measure helps the user understand changes in the record count of an asset over a period of time. Some use cases require evaluating business checks on only a limited set of records.

The completeness dimension offers users insight into the trending record count, enabling them to implement corrective actions upon detecting anomalies. Users can choose to depend on ML-based anomaly detection for alerts or establish a manual threshold at a predetermined number, triggering platform alerts accordingly.
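
The two alerting options can be sketched as follows. The platform's ML-based anomaly detection is not described here, so a simple z-score check on past row counts stands in for it; the function `volume_alerts`, the `z_limit` parameter, and the treatment of the manual threshold as a lower bound are assumptions for illustration only.

```python
from statistics import mean, stdev


def volume_alerts(history, latest, manual_threshold=None, z_limit=3.0):
    """Return a list of alert labels for the latest row count.

    history: past row counts for the asset.
    latest: the most recent row count.
    manual_threshold: optional fixed number; assumed here to be a minimum.
    z_limit: how many standard deviations count as an anomaly (assumed).
    """
    alerts = []

    # Manual threshold: alert when the count falls below a fixed number.
    if manual_threshold is not None and latest < manual_threshold:
        alerts.append("below_manual_threshold")

    # Stand-in for ML-based detection: flag counts far from the past mean.
    if len(history) >= 2:
        mu = mean(history)
        sigma = stdev(history)
        if sigma > 0 and abs(latest - mu) / sigma > z_limit:
            alerts.append("statistical_anomaly")

    return alerts


history = [1000, 1010, 990, 1005, 995]
print(volume_alerts(history, latest=400, manual_threshold=500))
print(volume_alerts(history, latest=1002, manual_threshold=500))
```

A sudden drop to 400 rows triggers both checks, while 1002 rows is within the normal range and raises no alert.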

Schema

In Schema, the graph shows the changes and deviations in the total column count across the time period.

Actual result: The factual outcome generated by data observability within the platform. It is the real result obtained by analyzing the data.

Expected result: The anticipated or predicted outcome, based on the previous model or predefined criteria.

Comparing the expected result with the actual result in the reliability measures is essential for ensuring accurate predictions and for assessing the success of the operations performed in PDQ.

The Schema measure refers to the number of columns in a table or a view. Ideally, the number of columns remains the same, and the DQScore viewed at the asset level is based on these columns.

However, when columns are added to or deleted from the source, the DQScore may be affected, because these attributes contribute to its overall calculation, and this can mislead the consumer.

By default, PDQ does not consider any new columns for score calculation unless the user enables the column on the connection page. It is a best practice to keep a check on the schema trend.
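
A schema trend check can be pictured as a comparison of two column snapshots. The sketch below is hypothetical: `schema_drift` and `enabled_columns` are illustrative names rather than PDQ APIs, and the rule that a new column is scored only after being enabled is a simplified reading of the default behavior described above.

```python
def schema_drift(previous_columns, current_columns, enabled_columns=()):
    """Compare two column snapshots and report what changed.

    previous_columns / current_columns: column names at two points in time.
    enabled_columns: new columns the user has explicitly enabled (assumed).
    """
    previous = set(previous_columns)
    current = set(current_columns)

    added = sorted(current - previous)
    removed = sorted(previous - current)

    # Assumption: columns that already existed keep contributing to the
    # score, while new columns contribute only once they are enabled.
    scored = sorted((current & previous) | (current & set(enabled_columns)))

    return {
        "added": added,
        "removed": removed,
        "count_change": len(current) - len(previous),
        "scored": scored,
    }


drift = schema_drift(
    previous_columns=["id", "name", "email"],
    current_columns=["id", "name", "phone", "country"],
    enabled_columns=["phone"],
)
print(drift)
```

Note that the column count changes by only one here even though three columns were added or removed, which is why inspecting the column names alongside the count can be useful.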
