Case Study: Mosaic.tech
Metric Development and Modeling a Reporting Layer
Snowpack Data | February 6, 2024

Introduction
Starting in early 2023, we began working with the team at Mosaic.tech to stand up their product analytics and reporting. A key goal was to clearly define and model their priority metrics.
Mosaic is a strategic finance platform for the entire business, providing real-time analytics and planning that help teams get from data to decision. This includes data consolidation, automated reporting, predictive modeling, insights, and alignment. With such broad product functionality, Mosaic values the ability to use their data to better understand their product and their customers, and to drive more informed growth and customer success decisions.
Objectives
Mosaic had implemented Heap for web tracking analytics and had a wealth of production data tables, including internal audit logs. The main objective of the project was to develop high-value metrics that would give the team a baseline view of product performance.
Secondary objectives included:
- Secondary metric development
- Reviewing Heap implementation
- Modeling data into clean consumer data marts
- Building a metrics store accessible to Mosaic’s internal dashboarding tool
- Assisting in data integrations needed for using this product data
Methodology
1. Data Ingestion: Production data was ingested into Snowflake from S3 via Snowpipe. Heap data was integrated directly using Heap Connect’s Snowflake integration.
2. Exploratory Analysis: To really understand the product, we did a fair amount of exploratory work sifting through the data and associating it with product functionality and features. This gave us a better understanding of schema relationships, grains, and the overall data model, as well as of some limitations in data capture.
3. Metric Development: We worked with primary stakeholders to understand product priorities and derive the primary metrics, then prepared preliminary queries across various dimensions to sanity-check the definitions and enable stakeholder QA (see the query sketch after this list).
4. Data Modeling: We designed and built the data models that make up a consumer data mart for product data, and implemented a metric layer to ensure consistent metric definitions across all downstream use cases.
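The preliminary metric queries looked roughly like the sketch below. Table and column names here are hypothetical stand-ins, not Mosaic's actual schema:

```sql
-- Weekly active users by plan tier, for stakeholder sanity checks.
-- Table and column names are hypothetical.
select
    date_trunc('week', event_time) as metric_week,
    account_plan_tier,
    count(distinct user_id)        as weekly_active_users,
    count(*)                       as event_count
from product_events
where event_time >= dateadd('week', -12, current_date)
group by 1, 2
order by 1, 2;
```

Running variants of this query across different dimensions made it easy for stakeholders to spot-check definitions before we formalized them.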
Tooling
Mosaic had already set up their Snowflake instance with dbt Cloud as their transformation tool. Their product includes a dashboarding component and various data integrations, which allowed them to avoid additional BI tooling. Beyond traditional visualization, Mosaic also used some external customer success tools where this data would be utilized.
Snowpipe
Snowpipe is a data ingestion service for Snowflake that enables automatic, continuous loading of streaming data into the Snowflake data warehouse. It simplifies the loading process by streaming data from S3 into Snowflake, triggered by updates to the source S3 bucket.
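A minimal sketch of the pattern looks like the following. The stage, pipe, and table names are hypothetical, and a storage integration for the bucket is assumed to already exist:

```sql
-- Hypothetical external stage over the source S3 bucket.
create or replace stage raw.product_events_stage
    url = 's3://example-bucket/product-events/'
    storage_integration = s3_integration;

-- Pipe that auto-ingests new files from the stage as they land.
create or replace pipe raw.product_events_pipe
    auto_ingest = true
as
    copy into raw.product_events
    from @raw.product_events_stage
    file_format = (type = 'json');
```

With AUTO_INGEST enabled, Snowflake listens for S3 event notifications, so the bucket must be configured to publish object-created events to the pipe's notification channel.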
Snowflake + Snowsight
Snowflake is a cloud-based data warehouse that provides scalability and flexibility, while Snowsight empowers users to easily analyze and derive insights from their data within the Snowflake platform. With Snowsight, we were able to present findings and iterate quickly as we prepared definitions for all of the primary product metrics.
dbt (data build tool)
dbt’s primary value-add is managing the business logic of our metrics and modeling data for effective analysis. Column definitions can be managed in a documented, version-controlled fashion in dbt. While this means that updates and new columns require opening a pull request, the additional friction ensures quality and creates an opportunity for feedback. dbt is also useful for establishing a limited number of tables with predefined metrics and structure for external systems to utilize. This “reporting layer” is a set of data models designed for use exclusively in reporting, which also enables permissions to be configured so that external systems can only access models in the reporting layer.
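As a minimal sketch, a reporting-layer model can be as thin as the following. The model, schema, and role names are hypothetical, and the `grants` config assumes a recent dbt version (1.2+):

```sql
-- models/reporting/rpt_weekly_product_metrics.sql
-- A thin reporting-layer model; external systems are granted
-- select access only on models in this schema.
{{ config(
    materialized = 'table',
    schema = 'reporting',
    grants = {'select': ['external_reporting_role']}
) }}

select
    metric_week,
    weekly_active_users,
    event_count
from {{ ref('fct_weekly_product_metrics') }}  -- hypothetical upstream metrics model
```

Because the metric logic lives upstream, the reporting layer stays a stable, narrow contract for external consumers.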
Lessons Learned
1. Data Validation: One key aspect here was establishing the “source of truth” for event data. With two event sources (Heap and internal), it was imperative to validate them against each other so each could be trusted. While there wasn’t complete overlap, being able to compare the events that did overlap between the two sources created trust in all of the metrics (see the comparison sketch after this list).
2. Data Latency: Understanding the data latency needs of the end users is always a key aspect of our work. With data engineering experience, we are able to assist in implementing solutions for any latency requirement, whether live streaming or batch loading is necessary. In this case, however, simple weekly metrics were all that was needed, and the internal team had already set up the Snowpipe system. This enabled us to dive straight into metric modeling without worrying about data ingestion.
3. External Systems: Utilizing external systems requires specific design patterns; dbt enables us to manage the data modeling and, in conjunction with Snowflake, the permissions as well. In this case, one of the external systems required a specific event data structure, so we spun up a dbt data model that met the requirements of that integration (sketched below).
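The cross-source validation was mostly comparative queries like the sketch below. Table, column, and event names are hypothetical; Heap Connect typically syncs one table per defined event, but the exact naming depends on the implementation:

```sql
-- Compare daily counts of an overlapping event between Heap and
-- the internal audit log. All names are hypothetical.
with heap_daily as (
    select date_trunc('day', time) as event_day, count(*) as heap_count
    from heap.report_created
    group by 1
),
internal_daily as (
    select date_trunc('day', created_at) as event_day, count(*) as internal_count
    from prod.audit_log
    where action = 'report_created'
    group by 1
)
select
    coalesce(h.event_day, i.event_day) as event_day,
    h.heap_count,
    i.internal_count,
    coalesce(h.heap_count, 0) - coalesce(i.internal_count, 0) as difference
from heap_daily h
full outer join internal_daily i
    on h.event_day = i.event_day
order by event_day;
```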
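And for the external-system integration, the dbt model amounted to reshaping events into the structure the tool expects. The field names and target structure here are assumptions for illustration:

```sql
-- models/reporting/rpt_external_events.sql
-- Reshape internal events into the flat structure a hypothetical
-- external customer success tool expects.
{{ config(materialized = 'table', schema = 'reporting') }}

select
    event_id   as id,
    user_id    as external_user_id,
    event_name as event_type,
    event_time as occurred_at,
    account_id as group_id
from {{ ref('stg_product_events') }}  -- hypothetical upstream staging model
```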
Conclusion
Having the right tooling in place enabled us to work quickly and efficiently while iterating consistently at each step of this process. dbt has been instrumental in keeping the analytics data dependencies as simple and untangled as possible. Snowflake with Snowpipe enabled quick exploration as we defined metrics, with the added benefit of managing permissions for our various external system needs.
Snowpack specializes in helping organizations solve analytical problems. For more information, follow our blog, or shoot us an email at [email protected]