BigQuery was chosen as the analytics backend for our CDP self-service. Meanwhile, Google Cloud Bigtable was selected as the backend that our services interact with to enable personalization. When developing the storage on Bigtable, schema design is very important. The frequency and categorization affect how we design the column qualifier, while the CDP attribute affects how we design the row key.
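As an illustration of how such a schema might be expressed, here is a minimal sketch. The concrete layout is an assumption, not the one used in our platform: it assumes the row key is built from a hypothetical entity type and entity ID, and that the column qualifier combines a hypothetical category, update frequency and attribute name.

```python
SEPARATOR = "#"  # hypothetical delimiter between row key parts


def build_row_key(entity_type: str, entity_id: str) -> bytes:
    """Build a row key such as b"user#12345".

    Bigtable sorts rows lexicographically by key, so leading with the
    entity type keeps rows of the same kind next to each other.
    """
    if SEPARATOR in entity_type or SEPARATOR in entity_id:
        raise ValueError("row key parts must not contain the separator")
    return f"{entity_type}{SEPARATOR}{entity_id}".encode("utf-8")


def build_column_qualifier(category: str, frequency: str, attribute: str) -> bytes:
    """Build a qualifier such as b"demography:daily:user_address".

    Encoding category and frequency in the qualifier is an assumed
    convention for this sketch.
    """
    return f"{category}:{frequency}:{attribute}".encode("utf-8")


row_key = build_row_key("user", "12345")
qualifier = build_column_qualifier("demography", "daily", "user_address")
```

With a layout like this, a single row lookup returns every attribute of one entity, and a reader can filter by qualifier prefix to pick one category.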
We also opted to create a caching mechanism to reduce the load on Bigtable for identical read activity. We built the cache system using Redis with a certain Time to Live (TTL) to ensure optimized performance. In addition, we also utilized a Role-Based Access Control (RBAC) mechanism on the CDP API to control the access of different services to attributes in the CDP.
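The interplay between the access check and the cache can be sketched as follows. This is a simplified illustration under stated assumptions: an in-memory dictionary stands in for Redis (where `SET key value EX seconds` gives the same expiry behavior), and the role table, service names and `fetch_from_backend` callback are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Set, Tuple


class TTLCache:
    """In-memory stand-in for a Redis cache whose entries expire after a TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # entry is stale, drop it
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (value, self._clock() + self._ttl)


# Hypothetical RBAC table: which attributes each calling service may read.
ROLE_ATTRIBUTES: Dict[str, Set[str]] = {
    "search-service": {"user_address"},
    "marketing-service": {"user_address", "user_segment"},
}


def read_attribute(
    service: str,
    user_id: str,
    attribute: str,
    cache: TTLCache,
    fetch_from_backend: Callable[[str, str], str],
) -> str:
    """Check access first, then serve from cache, falling back to the backend."""
    if attribute not in ROLE_ATTRIBUTES.get(service, set()):
        raise PermissionError(f"{service} may not read {attribute}")
    key = f"{user_id}:{attribute}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = fetch_from_backend(user_id, attribute)  # e.g. a Bigtable read
    cache.set(key, value)
    return value
```

Checking the role before touching the cache means a cached value is never returned to a service that is not allowed to read it.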
3. Monitoring and alerting
Another important point in building a CDP is developing the right monitoring and alerting system to maintain the stability of our platform. A soft and a hard threshold are established and monitored for each metric. Once a threshold is reached, alerts are sent through the communication channel. Based on the current architecture, there are several parts for which we need to enable monitoring and alerting.
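A two-level threshold check of this kind could look roughly like the sketch below. It assumes that a higher metric value is worse and that a threshold counts as reached when the value is equal to or above it; the metric name, the numbers and the message format are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    OK = "ok"
    SOFT = "soft"
    HARD = "hard"


@dataclass(frozen=True)
class Threshold:
    metric: str
    soft: float
    hard: float

    def __post_init__(self) -> None:
        if self.soft > self.hard:
            raise ValueError("soft threshold must not exceed hard threshold")


def evaluate(threshold: Threshold, value: float) -> Severity:
    """Return the most severe threshold that the value has reached."""
    if value >= threshold.hard:
        return Severity.HARD
    if value >= threshold.soft:
        return Severity.SOFT
    return Severity.OK


def build_alert(threshold: Threshold, value: float) -> Optional[str]:
    """Return an alert message, or None when no threshold is reached."""
    severity = evaluate(threshold, value)
    if severity is Severity.OK:
        return None
    return f"[{severity.value.upper()}] {threshold.metric}={value}"


cpu = Threshold(metric="cpu_utilization", soft=0.7, hard=0.9)
message = build_alert(cpu, 0.95)  # would be sent to the communication channel
```

The soft level gives the team time to react before the hard level signals that the platform's stability is at risk.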
One of the things we need to monitor is resource consumption during computation and in the data pipeline from the data sources to the CDP storage, as we use BigQuery and Dataflow for data computation and data pipelines. In BigQuery, we need to monitor the slot utilization used to compute the data aggregations or manipulations that produce the attributes.
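To make the slot utilization figure concrete, here is a small worked sketch. BigQuery reports slot consumption in slot-milliseconds (for example the `total_slot_ms` column of its jobs metadata views), and dividing that by the length of the observation window gives the average number of slots in use. The `reserved_slots` capacity and the sample numbers are hypothetical.

```python
def average_slots(total_slot_ms: float, window_ms: float) -> float:
    """Average number of slots in use over the observation window."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return total_slot_ms / window_ms


def slot_utilization(total_slot_ms: float, window_ms: float, reserved_slots: int) -> float:
    """Fraction of the assumed reserved capacity that was used in the window."""
    if reserved_slots <= 0:
        raise ValueError("reserved_slots must be positive")
    return average_slots(total_slot_ms, window_ms) / reserved_slots


# Example: 7,200,000 slot-ms consumed in a one-hour window (3,600,000 ms)
# averages 2 slots; against a hypothetical capacity of 4 slots that is 50%.
hourly_average = average_slots(7_200_000, 3_600_000)
hourly_utilization = slot_utilization(7_200_000, 3_600_000, reserved_slots=4)
```

A utilization value like this is the kind of metric that soft and hard thresholds can then be applied to.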
When building the CDP, high-quality data is necessary for it to be a trusted platform. Several important data quality metrics are Data Completeness, Data Validity, Data Anomaly and Data Consistency. Therefore, monitoring needs to be enabled for each of these metrics.
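Two of these metrics, completeness and validity, can be sketched as simple ratios. The definitions below are assumptions made for illustration: completeness is the share of rows with a non-null value in a column, and validity is the share of non-null values that pass a rule. The sample rows and the age rule are hypothetical.

```python
from typing import Any, Callable, List, Mapping


def completeness(rows: List[Mapping[str, Any]], column: str) -> float:
    """Share of rows whose value in `column` is present (not None)."""
    if not rows:
        return 1.0  # an empty batch is treated as vacuously complete
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def validity(
    rows: List[Mapping[str, Any]],
    column: str,
    is_valid: Callable[[Any], bool],
) -> float:
    """Share of non-null values in `column` that satisfy `is_valid`."""
    values = [row[column] for row in rows if row.get(column) is not None]
    if not values:
        return 1.0  # nothing to validate
    return sum(1 for value in values if is_valid(value)) / len(values)


sample = [{"age": 25}, {"age": -3}, {"age": None}, {}]
age_completeness = completeness(sample, "age")
age_validity = validity(sample, "age", lambda age: 0 <= age <= 120)
```

Each ratio can be tracked per attribute over time, so a sudden drop stands out before consumers of the CDP are affected.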
Storage and API Performance
Since the CDP’s backend and API interact directly with several front-facing features, we have to ensure the availability of the CDP service. Because we use Bigtable as the backend, monitoring of CPU, latency and RPS is required. These metrics are provided by default in Bigtable monitoring.
4. Discoverability across the company
Many users have been asking how they can browse the attributes that our CDP offers. Initially, we started out by documenting our attributes and sharing the documentation with our stakeholders. However, as the number of attributes increased, it became increasingly hard for people to go through our documentation. This pushed us to start integrating the CDP terminology into our Data Catalog. Our Data Catalog now plays an important role in enabling users to browse attributes in the CDP, including the definition of each attribute and how to retrieve the data.
5. Implementation and adoption of the platform
Another key point for a successful CDP implementation is collaboration across teams on the front-end services. There are several types of CDP implementation at Tokopedia: Personalization, Marketing Analytics, and Self-Service Analytics.
The most common use of the CDP is personalizing a user’s journey. One example of personalization is the search feature. The product team personalizes the user’s search results based on the user’s address, so that the user can find products that are close to their location. After discussing the definition of user address, we created a CDP API contract with the Search team, so that development could run in parallel. As a result, our users today have a better experience based on their location.
When we started building the CDP platform, we discussed the existing use cases with the Marketing team. One of their goals was to personalize and optimize marketing efforts, such as sending notifications to the right users based on their attributes, to reduce unnecessary notification costs for unrelated users and to enhance the overall user experience by avoiding spam notifications. Once we understood their needs, we looked at the ways in which the CDP could cater to them. We discussed with the relevant teams the type of user attributes to use when sending marketing push notifications, and how to integrate the segmentation engine and communication channel with the CDP platform.
The CDP is also often used for self-service analytics, enabling quick insights into user demographics and behavior in certain segments. To build this self-service analytics tool, our team consulted with the Product and Analyst teams to define the user demographic attributes that business and product users often select for insights. After understanding the attributes required, we discussed with the Business Intelligence team how to enable the visualization for end users. This allowed different teams to understand our users better and gain insights on how we can improve our platform.
The CDP implementation has had a significant impact on different use cases and helped Tokopedia become a more data-driven company. Through the CDP, we are also able to strengthen one of our core DNA, which is Focus on Consumer. By sharing the CDP framework, we hope to bring value and help others more easily build a thriving CDP platform.