CluedIn Frequently Asked Questions (FAQ)
What is master data management in Azure?
CluedIn Master Data Management (MDM) is a Microsoft Azure-native solution that tightly integrates with 27 Microsoft Azure services, including Microsoft Purview, Azure Data Factory, Azure Synapse and more.
Read more about the solution here:
Microsoft Azure Master Data Management
How is CluedIn different to a Data Warehouse?
A Data Warehouse is fantastic at answering known questions with known data. We still think it is, and will remain, a critical part of your data technologies.
The typical problems with a Data Warehouse creep in when you want to ask a spontaneous question or see the data from another angle: you are often bound to the way the data has been modelled in the fact tables.
Although you can add more fact tables, there is obviously a point where this becomes too much to maintain, and the queue of requests just keeps growing as more and more parts of the business work with data.
CluedIn is typically placed before the Data Warehouse in the overall journey of data. CluedIn is fantastic at replacing the typical ETL layer of Data Warehouse projects, where we are simply better at integrating, preparing, governing and providing flexible access to data to fuel your dimension tables.
With CluedIn in place, if someone asks a different question of the data, you won't have to go back to the raw data to fuel your dimension tables with what you need. Instead, you create new columns in your dimension tables and then ask CluedIn to fill the data in.
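As a rough illustration of the "add a column, then fill it in" idea, here is a minimal Python sketch. It assumes an in-memory SQLite table named dim_customer and a plain dictionary standing in for data that has already been prepared upstream. The table, the column names and the helper functions are all hypothetical; this is not CluedIn's API or how CluedIn populates a warehouse.

```python
import sqlite3

# Hypothetical stand-in for records that were already unified upstream.
unified_customers = {
    1: {"name": "Acme Ltd", "country": "DK", "industry": "Manufacturing"},
    2: {"name": "Globex", "country": "US", "industry": "Energy"},
}


def build_dimension(records):
    """Create a small dimension table holding only the originally modelled columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dim_customer ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?, ?)",
        [(cid, rec["name"], rec["country"]) for cid, rec in records.items()],
    )
    return conn


def add_and_fill_column(conn, column, records):
    """Add a new column to dim_customer and fill it from the prepared records."""
    # The column name is interpolated into SQL, so accept plain identifiers only.
    if not column.isidentifier():
        raise ValueError(f"unsafe column name: {column!r}")
    conn.execute(f"ALTER TABLE dim_customer ADD COLUMN {column} TEXT")
    conn.executemany(
        f"UPDATE dim_customer SET {column} = ? WHERE customer_id = ?",
        [(rec.get(column), cid) for cid, rec in records.items()],
    )
    conn.commit()


conn = build_dimension(unified_customers)
add_and_fill_column(conn, "industry", unified_customers)
```

The point of the sketch is that the new question ("which industry?") is answered by adding one column and filling it from prepared data, without re-running extraction against the raw sources.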
For more details, you can read the full white-paper.
What is the technology stack of CluedIn?
CluedIn is a platform composed of many different components. These components are all Docker containers, and the application is composed together with Docker Compose.
We use Kubernetes as the orchestration framework that helps you manage and deploy CluedIn into your production environments.
If we dive into the application itself, the CluedIn server is a .NET Core application. Amongst other advantages, this allows you to host CluedIn on all the popular operating systems. We do recommend that you host CluedIn in Unix/Linux environments.
The CluedIn server interacts with many different databases through an enterprise service bus abstraction. This allows CluedIn to be truly horizontally scalable. The adage of "just add more machines" is very much a reality with CluedIn.
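To make the "just add more machines" idea concrete, here is a minimal Python sketch of the general competing-consumers pattern that a service bus enables. It assumes an in-process queue as a stand-in for the bus and threads as a stand-in for machines. The function name and the placeholder processing step are hypothetical; CluedIn's actual bus and message handling are not shown here.

```python
import queue
import threading


def process_with_workers(messages, worker_count):
    """Fan messages out to workers through a shared queue and collect the results."""
    bus = queue.Queue()  # stand-in for the service bus
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            message = bus.get()
            if message is None:  # sentinel: no more work for this worker
                bus.task_done()
                break
            processed = message.upper()  # placeholder for real processing
            with results_lock:
                results.append(processed)
            bus.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for message in messages:
        bus.put(message)
    for _ in threads:
        bus.put(None)  # one sentinel per worker
    bus.join()
    for thread in threads:
        thread.join()
    return sorted(results)


# The same work completes with one worker or with four; only throughput changes.
one = process_with_workers(["a", "b", "c"], worker_count=1)
four = process_with_workers(["a", "b", "c"], worker_count=4)
```

Because producers only talk to the queue and never to a specific worker, capacity is added by starting more consumers rather than by changing the producers.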
The data layer of CluedIn is made up of 5 different databases including:
These components can all run in a High Availability mode and are all enterprise grade. We stand on the shoulders of giants here at CluedIn, and hence all of these databases are industry leaders in their respective spaces.
How is CluedIn different to a Data Lake?
A Data Lake, simply put, is a place to migrate all your data to so that it is more easily available than it is at the source. It will typically give you SQL as the ubiquitous language to query across files as if they were all in a database. The storage costs are relatively low, but the maturity of the data is still very raw.
If you already have a Data Lake, CluedIn is typically used to sit over the lake and mature the data to a point where it is consumable and usable. With CluedIn in place, we really don't recommend that anyone goes directly to the Data Lake anymore.
If you don't have a Data Lake already, the honest truth is that we would strongly recommend that you address some other parts of your data landscape before you get to that. The Data Lake does have value and eventually it makes sense to get one, but we often see it as a premature optimisation: the benefit of low-cost storage is outweighed by the fact that, often, the Data Lake can't show value because there is still a lot that needs to be done to mature the data before anyone can use it.
Why should you not build a Data Fabric yourself?
The entire reason why CluedIn exists is because most projects fail in stitching together different products into a coherent fabric. CluedIn was designed with the stitching first and we grew out the different
Let's be transparent and open: there are lots of open source and non-open source tools out there that are great. We believe that you could build something like CluedIn, but it is important to remember that these projects take a very long time to mature, are fraught with risk and typically cost many times more than you intended.
CluedIn has the benefit of accelerating you past all that turmoil to the point where you can deliver ready-to-use data to the forefront of your business.
If you look at most cloud providers today, they offer all the building blocks of data management so that you can compose a data fabric yourself. This is great, and in fact you can plug many of these products into CluedIn. The problem is not with the individual products; it is that stitching these different products together is really hard, and it is not a surprise that analyst firms are reporting that 85% of these projects fail.
What is a Data Fabric?
The Data Fabric is simply an amalgamation of the different pillars of the data management category. It turns out that many of the goals you want to achieve require you to go through some common pillars. These common pillars are the "Data Fabric". Think of it like stitching together different products into one platform. The value of CluedIn comes from the fact that it was born with this stitching and it was not an afterthought. This means that while other vendors are starting to consolidate their different products into a Data Fabric, CluedIn was designed like this from the start.
What is Eventual Connectivity?
Eventual Connectivity is the core Data Integration pattern that CluedIn uses to automate the process of unifying data from different data sources. Quite simply, CluedIn utilises a graph-based, schemaless pattern so that companies do not have to manually determine how different data sets join, if they can be joined at all!
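The general idea of letting joins emerge from a graph, rather than declaring them up front, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each record carries a set of identifier values, and records that share any identifier, directly or through a chain of other records, are treated as connected. The unify function, the record IDs and the identifier strings are all hypothetical, and this is not CluedIn's implementation of Eventual Connectivity.

```python
from collections import defaultdict


def unify(records):
    """Group records connected, directly or transitively, by shared identifiers.

    records: list of (record_id, set_of_identifier_strings) pairs.
    Returns a sorted list of sorted record-id groups.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each record and each identifier value is a node; a record is linked
    # to every identifier it carries. No join keys are declared in advance.
    for record_id, identifiers in records:
        find(("record", record_id))
        for identifier in identifiers:
            union(("record", record_id), ("id", identifier))

    groups = defaultdict(list)
    for record_id, _ in records:
        groups[find(("record", record_id))].append(record_id)
    return sorted(sorted(group) for group in groups.values())


records = [
    ("crm:17", {"email:jane@example.com"}),
    ("erp:A9", {"email:jane@example.com", "vat:DK123"}),
    ("billing:3", {"vat:DK123"}),
    ("crm:42", {"email:bob@example.com"}),
]
print(unify(records))
# [['billing:3', 'crm:17', 'erp:A9'], ['crm:42']]
```

Note that crm:17 and billing:3 share no identifier with each other; they end up together only because erp:A9 connects to both. That is the sense in which connections appear "eventually" as more data arrives.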
You can read more about it here:
What Is Modern MDM?
Key features that define modern Master Data Management (MDM) are:
Design-Centric Approach: Modern MDM adopts a design-centric methodology, prioritizing a thoughtful and adaptable approach to data management.
Integration of Augmented Data Management: It incorporates insights from augmented data management disciplines, leveraging advanced technologies to enhance data quality and decision-making.
External Data Sharing: Modern MDM goes beyond internal data sources, considering the integration of external data sharing to enrich and broaden the scope of master data.
Read more here:
How do you install CluedIn Master Data Management?
The Azure Marketplace is the best place to get started with a new installation of CluedIn.
CluedIn Master Data Management is an Azure Managed Application (AMA) that is deployed within your company's Azure infrastructure. As a managed application, CluedIn is easy to deploy and operate. In addition, our support team can help you with the installation processes.
Installing CluedIn through the Azure Marketplace allows you to use simple hourly pricing and upgrade to a full license when needed. So you can freely use CluedIn for a few hours of investigation or dig deeper and integrate with a suite of Azure services to develop your master data management solution.
Read more here:
What is Azure Purview MDM?
Microsoft Purview & CluedIn Master Data Management for unified data governance in Azure.
The marriage of Microsoft Purview and CluedIn brings a powerful solution for managing and governing data across on-premises, multi-cloud, and SaaS environments. While Purview provides a unified governance platform with recent enhancements in data governance and protection, CluedIn complements it as a cloud-native, modern data management solution that unifies and prepares data for insightful analysis.
By combining Purview with CluedIn, businesses can gain a comprehensive understanding of their data, ensuring trustworthiness and governance throughout its lifecycle. The partnership allows organizations to answer critical questions about their data, such as its existence, trustworthiness, usage, and responsible parties. Demonstrating data lineage and governance becomes more robust, showcasing continuous improvement in data quality and trustworthiness over time.
Ultimately, the collaboration between Purview and CluedIn enhances the accuracy, insightfulness, and overall value of data-driven initiatives for businesses.
Why is keeping Master Data Management (MDM) in the IT Domain bad for business?
Keeping Master Data Management (MDM) solely in the IT domain limits business potential. While engineers prepare data for insights, involving Domain Experts is vital to enhance data's intrinsic value. Exclusion risks poor decision-making, as technically accurate but practically unusable data emerges. Integrating Domain Experts initiates a shift toward treating data as products, addressing conflicts, and facilitating data integration. Successful strategies include enriching data, eliminating duplicates, and enhancing accessibility. The analogy of source control illustrates a solvable data silo challenge, balancing centralized and decentralized control for scalability.
Who is the founder of CluedIn?
What is CluedIn?
CluedIn is a master data management platform giving companies the data foundation they need to fulfil their data-driven initiatives and deliver more value than ever before.