Microsoft Fabric for the Database Administrator (DBA) - This is just the beginning!
Problem:
Microsoft Fabric is a big product with lots of different data handling capabilities. From a data engineering perspective, creating and innovating with Fabric as a unified tool is a great experience, ultimately delivering data insights for the business and adding value. Nice! However, as with all new developments, the creativity is the fun part. The governance and movement of code into production is less fun and can become the hard/ugly part if the change management, platform and governance aren't mature enough.
Considering this in Microsoft Fabric, an engineer or business user will focus on Workspaces and Items they create from Experiences. Workspaces are therefore their world. Fine, but how are Workspaces represented in the rest of the product? Initially, to answer this question I created a manual hierarchy for a customer to help describe how to organise the different parts of Microsoft Fabric, with a focus on the happy place of the Workspace (yes, it rhymes too!).
This is what I created, version 1:
Honestly, I did not intend for this picture to be complicated. Initially, my gut feeling was that Microsoft Fabric was simple: serverless, as we are told. Software as a Service. That was until I started drawing the hierarchy with the various parts and requirements for a typical analytics solution and an enterprise customer with multiple teams, environments, data products and business domains.
This is what prompted my blog post and title: Administering Fabric, Workspaces vs Everything. As engineers we gravitate to Workspaces to create stuff, but to get our creations into production we need to consider everything else in the picture above, including the access model for different personas, compute, source control, etc.
Based on this I decided to draw another picture that I want to share with you to help you to think about how we manage/administer Microsoft Fabric. Manage in the context of getting our creations into production for user consumption as well as implementing good platform governance for the business.
Version 2:
Hopefully, version 2 is better than version 1 in terms of clarity but not complexity. That said, we can now start to answer some questions for our Microsoft Fabric implementations.
To clarify:
This is not yet an answer to this problem.
This is a blog with a problem statement and a bunch of unanswered questions only.
This is so that, as a community and industry, we can use the above to start thinking about what we need to establish in Microsoft Fabric, and what the 'DBA' needs to be aware of (as a minimum).
This is not just a technology problem; as always, it's about people and processes too.
Questions:
Let me start with the questions I think we need to answer with Microsoft Fabric as we grow and evolve. They are in no particular order, but aligned to the version 2 picture above, and some are slightly more generic than others.
Should we separate our environments (dev, test, prod) using a set of Microsoft Fabric Workspaces, one Workspace per Environment?
Should the data product be defined as everything across all environments needed to deliver a given use case or solution in Microsoft Fabric? Is this natural?
Should we further separate our Microsoft Fabric Workspaces based on the "medallion architecture" methodology, with alignment to the respective bronze, silver, gold storage containers?
Should only our development environment Workspace be connected to source control?
Should we have all Workspaces, for all environments, connected to the same source control repository and use a more complex branching strategy mapped to environments, meaning branch merging becomes our method of deployment?
Should a business domain encompass all data products and all environments?
Should we use sub domains to separate production and non-production environments, or some other classification?
Should Capacities (compute) be aligned to environments, data products, sub domains or business domains, or something else? Maybe the whole tenant?
Should we use trial capacities for our development environment, creating new trials as required?
Should a role-based access model be implemented at every level in this hypothetical hierarchy and, if so, with what roles? For example: Experience reader, Workspace contributor, OneLake reader.
Should data plane and control plane personas be separated for the production environments?
Should we organise our folders in a Workspace based on the type of Experience Items created?
Should we have one big capacity covering all parts of the solution, with dev and prod workloads sharing compute?
Stopping at thirteen seemed appropriate for the tone of this blog.
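To make a few of these questions easier to argue about, it can help to write a candidate layout down as data and see what it implies. The following is only a rough sketch under stated assumptions, not an answer and not a recommendation. It does not call any Microsoft Fabric API; the naming convention, the capacity names (cap-dev, cap-shared) and every function and parameter name are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Tuple

# Assumed environment and medallion layer names, taken from the questions above.
ENVIRONMENTS = ("dev", "test", "prod")
LAYERS = ("bronze", "silver", "gold")


@dataclass(frozen=True)
class Workspace:
    """One planned Workspace in a hypothetical hierarchy (not a Fabric object)."""
    domain: str
    data_product: str
    environment: str
    layer: Optional[str]   # None when Workspaces are not split by medallion layer
    capacity: str          # hypothetical capacity name
    git_connected: bool    # whether this Workspace would be linked to source control

    @property
    def name(self) -> str:
        # Hypothetical naming convention: domain-product-environment[-layer]
        parts = [self.domain, self.data_product, self.environment]
        if self.layer:
            parts.append(self.layer)
        return "-".join(parts)


def plan_workspaces(
    domain: str,
    data_product: str,
    split_by_layer: bool = False,
    capacity_per_environment: bool = True,
    git_environments: Tuple[str, ...] = ("dev",),
) -> List[Workspace]:
    """Enumerate the Workspaces implied by one set of answers to the questions."""
    layers = LAYERS if split_by_layer else (None,)
    plan = []
    for env, layer in product(ENVIRONMENTS, layers):
        capacity = f"cap-{env}" if capacity_per_environment else "cap-shared"
        plan.append(
            Workspace(domain, data_product, env, layer, capacity, env in git_environments)
        )
    return plan


if __name__ == "__main__":
    # One Workspace per environment, only dev connected to source control.
    for ws in plan_workspaces("sales", "forecasting"):
        print(ws.name, ws.capacity, ws.git_connected)
```

Flipping split_by_layer to True turns three Workspaces per data product into nine, which is a quick way to see how fast the administration surface grows before anything is created in the tenant.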
Summary:
To summarise for now, this is only just scratching the surface. If you start looking around the Microsoft Fabric Admin Portal, there is so much we need to take control of and apply governance to, especially when we have willing business users now exploring Fabric capabilities, ready to self-serve on our data products.
Once upon a time a data lake became a data swamp due to a lack of governance. Will this fabric become a tangled mess because of the same issues? Microsoft is going after user adoption, fine. Who is going to pick up the pieces here?
Somebody call the DBA!!
Many thanks for reading.