My guide for technology professionals who want to deliver value and quality in their projects.
Hi, I'm Paul and I write blogs to help process the thoughts in my head... :-)
AKA, the musings (post series link) of a slightly grumpy, battle-hardened data engineer, technology strategist and enterprise architect.
Context
Over the many years I've been designing and building data solutions for various customers, I've seen the technology industry constantly evolve and innovate, which on the bleeding edge can be exciting and challenging. However, I also see a downside to this rapid change: the proliferation of software vendors creating more and more products to deliver data analytics solutions.
While some of these products offer genuine benefits and advantages, others may be driven by marketing hype, a land grab for user adoption, or unrealistic promises. This can create confusion and frustration for technology professionals who need to choose the right tools for their projects, based on the actual requirements and needs of their stakeholders.
In this blog post, I want to explore some of the challenges and issues faced when choosing data analytics tools, and to offer some practical advice on how to avoid the hype and focus on the mature and proven tools that can deliver value, quality and (pertinent to this post) stability.
To be clear, this isn't a rant about any particular vendor or product. At least, not yet! This is me taking a step back and trying to break down the common consultant's answer of 'it depends' when asked 'what should we use?'.
Choice
I've quoted The Matrix in conference talks before: "the problem is choice". For Neo and for us! But to bring that into focus for this blog, we need to think about our platform architecture. Data and analytics are the essential components we use to deliver our projects, informing decisions that support the business. Choosing the right tools to ingest, transform, analyse, and tell stories with that data is not an easy task, especially in the current market where there are hundreds of options available. Asking what the primary focus of our design is can be one way to start breaking down the answer: cost, performance, scale, resilience, rapid delivery, etc. This can help, but there are still other factors we need to consider:
The complexity and diversity of the data sources, requiring different tools and techniques to handle the ingestion processes.
The variety of use cases and scenarios, which demand different tools and approaches to address and solve.
The fast pace and short delivery cycles of projects to stay ahead of the competition, which can put pressure on engineers to deliver results quickly and efficiently, without compromising quality.
The influence and expectations of software vendors, who often market their products as the best and latest solutions for any data and analytics challenge, regardless of the actual fit and suitability! Not naming names here. Sadly, they all do it.
The lack of clear and objective criteria/standards to evaluate and compare different tools. This can lead to subjective or biased choices based on personal preferences, opinions, or experiences. Not necessarily wrong, but not objective. It also often requires a lot of research just to cut through the noise and find some valid/trusted opinions. A simple weighted scoring exercise (see the sketch below) can at least make the criteria explicit.
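As a small illustration of what more objective criteria could look like in practice, here's a minimal weighted-scoring sketch in Python. The criteria, weights, tool names and scores are all made up for the example; swap in whatever actually matters for your design and your team.

```python
# Minimal weighted-scoring sketch for comparing candidate tools.
# Criteria, weights and scores are illustrative placeholders only.

criteria_weights = {
    "cost": 0.25,
    "performance": 0.20,
    "maturity": 0.25,
    "team_skills": 0.20,
    "vendor_lock_in": 0.10,  # higher score = less lock-in
}

# Scores from 1 (poor) to 5 (excellent), ideally agreed by the team, not one person.
candidate_scores = {
    "Tool A": {"cost": 4, "performance": 3, "maturity": 5, "team_skills": 4, "vendor_lock_in": 4},
    "Tool B": {"cost": 2, "performance": 5, "maturity": 2, "team_skills": 2, "vendor_lock_in": 3},
}

for tool, scores in candidate_scores.items():
    weighted_total = sum(criteria_weights[c] * scores[c] for c in criteria_weights)
    print(f"{tool}: {weighted_total:.2f} / 5.00")
```

It won't make the decision for you, but it does force the conversation about what you're actually optimising for.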
I'm sure for most reading this, it all sounds very familiar. So, what do we do?
Choosing
Given these choices/challenges, how do we proceed with the right technology? How do we avoid the hype? Here are my thoughts to help with this process:
Start with the use case(s), not the technology. Before you look for any tools, you need to clearly define and understand the problem you are trying to solve, the questions you are trying to answer, and the outcomes you are trying to achieve. This will help you narrow down your options and focus on the tools that can actually help address the specific needs.
Assess current skills, capabilities, and resources. Before considering any tools, evaluate your existing environment, covering the people, processes, and technology. This will help identify strengths and weaknesses, and determine the gaps that may need to be filled.
Research different options. Before you decide on any tools, you need to do your homework and research the market and the available options. After all, we are here because the industry changes so quickly! A quick proof of concept (PoC) can often help with this and allows for an element of learning (by doing).
Validate your choices. Before you commit to any tool, test your choices in a realistic and controlled environment, maybe extending the PoC. Use real or representative data and scenarios to measure and evaluate the results of using different tools, especially if performance is a key requirement in the design. That often means cost also needs to be a factor in the decisions; for example, we can use 9,000 compute nodes for everything if we don't care about the run costs. See the rough cost sketch after this list.
Share your design thinking with others. After you choose and implement any tools, you need to review and revise your decisions periodically. A use case or outcome might have been realised, but the solution will evolve, meaning we need to monitor the performance and impact of the tools over time.
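To illustrate the cost point above, here's a rough Python sketch of how a PoC comparison might combine runtime and compute cost for the same representative workload. The tool names, node counts, hourly rates and runtimes are hypothetical placeholders, not real benchmarks.

```python
# Rough PoC comparison: same representative workload run on each candidate,
# combining elapsed runtime with compute cost. All figures are made up.

poc_results = [
    # (tool, nodes used, hourly rate per node in GBP, runtime in hours)
    ("Tool A", 8, 0.95, 1.6),
    ("Tool B", 32, 0.95, 0.5),
]

for tool, nodes, rate, hours in poc_results:
    run_cost = nodes * rate * hours
    print(f"{tool}: {hours:.1f}h runtime, £{run_cost:.2f} per run")
```

The faster option isn't automatically the right one; a slower run at a fraction of the cost might be perfectly acceptable for an overnight batch.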
Conclusion
Choosing the right data analytics tools is a critical and challenging task, and one that I face every day when collaborating with customers who are often bamboozled by an ocean (or lake!) of different tech. However, by avoiding the hype and following my thinking above, you can hopefully make some informed and rational decisions that align with your project requirements and goals.
Please remember, don't choose a new and shiny tool just because it's new and shiny. Choose the right tool for the requirements.
And yes, I did ask Co-pilot to generate an image of a technology hype cycle, thinking it would bring something back inspired by Gartner etc. Not sure what this is, but we'll go with it. Loving the colours. Ha!
Many thanks for reading.