How Much Data Is Too Much Data For Your Organization?

    With access to unprecedented volumes of data, organizations can perform endless analyses and get innumerable insights to base decisions on. But at some point, it’s valuable to step back and consider whether having so much data is serving the needs of your organization or costing you in ways that you’re not recognizing.

    Here are three questions to ask when evaluating whether to bring data into your data warehouse:

    Does it create unnecessary risks?

    It’s wise to consider risk factors before anything else. If the data carries any degree of risk exposure, whether from compliance requirements and other regulations (e.g. HIPAA, GDPR), from its potential sensitivity, or because it is an essential component of a project’s implementation, you will need to capture it. But it is also critical to do your due diligence first: define and understand the requirements so that you avoid bigger problems further down the project lifecycle. You don’t want to take on additional risk and then find that the project never sees a matching benefit from your efforts.

    What value does it offer your data consumers?

    Consider what you can do with the data you collect. Without a clear purpose for the data, it’s hard to justify processing it for analysis. If your organization is not willing or able to respond and adjust to the insights the data provides, or to make decisions with it, then the data might not be as valuable as you first thought. In fact, it might just be distracting you from more important efforts. It’s important to ask: How will more data add to or improve my decision-making? How does the cost of pursuing that data compare to the cost of using other data?

    What are the hidden costs?

    Everything has a cost. While storage is cheap these days and its cost is easy to determine, other costs should be considered when starting a new analytics project, even if they’re harder to quantify.

    Labor costs

    You’ll need both technical and non-technical resources to build analytics. Even understanding new data sources, deciding what analytics to apply, and learning how to apply them correctly takes time. You might also need to supplement your in-house expertise with outside resources (e.g. consultants) to jump-start your development. The costs can add up quickly if you don’t understand and manage them right from the start.

    Trade-off costs

    Consider and compare the costs of focusing on one set of data versus another. For example, archiving strategies, index tuning and configuration, and more advanced database techniques can add to the cost of having additional data. Will the people you need to perform these activities be kept from working on other projects that are more meaningful and might have more value for the customer? Finally, more data might eventually lead to a trade-off in performance. Consider whether your platform is scalable and can handle data growth. Also, evaluate whether you have the expertise to determine when you’re going to hit a tipping point on performance degradation.
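    One rough way to reason about that tipping point is to project data growth against a known capacity limit. The sketch below is only an illustration: it assumes steady compounding monthly growth, and the function name, parameters, and the example figures are all hypothetical rather than taken from any particular platform.

```python
from typing import Optional


def months_until_capacity(
    current_gb: float,
    monthly_growth_rate: float,
    capacity_gb: float,
    max_months: int = 600,
) -> Optional[int]:
    """Estimate how many whole months pass before data volume reaches capacity.

    Assumes (hypothetically) that volume compounds at a fixed monthly rate,
    e.g. 0.05 for 5% growth per month. Returns 0 if capacity is already
    reached, and None if it is never reached within max_months.
    """
    if current_gb >= capacity_gb:
        return 0
    if monthly_growth_rate <= 0 or current_gb <= 0:
        return None  # no growth means the limit is never reached

    size = current_gb
    for month in range(1, max_months + 1):
        size *= 1 + monthly_growth_rate
        if size >= capacity_gb:
            return month
    return None


if __name__ == "__main__":
    # Hypothetical figures: 2 TB today, 5% monthly growth, 10 TB limit.
    print(months_until_capacity(2_000, 0.05, 10_000))
```

    A real platform rarely degrades at a single hard limit, so treat the result as a prompt for a conversation about scalability, not as a forecast.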

    Compliance costs

    Most organizations have a sizeable set of audits and requirements for all of the types of data they use. Typically, IT must fill out a set of forms once a year to certify that the data is being maintained according to the organization’s policies and procedures. If your organization also hosts data from other organizations, you might be required to fill out their IT questionnaires as well. While this work is necessary, it’s useful to recognize that the time spent responding to inquiries and follow-up inquiries adds up and costs your organization time and resources.
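
    The cost categories above can be rolled into a rough yearly estimate for a candidate data source. The following is a minimal sketch under stated assumptions: the class, its fields, and every figure in the usage example are hypothetical placeholders, and trade-off costs are left out because they are hard to express as a single number.

```python
from dataclasses import dataclass


@dataclass
class DataSourceCost:
    """Rough yearly cost inputs for one candidate data source.

    All fields are illustrative assumptions, not measured values.
    """

    storage_gb: float
    storage_cost_per_gb_month: float
    labor_hours: float        # building and maintaining the analytics
    compliance_hours: float   # audits, forms, and questionnaires
    hourly_rate: float

    def storage_cost(self) -> float:
        # Storage is billed monthly here, so scale to a full year.
        return self.storage_gb * self.storage_cost_per_gb_month * 12

    def people_cost(self) -> float:
        return (self.labor_hours + self.compliance_hours) * self.hourly_rate

    def yearly_total(self) -> float:
        return self.storage_cost() + self.people_cost()

    def storage_share(self) -> float:
        """Fraction of the yearly total that is storage (0.0 if total is zero)."""
        total = self.yearly_total()
        return 0.0 if total == 0 else self.storage_cost() / total


if __name__ == "__main__":
    # Hypothetical example values only.
    source = DataSourceCost(
        storage_gb=100,
        storage_cost_per_gb_month=0.5,
        labor_hours=40,
        compliance_hours=10,
        hourly_rate=100,
    )
    print(f"Yearly total: {source.yearly_total():.2f}")
    print(f"Storage share: {source.storage_share():.1%}")
```

    Even with made-up numbers, laying the inputs out this way tends to show how small the storage line is next to the hours people spend on the data.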