Common Data Issues & Solving Them With ClicData

Table of Contents

    What sets ClicData apart is the power it gives you to do two valuable things: pull your data together into one clean data source and build efficient dashboards in a very short time.

    The platform gives you tools to tidy up your data. You can define transformations on the data that run any time that data is refreshed.

    Since there are so many times you don’t have any control over the format or structure of the data you receive, this is often a very necessary step. However, once you have done the work upfront once, you can relax. You can rely on ClicData to do the work for you with each data refresh as the calculation is also refreshed.

    Think of what you can do with all of that free time.

    Below you will find some expert tips to take full advantage of ClicData’s Data Management module. The examples demonstrate how you can use ClicData’s ETL functionality to cleanse your data.

    Common Data Issue #1: Parsing characters of a string

    Two typical problems that organizations encounter are:

    1. how to bring together data from disparate data sources,
    2. the native format of incoming data is not conducive to visualizing in a dashboard.

    Older, legacy systems might have data you need, just not in a format that would merge well with your current data system or other complementary data sources. In other words, some cleansing might need to be done.

    Here’s a simple example.

    car model formatted

    You’ve got data in column format containing two values about automobiles—the make and the model of the car.

    If you want to categorize the data by car brand—that data isn’t provided directly yet—you would need to extract data from the Car Model column that contains values formatted similarly to “ABC-123.”

    Then, you need to extract the “ABC” portion of the value—the car brand info—so that you can filter, categorize, or join data and create visualizations pertaining to the auto car brands.

    By creating a view to access the Data Transformation, you can insert a calculated column and apply an appropriate SUBSTRING function.

    The SUBSTRING and CHARINDEX Functions

    The SUBSTRING function extracts a specific number of characters from a string based on a starting location of the string and for a specific number of characters after that starting position.

    SUBSTRING([column], start_location, number_of_characters)

    For our example, if we know that all values in the column will have the same structure (three characters, a hyphen, and then more characters), then we can use the simple formula:

    SUBSTRING([Car Model], 1, 3).

    This will extract the data starting at position one and only return the first three characters (e.g., “ABC”).

    While this simple scenario works, it might not be optimal if the value you want to retrieve might come in a variety of lengths, e.g., “ABC-123” and “ABCDE-123”.

    If that’s the case, you can combine the use of the SUBSTRING function with the CHARINDEX function to get the result you want:

    CHARINDEX(‘character‘, ‘string’)

    Using the example above, if you know that there will always be a hyphen, the hyphen’s starting position in the text can be used to determine the number of characters to substring dynamically.

    So, CHARINDEX(‘-‘, ‘ABC-123’) tells you the location of the hyphen in the string and returns the value of 4. Now you can combine the two to make a more robust substring function:

     SUBSTRING([Car Model], 0, CHARINDEX(‘-‘, [Car Model], 0))

    This would handle both “ABC-123” and “ABCDE-123” data entries, returning “ABC” and “ABCD,” respectively.

    Illustrated below is the resulting calculated column showing the new Car Brand values, which have been extracted from the “CarModel-Formatted” column from the original data, and the resulting substring.

    substring charindex functions
    Example of a SUBSTRING function and results

    If we look at the result of this transformation, the Car Brand is now separated from the Car Model:

    data grouping

    Common Data Issue #2: Converting a string to date

    Another common data issue is converting text columns to date format.

    You want to have values in date format so that you can use date features within your dashboards for trending charts or date filter widgets. This is where the DATEFROMPARTS function can be applied.

    Depending on the text values structure of the date, you will need to determine how to extract the different parts of the date—the year, month, and day.

    For example, we have text values for dates that look like: “20080504 00:00:000.”

    That means we will have to retrieve the first four characters to derive the year, characters 5 and 6 to determine the month, and characters 7 and 8 to get the day.

    Because this looks to be the standard format, we can use LEFT(DateString, 4) to get the year, SUBSTRING(DateString, 5, 2) to get the month, and SUBSTRING(DateString, 7, 2) to get the day.

    The DATEFROMPARTS Functions

    If we combine these functions with the DATEFROMPARTS function, we can convert the string into a date value, using:

    DATEFROMPARTS(LEFT(DateString, 4), SUBSTRING(DateString, 5, 2), SUBSTRING(DateString, 7, 2))

    converting string to date
    Example of a transformation of a string to a date

    We are now able to build a nice and clean visualization of our car sales by month:

    data transformation results visualizaton

    We could also use the T-SQL Cast as Date to convert the string to date format. While there is a problematic trailing period, we can handle it by removing the period.

    This approach might be a little more straightforward and easier to follow, and it gets you to the same result:

    CAST((REPLACE(DateString, ‘.’,’’) as DATE)

    using tsql cast

    Common Data Issue #3: Parsing JSON values

    First, what is a JSON Object?

    A JSON object allows you to store multiple attributes in one value. Unfortunately, the value cannot interact with individual attributes directly on a dashboard. Therefore, you must extract attributes that have already been extracted prior to their use in the dashboards.

    For example, assume you have the following JSON data in a column:

    {“Model”:”BMW 518i”, “Convertible”:”No”,”Type”:”Coupe”,”Country”:”Canada”}

    Using calculated columns on a view, you can use the JSON_VALUE() functionto extract an individual attribute within the JSON object and place it into its own dedicated column. The structure of the JSON_VALUE() is:

    JSON_VALUE([column], ‘$.path’)

    Where the [column] is the name of the column you are parsing and the ‘$.path’ represents a set of path steps with the key name (“Convertible,” “Type,” “Country”) identifying the data element you want to retrieve.

    For the example, JSON_VALUE([Description], ‘$.Country’) will return the value of Canada for that row of data.

    In another post, we’re going to cover 3 more common data issues that companies are facing which are when do we need to use data merge, aggregation and cumulative sums. We’re also going to show examples of use cases in ClicData.