This article was published on April 15, 2022

How we can make data science more diverse — and why that matters

Data diversity means more than just bigger samples


We’ve experienced a radical global shift in the social perception of issues related to diversity. Studies demonstrate a clear trend towards ‘diversity awareness’ over the past decade. But has this translated into gains for STEM?

As far as we can tell, the answer’s a tepid ‘sure, a little.’ We’re seeing small changes filter through in the form of corporate and academic commitments, but continuing studies demonstrate there’s a lot of work left to be done when it comes to actual recruitment and equal-opportunity treatment.

While this is probably true throughout most traditional industries, in the STEM fields it’s especially interesting to view the issue of diversity through the lens of data science and data scientists.

I interviewed Radhika Krishnan, Chief Product Officer, Hitachi Vantara, to find out what was at the heart of the issue of diversity in 2022.

Krishnan told me that we needed to ensure we were taking steps to understand the problem, if we wanted to solve it:

What doesn’t get measured, doesn’t get progressed, so measurement and capturing data is a great place to start. Fortunately, there is more focus on tracking and benchmarking as many organizations are now collecting data for different categories like the number of women serving on boards, diversity numbers in an organization and women holding leadership positions across a company. Transparency and visibility into these numbers is key to creating change.

But there’s more than one problem when it comes to issues of diversity in data science. There’s the science problem, and the scientist problem. As Krishnan explained, bias can creep into data or systems through either of these avenues:

There are two areas that introduce bias. One is the algorithms themselves: who’s writing them and the data scientist architecting the framework. The other is the data that gets used to train them. One needs to figure out a way to ensure both spaces are covered and that there is a system for identifying bias in both areas.

Measuring how much diversity you have in the data sciences goes a long way, but we need to be careful about what we say constitutes the “data science” realm. We can’t just assume we are talking about data engineers and data architects.

It’s also pertinent to the domain experts who are collecting the data and mining insights. There must be diversity there as well. As straightforward as it sounds, a lot of this comes down to customer centricity.

If you know your customer base that’s key. Your customer base is rarely all one type of people. It doesn’t matter what industry you’re in, you need to have diversity in the organization that’s catering to the customers, because the customer base itself is diverse.

Yet, it takes more than identifying and measuring problems to solve them. Arguably, nobody should be unaware of STEM’s diversity issues in 2022. Unfortunately, developers can’t just push a magic button labeled ‘fix bias’ to solve the problem.

Bias enters most technology systems unintentionally. The developers aren’t necessarily trying to make machines or algorithms that work better for certain demographics than others. That’s why you’ll often hear bias referred to as something that “creeps” into systems.

Luckily, it’s 2022 and we’re starting to see some solutions to these problems creep onto the scene themselves. Chief among these are transparency and explainability.

As Krishnan transparently explained to me:

There’s the notion of explainable AI that’s taking over in a big way. Explainable AI helps end users understand why certain AI decisions were made so that they can properly understand and assess the veracity of the algorithmic recommendations.

Such transparency aids in complying with regulatory requirements, but also in providing context to leadership on everything from decision-making to operational results. It also helps root out bias in the algorithms.

I would say this is where we go back to thinking about the customer. Making sure your data is representative of the customer base you’re going over is going to be important.
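To make that last point concrete, here is a minimal sketch of what checking a dataset against a customer base might look like. It is an illustration rather than anything Krishnan or Hitachi Vantara described: the function name, the group labels, the sample numbers, and the 10-point threshold are all hypothetical.

```python
from collections import Counter


def representation_gaps(training_groups, customer_shares):
    """Return (share in training data - share in customer base) per group.

    training_groups: one group label per training record (hypothetical input).
    customer_shares: mapping of group label to its share of the customer base.
    """
    counts = Counter(training_groups)
    total = sum(counts.values())
    gaps = {}
    for group, expected in customer_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        gaps[group] = observed - expected
    return gaps


# Made-up numbers, purely for illustration.
training_groups = ["A"] * 70 + ["B"] * 25 + ["C"] * 5
customer_shares = {"A": 0.5, "B": 0.3, "C": 0.2}

gaps = representation_gaps(training_groups, customer_shares)

# Flag groups that fall more than 10 percentage points short (arbitrary cutoff).
underrepresented = sorted(g for g, gap in gaps.items() if gap < -0.1)
print(underrepresented)
```

A negative gap means a group makes up a smaller share of the training data than of the customer base. A check like this only surfaces the imbalance; deciding what to do about it is still a human call.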

At the end of the day, however, solving the problem of diversity in STEM is no longer just an issue of supporting a diverse workforce. It’s also a matter of supporting clients, partners, customers, and users in the global technology space.

But it still takes effort to get there. As the above-mentioned studies all conclude, STEM still skews toward straight, white, cisgender men, despite global efforts to balance the field.

In other words, there’s still work to do. According to Krishnan, that means we need to take the burden off those seeking better treatment by continuing to implement changes at the organizational level through policy:

We need to make sure we continue active recruiting of women and minorities and move beyond grassroots efforts to structured programs. Whether it’s government or organizations, making moves at a policy level will help in these efforts.

One other important aspect is mentorship programs for those in underserved demographics. The importance of having a role model can’t be overstated.

