Responsibilities of a Chief Data Officer

Patrick Wagstrom is the Chief Data Officer at Brightcove.

We also talk about:

his role and responsibilities as a chief data officer,
the difference between building systems that support machine learning and systems that don’t,
distributed software engineering,
data governance and GDPR,
and how to make sure your AI model is unbiased.

Episode Resources:
Learn more about Patrick here
and here
More about Effective Challenge
More about Model Monitoring
Rob High, IBM fellow

About Patrick Wagstrom
Patrick Wagstrom is the Chief Data Officer at Brightcove. Before that, Patrick was the director of emerging technology at Verizon, where he worked with AI/ML, augmented reality, blockchain, IoT, quantum computing, and even 5G. Before that, he was a senior director of data science at Capital One. Even before that, he was a "research nerd" (his own term) at IBM working on the Watson project.

Book your awesomecodereview.com workshop!


Transcript: "Responsibilities of a Chief Data Officer"

[If you want, you can help make the transcript better and improve the podcast's accessibility via GitHub. I'm happy to lend a hand to help you get started with pull requests and open source work.]

[00:00:00] Michaela: Hello and welcome to the Software Engineering Unlocked podcast. I'm your host, Dr. Michaela, and today I have the pleasure to talk to Patrick Wagstrom. But before I start, let me tell you about my awesome code review workshops. Code reviews are a wonderful engineering practice that helps improve the quality of the code base and enhances learning and mentoring. But code reviews come with their own set of challenges. Many teams experience severe problems like reduced productivity, low feedback quality, conflict between team members, and reviews that nobody has time for or understands. I have been helping teams get over those problems and improve their code review practices, based on the latest research and my experience working with teams all over the world on code review techniques. If this sounds interesting, hop on to awesome-code-reviews.com to learn more or to book a workshop. But now, back to Patrick.

Patrick is the chief data officer at Brightcove. Before that, Patrick was the director of emerging technology at Verizon, where he worked with AI, ML, augmented reality, blockchain, IoT, quantum computing, and even 5G. Wow! Before that, he was a senior director of data science at Capital One. Yeah. Even before that, he was a "research nerd" at IBM working on the Watson project. So what should I say? I'm so, so, so super excited to dig into all his experiences and his wisdom. So Patrick, welcome to the show.

[00:00:41] Patrick: Thank you. I'm very happy to be here.

[00:00:43] Michaela: Yeah, me too. I'm so, so happy. So what does a chief data officer actually do?

[00:00:49] Patrick: Yeah. So at a high level, you know, it's an executive role, and if you look at it, it means that I'm responsible for all the data aspects. But there are so many different aspects of data inside of today's companies, especially with the way that data fuels basically everything.

And so I divide what I do into generally three different buckets, right? The first bucket is a lot of things around governance. It's both data and machine learning governance. So this is complying with the various data privacy laws around the world and, you know, making sure that models are well-governed, that they're maintained well. All of that stuff kind of falls into the first bucket.

The second bucket is being responsible for data platforms: working with our teams to have data standards about how data moves from one system to another, ensuring that we have a reliable data warehouse that people can query, data catalogs, those sorts of things, to make it easier for folks to actually use the data.
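As a rough illustration of what a data standard can look like in code, here is a minimal Python sketch, assuming a standard is simply a mapping from required field names to expected types and that each record is checked before it moves into the warehouse. The field names (`customer_id`, `country`, `signup_ts`) and the function `validate_record` are hypothetical and are not Brightcove's actual standards.

```python
from datetime import datetime

# Hypothetical data standard: required field name -> expected Python type.
CUSTOMER_STANDARD = {
    "customer_id": str,
    "country": str,
    "signup_ts": datetime,
}

def validate_record(record: dict, standard: dict) -> list[str]:
    """Return human-readable violations; an empty list means the record conforms."""
    problems = []
    for field_name, expected_type in standard.items():
        if field_name not in record:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(record[field_name], expected_type):
            actual = type(record[field_name]).__name__
            problems.append(f"{field_name}: expected {expected_type.__name__}, got {actual}")
    return problems

good = {"customer_id": "c-1", "country": "CA", "signup_ts": datetime(2021, 5, 1)}
bad = {"customer_id": 42, "country": "CA"}
print(validate_record(good, CUSTOMER_STANDARD))  # []
print(validate_record(bad, CUSTOMER_STANDARD))
# ['customer_id: expected str, got int', 'missing field: signup_ts']
```

Returning a list of violations rather than raising on the first one makes it easier to report every problem with a batch at once before deciding whether to load it.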

And then the third bucket: at least here at Brightcove, and at a lot of other companies, the role of the chief data officer is expanding to include analytics, data science, and machine learning. So sometimes you'll see the chief data officer called the CDAO, which is a chief data and analytics officer. So I have all those sets of responsibilities across the entire company. And that includes, you know, building the teams to work on that stuff, working with the existing teams, and everything like that. But it's a whole lot of fun. And you don't really have routine days at this level.

[00:02:15] Michaela: And so when you talk about compliance, the first thing that comes to my mind as a European is GDPR compliance. Is that something that matters for US-based folks? For an international company, it probably is. But sometimes I see US companies just ignore it altogether.

[00:02:35] Patrick: [chuckles] Yeah. As a company that operates internationally, we can't ignore it, and we wouldn't want to ignore it. And, you know, our customers are asking for this.

So when it comes to filling out RoPAs, which are, you know, records of processing activities for data and everything like that, that stuff rolls up to me and my team, and I work with our legal team in order to get that sort of work done. So that is something that absolutely matters to us. Especially because in the United States, you now have individual states that are enacting similar laws too.
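For readers unfamiliar with RoPAs: GDPR Article 30(1) lists what a controller's record of processing activities must contain, including the purposes of processing, the categories of data subjects and of personal data, the categories of recipients, and any transfers to third countries. The sketch below is a minimal, hypothetical Python representation of one such entry with a completeness check. The class and field names are illustrative, and, as Patrick notes, the real work happens together with the legal team.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessingRecord:
    """One hypothetical RoPA entry, loosely modeled on the items in GDPR Art. 30(1)."""
    controller_contact: str
    purposes: list[str]
    data_subject_categories: list[str]
    personal_data_categories: list[str]
    recipient_categories: list[str]
    # Optional items: transfers "where applicable", the last two "where possible".
    third_country_transfers: list[str] = field(default_factory=list)
    erasure_time_limit: str = ""
    security_measures: str = ""

    # Items this sketch treats as mandatory (a plain class attribute, not a field).
    REQUIRED = ("controller_contact", "purposes", "data_subject_categories",
                "personal_data_categories", "recipient_categories")

    def missing_items(self) -> list[str]:
        """Names of mandatory items that are still empty."""
        return [name for name in self.REQUIRED if not getattr(self, name)]

record = ProcessingRecord(
    controller_contact="privacy@example.com",
    purposes=["video playback analytics"],
    data_subject_categories=["viewers"],
    personal_data_categories=["IP address", "viewing history"],
    recipient_categories=[],
)
print(record.missing_items())  # ['recipient_categories']
```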

So all of this means we have to be totally on the ball, because we operate in Europe and Singapore and China and Japan and Australia, New Zealand, the United States, and South America. You know, we operate in hundreds of countries around the world.

It means that we have to comply with all of their data laws and all their data sovereignty laws. So it's something that we cannot ignore and we have to have baked into nearly everything.

[00:03:29] Michaela: Oh yeah, that sounds really, really complex. I mean, one of the most annoying things about all of that, I think, are those pop-up messages where you have to accept the cookies, which you do anyway, right? You accept all these cookies, but now you have these really annoying pop-ups everywhere. What do you think about that?

[00:03:50] Patrick: You know, those are the cookie banners. I've seen that most people just kind of click through them. I kind of appreciate them, because they make it explicit where sites are actually tracking me.

And I, you know, I spend the time opting out, but it gets annoying, because on some of these sites I've opted out of their tracking three or four different times. But I think that it's well-intended. I think it's good for people to know how their data is being used.

And I think that it's, you know, not the worst thing, because on the other side of the spectrum, you had sites in the United States that just blocked Europe when GDPR went into effect. You had a lot of major American news publishers where, if you were coming from a European IP address, initially it just said, sorry, this content isn't available in Europe, and that was their solution for GDPR. And in some sense, if you're looking at a newspaper from, you know, an American city, it makes a little bit of sense that they'd be focused on that and not on European audiences. But certainly, you know, the cookie banners are around, and they're most likely going to be here to stay.

We've tried things in the past, going back 20-plus years, like P3P, the Platform for Privacy Preferences, and we've tried Do Not Track and everything like that. And, you know, it looks like the cookie banners are about the best we've got. It's not what we need, but it's what we deserve, I guess. [chuckles]

[00:05:11] Michaela: [laughs] If they offer you the choice to, you know, just have the essentials or completely opt out, then I think it makes sense. But many sites just have, like, "I accept," right? There's one button, "I accept," and I always click it. Or they have "accept" or "not accept," and if I don't accept it, what does that actually mean? But, yeah.

[00:05:30] Patrick: Yeah. Like, is the site still going to work for me if I don't accept? Am I really missing something, right? Or are they going to charge me extra for it? And that's just something that, you know, we don't know. So it is a reasonable solution.

It's one that, obviously, we try to make easy for our customers to comply with. But is it the best solution? Is it the thing that I think everybody thought we were getting when these laws were drafted? I really hope not.

[00:05:56] Michaela: So you had these three pillars, and two of them were very business-oriented, at least in my mind.

And then one of them was leveraging AI and machine learning, which seems much closer to your experience, at least from how I know you from before. Is your interest the same in all three pillars? Are the other two also very technical for you? How do you see them?

[00:06:22] Patrick: Yeah. So it's a little interesting. I have a PhD from Carnegie Mellon, but I have it in this weird program called Engineering and Public Policy, and I also have some computer science background too. And so I've always had a little bit of interest in the public policy side of things.

So the fact that I'm working on, you know, what it actually means to comply with all these public policies: that was something that really interested me when Brightcove was interviewing me. So I have a lot of interest in that. And then there's the aspect around data platforms and data science.

Well, I view those two as going hand in hand. You can't create great models or great analytics if you don't know where the data are, if you don't know who owns the data, if you don't have an idea of how the data are validated, and everything like that. So, you know, when I first started talking to people about this job, I had to pinch myself. I was being really cautious, even talking with my wife about it, because it was like, this is everything that I've wanted to do. It's got a little bit of policy, it's got some technology and engineering, and it's got machine learning, all rolled into one, you know?

And it was very much like, I don't want to curse this by talking about it too much.

[00:07:32] Michaela: I can imagine. Yeah, it sounds really cool. I didn't know your background, you know, with policies and all of that, but now that you talk about it, it looks like the perfect match. So you talked a little bit about data platforms and data science, and that they go hand in hand.

So one thing that I wanted to pick your brain on, which I'm super fascinated about, is software engineering practices, right? Be it testing or, you know, how product teams work in their agile way. I also worked as a data scientist for quite some time. I always had, I don't know, a hybrid role: being a data scientist, being a researcher, being a software engineer. I always blended them a little bit, because I was very often on the experimental side.

I often felt like I was falling a little bit outside of those normal software engineering practices. I don't know if this is your experience as well, but we didn't have sprints. Or sometimes we had sprints, but most of the time we didn't, and when we tried to use them, it wasn't really working, because the work was so experimental that you can't say, oh, you know, this week I'm doing that ticket or something.

You didn't have tickets; you had this big vision and had to make it happen, right? And you had half a year or something. So, even for code reviews: I sometimes have data scientists in my workshops. So how do data scientists do code reviews? Very often people are working alone.

What's your experience here? What do you think? Where do data science and software engineering practices meet? Can they meet, or, you know, work together?

[00:10:12] Patrick: Yeah. So I absolutely think that models need to have some sort of code review that's going on with them. But it has to be done a little bit differently because you're looking at slightly different things.

And so in the United States, there are a variety of regulated industries. I've worked in the financial sector, and that's probably the most heavily regulated one that I've worked in, in this respect. After the last financial crisis, so after 2008, the government of the United States put together a guidance document on how to manage model risk, and they defined something that's called effective challenge.

And so their whole thing when they were setting up this framework is that they don't want banks and other financial institutions to create models that go completely rogue and can cost a hundred billion dollars, right? If you have a lending model that goes completely bad, then, with how interconnected the banks are, it could literally collapse the world financial system again.

So what they described is something called effective challenge, and really what you want to do is look at the risk of what happens if that model goes wrong. So first you have to figure out, well, what is risk? How do you quantify that?

At most places that I've been, you just find some way to boil it all down to dollars. Everything gets boiled down to dollars eventually. And so you boil it down to the dollars you have at risk if something goes wrong, and then based on that, you determine how much review you have to do. So it could just be that you have somebody who reads through your code, and, you know, that's like a standard code review, right?

They read through the code, and they're like, oh, did you know that you really should be using this library a little differently? And in machine learning, it's not just about structuring the code.

There are all these hyperparameters that you have on your models and everything. So that's the easiest level: let's just do a code review, for almost every project. But as you go higher and higher, you may eventually get to the point where you have to completely document the model in these big, long documents that are called white papers.

And a white paper tells basically everything that goes on in the model in English, or whatever language you speak at the company. And at the highest level, you'll have somebody take that and then try to reimplement the model from scratch to make sure you really understand what's going on.

And that's an extreme level. That's the level that you also see in some software engineering: with very mission-critical systems, you will see an independent team essentially implementing it from the same specification. So you'll have that for the models that really, really need it.
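The escalation Patrick describes (boil the risk down to dollars, then scale the review effort to match) can be sketched as a simple lookup. This is a minimal Python sketch under stated assumptions: the tier names and the dollar thresholds below are invented for illustration and do not come from any regulatory guidance or from any particular bank's policy.

```python
from enum import IntEnum

class ReviewTier(IntEnum):
    CODE_REVIEW = 1          # someone reads the code and the hyperparameters
    WHITE_PAPER = 2          # plus a full written description of the model
    INDEPENDENT_REBUILD = 3  # plus an independent reimplementation from the white paper

# Hypothetical thresholds in dollars at risk, checked from highest to lowest.
TIER_THRESHOLDS = [
    (10_000_000, ReviewTier.INDEPENDENT_REBUILD),
    (1_000_000, ReviewTier.WHITE_PAPER),
]

def required_review(dollars_at_risk: float) -> ReviewTier:
    """Pick the most stringent tier whose threshold the exposure meets."""
    for threshold, tier in TIER_THRESHOLDS:
        if dollars_at_risk >= threshold:
            return tier
    return ReviewTier.CODE_REVIEW  # the baseline for almost every project

print(required_review(50_000).name)      # CODE_REVIEW
print(required_review(2_500_000).name)   # WHITE_PAPER
print(required_review(40_000_000).name)  # INDEPENDENT_REBUILD
```

Using an ordered enum keeps the tiers comparable, so a policy like "anything at WHITE_PAPER or above needs sign-off" is a single `>=` check.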

The other thing that's a little more interesting about models is that you not only have the code review, but you have this additional continual monitoring that goes on with a model. So with a software project, you know, you'll hook it up to Datadog, and you'll look at CPU usage and everything like that.

And you'll want to make sure that those are aligned and that there are no weird spikes, and you'll send the logs somewhere. With a machine learning model, you do the same thing. You hook it up to Datadog, you look at all that stuff, because it's still a software artifact. But what you also want to start doing is tracking the input data and the output data and making sure that they're part of the same distribution that you thought they were going to be a part of. So if suddenly we have a lot more people from Canada that are visiting and being funneled through this model, and you thought it was going to be 5% and it's 50%, then your model is going to behave a little differently.

So you set up checks on those so that you can monitor the model in production. This is a process called model monitoring, and it's something that's really, really critical for basically any company that has models in production. It's like, double-check and make sure that your assumptions were correct along the way.
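The Canada example above can be turned into a simple input-distribution check. This is a minimal Python sketch, assuming the expected share of each category (here, visitor country) was recorded when the model was built, and that an absolute difference of more than 10 percentage points is worth an alert. Both the threshold and the function name `share_drift` are illustrative. Production monitoring usually relies on richer drift statistics, such as the population stability index or formal statistical tests, but the idea is the same.

```python
from collections import Counter

def share_drift(expected: dict[str, float], observed: list[str],
                tolerance: float = 0.10) -> dict[str, tuple[float, float]]:
    """Return {category: (expected_share, observed_share)} for each category whose
    observed share differs from its expected share by more than `tolerance`."""
    counts = Counter(observed)
    total = len(observed)
    alerts = {}
    for category in set(expected) | set(counts):
        exp_share = expected.get(category, 0.0)
        obs_share = counts[category] / total if total else 0.0
        if abs(obs_share - exp_share) > tolerance:
            alerts[category] = (exp_share, obs_share)
    return alerts

# Hypothetical baseline recorded at training time vs. one day of live traffic.
expected_countries = {"US": 0.80, "CA": 0.05, "UK": 0.15}
todays_traffic = ["CA"] * 50 + ["US"] * 40 + ["UK"] * 10
alerts = share_drift(expected_countries, todays_traffic)
print(sorted(alerts))  # ['CA', 'US']
print(alerts["CA"])    # (0.05, 0.5)
```

The same check can be pointed at the model's outputs, for example the share of each predicted class, to cover the other half of what Patrick describes.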

[00:13:52] Michaela: So is that also part of how we build systems that support machine learning a little bit differently than systems that, you know, support normal software? It sounds like it.

[00:14:06] Patrick: Yeah. So when I started building machine learning models, usually the model was shipped as part of the software.

So we'd have a big release, and it was like, hey, this has an upgraded model as part of it. And now what's happening a lot more often is that you have a model execution platform that you use to build and train your models, and they're served up almost like APIs that your software projects call.

And so you'll have software like SageMaker Studio from Amazon, or Google's new Vertex, or Domino Data Lab, you know, H2O, DataRobot, et cetera, that allow you to build up the model and then, you know, deploy it somewhere behind an API gateway. That way, you're just calling it as an API, and then the platform tracks the input data and output data and does the model monitoring.

I think one of the biggest mistakes that I made in my career was when I went from IBM to Capital One. At IBM, we were still kind of early in the machine learning stuff. If you built a machine learning model, you had to find a way to wrap it up and put it in Docker yourself and deploy it.

And so when I got to Capital One, I'm like, well, of course we can expect the data scientists to be comfortable using Docker and to wrap everything up in a Docker container. And I found out that absolutely was not the case. So that really caused me to shift my thinking to: how do we make it easy for them to deploy a new version of a model, with a different sort of promotion process than what you have in software?

In essence, having the model behind an API gateway decouples the machine learning model from the software itself, and that's something that I've seen be really, really successful. You know, you can't have the teams completely decoupled, where the software team and the data science team never talk, but you can have this API gateway as the intermediate element that's handling all of that for you.
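To make the decoupling concrete, here is a minimal Python sketch of the idea, assuming an in-process class stands in for the API gateway. In a real deployment this would be an HTTP endpoint in front of a platform like the ones Patrick lists. The class `ModelGateway`, its methods, and the `minutes_watched` feature are all hypothetical. The point is that application code only ever calls `predict`, while the data science team registers and promotes model versions on its own schedule.

```python
from typing import Callable, Optional

# A "model" here is any function from a feature dict to a score.
Model = Callable[[dict], float]

class ModelGateway:
    """Hypothetical in-process stand-in for an API gateway in front of a model platform.
    Application code only calls predict(); the data science team registers and
    promotes model versions without the application being redeployed."""

    def __init__(self) -> None:
        self._versions: dict[str, Model] = {}
        self._live: Optional[str] = None

    def register(self, version: str, model: Model) -> None:
        self._versions[version] = model

    def promote(self, version: str) -> None:
        """Make a registered version the one that serves traffic."""
        if version not in self._versions:
            raise KeyError(f"unknown model version: {version}")
        self._live = version

    def predict(self, features: dict) -> dict:
        if self._live is None:
            raise RuntimeError("no model version has been promoted yet")
        score = self._versions[self._live](features)
        # Returning the version lets monitoring attribute each prediction to a model.
        return {"version": self._live, "score": score}

gateway = ModelGateway()
gateway.register("v1", lambda f: 0.5)  # placeholder baseline model
gateway.register("v2", lambda f: min(1.0, f["minutes_watched"] / 60))
gateway.promote("v1")
print(gateway.predict({"minutes_watched": 45}))  # {'version': 'v1', 'score': 0.5}
gateway.promote("v2")  # promotion only; the calling code is unchanged
print(gateway.predict({"minutes_watched": 45}))  # {'version': 'v2', 'score': 0.75}
```

Because promotion is a separate step from registration, a team can stage a new version, check it, and roll back by promoting the old version again, without touching the software that calls it.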

[00:15:56] Michaela: And now, with COVID, we are more and more distributed. We are remote. What does this even mean for data science projects and software engineering projects? I think before, we could meet each other much more easily, and now it's getting harder.

Do you see that also in your experience?

[00:16:17] Patrick: [chuckles] Yeah. Especially joining a company during COVID. I made a big mistake in coming here; well, maybe it wasn't a mistake, it was just a lack of realization. At Verizon, I probably onboarded 50 or 60 people to my organization while COVID was raging.

And, you know, I thought that I had it down pat. I had a great senior manager who documented everything, and we had this giant flowchart to get all these software engineers onboarded. And it got them up and running, but it completely missed the whole knowledge transfer side of things.

It completely missed the whole "how do you know what's what" sort of thing. But I don't think that we're going back, right? We're not going back to a situation where everybody is going to be in the same office five days a week, or at least it doesn't seem like that's happening right away.

Let's put it like that. And so what that means right now is that documentation is still absolutely critical. Documentation is probably even more critical than it has ever been before. And this is a mindset shift, you know. Before, it was like, oh, I've got the code. Well, now you need something like an architectural description of what's going on, because unless you have that written down and clear, you're going to have two people meeting, and you're not going to be able to go over to the other table and grab your chief architect and say, hey, is this correct?

Right? Because he's probably in another scheduled meeting. So writing that stuff down is a huge step forward when basically all the interactions are scheduled. It's probably the number one practice that teams can adopt. And then also putting stuff into systems that automatically grab and identify who's responsible for things, because the responsibility for these components of an enterprise hasn't gone away.

You still have somebody who's responsible if your customer database falls over, right? They're still responsible for it. And before, you used to be able to go and say, hey, who's responsible for the customer database? And, you know, somebody would be like, oh, go talk to him. Now, having that documented somewhere, in something like a data catalog, is hugely, hugely helpful.
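At its core, the "who owns the customer database?" question is a lookup from dataset to owner. Here is a minimal Python sketch of that idea, assuming a toy in-memory catalog. The dataset and team names are hypothetical, and real data catalogs also track schemas, lineage, and access policies.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CatalogEntry:
    dataset: str
    owner: str        # the person or team accountable if the dataset "falls over"
    description: str

class DataCatalog:
    """Toy in-memory catalog keyed by dataset name (hypothetical)."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.dataset] = entry

    def who_owns(self, dataset: str) -> Optional[str]:
        """Return the owner, or None if nobody has registered the dataset yet."""
        entry = self._entries.get(dataset)
        return entry.owner if entry is not None else None

catalog = DataCatalog()
catalog.register(CatalogEntry("customer_db", "customer-platform-team",
                              "Primary customer records"))
print(catalog.who_owns("customer_db"))     # customer-platform-team
print(catalog.who_owns("billing_events"))  # None
```

Returning `None` for unregistered datasets makes the gaps visible: anything without an owner is exactly the kind of question that otherwise ends up in a channel like the one Patrick describes next.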

Yeah. Although, one thing that I really, really love about Brightcove is the way that we have taken that process and actually introduced it in Slack. We have a channel called "engineering, who should I ask?", and literally all that it is, is: I am looking for somebody in the company who can talk to me about X.

And you go there and you ask, and the person you need to talk to is right there in the channel. It's incredible. It's such a great thing. If your company doesn't have one, I highly recommend it, because as a newbie it's been really helpful, and it's even helpful to some of the old-timers, when I see folks pop in and say, yep, that was the exact question that I had.

[00:19:00] Michaela: Yeah, that's great. I mean, there are so many things that I want to touch base on. You're talking about asynchronous and synchronous communication, right? And I think, especially now with the shift to remote work, with a lot of companies that haven't done it before, is there also a shift to asynchronous communication?

There are some of those really flagship remote companies, like, say, WordPress, for example, or Todoist, and they were all about asynchronous communication. But I see that a lot of people during the pandemic actually bet more on synchronous conversation, so they have meetings all day long. What is your stance on that? What do you think about synchronous versus asynchronous communication?

[00:19:48] Patrick: I think it was hard enough to take a company that was used to all in-person collaboration and move them to distributed collaboration.

Moving to asynchronous collaboration is another level of challenge, right? It really has to be baked into a company's DNA. And so, for myself, I've got teams, and I've had teams in the past, in multiple time zones. I think at the worst point, at IBM, I had folks on a project who were spread across 12 time zones.

There was literally not a good time for us to ever have a standup call, so we had to find ways to segment it. Synchronicity still matters. We wrote a paper about this in 2014 for ICSE, where we're saying that, you know, latitude hurts, longitude kills: distance matters, but if you're in different time zones, that's what really kills you.

And we empirically looked at it and found out that, yeah, if you're in New York, it's much worse to have a team in London than it is to have a team in Rio de Janeiro. The distances are about the same, but that asynchronicity kills you. So for myself, I still prefer to have teams that are somewhat synchronous, but we've had to be a lot more flexible.
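One way to see why the time-zone gap matters more than distance is to count shared working hours. This is a minimal Python sketch, assuming a 9:00 to 17:00 local working day and fixed standard-time UTC offsets (New York UTC-5, London UTC+0, Rio de Janeiro UTC-3), with daylight saving ignored. It is only a back-of-the-envelope illustration of the overlap argument, not the method used in the paper.

```python
def overlap_hours(offset_a: int, offset_b: int, start: int = 9, end: int = 17) -> int:
    """Shared working hours per day for two sites whose local day runs from
    `start` to `end`, given fixed UTC offsets in whole hours (no daylight saving)."""
    # Express each site's working window in UTC hours.
    a_start, a_end = start - offset_a, end - offset_a
    b_start, b_end = start - offset_b, end - offset_b
    total = 0
    # Also compare against site B's window one day earlier and later, so windows
    # that straddle midnight UTC are still matched.
    for shift in (-24, 0, 24):
        lo = max(a_start, b_start + shift)
        hi = min(a_end, b_end + shift)
        total += max(0, hi - lo)
    return total

NEW_YORK, LONDON, RIO = -5, 0, -3
print(overlap_hours(NEW_YORK, LONDON))  # 3
print(overlap_hours(NEW_YORK, RIO))     # 6
```

Under these assumptions, a New York team shares twice as many working hours with Rio as with London, even though London is not farther away.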

What my teams usually try to do is establish core hours and norms around that. You know, I had one guy on my team at Verizon who had a very young child at home, and he had to kind of manage that with his wife. So we said, okay, we're going to try to do core hours at these times on these days. Core hours are when everybody is working, and that's when you have team stand-ups, team meetings, that sort of stuff.

And then you have a little bit of heads-down working time, and for the rest of it, it's like, okay, try to declare when your working hours are going to be if you can, realizing that's not possible for everybody. So I don't think we're moving strictly to an asynchronous world. I think that's a dramatic DNA change for companies, where they have to weave tools like GitLab into the core of everything that they do. But I think that we are moving to a world where synchronous doesn't mean nine to five.

[00:21:49] Michaela: Yeah. So I think that there are a couple of behavioral shifts that are introduced if you think about the asynchronous way of working, which means that you're documenting a lot, which you also described, right? So when I'm stuck, it's not only that I can go to people; I can maybe also go to our documentation or to our systems.

And we have some processes around it, where we all know this information is over there, right? So it's a little bit more organized. But I also think that these synchronous conversations are really, really essential, maybe just for the social part: you see each other, you can talk to each other, and you don't feel alone.

So I'm also a little bit split on it. I would love to see a world where we leverage a lot of these asynchronous behaviors, where we try to distill our knowledge and put it somewhere findable, right? You know where to go. But then we still have the social connections and, you know, the nice times together as a team, as a bonding mechanism, more or less, right?

Yeah. So one of the things, especially connected to remote working and the pandemic and everything, and because you are now in an executive role, I would say even a strategic executive role, right?

So do you think about the productivity of your folks? I think a lot of people are very, very worried about productivity and how it plunged during the pandemic. So what was your strategy? Did you think about how to track it, how to measure it, how to improve it? Was that on your mind?

[00:23:22] Patrick: I think that this was a little bit of a different situation, because we started off with lower expectations for what productivity was going to be, at least because we realized that a lot of people we were sending home had kids at home, and, you know, it's not a normal situation. So it was never the case that we expected normal, you know, whatever that even means anymore, normal productivity.

I have never been one to try to measure the productivity of my team. Once in a while, you hear about software engineering teams that get a pointy-haired boss who wants to measure the number of lines of code that they wrote.

And I'm like, what if I refactor something and it's negative a thousand lines of code? Do I get anything for that? So I've never been one to try to do that. It's more along the lines of, you know, let's meet as part of our sprint cycle and set up the commitments that you have.

And, you know, if you're not working in sprints, if you're working on your own, make sure that there's something that you can demo periodically, and use that as the metric. And I found that that works well. I found that it shows that I have a level of trust in my employees, because I hope that they would also trust me, you know?

So if I want them to trust me in my decisions, I need to trust that, you know, when we let them work, they can be as productive as they need to be. And in this case, you know, there were a few people, and I'm kind of one of these people myself, where I think overall my productivity went up.

That's probably a combination of working longer hours, you know, and a combination of other factors, like the fact that I felt a little bit more free to have a more flexible schedule and everything like that. So if I needed a dentist appointment, it was easy to go see the dentist, because, well, you know, any time is work time and any time isn't, now. But I really did try to have my teams have a balance, and I just trust them. And when I say a balance, I kind of get on people if I know they're working more than 40 hours a week and everything, because that's not going to be productive long-term either.

[00:25:26] Michaela: Yeah. I recently did a study where I was interviewing software engineers, and a lot of people talked about how they're working long hours now to compensate for different things. And a little bit, as you said, it's blurry, right? It's not nine to five anymore. Any time is work time.

Right. It's mixed in with your personal life, and so it gets a little bit blurry. I think people will learn to get the boundaries straight, to figure out what they need for their health and for their life, and maybe it's a bit of an adjustment period that we're still in. And as you said, we can lower our expectations, and I think it's really meaningful to do that.

[00:26:06] Patrick: This is where it's helpful that a lot of companies have given small stipends to their employees to buy equipment they need to be more productive at home, whether that's a decent desk or something like that. And I think that's just been hugely helpful, because my wife is a professor at a university, and so she was essentially working from home, but she didn't have a big monitor or anything like she normally had in her office. So for the first two months of the pandemic, she was working on her 13-inch laptop screen.

And then finally I'm like, all right, I'm just going to order you a nice new monitor, because it's not worth it to see you hunched over like that. And after the first day, she's like, wow, this is so much better. And I'm like, yeah. So she probably had lower productivity, because at the university where she worked, they didn't give those stipends to faculty members to be more productive at home.

So I think that's a small step that has gone a long way for a number of companies. Yeah.

[00:27:00] Michaela: Yeah. And I think it's really short-sighted because it's a one time cost and it's really low, and that has tremendous compound effects actually. Right. For your employees. So I don't know who did that calculation, but it's the same, I think for, for some days off when, during the pandemic, you know, I think, I don't know, this winter, a lot of people just needed some days off and I think it would have been okay for a company to say, you know, Whatever days, you need to take some time, some rest, and then you come back refreshed because even if they are not, if they need it and you don't take it, then they take it anyway.

Right. They take it at their desk at home, where you can't see whether they're working. So yeah, yeah.

[00:26:45] Patrick: Yeah. Well, one of the things that really impressed me about Brightcove when I was interviewing here is that since, I think, April, or maybe it was May, the company has had half-day Fridays, every single Friday.

So this isn't just a summer or winter thing. Obviously this can't be true across the entire company, because we have 24x7 operations around the world. But what it means is that even at the highest levels, at the C level (and I'm a C-level executive), and even at the CEO level, you look on their calendars and Friday afternoons are blocked off. Now, does that mean they're not working? Maybe, maybe not. But it sends a very powerful signal to everybody else.

Hey, this is not the time to be working. Go and take advantage of it, go for a hike. And there's been a bit of a culture of people sharing what they've been able to do to get outside and reconnect with other people. It's been absolutely terrific.

[00:27:45] Michaela: Yeah, I love that. Today I was listening to another podcast where Nathan Barry, the CEO of ConvertKit, was talking about how, when he scaled the company, he became insignificant, as he put it.

He said that he became insignificant because if he spent two more hours grinding on something, or even 15 more hours that week, it didn't really move the needle for a large company, because there are hundreds of people working there. It was more powerful for him to go for a walk and then be more strategic.

And I thought, well, that's true at that large scale. But I think it's also true for a data scientist or a software engineer, right? Because with those two hours you spend re-energizing yourself, you will be so much more productive and so much more focused, and that really pays off. So I'm a big believer in taking time off if you need it.

And coming back with more energy to double or triple the work you would have done otherwise, right? So what are two hours? Enjoy them. Yeah.

[00:29:53] Patrick: Yeah, the number of great ideas I've had just by going for a run or going for a walk. And, you know, I really hope that, you know, once we're back in the office, you know, we can take advantage of that.

I had a culture at Capital One where, you know, I drank so much coffee, 'cause a lot of my one-on-ones would be like, hey, let's go for a walk and grab coffee while we're doing that. Just moving away from your usual space can put you in a different mindset, and that's hugely valuable.

[00:30:17] Michaela: Yeah. Yeah, that's true. So Patrick, before we stop, I want to pick your brain a little bit about something else, going back to your expertise, your academic career, and your nerdiness about machine learning software and all of that. What does it take to build machine learning software?

What has to be different? And also, how can we make unbiased machine learning models, which very often has to do with the data? How can we ensure that we are building unbiased software?

[00:30:51] Patrick: Yeah. So when you say machine learning software, are you talking about the models at the core, or are you talking about the software that goes around them?

[00:30:58] Michaela: Yeah, that's a good question. Right now I want to focus on the models and the data, right? So really going into the center. Let's say we have an insurance company. I don't know if this is a good example for you, since you worked on other things, but let's take an insurance company.

We want to make sure that whatever data we feed it, it's not making decisions that we wouldn't support in normal life, which I think a lot of insurance companies unfortunately make.

[00:31:27] Patrick: [laughs] Yeah, well, I don't know. They might make those in normal life too. That's a question you'll have to ask them about their optimization function.

So one of the things that I think is the most critical is that you can't treat every machine learning problem as though it's a Kaggle problem. Kaggle has done a great thing by introducing machine learning algorithms, and how to structure those algorithms, to a good chunk of the world.

But that's only a tiny, tiny sliver of what a successful machine learning engineer or data scientist actually does. That's the actual modeling part of it, but there's the whole phase beforehand of really understanding the data, translating the business problem into the data you have, and making sure that the data really supports it.

And then you build the model on top of that. This is kind of a hard skill to teach: knowing what to look at in the data to make sure that you're making the right decisions. So the best thing I can say is to always find the person who really understands the data.

It's probably not the data scientist. It's probably the person who owns the source. And then find other people around it. I'll use an example where a model my team built turned out to have a flaw in it, and we almost released it. This model was trying to predict why somebody was calling a call center, because if we knew why they were calling, we could redirect them one way or another.

And we could get an idea of whether they were going to be happy or sad, right? That suggests we'd put them in different queues and everything like that. So we had a feature in this model that looked at how many times the customer had called in over the last week, because intuitively you'd think that if you have to keep calling back your wireless carrier or your bank because you have a problem, you're probably going to be really, really frustrated.

Right. And that suggests we need to do something different. The feature had a pretty high lift, which was great, right? Lift means it makes the model work better. It wasn't until we talked to people who were not only experts in the data, but also experts on protected classes, that someone raised the question:

Do you know how much more often somebody who's deaf or hard of hearing has to call our call center? They use services where either another person on the line is reading it to them, or speech synthesis services, and they usually end up having to call a call center three to four times more often.

So without actually finding that additional expert beyond the core data expert, someone who understood the data more deeply, we would have released a model that said every single deaf or hard-of-hearing person who called the company we built it for was angry.

And it's like, well, that's not fair. It's not their choice. So really find those experts and get other sets of eyes on your assumptions, even before you build the model. Then there's the next step after that, which is finding somebody who really understands the algorithms you're using, to have them poke apart those algorithms, and even someone who really understands the implementation of the algorithms to poke around that too. But I think the first one is that data mastery step. A lot of data science is, oh, it's data munging. It's not a hyperparameter optimization disco dance party and everything like that.

But without that, it's just way too easy to make big mistakes.
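The call-center story involves two checks on a single feature: how strongly it predicts the label (its lift) and whether it mostly tracks a protected characteristic. Below is a minimal Python sketch of both on made-up records. The field names `calls_last_week`, `angry`, and `hard_of_hearing`, and every number, are hypothetical. Lift is taken here as P(label | condition) / P(label), which is one common definition.

```python
from statistics import mean

# Hypothetical call-center records; every field and value is made up for illustration.
records = [
    {"calls_last_week": 4, "angry": True, "hard_of_hearing": True},
    {"calls_last_week": 3, "angry": False, "hard_of_hearing": True},
    {"calls_last_week": 5, "angry": False, "hard_of_hearing": True},
    {"calls_last_week": 3, "angry": True, "hard_of_hearing": False},
    {"calls_last_week": 1, "angry": False, "hard_of_hearing": False},
    {"calls_last_week": 0, "angry": False, "hard_of_hearing": False},
    {"calls_last_week": 1, "angry": False, "hard_of_hearing": False},
    {"calls_last_week": 0, "angry": False, "hard_of_hearing": False},
]


def lift(rows, condition, target):
    """P(target | condition) / P(target); above 1 means the condition raises the rate."""
    base_rate = sum(target(r) for r in rows) / len(rows)
    matching = [r for r in rows if condition(r)]
    if not matching or base_rate == 0:
        return float("nan")
    return (sum(target(r) for r in matching) / len(matching)) / base_rate


def group_mean_ratio(rows, feature, group):
    """Mean of `feature` inside `group` divided by its mean outside the group.
    A ratio far from 1 hints that the feature may act as a proxy for the group."""
    inside = [r[feature] for r in rows if r[group]]
    outside = [r[feature] for r in rows if not r[group]]
    return mean(inside) / mean(outside)


def repeat_caller(r):
    return r["calls_last_week"] >= 3


def is_angry(r):
    return r["angry"]


print(lift(records, repeat_caller, is_angry))  # 2.0
print(group_mean_ratio(records, "calls_last_week", "hard_of_hearing"))  # 4.0
```

On this toy data, being a repeat caller doubles the rate of the "angry" label. The same feature is also four times higher, on average, for the hard-of-hearing group. That is a small-scale version of the pattern described above, and it is the kind of signal worth taking to a protected-class expert before release.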

[00:34:52] Michaela: Yeah. So I have one question for you now, because it just popped into my head. You said it's not fair for that person, right? But there's always misclassification in machine learning. So what percentage of misclassification is fair?

Because there will always be people who are misclassified: they'll be put into this angry class even though they're happy. So where do you draw the line? When is it ethical to release a model and say, well, we can live with this misclassification?

[00:35:28] Patrick: Yeah. So there are two different aspects of it. One is: is there a human being interpreting the output of the model before it goes to the end person? Say there is a human being who's saying, all right, this person might be a little irate, and then they change something along the way.

That means you always have that check. You can have models that are a little bit noisier when you have a human in the loop, as we call it: a human being who's part of the process. So they can be a little noisier there. I think the big thing you want to check is not just whether somebody was misclassified, but whether we disproportionately misclassify people due to a characteristic they cannot change.

In the United States, we call these protected classes. A protected class is your race, a disability, your gender, all that stuff. So you want to be sure that you actually check on that. And yes, misclassifications happen. You'll put a cost on those and everything like that.
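Putting a cost on each kind of misclassification and then choosing a threshold can be sketched by sweeping the candidate cut-offs and keeping the cheapest one. Each cut-off is one operating point on the ROC curve. This is a minimal Python sketch. The scores, the labels, and the 1:3 cost ratio are assumptions chosen only for illustration.

```python
# Hypothetical (model score, true label) pairs; label 1 means "angry".
scored = [
    (0.95, 1), (0.85, 1), (0.80, 0), (0.70, 1),
    (0.55, 0), (0.40, 1), (0.30, 0), (0.10, 0),
]


def errors_at(pairs, threshold):
    """Count (false positives, false negatives) when predicting 1 for score >= threshold."""
    false_pos = sum(1 for score, label in pairs if score >= threshold and label == 0)
    false_neg = sum(1 for score, label in pairs if score < threshold and label == 1)
    return false_pos, false_neg


def best_threshold(pairs, cost_fp, cost_fn):
    """Try every observed score as the cut-off (each is one operating point on
    the ROC curve) and return the cut-off with the lowest total cost."""
    def total_cost(threshold):
        false_pos, false_neg = errors_at(pairs, threshold)
        return false_pos * cost_fp + false_neg * cost_fn

    candidates = sorted({score for score, _ in pairs})
    return min(candidates, key=total_cost)


# Assumed costs: missing an angry caller is three times worse than a false alarm.
threshold = best_threshold(scored, cost_fp=1.0, cost_fn=3.0)
print(threshold)  # 0.4
```

If you swap the costs (`cost_fp=3.0, cost_fn=1.0`), the cut-off on this toy data rises to 0.85. The ROC curve exists to make exactly this trade-off visible.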

You know, usually you'll draw an ROC curve to determine where to set the thresholds. But you also want to really dig in after you've set that point: all right, let's see. Are we disproportionately affecting people who live in one area? Are we disproportionately affecting people who are immigrants?

Are we disproportionately affecting people who are hard of hearing? That can be a really difficult process, especially when you're doing it without a human being in the middle who can say, ooh, wait, wait, maybe... Even if you do have one, there's some training you'll need to do. But especially when there's no human being in the middle, you need to be really, really careful about that.
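A first pass at checking whether errors fall disproportionately on one group, once the threshold is set, is to compute error rates per group. The Python sketch below compares false positive rates (calm callers labelled angry) across hypothetical groups. The data and the 2x review trigger are assumptions for illustration. They are not a legal or industry standard.

```python
from collections import defaultdict

# Hypothetical predictions as (group, true_label, predicted_label); 1 means "angry".
predictions = [
    ("hard_of_hearing", 0, 1), ("hard_of_hearing", 0, 1),
    ("hard_of_hearing", 0, 0), ("hard_of_hearing", 1, 1),
    ("other", 0, 0), ("other", 0, 0), ("other", 0, 1),
    ("other", 0, 0), ("other", 1, 1), ("other", 1, 0),
]


def false_positive_rates(rows):
    """Per group: share of truly calm callers (label 0) that the model labelled angry."""
    negatives = defaultdict(int)
    false_pos = defaultdict(int)
    for group, label, predicted in rows:
        if label == 0:
            negatives[group] += 1
            if predicted == 1:
                false_pos[group] += 1
    return {group: false_pos[group] / negatives[group] for group in negatives}


def flag_disparities(rates, max_ratio=2.0):
    """Return groups whose rate is at least `max_ratio` times the lowest group's rate.
    The 2.0 default is an arbitrary assumption for this sketch."""
    lowest = min(rates.values())
    if lowest == 0:
        return sorted(group for group, rate in rates.items() if rate > 0)
    return sorted(group for group, rate in rates.items() if rate / lowest >= max_ratio)


rates = false_positive_rates(predictions)
print(rates)
print(flag_disparities(rates))  # ['hard_of_hearing']
```

A flagged group should prompt the kind of expert review described above. It is not an automatic verdict, and with groups this small the rates are very noisy.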

[00:37:03] Michaela: So maybe the last question for you. All of that sounds super, super fascinating to me. I could probably talk with you for another hour, or two or three, and dig deep and understand it. As I said in the introduction, you're nerding out, right? So the nerdy side of you: are you missing that as a C-level executive?

Are you still doing that? Do you have those conversations? Do you dig deep and think about your models and these curves, and how you can change and improve them? How you can even come up with better algorithms to do it? Do you still have that part of your job, or do you miss it?

[00:37:39] Patrick: So Brightcove, fortunately, is a small enough company that I still have that as part of my job.

It's not the full-time part of my job, but I started to realize this at Capital One. I remember I had a discussion with my boss at Capital One, who was also an accomplished data scientist. We were just sitting around one day and I'm like, man, do you ever just miss opening up a notebook, like a Jupyter notebook, and crunching data?

He's like, yeah, once in a while. And then he's like, but now it feels like I've gotten so far away from it that anything I would do would be so out of date it would be hopeless. And I think that's the bigger thing I try to manage. I don't try to get involved with every project, but I try to stay on one or two things to keep my skills relevant.

That way I can add something, because I find that if I'm building a data platform, one of the things you need is empathy with the data scientists who are actually going to be using the thing. If you don't have that empathy about how they're actually using it, it's going to be a lot more challenging.

So I try to find time to do it. You know, I listen to a lot of the same podcasts that other people do. And when I do get a chance to actually dig in and build a model on something, I really do relish it now. It's not a burden. But do I ever get a chance now to build a model and take it all the way through to production?

Very, very rarely, 'cause usually it's like, this is a POC, and then you get somebody who's doing it full time, and I know they're just swearing as they rewrite my code. But I definitely, definitely miss that part of things. It's a hard balance to walk.

[00:39:18] Michaela: So, do you miss academia?

[00:39:21] Patrick: Yeah, I definitely do. I think that especially with my wife being an academic and, and everything like that, you know, it's something where like, there are times that I see what she does and it's like, oh man, I miss that.

There are times that I see what she does and it's like, I don't miss that at all. What I miss is the ability to pursue problems simply because they are interesting, not necessarily because they have a direct monetary value. There's a difference in the types of research you do. You can still do research in industry.

But when it comes down to basic research, like, would we ever do basic research on new neural network structures or something like that here at Brightcove?

The answer is: unlikely. We're not at a large enough scale to do that. But I miss having the opportunity to dig in and think about that stuff, rather than thinking about, okay, how is this applied? If you're one of those people in grad school who doesn't like the basic stuff but loves the applied stuff, I think it's an easier transition. I liked a lot of the basic stuff, and I think that's the part that's hard for me.

[00:40:26] Michaela: But why did you transition out of academia anyway? Why aren't you a professor now, instead of a C-level executive?

[00:40:34] Patrick: Yeah, I think it was an accident to be honest. I, I'm entirely honest when I say that.

So I went to IBM as a postdoc from Carnegie Mellon, and IBM Research still is a great and wonderful and magical place. It's one of the last few remaining basic research labs in industry that's associated with a company. Google's got one now, and Microsoft's got one, but the breadth of the research that IBM does is still astounding.

So that wasn't a case of me leaving academia. At least I didn't see it as leaving academia, because there were, and there still are, a lot of people who go to IBM and then go on to very successful careers in academia. So I viewed it not as leaving academia. And then, for a long time at IBM, my career was a little bit of a random walk. I didn't plan to be part of the group that launched the Watson Group. I was complaining to the wrong executive about something. I was complaining that we don't have any data about this stuff, so we can't build better models, because we keep all of our data siloed, while Google and Facebook (and this was in 2012) know everything that you do. So of course they can build better models than us.

That led to discussions that took me into Watson, and Watson took me away from core research and more towards the product side of things. It wasn't until I was within Watson that I really started to think about what I want to do with my career.

And so I sat down with Rob High, who was the CTO of the Watson Group at the time. He's an IBM Fellow, an absolutely brilliant guy, and a guy I'm very thankful I got a chance to work with. I was already a team lead at IBM at this point, and he's like, so what are you going to do with your career?

And I'm like, I don't know. I guess I kind of like your job. And he's like, what do you mean by that? I'm like, well, CTO of a 2,000 to 3,000 person company or brand or something like that.

And the position here at Brightcove really is basically what I sketched out, whatever it was, eight years ago, when I was sitting down with Rob High after work one day. It really is what I wanted to be. It's not the CTO position, but CDOs didn't really exist back then. It's a little smaller company, but it has really, really interesting problems, and it's within an order of magnitude.

So, yeah, I'm here now, and it's going to keep on being fun, but I wouldn't say that I ever consciously stepped away from it, which is really...

[00:42:01] Michaela: Yeah. I mean, I think it's really good to make a plan for where you want to be, and then somehow you get there over the years. I don't think it happens immediately; maybe for a few people it does. It's more like a transition into it. For me, I'm still somewhere in the process.

[00:42:21] Patrick: 'Cause you're sort of similar. I mean, you were this academic with this weird art background and everything, then you started doing research at Microsoft and established yourself as a world expert in code reviews.

And now you're kind of off on your own.

[00:42:37] Michaela: And I think it's good. I really grew into that off-on-my-own thing, even though I just did a research project and just submitted the paper, so I'm back in research a little bit, but from a different angle, and I think it was really good. But still, I think I'll never figure out exactly how it's going to be, but it's fun.

I love the journey, and I think that's important. And the funny thing is that for me it's always a little bit blurry, but I wanted more flexibility, and that's why I went off on my own. When I did it, it was so scary that I couldn't even enjoy it. And now that I'm a little bit more settled, I'm like, really?

Oh, I like it. It's probably...

[00:43:14] Patrick: ...the right thing.

[00:43:16] Michaela: So yeah. Yeah. Okay, Patrick, thank you so much for taking the time to talk with me. It was really wonderful to hear all about your experience and your journey with machine learning and all these buzzwords. We didn't get to that; I had a question about all the buzzwords, but maybe I will invite you again.

Then we can talk a little bit more. Is there something you want to tell my listeners, something to give them on their way, if they are still searching?

[00:43:48] Patrick: Boy, I guess the thing is to be relentlessly curious. And when you find something cool, remember the person who wrote that really cool thing: if you email them, a lot of times they will email you back.

So don't be afraid to approach these people and ask them questions.

[00:44:06] Michaela: Yeah, that's very good advice. I love that. So thank you so much, Patrick, for being on my show, and have a good day.

[00:44:13] Patrick: Bye bye.