For decades, Artificial Intelligence has been a focus of best-selling science fiction authors and an antagonist for blockbuster Hollywood movies. But AI is no longer relegated to the realm of science fiction; it inhabits the world around us. From the biggest enterprise companies to plucky startups, businesses everywhere are building and deploying AI at incredible speed.

In fact, open source allows anyone with a laptop to build impressively good AI models in a day.

But for all the recent advances in AI, what are its limitations? And if you are a developer or business leader, what use cases can AI solve for your company? 

In this week’s episode of Dev Interrupted, Emad Elwany, CTO and Co-founder of Lexion, walks us through the practical realities of AI in today’s world and how its constraints apply to your business. He also discusses the biggest breakthroughs in AI, how to build machine learning models that actually solve a business need and why it’s almost impossible to retrofit AI once it’s already built. 

Whether you are a business leader considering implementing AI at your company, a consumer curious about how AI impacts your daily life, or a researcher wanting to understand how to better deploy AI outside of the lab - this episode has something for you!

Episode Highlights Include:

  • (1:18) Growing up in Egypt & Emad’s start at Stanford
  • (8:35) Why AI is “glorified pattern matching”
  • (16:38) The biggest impact of AI on your daily life
  • (21:13) What people in academia get wrong about building AI
  • (35:35) Customers don’t care about AI - they care about solutions
  • (41:17) Enterprise vs startups: how they each use AI

Transcription:

Dan Lines: Host

Emad Elwany: Co-founder and CTO of Lexion

---

[Music Plays]

Emad: [0:00-0:15] Right now, AI is really glorified pattern matching but it's really, really good at pattern matching. And computers have been good at pattern matching for a long time. This is not new. We've been doing it since the 60s or even before. But what's happening is the AI is doing pattern matching on things that it hasn't seen before.

Interact Promotion: [0:16-0:46] Over thirty billion dollars in engineering wisdom will be at your fingertips at Interact on April 7th. Join engineering leaders from Netflix, Slack, Stack Overflow, American Express and more at Interact, a free, virtual, community-driven engineering leadership conference. One day, twenty speakers, all selected by the thousands of engineering leaders in the Dev Interrupted community. If you are a developer, team lead, VP or CTO looking to improve your team, this is the conference for you. Go to devinterrupted.com/interact to register today.

[Music fades]

Dan: [0:47-0:59] Hey, what's up, everyone? Welcome to Dev Interrupted. I'm your host, Dan Lines and today I'm joined by Emad Elwany, CTO and co-founder at Lexion. Emad, thanks for joining us today.

Emad: [01:00-01:01] Of course. Thanks for having me.

Dan: [01:02-01:10] Yeah, awesome to have you on. Also, congratulations, so you did an eleven-million-dollar series A funding round last year, right?

Emad: [01:10-01:11] That's right.

Dan: [01:12-01:21] That's really, really cool. So, we'll get into that a bit. Also, we were talking pre-pod and you let me know that you grew up in Egypt.

Emad: [01:22-01:25] That's correct. Alexandria, Egypt, on the north coast, a very beautiful city.

Dan: [01:26] What was that like?

Emad: [01:27-01:42] It was great. I loved growing up there. It's a very cosmopolitan city. It’s very old. It's been around for a couple of thousand years. So, you can like see a lot of interesting stuff, even on your way to college or work. Have a lot of family and friends still there. Go there at least once a year. It's a nice part of the world.

Dan: [01:43-01:45] Yeah, yeah. When did you move over to the U.S.?

Emad: [01:46-01:56] I came to the U.S. a few times, like, around 2008 and 2009. I moved here permanently in 2010 and have been here ever since.

Dan: [01:57-02:12] Yeah, I want to ask you a little bit about your background and, you know, professional career in software engineering. You're doing interesting things with AI, you're a co-founder. So, walk us through kind of your professional journey.

Emad: [02:12-03:36] Absolutely, yeah. So, I started initially in academia, so I got a-I went to undergrad for computer engineering, which I enjoyed very much. And then I decided to join industry for a while after graduation; I had to choose between continuing in academia or going into industry, but I wanted to try industry out for a while. So, I joined Microsoft. And while working at Microsoft, where I ended up staying for a long time, around eight or nine years, I went back to grad school part time and ended up getting my Master's in Computer Science at Stanford. And then after these eight years, I decided that I wanted to try startups. So, the Paul Allen Institute is a very, very famous AI institute here in Seattle; they had a great incubator program that helps people like me, coming with big tech experience and some academia experience, start companies in the AI space. That's where I joined. I joined them in 2018 and I met my co-founders there. We tinkered a bit, we had some ideas, and we ended up building what's now Lexion. It's been a very exciting journey. At Microsoft, my experience spanned many teams: I worked for a bit in search and paid search, I worked in social computing, I worked in NLP-natural language processing-and conversational AI. I've really worked on a lot of great teams and met some of the greatest people of my career there, and I've taken all this knowledge into Lexion and am now enjoying working on using AI to solve business problems.

Dan: [03:36-03:43] That's a-actually a really interesting background. You got your master’s at Stanford, right?

Emad: [03:44] Correct. Yeah.

Dan: [03:44-03:49] Do you think it was worth it to get your master’s-you did in computer science?

Emad: [03:49-04:16] It-I did. It absolutely was, just because in our field it's important to be on top of things, like the latest research, and Stanford is really good at that. Also, the community there is amazing, from the staff-the teaching staff-to the student body and just everybody there; there's just this vibrant energy. And everybody's like trying to do great things and they inspire you, and you learn a lot of great tools, and it just helps shape you as a, like, well-rounded software engineer and AI practitioner. So, definitely was worth it for me.

Dan: [04:17-04:34] Then you did some stuff with Microsoft for a while, big company. Then you mentioned you did the incubator program, like a startup incubator; I actually never did that. We had an opportunity to do it but we actually passed on it. What was the incubator like for you?

Emad: [04:35-05:04] I think it's a really good transitioning tool for somebody who wants to get into startups but-but doesn't want, this early, all the huge risks that come with a startup. So, some people do it like the hardcore way and like go start a company in the garage. I had a family at the time and a lot of obligations, and I was a little risk averse. So, the incubator is like a good balance, right? You're still in an environment, you have a community. You can try it out and then go back to a job-a normal job-if you want to. But if you strike some good idea, then it allows you to pursue that.

Dan: [05:05-05:08] And you actually met your founders there? Your co-founders [crosstalk] [5:07] for Lexion?

Emad: [05:07-05:14] Yeah. Yeah, both of them. Yeah. Gaurav Oberoi, who was a serial entrepreneur. And James who has also worked in many startups before.

Dan: [05:15-05:20] Okay, let's talk about Lexion. What happened to start Lexion? What's the idea? What's it all about?

Emad: [05:21-06:15] So I went into the incu-like, I went to the incubator with an open mind. My experience was in natural language processing, so I was gravitating towards it, but I was pretty open. And James and Gaurav and I tried a few different ideas. We ended up, like, building a company that was different, and we got a multi-million-dollar contract in a very different space, but then decided that we're not passionate about pursuing it. So, we handed it over to other very capable folks and decided to switch gears to something new. And then due to the-my experience in document understanding and language understanding, I was looking closely at use cases there. And we ended up starting a company that was around generic document intelligence. And then over time, we started looking for great use cases for this doc intelligence technology. We tried it in many domains, we got success in multiple domains, but there was very obviously something very worthwhile to pursue in legal. And that's where we started and kept going.

Dan: [06:16-06:27] Yeah, so you went down kind of this, let's help legal people work through contracts faster, using AI, something like that. Does that sound about right? Okay.

Emad: [06:28] Absolutely.

Dan: [06:29-06:46] And you landed your series A last year, you're the CTO of the company, you're a co-founder. What is it like to be a CTO in a pre-series A company? Like, what were you doing? What were your roles and responsibilities?

Emad: [06:47-07:22] A good question. It's very different. It changes a lot. Pre-series A, I was doing the job that I'm supposed to be doing, so like driving the technical strategy and doing hiring and building product and writing code. But I was also doing a lot of operations work and recruiting, HR, administration, IT-wearing many, many hats. I think after series A, we now have the resources to get people who are specialized and are probably a lot better than me at doing these jobs. And I started to have more focus on, like, working closer to the tech and focusing on growing teams and culture and tools, and just setting-helping set the strategy of the technology for the company.

Dan: [07:23-07:30] Okay, so AI, it's kind of-kind of a buzzword, like how does it relate to Lexion and then your role specifically?

Emad: [07:31-08:10] It absolutely is. So, we are like an AI-first company; we wrote a lot of code that's just AI even before writing the actual user-facing application. We knew we had a vision, so we were making a gamble and taking a bet on the technology. But we invested a lot in just a generic document understanding platform, and built a lot of tooling around annotation, building models, deploying them, testing them, etc., before going to the user-facing product. Now we have a lot more around the AI, from beautiful applications to a lot of experiences on top. But the AI is still very much ingrained into everything in the product and the way we think about all the experiences we offer.

Dan: [08:11-08:22] When you talk about AI to like, you know, your friends or family. How do you describe it to them? Like what is AI to you when you're talking to regular people?

Emad: [08:23-09:08] Yeah, I-first I always start by just telling them it's not like you see in the movies or the media. It's not singularity and Skynet and sentient machines; we're pretty far from that. Maybe we'll get there, maybe not. The question is still unanswered. But right now AI is really glorified pattern matching. But it's really, really good at pattern matching. And computers have been good at pattern matching for a long time. This is not new. We've been doing it since the 60s or even before. But what's happening is the AI is doing pattern matching on things that it hasn't seen before. And this is the beauty of AI: you can show it some examples, and then it will extrapolate and go further to other areas that you never expected. So, it's still very much a tool and not something that is science fiction-y. But it's a very useful tool.

Dan: [09:09-09:25] Yeah. And you know, sometimes people are saying, I'm going to lose my job, I'm going to lose my job to a computer. Maybe even when you're talking with your customers-I don't know, maybe the legal team is afraid, they won't need me, and stuff like that. Like, how do you respond to that type of talk?

Emad: [09:26-10:41] I tell them there's definitely a small subset of jobs that will be heavily impacted by AI. And this is not new. I mean, automation has been replacing jobs for a very long time, with the industrial revolution and beyond. This is not a new thing. It's just that AI is going after kind of information work instead of other kinds of jobs. However, AI is still very much in its nascent days and very much an assistive tool. It's really good at tackling problems that are repetitive and mundane, and it's not as good at innovation or creativity or things that require deep critical thinking. So, my answer to people is, it's not going to replace lawyers and doctors, it'll help lawyers and doctors be better at the job and more efficient. So now the doctor can review hundreds of CT scan images instead of just a few dozen and derive conclusions quicker. But they will still need to look at them; we're gonna need these roles. As for the roles that can go away with AI, they're honestly roles that I think are not a good way to spend human capital; probably people should pursue different careers where they can do more interesting work. So, I would encourage people to look at where we will be in ten years, what AI is gonna replace, and then pursue careers that are working alongside AI, or maybe help build AI as well, because there's going to be a huge demand in that space.

Dan: [10:42-10:58] When you talk about AI with your colleagues, what are the kinds of things you-you talk about? I think you told our producer there have been a lot of breakthroughs in AI in the past five years. Like, what's going on in the industry now?

Emad: [10:59-12:25] Yeah! So, there are a couple of questions here. So, to answer your first question about how I talk about AI with my colleagues, I talk about business value. So as a company, at the end of the day, we are there to solve customer problems and create value, right? So, I encourage the whole team to always ask themselves, like, "Why am I building this machine learning model?" Building AI for the sake of AI is not a good goal; you have to tie it to a good business case. And you have to pick the right, like, problem, where the accuracy of your model will be able to solve the problem, because not every AI model is the same-some of them are at, like, 50% accuracy, others at 99% accuracy. And they're both valuable if applied to the right problem. So, I always encourage people to start with the problem. And then I also encourage them to think a little beyond today, because if you think of AI-like, OCR is AI, right? Optical Character Recognition. Two decades ago, OCR was very cool, it was very easy to make a lot of money building an OCR solution, you could charge dollars per page. Now anybody can do OCR at the cost of fractions of fractions of a cent. So, it became very commoditized. Today, we're at that stage with document understanding; it's like the cool thing, you're solving a lot of new problems, you can charge a lot of money for it. But in ten years, it's going to get commoditized, right? So, you need to go one level up the pyramid of value and offer a lot of experiences on top of it to protect your moat and to continue to be relevant. That's kind of the message I try to share with my team. And then I can talk about some of the recent breakthroughs, if you're interested in that as well.

Dan: [12:25-12:27] Yeah, absolutely. What's going on?

Emad: [12:28-15:17] There's been a lot going on. I'm gonna pick three, because I don't want to talk about this for hours and hours, even though I could-the three that kind of made our lives easier and gave us a lot of impact at Lexion. The first is unsupervised learning. So traditionally, when you're building machine learning models, you use something called supervised learning, where you need armies of human annotators going at every-at every example, in like tens of thousands or hundreds of thousands, and giving it the right label. So, look at an image of a cat, say this is a cat. Look at an image of a dog, then say it's a dog. It's very time consuming and expensive. With unsupervised learning, you get rid of a lot of that, because machines became good at understanding data without labels. So, if a computer looks at a lot of text and then notices that every time there's a person's name, and then something, and then an action, very frequently that something is like a job title. So, you can say something like Elizabeth, the queen, honored the knight. Mark, the doctor, cured the patient. So, from just this pattern matching, without anybody saying that queen and doctor are job titles, the machine can infer that. Now take this and apply it to hundreds of thousands of constructs that the machine can learn unsupervised, and you can build very powerful things with that. The second area where there's been a huge breakthrough is transfer learning. And transfer learning is the tool that allows you to train a model for task A, or to solve problem A, and invest in it a lot-maybe months, maybe years-until you have it working well. Now you're going to solve problem B. Instead of starting from scratch and working for months and years, you can take all your learnings, or most of your learnings, from A, plug them into B, do a little bit of extra work-maybe days or just weeks tops-and then solve problem B. This has been an instrumental, like a huge change in the way we operate machine learning. And it allows us to now build hundreds of models in the span of a year instead of just one or two. The third and last area, I think, that we use a lot and are big believers in at Lexion is weak supervision. It kind of helps you use the intersection of these two things. Instead of having an army of humans and a huge annotation team go and label everything as I mentioned, you can have the human expert try to instill their knowledge in a rule, or a labeling function as they call it in the literature. So now I am an expert. Instead of going and labeling these ten sentences as person, job description, action, I can then say, hey, you know, computer, whenever there's a person name, and then something, and then an action, this something is actually probably a job title. It's not gonna always be accurate. There's gonna be examples where this doesn't work, but if it works, like, 80 or 90% of the time, then we can go and label hundreds of thousands of examples in one swoop. And we have tools that allow us to take these labeling functions, use the ones that are good and drop the ones that are bad, and build models much faster. These are the three I picked just because they influenced us so much, but there's probably a lot more out there.
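Editor's note: to make the labeling-function idea concrete, here is a minimal, illustrative Python sketch in the spirit of weak supervision. The regex, function name, and example sentences are invented for illustration and are not Lexion's actual tooling.

```python
import re

# A single "labeling function": an expert's heuristic that labels many
# examples at once instead of hand-annotating each one. It is expected to be
# noisy; downstream tooling keeps the good heuristics and drops the bad ones.

ABSTAIN = None
JOB_TITLE = "JOB_TITLE"

def lf_title_between_name_and_action(sentence: str):
    """If a sentence looks like '<Name>, the <something>, <verb>ed ...',
    guess that <something> is a job title."""
    match = re.search(r"^[A-Z][a-z]+,? the ([a-z]+),? \w+ed\b", sentence)
    return (JOB_TITLE, match.group(1)) if match else ABSTAIN

examples = [
    "Elizabeth, the queen, honored the knight.",
    "Mark, the doctor, cured the patient.",
    "The meeting started at noon.",  # no name/title pattern -> abstain
]

for sentence in examples:
    print(sentence, "->", lf_title_between_name_and_action(sentence))
```

Applied to hundreds of thousands of sentences, a handful of such functions can produce a large (if noisy) training set far faster than manual annotation.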

Dan: [15:18-15:43] That's really cool. I don't even know if this is a thing or not. But I know-I forget whose law it was-it was something like, computers are processing X amount faster every year, chips are running faster and faster and faster. Is there anything in AI, a similar law, like new models can be created this much faster every year? Does anyone talk about something like that?

Emad: [15:44-16:23] 100%. I mean, now, because of open source and academia, a company like maybe Google, or some research institution, will literally train a model for months on their very expensive hardware and then put it out as open source. Anybody can download it on their MacBook laptop, write a few lines of code on top, and then build a model in a day. That's like impressively good. So, for sure we're seeing the same kind of evolution there. And it's paired with advancements in computer architectures-so like with GPUs and ASICs, chips and CPUs that allow you to train even faster-you're now able to do things that literally would have taken years in the 90s in, like, half a day today. And it's going to keep getting faster as we go.
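Editor's note: as a hedged illustration of the "download an open-source model and build on it in a day" workflow, here is a minimal sketch using the open-source Hugging Face transformers library. The task and example sentence are arbitrary; any pretrained checkpoint would do.

```python
# Assumes `pip install transformers torch` and a network connection on first run.
# Someone else spent months of compute pretraining the model; we just reuse it.

from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default pretrained model

print(classifier("This contract renews automatically unless terminated in writing."))
# Illustrative output: [{'label': 'NEGATIVE', 'score': 0.9}]
```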

Dan: [16:24-16:42] Where do you think just regular, I guess, people or consumers are impacted most by AI? Is it like what ads we get served, or something? I feel like it's always something that makes money off of us. Like, where do you think-what is the biggest impact of AI on regular people's lives today?

Emad: [16:43-16:55] Good ques-it's very ubiquitous. Ads are unfortunately one of them. I think there's that famous sentence, like, the brightest minds of our generation are working on showing more ads, which is-

Dan: [16:56-17:00] That's the saddest thing ever. That's like really true, though, I think. Yeah, it is.

Emad: [17:01-17:27] But it's everywhere. Like Siri, when you're talking to Alexa, when you are basically traveling and getting your face scanned at the border, when you are kind of like searching for something on Google. And Alexa-she shares her name, yeah. So yeah, it's really ingrained in everything you do. And people just don't notice it. And I think this is a good thing, because it doesn't have to be shiny as long as it's adding value. That's-that's what matters.

Dan: [17:27-17:37] Yeah, I'm gonna transition us here. So as the CTO, how do you talk about AI to the business side of your company?

Emad: [17:38-19:39] You know, it's not different from what I talk to the technical-to engineering and product teams about; it kind of all gets back to value. Except with-with our business side, I try to work with them to explain the value we're creating for the customer, and to also not over promise, to be very realistic, and see what AI is good at and what it's not good at, so that people don't come with the wrong expectations. So, AI is going to allow our customers to find things faster. So-so like, instead of reading a fifty-page contract in a PDF to answer a simple question-let's say the salesperson asks the General Counsel, do we have an active NDA with Acme Inc.? Now the general counsel would have to go searching for the right contract, read the dreaded thing, figure out if it's active or not, and then answer. With Lexion, they literally can just look it up in one second, just type Acme Inc. and see, oh, it's active. Or they can even tell the salesperson, go look it up in Lexion, it's right there. So they'd save-it's much more efficient; instead of waiting thirty minutes or an hour to do it, it just takes a few seconds. And then you're unlocking a lot of value on reporting and, like, mass analytics. So traditionally, customers and companies have thousands of contracts, not a handful. And if they want to do any kind of reporting on their corpus of documents, or contracts, they either have to wait so long, engage outside counsel, have a team of paralegals go read every one, summarize it in a spreadsheet, and then report on it. Or they have to sample-so they go, you know what, it's impossible, we're not gonna read over a thousand, I'm just going to pick a random sample of 5%. Or maybe take the most important ones and ignore all the rest. But with AI, you can literally now run the AI models on all ten thousand and get the report with 90% accuracy. So instead of having 5%, you're now up to 90%. This unlocks a lot of things that were impossible before; like, you literally can answer questions that you didn't have the answer to before. And it can really impact how you're running your business, and it impacts both bottom line and top line. So, this is why I work with my business team on explaining these things to the users.

Dan: [19:40-19:58] When you think about what Lexion does and what it does with AI, how do you measure the success of AI in your business? When you were just talking there, I thought about time saving-how much time are we saving-but what do you measure for success?

Emad: [19:59-20:56] We-we measure exactly that. So we measure kind of two things. We measure efficiency: so if you could do the job without the AI in an hour, and now with the AI you do it in five minutes-so like, you get it, I've saved so much time, right? On average, we see our AI make people around 20x faster. So this is a very measurable thing. And then there's new value that was just impossible before, so you don't even have a baseline, you don't have the one-hour metric to compare against. But now they have this report where they will never miss a renewal of a contract ever again. So even if they were really good at carefully tracking their big contracts, and knew when to reach out to the customer or to a vendor to auto renew it correctly, now they can actually not miss a single one. And in not missing a single one, it's really impacting their-their key business metrics as well. So we measure these two things. Some of them are quantitative, some are qualitative. We talk to customers a lot. And we're always building kind of like new insights, and then measuring how much engagement we're getting from users and how they are receiving them.

Dan: [20:57-21:22] For people who are thinking, you know, maybe I want to become an engineer, a software developer that focuses on this AI world-it sounds interesting to prototype things, or maybe you get to work on really, I don't know, interesting lab projects-what's the difference between doing stuff in a lab with AI versus it being out in production in the wild, having to solve business use cases?

Emad: [21:23-22:39] It is very, very different. And this is a shock that a lot of people in academia and research hit when they try to industrialize the AI. Like, you can very quickly get a Jupyter Notebook-a very famous machine learning tool-and, like, grab a big model, write two lines of code, and then you have a cool experiment where you can ask questions to the AI and get answers. And oh, that's cool. Getting from there to production is incredibly hard. You need a lot of infrastructure and machinery; you need to have training infrastructure, annotation infrastructure, deployment, etc.-I can speak about that later-but a lot of investments. And then also, working in a lab setting is very different, because you can have a cool, cool experiment, but once you put it in the wild, in the hands of real users, it might fail miserably. So, in the early days of Lexion, we thought it's hard to get good contracts. So, we got very scrappy, like getting some datasets from public records and talking to cities and governments, because they have to make their contracts public, right? We built models that performed really well on them, then we showed it to a proper, like, to a company, and it didn't work as well, because the shape of the data is very different from government procurement to enterprises. So, it's very important to experiment in the lab and get customer eyes on the data and on the results sooner, so that you don't build all this machinery I explained and deploy it, only to find out that it's not really solving the problem.

Dan: [22:40-22:51] I think it's good to talk about now: what do the technical setup and processes look like for production-grade AI? Like, how are you set up today? Tell us all about it.

Emad: [22:52-25:03] Yeah, for sure. I think this is the area that excites me the most, actually; I really like productionizing machine learning models. It starts with the data: at the heart of any AI, you need good, high-quality, well-curated data. So, you have to start by having very good data tracking systems where you can build datasets rigorously-like have your training sets, your development sets, your test sets, your blind sets-and make sure that you can track which data came from whom, from which customer, from which kind of annotation. So, data management is a very big part of it. And then later, you will also need a lot of specialized annotation tools. So, there are generic annotation tools that will get you maybe 80% of the way there, but for your own vertical domain, you will want to build very specialized annotation tools that make your team very efficient at it. Then the next step is training: you want to have a training infrastructure to actually train the models. This can become very expensive if you're not careful, like literally in dollar amounts-like, our AWS bill can easily grow into a huge amount if we're not careful and efficient. So, you have to build this machinery. Then comes inference, which is kind of like-now you've trained the model, let's glue it into the user-facing application and start showing predictions of the AI. And then the last piece that connects all this well is observability and metrics. So, you have to monitor how the model is doing. How often is it getting the right thing? How often is it getting the wrong thing? Is there model drift? What's the confidence? And pump this data back into your training system so that you can use it in improving the next iteration. There's a lot of hidden parts: like in traditional software development, in machine learning you have to worry about version control, you need to be able to deploy a model and roll it back if it has a problem. And computer software engineering has evolved a lot in these aspects-now we all have great version control and tooling. Machine learning is still more nascent in these things. So, we're still figuring it out, stitching a lot of tools together and trying to have the same caliber of tooling. And then the last thing in this pipeline is the user experience-I think it's part of the pipeline. You can have a great model, but if you're not surfacing the AI and predictions to the users in a consumable, easy to understand and explain way, nobody's going to use it. So, a lot of people build all of this because it's exciting and ignore this important last piece, and end up failing because of that.
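Editor's note: as a rough, schematic sketch of the stages described above (curated data, training, inference, observability, and feedback), here is an illustrative Python outline. The class names, fields, and placeholder logic are invented for illustration and are not Lexion's architecture.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class DatasetVersion:
    name: str        # e.g. "contracts-train-v12", tracked per source/annotator
    examples: list   # (text, label) pairs

@dataclass
class ModelVersion:
    name: str                        # versioned so it can be rolled back
    predict: Callable[[str], dict]

def train(data: DatasetVersion) -> ModelVersion:
    # Placeholder for the real (and expensive) training job.
    return ModelVersion(
        name=f"model-on-{data.name}",
        predict=lambda text: {"label": "active", "confidence": 0.93},
    )

def monitor(prediction: dict, user_correction: Optional[dict], feedback: list) -> None:
    # Observability: low-confidence predictions and user corrections are kept
    # so they can be folded into the next DatasetVersion.
    if prediction["confidence"] < 0.8 or user_correction is not None:
        feedback.append({"prediction": prediction, "correction": user_correction})

feedback: list = []
model = train(DatasetVersion("contracts-train-v12", examples=[]))
prediction = model.predict("This Agreement shall renew automatically...")
monitor(prediction, user_correction=None, feedback=feedback)
```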

Dan: [25:04-25:31] That's super interesting. The thing that's catching my interest the most is, I guess, the confidence level in the AI. Is it actually working? Is it not working? That's where I would personally focus the most, because to me that's the value of it. How do you measure if it's working or not working? Is it qualitative? Meaning, do you ask the lawyers, like, was this correct? Or how do you know?

Emad: [25:32-26:21] You have to do both. Yeah. So, you do it quantitatively by having good, hidden, blind data sets that nobody in the machine learning team ever looks at, that are kind of agreed upon by the users, and you measure how far you are from that. This is the qualitative way-a quantitative way, sorry. But you still have to do it qualitatively, because you have to ask the user, like, what's your intuition? Like, maybe your numbers are really good. Maybe your, like, F1 score and precision and recall and accuracy are all above 90%, but it's not being perceived as correct, because maybe you have the wrong UX, or the wrong schema, or you're surfacing it in a confusing way, or maybe you're surfacing the right answer but not showing the user why, so they can't really use it. Like, one thing we learned is that if you show the prediction without highlighting the text that led you to make this prediction, it's as good as not having the prediction at all. So, you still have to do this qualitative analysis alongside the quantitative one.
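Editor's note: for readers unfamiliar with the metrics mentioned here, this is a toy sketch of the quantitative side: scoring predictions against a held-out blind set with scikit-learn. The labels are made up.

```python
# Assumes `pip install scikit-learn`.
from sklearn.metrics import precision_recall_fscore_support

gold        = ["active", "inactive", "active", "active", "inactive"]  # blind-set labels
predictions = ["active", "inactive", "inactive", "active", "inactive"]

precision, recall, f1, _ = precision_recall_fscore_support(
    gold, predictions, pos_label="active", average="binary"
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

High numbers here still have to be checked against the qualitative side Emad describes: users only trust predictions they can see the reasoning for.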

Dan: [26:22-26:41] For your team, I'm interested in the different-let's say, I don't know, how do you think about it? Do you have like an AI team? Or like, what are the different roles for a software engineer that's interested in, like, working on your team-what are the different roles available? I assume not everyone is doing everything.

Emad: [26:42-28:15] Yeah, everybody does a little bit of everything, but there are specializations for sure. So yeah, there are data scientists, or like researcher types, who are spending most of their time doing modeling and experimentation. These are people with typical academic backgrounds, like masters or PhDs in the field of ML and NLP. And then you have your typical-what we call ML engineers. These are software engineers-they could be generalists or have real experience in it-but they're basically building a lot of tooling around the AI, like building pipelines, building annotation tools, building infrastructure. We also have data engineers; these are people who are just really good at scaling data problems. So once your training set grows beyond a certain size, or the number of models grows beyond a certain size, you have to think about how you're gonna scale this in a cost-effective way, either horizontally or vertically, or a mix of both. This is very common. And then there's two more roles. One of them is just typical application developers, like full stack engineers, who can work on building the actual beautiful experiences for annotation and exposing predictions. And then there are just really good product managers, who are gonna have, like-for them the unit of work, instead of a feature, it's a model or an extraction. So, a typical product manager will build a new UI feature or ship some back-end API. But a typical PM for a machine learning team will decide what the user is interested in learning from the data, what the schema for it is, how do I, like, transfer this knowledge from the cust-from the expert, from the customer, to the annotation team, and make sure that the data science team is working on it correctly, etc.

Dan: [28:16-28:25] The PMs in that situation; do they come from a software developer background? Or are they traditional, like product people that moved into the field?

Emad: [28:26-29:09] I think it can be both. I've seen this happen successfully, both at Lexion and in prior careers. I think the most important thing is that they have to be very good at understanding customers, so they can instill this feedback back into the team, and then really good at, like, analytics and understanding of language, so that you can design a schema. So, it's kind of like designing an ontology, or like-machines at the end are working with data and schemas, humans are working with concepts and ideas, so they have to be just really good at transforming one to the other. And they have to have good general PM skills-like, they have to be good at writing a spec and clear communications, and project managing the project all the way from inception to delivery, like typical PM stuff.

Dan: [29:10-29:28] And usually in developing a product, there are kind of natural tensions or conflicts that can arise between these different roles. Are there any conflicts that you see between, maybe, I don't know, like the model builders versus a different role?

Emad: [29:29-30:03] There could be, for sure. I honestly haven't seen it at Lexion specifically, because we're a small team and we all-we're all very close to the customer. I can see it happening when you have a much bigger organization, where like the data modelers are four steps away from the customer, and there's like this disagreement. But today, just because we're all one team and very close to the use case, and our team is just very business savvy and always focused on adding value-so at the end of the day, we ask ourselves, what's the right thing for the customer? And the answer is usually very easy, and if we're not sure, we go and ask our internal or external customers directly and find out, to resolve that conflict.

Dan: [30:04-30:29] When you're thinking about the AI and it's surfacing information about a legal document, and it might find, you know-it did its job nine out of ten times-and then, you probably have competitors in the field, and maybe their product works seven out of ten times. How much of a gap is that? Or how do you think about that, you know, both from a technology and business perspective?

Emad: [30:30] This is a huge gap. It might sound small on paper, but it's huge for two reasons. The first reason is practical. If somebody comes to you with a million pictures of dogs and cats, and you have two models-one that's gonna get 90% of them correct, and one 70%-with the one that gets only 70%, you will still have to look at three-hundred thousand yourself. So, you made a dent in the problem, but tying-like, baking-their model into your process is expensive. So, unless the ROI is huge, sometimes it's not even worth it to use AI: you know what, if I'm going to have to look at three-hundred thousand anyway, instead of taking a dependency on this mysterious black box, I'll just kind of do the whole million myself. But now do 90% of them correctly, and the human only has to do a hundred thousand. This is an order of magnitude improvement, from a million to a hundred thousand. It starts clicking: oh, there's something here, it's probably worth me taking the time to integrate AI into my process and to reap these benefits. This is the practical consideration. There's also the psychological consideration. We just found in practice, if a user is looking at a model and it's getting four answers wrong out of-or three out of every ten, they quickly lose trust. And they start having doubts: oh, so it got four wrong, do I trust it with the other six or seven? Should I now go and check everything? Like, I'm actually not sure I want to use this AI thing. I don't really trust it. But if you have one wrong in every ten, users are like starting to go, oh, this is like magic, really, it's great, it got this one wrong. But also, when this happens, the one that your model got wrong is usually wrong for a reason-it's usually very hard. So, the user will justify it; they're going to look at it: oh, I can see why the model got it wrong. There's weird handwriting. There's a poor scan. There's a coffee stain on the picture. I can-I feel you, model, okay, it's fine. Let's move on. So, this psychological factor can be very impactful in whether the user will adopt your model or not.
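Editor's note: the practical half of that argument is simple arithmetic; here is a back-of-the-envelope sketch of the 70% vs. 90% example.

```python
# Whatever the model gets wrong still has to be handled by a human.
documents = 1_000_000

for accuracy in (0.70, 0.90):
    left_for_humans = int(documents * (1 - accuracy))
    print(f"{accuracy:.0%} accurate -> ~{left_for_humans:,} documents left for humans")

# 70% accurate -> ~300,000 documents left for humans
# 90% accurate -> ~100,000 documents left for humans
```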

Dan: [32:22-32:38] For the psychological factor, what is the bar where they lose trust or gain trust?

Emad: [32:39-33:06] It's problem specific. Some things-there's just very little margin of error. So, if you're gonna, like, tell them some important legal insight, you better get it right; they expect very high accuracy. Otherwise, it's just not worth the risk for them. But you know what, if you're extracting maybe the counterparty name, and sometimes you get it right, but sometimes you forget parts of the name-so instead of saying the full Coca Cola entity name, you just say Coca Cola-it's not the end of the world, they can still kind of use it. And if you miss it, they can easily fix it; there isn't a huge risk factor involved. So, it really depends on the problem itself. And that's why it's key to understand the potential accuracy of your model and then go after the right problem.

Dan: [33:07-33:21] And when you're thinking about, I guess, the pipeline or the cost-what does it look like to set up, I guess, a data pipeline that's useful with AI? And how do you think about the costs of that for your business?

Emad: [33:22-35:17] It's a big cost, from an engineering perspective and from a literal hardware cost. A lot of the cloud providers-we're in the early days of cloud and AI. I think most people are now running normal, non-ML workloads in the cloud; we're still not there with AI. So, all the cloud providers are offering platforms, but they're very early and they don't solve it; they're very generic, and you still have to build a lot of customized tooling for your own domain. So, the cost is still very high; you have to understand your problem and probably write a lot of custom tooling, or at least stitch a lot of open source together in an awkward way. And it takes time to get this right. The pipeline, also, in our field of document understanding, is not like one model. When you take a big document, you do a lot with it. First, you have to OCR it, you have to segment it, you have to classify different sections, you have to split it into agreements sometimes-like a single PDF can have multiple contracts within it-you have to drop noise, remove headers and footers, you have to have a lot of models extract key pieces. So, think about something as simple as: I want to understand if a contract is active or inactive. Even to answer this very simple question, you have to go get: when did the contract start? How long is the term? Is it going to auto renew, or is it a fixed term, or is it perpetual? If it's gonna auto renew, for how many terms? How long is each one? Who has the right to terminate? You have to extract all these things, and then you can answer the very simple question of active or inactive-like, you get the final date, and then, oh, if it's in the future, it's active; if it's in the past, it's inactive. Now imagine how many models you have to build for more sophisticated things; I'm starting with a very simple example. So, to be able to build hundreds of models, orchestrate them together, and have them take dependencies on each other, you have to have a fairly sophisticated data pipeline where you can version every node of this graph I explained, and version the inputs and the outputs, and be able to deploy, train parts of it, train all of it, and do all that kind of cool stuff.
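Editor's note: to illustrate how several extracted fields compose into the simple "active or inactive?" answer, here is a simplified Python sketch. The field names and renewal logic are assumptions made for illustration, not Lexion's actual schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class ExtractedTerms:
    start_date: date               # each field would come from an upstream model
    term_days: int                 # length of one term
    auto_renews: bool
    max_renewals: Optional[int]    # None = renews indefinitely

def is_active(terms: ExtractedTerms, today: date) -> bool:
    if terms.auto_renews and terms.max_renewals is None:
        return today >= terms.start_date  # effectively perpetual until terminated
    renewals = terms.max_renewals if terms.auto_renews else 0
    end_date = terms.start_date + timedelta(days=terms.term_days * (1 + renewals))
    return terms.start_date <= today < end_date

nda = ExtractedTerms(date(2021, 3, 1), term_days=365, auto_renews=True, max_renewals=1)
print(is_active(nda, date.today()))
```

The point of the example is the dependency chain: the final answer is trivial, but it sits on top of many upstream extractions that each need their own model, versioning, and evaluation.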

Dan: [35:18-35:46] Thanks for walking us through, you know, kind of how you think about it, and the team setup, and some of the challenges with cost-there's a lot to unpack there. I'm gonna move us on to another topic, around making customers happier through AI. I've heard you say before, I don't think the customers care at all about AI; they care about solving the actual problem and getting better at their job. What do you mean by that?

Emad: [35:47-37:16] Not everybody agrees with me, although I do agree with that sentence very much-I said it. Essentially, customers will talk to you about AI at the very start; they'll take the meeting because AI is attractive and shiny, and they're intrigued. But once you pass this level, at the end of the day business users want to solve business problems. If you're not solving the problem, AI or not, it doesn't matter, and they don't really care if you're solving the problem with AI or have an army of people actually doing the work behind the curtains and calling it AI-you know what, as long as you're solving the problem. What they do care about is the constraints that come with how you solve it. Having a huge army of people solving the problem behind the curtain is going to be very expensive, because payroll will be expensive; it's gonna be very slow, because humans are not as fast as machines; it has very serious privacy considerations, because customers don't want people all over the world looking at their data and their contracts, probably the most sensitive documents in their company. And then finally, accuracy is going to fluctuate a lot, because people fluctuate: if you have an expert human in the pool of people answering, they might have great answers, and then they go on vacation, somebody else comes in, and the quality drops. Machine learning models are very deterministic; computers are just always going to give you the same consistent answer. So basically, AI solves these four problems: it's fast, it's cheap, it's accurate, and it doesn't have biases and doesn't leak your information, ever. So customers are very happy about solving the problem in this kind of way.

Dan: [37:17-37:28] You've also spoken about the difficulties of retrofitting AI back into a product once it's already built. Can you talk to us about that concept?

Emad: [37:29-38:41] 100%. It's very hard-it's not impossible-to retrofit it, to like sprinkle the AI onto an existing application. I've seen it done, I've worked in teams that did it, I've seen the pain of doing it. The reason is that once you build an experience, a product, without AI, you're just missing some key UX constructs that make it very hard to now plug in the AI. For example, if you have an existing system with fields, and you now want to populate the fields with AI, you might not have the right data model to do that. You probably don't have the right data model to capture user feedback. So maybe you have the field value, but you don't have the field confidence; maybe you don't have a way to capture when the user goes and changes the value, to pump it back into your training system. Also, if you relied on humans going and editing the value and populating a field with data entry, without AI, then you probably didn't build all the machinery for exposing the reason and explaining where the value came from, now that you've plugged in AI. So, kind of going back and building all this UX, and working on one of the most critical pieces of an AI system-which is taking feedback data from users back into your machine learning platform-is a huge undertaking. It's so much better if you know that you're going to use AI and then build the experience around it; it can save you years of investment.
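Editor's note: as a hypothetical illustration of the data-model plumbing being described, here is a sketch of a field record that stores confidence, provenance, and the user's correction alongside the value, so feedback can flow back to training. All names are invented.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class AIField:
    name: str                       # e.g. "counterparty_name"
    predicted_value: str
    confidence: float               # lets the UI decide when to ask for review
    source_span: Tuple[int, int]    # character offsets justifying the prediction
    user_correction: Optional[str] = None   # captured edits become training data

    def final_value(self) -> str:
        return self.user_correction or self.predicted_value

field = AIField("counterparty_name", "Acme Inc.", 0.82, (1043, 1052))
field.user_correction = "Acme International Inc."  # a new labeled example for retraining
print(field.final_value())
```

A product built without AI typically stores only the value, which is exactly why the confidence, provenance, and feedback pieces are so hard to retrofit later.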

Dan: [38:42-39:08] If I'm in a situation where I'm working at a company that's been around for a long period of time, or the product is actually fairly-fairly successful, and I am asked, okay, this AI thing, it's totally happening, engineers, go figure it out, go apply it to our product-what would you do in that situation to make it, you know, as efficient as possible to get the AI up and running and do something positive for the product?

Emad: [39:09-39:57] Yeah, I would love-love to be in those shoes, although I realize practically a lot of people find themselves in this situation. I think I would try to have people work on two tracks: have people try to work on the models, but in parallel, I'll have people redesign the UX and kind of try to bridge this gap, so that you don't end up with what happens in practice-they spin off a very sophisticated, capable machine learning team, they go and work on great models, but it takes forever to plumb them together. So instead of doing that, I'd make sure that we're making progress on both: redesigning all of our UX, getting this feedback loop operational, updating all our data models, all our pipelines, in parallel to having data scientists actually train neural networks and decision trees, etc. You can't do just one of them.

Dan: [39:57-40:36] Yeah, you know, it reminds me of-I think I was watching a little clip of Steve Jobs talking about products and that type of thing. He said something like, it always goes poorly if you have an interesting technology-so you have a cool technology, and then you just apply it to a product-that always goes badly, because you're doing it in the opposite way. Correct. So like, going off of his advice, which I think is true, I think the first thing I would do is say, really, what problem are we trying to solve? And then does AI actually fit into this, instead of the other way around?

Emad: [40:37-41:00] 100%. And this is something we do a lot; we call them Wizard of Oz studies. So before writing a single line of AI, build the right UX that assumes AI, and have some wizards-like, humans behind the curtain-pretend they're the AI, as a very cheap, like, A/B test experimental environment. Try it out, see if it's resonating. And then once you make sure that this is working, okay, then you know that you can go and build the rest.

Dan: [41:00-41:23] So there's maybe two different types of companies that are utilizing AI: you have these big companies-Google, Microsoft, Amazon-and then you have cool, up-and-coming, you know, startups like your company, and I'm sure there's a bunch of other ones doing stuff with it. What do you see as the differences? Like, what are these big companies doing with AI versus, you know, maybe the smaller startups?

Emad: [41:25-43:43] I think big companies-they have huge resources, and they can attract very, very good AI talent-are just inherently really good at building generic, broad, platform-type products. These companies are motivated not to go after a single use case; they want to build a shared, reusable model that plugs into hundreds of use cases. Going back to my OCR example, a big company is probably better off making a few cents doing OCR on, like, every single document on the planet, versus charging, like, dollars on a much smaller slice of the market within a certain vertical. So their, like, DNA is about scale: I want to build the platform, I want to enable other developers to come and build experiences on my platform, I'm not gonna go spend time and hire dedicated expertise in every vertical and go do it. I am seeing a shift-like, I'm starting to see, for example, AWS now is going into medical NLP. So, this is an interesting shift. I know that other cloud providers are starting to do some lightweight, specialized AI experiences, but it's still very nascent, and I don't think they go deep enough to offer an end-to-end solution; somebody still has to build on top. Startups, on the other hand, are really good at solving the problem vertically, all the way. They can't really build a platform-how are you going to make money? You need revenue. You basically need to start building something that has strong traction and value very quickly. You need it for survival, really. If you build a platform and wait for some other startup to come in and build on top of it, then you're putting your whole-your faith in somebody else's hands. So, you have this very strong motivation to go in and actually understand a domain-ideally a big enough domain; you don't want to be locked in a small market or a very niche market either-but you want to have a substantial domain where you can go really deep. And this is just not going to be attractive for the big-type companies. Let them do the heavy machinery, sure, let's delegate. Like, I don't want to be in the business of building very low-level ML platforms. If a cloud provider can give me something I can use, I will happily use it and give them part of my pie. That's fine. Unfortunately, they're not there yet-we evaluate it every year, and we still end up having to build our own stuff. I think they will eventually get there, and they'll be really good at it. But yeah, as a startup, for now we just have to kind of go deep and go specialized.

Dan: [43:44-44:09] That totally makes sense to me. I like to try to ask kind of like an interesting question, or just a fun, you know, question for you, Emad. If you were able to decide, would you rather be the founder of a company in the early days of computers or early days of the internet? Or would you rather be a CEO of a unicorn company in modern times, now?

Emad: [44:10-45:21] I love that question. I think my answer to that question would change at different points in my career. I'm at this early stage of my career still, where I read your question as: do you prefer, like, wisdom and learning, or do you prefer riches and influence? I'm at a point where I really appreciate learning and wisdom more. I want to learn, I want to get my hands dirty. I really, really enjoy building products and teams and working with customers, and being, like, part leader but also part individual contributor and building stuff with my hands. I'm at that stage. I also think it's essential to do these things to eventually get to potentially becoming, like, a CEO of a unicorn. Of course, I'm not a fool-I completely would love to do that as well. I think at a later stage I would really enjoy it: to kind of, now that I have the wisdom, take it and have huge influence and huge impact on the world, and hopefully someday even a legacy. CEOs of unicorns can really achieve a lot; they have a lot of tools at their disposal, their decisions can make the world a better place and impact millions of people. So yeah, in short summary, I think now I'd prefer to be a founder of a startup; maybe in ten to twenty years, I would very much love to be a CEO of a unicorn, or even like a Fortune 500 company, who knows.

Dan: [45:22-46:32] It's interesting, I was thinking about this last night when I was doing some prep for the pod-what I would rather be. I'm leaning towards kind of the early days of computers or early days of the internet, versus kind of a unicorn CEO now. It could be that my opinion's a little bit shaded because of some recent things that are happening. But it seems like early on, it was kind of more about innovation, or discovering something new with computers, or, you know, getting more common people access to the internet or computers. And it was, if anything, a little more creative and a little more good-hearted than some of the things that you see now-you see all these, like, Netflix documentaries about it-like, to be a unicorn CEO: how many times can I get someone to come onto my social media platform? And, like, you know, you're like selling addiction. Or, like I was saying earlier in the episode, how many ads can I serve up? And it seems like a little bit less good-hearted. So, for me, it would be a little bit more of that inventor type in the earlier days, if I had to say.

Emad: [46:33-46:37] Completely agreed. Yeah, this is a very great way to look at it as well. And I do agree with you.

Dan: [46:37-47:01] The last area that I wanted to talk to you about-and this has been great, I really appreciate you coming on here and talking to us about AI, and talking to us about what you're doing at Lexion, and-and your career journey. If there's engineers out there listening to this pod, and they wanted to be like you and kind of get into this field of AI, what do you think's the best way to kind of break through into the field?

Emad: [47:03-47:48] There's been a lot of great tools now, open source. Like, there are courses-you can go to, like, fast.ai, OpenAI has a lot of courses, Coursera-there's no shortage of material. If it's somebody with, like, a computer engineering background, there's a lot of coursework out there that helps you bridge the gap and get into pragmatic and practical machine learning. And then you can go a lot deeper later if you really enjoyed it. So that's what I encourage people to do: get into one of these, kind of, machine-learning-for-coders type courses. And if you end up liking it-because you might not, I mean, it's-it's a very compelling field, but it has its challenges; you have to enjoy math and really have a certain skill set-if you like it, then yeah, go all in. You can start getting a PhD or something later, or build a startup in the field, which will also be a great way to learn.

Dan: [47:49-48:01] And, you know, thank you for that advice. I know, kind of related to it, you're doing some hiring, hopefully, on your team. In particular, what do you have going for career opportunities at Lexion?

Emad: [48:02-48:37] We have a lot of very exciting roles. So, we're hiring in every part of the engineering team. We need full stack application developers to build beautiful experiences, people who want to work closer to the ML, and we want people who kind of glue the systems together-like really good back end engineers, front end engineers. We're really hiring for all your typical roles right now, because we are in a growth phase and our customer base is growing and has a lot of demands. And my immediate goal is to accelerate model development, but also accelerate application development. So full stack engineers and ML engineers are the core areas.

Dan: [48:38-49:29] Okay, awesome. So, you know, anyone listening, if you are interested in a new opportunity with Emad, he's got some good stuff going with application engineers, but also on the modeling side, on the AI, so, you know, we'll include some links. [Music fades in] Also, a quick reminder for our listeners: if you haven't already rated and reviewed the show on your podcasting app of choice, particularly Apple Podcasts, please do so. I also want to say thank you to the more than two-thousand of you who are now subscribed to our weekly Interruption newsletter. We bring you articles from the community, inside information, weekly cards, and also, very important, the first look at Interact 2.0 on April 7th, 2022. Okay, we'll include all the information and links below of what we talked about today. And Emad, one more thank you to you. Thanks for coming on the pod today.

Emad: [49:29-49:33] Thank you so much, Dan, and the whole team. It was really a pleasure talking, and I appreciate you having me.

[Music fades out]