SE Radio 563: David Cramer on Error Tracking

In this episode, David Cramer, co-founder and CTO of Sentry, joins host Jeremy Jung for a conversation about error tracking. The discussion starts with treating performance problems as errors, why you might not need logs, and how most applications share the same problems. From there they consider other topics, including capturing information by hooking into runtimes and frameworks, issues with the quality of OpenTelemetry data, and how front-end applications are constantly changing and why that makes them hard to instrument. Finally, they discuss how Sentry’s architecture has evolved, and why they switched from a permissive license to the Business Source License.


Transcript

Transcript brought to you by IEEE Software magazine.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.

Jeremy Jung 00:00:16 Today I’m talking to David Cramer, he’s the founder and CTO of Sentry. David, welcome to Software Engineering Radio.

David Cramer 00:00:25 Thanks for having me. Excited for today’s conversation.

Jeremy Jung 00:00:28 I think the first thing we could start with is defining what Sentry is. I know some people refer to it as an error tracker. Some people have referred to it as an application performance monitoring tool. I wonder if you could kind of describe in your words what it is.

David Cramer 00:00:47 You know, as somebody who doesn’t work in marketing, I just tell it how it is. So Sentry started out doing error monitoring, which, depending on who you talk to, you might just think of as logging, right? Like, that’s the honest truth. It is just logging, just in a different shape or form. These days, it’s hard to not classify us as just an APM tool; that’s the industry that exists. It’s the tools people understand. So I would just say it’s an APM tool, right? We do a bunch of things within that space, and maybe it’s not item-for-item the same as, say, a product like New Relic, but a lot of the overlap’s there. So it’s errors and performance, which is latency and sort of throughput. And then we have some stuff that just goes a little bit deeper within that. The one thing I would say that is different for us versus a lot of these tools is we actually only do application monitoring. So we don’t do any systems or infrastructure monitoring, meaning Sentry is not going to tell you when you need to replace a hard drive, or even that you need more disk space or something like that, because it’s a domain that we don’t think is relevant for our customers and product.

Jeremy Jung 00:01:48 For people who aren’t familiar with the term application performance monitoring, what is that compared to just error tracking?

David Cramer 00:01:56 The way I always reason about it, and this is what I tell new hires and what I would tell my mother if I had to explain what I do, is: you load Uber and it crashes. We all know that’s bad, right? That’s error monitoring. We capture the crash report, we send it to developers. You load Uber and it’s a 30-second spinner, like a loading indicator. As a customer, same outcome for me: I assume the app is broken, right? So we also know that’s bad, but that’s different than a crash. Okay, Sentry captures that same thing and sends it to developers. Lastly, the third example we use, which is a little bit more, I think, untraditional, or non-traditional rather: you load the Uber app and it’s a blank screen, or there’s no button to submit, like to log in or something like this. So it’s broken, but it maybe isn’t erroring and it’s not a slow thing, right? Same outcome. It’s probably a bug of some sort. It’s what an end user would describe as a bug. So for me, APM just translates to: there are bugs, user-perceived bugs, in your application, and we’re able to monitor and help the software teams prioritize and resolve those concerns.

Jeremy Jung 00:02:56 Earlier you were talking about actual crashes, and then your second case is maybe more that if the app is running slowly, that’s not necessarily a crash, but it’s still something that an APM would monitor.

David Cramer 00:03:11 Yeah, yeah. And I think, to be fair, APM historically is not a very meaningful term. When I was more of just an individual contributor, I would associate APM with: there’s a dashboard that will tell me what’s slow in my application, which it does, and that is kind of core APM. But none of the traditional tools, pre-Sentry, would actually tell you why it’s broken. When there’s an error, a crash, most of those tools were kind of useless. And I don’t know, I do actually know, but I’m gonna pretend I don’t know about most people and just speak for myself: most of the time my problems are errors. They are not, like, it’s fast or slow, you know? And so we just think of it as a holistic thing: when I’ve changed the application and something’s broken or it’s a bug, you know, what is that bug?

David Cramer 00:03:52 How do we help people fix it? And that comes from a lot of different data signals and things like that. The end result is still the same. You either are gonna fix it or it’s not important and you ignore it. I don’t know. And so it’s a pretty straightforward premise for us. But again, most companies in the space, the traditional companies: when you grow a big company, what happens is you build one thing and then you build lots of checkboxes to sell more things. And so I think a lot of the APM vendors have created a lot of different products. RUM is a good example of another acronym that lives within APM. And I would tell you RUM is completely meaningless. It stands for real user monitoring. And so I’m like, well, what’s not real about monitoring the application? Well, nothing’s not real, but they created a new category because that’s how marketing engines work. And that new category is more like analytics than it is like application telemetry. And it’s only because they couldn’t collect the application telemetry at the time. And so there’s just a lot of fluff, I would say. But at the end of the day, to developers or engineering teams, it’s: new version of the application, you broke something, let’s tell you about it so you can fix it.

Jeremy Jung 00:04:51 And so earlier you were saying how this is a kind of logging, but there are also other companies, other products that are considered logging infrastructure. I think of companies like Papertrail or Logtail. So what space does Sentry fill that’s different than that kind of logging?

David Cramer 00:05:13 So the way I always think about it, and this is personally true and what I advise other folks, is when you’re building something new, when you start from zero, right, you can often take Sentry, put it in, and that’s good enough. You don’t even need performance monitoring, you just need errors, right? Like, you’re just causing bugs all the time. And you could do that with logging, but the delta between error monitoring and logging is night and day from a user experience. Error monitoring for us, or what we’ve built at the very least, aggregates the errors. It helps you understand the frequency. It helps you see when they’re new versus old. It really gives you a lot of detail where logs don’t. And so you often don’t need logging. And I will tell you, today at Sentry engineers do not use logs for the most part.

David Cramer 00:05:49 I had a debate with one of our team members about it recently, like, why does he use logs? But you should not need them, because logs serve a different purpose. If you have traces, which tell you fast and slow and a bunch of other network data, and you have this sort of crash report collection or error monitoring thing, logs become a compliance or an audit trail or a security forensics tool, and there’s just not a lot of value that you would get out of them otherwise. Once in a while maybe there’s some weird obscure use case, but generally speaking you can just pretend that you don’t need logs most days. And to me that’s an evolution of the industry. When Sentry was getting started, most people were still on logs. And if you go talk to SRE teams, they’re like, oh, logging is what we know.

David Cramer 00:06:29 Some of that’s changed a little bit, but at the end of the day they should only be needed for more complicated audit trails, because they’re just not a good solution to the problem. It’s just free-form data; structured or not doesn’t really matter. It’s not aggregated, it’s not something that you can really use. And it’s why whenever you see logging tools, not even the Papertrails of the world, but the bigger ones like Splunk or Kibana, it’s this weird, what we describe as choose-your-own-adventure, go have fun, build your dashboards and try to make the logs useful kind of story. Whereas with something like Sentry, it’s just, why would you waste any time trying to build dashboards when we can just tell you when something new is broken? That’s the ideal situation.

Jeremy Jung 00:07:07 So it sounds like maybe the distinction is that a more general logging tool, like the Splunk and Kibana you mentioned, is a collection of all this information about things happening even though nothing’s necessarily wrong. Whereas Sentry is going to log things, but only if Sentry believes something is wrong, either because of a crash or because of some kind of performance issue.

David Cramer 00:07:34 Yeah, I would say it’s about actionability, right? Nobody wants to spend their time digging through logs, digging through dashboards. Metrics are another good example of this, like just charts with metrics on them. Yeah, they tell me something’s happening. If there’s lots of log statements, they tell me something’s going on, but they’re not optimized to help me solve a problem, right? And so our philosophy was always, and we haven’t necessarily nailed this in all cases, for what it’s worth, that the goal is we identify an actual problem, close to a root-cause kind of problem, and we escalate that up, and that’s it. Versus asking somebody to go build these dashboards, build these things, figure out what data matters and all this, because most software looks exactly the same. If you have a web service, it doesn’t matter what language it’s written in, it doesn’t matter how different you think your architecture is from somebody else’s, they’re all the same.

David Cramer 00:08:22 It’s like, you’ve got a request, you’ve got a database, you’ve got some cache, you’ve got all these known-quantity things, and the slowness comes from the same places, the errors come from the same places, they’re all exhibiting the same kinds of behavior. So logging is very unstructured. And what I mean by that is there’s no schema. You can hypothetically make it JSON, and everybody does that, but it’s still unstructured. Whereas with errors, it’s a tight schema. There’s a type of error, there’s a message for the error, there’s a stack trace, there’s all these things that you know, right? And as soon as you know and you define those things, you can just build better products. And so distributed tracing is similar. It’s a little bit abstract, to be fair, but hypothetically distributed tracing is creating a schema out of basically network annotations.

David Cramer 00:09:05 And somebody will yell at me for just simplifying it to that. I would tell ’em that’s what it is. But same goal in mind: if you know what the data is, you can take action on it. It’s not quite entirely true, because tracing is much more freeform. For example, it doesn’t say if you have a SQL statement, it should be like this, it should be formatted this way, things like that. Whereas with stack traces, there’s a file name, there’s a line number, there’s all these things, right? And so that’s how I think about the delta between what is useful information and what isn’t, I guess, and what allows you to actually build things like Sentry versus just build abstract exploration tools.
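The "tight schema" point can be made concrete with a minimal Python sketch. It assumes a hypothetical record shape (`ErrorEvent`, `Frame`) and a hypothetical `fingerprint` function; none of these are Sentry's actual data model or grouping algorithm. The idea is only that once every error has a type, a message, and structured frames, two occurrences of the same bug can be grouped even when their messages differ.

```python
import hashlib
import traceback
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    filename: str
    function: str
    lineno: int


@dataclass(frozen=True)
class ErrorEvent:
    exc_type: str
    message: str
    frames: tuple


def event_from_exception(exc):
    """Turn an exception into a fixed-schema record (hypothetical shape)."""
    frames = tuple(
        Frame(f.filename, f.name, f.lineno)
        for f in traceback.extract_tb(exc.__traceback__)
    )
    return ErrorEvent(type(exc).__name__, str(exc), frames)


def fingerprint(event):
    """Hash the error type plus the file and function of each frame.

    The message and line numbers are deliberately left out so the same bug
    with different input values lands in the same group.
    """
    parts = [event.exc_type] + [f"{f.filename}:{f.function}" for f in event.frames]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:16]


def parse_age(raw):
    return int(raw)


def capture(raw):
    try:
        parse_age(raw)
    except ValueError as exc:
        return event_from_exception(exc)


first = capture("abc")
second = capture("xyz")
```

Here `first` and `second` carry different messages but share a fingerprint, which is the kind of aggregation a free-form log line does not give you for free.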

Jeremy Jung 00:09:39 So kind of paint the picture of how someone would get started with a tool like Sentry. Do they need to tell Sentry anything about their application? Do they need to modify their source code at all? Give us a picture of how that works.

David Cramer 00:09:54 Yeah, one of our fundamentals, which I think applies for any real business these days, is you’ve gotta reduce user friction, right? You’ve gotta make it dead simple to use. And for us there was kind of a fundamental driving constraint behind that. In many situations, APM vendors especially will require you to run an agent, basically some kind of process that runs on your servers somewhere. Well, if you look at modern tech stacks, that doesn’t really work, because I don’t run the servers. Half my stuff’s in the browser, or it’s a mobile app or a desktop app. And even if I do have those servers, it’s an entirely different team that controls them. So deploying a sidecar, an agent, is actually much more complicated. And so we looked at that, and also because it’s much easier to have control if you just ship within the application.

David Cramer 00:10:38 We’re like, okay, let’s build an SDK, a dependency that just injects into the application that runs. Set an API key and then you’re done. And so what that translates to for Sentry is we spend a lot of time knowing what Django is or what Rails is or what Express is, all these frameworks, and just knowing how to plug into the right signals in those frameworks. And then at that point, the user doesn’t have to do anything. And so the ideal outcome for Sentry is you install the dependency in whatever language makes sense, right? You somehow configure the API key, and maybe there’s a couple other minor settings you add, and that gives you the bare bones, and that’s it. It should just work from there. Now there’s a lot you can do on top of that to enrich data and whatnot, but for the most part, especially for errors, that’s good enough, and that’s always been a fundamental goal of ours, and I think we actually do it phenomenally well.
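To illustrate the "install a dependency, set a key, done" idea, here is a minimal Python sketch of an in-process hook. It is not the Sentry SDK: `install`, `captured`, and the `"demo-key"` value are hypothetical names for illustration, and the in-memory list stands in for a network transport. The only real mechanism used is the interpreter's `sys.excepthook`, which is called for unhandled exceptions.

```python
import sys

captured = []  # stand-in for sending events to a backend


def install(api_key):
    """Hypothetical one-call setup that chains onto the unhandled-exception hook."""
    previous_hook = sys.excepthook

    def hook(exc_type, exc_value, tb):
        captured.append(
            {"api_key": api_key, "type": exc_type.__name__, "value": str(exc_value)}
        )
        previous_hook(exc_type, exc_value, tb)  # keep the default traceback printing

    sys.excepthook = hook


install("demo-key")
```

After this single call, any exception that reaches the top of the program is recorded without the application code changing. Framework integrations would apply the same pattern to framework-specific signals rather than to the interpreter hook.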

Jeremy Jung 00:11:26 So it sounds like it infers things about the application without manual configuration. Can you give some examples of the kind of things that Sentry knows without the user having to tell it?

David Cramer 00:11:40 Yeah, so a good example: on the error side, we know literally everything, because an error object in each language has all these attributes with it. It gives you the stack trace, it gives you a lot of these things. So that one’s straightforward. On the performance side, we use a combination of leveraging some open source, I guess, implementations like OpenTelemetry, where it’s got all this instrumentation already and we can just soak that in, and we also automatically instrument a bunch of stuff. So for example, say you’ve got a Python application and you’re using, let’s say, SQLAlchemy or something. I don’t actually know if this is how our SDK works right now, but we will build something that’s aware of that library and make sure it can automatically instrument the things it needs to get the right information out of it.

David Cramer 00:12:20 And to be fair, that’s always been true for APM vendors and stuff like that. The delta is, we’ve often gone a lot deeper. And so for Python, for example, you plug it into an application, we’ll capture things like the error object, which is exception class name, exception value, right? Stack trace, file name, line number, all those normal things, function name. We’ll also collect source code. So we’ll give you sort of surrounding source code blocks for each line in the stack trace, which makes it infinitely easier to consume. And then in Python and PHP, and I forget if we do this anywhere else right now, we’ll actually even allow you to collect what are called stack locals. So it’ll give you basically the variables that are defined, almost like a debugger. And that is actually game changing from a development point of view.

David Cramer 00:13:01 Because if I can go look in production when there’s an incident or a bug and I can actually see the state of the application, I never need to ask, oh, what was going on here? Do I need to go reproduce this somehow? I always have the right information. And so all of that for us is automatic, and we only succeed, like, by definition inside of Sentry it has to be automatic. If we ask the user to do anything whatsoever, we’re failing. And so whenever we design any product or anything, and to be fair, this is how every product company should operate, it’s gotta be with as little user input as humanly possible. You can’t always pull that off. Sometimes you have to have users configure stuff, but the goal should always be no input.
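The "stack locals" idea can be sketched in Python with nothing but the standard traceback objects. This is an illustration of the general technique, not Sentry's implementation; `frames_with_locals`, `apply_discount`, and `checkout` are hypothetical names.

```python
def frames_with_locals(exc):
    """Walk the traceback and snapshot each frame's local variables."""
    frames = []
    tb = exc.__traceback__
    while tb is not None:
        frame = tb.tb_frame
        frames.append(
            {
                "function": frame.f_code.co_name,
                "lineno": tb.tb_lineno,
                # repr() keeps the snapshot serializable and avoids holding
                # references to live objects.
                "locals": {name: repr(value) for name, value in frame.f_locals.items()},
            }
        )
        tb = tb.tb_next
    return frames


def apply_discount(price, percent):
    factor = 100 / percent  # fails when percent is 0
    return price / factor


def checkout():
    try:
        apply_discount(price=50, percent=0)
    except ZeroDivisionError as exc:
        return frames_with_locals(exc)


report = checkout()
```

The innermost entry in `report` shows `price` and `percent` as they were at the moment of failure, which is the debugger-like view described above. A production tool would also need to scrub sensitive values before sending them anywhere.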

Jeremy Jung 00:13:41 So you’re talking about getting a stack trace, getting the state of variables, source code. That sounds like it’s primarily gonna be through unhandled exceptions. Would you say that’s the primary way that you get errors?

David Cramer 00:13:56 Yeah, you can integrate in other ways. So you can trigger our API to capture an exception. You can also, for better or worse, it’s not always good, integrate through logging adapters. So if you’re already using a logging framework and you log errors there, we can often capture those. However, I will say in most cases people use the logging APIs wrong and the data becomes junk. A good example of this, and it varies per language, so I’m just gonna go to Python because Python is sort of core to Sentry: in Python you have the ability to log messages, you can log them as errors, you can log actual error objects as errors. But what usually happens is somebody does a try-catch, they capture the error, they rescue from it, they create a logging call like log.error or something, put the error message or value in there, and then they send that upstream. And what happens is the stack trace is gone because we don’t know that it’s an error object.

David Cramer 00:14:45 And so, for example, in Python there’s actually a flag you pass to the logging call to make sure that stack trace stays present. But if you don’t know that, the data becomes junk all of a sudden, and if we don’t have a stack trace, we can’t actually aggregate data because there’s just not enough information to run hashing on it. So there are a lot of ways, I guess, to capture the information, but there are good ways and there are bad ways. And I think it’s to everybody’s benefit when they design their app to build some of these abstractions. As an example, whenever I start a new project these days, I will add some kind of helper function for me to log an exception when I try-catch, and then I can just plug in whatever I need later if I want to enrich the data or if I want to send that to Sentry manually or send it to logs manually. And it just makes life a lot easier versus having to go back and augment every single call in the code base, you know.
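In Python's standard `logging` module, the flag in question is `exc_info`. The sketch below contrasts a message-only call with one that keeps the traceback, and wraps the second in a small helper of the kind described. `log_exception` and `ListHandler` are hypothetical names; the `logging` calls themselves are standard library.

```python
import logging

logger = logging.getLogger("app")


class ListHandler(logging.Handler):
    """Collects records in memory so the difference is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


handler = ListHandler()
logger.addHandler(handler)
logger.propagate = False  # keep the demo quiet on stderr


def log_exception(message, exc, **context):
    """Hypothetical project-wide helper: one place to enrich or reroute later.

    Passing the exception as exc_info attaches its traceback to the record;
    exc_info=True does the same when called from inside an except block.
    """
    logger.error(message, exc_info=exc, extra={"context": context})


def risky():
    return {}["missing"]


try:
    risky()
except KeyError as exc:
    logger.error("lookup failed: %s", exc)           # message only, traceback lost
    log_exception("lookup failed", exc, user_id=42)  # traceback travels with the record
```

The first record has no exception information at all, so anything downstream sees only a string. The second carries the exception type, value, and traceback, plus whatever context the helper was given.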

Jeremy Jung 00:15:31 So it sounds like when you’re using a tool like Sentry, there’s gonna be the unhandled exceptions, which are ones that you weren’t expecting. So those should, I guess, happen without you catching them. And then the ones that you perhaps do anticipate but still consider to be a problem, you would catch, and then you would add some kind of logging statement to your code that talks to Sentry directly.

David Cramer 00:15:57 Potentially, yeah. It becomes a personal choice, to be fair, at that point. One of the ways we’ve been thinking about this lately is because we’ve been changing our error monitoring product to not just be about errors. So we call it issues, and that’s in the guise of an issue tracker, a bug tracker. But we’ve started putting what are effectively almost static analysis concerns inside of this issue tracker. So for example, in our performance monitoring we’ll do something called detecting n plus one queries, which is where you execute a duplicate query in a loop. It’s not necessarily an error, it might not be causing a problem, but it could be causing a problem in the future. But, you know, the qualities of it are not the same as an error. It’s not necessarily causing the user to experience a bug.

David Cramer 00:16:36 And so we’ve started thinking more about this, and this is the same as logging errors that you handle. It’s like, well, they’re not really errors, they’re not really bugs, it’s expected behavior, but maybe you still want to keep tracking it somewhere. And I think about, you know, linters and things like that, where it’s like, well, I’ve got some things that I definitely should be fixing, then I’ve got a bunch of other stuff that’s informing me, that maybe I should take action on or not. But only I, the human, can really know at the end of the day, right, if I should prioritize that or not. And so that’s how I kind of think about it: if I’m gonna try-catch and then log, yeah, you should probably collect that data. It’s probably less important than these other concerns, like an actual unhandled exception. But you do want to know that they’re happening and whatnot. And so, I dunno, Sentry has not had a strong opinion on this historically. We’re just like, send us whatever you wanna capture in this regard and you can pay for it, that’s fine, it’s usage based, you know. But we’re starting to think a lot more about what that should look like if we go back to, what’s the opinion we have for how you should use the product or how you should solve these kinds of software problems.

Jeremy Jung 00:17:34 So you gave the example of detecting n plus one queries. Is that like being aware of the framework or the ORM the person is using, and that’s how you’re determining this? Or is it at more of a lower level than that?

David Cramer 00:17:49 It is, yeah, it’s at the framework level. So this is actually where OpenTelemetry causes a lot of harm for us, because we need to know what a database query is. We need to know the structure of the query, because we actually wanna parse it out in a lot of cases, because we need to identify if it’s a duplicate, right? And we need to know that it’s a database query, not a random annotation that you’ve added. And so what we do is within these traces. If you don’t know what a trace is, it’s basically just a tree, like a tree structure. So it’s like A calls B calls C, B also calls D and E, et cetera, right? And so we actually just look at that trace data and try to find these patterns, which is like, okay, B was a SQL query or something, and every single sibling of B is that same SQL query, but sort of removing certain parameters and stuff, the values.

David Cramer 00:18:35 So we’ll look at that data and we’ll try to pull out anomalies. So n plus one is an example of a fairly obvious anti-pattern that everybody knows is bad and can be optimized, but there’s a lot of others that are a little bit more subjective. I’ll give you an example. If you execute three SQL statements back to back, one could argue that you could just batch those SQL statements together. I would argue most of the time it doesn’t matter and I don’t need to do that. And also it’s not guaranteed that that is better. So it becomes much more like, well, in my particular situation this is valuable, but in this other situation it might not be. And that’s where I go back to, it’s almost like a linter, you know? But we’re trying to infer all of that from the data stream.
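The sibling-query pattern described here can be sketched over a toy trace tree in Python. Everything in it is an assumption for illustration: the `Span` shape, the `"db.query"` label, the regex-based normalization, and the threshold of three are not Sentry's detector, just a minimal instance of "strip the parameter values, then count identical sibling queries".

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Span:
    op: str            # e.g. "http.server" or "db.query" (labels are assumptions)
    description: str   # for db.query spans, the SQL text
    children: list = field(default_factory=list)


def normalize_sql(sql):
    """Replace literal values with placeholders so repeated queries compare equal."""
    sql = re.sub(r"'[^']*'", "?", sql)  # quoted strings
    sql = re.sub(r"\b\d+\b", "?", sql)  # bare integers
    return " ".join(sql.split()).lower()


def find_repeated_queries(span, threshold=3):
    """Return (parent description, normalized query, count) for repeated sibling queries."""
    counts = Counter(
        normalize_sql(child.description)
        for child in span.children
        if child.op == "db.query"
    )
    findings = [
        (span.description, query, count)
        for query, count in counts.items()
        if count >= threshold
    ]
    for child in span.children:
        findings.extend(find_repeated_queries(child, threshold))
    return findings


trace = Span(
    "http.server",
    "GET /authors",
    [
        Span("db.query", "SELECT id, name FROM authors"),
        *[
            Span("db.query", f"SELECT * FROM books WHERE author_id = {i}")
            for i in range(1, 6)
        ],
    ],
)
findings = find_repeated_queries(trace)
```

The sketch only works because each span says it is a database query and carries parseable SQL, which is the data-quality dependency the conversation keeps returning to. A real detector would also weigh duration and ordering before raising an issue.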

David Cramer 00:19:10 So Sentry’s kind of a backwards product company. We build our product from a technology vision, not from “customers want this” or “we have this great product vision” or anything like that. And so in our case, the technology vision is: there’s a lot of application data that comes in, a lot of telemetry, right? Errors, traces, we have a bunch of other streams now. Within that telemetry there is signal. And so one, it’s all structured data, so we know what it is and we can actually interpret it, and then we can identify that signal that might be a problem. And that signal in our case is often going to translate to this issue concept. And then the goal is, well, can we identify these problems for people and surface them, versus the choose-your-own-adventure model, which is, we’ll just capture everything and feed it to the user and they can figure out what matters. Because again, a web service is a web service, a database is a database, they’re all the same problems for everybody. And so that’s kind of the model we’ve built and are continuing to evolve on, and so far it works pretty well to curate a lot of these workflows.

Jeremy Jung 00:20:10 So you talked a little bit about how people will sometimes use tracing, and in cases like that they may need some kind of session ID to track somebody making a call to a service, and that talks to a database, and that talks to other services, and inside of your application you have to instrument some way of tracking that this all came from this one request. Is that something that Sentry can infer, or is there something that the developer has to put into play so that you can track that sort of thing?

David Cramer 00:20:44 Yeah, so it’s a bit of both, and I would say our goal is that we can infer everything. The reality is there is so much complexity and there’s too many technologies in the world. I was complaining about this the other day. A classic example in a web service is if we have a middleware hook, we kind of know request and response; usually that’s how middleware would work, right? And so we can infer a lot from there. Basically we can infer the boundaries, which is a really big deal. Okay, that’s one thing: boundaries is a problem. We describe that as a transaction. So, when the request starts, when the request ends, right? That’s a very important boundary for everybody to understand, because when I’m working on the API, I care about the API boundary. I actually don’t care about what the database is doing at its low level or what the JavaScript application might be doing above it.
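The request/response boundary can be sketched as a small WSGI middleware in Python. `TransactionMiddleware` and the `transactions` list are hypothetical names, and the timing is simplified: it stops when the wrapped app returns its iterable, so a streamed response body would not be fully covered.

```python
import time

transactions = []  # stand-in for sending data upstream


class TransactionMiddleware:
    """Hypothetical WSGI middleware that records one transaction per request."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        name = f"{environ['REQUEST_METHOD']} {environ['PATH_INFO']}"
        started = time.perf_counter()
        seen = {}

        def recording_start_response(status, headers, exc_info=None):
            seen["status"] = status
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, recording_start_response)
        finally:
            # Runs on success and on exceptions, so the boundary is always closed.
            transactions.append(
                {
                    "name": name,
                    "status": seen.get("status"),
                    "duration_s": time.perf_counter() - started,
                }
            )


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


app = TransactionMiddleware(hello_app)
body = app(
    {"REQUEST_METHOD": "GET", "PATH_INFO": "/users"},
    lambda status, headers, exc_info=None: None,
)
```

Because the middleware wraps every request, the application code itself needs no changes to get start, end, and status for each transaction. Spans for database calls or template rendering would then hang underneath that boundary.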

David Cramer 00:21:28 I want my boundaries. So that’s one that we kind of can do. But it’s hard in a lot of situations because of the way frameworks and technology have been designed. At least for traditional stuff, like a traditional web stack, it works: a Rails app or a Django app or a PHP app kind of thing, right? And then within that it becomes, well, how do you actually build a trace versus just have a bunch of arbitrary labels? And so we have a bunch of complicated tech within each language that tries to establish that tree, and then we annotate a lot of things along the way. And so we will either leverage OpenTelemetry, which is an open format spec that ideally has very high quality data. Ideally, not realistically, but ideally it has high quality data. Every library author implements it, great, everybody’s happy.

David Cramer 00:22:10 We don’t have to do anything ever again. The reality is that data is all over the map, because there aren’t strict requirements for how the data should be labeled and stuff. And not everything even has that data. Not everything’s instrumented with OpenTelemetry. So we also have a bunch of stuff, unrelated to using that, that will say, okay, we know what this library is, we’re gonna try to infer some characteristics from this library, or we know what maybe the Django template engine is, so we’re gonna try to infer when the template renders so you can capture that block of information. It is a very imperfect science, and I would tell you, even though OpenTelemetry is a very fun topic for people, it is not necessarily good. It’s not in a good state. Could it, will it ever be good?

David Cramer 00:22:53 I don’t know, in all honesty, but the data quality is all over the map. And so that’s honestly one of our biggest challenges to making this experience that tells you what’s going on in your database, or tells you what’s going on with cache or things like this. Like, I dunno, the cache might be called something completely random in one implementation and something totally different in another. And so it’s a lot of data normalization that you have to deal with. But for the most part those libraries, the things you don’t control, can and will be instrumented. Now the other interesting thing, and we’ll see how this works out: one thing Sentry tries to do is, we have all these layers of telemetry. So we have errors and traces, right? Those are pretty high-level concepts. We also have profiling data, which is very, very, very, very low level.

David Cramer 00:23:36 So it’s usually only if you have, like, disk I/O. It’s: where is all the CPU time being spent in my application? Mostly not waiting, because waiting’s usually a network call, right? But it’s like, okay, I have a loop that’s doing a lot of math, or I’m writing a bunch of stuff to disk and that’s really slow. Often those are not instrumented, or they’re these black box areas of performance. And so what we’re trying to do with profiling data, instead of just showing you flame charts and stuff, is actually say, can we fill in these gaps in these traces? Basically, hey, I’ve got a long period of time where the app’s doing something. You know, here’s an API call, here’s the database stuff. But then there’s this block. Okay, what’s that function or something? Can we pull that out of the profiling data?

David Cramer 00:24:15 And so in that case, again, that’s just automatic, because the profile actually knows everything about the application. You know, it has full access to the function and the stack and everything, right? And so the dream is that you would just always have everything filled in. The customer never has to do anything, with one minor asterisk. And the asterisk is what I would call business context. So a good example would be you might wanna associate requests with a specific customer or something like that. Like you might wanna say, well, it’s, I don’t know, Goldman Sachs or one of these big companies or something. So you can know, well, when Goldman Sachs is having performance issues or whatever it is, oh, maybe I should focus on them, because maybe they pay you a lot of money or something, right? Sentry would never know that at the end of the day. So we also have these kind of tagging, contextual APIs that will say, tell us some information, maybe it’s the customer, maybe it’s something else that’s relevant to your application, and we’ll keep that data associated with the telemetry that’s present, you know. But at least the telemetry, again, applications all just work the same. There should be a day in the next few years that it’s just all automatic. And again, the only challenge today is, can it be high quality and automatic? And so that’s to be determined.
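The "business context" asterisk can be sketched as a tiny tagging API in Python. `set_tag` and `capture_event` here are hypothetical functions written for this example, not calls into any real SDK; the sketch uses the standard `contextvars` module so that tags set while handling one request do not leak into another.

```python
import contextvars

# Tags for the current execution context (for example, the current request).
_tags = contextvars.ContextVar("tags", default={})


def set_tag(key, value):
    """Attach a piece of business context to whatever is captured next."""
    _tags.set({**_tags.get(), key: value})  # copy, never mutate the shared default


def capture_event(message):
    """Build an event that carries the tags active at capture time."""
    return {"message": message, "tags": dict(_tags.get())}


set_tag("customer", "acme-corp")
set_tag("plan", "enterprise")
event = capture_event("checkout took 12s")
```

The telemetry itself can be collected automatically, but only the application knows that a request belongs to a particular customer, so this is the one part a developer still supplies by hand.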

Jeremy Jung 00:25:27 What you’re kind of saying is the ideal is being able to look at this profiling information and build a full picture of a call from beginning to end, all the different things it talks to. But I guess, what’s the reality today? What is Sentry able to determine in the world we live in right now?

David Cramer 00:25:47 So we’ve done a lot of this like performance detection stuff already. So we actually can do a lot now. We put a lot of time into it and I, I will tell you, if you look at other tools trying to do tracing, their approach is much more abstract. It’s like your traditional monitoring tool that’s like, we’re just gonna collect a lot of signals and maybe we’ll find magic anomaly detection or something going on in it, which, you know, props to anybody that can figure that out. But a lot of what we’ve done is like, okay, we kind of know what this data looks like, let’s go after this very like known quantity problem. Let’s normalize the data and let’s make it happen. Like that’s today. The enrichment with profiles is new for us, but we actually can already do it. It’s not perfect, and I think we’re launching something in April or May, something around that timeframe, where hopefully for the technologies we can instrument, we’re actually able to surface that in a useful way.

David Cramer 00:26:34 But as an example, that concept that I was talking about, like with N+1 queries, the team built something using profiling data, and I think this might be for like a mobile app more so than anything, where mobile apps have this problem of, you’ve got a main thread and if you block that main thread, the app is basically frozen. You see this on desktop apps all the time. You very rarely see it on web apps anymore, but it’s a really big problem when you have a mobile or desktop app because you don’t want that like thing to be non-responsive, right? And so one of the things they did was detect when you’re doing like file I/O on the main thread, you know, right? When you’re writing to disk, which is probably a slow thing or something like that, that’s gonna block the whole thing, because you should just do it on a separate thread.

David Cramer 00:27:13 It’s like an easy fix potentially. It may not be a problem but it could become a problem. Same thing as N+1. But what’s really interesting about it is what the team did is like, they used the profiling data to detect it, because we already know threads and everything in there, and then they actually recreated a stack trace out of that profiling data when it’s surfaced. So it’s actually like useful data that I or you as a developer might know how to take and actually be like, oh, this is where it happens with the source code. I can actually figure it out and go fix it myself. And I’m still very much in the weeds with software, and that is like one of the biggest gaps in most things: it just doesn’t make it easy to consume or like take action on, right?
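A minimal sketch of the detection described above, assuming profiler samples arrive as a thread name plus a call stack. The `Sample` shape, the `IO_FUNCTIONS` set, and the `min_samples` threshold are hypothetical; this is not how Sentry's detector is actually written.

```python
from dataclasses import dataclass

# Hypothetical names of functions known to perform blocking file I/O.
IO_FUNCTIONS = {"write_file", "read_file", "fsync"}


@dataclass
class Sample:
    """One hypothetical profiler sample: which thread, and its stack (outermost first)."""
    thread: str
    stack: list


def find_main_thread_io(samples, main_thread="main", min_samples=3):
    """Return reconstructed stack traces for file I/O repeatedly seen on the main thread."""
    hits = {}
    for sample in samples:
        if sample.thread != main_thread:
            continue
        for depth, fn in enumerate(sample.stack):
            if fn in IO_FUNCTIONS:
                # Keep frames from the entry point down to the I/O call itself.
                trace = tuple(sample.stack[: depth + 1])
                hits[trace] = hits.get(trace, 0) + 1
                break
    return [list(trace) for trace, count in hits.items() if count >= min_samples]


samples = [Sample("main", ["main", "on_tap", "save_settings", "write_file", "syscall"])] * 3
samples += [Sample("worker-1", ["run", "write_file"])] * 10
issues = find_main_thread_io(samples)
```

The result is a stack trace rather than a chart, which is the point made in the conversation: it points at the line of code a developer would go and change.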

David Cramer 00:27:50 Like if I’ve got a chart that says my error rate is high, what am I gonna do with that? I’m like, okay, what’s breaking? That’s immediately my next question, right? Okay, this is the error, where is that error happening? Again, my next question; it’s literally just root cause analysis, right? And so that to me is very exciting. I don’t know that we’re the first people to do that, I’m not sure. But like if we can make that kind of data that level of actionable and consumable, that’s like a big deal for me, because I’ll tell you, I have 20 years of software experience, I still hate flame charts and like I struggle to use them. Like they’re not a friendly visualization, they’re almost like a hypothetically necessary evil. But I also think one where nobody said like, do we even need to use that? Do we need that to be like the way we operate anyways? Like I guess that’s my long-winded way of saying like I’m very excited for how we can leverage that data and change how it’s used.

Jeremy Jung 00:28:40 Yeah, so it sounds like in this example, both in the mobile app blocking the UI or the n plus one query is the Sentry I suppose SDK instrumentation that’s hooked inside of your application. There are certain behaviors that it knows are, are not like ideal I guess just based on people’s prior experience. Like your own developers know that hey if you block the UI thread in this mobile application then you’re gonna have performance problems. And so that way rather than just telling you, hey your app is slow, it can tell you your app is slow and it’s because you’re blocking the UI thread.
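The N+1 case mentioned in this exchange lends itself to the same kind of opinionated check. The sketch below is a simplified heuristic under stated assumptions, not Sentry's detector: `normalize`, `detect_n_plus_one`, and the threshold of five repeats are hypothetical, and real detectors look at span timing and ordering as well.

```python
import re
from collections import Counter


def normalize(sql):
    """Replace literals with placeholders so queries that differ only by value match."""
    sql = re.sub(r"'[^']*'", "?", sql)   # quoted string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # bare numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()


def detect_n_plus_one(queries, threshold=5):
    """Flag query shapes repeated at least `threshold` times within one request."""
    counts = Counter(normalize(q) for q in queries)
    return [shape for shape, count in counts.items() if count >= threshold]


queries = ["SELECT * FROM posts"]
queries += [f"SELECT * FROM authors WHERE id = {i}" for i in range(1, 7)]
suspects = detect_n_plus_one(queries)
```

Normalizing first is what turns six distinct SQL strings into one repeated shape, so the report can say which query to batch instead of only that the request was slow.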

David Cramer 00:29:24 Exactly. And I actually think, I don’t know why so many people don’t recognize this gap, because at the end of the day, like I don’t need more people to tell me response times are bad or anything. I need you to have an opinion about what’s good. It’s like math education, right? Like yeah, you learn the basics, but you’re not expected to, say, go to calc and then like do all the fundamentals. You’re like, go get a calculator and start simplifying the problem. Like yeah, we’re gonna teach you a few of these things so you understand it. We’re gonna teach you how to use a calculator, and then just use the calculator and then make it easier for everybody else. But we’re also not teaching you how to build a calculator, because who cares? Like that’s not the purpose of it.

David Cramer 00:30:02 And so for me this is like, we should be helping people sort of get to the finish line instead of making them run the entirety of the race over and over if they don’t need to. I don’t know if that’s a good analogy, but that has been the biggest gap I think in so much of this software throughout the industry, and it’s common everywhere, and there’s no reason for that gap to exist these days. Like the technology’s fine and the technology’s been fine for like 10 years. Like Sentry started in ’08 at this point. And I think there was only one other company I recall at the time that was doing anything that was even similar to like error monitoring, and Sentry, when we built it, we’re just like, what if we just go deeper? What if we collect all this information that will help you debug the problem instead of just stopping at, like, a log aggregator or something, so we can actually have an opinion about it. And I genuinely, it baffles me that more people do not think this way, because it was not a hard problem at the time. It’s certainly not hard these days. I mean a lot more people do it now; they’ve seen Sentry be successful and there’s a lot of similar implementations. But it just amazes me, it’s like, why don’t people try to make the data more actionable and more useful to teams versus just collect more of it, you know?

Jeremy Jung 00:31:05 And so I guess based on this, it sounds like maybe the popularity of the stack the person is using or of the framework means that you’re gonna have better insights, right? Like if somebody makes a Django application or a Rails application, there’s all these lessons that your team has picked up in terms of, hey, if you use the ORM in this way, your application is gonna be slow. Whereas if somebody builds something totally homegrown, you won’t know these patterns and you won’t be able to like help as much, basically.

David Cramer 00:31:44 Yeah, yeah, that’s exactly it. And you might think that that is a challenge, but then you look at how many employees exist at like large tech companies and it’s not that big of a deal. Like you might even think collecting all the information for each like programming runtime or framework is a challenge. We have like 40 people that work on that and it’s totally fine. And so I think actually all of this scales just fine, but you do have to understand like the domain, right? And so the counter version of this is if you look at say like browser applications, like very rich single page application type experiences, it’s not really obvious like what the opinions are. And this is like real, like if you go to Sentry, it’s kind of slow. Like the app, it’s kind of slow. We even make fun of ourselves for how slow it is, cause it’s a lot of JavaScript and stuff.

David Cramer 00:32:26 If you ask somebody internally, hey, how would we make, pick a page, fast? They’re gonna have no clue. Like even if they have like infinite domain experience, they’re gonna be like, I’m not entirely sure, because there’s a lot of like moving parts and it’s not even clear what like good is, right? Like we know N+1 is bad. So we can say not doing that is the better solution. And so if you have a JavaScript app, where a lot of the slowness will come from is like the render times itself. Like how do you fix it? You can’t actually build a product that tells you what to fix without knowing how to fix it, right? And so some of these newer and very fast moving targets are frankly very difficult for us. And so that’s one thing that I think is a challenge for the entire industry.

David Cramer 00:33:07 And so like as an example, a lot of the browser folks have latched onto web vitals, which are just metrics that hopefully tell you something about the application, but they’re not always actionable either. Like the idea with web vitals is like, okay, time to interactive is an important metric. It’s like how long until the page loads so that a user can do what they’re probably there to do. Okay, abstractly it makes sense to us, but like put into action, how do I optimize time to interactive? Don’t block the page. That’s one thing. I don’t know. Defer assets, that’s another thing. Okay, so you’ve gotta build a technology that knows these assets could be deferred and aren’t. Okay, which ones can be deferred? I don’t know. Like it’s such a deep rabbit hole, and then the problem is six months from now the tech will have completely changed, right?

David Cramer 00:33:52 And it won’t have like necessarily solved some of these problems, it will just have changed and they’re now a completely different shape of problem. But still the same fundamental like user experience is the same, you know? And to me that’s like the biggest challenge in the industry right now is that like dilemma of the browser at the end of the day. And so even from our end we’re like okay, maybe we should step back focus on servers again, focus on web services, those are known quantities. We can do that really well. We can sort of change that to be better than it’s been in the past and easier to consume with things like our n plus one detections and then take like a holistic fresh look at browser and say okay, now how would we solve this to make sure we can actually really latch onto the problems that like people have and and we understand, right?

David Cramer 00:34:34 And you know, we’ll see when we get there. I don’t think any product does a great job these days for helping solve those problems. But I think even without the products, like I said, even our team would be like, fixing this is gonna take months, because it’s gonna take months just to figure out exactly where the common bottlenecks are and all these other things within an application. And so I guess what I mean to say with that is there’s a lot of opportunity I think with the moving landscape of technology, if we can find a way to, whether it’s standardized or Sentry can find a way to make that data actionable, or something in between there.

Jeremy Jung 00:35:07 So it sounds like what you’re saying is with the backend there’s almost like a standard way of doing things or a way that a lot of people do it the same way. Whereas on the front end, even if you’re looking at a React application, you could look at ten React applications and they could all be doing state management a totally different way. Like the way that the application is structured could be totally different, and that makes it difficult for you to infer sort of these standard patterns on the front end side.

David Cramer 00:35:43 Yeah, that’s definitely true, and it’s even worse than that because the applications often, well one, there’s just like the nature of JavaScript, which is asynchronous in the sense of like it’s a lot of callbacks and things like that. And so that already makes it hard to understand what’s going on, where things are happening. And then you have these abstractions like React which are very good, but like they pull a lot of that away. And so as an example of a common problem, you load the application, it has to do a lot of stuff to make the page render. You might call that hydration or whatever. Okay. And then there’s a completely different state, which is going from, it’s already hydrated, page one, I’ve done an interaction or something, or maybe I’ve navigated to page two. That’s an entirely different like sort of performance problem.

David Cramer 00:36:20 But that hydration time, that’s like a known thing, that’s kind of like time to interactive, right? But if the problem is in your framework, which a lot of it is, like a lot of the problems today exist because of frameworks, not because the technology’s bad or the framework’s bad, but just because it’s abstracted and it’s really hard to make it work in all these situations. It’s complicated. And again, they have the same problem where it’s like changing nonstop. And so if the problem is the framework is somehow incorrectly re-rendering the page, as an example, and this came up recently for some big technology stack, it’s re-rendering the page. That’s a really bad problem for the customer because it’s probably actually causing a lot of CPU seconds. This is why like your Chrome browser tabs are using so much memory and CPU, right?

David Cramer 00:37:01 How do you fix that? Can you even fix that? Do you just say I don’t know, blame the technology? Is that the solution? Maybe that is right, but how would we even blame the technology like that alone just to identify why it’s happening and you need to know the why, right? Like that is such a hard problem these days. And personally I think the only solution is if the industry sort of almost like standardizes on a way to like on a belief of how this should be optimized and how it should be measured and monitored kind of thing. Because like how errors work is like a standardization effectively. It may not be like a formal like declaration of like this is what an error is, but more or less they always have the same attributes because we’ve all kind of understood that like those are the valuable things, right?

David Cramer 00:37:41 Okay, I’ve got a server rendered application that has client interaction, which is sort of the current generation of the technology. We need to standardize on what like that web request like response lifecycle is, right? And what are the moving targets within there? And it it just, to me, I honestly feel like a lot of what we use every day in technology is like beta software. And I think it’s one of the reasons why we’re constantly always having to up like upgrade and, and refactor and and and shift dependencies and things like that because it is not, it’s very much a prototype, right? It’s a moving target, which I personally do not think is great for the industry because like customers do not care, they do not care that you’re using some technology that like needs a change every few months and things like that. Now it has improved things to be fair. Like web applications are much more like interactive and responsive sometimes. But it is a very hard problem I think for a lot of people in the world.

Jeremy Jung 00:38:33 And when you refer to things feeling like beta, I suppose, are you referring to the frameworks people are using or the libraries they’re using to support their front end development? I’m curious what you’re thinking there.

David Cramer 00:38:47 I think it’s everything. Even like the browser APIs are constantly shifting. That’s gotten a little bit better. But even the idea of like TypeScript and stuff, it’s just like we’re running basically compilers to make all this code work. And so even there, they’re constantly adding features just because they can, which means the behaviors are constantly changing. But like if you look at a real world example, like React is like the most dominant technology. It’s very well designed for managing the DOM. It’s basically just a rendering engine at the end of the day. It’s like, it’s meant to process updates to the DOM. Okay, makes sense. But we’ve all learned that these massive single page applications where you build all your application logic and load it into a bundle is a problem. Like I don’t know how big Sentry’s bundle is, but it’s multiple megs in size and it takes a little while. Like even on fast fiber here in the Bay Area, it takes, you know, several seconds for the UI to load and that’s not ideal.

David Cramer 00:39:34 Like at some point half of us became okay with this. So we’re like, okay, what we need to do is go back, literally just go back 10 years, and we need to render it on the server and then we need some stuff that makes interactions, you know, highly responsive in the UI, or dynamic content in the UI. You know, it’s like bringing back jQuery or something. And so we’re kind of going full circle, but that is actually like very complicated because the way people are trying to do it is like, okay, we wanna have the rendering engine operate the same on the server as on the client, right? So it’s like we just write one path of code that basically, it’s like a template engine to some degree, right? And okay, that makes sense. Like we can all get behind that kind of model.

David Cramer 00:40:12 But that is actually really hard to make work with a lot of people’s software, and I think that’s the challenge. And frameworks have adopted it, right? So for example it’s like React Server Components, which is basically just like, can we render it on the server and then also keep that same interaction in the UI. But the problem is like frameworks take that, they abstract it, and so it’s another layer of complexity on something that is already enormously complex, and then they add their own flavor on it, like their own opinions for maybe the way the world is going. And I will say like personally I find those flavors to be very hard to adapt to, like, things that are tried and true or, importantly in this context, things that we know how to monitor and fix, right? And so I don’t know what the be-all and end-all is, but my thesis on this is you need to treat the UI like a template engine and that’s it.

David Cramer 00:40:56 Remove all like complexity behind it. And so if you think about that, the term I’ve labeled it as, which I did not come up with, I saw this from somebody at some point, it’s like your front end as a service. Like you need to take that application that renders on the server and the front end and it’s just an entirely different application, which is annoying and it just calls your APIs and that’s how it gets the data it needs. So you’re literally just treating it as if it’s like a single page application that can’t connect to your database. But the frameworks have not quite done that and they’re like, no, no, we’ll connect to the database and we’ll do all this stuff but then it doesn’t work because you’ve got like, it works this way on the back end and this way on the front end.

David Cramer 00:41:28 Anyways, again, long-winded way of saying like it’s very complicated. I don’t think the technology can solve it today. I think the technology has to change before these problems can actually genuinely become solvable. And that’s why I think the whole thing is like a beta, it’s very much like a moving target that eventually will get there and it’s definitely had value, but I don’t know that responsiveness for low latency connections is where the value has been created. You know, for like folks with bad internet and say remote Africa or something, like I’m sure the internet is not a very fun place for them to use these days.

Jeremy Jung 00:41:58 I guess one of the things you mentioned is there’s this almost like this split where you have the application running on the server, it has its own set of rules because it, like you said, has access to the database and it can do things that you can’t do in the browser and then you have to sort of run the same application in the browser but it’s not quite the same application because it doesn’t have access to the same things in the browser. So you have this weird disconnect I suppose.

David Cramer 00:42:26 Yeah. And then the challenge is, as a developer, that’s actually complicated for you from the experience point of view. Cuz you have to know somehow, okay, these things, these are actually running on the server and only on the server. And so I think the two biggest technologies that try to do this, or at least do it well enough, or the two that I’ve used, there might be some others, are Next.js and Remix, and they have very different takes on how to do this. But Remix is the one I used most recently so I’ll comment on that. But like there’s a way that you kind of say, well this only runs on, I think, the client as an example, and that helps you a little bit. You’re like, okay, this is only gonna render on the client. I actually can think about that and reason about that.

David Cramer 00:43:05 But then there’s this thing like, okay, sometimes this runs on the server, only this part runs on the server, and it just becomes like, the mental capacity to figure out what’s going on and debug it is like so difficult. And that database problem is like the normal problem, right? Like, I can only query the database on the server because I need secure credentials or something. Okay, I understand that as a developer, but I don’t understand how to make sure the application is doing what I expect it to do and how to fix it if something goes wrong. And that’s why I think I’m a believer in constraints. The only way you make progress is you simplify problems. Like you just give up on solving the complicated thing and you make the problem simpler, right? And so for me, that’s why I’m like, just take the database outta the equation.

David Cramer 00:43:44 We can query APIs from the client, from the server, same security levels. Okay, make it so it can only do that and it has to be run as almost like a UI-only thing. Now that creates complexity cuz you have to run this other service, right? And like I personally do not wanna have to spin up a bunch of containers just to write like a simple web application. But again, I think the problem has not been simplified yet for a lot of folks. Like React did this, to be fair. It made it a lot easier to build UI that was responsive and just updated values when they changed, you know, which was a big deal for a long period of time. But I feel like everything after has not quite reached that area where it’s simple, and even React is hard to debug when it doesn’t do what you want it to do.

David Cramer 00:44:23 So I don’t know, there’s still gaps I guess is what I would say. And hopefully, you know, in the next five years we’ll kind of see this come to completion, because it does feel like it’s getting closer to that compromise. You know, where like we used to have pure server rendered apps with some weird janky JavaScript on top. Now we’ve got this bridge of really complicated JavaScript on top and the server apps are also complicated and it’s a nightmare. And then this newer generation of these frameworks that work for some types of technology but not all. And we’re kind of almost coming full circle to like server rendered, you know, everything. But with like allowing the same level of interactions that we’ve been desiring I guess on the web. So fingers crossed this gets better, but right now I do not see like a clear like, oh it’s definitely there, I can see it coming. I’m like, well, we’re kind of making progress. I don’t love being the beta tester of the whole thing, but we’re kind of getting there. And so, you know, we’ll see.

Jeremy Jung 00:45:17 I guess you, you’ve been saying this whole shifting landscape of how Front End works has made it difficult for Sentry to provide like automatic instrumentation and things like that for mobile apps. Is that a different story? Like is it pretty standardized in terms of how do you instrument an Android app or an iOS app?

David Cramer 00:45:38 Sort of, but also no. Like a good example here is like early days mobile, it’s a native application. You ship a binary, known quantity, right? Or maybe you embedded a web browser, but like that was like a very different thing. Okay. And then they did things where like, okay, more of it’s like embedded web browser type stuff or dynamically rendered content. So that’s now a moving target. The current version of that, and I’m not a mobile dev, so like people have strong opinions on both sides of this fence, but it’s like, okay, do you use like a hybrid framework, say React Native, which allows you to sort of write a JavaScript-ish thing and it runs on both Android and iOS but not really well on either. Or do you write a native app, which is like a known quantity, but then you maintain like two code bases, have two degrees of expertise and stuff.

David Cramer 00:46:18 Flutter’s the same thing. So there’s still that version of complexity that goes on within it. And I think people care less about mobile cuz it impacts people less. Like there’s that whole generation of like, oh, mobile’s the future, everything’s gonna be mobile. That’s not become true. Mobile’s very important. But like we have desktops still. We use web software all the time; half the time on mobile we’re just using the web software at the end of the day, so at least we know that’s a thing. So I think that investment in mobile has died down some. But for some companies, mobile is like their main experience or one of their driving experiences. Like for a company like DoorDash, mobile is as important as web, if not more, right? Because of like the types of customers. Spotify, probably same thing. But, I don’t know, Sentry, we don’t need a mobile app.

David Cramer 00:46:57 Who cares? It’s irrelevant to the problem space, right? And so I think it’s just not quite taken on. And so mobile is still like this secondary citizen at a lot of companies, and I think the evolution of it has been like complicated. And so I think a lot of the problems are known but maybe people care less, or there’s just fewer customers, and so the weight is wildly different. Like JavaScript’s probably like a hundred times the size from an investment point of view for everyone in the world than say mobile applications are, is how I would think about it. And so whether mobile is or isn’t solved is almost irrelevant to the general problem at hand. And I think at the very least, like for mobile applications, there’s like a tool chain where you can debug a lot of stuff that works fairly well and hasn’t changed over the years. Whereas like the web, you have like browser tools, but that’s about it.

Jeremy Jung 00:47:45 Yeah, so I guess with mobile I was initially thinking of native apps, but you’re bringing up that there’s actually people who would make a native app that’s just a web view for a webpage, or there’s React Native or there’s Flutter. So it really isn’t standard how to make a mobile app.

David Cramer 00:48:04 Yeah. And even within those it comes back to like, okay, is it now the same problem where we’re loading in a bunch of JavaScript or downloading a bunch of JavaScript and content remotely and stuff. And like you’ll see this when you install a mobile app and sometimes the binaries are huge, right? Sometimes they’re really small and then you load it up and it’s downloading like several gigs of data and stuff, right? And those are completely different patterns. And even within those like subsets, I’m sure the implementations are wildly different, right? And so that may not be the same as like the runtime kind of changing. But I remember, this must be a decade ago, and I still am a gamer, but early in my career I worked a lot with like games like World of Warcraft and stuff. And I remember when games started launching progressive loading, where it’s like you could download a small chunk of the game and actually start playing, and maybe the textures were lower, like resolution and everything was lower fidelity, and you could only go so far until the game fully installed.

David Cramer 00:48:56 But imagine like if you’re like focused on performance or something like that, measuring it there is completely different than measuring it once, say, everything’s installed, you know? And so I think those often become very complex use cases, and I think that used to be like an extreme edge case that was like such a hyper-specific optimization for like World of Warcraft, which is like one of the biggest games of all time, that it made sense, you know, okay, whatever. They can build their own custom tooling and figure it out from there. And now we’ve taken that degree of complexity and tried to apply it to everything in the world. And it’s like, uh-oh, like nobody has the teams or the talent or the experience to necessarily debug a lot of these complicated problems. Just like Sentry. Like you know, we’re not dealing with React internals. If something’s wrong in the React internals, it’s like somebody might be able to figure it out but it’s gonna take us so much time to figure out what’s going on, versus, oh, we’re rendering some HTML, cool, we understand how it works, it’s a known problem, we can debug it.

David Cramer 00:49:45 Like there’s nothing to even debug most of the time, right? And so, I don’t know, I think the industry has to get to a place where you can reason about the software, where you have the calculator, right? And you don’t have to figure out how the calculator works. You just can trust that it’s gonna work for you.

Jeremy Jung 00:49:57 So kind of shifting over a little bit to Sentry’s internals, you said that Sentry started in, was it 2008 you said?

David Cramer 00:50:07 The open source project was in 2008? Yeah,

Jeremy Jung 00:50:10 The stack that’s used in Sentry has evolved. Like I remember that there was a period where I think you could run it with a pretty minimal stack. Like I think it may have even supported SQLite. And so it was something that people could run pretty easily on their own, but things have obviously changed a lot. And so I wonder if you could speak to sort of the evolution of that process. Like when do you decide like, hey, this thing that I built in 2008 is not gonna cut it and I really need to re-architect what this system is.

David Cramer 00:50:42 Yeah, so I don’t know if that’s actually the reality of why things have changed, that it’s like, oh, this doesn’t work anymore. We’ve definitely introduced complexity, in the sense of like probably the biggest shift for Sentry was like it used to be everything was a SQL database and everything else was kind of optional. I think half of that was maintainable because it was mostly built by me and so I could maintain like an architectural vision that kept it minimal. I had the experience to figure it out and duct tape the right things. So that was one thing. And I think eventually, you know, that doesn’t scale as you’re trying to do more and build more into the product. So there’s some complexity there, but for the most part it can still be a SQL database, whatever. Could it be SQLite forever? Probably not.

David Cramer 00:51:18 But it was gold when you could run SQLite in the development environment cuz it was super fast, right? And so then there was like the evolution of that to where we started adding more complex services. So one example was we needed a key value store, and we needed to store blob data, because storing it in the SQL database was highly inefficient and becoming very expensive. And so we wrote just an abstraction that could store in the database or in, the technology at the time was Riak, R-I-A-K, not to be confused with React. And that was just like, you can almost think of it like Bigtable or naively S3 or something. It’s literally just like get, set, delete were the operations available to us, and that was actually totally fine. So it meant like we could swap it out for more scalable technology, but you could also ignore swapping it out. And at some point you’ve got like a maintenance burden for like handling these different code paths, right?
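The get, set, delete abstraction described here can be sketched as a small interface with swappable backends. This is an illustration only, not Sentry's code: `BlobStore`, `InMemoryStore`, `SqliteStore`, the `blobs` table, and the `roundtrip` helper are hypothetical names, and SQLite stands in for "the SQL database" while the in-memory dictionary stands in for a distributed key-value store such as Riak.

```python
import sqlite3
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Hypothetical minimal interface: only get, set, and delete."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def set(self, key, value): ...

    @abstractmethod
    def delete(self, key): ...


class InMemoryStore(BlobStore):
    """Stand-in for an external key-value service."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class SqliteStore(BlobStore):
    """Same interface backed by a SQL table, for small self-hosted installs."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute("CREATE TABLE IF NOT EXISTS blobs (key TEXT PRIMARY KEY, value BLOB)")

    def get(self, key):
        row = self._db.execute("SELECT value FROM blobs WHERE key = ?", (key,)).fetchone()
        return row[0] if row else None

    def set(self, key, value):
        self._db.execute("INSERT OR REPLACE INTO blobs (key, value) VALUES (?, ?)", (key, value))
        self._db.commit()

    def delete(self, key):
        self._db.execute("DELETE FROM blobs WHERE key = ?", (key,))
        self._db.commit()


def roundtrip(store):
    """Exercise a backend through the shared interface only."""
    store.set("event:1", b"payload")
    before = store.get("event:1")
    store.delete("event:1")
    return before, store.get("event:1")


results = [roundtrip(InMemoryStore()), roundtrip(SqliteStore())]
```

Callers depend only on the three operations, so the backend can be swapped for something more scalable. The cost mentioned in the conversation is also visible: every backend is another code path to maintain and test.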

David Cramer 00:52:02 Because that’s also the era when Sentry was only open source or primarily open source. So primarily you just ran it yourself, you know? Then we started adopting SaaS, we’re starting to build more complexity. Basically the constraints we put into place are actually causing problems now. So we need to change the constraints, which means supporting all these things, it’s hard. So that’s one thing that happens right now. I would not excuse it and say you couldn’t still keep things simple. I think that’s a thing. But that takes a lot of rigor and at some point you’re like, is it worth it? You know? Especially when primarily, especially in this day and age, things are just SaaS services and we’ve all kind of accepted that. But that was one and then there was a big shift where we wanted to consume and make scalable and fast a lot of data, right?

David Cramer 00:52:42 Like there’s already lots of errors. I don’t know how many, like I’m certain Sentry like sucks in hundreds of thousands of events a second on a bad day, on a slow day. So I’m sure it’s a huge amount of data that we get these days and that takes a certain scale, right? And it’s actually very hard to handle small scale and large scale at some point. And obviously we’re optimizing for the SaaS so at some point we’re like okay, we need a better solution for storing the events, something that we’ll be able to like write them in, search them very fast, take a lot of the load off the database and so we adopted ClickHouse. And I think that’s kind of when that’s like the, whatever the saying is, like straw that broke the camel’s back or something, I don’t know the saying, but where everything started to get more complex and that’s when all of a sudden it’s like you need 15 Docker containers and this isn’t even microservices, this isn’t even like the worst version of this.

David Cramer 00:53:27 Hell, you know, it’s just like, and I think it’s unfortunate that that’s like what it’s come to is like you have to run a lot of these complex services and I think this is not even representative of just Sentry’s growth. It’s like the whole industry where we’ve made things a little bit more complicated than they need to be in a lot of cases. And so I think it’s just like it was that shift and then it’s like when we wanted to build performance monitoring, we needed all these new things. And even in the product today, like if you just want to use it for error monitoring, whether you’re a SaaS customer or self-hosted, there’s still a lot going on, right? Okay, maybe that’s fine, maybe that’s a reality. But right now, like this year we’re gonna redo a lot of the product because it’s too complicated, there’s too much going on and it becomes like a distraction.

David Cramer 00:54:06 Like you load up the AWS console or even GCP’s just as bad these days where there’s like 800 different services and you’re like who cares? Like why do I need any of this? And half of ’em are the same thing and that is the worst part is half of them are the same thing or they’re not achieving different goals would be how I think about it. And Sentry is not that, but the biggest companies in our space are that, like if you use Datadog, it’s the same thing and there’s like, go to Datadog’s pricing page and you’re like oh my god, there are so many different things in here. I don’t even know what the difference is between some of these products, right? And so we’re very conscious of this and I think, I don’t know that we can reduce the technology complexity that much if at all these days other than maybe making it a little bit easier to not have to run everything for the open source side of things.

David Cramer 00:54:48 But the product burden is still there when you’re like, how do I even reason about all this stuff that’s going on and why does it even need to happen this way? So I don’t know, it’s like a sad thing to some degree, but at the same time, like we’ve been able to build like such a significant product that works for so many different people and it solves so many different problems and it’s still, you can, not as easily, but you can still run it on your own hardware. And I’m amazed when like a single human runs it on like a VPS or something, which, for newer generation people, is a virtual private server. Okay. Like a Linode or a DigitalOcean, something cheap, because it is a lot of stuff, it’s a lot of services to run and a lot of them are Java and they take a lot of memory. So

Jeremy Jung 00:55:27 What it sounds like is that version of the software that used to be able to run with pretty minimal dependencies, I think, you know, you would have a relational database. I think you had Redis in there and now there’s, like you said, ClickHouse, there’s ZooKeeper, there’s, is there Kafka?

David Cramer 00:55:45 There is Kafka, yeah.

Jeremy Jung 00:55:47 There’s a lot of different pieces. And from what you’re telling me, it’s kind of a function of going from this company that’s like, here’s this open source software that anybody can run themselves, to this is really a SaaS company. This is like we need to focus on being able to run this at scale and the use case of somebody running it themselves or at their own business is just like kind of not that important anymore. At least, you know, that’s not the priority and that’s why you kind of made these decisions because they were suited to building a SaaS platform that can service a lot of customers.

David Cramer 00:56:25 Yeah, that is a huge chunk of it. I think there was also the thing like for example that key value store, at some point we supported Cassandra, we never used Cassandra ourselves, but it was in the code base. We’re not gonna maintain that. It might be broken for all we know. Yeah there’s a CI system and at some point you gotta say like no that’s not happening. And that was actually a good decision. Like less variability is huge for infrastructure, right? And that doesn’t require it to be more complicated, but like if we just say too bad you can’t run Cassandra, run, I don’t know, you know, Bigtable or something that we actually use ourselves. It’s like maybe you don’t want to, but it’s like we’re not making any money and nobody’s maintaining all these like adapters for these different databases, like we supported, like Django does because, so we sort of did.

David Cramer 00:57:07 But like MS SQL, which I assume is still a thing, I’m not sure, no idea if it ever worked correctly, was performant or not. You know, and that’s the same dilemma with SQLite, right? And so I think SQLite was fine for development environments, who cares. For production? No chance does it function for us. But I don’t know. So I think it’s sort of a necessary evil but it doesn’t have to be complicated. So a good example today is we still require something like RabbitMQ and Kafka. Why do we need two things that are effectively brokers? We don’t. It’s just like why would it be a priority to replace like the old system that powers all this or combine it with the new system that powers these new things. It’s like, it’s a lot of engineering work, right? And so there is that compromise of velocity in there that we’re forever facing.

David Cramer 00:57:49 And this doesn’t even begin to touch on like the newer stuff that we’re gonna open source. Like I don’t even think we’ve open sourced our first acquisition, which was the profiling stuff. I think that might still be closed source. We’re gonna open source that. We acquired Codecov recently, we’re gonna open source that. And so those are more moving parts that even add more complexity to the product. And so one of the, and to be fair, this complexity cascades in the development environment and experience and stuff. So it’s worthwhile to fix, but as an example, you should be able to say I’m just working on maybe the errors stuff or the issues product so I don’t need to run all this other stuff. And that same decision can also say, oh the customer only wants to like use this for the errors product so they don’t need to run all this other stuff. Right? Like you can solve both of those goals and I think it just requires intent behind it. It doesn’t necessarily mean it’ll make it that much simpler, but it is a path towards enabling that simplicity.
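The idea of running only what a given product area needs can be sketched as a lookup from features to backing services. Everything in this Python example is an assumption made for illustration: the feature names, the service names, and especially which feature needs which service are hypothetical and do not describe Sentry’s real dependency graph.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical mapping; the real dependencies are not stated in the conversation.
REQUIRED_SERVICES: Dict[str, Set[str]] = {
    "errors": {"sql_database", "redis"},
    "performance": {"sql_database", "redis", "kafka", "clickhouse"},
}


def services_for(features: Iterable[str]) -> List[str]:
    """Return the sorted union of services needed by the enabled features."""
    enabled = set(features)
    unknown = enabled - REQUIRED_SERVICES.keys()
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    needed: Set[str] = set()
    for feature in enabled:
        needed |= REQUIRED_SERVICES[feature]
    return sorted(needed)
```

With a table like this, a developer working only on errors and a self-hosted customer using only errors would both start the same reduced set of services, for example `services_for(["errors"])`.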

Jeremy Jung 00:58:40 It’s also probably aligning with how the business is structured, right? Like I don’t think you offer a self-hosted, commercially supported plan, right? So in that regard, there isn’t really the need from the company side to say like, hey, let’s make these cases where you can deploy it and only install two of the dependencies because you only need that much. It’s like very different from a system or a product that was not really a SaaS or it could be a SaaS but also had an on-premise option. Like you could take I suppose Atlassian’s suite as an example where they have their cloud service, but there’s also a lot of businesses that run it on site. So there’s probably different decisions that they have to make.

David Cramer 00:59:29 Yeah, a hundred percent. And I’m actually curious, like you know, the model from a technology angle is probably similar to GitLab and I’m not a GitLab user, but I wonder if folks like that have the same complexity where they’re like here’s our 15 different products and it’s sort of open source, you know, and ours at least was intentional. We’re like we’re not building on-prem software, we don’t wanna be in that business, we don’t want these support contracts for Cassandra and random We basically don’t wanna have to have an outsourced ops team, you know, for this work. It’s just not interesting. It’s not building a product at the end of the day. And so I think it was like that was probably one of the best decisions we ever made was yes, it’s open source, we’re building an open source service like a SaaS service at the end of the day and that’s the reality of things, right?

David Cramer 01:00:07 And that was such a big deal for us because yeah, like it’s hard to keep the complexity in check no matter what if you get to any reasonable size, right? And big ambitions are one of the most important things you can have in my opinion for building anything. Even if it’s not like, oh we wanna build a big company. You wanna be able to build something that solves like significant scope of problems, right? And there’s more value for Sentry as a whole if we can solve more problems under sort of the same kind of landscape, right? If nothing else, because whether it’s true or not, the reality is like nobody wants to run 20 different tools or whatever it is, right? Like usually what happens is you have to, because like one tool cannot rule them all, but if the problem is the same shape, the technology should be able to be reused to solve that problem. And that’s where we go back to like, well we have errors and we have N+1 issues and they’re kind of the same. So how do we make the product just surface both of those use cases, right? Same technology, same product for the most part. And then inherently here comes the complexity. So

Jeremy Jung 01:01:05 You’ve mentioned a little bit about how there are parts of Sentry that are open source and I know that there’s different licenses applied to different parts of the product. So I wonder if you could explain sort of how those decisions are made and maybe you could explain the business source license as well.

David Cramer 01:01:26 Yeah, so once upon a time Sentry was BSD licensed and that was the server itself. And Sentry’s got a lot of open source stuff or a lot of projects and services and stuff. But the core of the server was like super liberally licensed. Everything else was fairly liberally licensed. Like we never used GPLs, we never used proprietary licenses, we had some closed source stuff, but it’s irrelevant. It was like our data analytics and billing code and stuff like that. But so as all this, it was all just like liberal free, you could use it, no strings, no open core, no nothing, right? And then we had constant annoying conversations with people trying to sell our software. And to be clear, Sentry has always been built by myself and people that we’ve employed at the company. Sometimes people contribute small patches, but that’s wildly different than maintaining or actually developing the software.

David Cramer 01:02:07 And so it’s best to think of Sentry as like, yes it’s open source but it was built by our company, the SDKs and integrations, that’s different but the core service, right? And so we had this thing where people, like companies, were trying to sell it, they didn’t wanna give back financially or they didn’t contribute, both basically. And so we’re like this is annoying and this is literally the decision tree, this is annoying, let’s stop them from doing that so we can no longer think about this problem. And so at that point we said we’re gonna change how we do licensing at Sentry for open source. If it’s part of the service, the core service, it becomes BSL, which I’ll explain in a minute. If it’s anything else that doesn’t run the core service like an SDK or anything that needs to be embedded in customer applications, it must use a liberal license.

David Cramer 01:02:48 And so we did this because I did not want an open core model and open core, if you’re not familiar, usually what it translates to is here’s the version of the open source product that you can use for free, no strings attached and here’s the good version that costs minimum 50 to a hundred thousand dollars a year or something along those lines. It becomes this obnoxious selling paradigm. I hate that model so much as a consumer that I refuse to build it. And if it was not for companies trying to sell our stuff, we would still be BSD or Apache licensed for the server, right? Unfortunately humanity, you can’t take on good faith and definitely in like capitalism, you cannot operate on good faith. So we relicensed to the BSL and BSL, the way I think about it, is eventually open source. And so open source has a lot of different angles.

David Cramer 01:03:27 You can think of it as open source is like free software that I can use just for free or open source is like I can take that software and do whatever I want with it, which is the most extreme version or open source, I can use the code in other ways or something like that, right? There’s all these kind of variations of the thing, but I think the most are like I can do whatever I want with it or it’s free. And I actually think a lot of people mostly care about open source from the it’s-free angle. And so BSL, what it lets us do is say it’s free, you can’t sell it. So we blocked off people from like sort of cannibalizing our ability to fund the development and in three years, which is the time horizon we picked, you can do, I think the longest you can do with BSL is four years.

David Cramer 01:04:08 So that’s the longest duration you can do. Something like every year, it’s like your personal choice. After three years it becomes Apache licensed and a lot of open source advocates will be like, well like three years is a lifetime. I’m like, that’s cool. We’re not here to let other people build businesses outta Sentry. So I could care less about those arguments, right? I want people to be able to run it self-hosted because like I want everybody to be able to use our software. I don’t need people to be able to sell it. I don’t need people to be able to take it and do whatever they want with it. It’s irrelevant like right to the world. But after three years, all that knowledge that we’ve gained in that prior art becomes public domain, right? And so we still achieve almost like this knowledge share and you can, it’s still source, you can view this source, you can like be inspired by it and it’s software.

David Cramer 01:04:48 It’s like it’s only so protected at the end of the day. And so that’s what it is for us, right? So it allows us to keep it open source. So like what is on Sentry.io again, other than some proprietary stuff that’s like billing code, is literally like that mainline branch is what we’re shipping to production at the end of the day. And I mean that’s cool like that, that’s like a lot of the ways in the spirit of open source, but it doesn’t pray for humanity to be like nice humans all the time, right? Like it protects the business and the development and the, you know, tens to hundreds of millions of dollars we poured into like R&D over the years. Right.
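The "eventually open source" timing described above can be written down as a small date calculation. This Python sketch is illustrative only and is not legal guidance: the function names are hypothetical, the three-year default follows the horizon mentioned in the conversation, the four-year cap reflects the Business Source License limit on the change date, and treating the date purely by calendar years (with February 29 mapped to March 1) is an assumption of the sketch.

```python
from datetime import date

MAX_CHANGE_YEARS = 4  # the change date may be at most four years after release


def change_date(released: date, years: int = 3) -> date:
    """Date on which a release converts to the permissive license."""
    if not 1 <= years <= MAX_CHANGE_YEARS:
        raise ValueError("change period must be between 1 and 4 years")
    try:
        return released.replace(year=released.year + years)
    except ValueError:
        # Released on Feb 29 and the target year is not a leap year.
        return date(released.year + years, 3, 1)


def license_on(released: date, today: date, years: int = 3) -> str:
    """Which license applies to a given release on a given day."""
    return "Apache" if today >= change_date(released, years) else "BSL"
```

Under these assumptions, a release published on 2020-01-15 would be under the BSL through 2023-01-14 and Apache licensed from 2023-01-15 onward, with each later release carrying its own change date.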

Jeremy Jung 01:05:20 It sounds like the decision you made is probably very similar to other vendors like Mongo and Elastic. Trying to think of some other examples. I guess Cockroach is another one.

David Cramer 01:05:33 Yeah, Cockroach is another good one. It is very similar. The difference is like there’s very few companies that operate like Sentry in the world that are like a SaaS service that happens to be open source. Most of those are like infrastructure that you probably wanna run yourself. And most of those, their billing, the way they make money is actually still you running that software yourself, right? And so they may not be open core, some of them are, but there’s still a lot of, it’s fundamentally not a cloud service that they’re selling. Like they will try. But like if you’re looking at like Elastic, most people use it because they can run it on prem. Otherwise I’m just gonna use whatever the Google checkboxy button click thing is at the end of the day. But there are a lot of people, even the BSL model, there’s a lot of people use it.

David Cramer 01:06:10 Like Cockroach is one of the first that I recall that used it and that we were inspired by and learned about it from. I could not tell you who else uses it, but it’s become a more popular thing. And whether it’s the right solution or not, there needs to be something that is mainstream that people use that achieves this, that basically is the BSL, and hopefully it’s the BSL or some knockoff because the problem is if you have 15 different flavors of this license, legal review is not fun. And if legal review is not fun, it gets blocked in companies, right? And so Mongo is its own license. Elastic I think is its own license. They’re both like proprietary things that are one-offs that have to go through a legal review. BSL is a known quantity with a clause. So all legal has to do is like review that clause and say, is that clause safe enough for us or not? Right? So it’s a very like simple decision at that point. And so I don’t know, hopefully the industry figures that out instead of constantly bickering about what is and isn’t open source. Cuz that would be a much better spend of people’s time these days.

Jeremy Jung 01:07:04 If you’re building a new product, I mean when you look back at Sentry you started with the BSD license, which is a very permissive license. What are the things you would look at to decide that you’re gonna start this new product with a permissive license versus go straight to the BSL license?

David Cramer 01:07:24 Honestly, I would just not get distracted by that choice early on in a project. I think it depends on what you’re building in all honesty. Like if I were building something that was designed, it was like infrastructure that was like so low friction that you could just spin it up and it worked like very well, Amazon’s gonna sell that. They’re gonna take it, they’re gonna be like, cool, we can spin this up for you too. And then they’re gonna, I mean you’re gonna struggle, right? Like I think now is that actually what would happen? I don’t know. But that’s a risk, right? And I think this was like the kind of risk people like Cockroach were worried about was like Amazon might sell it cuz they saw that, like Elastic, that happened to them. Now Elastic has done fine. Nobody should be sad for Elastic.

David Cramer 01:08:00 They’ve made plenty of money with Amazon selling it on top. Like it doesn’t really matter, right? Like pity the executives there or something, it doesn’t really matter. But for an early stage project it’s so irrelevant. Like if you make your license restrictive, you will get less adoption. So unless your open source project is only gonna be used by a hundred companies in the world, having a restrictive license is only gonna harm you. Right? And so you can do it, but I just wouldn’t. And open core is the same thing. It’s just like, yeah, I don’t know, like this is the glory of SaaS. It’s like you can build an open source thing and not care that it’s open source now because you can just sell a cloud service and people will buy the cloud service no matter what. Right? Like Sentry’s model when we raised our seed funding was it was BSD licensed at the time, super free, super open.

David Cramer 01:08:46 And people were like, well why would anybody pay for it? And I’m like, it doesn’t matter if they don’t pay for Sentry, they won’t pay for anything in the industry because we will build the best thing and it will be so free and good that nobody will have a choice like but to use it, right? And the reality is if you’re a reasonable company, you don’t wanna spend a bunch of engineering hours and time and money on running this random service that you can just outsource. Like that’s true for everything in the world, right? And that was the reality for Sentry. Like I said, there was no like actual risk of the business. It was just like, frankly it was just me off that I had this one company that was constantly trying to sell our And so I’m like, you know what? This is my middle finger to you is the BSL license and I can never think about you again now.

David Cramer 01:09:23 But we were already a successful business when that came along. Like people already used the SaaS service, not everybody, but plenty of people. And so for me it was just like, is your goal that you want lots of customers using your thing? Use a permissive license, otherwise you’re like over optimizing for a non-com. It’s the same thing with like people that they’ll build like random open source stuff and they use like GPL or something. It’s almost like you wanna put your politics on another company. Like you wanna force them to contribute back, who cares? They’re either gonna contribute back or not. Like it’s still open source at the end of the day. Don’t like restrict people’s usage if you’re just building like something on the side. And so like we actually have a policy internally, we won’t license anything as a GPL variant.

David Cramer 01:09:59 Like it’s just not something we think is the right approach to open source, right? Again, you can have an opinion on it doesn’t really matter to us, but we’re like, no, if it’s open source, if it’s actually intended to be like the free open source, it should be free without the constraints. Who cares if somebody monetizes it? Like that’s the beauty of open source is like it helps everybody, you know? It’s almost like a charity to some degree. So I’d say I have a lot of strong opinions about it, but mostly it’s like, don’t distract yourself with things that don’t matter. It’s like a broad statement for anything, let alone the open source licensing.

Jeremy Jung 01:10:28 Yeah, that makes sense. I mean you don’t know that your product is gonna get adoption to begin with, right? So it’s probably you would take a path similar to Sentry where you have something permissive, you see do people care? Is this actually a business? And then once it becomes one, then you can maybe worry about licensing.

David Cramer 01:10:47 Exactly. Yeah. Because you can always change the license later or you can find another thing to augment the software you’ve built that allows you to monetize it, which is super common. Like you see this, like there’s been a lot of stuff that’s raised funding recently. The core is still open source, true free open source. And then they’re like, this is how we’re gonna monetize development via some mechanism, like some cloud services or something. That’s not true for everybody. You can’t just take open source and monetize it. But like if that’s what you wanted to do, why’d you build open source in the first place, right? Like just save yourself a headache. Like I think NGINX, I dunno how well they’ve done as a business, but like what are you gonna monetize in NGINX? It already just works. Oh, you got some like cloud analytics, it’s cool. Like maybe they were useful, I never used them, but like I’m not gonna buy that. But that was just like, oh, NGINX is so successful, could we monetize it? Versus could we build a business that actually is open source? One’s like very intentional versus, you know, capitalistic I would say, or predatory to some degree. I’m not talking about the NGINX folks. I love the product and the business, but you know, it’s a much harder thing to do.

Jeremy Jung 01:11:48 You love the product but don’t need to pay for it.

David Cramer 01:11:51 Exactly. Yeah. I, we probably still use the product, but these days that’s more of a commodity than anything like the web server. So.

Jeremy Jung 01:11:58 So as we wrap up, is there anything else that you thought we should have brought up?

David Cramer 01:12:04 No, I mean I think this was fun. You know, I’m very much an authentic person. I think it’s like I’m not a marketer. I like talking about things that add value to folks’ lives, not just like pitching products and stuff. So I think it’s good to go deeper on the things, which is fun. I don’t know, Sentry’s doing a lot of cool stuff. Hopefully people like it. If you don’t, tell us. If you have issues, tweet at me. We take every piece of feedback very, very seriously. Yeah.

Jeremy Jung 01:12:26 And if people wanna see what you’re up to, check out Sentry, where should they head?

David Cramer 01:12:31 Sentry.io. We’re getsentry on Twitter. Hopefully by the time this airs, Twitter still exists, but on, uh, Sentry.io, GitHub, we’re getsentry/sentry, somewhere. You search for us, you’ll find us. We’re very active on GitHub though, so it’s a good forum if you wanna participate that way.

Jeremy Jung 01:12:46 David, thank you so much for coming on Software Engineering Radio.

David Cramer 01:12:49 Absolutely. And thanks again for having me.

Jeremy Jung 01:12:51 This has been Jeremy Jung for Software Engineering Radio. Thanks for listening. [End of Audio]
