364: Constructive vs Predicative Data

The Bike Shed - A podcast by thoughtbot - Tuesdays

Stephanie and Joël attended RubyConf Mini, and both spoke there. They discuss takeaways and highlights from the conference. The core idea for this episode is explained in this article: Constructive vs. Predicative Data. This came up recently in a conversation at thoughtbot about designing a database schema and what constraints could be encoded in the schema directly versus needing some kind of trigger or Rails validation to cover it.

This episode is brought to you by Airbrake. Visit Frictionless error monitoring and performance insight for your app stack.

RubyConf Mini
Episode on CFP - The Bike Shed 352: Case Expressions
Podcast panel: The Ruby on Rails Podcast Episode 446: I'm Giving A Talk on Thursday
Slides for FP talk: Functional Programming for Fun and Profit!!
Episode on language: The Bike Shed - 356: The Value of Specialized Vocabulary
Constructive vs. Predicative data
Avoid the Three-state Boolean Problem

Transcript:

JOËL: Hello and welcome to another episode of The Bike Shed, a weekly podcast from your friends at thoughtbot about developing great software. I'm Joël Quenneville. STEPHANIE: And I'm Stephanie Minn. And together, we're here to share a bit of what we've learned along the way. JOËL: So something that's very recent in both of our worlds has been that both you and I, Stephanie, attended RubyConf Mini, and we both spoke there. What are some of your takeaways or highlights from the conference? STEPHANIE: Seeing you in person was definitely a highlight. I really enjoyed that. Because we're working remotely, I don't, you know, get to be in an office with you day to day. And it was really awesome to hang out with you, I think, for the first time as co-hosts of the podcast. And we both, I think, met some people at the conference too that were listeners. And it was really awesome to share that experience with you.
JOËL: I had the interesting experience of several people who told me they recognized me by my voice, which I think is a common thing for podcasters, but as a new host, I was surprised by that. STEPHANIE: Yeah, that's weird. As a podcast listener, too, I definitely know exactly what you're talking about where it's like, oh yeah, I can identify someone by their voice. But to then be that person that people can recognize is pretty weird. I also really enjoyed being an audience member of the podcast panel that you are on at the conference with other podcast folks. It was moderated by Brittany Martin. And yeah, I just thought you represented The Bike Shed really well and spoke for both of us about podcasting in a way that I really appreciated. JOËL: And for any of our listeners who were not able to be there in person, Brittany has published that episode as a podcast, and we will link to it in the show notes. STEPHANIE: Another thing I really liked about RubyConf Mini was the smaller scale. I think it was about 150 or so attendees, which felt very different from traditional Ruby Central conferences with several hundreds of people. I heard a lot from other folks there that they really liked the regional aspect of it, the intimacy of the smaller conference. I think I got more of an opportunity to run into people that I'd met at the conference over the next few days. And there was, yeah, definitely a sense of tighter knit community there, you know, when you meet someone, and then you bump into them on the way into a talk, and then you can ask how their day was going and any highlights that they had. And yeah, I guess I haven't really attended a conference that size before, and so that felt like a very special experience for me. JOËL: I 100% agree. I think the smaller format definitely makes it a little bit more intimate, makes it much easier, I think, to build some of those social connections, to meet with people, and to have some good conversations. 
I think the format of the conference as well favored that. There were, I think, larger breaks between talks that encouraged people to hang out and talk. And, as you said, because it's smaller, you also get to see the same people over the course of a few different breaks instead of being like, oh, I met a stranger on the morning of day one, and then in the afternoon, I met another stranger. And it's just constantly introducing yourself. One thing that was really interesting to me is the experience of being a speaker is very different than just attending. As a speaker, you get to go to the speaker dinner and connect with a lot of the other speakers there. Some of them might be quote, unquote "famous people" that you're not quite comfortable just walking up to and introducing yourself. But in the smaller dinner, you just find yourself sitting next to them and enjoying some food or a drink and getting conversations. It's also much easier to have people come up to you during the conference. Because you're a speaker, people will come and talk to you. So if you tend to be a little bit more introverted, as long as you can get over your fear of being on stage and public speaking, it actually makes social connection interaction much easier to be a speaker. I would recommend to any of our listeners who were wondering how can I get more out of a conference? How can I get better connections, better conversations? Consider being a speaker. STEPHANIE: Yeah, absolutely. We've talked about this before; I think when we chatted about writing our CFPs for this conference that speaking doesn't have to be a really big, scary thing, but everyone has something to say. I think we had mentioned in previous episodes that your talk topic came out of just a discussion that you had internally, and you were like, wow, enumerables are so cool, like, let me dig deeper into them and just share what I learned. So I totally recommend it. 
And this conference was my first in real-life speaking opportunity as well, and that felt super different from my experience last time doing it virtually, you know, talking about how much I love that sense of community all the time. But it really felt true for me this time around, where I could see the audience react to the things I was saying, like, maybe go off the cuff a little bit. And then yeah, at the end, having people come up to me was really awesome to just talk about pairing, which is what I spoke about, and just share our experiences. And they asked what I thought about some things, and it was really cool to just be able to spread that knowledge around. And one thing I noticed you did a lot was come up to speakers after they wrapped up their talks. You were almost always the first person to get up and congratulate them and just get the ball rolling on following up on the things they talked about. Is that something that you really enjoy doing or find particularly valuable as an audience member or speaker? JOËL: Yes, both. I think, as a speaker, it's really validating to have people come up to you after the talk and either just tell you they liked the talk or ask a question. I generally don't like to do just open questions after a talk from the audience because then you get the classic; this is more of a comment than a question or people who will tell you that you had a typo on one of your code slides. Like, none of that is useful to anyone. So, if you're really interested, come talk to me afterwards. And then that actually makes me feel like my talk connected with people, and people were paying attention, people enjoyed it, people were learning. So I try to pay that forward as well for talks that I listened to, go up to the speaker, and tell them one thing that I appreciated about the talk or a thing that I learned, or something that got me excited in their content. STEPHANIE: Yeah, I'm sure that it's very appreciated. 
And it also breaks the awkward silence at the end when the speaker finishes and people aren't sure if it's okay for them to get up and start moving around. Yeah, I thought that was a really good way to kind of just encourage people to start chatting with each other and moving into those break times that we mentioned earlier, those opportunities to socialize. JOËL: Another thing that I think is really fun that you can do at in-person conferences, and I know you were doing it a lot, is going to see the talks of friends and colleagues and sitting in the front row and just being there to cheer them on and encourage them. Again, I think that makes a big difference when you are on stage, and you see these people who are your friends and colleagues there to support you. It gives you that boost of confidence. And when you're there in the audience, it's fun to cheer on somebody else. STEPHANIE: Oh yeah. You gave me a lot of thumbs-ups during my talk, and I really appreciated that. [laughs] So I'm curious if there were any talks that stood out to you that you got to see. JOËL: And I was really inspired by your talk, pair programming. I think there are a lot of things that I can take from that to improve the way I pair. I was also inspired by Aji's talk, Aji Slater, on automating manual tasks that you have to do in an iterative way. That one really hit home because, on my current project, I have been doing a lot of manual things. And I just have random snippets of code, like, some shell script lines or Ruby console lines, that I copy-paste out of Slack conversations because I've shared them with other people who are doing similar work. And I realized that a lot of his advice would apply to the work that I'm doing and how that could really make things better. So that was one of those talks I was listening to, and I was like, oh, you know what? Monday morning, when I go back to my project, this is something that I'm going to start doing. 
This is something I'm going to change in the way I do my day-to-day work. STEPHANIE: Yeah, absolutely. I have so many tasks that I would like to get automated, and think that one day I will magically have more time in my schedule to get to it. But I liked that his talk gave pretty concrete strategies for baking it into your regular, like you said, day-to-day workflow, and that lowers the activation energy to getting them done. And then those things can be iterated on and could eventually become, in an ideal world, a fully-fledged feature that you put together from doing those repetitive tasks. And yeah, they provide a lot of value not just to you but can eventually provide value to your co-workers and then even your users in the future. JOËL: Were there any talks that stood out for you? STEPHANIE: One talk that I really enjoyed was Jenny Shih's about Functional Programming for Fun and Profit. I have attended a lot of functional programming talks within the Ruby realm, at least to try to get a better sense of how it can apply to my work and the languages and paradigms that I use. And honestly, what I liked about it was that it didn't get too in the weeds about functional programming. What she did was provide mental models for understanding the paradigm that I think was a good vehicle for understanding things very generally. And, for me, like, in a talk, it's really hard to pay attention to lines of code and to read code on the fly while people are presenting. For me, that is just not how I like to consume that information. And so she provided themes and, like I said, those mental models, which I know you really like to use a lot too in teaching people new concepts. For me, I didn't fully learn what a monad was, once again, but at least having that repeated exposure to those foundational aspects, I think, will eventually lead me to be able to grok those things a little more comprehensively the next time I see it or whenever I decide to dig deeper. 
JOËL: What was a mental model that was shared that connected with you particularly? STEPHANIE: So one of the main mental models that she shared was thinking about a program in terms of these three dimensions: value, behavior, and time. She had a nice slide that showed the difference between the object-oriented paradigm, where value and behavior are contained by objects, where time is kind of inherently wrapped up in those objects that hold information about the state through values and behavior. Whereas in her functional programming example, those three dimensions were a bit separate. And I found that distinction to be really helpful in separating things that felt very implicit before, but it was nice to see them broken out into very clear concepts in terms of building blocks of a program. JOËL: So it's helpful then when thinking...when you look at code, if you can think about it in those three different dimensions to help think about, am I taking a functional or other approach in this particular dimension when working with this code? STEPHANIE: Yeah, exactly. I think it also gave me more of a vocabulary to describe the pros and cons of each and a lens of thinking about which I might want to choose for the particular problem at hand. JOËL: So you mentioned there's a visual for these three dimensions from the slides. Are those slides publicly available? STEPHANIE: They are. I will link to them in the show notes. JOËL: So all of these talks were recorded. They're not yet available to the public, but I think the plan is to publish them on YouTube sometime in the new year, so that means probably January 2023. And a big shout out to the AV team and everyone who is involved in recording these. STEPHANIE: Yeah, I am definitely looking out for a link to my talk so I can send it to my mom. I also wanted to give a little shout-out to the organizers of RubyConf Mini: Jemma Issroff, Emily Samp, and Andy Croll. JOËL: Woo! 
STEPHANIE: They put on just a really awesome conference, and I feel very grateful that I got a chance to attend with you, Joël. JOËL: It was definitely a delightful experience. STEPHANIE: Delightful. That's a reference to Joël's talk for those of you who are listening. MID-ROLL AD: Debugging errors can be a developer’s worst nightmare...but it doesn’t have to be. Airbrake is an award-winning error monitoring, performance, and deployment tracking tool created by developers for developers that can actually help cut your debugging time in half. So why do developers love Airbrake? It has all of the information that web developers need to monitor their application - including error management, performance insights, and deploy tracking! Airbrake’s debugging tool catches all of your project errors, intelligently groups them, and points you to the issue in the code so you can quickly fix the bug before customers are impacted. In addition to stellar error monitoring, Airbrake’s lightweight APM helps developers to track the performance and availability of their application through metrics like HTTP requests, response times, error occurrences, and user satisfaction. Finally, Airbrake Deploy Tracking helps developers track trends, fix bad deploys, and improve code quality. Since 2008, Airbrake has been a staple in the Ruby community and has grown to cover all major programming languages. Airbrake seamlessly integrates with your favorite apps to include modern features like single sign-on and SDK-based installation. From testing to production, Airbrake notifiers have your back. Your time is valuable, so why waste it combing through logs, waiting for user reports, or retrofitting other tools to monitor your application? You literally have nothing to lose. Head on over to airbrake.io/try/bikeshed to create your FREE developer account today! JOËL: Coming back from the conference, I recently had a really interesting conversation with some other colleagues at thoughtbot. 
We were looking at a database schema for a new application and talking about some of the trade-offs involved in how that schema is structured, so what tables we want to have. Do we want to have indexes? Things like that. And particularly around some of the assumptions or business rules that would come into play. So we're looking at...we'd drawn out this Entity Relationship Diagram (ERD). In it, we're looking at all the tables, and something that comes up immediately is like, oh, it's possible to have some bad data that could show up in these columns. Or it's possible that this relationship could exist where this table has a foreign key on this table, but really, that should never happen in this particular way of working. And so then the question became, how do we try to prevent these things that currently the schema allows but that are not valid in this particular business domain? Do we want to change the schema somehow and make that stricter or find some way to prevent it? Do we want to add some kind of validation that will check some business rules first before inserting or updating a record? I'm curious, have you ever been in a situation like that where you had to balance those two approaches to enforcing business rules on your database? A classic small example of this is a situation where, let's say, you have a users table and you have a name column on there. And you want to ensure that that name must always be present; all users must have names. Do you try to enforce that via the schema with a NOT NULL constraint? Or maybe you try to enforce that with a validation, maybe a presence validation at the Rails level. Or if you're really into SQL, maybe some fancy trigger, but do it in a validation style rather than trying to force this using the schema. And our particular scenario was a little bit more complex than just one column; it was more to do with associations. But I think this sort of problem shows up even in constraints as small as a required field. 
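To make the required-name example concrete, here is a minimal plain-Ruby sketch. It is not ActiveRecord and not a real database: `ToyTable`, `UserForm`, and `ConstraintViolation` are hypothetical stand-ins invented for illustration. The point it tries to show is that a schema-level rule applies to every writer, while an application-level validation only applies to callers that go through it.

```ruby
# Toy in-memory stand-ins, not ActiveRecord or a real database.
class ConstraintViolation < StandardError; end

# A "table" that can enforce NOT NULL on chosen columns for every writer.
class ToyTable
  attr_reader :rows

  def initialize(not_null: [])
    @not_null = not_null
    @rows = []
  end

  def insert(row)
    @not_null.each do |column|
      raise ConstraintViolation, "#{column} may not be NULL" if row[column].nil?
    end
    @rows << row
    row
  end
end

# An application-level check that only runs when callers go through it.
class UserForm
  def initialize(table)
    @table = table
  end

  def save(attrs)
    return false if attrs[:name].to_s.strip.empty?

    @table.insert(attrs)
    true
  end
end

loose_users = ToyTable.new            # no schema constraint
form = UserForm.new(loose_users)
form.save(name: nil)                  # rejected by the validation
loose_users.insert(name: nil)         # a writer that skips the form still stores a nameless row

strict_users = ToyTable.new(not_null: [:name]) # the rule lives in the "schema"
begin
  strict_users.insert(name: nil)
rescue ConstraintViolation => e
  puts "rejected: #{e.message}"
end
```

In this sketch the loose table ends up holding a nameless row even though the validation exists, which is roughly the gap the conversation is weighing.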
STEPHANIE: That's really interesting. I think that, in my experience, when we are spinning up new tables, at that point, we do try to put some intentional thought into what the schema should look like and what requirements we might need to encode at the database level. But things that are more complex might need a little more code, like Ruby code. I have then pushed to an ActiveRecord validation. One thing that I think is important to know is that when you do set those things on the schema, it's harder to change. And so you usually have to feel pretty confident that that's what you want. Otherwise, you'll run into issues later if that does have to change and making changes to whatever existing data you might have. But it's also pretty common to just do your best when you are deciding on a database schema and then having to make adjustments down the line as you know more about your domain. JOËL: This conversation reminds me a little bit of the idea of database normalization. I think that might almost fit as a subset of general tactics of using the schema to ensure your data is more correct. When you are generating new tables, let's say you're creating a greenfield app and you need to create four or five tables; how much emphasis do you put on database normalization when you're initially designing those? STEPHANIE: I think for a greenfield project when you are setting everything up and creating tables for your main domain models, there is an aspect of it that should be considered because you're in this unique position where nothing really is in existence yet. And you do want to try to set yourself up to be successful and hopefully have information about your main use case for this app and can kind of make decisions about the schema then. At least in my experience, that has been part of the conversation, though, to be fair, because it's so early, you do have the opportunity to change things without as much effort or pain. 
But I think it's worth considering when you're just sitting down and working through what those models are going to look like. JOËL: And for our listeners who may not have heard the term normalization before, it's a series of...you can think of them as rules that you apply to your database design to try to avoid data redundancies in your tables. There are different levels of this; they're typically referred to as normal forms. So you'll see things like first normal form, second normal form, third normal form; those are kind of the fancy terms for them. But they generally involve breaking out other tables so that you don't have data redundancies. And in many ways, this is similar to principles such as the single-responsibility principle that we apply to objects when we're designing our objects in an OO system. But this is more at the table level for databases. STEPHANIE: I do think that it is so hard, maybe even impossible, to plan something out, to not have any of those redundancies, to begin with. And I do think sometimes they are a bit inevitable. But I also have had the experience of having to figure out what the heck I'm looking at when I am querying data and see all these things that are duplicated or maybe slightly different. And yeah, I think when you are in that position of starting a greenfield application, it is really interesting to see how you make those decisions about what needs to be enforced and where. Where did you end up landing, or what did you discuss in this conversation with the co-worker? JOËL: I think we went with a bit of a hybrid approach. Some things, we can use the schema to prevent bad data, and then some things either cannot be represented with a schema, or it's possible, but it's really cumbersome and painful. And so, we chose to try to enforce it with a validation. To me, this feels very similar to a problem in typed languages. 
So some communities that use a lot of types try to use those types to only allow data to come through that's in a valid shape. And so you'll hear things like make impossible states impossible or make illegal states unrepresentable. And that works for many things, but it's not always possible to enforce all of your business constraints through a schema. Or sometimes it's possible but just not practical. And so, I think there is a balance of finding when you can use the schema or when it's better to use the validation. STEPHANIE: Yeah, I think my general rule of thumb is, like I mentioned earlier, things I feel really confident about that we want to make sure that we have in our database or in our data for sure. I do lean towards requiring those in a schema, and it also communicates that confidence or communicates that intent that it's something that at one point was decided is important. And so, if a future developer comes in, it would take a lot of work for them to write a migration, to remove some database constraint. Whereas I think sometimes validations at the Rails level are potentially a little more open to change and then even more so if you get to validating on the client side. JOËL: That can get to be a really, like, it's a useful tool, but one that you can really hurt yourself with. If you modify your validations at the Rails level or at the front-end level, but then you don't backfill those changes on your data in the database, then you might have records in your database that if you were to load them into memory and hit save on them again, would refuse to save because they no longer match the validations. And on longer-lived applications, I've seen that happen sometimes where not all rows in the database pass the Rails validations. 
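The drift Joël describes can be sketched in a few lines of plain Ruby. Everything here is hypothetical: `Customer`, the two rules, and the sample rows are made up for illustration, and lambdas stand in for Rails validations. The sketch only shows the audit idea: re-run the current rule over rows that were stored under an older, looser rule.

```ruby
# Hypothetical record shape; keyword_init lets us build records by name.
Customer = Struct.new(:name, :phone, keyword_init: true)

# The rule the app enforced when the oldest rows were written.
original_rule = ->(c) { !c.name.to_s.strip.empty? }

# A stricter rule added later, without touching existing rows.
current_rule = ->(c) { original_rule.call(c) && !c.phone.to_s.strip.empty? }

stored_rows = [
  Customer.new(name: "Ada", phone: nil),          # saved under the original rule
  Customer.new(name: "Grace", phone: "555-0100")  # saved after the rule changed
]

# Audit: which stored rows would now refuse to save again?
stale = stored_rows.reject { |c| current_rule.call(c) }
stale.each { |c| puts "#{c.name} no longer passes validation" }
```

An audit like this is one way to find out how much backfilling a tightened validation would actually require before relying on it.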
STEPHANIE: Yeah, I think I've seen that be a problem either for developers who then have to backfill that data or write some migration to change some of the data to meet the new requirements, or just unexpected bugs on the users who discover something new but like you said, have been there long enough before those things were implemented. JOËL: The more I think of this, I think maybe constraints that are enforced at a validation level might still require changing the data in your database. So if you had a constraint enforced via a schema, you don't have a choice. You have to write some way to migrate that data so that it fits the new schema. You can kind of lie to yourself with validation and not change the historic data, and sometimes that is the case; you want to keep the old data and only prevent new data from being written in the old format. But if you need consistency, then you probably need a data migration regardless of which approach you take. STEPHANIE: Yeah, that definitely sounds like the more robust way to go about it for sure. JOËL: I have an article that I like to reference a lot by Hillel Wayne on Constructive Versus Predicative Data, which is basically looking at these two general approaches to enforcing data correctness and formalizing them a little bit. So do you try to enforce them based on the construction or the shape of the entity that you're creating, be that a database table, an object, a type, something like that? Or do you enforce it via some kind of predicate? So that could be a validation or other similar logic that runs kind of at runtime to enforce your constraints. STEPHANIE: That's interesting. I hadn't heard of those terms before, but I think they provide a lens through which you can look at the problem. Did the article end up suggesting different strategies for solving that problem, or was it more theoretical in different ways to look at it? JOËL: I think the article does two things. 
First, like you said, it gives us the words to talk about those approaches. And having those labels now, I start seeing them everywhere. I see them in databases, I see them in objects, I see them when doing types across a variety of languages. So that's already a huge win for me. I think you and I had done an episode a couple of months back where we talked about the value of having labels to put to ideas. And I think for me reading that article gave me those two labels. And all of a sudden, it really helped to make connections that I wasn't seeing before. The second thing that the article does is, I think, explore some of the limitations that each approach has and when you might want to use one versus another. The constructive approach, so using a schema, is more consistent because you know it is impossible for the program to create data that's in the wrong shape. That being said, not all constraints can be represented in a constructive manner, or it might be possible but really cumbersome. Also, sometimes it's not really invalid data; it's just sort of undesirable data. So you might want a looser schema. And let's say that you're storing some kind of intermediate state or some kind of raw input from another system that you might want to layer validations on top of, but you don't want to reject that data out of your database. You want that sort of incomplete or imperfect data in your system. Something that I find myself doing more and more these days when I create new tables is to really lock down the schema as much as possible. I think that might be contrary to maybe the way a lot of people in the community like to work. Some people might prefer to start with a very loose schema with no constraints and then work towards making things stricter as they explore the domain, and that's kind of the default that Rails has. If you're creating a new table, all columns, for example, are nullable by default. 
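As a rough illustration of the two labels, here is a small plain-Ruby sketch. The order example is an assumption chosen for this note and is not taken from the article; `PredicativeOrder` and `ConstructiveOrder` are hypothetical names. The predicative version stores two fields that can contradict each other and relies on a check; the constructive version stores one field and derives the other, so the contradictory combination cannot be written down.

```ruby
# Predicative: two fields can disagree, so a predicate has to police them.
PredicativeOrder = Struct.new(:shipped, :shipped_at, keyword_init: true) do
  def valid?
    shipped ? !shipped_at.nil? : shipped_at.nil?
  end
end

# Constructive: one field, and "shipped" is derived from it, so the
# contradictory combinations have no representation at all.
ConstructiveOrder = Struct.new(:shipped_at, keyword_init: true) do
  def shipped?
    !shipped_at.nil?
  end
end

# The bad state can exist until someone remembers to call valid?.
contradictory = PredicativeOrder.new(shipped: true, shipped_at: nil)
puts contradictory.valid?

# Here there is nothing to validate: the shape only allows consistent data.
pending = ConstructiveOrder.new(shipped_at: nil)
sent    = ConstructiveOrder.new(shipped_at: Time.now)
puts pending.shipped?
puts sent.shipped?
```

The trade-off mentioned in the conversation still applies: not every rule can be folded into the shape this neatly, and some that can become cumbersome.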
Personally, I will put a null false on every column and every migration that I make unless somebody can make a convincing case otherwise, and even then, I might try to think of is there any possible way that we could avoid that scenario and put that null false. Part of the reason for that is that it is much easier to loosen constraints on existing data than to tighten them afterwards. So if I have a column where no value is allowed to be null, and then later on we decide, you know what? It is okay for some of them to be null, I can change the requirement on that column, and I don't need to make any changes to the existing data. It just works. If the reverse happens, if I have a column that allows a bunch of nulls and then I want to make that column required, now I have to go and find a way to backfill all the empty spots in that column. And that could be a very challenging process. It might even be impossible. There might be some values there that it's just like, the user did not supply them at the time because we didn't ask for them. And now there's nothing we can put in there. So do you put in, like, unknown or not available? Then you have to ask yourself some really difficult questions about your data. STEPHANIE: Yeah, absolutely. I think I agree with you there. Another thing I like to do is provide default values for columns, especially ones where they can't be null, because, like you were saying, that helps me have a better understanding of just what is going on in the database. An issue I have seen come up involves a Boolean column where if a default value of false, for example, if that's what we're going with, is not encoded in the schema, you end up with potentially three values for a Boolean, which would be true, false, and null, and that I think has been -- JOËL: The infamous three-state Boolean. STEPHANIE: Yeah, exactly, the three-state problem, which is just inherently contradictory to what a Boolean is, to begin with. 
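The three-state Boolean can be shown with a toy example in plain Ruby, where `nil` plays the role of SQL NULL. The rows, the `admin` column, and the `insert_with_default` helper are all hypothetical. In SQL, a comparison such as `admin = false` does not match NULL rows, and the equality checks below mimic that.

```ruby
# Toy rows standing in for a table whose boolean column has no default
# and no NOT NULL constraint. nil plays the role of SQL NULL.
rows = [
  { id: 1, admin: true },
  { id: 2, admin: false },
  { id: 3, admin: nil } # inserted without a value
]

admins      = rows.select { |r| r[:admin] == true }
non_admins  = rows.select { |r| r[:admin] == false }
unaccounted = rows - admins - non_admins

puts "admins: #{admins.size}, non-admins: #{non_admins.size}, neither: #{unaccounted.size}"

# With a default applied at insert time, an omitted value becomes false
# instead of nil, so every such row lands in one of the two buckets.
def insert_with_default(table, id:, admin: false)
  table << { id: id, admin: admin }
end

users = []
insert_with_default(users, id: 1, admin: true)
insert_with_default(users, id: 2)
```

Note that a default only covers omitted values; in a real schema it is the NOT NULL constraint that rejects an explicit NULL, which is why the two are usually paired.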
And I've definitely run into issues with that where you have to decide, or figure out, or write code to determine is null false? Is that what we mean here? It's not clear. But if you, like you said, locked it down at the beginning, provided those default values, that puts in those guardrails to prevent things from getting out of hand. JOËL: It also makes it easier for users of your database, application, whatever to interact with your code. I've run into this a lot when working with GraphQL APIs. And the default in many GraphQL server implementations is to make all fields nullable by default. When you build your schema, you have to add some extra things there to say, "This field is non-nullable," which means that a client that's now consuming it, anytime they deal with the data they need to check, is it present or not? You can't have the confidence that that data is there. And so it can force a lot of extra checks on the client. Or I guess you could just take it on faith and hope nothing breaks. STEPHANIE: Yeah, it's funny you mention that because I definitely think there's like spheres of impact. So as a developer, you maybe start having to write code that checks those kinds of things, like if it's null or not in your code. Then that can even extend to, like you said, your users or consumers of the API, who then have to contend with data that they have no control over. And I've been there too, and that can be frustrating as well. JOËL: We've talked a lot about data correctness and different ways to achieve it, different strategies. Why is this something that we care so much about? STEPHANIE: I think data correctness is really important from a developer experience perspective. And it's way easier to fix a bug in your code than it is to wrangle a lot of accumulated bad data. JOËL: Yeah, sometimes bad data is not fixable at all, and those are situations where you have a really bad day as a developer. STEPHANIE: Agreed. JOËL: Well, on that note, shall we wrap up? 
STEPHANIE: Let's wrap up. Show notes for this episode can be found at bikeshed.fm. JOËL: This show has been produced and edited by Mandy Moore. STEPHANIE: If you enjoyed listening, one really easy way to support the show is to leave us a quick rating or even a review in iTunes. It really helps other folks find the show. JOËL: If you have any feedback for this or any of our other episodes, you can reach us @_bikeshed, or you can reach me @joelquen on Twitter. STEPHANIE: Or reach both of us at [email protected] via email. JOËL: Thanks so much for listening to The Bike Shed, and we'll see you next week. ALL: Byeeeeeeee!!!!!!! ANNOUNCER: This podcast was brought to you by thoughtbot. thoughtbot is your expert design and development partner. Let's make your product and team a success.

Sponsored By: Airbrake: Deploy fearlessly and fix bugs faster with Airbrake Error & Performance Monitoring. Airbrake notifiers are available for all major programming languages and frameworks, and install in minutes, with an open-source SDK-based install and near-zero technical debt. Spend less time tracking down bugs and more time developing. Visit Frictionless error monitoring and performance insight for your app stack.