399: Data Munging
CodePen Radio - A podcast by CodePen Blog
Categories:
There was a small problem in our database. Some JSON data we kept in a column would sometimes have a string instead of an integer. Like {"tabSize": "5"} instead of {"tabSize": 5} of the like. Investigation on how that happened was just silly stuff like not calling parseInt on a value as it came off a <select> element in the DOM. This problem never surfaced because our Rails app just papered over it. But we're moving our code to Go in when you parse JSON in Go, the struct type that you parse it out into needs to match those types perfectly, or else it panics. We had found that our Go code was working around this in all sorts of ways that felt sloppy and inconsistent. One way to fix this? Fix any bad data going into the DB, then write a script to fix all the data in the DB. This is exactly the approach I took at first, and it would have absolutely fixed this problem. But Alex took a step back and looked at the problem a bit wider, and we ended up building some tools that helped us solve this problem, and solve future problems related to this. For one, we built a more permission JSON parser that would not panic on something as easy to fix as a string-as-int problem. This worked by way of some Go reflection that could tell what types the data was supposed to be and coerce them if possible. But what should the value fall back to if it's not savable? That was another tool we built to set the default values of Go structs to be potentially other values than what the defaults for their types are. And since this is all in the realm of data validation, we built another tool to validate the data in Go structs against constraints, so we can always keep the data they contain good. Once all these tools were in place, the new script to fix the data was much easier to write. Just call the safe JSON function to fix the data and put it back. And the result is a cleaned up code base and tools we can use for data safety for the long term. Time Jumps Sponsor: Split This podcast is powered by Split. The Feature Management & Experimentation Platform that reimagines software delivery. By attaching insightful data to feature flags, Split frees you to quickly deploy, measure, and learn the impact of every feature you release. So you can safely deliver features up to 50 times faster and exhale. What a Release. Start raising feature flags (and lowering stress). Visit Split.io/CodePen for a free trial.