Podcast

How to Build the Right Data Workflow with Blake Burch (Shipyard)


There’s a lot of monotonous development work that goes into building solutions. 

Blake Burch wants to relieve some of that burden for developers and help them waste less time on unnecessary tasks. 

Blake co-founded Shipyard, a data orchestration platform, to give data engineers the tools to quickly launch, monitor, and share resilient data workflows. 

He talked to us about the importance of inter-tool communication, how Shipyard is working to improve the developer experience, and why you shouldn’t focus on your infrastructure.

Show Topics

  • Get rid of monotonous development work
  • Run compatible code
  • Be prepared to meet your client’s needs
  • Don’t separate out your tools
  • Let your tools talk to each other
  • Improve the developer experience 
  • Deal with new tools
  • Consolidate tools
  • Prevent data leaks
  • Focus on the problem you want to solve

Show Links 

Listen to the podcast

Watch a video clip

Key Takeaways

14:06 – Get rid of monotonous development work

Shipyard’s goal is to build low-code tools that minimize the amount of development work you need to build a solution.

“There’s a lot of monotonous development work that goes into building some of these solutions. There’s tons of hours wasted on just being able to connect to a different service’s API, to get some function running, or to be able to get your data downloaded from a particular database, whether it’s something like Snowflake or BigQuery. Out of the gate, we build out a lot of low-code blueprints, which are doing very specific actions against a vendor. So for example, if you want to work with Snowflake, we have blueprints to be able to download the data using a SQL query as a CSV. You can also upload a CSV to a table with it. You can execute a query directly against the database as well to do any data management-type queries. So that’s one example. For another service, something like Google Sheets, you can upload a CSV to the sheet. You can download a sheet as the CSV. You can clear the entire sheet to try and keep things up to date in different services.”
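The two blueprint actions Blake describes (query to CSV, CSV to table) can be sketched in a few lines of Python. This is only an illustration, not Shipyard's implementation: it uses the standard library's sqlite3 module as a stand-in for Snowflake so it runs anywhere, and the function names and the orders tables are hypothetical.

```python
import csv
import io
import sqlite3


def query_to_csv(conn, sql):
    """Run a SQL query and return the result as CSV text with a header row."""
    cursor = conn.execute(sql)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()


def csv_to_table(conn, csv_text, table):
    """Load CSV text (header row first) into an existing table; return rows inserted."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    # Table and column names are interpolated, so they must come from trusted input.
    conn.executemany(f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", data)
    conn.commit()
    return len(data)


# Hypothetical demo data: sqlite3 in memory stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.execute("CREATE TABLE orders_copy (id INTEGER, customer TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "acme"), (2, "globex")])

exported = query_to_csv(conn, "SELECT id, customer FROM orders ORDER BY id")
loaded = csv_to_table(conn, exported, "orders_copy")
```

Against a real warehouse, the connection object would come from that vendor's own driver, and the connection handling would differ.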

16:12 – Run compatible code

Shipyard lets you combine low-code blueprints with your own scripts to build the exact program you need.

“For us, it’s very much plug-and-play of being able to download data from one service, send it to another service. But as a practitioner myself, I realize that can’t do everything for you. That gives you 80% of the way there. So while we have these blueprints, we make sure that you can run your own code as well. So all those marketing solutions that I talked to you about, all those were highly custom to our naming structures and the way that a client’s feed was actually formatted. So it wasn’t something that we were able to always reuse all the time. So it might be something where we were creating custom Python scripts. And the beauty of what you have on Shipyard is that you can combine the low-code blueprints with your own scripts. And if you have a script that you need to repeat continuously, you can turn that into a low-code blueprint as well so that someone else can use it without having to know a lick of code. That’s really how things are structured on our side. We don’t take a marketing-specific approach, even though that’s what our background was. It’s very much about how do you take these generic things that you might want to build with your data and break them down into their core components that you can string together to accomplish some sort of larger solution that you envision for your organization.”

18:22 – Be prepared to meet your client’s needs

People often come to Shipyard looking for a simple A to B solution, but Shipyard’s goal is to help them orchestrate all their data.

“When it comes to data orchestration, we find oftentimes that people that come through our doors aren’t necessarily looking for data orchestration, they are looking for, ‘How do I get service A connected to service B?’ But if you are just an A to B solution, that’s all you’re ever going to be able to do for the client. Our goal is we’ll bring them in the door with A to B, but then they can connect B to C, and C to D, and D to E. And all of a sudden they have a larger workflow that actually has every touchpoint of the data connected together, so if something breaks they know about it immediately and C, D, and E don’t actually run, someone gets an alert, they know that something broke along the way and they have a little bit more sustainability and reliability in how their data is deployed.”
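The behavior described here (a failure in one step stops everything downstream and raises an alert) can be sketched as a small sequential runner. This is a minimal illustration under assumed names, not how Shipyard works internally: run_workflow, the step names, and the alert callback are all hypothetical.

```python
def run_workflow(steps, alert):
    """Run steps in order; on the first failure, alert and skip the rest.

    `steps` is a list of (name, callable) pairs. Returns a dict of name -> status.
    """
    statuses = {}
    failed = False
    for name, step in steps:
        if failed:
            # An upstream step broke, so downstream steps never run.
            statuses[name] = "skipped"
            continue
        try:
            step()
            statuses[name] = "success"
        except Exception as exc:
            statuses[name] = "failed"
            failed = True
            alert(f"{name} failed: {exc}")
    return statuses


def broken_step():
    """Hypothetical step that simulates a failed sync."""
    raise RuntimeError("source returned no rows")


alerts = []
steps = [
    ("A_to_B", lambda: None),
    ("B_to_C", broken_step),
    ("C_to_D", lambda: None),
    ("D_to_E", lambda: None),
]
result = run_workflow(steps, alerts.append)
```

Here the alert callback simply appends to a list; in practice it might post to a chat channel or send an email.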

19:32 – Don’t separate out your tools

With low-code tools you often run into limitations. Shipyard wants to write low-code tools that integrate easily with any company’s in-house code so those limitations no longer apply.

“Inevitably with most low-code tools, you run into some sort of issue where it can’t do quite what I need it to do. And as an engineer, what that means is, ‘Okay, well, I still have to get this project done. So now I have to write a script that runs on its own to subsidize what the low-code tool couldn’t do.’ So, our theory is why not bring them together? You could write your own script to ingest data, or you could use someone like Fivetran or Stitch. We don’t care, but you can actively connect that to running your dbt jobs. They could run on dbt Cloud. You could be running it on dbt Core, which just uses Python. It’s really more of we’ll get you most of the way there with an A to B, but inevitably when you run into that situation, build it all together rather than having to deal with some situation that forces you to maintain scripts that live separately from other processes.”

22:38 – Let your tools talk to each other

If your tools aren’t talking to each other, you’re going to have problems. Let your tools talk to each other so you can address problems quickly.

“When you have all these tools and they aren’t talking to each other, that’s how you get into the issue where you have 10 tools running scheduled systems, and you don’t know what ran when, which potentially took longer that screwed up some other tool that delivered bad data that the business user found and told your team about a week later. Now they’re mad because you didn’t find it. That’s what we’re trying to solve for. You have all these tools. Link them all together so you know what broke when, you can deal with it immediately, you can send proactive alerts so that people internally know, ‘Hey, we’re aware this data broke and it affects these things.’ That’s our play in this space. As there’s more tools, let’s connect them all together and make it super seamless to be able to see end-to-end how your A to B solutions are connected all together.”

23:34 – Improve the developer experience 

Shipyard wants anyone who uses its platform to have a great experience, regardless of their needs or level of coding experience.

“Number two focus for us is just improving the developer experience in general. And when I say developer, it could be citizen developer, anyone that’s actively using the platform. Our goal is to try and make sure that we have the right power features in place that you can do really unique type workflow. So one thing we just released was webhook parameters, which allows you to basically take data from another tool and react to it. So if someone clicks a button in Slack, being able to specifically update the message where the button was clicked or being able to manage your project management services. So when someone kicks off a project and puts it into a new field that they’re finished with it, it might be able to knock it over to some other team so that they can do QA or something like that. But the possibilities become endless with stuff like that. But we’re trying to make sure we have those power features, but they’re really simple to use and that the interface remains snappy, friendly, and that you’re able to develop within the platform itself at all times.”
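The idea of reacting to data sent by another tool can be sketched as a small handler that reads a webhook body and decides what to do next. This is a rough sketch, not Shipyard's webhook parameters feature: the payload fields ("task_id", "status"), the "qa" team name, and the handle_webhook function are all assumptions made for illustration.

```python
import json


def handle_webhook(body):
    """Decide what to do with an incoming webhook body (a JSON string).

    The payload fields ("task_id", "status") are hypothetical.
    """
    payload = json.loads(body)
    if payload.get("status") == "finished":
        # A finished task is handed to another team for review.
        return {"action": "assign", "task_id": payload["task_id"], "team": "qa"}
    return {"action": "ignore", "task_id": payload.get("task_id")}


finished = handle_webhook('{"task_id": 42, "status": "finished"}')
in_progress = handle_webhook('{"task_id": 43, "status": "in_progress"}')
```

The returned dictionary is a placeholder for whatever the next workflow step would consume.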

25:59 – Deal with new tools

It’s harder to create blueprints for new tools than it is for old tools because new tools don’t have as many use cases.

“I think the hardest part about it from the partnership perspective is that every tool is brand new and nobody knows exactly what to do with it and how to connect them together. So the established players, it’s really easy to build out blueprints for because people have a lot of existing use cases. They’re tried and true technologies that people have been using for a while. For some of the younger technologies, it’s a little bit scrappier to try to figure out how do you make something that’s useful that you could integrate into someone else’s workflow who is actively using this that I might be able to bring in, because in a few instances it’s not necessarily our customers that are actively asking for some of the integrations, although that definitely happens, it’s us trying to proactively say, ‘Hey, how can we help you, this new tool, be able to help your customers connect what they’re doing in your tool to every other tool in the data stack?’ It’s a win-win situation there, but coming up with the right use cases, figuring out how you effectively articulate that, how you build something that is inherently useful for the less technical users of those platforms. That’s really where the trickiness lies.”

28:36 – Consolidate tools

There are a lot of great tools in the developer space, but Blake thinks in the future many tools will be consolidated.

“We’re seeing a fragmentation in the space, because every data practitioner has said, ‘Oh, I dealt with this very specific problem in the data space. I’m going to build a tool around it.’ And there are very powerful tools that excel at one very specific thing. But you end up in the situation where we were talking about earlier, where now you have five to tens of data tools to manage internally. I don’t know how long that’s going to last before we start seeing an inevitable consolidation in terms of what these tools can do. A really good example is that there’s the idea of these reverse ETL tools. If you’ve ever heard of them. An ETL tool gets data from a SaaS tool into your database, reverse ETL gets it out of your database and into the SaaS tool. But that’s just hanging out on its own right now, and I really feel like these ingestion providers are probably going to provide a holistic view that allows you to both load data in and out of the data warehouse. That’s a natural consolidation that I see happening over the next two to three years.”

43:54 – Prevent data leaks

If you want to keep your personal data safe, give each vendor details that are unique to it, such as a different email address for every service, so that leaked or sold data can’t be matched up across sources.

“A lot of them boil down to just trying to make sure that each vendor that you provide your data to, that you try and keep it as unique as possible, because the problem is when there’s these data leaks or if people are actively payment processors or taking your credit card data. I remember back in the day, I don’t know if this vendor still exists, but Avast, which was a free anti-virus software, was selling all of their customers’ data about what sites they were visiting and stuff. I don’t remember what the name of the company was that they sold it to. But that was one of those things that creeped me out, or things like smart TVs taking pictures every five seconds and sending it to a server so they could map up what shows people were watching. That was happening like five years ago. It’s probably worse now. But all those things, they ultimately get sold on a brokerage to where a marketing agency or a company might buy it and try and map it up to their data. The best way that you can protect yourself is trying to make sure that that data can’t be mapped up. So certain things like emails, lots of companies when they’re doing advertisements are trying to upload their list of customer emails and matching it to Google or Facebook or anyone else to better target those individuals. If you have a unique email for every service that you use, it’s something where they can’t map that up.”
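One way to get a unique email for every service is to derive a stable alias per service from a private secret. The sketch below is an assumption-laden illustration rather than anything Blake describes in detail: it presumes you control a domain that accepts mail for arbitrary addresses, and alias_for, the secret, and mail.example.com are hypothetical.

```python
import hashlib
import hmac


def alias_for(service, secret, domain):
    """Derive a stable, hard-to-guess email alias for one service.

    Assumes `domain` delivers mail for any local part (a catch-all setup).
    """
    name = service.lower()
    digest = hmac.new(secret.encode(), name.encode(), hashlib.sha256).hexdigest()
    # A short keyed tag keeps the alias unique without exposing a shared address.
    return f"{name}-{digest[:8]}@{domain}"


shop_alias = alias_for("Shop", "my-private-secret", "mail.example.com")
bank_alias = alias_for("Bank", "my-private-secret", "mail.example.com")
```

Because the alias is derived rather than stored, the same service always maps to the same address, while two services never share one.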

54:18 – Focus on the problem you want to solve

There’s too much focus on tooling and infrastructure and not enough focus on solving the problem at hand.

“There’s too much focus on getting the infrastructure right and getting the right tooling and making sure that everything is clean, and not enough focus on what is the business problem that you’re actively trying to solve. I don’t need all my data to be set up in it, for it all to be in a clean state. I have a specific team that needs a specific piece of data so they can do a specific action. That’s what I should be focusing on. And I think a lot of people get lost in making the process right rather than actually figuring out how do I make this viable? How do I drive an ROI from this data? And how do I make sure that what I’m doing is actually worthwhile for the business?”



David Khim

David is co-founder and CEO of Omniscient Digital. He previously served as head of growth at People.ai and Fishtown Analytics, and before that was growth product manager at HubSpot where he worked on new user acquisition initiatives to scale the product-led go-to-market.