
How to Deal With a Seismic Change in Strategy

This is a quick story of how I led my department through a massive, rapid, and uncomfortable change in direction. View on Plato

Problem

My department was organized by geography and by the stages of the transportation journey. We had a team in North America focused on the North American transportation network and a team in Berlin, Germany, focused on the EU and UK transportation network. Within these two, we had teams working on the first-, middle-, and last-mile parts of the journey. The teams were relatively independent, and each looked at its own section of the problem space. Coming into 2020, our leadership team decided to eliminate the organizational structure by geography and miles: there would be one transportation team for both North America and Europe, and instead of being organized by miles, teams would be organized by function (for example, Planning, Drivers, or Warehousing Technology). Not only did the leadership team announce this seismic change in strategy, but they also introduced an initiative to revamp our entire transportation network by the peak holiday season of 2020. My fellow leaders and I were left to figure out how we could respond to all these unfolding challenges.
 

Actions taken

This significant strategic change kicked off a series of reorgs that had to be executed at full tilt.
 

Reorg One

There was a team focused on middle-mile warehousing technology for North America and another focused on last-mile warehousing technology, also for North America. Furthermore, there was a team in Europe responsible for warehousing technology, and we had to figure out how to consolidate all of these. It was decided that the team in Europe would be responsible for global warehousing technology, which left us with an extra 12 engineers here in North America. I had to find meaningful work for them while also managing through the transition.
 

Reorg Two

The second reorg was a bit more subtle since we were not restructuring teams but changing their focus and ownership. The teams had to change their roadmaps to incorporate the new global strategy, which was disruptive and felt disempowering: the effort they had put into developing those roadmaps was suddenly overwritten by leadership.
 

A month into the reorgs, the Covid-19 pandemic hit, and all of a sudden we were thrust into working remotely without a solid plan for executing against the fast-approaching deadline. Under more normal circumstances, we would all have come together for design sprints to dissect what we were trying to achieve and how. Because we were thrown into remote work unexpectedly, this turned into a communication challenge as well.
 

Finally, the plan itself was rather aggressive. It meant rebuilding everything about our system, top to bottom, and attempting to roll that out within nine months.
 

Reorgs — What to do

We had to be fully transparent with the teams involved. I brought the team together and shared with them what was happening. It was important to impart why this was happening, why it mattered, and why it was the right thing to do for both the business and the team, and to help them buy into that.
 

We also had to make sure that we could bridge the gap between where we were and where we wanted to get. That meant doing two things. First, we had to ensure the transition of ownership of what the team was building to the team in Europe. That included compiling a list of everything this team owned and ensuring a smooth transfer of ownership to the other team while maintaining continuity of service for our users.
 

We also had to continue running the legacy system and the new system in parallel for at least some time. Therefore, we needed to build some type of bridge solution that would allow us to operate both of them side by side. To do so, I repurposed the warehousing technology team to build this bridge-layer solution, which I called the transportation interface. I wrote up a team charter, set the team's priorities, and got them bought into it. That went well; the team was excited about the new opportunities, but I had to shrink the team a bit. I found other teams within my department that had a strong need for engineers, and I was able to identify engineers I could move over without sacrificing what we needed to do on the interface.
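To make the idea concrete, here is a minimal sketch of what such a bridge layer could look like: a thin interface that routes each call to either the legacy or the new warehousing system depending on which one owns the facility. The class names, methods, and facility IDs are illustrative assumptions, not the actual transportation interface we built.

```python
# Minimal sketch of a bridge layer that routes calls to either the legacy or the
# new warehousing system on a per-facility basis. All names are illustrative.

class LegacyWarehouseClient:
    def create_shipment(self, facility_id: str, order_id: str) -> str:
        return f"legacy-shipment-for-{order_id}"

class NewWarehouseClient:
    def create_shipment(self, facility_id: str, order_id: str) -> str:
        return f"new-shipment-for-{order_id}"

class TransportationInterface:
    """Routes each request to the system that currently owns the facility."""

    def __init__(self, migrated_facilities: set[str]):
        self._migrated = migrated_facilities  # facilities already on the new stack
        self._legacy = LegacyWarehouseClient()
        self._new = NewWarehouseClient()

    def create_shipment(self, facility_id: str, order_id: str) -> str:
        backend = self._new if facility_id in self._migrated else self._legacy
        return backend.create_shipment(facility_id, order_id)

# Usage: start with a single pilot facility on the new system, expand over time.
interface = TransportationInterface(migrated_facilities={"PILOT-01"})
print(interface.create_shipment("PILOT-01", "order-123"))   # handled by the new system
print(interface.create_shipment("LEGACY-07", "order-456"))  # still on the legacy system
```

Routing on a per-facility basis is one way such a layer can support cutting over a single pilot warehouse while the rest of the network stays on the legacy path.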
 

As for the other reorg, revisiting roadmaps was done on a case-by-case basis. I would sit with each team and discuss how these changes impacted their existing roadmap, how we could leverage what they had already done, how they should proceed with planning, and so on. Since they had already completed most of their planning, we looked at their objectives with the intent of making them global rather than focused on North America. That required a lot of conversations with our European partners and learning about things they had previously been solely responsible for and knowledgeable about. We established close relationships with people within our company we had never worked closely with before and came to better understand their problem space. This intensive exchange allowed us to craft roadmaps that encompassed our global technology and to develop a holistic approach to the problem.
 

Changing the strategy
Considering the significance of the announced change, we made sure to execute without boiling the ocean. We had to rebuild our entire technology stack and re-architect everything in an exceedingly short amount of time, so we had to be iterative and break the work into small pieces. The bridge solution allowed us to run the legacy and new systems in parallel and to launch at a single warehouse first. Launching the new warehouse technology in our pilot facility helped us learn what we needed to learn before bringing it to the rest of the network.
 

That, too, enabled us to be methodical and roll it out over time since we would have the bridge solution in place to facilitate the change. We also had to bring together senior engineers and leads from all groups to look at the architecture and the architectural vision as a whole and provide feedback. That was something we tried to do early on and then continuously as our vision evolved.
 

Lessons learned

  • Leverage positive momentum. While we were trying to figure out how to execute the new strategy, we kept working and adapting to the change in the roadmap. We found some vital pieces that we could include in the new roadmap. For example, one of my teams was focused on building a platform for managing truckloads and we were able to use that platform and make it central to some of our new system designs. That gave us a real leg up and allowed us to build on top of the already existing framework.
  • Simplify the scope wherever possible. We wouldn’t have been able to deliver if we hadn’t cut some things. Delivering scaled-down versions of the final solution allowed us to test our hypotheses and ship something tangible instead of planning two years ahead.
  • The Covid-19 pandemic forced us to transition to remote work suddenly, and we were not entirely prepared for that. We should have cleared everyone’s schedule, done a design sprint, and gotten everybody together upfront to align before going forward. Instead, we ended up having ad hoc 30-, 60-, and 90-minute calls over several months to get aligned and figure out how all the pieces would fit together. In hindsight, I wish I had spent more time up front on the design sprint.
  • Make sure that the team feels bought into the changes and understands the Why. By no means should the team feel ordered around for no good reason.

Reversing the Tide of Application Support

This is a quick story of recognizing and attacking a problem of ballooning application support in my department. View on Plato

Problem

We had a process on my team for dealing with critical user issues. Support would enter a ticket describing the problem, and we had a rotation of engineers responsible for addressing these problems as they came in.
 

When we started, we had one engineer on a part-time basis, and a few years later, we had three full-time engineers dealing with support issues with no end in sight to the proliferation of new issues.
 

I identified three main problems:

  • The high cost of Support.
    These three engineers were doing something that was not valuable to the business.
  • Employee satisfaction.
    Being on the rotation made engineers reasonably unhappy. I suspected there was a retention risk every time an engineer was on a critical rotation.
  • Negative customer outcome.
    I assumed that these issues were tied to lost items or real delays in customer deliveries.
     

Actions taken

Before I could delve into solving burdensome technical support, I needed validation for the three problems mentioned above.
 

Negative customer outcomes
I sampled the orders that had issues reported on them over several months and compared them against other orders on cancellation rate, the number of days to deliver, and a couple of other metrics. I found that the delivery time was four days longer and the cancellation rate was around 50 percent higher on the orders with issues.
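The comparison itself was simple. Here is a rough sketch of the shape of that analysis, with made-up field names and numbers purely for illustration:

```python
# Sketch: split sampled orders by whether a support issue was reported, then
# compare delivery time and cancellation rate. Data is invented for illustration.
from statistics import mean

orders = [
    {"order_id": 1, "has_issue": True,  "days_to_deliver": 9, "cancelled": True},
    {"order_id": 2, "has_issue": True,  "days_to_deliver": 8, "cancelled": False},
    {"order_id": 3, "has_issue": False, "days_to_deliver": 4, "cancelled": False},
    {"order_id": 4, "has_issue": False, "days_to_deliver": 5, "cancelled": False},
]

def summarize(group):
    return {
        "avg_days_to_deliver": mean(o["days_to_deliver"] for o in group),
        "cancellation_rate": mean(1 if o["cancelled"] else 0 for o in group),
    }

with_issues = summarize([o for o in orders if o["has_issue"]])
without_issues = summarize([o for o in orders if not o["has_issue"]])
print("with issues:   ", with_issues)
print("without issues:", without_issues)
```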
 

Employee satisfaction

I sent out a survey to all the developers on my team, asking them to gauge their job satisfaction while on criticals. I did some back-of-the-napkin math on attrition risk, and it suggested that every critical rotation entailed a 10 percent increase in the risk of attrition.
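To make the scale of that estimate concrete, here is one way such back-of-the-napkin math can be run forward: given the estimated 10 percent bump per rotation, see what happens to an assumed baseline attrition rate over a year of rotations. The baseline rate and rotation count below are illustrative assumptions, not the real figures.

```python
# Back-of-the-napkin attrition math. Only the 10 percent per-rotation bump comes
# from the estimate above; the other numbers are assumptions for illustration.
baseline_annual_attrition = 0.15   # assumed baseline risk per engineer per year
per_rotation_bump = 0.10           # each critical rotation adds ~10% relative risk
rotations_per_year = 4             # assumed rotations per engineer per year

risk = baseline_annual_attrition * (1 + per_rotation_bump) ** rotations_per_year
print(f"estimated annual attrition risk: {baseline_annual_attrition:.0%} -> {risk:.0%}")
```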
 

The high cost of Support

This assumption was the most tangible and thus the most straightforward to prove empirically. I was able to show the exact amount of money we were spending by taking three full-time engineers at the median salary and adding the opportunity cost of the work they were not doing instead.
 

I put together a one-page document outlining my findings and brought it to my Director of Engineering. It took little effort to convince him, and he encouraged me to take my old team and go solve this problem the next quarter. I was excited to share the news with my team, to whom I explained what I thought would be the best way to go after this problem.
 

First off, we needed to classify the issues coming in. I assumed that, following the Pareto principle, 80 percent of all issues were caused by 20 percent of bugs. This turned out to be true. We started by having someone manually classify the issues, which allowed us to figure out the root causes that accounted for the most issues. We were able to put together a roadmap to address those root causes. But I didn’t feel that would be enough, because we would end up back in the same situation unless we addressed the process itself.
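Once the manual classification exists, checking the Pareto assumption is just a matter of tallying. Here is a small sketch of that tally; the root-cause names are invented for illustration.

```python
# Sketch: count manually classified issues by root cause and show the cumulative
# share of volume the top causes account for. Category names are illustrative.
from collections import Counter

classified_issues = [
    "label_misprint", "label_misprint", "inventory_sync", "label_misprint",
    "inventory_sync", "address_validation", "label_misprint", "driver_app_crash",
]

counts = Counter(classified_issues)
total = sum(counts.values())

cumulative = 0
for cause, count in counts.most_common():
    cumulative += count
    print(f"{cause:20s} {count:3d}  cumulative share: {cumulative / total:.0%}")
```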
 

I couldn’t help but notice an overall lack of accountability. Engineers within the department were on the rotation three to four times a year, but not in succession, and were not incentivized to drive changes in the underlying products that were causing the issues in the first place.
 

I envisioned a centralized triage that would take incoming tickets and route them to the team responsible for the products causing the issues, rather than having all the departments pull people in for a standard rotation. I came up with an automated ticket routing system: users would fill out information about their location, the problem they were dealing with, the tool that was impacted, and some general personal information. Using that information, we implemented a rules engine on top of the form and automatically routed most incoming issues to the teams responsible for them.
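A rules engine like this can be quite small. Below is a minimal sketch of the idea, assuming a simple ticket form; the fields, rules, and team names are invented for illustration and are not the actual system we built.

```python
# Sketch of a rules engine: each rule inspects the ticket fields the user filled
# out and, on the first match, routes the ticket to the owning team.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Ticket:
    location: str
    tool: str
    problem: str

@dataclass
class Rule:
    matches: Callable[[Ticket], bool]
    team: str

RULES = [
    Rule(lambda t: t.tool == "label_printer", team="warehouse-devices"),
    Rule(lambda t: t.tool == "route_planner" and t.problem == "missing_stop", team="routing"),
    Rule(lambda t: t.location.startswith("EU-"), team="eu-operations"),
]

def route(ticket: Ticket) -> Optional[str]:
    """Return the owning team for the first matching rule, or None for manual triage."""
    for rule in RULES:
        if rule.matches(ticket):
            return rule.team
    return None  # falls back to the (much smaller) central rotation

print(route(Ticket(location="US-BOS", tool="label_printer", problem="jam")))    # warehouse-devices
print(route(Ticket(location="EU-BER", tool="unknown_tool", problem="other")))   # eu-operations
print(route(Ticket(location="US-SEA", tool="unknown_tool", problem="other")))   # None -> manual triage
```

The unmatched tickets are exactly what the remaining part-time rotation handles, which is why the central burden shrinks as the rules improve.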
 

The burden on the centralized rotation was thus reduced: it took only one person spending at most an hour a day double-checking or dealing with anything the automation didn’t pick up. The newly established process also increased the accountability of individual teams, as they could see the common problems coming to them and prioritize addressing the root causes.
 

After building out the tools to do that automated ticket routing and classification, we had to present the new process and tools and get the department bought into the changes. Getting them on board was rather easy since most people were burnt out on the way things were operating.
 

Finally, we switched to the new process. We continued to measure over time whether we could keep a single person on rotation part-time and whether the overall volume of critical tickets was dropping. Indeed, that is what happened: the number of tickets dropped from 400 a week to 50 over the course of six months.
 

Lessons learned

  • Great ideas can come from anywhere in the organization. My organization is largely feature-driven — the business proliferates a lot of ideas and there are never enough people to do all of the feature work. But if you have a good idea that you could justify, you could drive your initiatives from anywhere in the organization.
  • Don’t sit idle when you see the problem. Even if you can’t address it on your own, get started and try to rally other people around it.
  • Problems are more likely to be dealt with when there’s accountability. If there is none, make sure to introduce it. Let data drive your decisions.

Building a Culture of Automated Testing

This is a quick story of taking my team on a journey of enlightenment to automated testing. View story on Plato

Problem

As we were rapidly growing, it became increasingly hard to work with the codebase because of a lack of automated tests. The problem was that most of the engineers didn’t understand the value of automated testing and were resistant to the change. They hadn’t seen it in practice before, and it took significant effort to get them bought into it and to build a culture that would support the practice.
 

Actions taken

To start, I developed a visible measurement of where we were by creating a page that displayed, for each team, the number of tests being written every sprint, and by setting a target of having 50 percent of our commits accompanied by automated tests. This made the problem visible and got people thinking about it.
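For the commit-level target, the underlying metric is simple to compute. Here is one possible sketch of it; the commit data shape and the test-file naming convention are assumptions for illustration, not the actual tooling we used.

```python
# Sketch: share of commits in a sprint that touch at least one test file,
# compared against a 50 percent target. Data shape is illustrative.
commits = [
    {"sha": "a1", "files": ["orders/service.py", "tests/test_service.py"]},
    {"sha": "b2", "files": ["orders/service.py"]},
    {"sha": "c3", "files": ["routing/planner.py", "tests/test_planner.py"]},
]

def has_tests(commit) -> bool:
    # Assumed convention: test files live under tests/ or are named test_*.py
    return any(path.startswith("tests/") or "test_" in path for path in commit["files"])

covered = sum(1 for c in commits if has_tests(c))
ratio = covered / len(commits)
print(f"{covered}/{len(commits)} commits with tests ({ratio:.0%}) - target: 50%")
```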
 

We also celebrated the people who were writing automated tests. In all-hands, we would praise and talk about systems and teams that had been doing well in terms of automated tests. I identified a testing champion on one of my teams who didn’t need any convincing at all and embraced testing wholeheartedly. Their advocacy on the ground was persuasive, and they were able to show the value to their peers in real time. Our testing champion came up with another thing that I ended up incorporating into my all-hands: the test of the week contest.
 

We would have engineers submit their automated tests and then before a meeting, everybody would vote for their favorite one based on a handful of criteria. I would announce the winner in all-hands and they would be rewarded with $10. In addition, anytime I saw an opportunity to showcase a concrete benefit of testing, I would communicate it broadly. For example, “This test prevented us from having a massive outage” or “This test showed us right away where we were missing key acceptance criteria.” I would send out an email or announce it at all-hands explaining how things could blow up if our automated tests hadn’t caught it before it reached production or before we left for the weekend.
 

Over the course of about a year, we continued celebrating small wins and acknowledging our champions’ efforts. By the end of the year, almost every change that went out was accompanied by automated tests. It became an ingrained practice that engineers no longer questioned. Product managers also understood the value and weren’t pushing back against us. Through all of those things, we were able to change the culture and have the team genuinely buy into automated tests.
 

Lessons learned

  • When you are trying to change your culture, it is vital to celebrate positive momentum. Celebrate broadly anything that aligns with the change that you’re trying to make.
  • Finding a champion or champions on your team that would buy into the vision could help efficiently advocate for the change.
  • Try to show the value wherever possible. People should buy into it for the intrinsic value, not just because they’re being asked to or being rewarded for it.

Enterprise Software Integrations – Part I: Buying It!

Buying third-party software rarely goes exactly as planned. Most of us aren’t trained for it (I certainly never took a class in college on how to buy software), yet many of us will find ourselves involved in a software purchase. This is a story from 2014 about integrating enterprise software while I was at Wayfair, taking you through the journey of procurement, MVP, expansion, and long-term support.

What do Wayfair Transportation engineers do? What are these integrations you need?

So Wayfair is a website that sells furniture right? What’s so hard about that?

They sell furniture, yes, but let me ask you a quick question: when was the last time UPS dropped a brand-new vanity off at your front door? …Probably never. The truth is, shipping what’s called “large parcel” goods across a large geography like the United States is incredibly complicated.

One complicated component of shipping large goods is the “home delivery” segment of the journey. If you’ve ever bought large furniture, you probably either A) rented your own truck, grabbed some friends, and slowly struggled to get it into your home, or B) had a company deliver it for you, bring it into your home, put it in a room, and maybe even help assemble it. If it was option B, you also probably had to schedule a date and time window for the delivery, and you had to be home. Maybe they were late and you missed them, or they broke your lamp installing the furniture and told you tough luck. When you think about all that goes into delivering furniture (reservation dates, route planning that accounts for time spent bringing things into people’s homes, managing the fleet of delivery trucks, customer experience), you find yourself with a very complex problem space.

In the past Wayfair outsourced this problem to partners who ran the last-mile delivery operations for us, but the problem was they weren’t very good at it. Customer satisfaction was low, there was almost no visibility into how efficient the route planning was (it wasn’t) and so they spent more money than necessary, and had no ability to schedule deliveries in advance (couldn’t ensure these delivery partners could fulfill on customer promises). It was clear that the right business decision was to bring these operations in-house. That decision was made and so the task of supporting it was brought to my team in engineering. The problem: managing all of these operations requires fairly sophisticated software, AND…Wayfair had already signed the lease on its first building.

The Build vs. Buy Conundrum

Wayfair has a large, incredibly talented team of engineers; in fact, technology is a competitive advantage for the company. Sometimes, though, what we want to do is incredibly ambitious, and we gain a huge advantage by getting it out the door quickly. As we looked over the laundry list of features that needed to exist to make this delivery operation work (optimal route planning, fleet management, delivery reservations), we realized that it would be a monumental effort to build it all ourselves. Our team, unfamiliar with the domain, would need months if not years to build it all to scale. So we figured out which parts made sense to build ourselves and, for the rest, somewhat reluctantly looked to the software market for help.

What do we need?

After our product managers did some research, we ended up looking at four vendors. Two of them were eliminated right away after a couple of intro calls, as we realized they didn’t cover all of our use cases. So we had narrowed it down to two finalists:

  • Descartes Systems Group: The big guy, one of the leaders in the space, with a mature product and several large, well-known clients. Very enterprisey, ugly-looking software, but it seemed to do the job. More expensive, and less leverage in negotiations.
  • DispatchTrack: A small start-up in the space. New, flashy UI/UX, but not as tried and true. Cheaper, looking for their first big client, and highly willing to negotiate.

This was my first integration project as a tech-lead. Suddenly I had to evaluate these vendors and help choose the right one. I was nervous. I spent the days leading up to these demos researching “questions to ask when buying software”:

  • What APIs exist?
  • How do I use them (SOAP, REST, FTP)?
  • What does their documentation cover?
  • What are typical API response times?
  • Is it cloud or on-prem?
  • How many concurrent connections can it handle?
  • What environments exist (dev/sandbox/QA)?
  • What redundancy exists in their platform?
  • Have they tested failover procedures regularly?
  • How often is the system patched?
  • How long does it go down during updates?

Descartes was kind enough to give us a 300-page pdf, documenting all of their APIs, for my enjoyment. I printed it out and read through the whole thing so I had a good starting place with them. I was determined not to miss anything. Of course, hindsight… I would’ve asked some other questions, not just of the vendors, but of ourselves:

  • What can we and what can we not configure in the system? What will we need to configure?
  • Can we access the database directly? Will we need to?
  • Can you do a demo of a high-load? How much volume will we need to pump through this?

We started evaluating. Sitting through several day-long presentations is rough: there’s so much material to cover, and of course we didn’t want to miss anything. Meanwhile, the sales folks from these vendors want to make sure you see all of the “cool stuff” their products can do, eschewing conversation on some of the important topics like security and resiliency. We had to be sure we got all of the information we needed to make a decision.

How much does it cost?

There are two big categories of cost in buying software. The obvious, visible cost of the software itself, and the hidden cost of your own teams integrating it, which also drives the time-to-market. There’s a common misconception that when you buy software, you just “plug it into your system” and everything’s good to go. I wish it were that easy. The cost of buying a harder-to-integrate piece of software is engineering’s responsibility to surface.

When it came to this delivery product, we already had a ton of custom software Wayfair had built over the years. Integrating a new product into all of these existing complex systems and workflows would be no easy task. When evaluating the cost of integration, we had to be thorough:

  • What documentation is available and how robust is it?
  • What is the breadth of available APIs?
  • What communication methods can we use?
  • How much implementation support will the vendor provide?
  • What environments are available to develop in?

The list goes on, but these all ultimately determine how easy a vendor is to work with. 

From the answers we got, we had a strong sense that we would be forced to code our way around DispatchTrack’s system in order to get our desired behavior. Despite DispatchTrack being significantly cheaper, we estimated that we could get an MVP out into the wild 2-3 months quicker with Descartes, and the cost of delay drastically outweighed the initial software savings. We successfully made this case and went with Descartes!

So we’ve bought some software

The hard part is over: we’ve gone through the grueling evaluation and decision-making process and finally have our shiny new software. Job well done. Now it’s time to figure out how to integrate the product. The first rule: make contact with reality as quickly as possible!

Thanks for reading! Part two will be all about building an “MVP” and how we managed business’ expectations.


Technology and Innovation Panel

In January 2019, I spoke to an audience of over a thousand students and retail industry veterans about technology and innovation, and about my role as a software leader.


Setting a Code Review Culture

Doing code reviews is crucial for any successful engineering team, but without care it is easy for the process to devolve into one of stress and hurt feelings. As an engineering leader, it’s well worth your time to encourage a culture of respect and professionalism around code reviews. Outlined below is a way to frame this for your engineering team.

Treat each other with respect

Bringing respect to code reviews means being thoughtful and empathetic on both ends, reviewer and reviewee.

  • As a reviewee- you should assume good intent on the part of your reviewer. The reviewer wants to help you make the best decisions for our team and our code base. They have taken time away from their own priorities to contribute to the improvement of the team’s systems. Your initial reaction to comments may be “this is a waste of time, this is nitpicky and not a big deal, this is dumb”, etc. but challenge yourself to take a step back and see your reviewer not as an adversary, but as a partner in the effort to make smart engineering decisions. Everyone who is reviewing code has a chance to add context, provide knowledge, and illuminate risks, and should never be immediately disregarded.
  • As a reviewer- you should assume the reviewee is not stupid. Sometimes simple mistakes are made; anyone who says they’ve never made a simple mistake is either lying to themselves or has never written any substantial code. You should never treat a mistake with condescension or disdain. Additionally, you should recognize that in this profession there is rarely a single right answer on how to do something. Building systems is a series of tradeoffs: time, complexity, performance, cleanliness, etc. Your immediate reaction to seeing a change may be “that’s bad, that’s not how I would have done this.” Challenge yourself to take a step back, consider the tradeoffs the reviewee may have made, and consider whether, in their shoes, you could clearly say it was the wrong decision.

So what’s this mean?

  • As a reviewee-
    • Don’t be a jerk – if someone leaves a comment you don’t agree with, consider it an opportunity for a discussion, to learn from and build stronger ties to your peers.
    • Don’t just drop someone’s comments. Give the reviewer the respect they deserve and not only think critically about the comments they’ve provided you, but also talk to them and learn why they wrote them. If you’re totally positive this is not a comment that needs to be resolved, reply and clearly explain why. Admit that you could be wrong and welcome a discussion if they want one prior to dropping.
  • As a reviewer-
    • Don’t be a jerk – If there’s time to write a comment then there’s time to make sure the comment isn’t antagonistic
    • Ask probing questions without an ego. Assume the reviewee has thought about the problem at least as much as you have.
    • Offer your availability to talk things through. – Don’t just leave some comments and run, treat leaving a comment as an invitation for a discussion. You are not an all-knowing code wizard, and just as a reviewee might make mistakes, you might make mistakes in your review.

Some tactical suggestions

  • As a reviewee-
    • Provide a good overview that gives context on what the patch is for, plus any additional context, e.g. “this is a quick scrappy MVP that we don’t plan to use long-term” or “this is the new architecture pattern we are trying out to see how well it works”, etc.
    • Make your code understandable – many miscommunications happen because it’s not clear what the code is doing, be careful about unnecessary complexity.
    • If what you’re doing is so complex that it’s hard to understand for someone who didn’t write it, then add descriptive, well-written comments to your code.
  • As a reviewer-
    • Don’t leave comments that are simple assertions e.g. “use foo.go() not foo.start()”. Your comments should be teaching opportunities e.g. “Based on what you’re trying to do here, to make foo go, you might want to use foo.go() instead of foo.start(), foo.start() does not do the {go subroutine} which based on the overview it seems you want”
    • Point to good examples to help support better understanding. 
    • Try not to harp too much on code style (whitespace, newlines, indentation).

On a final note, building a culture is hard. To get this to stick you should reinforce the message in meetings with your team. Encourage every engineer to call out instances of disrespectful review behavior.