Shaping a Machine Learning Team

You may or may not remember this, but there was once a time when everyone talked about something other than the novel coronavirus and its impact on businesses everywhere and on the global economy.

In this fabled period, eons ago, companies everywhere struggled to adapt to a new reality, a reality in which all that data they had collected during the “Big Data” phase of the industry would actually be used for something. Some of them are still working on this.

At Strongbytes we’ve worked on this problem for quite some time, helping our partners gather data scattered all around their infrastructure, analyze and display it using a variety of dashboards, and of course, use machine learning to forecast everything from the price of energy in a certain market, to how much money a company will have in the bank in six months’ time, to how many attendees a conference will actually get.

How did we do it? It all started with us having to answer a lot of questions about building a good team. What kind of team? Which skills should its members possess? Should we have a specialized machine learning team, or should machine learning specialists be part of larger teams? How should they collaborate with the other developers? Should they do Agile? What about DevOps?

As with all complex questions, the answer is invariably, “it depends”. It depends on the size of your company, your culture, your goals. Strongbytes is an Agile company by definition, centered around multiple cross-functional teams, with each team handling one or more projects by itself. We don’t have specialized backend, frontend, and testing teams; instead, each of our teams has people with backend, frontend, and testing skills. This setup has worked really well for us, and it makes the way to integrate machine learning specialists obvious - we should not have a machine learning team, but instead we should have machine learning specialists embedded in our existing teams.

Embedding machine learning specialists in our teams enables us to take full control of our projects, from building the necessary data pipelines to developing responsive data-entry web apps and dashboards, to training and consuming machine learning models. All in an Agile fashion, delivering meaningful results every two weeks.

There is a catch though - not all machine learning roles are created equal, same as with development roles. A recent Workera report identifies several possible machine learning roles, depending on the particular tasks they need to tackle. The potential tasks can be:

  • Data Engineering - collecting, preparing and transforming data in order for the team to be able to use it
  • Modeling - analyzing data, identifying features, training machine learning models, predicting the outcomes of various decisions
  • Deployment - making the model available to users, combining data streams with running models, running everything in production
  • Business Analysis - evaluating model performance and comparing it with perceived business value
  • AI Infrastructure - building and maintaining software systems that enable & enhance all the other machine learning tasks

Now, it doesn’t always make sense to define a role for each and every one of those tasks, especially since the tasks overlap to a certain extent. Similarly, a single individual cannot realistically handle all the tasks. So how do we define the roles? The Workera report comes to the rescue once again, defining the following roles:

  • Data Scientist - able to do Data Engineering, Modeling, and Business Analysis tasks
  • Machine Learning Engineer - handling Data Engineering, Modeling, and Deployment tasks, knowing a lot of the same things as a Data Scientist, able to get a trained model up and running in a secure & scalable manner, but not quite there on the presenting-conclusions-to-a-business-audience front
  • Data Analyst - able to process and display data, but not to do any modeling or deployment (Data Engineering, Business Analysis)
  • Machine Learning Researcher - able to process data and train models, but not to deploy those models in production or interpret their results for a business audience (Data Engineering, Modeling)
  • Software Engineer - these are the people who build out solid data pipelines, gathering and processing data, while at the same time developing the necessary infrastructure for the team (Data Engineering, AI Infrastructure)
  • Software Engineer - Machine Learning - one of our favorite roles, these are the people able to process data and deploy models, write solid infrastructure code, and also train effective machine learning models. Their mathematics background isn’t as strong as that of Data Scientists, but they make up for this by being strong developers (Data Engineering, Modeling, Deployment, AI Infrastructure)
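
One way to make these role definitions concrete is to write them down as a mapping from roles to tasks and check which tasks a proposed team leaves uncovered. The sketch below is only an illustration: the names `Task`, `ROLE_TASKS`, and `uncovered_tasks` are hypothetical, and the mapping simply transcribes the task lists given for each role above.

```python
from enum import Enum


class Task(Enum):
    """The five machine learning task areas listed earlier."""
    DATA_ENGINEERING = "Data Engineering"
    MODELING = "Modeling"
    DEPLOYMENT = "Deployment"
    BUSINESS_ANALYSIS = "Business Analysis"
    AI_INFRASTRUCTURE = "AI Infrastructure"


# Hypothetical lookup table: each role mapped to the tasks it can handle,
# transcribed from the role list in this article.
ROLE_TASKS = {
    "Data Scientist": {
        Task.DATA_ENGINEERING, Task.MODELING, Task.BUSINESS_ANALYSIS,
    },
    "Machine Learning Engineer": {
        Task.DATA_ENGINEERING, Task.MODELING, Task.DEPLOYMENT,
    },
    "Data Analyst": {
        Task.DATA_ENGINEERING, Task.BUSINESS_ANALYSIS,
    },
    "Machine Learning Researcher": {
        Task.DATA_ENGINEERING, Task.MODELING,
    },
    "Software Engineer": {
        Task.DATA_ENGINEERING, Task.AI_INFRASTRUCTURE,
    },
    "Software Engineer - Machine Learning": {
        Task.DATA_ENGINEERING, Task.MODELING,
        Task.DEPLOYMENT, Task.AI_INFRASTRUCTURE,
    },
}


def uncovered_tasks(team):
    """Return the set of tasks that no role on the given team covers."""
    covered = set()
    for role in team:
        covered |= ROLE_TASKS[role]
    return set(Task) - covered
```

For example, `uncovered_tasks(["Data Analyst", "Software Engineer"])` returns the Modeling and Deployment tasks, which suggests such a team could process and display data but would need another role before it could train or ship models.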

Which of those roles will you choose for your team? Again, it depends on what you want to achieve. A different mix of skills is needed depending on your business domain, your development expertise, your culture. You may choose to postpone developing a very sophisticated AI infrastructure for the next two years, and focus instead on processing and analyzing the data you have available in this timeframe. Or you might choose to invest in the right infrastructure, one that allows you to experiment with different machine learning approaches fearlessly. The choice is yours.

There’s a third choice, too: having an experienced partner help you with these decisions, make the right recommendations, and help you get a feel for what insights hide in your data and for what can realistically be achieved with machine learning. Strongbytes has been helping partners improve their businesses for the past five years, from taking over existing projects to working alongside internal teams, to developing products from scratch. We are led by a team of industry veterans, each with 15+ years of experience. We can be that experienced partner. Contact us.