In my previous blog post I wrote about how to calculate the utility cost of a cloud migration project, and about the cases in which that calculation makes the decision NOT to migrate between cloud providers fairly trivial. Assuming you tried it and could not disqualify the project on that basis, the following will give you some more food for thought. I'll discuss how to estimate the cost of porting the application, migrating the data and planning a switchover.
The cost of porting the application
There are three steps in this stage, each dependent on the one before it. The first two steps must be executed by someone internal to the organization, even if a third-party consultant is involved in the project.
Step One - API migration
Mapping cloud services used
When an application is developed for the cloud, it usually includes components of three types:
- Components developed and managed by your team
My assumption is that migrating your own code to another cloud provider is fairly easy and straightforward.
- Components managed by your team but not developed by it
Example: any open source DB such as MySQL or MongoDB. These can be migrated fairly easily as well, provided they do not depend on a specific cloud service (e.g. AWS Container service).
- Managed cloud services, provided by the cloud vendor
Example: AWS S3, Azure Functions, GCP BigQuery/Bigtable etc.
The third type can become a real nightmare. You need to map all the managed cloud services you use, by name, and how heavily your application depends on each of them. In addition, not every service has an exact equivalent on the target cloud provider, which might force you to work around the gap by:
- Adopting an external tool and having your team manage it, which increases DevOps costs while (usually) decreasing both the cloud vendor bill and the vendor lock-in.
- Adding logic to your application to replace a missing service with your own code.
- Continuing to use the source cloud service in spite of the migration. This is only realistic for very few services.
If you have a small number of applications hosted on a cloud vendor and/or you plan to fully migrate all your apps from one cloud vendor to another, the easy path goes through the monthly bill. Have a discussion with your architects and review the cloud services listed there and understand their usage.
If you have many cloud applications and you only want to migrate a subset of them due to regulation, strategy or simply as a test, you should ask your architects to pull out that information for you. Naturally, this is riskier than using the bill, so once the architects have completed their review, go over the bill together and make sure nothing was left out.
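As a rough illustration of the bill review, the sketch below totals a billing export by service so the most heavily used services surface first. The CSV layout (columns named `service` and `cost`) and the sample rows are assumptions made for this example; real exports from AWS, Azure or GCP use different column names and would need adapting.

```python
import csv
import io
from collections import defaultdict

# Hypothetical billing export; real provider exports use other column names.
SAMPLE_BILL = """service,cost
S3,120.50
Lambda,40.00
S3,30.25
DynamoDB,75.00
"""

def summarize_bill(csv_text):
    """Return (service, total_cost) pairs, most expensive first."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["service"]] += float(row["cost"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for service, total in summarize_bill(SAMPLE_BILL):
        print(f"{service}: {total:.2f}")
```

The resulting list is only a starting point for the discussion with your architects: a cheap line item can still hide a hard dependency.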
A good tool for capturing this knowledge, either during planning or as an outcome of the project, is Draw.io, which is easy to use online and includes shapes for all the major cloud services. You'll need to click on "More shapes" in the bottom-left corner of the application, and then add AWS/Azure/others.
Mapping how cloud services are used, and where
Once all cloud services are mapped, we need to map how they are used and where. The "How" refers to which verbs of the cloud service API are used; the "Where" refers to the number of calls made with each verb and their locations in the code. By analogy, the "How" represents the variance in a population, while the "Where" represents the number of items in it.
My take is that the higher the variance, the riskier the project and thus the higher its cost. We need to understand each verb, why it is used, and how similar the equivalent API on the target cloud is. For example, "delete object" will probably have a high level of similarity (though this requires some research), whereas the more exotic features of a cloud service might have no equivalent on the target service and will require more thought, planning, cost and workarounds.
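One possible way to get a first cut of the "How" and the "Where" is to scan the code base for SDK calls. The sketch below assumes Python sources that reach the service through a variable named `s3`, which is a hypothetical naming convention; your code will need its own pattern. The number of distinct verbs approximates the variance, and the list of call sites approximates the population.

```python
import re
from collections import defaultdict

# Assumption: the code calls the SDK through a variable named `s3`.
CALL_PATTERN = re.compile(r"\bs3\.(\w+)\s*\(")

def map_verbs(sources):
    """Map each verb to its call sites as (filename, line number) pairs.

    `sources` is a dict of filename -> file content.
    """
    usage = defaultdict(list)
    for filename, text in sources.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            for verb in CALL_PATTERN.findall(line):
                usage[verb].append((filename, line_no))
    return dict(usage)

if __name__ == "__main__":
    sample = {
        "upload.py": "s3.put_object(Bucket=b, Key=k, Body=d)\n",
        "cleanup.py": "s3.delete_object(Bucket=b, Key=k)\n"
                      "s3.delete_object(Bucket=b, Key=k2)\n",
    }
    usage = map_verbs(sample)
    print("distinct verbs (the 'How'):", len(usage))
    for verb, sites in usage.items():
        print(verb, "call sites (the 'Where'):", sites)
```

A text scan like this misses calls made through aliases or wrappers, so treat its output as a checklist to verify, not as a complete inventory.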
Dealing with a large population of call sites doesn’t affect the project dramatically. Changing them can be somewhat automated, and even if not, it’s a quick task that offers an opportunity to improve the overall code, for example by refactoring all calls to go through an internal service.
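The refactoring idea mentioned above, routing all calls through an internal service, could look roughly like this. `ObjectStore`, `InMemoryStore` and `archive_report` are hypothetical names invented for this sketch; a real implementation would wrap the source or target provider's SDK behind the same interface.

```python
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """Internal interface the application calls instead of a vendor SDK."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...

class InMemoryStore(ObjectStore):
    """Stand-in backend for illustration and tests (not a real cloud client)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        self._objects.pop(key, None)

def archive_report(store: ObjectStore, name: str, content: bytes) -> None:
    # Application code depends only on the interface, so swapping the
    # backend during the migration does not touch this function.
    store.put(f"reports/{name}", content)
```

With this shape, porting a verb means writing one new backend method instead of editing every call site.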
Porting the API code
The planning of this step is fairly straightforward. We need to look at what we aim to achieve (porting all verbs), the coding resources we have, and the cost and time of the porting. Very few people have regretted testing too much in such a process, so add some slack time for testing.
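A back-of-the-envelope sketch of that planning arithmetic follows. The per-verb effort figures, the day rate and the 30% test slack are made-up numbers for illustration only, and `porting_estimate` is a hypothetical helper, not an established estimation method.

```python
def porting_estimate(verb_days, developers, day_rate, test_slack=0.3):
    """Return (calendar_days, cost) for porting all verbs.

    verb_days: dict of verb -> estimated developer-days to port it.
    test_slack: extra fraction of effort reserved for testing.
    """
    effort = sum(verb_days.values()) * (1 + test_slack)
    calendar_days = effort / developers
    return calendar_days, effort * day_rate

if __name__ == "__main__":
    # Hypothetical estimates: common verbs are cheap, exotic ones are not.
    verbs = {"put_object": 1, "delete_object": 1, "select_object_content": 8}
    days, cost = porting_estimate(verbs, developers=2, day_rate=800)
    print(f"{days:.1f} calendar days, cost {cost:.0f}")
```

Dividing effort evenly across developers is itself an optimistic assumption; in practice the exotic verbs tend to serialize on whoever understands them.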
A point to note: if the porting project takes a long time and the application keeps getting new features in the meantime, the ported code might “drift” from the main application code. Some strategy is required to mitigate this, and the cost of that effort needs to be considered as well.
Step Two - Data migration
Mapping the data to be migrated
The source cloud may hold data that is not intended for migration: either this is an opportunity to scrub old and unused data, or an opportunity to re-map the data and
For example, we
The project costs of a data migration
“A migration project?!?!?! I’ll write a script of three lines: list all items in the source, for each of them - write it to the target. Hell, with some nifty coding I can do it all in one line!”
Well, sure, the data resides on one cloud provider e.g. AWS S3 and you want it to end up at another cloud provider e.g. Azure
- Monitoring migration stages, failures and their causes.
- Troubleshooting migration issues, such as file size limitations, target size limitations, performance, etc.
- Planning and executing a successful migration of metadata and ACLs.
- Monitoring newly created data at the source and creating a path for closing gaps.
- Management cost: who reports what to whom? Better to plan in advance and do some simulated reports to verify nothing is missing or left behind.
You may find that it is a bit challenging to capture all of the above in one line of code (or even three for that matter...). It may be worthwhile to evaluate some migration tools.
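To show why the "three lines" grow quickly, here is a minimal sketch that adds just three of the concerns above: failure tracking, a size limit at the target, and a re-runnable pass that picks up data created at the source after the first run. Plain dicts stand in for the two object stores, and `migrate` and `max_object_size` are hypothetical names; metadata, ACLs, retries and performance are all left out.

```python
import hashlib

def checksum(data):
    """Content fingerprint used to compare source and target copies."""
    return hashlib.sha256(data).hexdigest()

def migrate(source, target, max_object_size=None):
    """Copy objects from source to target and report what happened.

    source and target are dicts of key -> bytes standing in for real
    object stores. Returns a report with 'copied', 'skipped', 'failed'.
    """
    report = {"copied": [], "skipped": [], "failed": {}}
    for key, data in list(source.items()):
        # Skip objects that already exist unchanged at the target.
        if key in target and checksum(target[key]) == checksum(data):
            report["skipped"].append(key)
            continue
        # Record, rather than hide, objects the target cannot accept.
        if max_object_size is not None and len(data) > max_object_size:
            report["failed"][key] = "object exceeds target size limit"
            continue
        target[key] = data
        report["copied"].append(key)
    return report

if __name__ == "__main__":
    source = {"a.txt": b"1", "big.bin": b"22222", "c.txt": b"3"}
    target = {}
    print("first pass:", migrate(source, target, max_object_size=3))
    source["d.txt"] = b"4"  # data created at the source after the first pass
    print("gap-closing pass:", migrate(source, target, max_object_size=3))
```

Because already-copied objects are skipped, the same function can be run repeatedly until the report shows nothing new, which is one simple way to close gaps before the switchover.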
Egress Costs
Shameless reference to part 1 😊.
Step Three - Switchover and estimated cost of failure
You’ve executed your application porting and you managed your data migration project like a champ. Unfortunately, this has all been in preparation for the “money time” – the switchover point. Naturally if your application is not a live/production/internet class application, the tension is lower at this point, but if you’re lucky enough to manage a critical application, the switchover is a project of
- The cost of downtime: per minute, per customer or per missing data item. Choose the measure that fits your case best.
- Is there any failback and what are the failback triggers?
Assuming we did everything as planned and the application simply does not come up at the target, how long do we spend on troubleshooting before we try to bring it up at the source? If it does come up, but with some features not behaving as expected, do we fail-back or not?
- How do we perform the switchover and how do we test it beforehand?
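Writing the failback triggers down before the switchover makes the decision mechanical when the pressure is on. The sketch below is one hypothetical way to express them: a troubleshooting time budget plus a list of checks considered critical. All names, thresholds and the cost per minute are assumptions for illustration.

```python
def downtime_cost(minutes_down, cost_per_minute):
    """Estimated cost of an outage, using a per-minute cost figure."""
    return minutes_down * cost_per_minute

def should_fail_back(minutes_troubleshooting, max_troubleshooting_minutes,
                     failed_checks, critical_checks):
    """Decide on failback using triggers agreed before the switchover.

    Fail back when the troubleshooting budget is used up, or when any
    failed check belongs to the set of critical features.
    """
    if minutes_troubleshooting >= max_troubleshooting_minutes:
        return True
    return any(check in critical_checks for check in failed_checks)

if __name__ == "__main__":
    critical = {"login", "checkout"}
    # A non-critical feature misbehaves 10 minutes in: keep going.
    print(should_fail_back(10, 60, {"reports"}, critical))
    # A critical feature fails: fail back immediately.
    print(should_fail_back(10, 60, {"login"}, critical))
    # Worst-case exposure if the full budget is spent before failing back.
    print(downtime_cost(60, 200))
```

Running the same checks in a rehearsal before the real switchover is also one way to test it beforehand.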
New environment learning curve
This might seem a bit off topic, but you should at the very least be aware that even after your application and data reside on the new cloud, your team needs to acclimatize to a new environment, and some level of “altitude sickness” is expected. The extent differs from team to team, depending on how extensively and widely they used the old cloud and how you plan to use the new one.
Last words
After reading the above, I might seem bleak or pessimistic, but I’m neither. My goal was, at the very least, to get you to seriously consider the costs of a cloud data migration before finding yourself chest deep in the swamp.
Let me know what I can improve, what was helpful as well as what content you care about going forward.
The sheet includes some helpful items for part 2 as well.