LY Corporation Tech Blog

We are promoting the technology and development culture that supports the services of LY Corporation and LY Corporation Group (LINE Plus, LINE Taiwan and LINE Vietnam).

Rebuilding one of Japan's largest delivery apps from the ground up - The Recode Project

 

"Demae-can" is one of Japan's largest food delivery services which started in 2000. Demae-can was later acquired by our company, then known as LINE, in 2020. Currently, Japan's delivery service market is smaller than Korea's, indicating significant growth potential.

ABC Studio has been working on a comprehensive renewal of Demae-can since spring 2021. Among these efforts, under the name "Recode", we undertook a project to completely replace the codebase and architecture while keeping the service specifications the same. In this article, we'll explain why we chose to replace the existing code with a new codebase and how we executed it. By the way, many people mistakenly wrote "Record" instead of "Recode". If you're planning a similar project, consider using a different name. :)

An old dream

Every engineer has the same dream: sweeping away legacy systems overnight with a magic broom and moving every user to the latest version with a snap of the fingers (in reality, forcing such a change might cost you half of your users). Why is it so difficult to overhaul everything at once, even when the specifications are clear? Because the business must continue without interruption, the company must value technological improvement as much as other urgent matters, and the product team itself must agree to take on a challenging overhaul. Once the product team agrees, it still needs buy-in from the surrounding teams, such as management, business, marketing, QA, and CS, each of which has to understand the product team's challenges.

The hopes of the surrounding teams can be summarized as follows:

  • Management: Achieve the GMV (gross merchandise volume) target to win in the market and respond positively to shareholders.
  • Business team: A contract with a new major partner is imminent. Let us know how long it will take to add a new payment method. If delayed, there's a risk they'll sign an exclusive contract with a competitor (just writing this is stressful...).
  • Marketing team: We want to hold a half-price discount event during the holidays. We plan to issue a large number of coupons and run messenger ads, so we expect peak traffic.
  • QA team: The newly written code will likely have many bugs, so we need to allocate a long time for QA. How about three months?
  • CS team: There will likely be many bugs for a while, and there's a risk the call center won't be able to handle them properly.

What was the value everyone could agree on? And how could we persuade each of them? While developers might want to clean up legacy systems, many prioritize implementing business requirements to contribute to the company's revenue and economic growth. This is because it's a more visible achievement, leading to evaluation and rewards. Therefore, it's necessary to persuade the company to understand the value and contribution of this effort and reach an internal agreement within the product team. For colleagues who try to understand the issue only in numbers, you can explain that it's impossible to achieve the goals with the legacy system by linking story points to development time and release cycles as KPIs.

Making the case for Recode

The biggest challenge in convincing those outside the product team of the need for the Recode project is that it seems to have no business contribution. Here are three representative ways to persuade them.

The first is to emphasize the potential security risks found in the current app. Security issues are critical enough to render a service meaningless in an instant, so if there are clear security issues, highlight them prominently.

The second is to show the next steps after Recode. Since Recode itself doesn't produce visible changes, its effects can't be felt directly. ABC Studio demonstrated what the two releases following Recode would bring, including a monthly UI refresh that would become possible afterward, and introduced the UX improvements and new features that Recode would enable. Frame the argument over a six-month horizon: enduring the existing app for about three months and then rapidly accommodating requirements in the remaining three months will yield better results than gradually modifying the app over all six.

The third is to use the logic of money. While LY Corporation treats security as its top priority, many companies don't. For companies that only care about business KPIs, expressing the economic impact of Recode in a simple formula can be persuasive. Collect the current service's metrics in a table, such as bug and crash frequency, user feedback, negative service quality indicators (loading time, etc.), and the time required to add new features. Adding the estimated benefit of more frequent releases to the same table conveys the necessity of Recode effectively. Comparing against competitors is a sensitive but effective method as well.
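To make the "simple formula" idea concrete, here is a minimal sketch that compares how much a team could deliver over six months with and without a three-month feature freeze. The velocities and the freeze length are hypothetical placeholders, not Demae-can figures, and the function names are made up for illustration.

```python
def delivered_points(months: int, velocity_per_month: float, freeze_months: int = 0) -> float:
    """Story points delivered over `months`, with no feature output during the freeze."""
    productive_months = max(months - freeze_months, 0)
    return productive_months * velocity_per_month


def break_even_months(legacy_velocity: float, new_velocity: float, freeze_months: int) -> float:
    """Months after which the rewrite has delivered as much as staying on legacy code."""
    if new_velocity <= legacy_velocity:
        raise ValueError("the rewrite never catches up unless velocity improves")
    # Solve legacy_velocity * t == new_velocity * (t - freeze_months) for t.
    return new_velocity * freeze_months / (new_velocity - legacy_velocity)


# Hypothetical inputs: 10 points/month on legacy code, 25 points/month after the rewrite.
legacy = delivered_points(6, velocity_per_month=10)
recode = delivered_points(6, velocity_per_month=25, freeze_months=3)
print(f"legacy: {legacy}, recode: {recode}, break-even: {break_even_months(10, 25, 3)} months")
```

The point of a model this crude is not precision. It gives colleagues who think in numbers one line to argue about: how much faster the team has to become for the freeze to pay off within the period they care about.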

Demae-can is a listed company in Japan. As a service with strong sales capabilities, it prioritized revenue, but there was a consensus that the service quality was poor. They acknowledged the need for innovation beyond product renewal and the time required for it. However, they couldn't wait indefinitely, so we persuaded them to endure for just three months. Three months was a tight period from a development perspective, but it corresponded to one quarter from a business perspective.

The characteristic of Recode is that it changes the code and architecture while keeping the existing app's UI and functionality the same. Although new specifications and designs could be applied while rewriting, there are four reasons why we didn't:

  1. It reassures the operational organization, including the CS team. Saying that only the version number changes while the app remains the same reduces confusion in the field. It also serves as a buffer in operations until new features are added.
  2. It simplifies the QA team's work. Instead of causing confusion with new specifications and bugs, it allows them to focus only on new bug types. This enables the QA team to focus solely on code quality when the product team issues hotfixes.
  3. The product team can stably learn the entire workflow. By going through the experience of releasing a public version with surrounding teams, they can learn communication methods and release procedures.
  4. The development team can gain confidence in their code. When a team makes only small releases without fully understanding the entire structure, it may panic when bugs arise. Old architectures in particular can collapse easily.

Where to start?

There are various approaches to completely reconfigure a part of the overall architecture of a service. You can start with the most critical part or proceed from the end. The decision depends on the urgency, impact, stability, and business requirements of the current service. Here's a glimpse of Demae-can's case:

Component               | Member service | Order service | Delivery service | Internal management | Front app  | Delivery app | Merchant app | ...
Urgency of requirements | Low            | High          | Low              | Medium              | High       | Medium       | Low          |
Security risk           | Medium         | Low           | Low              | Low                 | Low        | High         | High         |
Team structure          | Insufficient   | Adequate      | Insufficient     | Adequate            | Sufficient | Insufficient | Insufficient |
Architecture            | Insufficient   | Adequate      | Insufficient     | Insufficient        | Adequate   | Insufficient | Insufficient |
...                     |                |               |                  |                     |            |              |              |

Like any food delivery service, Demae-can has three main user groups: customers who order food, franchise owners, and riders. At Demae-can, these are referred to as front, franchise, and delivery personnel, respectively. Based on the table above, which summarizes the situation at the time, ABC Studio decided to first understand and renew the franchise and delivery apps (the merchant and delivery apps in the table). These two user groups already had a relationship with the CS center close enough that they could be contacted directly (Demae-can's riders are hired through direct contracts at regional bases), and by the nature of the service their apps were narrowly focused on function. The front app, on the other hand, had a large user base and extensive operational tasks, so it was excluded from the Recode target because we did not yet fully understand the service.

From a technical perspective, the franchise and delivery apps each shipped for both iOS and Android, making four apps in total. The franchise app was developed for tablet devices using Xamarin. The delivery app was developed as a typical mobile app using React Native (hereafter RN). For reference, the front app was developed using RN and Expo. It was quite interesting: each app was developed by a different external company, but all were implemented with multi-platform considerations in mind.

The franchise app is dedicated to receiving orders, and the delivery app is dedicated to delivery functions. Being dedicated means there are no other functions besides those. As a result, the scenarios were relatively simple, and the absolute volume of specifications was small. However, just as life doesn't always go according to plan, the implementation didn't follow the specifications in a straightforward way. As mentioned earlier, the apps for these two user groups differed significantly from what ABC Studio wanted in terms of technology stack and structure. ABC Studio is a team strong in native app development. We are proficient in Flutter, but we had no experience with Xamarin or RN. After receiving the code, we managed to build it, and we set up a test server that mimicked the main protocols to understand the overall flow. With a rough understanding of the flow in hand, we considered learning the new technologies and improving the existing code.

Back to square one

No, we decided it was better to redevelop from scratch. In software, the technology stack is secondary; architecture and code quality come first. The existing app had security issues and was built on the most basic structure rather than a modern one, making it difficult to accommodate future requirements. The libraries it used were all outdated. Such old library versions are generally vulnerable to security issues and hinder support for the latest OS versions. We wanted a UI and UX that could smoothly absorb more features as requirements arrived, high stability, and a codebase that supported collaboration and unit testing. Therefore, we decided to rebuild in native code and develop four apps: the franchise app (iOS/Android) and the delivery app (iOS/Android).

For some reason, the planning documents were not fully preserved, so we had to trace the code and touch each button to recreate the planning documents. We rewrote the missing specifications and recreated the design page by page using Figma. This process of recreating the planning documents and development took two and a half months, followed by a three-week QA period, all under the project name Recode.

In our Recode experience with Xamarin and RN, the conversion from Xamarin was much easier. The Xamarin app's structure followed that of a native app; only the language and development environment differed, being Microsoft's C#. Therefore, even without Xamarin knowledge, native developers could read the code without difficulty. RN, however, required a lot of effort due to differences in file structure and syntax.

Quality requirements

There are various criteria for evaluating software quality, such as maintainability, scalability, and reusability. Depending on where the focus is placed, the design, structure, and build pipeline of the software can change. The quality attributes that mattered most in the Recode project are as follows [1].

  • Security: The app was using Realm as its database, so we added logic to move it to SecureStorage. Additionally, redefining the data update relationship between the device and server was set as the most important quality requirement. We also included the smooth transition and utilization of local data, including existing authentication tokens, as essential test items to avoid the inconvenience of having to log in again after an update.
  • Reusability: ABC Studio is strong in native app development, but with limited personnel to develop four apps simultaneously, we decided to consider multi-platform partially. However, instead of sharing the entire stack, including the UI layer like Flutter, we decided to respect native development for the UI layer to enhance app design completeness and explore technologies that allow libraries to be shared.
  • Testability: Most modern architectures are evolving to facilitate better testing. Beyond clear separation from the view, recent declarative syntax, which builds views in code, is also intended to make testing easier. The Recode project aimed to align with this trend. On Android, for example, we made active use of Jetpack Compose, LiveData, and the MVVM (model-view-viewmodel) architecture.
  • Stability: Upon investigation, it was found that the existing app had no stability checks. Although Google Analytics was applied, no one was monitoring it. To improve this, we needed to organize access accounts, redefine responsibilities, organize the notification system, and set crash metrics as KPIs.
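The security item above mentions carrying existing authentication tokens over to the new storage so that nobody has to log in again after the update. The following is a minimal, language-neutral sketch of one way such a migration could be ordered. Plain dictionaries stand in for the legacy database and the secure storage, and the key name and function name are hypothetical; this is not the actual Recode implementation.

```python
from typing import MutableMapping

TOKEN_KEY = "auth_token"  # hypothetical key name, for illustration only


def migrate_token(legacy: MutableMapping[str, str], secure: MutableMapping[str, str]) -> bool:
    """Move the auth token from the legacy store to the secure store.

    Returns True if a token is available in the secure store afterwards.
    Safe to call on every launch: once migrated, repeated calls change nothing.
    """
    if TOKEN_KEY in legacy:
        token = legacy[TOKEN_KEY]
        # Write first, verify, and only then delete, so a crash midway
        # never leaves the user without a token.
        secure[TOKEN_KEY] = token
        if secure.get(TOKEN_KEY) != token:
            return False  # keep the legacy copy and retry on the next launch
        del legacy[TOKEN_KEY]
    return TOKEN_KEY in secure


legacy_store = {"auth_token": "abc123"}
secure_store = {}
print(migrate_token(legacy_store, secure_store), legacy_store, secure_store)
```

Whatever the real storage APIs look like, the write-verify-delete order is what makes this kind of migration a reasonable test item: QA can kill the app at any step and the user should still be logged in on the next launch.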

Introducing multi-platform technology - KMM (Kotlin Multiplatform Mobile)

Multi-platform technologies such as hybrid apps, Xamarin, RN, and Flutter have now been sufficiently validated (validated, though not yet mature), so developing function-focused apps with almost identical UX on separate codebases is inefficient. ABC Studio applies several techniques for sharing common assets, such as synchronizing string resources at build time. The new attempt this time was KMM. We chose KMM with the following expectations:

  • JetBrains is creating an IDE dedicated to Android/iOS development, so it will pay attention to the overall development experience, not just the code.
  • JetBrains also creates the Kotlin language, so KMM has high potential to keep developing.
  • Most technologies propose a new language common to both platforms, but KMM is entirely native on Android, so the learning cost and risk are low.
  • As the latest multi-platform technology, it would have considered past trial and error.
  • By limiting the sharing to libraries only, it respects the responsiveness and expressiveness of native UI.
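As an aside on the string-resource synchronization mentioned earlier: the article doesn't describe how ABC Studio does it, so the following is only a sketch of the general idea, assuming a single key-value source from which a build step generates an Android strings.xml and an iOS Localizable.strings. The function names are hypothetical, and only minimal escaping is handled (format placeholders, which differ between the platforms, are ignored).

```python
from xml.sax.saxutils import escape


def to_android_xml(strings: dict) -> str:
    """Render the shared strings as the contents of an Android strings.xml."""
    lines = ['<?xml version="1.0" encoding="utf-8"?>', "<resources>"]
    for key, value in sorted(strings.items()):
        # XML-escape first, then backslash-escape quotes as Android resources expect.
        text = escape(value).replace("'", "\\'").replace('"', '\\"')
        lines.append(f'    <string name="{key}">{text}</string>')
    lines.append("</resources>")
    return "\n".join(lines) + "\n"


def to_ios_strings(strings: dict) -> str:
    """Render the shared strings as the contents of an iOS Localizable.strings."""
    lines = []
    for key, value in sorted(strings.items()):
        text = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'"{key}" = "{text}";')
    return "\n".join(lines) + "\n"


shared = {"order_accept": "Accept", "greeting": "Don't panic"}
print(to_android_xml(shared))
print(to_ios_strings(shared))
```

Running a generator like this in the build means a wording change is made once and both platforms pick it up, which is the same motivation behind sharing libraries rather than UI.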

ABC Studio introduced KMM and open-sourced five of the libraries on GitHub (reference). However, looking back while writing this article after operating for three months post-release, we realized that KMM still has shortcomings. As the application becomes more complex, the lack of integration with iOS leads to increased debugging trial and error. No matter how great the architecture is, maturity is a separate issue.

"Anyone can draw a plausible architecture until they get hit by error messages." - An anonymous developer's famous saying

QA

During the QA process, we tested for three weeks to ensure that the existing app and the Recode app operated identically. Professional QA personnel placed two phones side by side and tested by pressing each button according to the scenario. Even if the code and planning levels were accurately recreated, verification by a QA team completely separate from the development team is essential.

The verification showed that many parts behaved differently than expected, even though we believed they were identical at the code level. In particular, RN and native UI calculate the size of graphic components, including fonts, differently, so text overflowed its text boxes to different degrees. Many bugs were also found in the existing app. Some of them had already been written into the specifications, so each time we had to discuss whether to preserve them. Additionally, the delivery service is sensitive to drivers competing to accept the same order, so there were many race conditions that would be hard to encounter without professional QA personnel. As a result, although QA was initially planned for three weeks, it actually took about a month.
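To illustrate the kind of race that order acceptance invites, here is a small sketch in which many drivers try to accept the same order at the same moment. It is a generic check-and-set under a lock, not Demae-can's server logic, and the class and function names are hypothetical.

```python
import threading
from typing import Dict, List, Optional


class OrderBoard:
    """In-memory stand-in for the server-side table of open orders."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._assigned: Dict[str, Optional[str]] = {}

    def add_order(self, order_id: str) -> None:
        with self._lock:
            self._assigned[order_id] = None

    def try_accept(self, order_id: str, driver_id: str) -> bool:
        """Atomically assign the order; only the first caller succeeds."""
        with self._lock:
            if order_id not in self._assigned or self._assigned[order_id] is not None:
                return False  # unknown order, or another driver already has it
            self._assigned[order_id] = driver_id
            return True


def simulate(drivers: int = 20) -> List[bool]:
    """Let `drivers` threads race for one order and collect each outcome."""
    board = OrderBoard()
    board.add_order("order-1")
    results: List[bool] = []
    barrier = threading.Barrier(drivers)

    def worker(n: int) -> None:
        barrier.wait()  # release all drivers at once to provoke contention
        results.append(board.try_accept("order-1", f"driver-{n}"))

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(drivers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(simulate().count(True))
```

If the check and the assignment were not performed under the same lock, two drivers could both pass the check before either one writes. That is exactly the sort of defect that shows up only when testers deliberately tap "accept" on two phones at once.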

The update API is used forever

We wanted to use a forced update to move everyone to the new version at once, but this is a nationwide service that affects the livelihoods of franchise owners and delivery personnel, so we couldn't be that bold. Instead, we are coordinating with the CS team to roll out the update gradually, using the forced update function in the legacy API server's specification to target user IDs by prefecture (roughly equivalent to a province in Korea).

There are various reasons why an app fails to update even after receiving a forced update command from the server. Realistic examples include a salesperson setting up the device and then logging out of the Apple or Google account, or a part-timer dismissing the message without knowing what it is.

Nevertheless, a forced update function is essential for an API server that communicates with the app. It's the only function among the various protocols of the legacy API server that is utilized until the end. Therefore, if you're launching an app for the first time, it's good to define and implement a forced update specification for the app and API server from the start.

Forced updates can be implemented in various ways. The OS provides it at the platform API level, but sometimes it doesn't work if the store account is logged out or the store app is an old version, so it should also be controllable at the server level. In cases like this, where updates are done by region or need to be distinguished by business entity, it's good to have room to filter by the logged-in user's ID. Additionally, when implementing the forced update UX in the app, if you only implement a popup that covers the UI, there may be cases where it communicates normally with the server, claiming to be in a normal operating state. Therefore, it's advisable to implement it so that some functions are definitely stopped in the app.
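The points above can be sketched as a small server-side check: the decision depends on the app version and on the logged-in user's region, and an outdated client is refused rather than merely warned. This is a hypothetical shape for illustration; the field names, status strings, and region lookup are assumptions, not the legacy API server's actual specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '3.12.0' into (3, 12, 0) so versions compare numerically, not as text."""
    return tuple(int(part) for part in text.split("."))


@dataclass
class UpdatePolicy:
    min_version: str  # oldest app version still allowed once enforcement applies
    enabled_regions: Set[str] = field(default_factory=set)  # regions where it is enforced
    user_region: Dict[str, str] = field(default_factory=dict)  # user ID -> region

    def must_update(self, user_id: str, app_version: str) -> bool:
        region = self.user_region.get(user_id)
        if region not in self.enabled_regions:
            return False  # the rollout has not reached this user's region yet
        return parse_version(app_version) < parse_version(self.min_version)


def handle_request(policy: UpdatePolicy, user_id: str, app_version: str) -> str:
    """Gate every API call, so a dismissed popup cannot keep an old client working."""
    if policy.must_update(user_id, app_version):
        return "UPDATE_REQUIRED"
    return "OK"


policy = UpdatePolicy(
    min_version="2.0.0",
    enabled_regions={"tokyo"},
    user_region={"user-1": "tokyo", "user-2": "osaka"},
)
print(handle_request(policy, "user-1", "1.9.3"), handle_request(policy, "user-2", "1.9.3"))
```

Widening the rollout is then a matter of adding regions to the enabled set, which keeps the pace of the migration in the hands of whoever coordinates with the CS team. The version comparison assumes all version strings have the same number of numeric parts.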

Fortunately, in this case, the legacy server allowed forced updates to be adjusted by user ID, so the burden on the CS team could be appropriately managed.

Release, and after

The interesting point of the Recode project is that the fewer users who notice any change after the release, the more successful it is. And yes, it successfully went unnoticed. After Recode, the product team's release cycle started to operate normally. The apps were implemented with the architecture and technology stack the team most wanted, and the security issues were resolved. We now release every three weeks with occasional hotfixes, and the head number of our HeadVer version has already gone up more than 10 times.
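For readers unfamiliar with HeadVer: as far as I understand the scheme, a version has the form {head}.{yearweek}.{build}, where the head is bumped by hand for each public release, the yearweek is derived from the date, and the build number comes from the build system. A minimal sketch, assuming the yearweek is the two-digit ISO year followed by the two-digit ISO week (the function name and argument order are my own):

```python
from datetime import date


def headver(head: int, build: int, today: date) -> str:
    """Format a HeadVer-style version string: {head}.{yearweek}.{build}."""
    iso_year, iso_week, _ = today.isocalendar()
    # Assumption: two-digit ISO year + zero-padded ISO week, e.g. 2019 week 24 -> "1924".
    yearweek = f"{iso_year % 100:02d}{iso_week:02d}"
    return f"{head}.{yearweek}.{build}"


print(headver(head=3, build=59, today=date(2019, 6, 10)))
```

Because only the head is chosen by a person, a head number that has gone up more than 10 times is a direct count of public releases shipped since the scheme was adopted.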

Recode is a step back for two steps forward. Once started, no requirements can be implemented for three months. To endure this period, it's necessary to discuss and reach agreements on strategies with various teams, including the development team. There are many hurdles to overcome.

Nevertheless, the reason for starting the Recode project was to gain the development team's confidence in the architecture. There may be various criteria for judging the soundness of software development, but among them, the development team's confidence in the architecture is more important than anything else. With that confidence, high-quality software can be released regularly. Recode is a radical but most certain way to gain such confidence.

  [1] For more details on quality requirements, refer to "Define the Quality Attributes" in Design It!: From Programmer to Software Architect, a book that I translated into Korean and that was published in 2021.