Hello, I'm Munetoshi Ishikawa, a mobile client developer for the LINE messaging app.
This article is the latest installment of our weekly series "Improving code quality". For more information about the Weekly Report, please see the first article.
Gordian variables
Suppose that data called FooData is stored on two storages, local and remote, and you are implementing FooSynchronizer to synchronize them.
The expected situation is as follows:
- The remote
FooDatais the original, and it is used to update the localFooData. - The remote and local data can be obtained from
FooLocalDaoandFooRemoteClient, respectively. - (For simplification of the problem, transactions, etc., do not need to be considered.)
In the function synchronizeWithRemoteEntries of FooSynchronizer, the list of IDs of FooData is compared between local and remote, and additions, updates, and deletions are performed according to the result.
- Addition: Executed when the ID exists only in remote (
id !in localEntryIds && id in remoteEntryIdsistrue) - Update: Executed when the ID exists in both local and remote (
id in localEntryIds && id in remoteEntryIdsistrue) - Deletion: Executed when the ID exists only in local (
id in localEntryIds && id !in remoteEntryIdsistrue)
The FooSynchronizer is implemented as follows to meet this specification.
class FooSynchronizer(
private val fooRemoteClient: FooRemoteClient,
private val fooLocalDao: FooLocalDao
) {
fun synchronizeWithRemoteEntries() {
val remoteEntries: List<FooModel> = fooRemoteClient.fetch()
val remoteEntryIds = remoteEntries.map(FooModel::id).toSet()
val localEntries: List<FooModel> = fooLocalDao.getAllEntries()
val localEntryMap = localEntries.associateBy(FooModel::id)
val localEntryIds = localEntryMap.keys
val createdEntryIds = remoteEntryIds.subtract(localEntryIds)
val deletedEntryIds = localEntryIds.subtract(remoteEntryIds)
remoteEntries.forEach { remoteEntry ->
// Add entry
if (remoteEntry.id in createdEntryIds) {
... // Logic to "add" using `remoteEntry`
return@forEach
}
// Update entry
val localEntry = localEntryMap[remoteEntry.id]
?: error("This must not happen.")
... // Logic to "update" using `remoteEntry` and `localEntry`
}
localEntries.asSequence()
.filter { it.id in deletedEntryIds }
.forEach { deletedEntry ->
... // Logic to "delete" using `deletedEntry`
}
}
}
Here, List.associateBy is a function that creates a Map from a List. In other words, localEntries.associateBy(FooModel::id) returns a Map with IDs as keys and FooData as values. Also, subtract is a function that returns the difference of sets (elements included in the receiver but not in the argument).
Are there any improvements that can be made to this code?
Intermediate goals to untangle the knot
Following the flow of this code requires considerable effort. One of the causes is the complex entanglement of data dependencies. For example, the flow to find the list of elements to be added, createdEntryIds, is as follows:
- Find
remoteEntryIdsfromremoteEntries. - Find
localEntryMapfromlocalEntries, and then findlocalEntryIdsfrom it. - Compare
remoteEntryIdsandlocalEntryIdsto findcreatedEntryIds.
Furthermore, since createdEntryIds only holds IDs, it is necessary to obtain FooData to actually add it. Therefore, remoteEntries is used again in the above code. Using the same data multiple times makes the flow of processing complicated.
Additionally, there are problems in the update and delete code due to the complexity of dependencies.
- Problem with update code: Due to dependency on
localEntryMap, it is necessary to define a runtime error that should not occur. - Problem with delete code: Inconsistency with the addition and update code.
To solve such problems, it can be effective to imagine what the ideal intermediate data would be and reverse-engineer the function structure from there. In this case, since data is added, updated, and deleted, having a triplet of createdEntries, updatedEntries, and deletedEntries might simplify the code.
To create this triplet, the following steps can be taken:
- Create a set of all IDs,
allEntryIds, from both local and remote. - Create a pair of local and remote data,
Pair<FooData?, FooData?>, for each ID. - Create a function
partitionByNullityfromSequence<Pair<S?, T?>>to create the triplet. Each element of the triplet is as follows:- 1st element
List<S>: WhenSis non-null andTis null - 2nd element
List<Pair<S, T>>: When bothSandTare non-null - 3rd element
List<T>: WhenSis null andTis non-null
- 1st element
By using this partitionByNullity, the flow of the function can be simplified. The general flow is as follows:
- Create a
Mapof ID to remoteFooData,remoteEntryMap. - Create a
Mapof ID to localFooData,localEntryMap. - Create the items for addition, update, and deletion,
(createdEntries, updatedEntries, deletedEntries), fromremoteEntryMapandlocalEntryMap. - Execute addition, update, and deletion for each of
createdEntries,updatedEntries, anddeletedEntries.
An example implementation is as follows:
fun synchronizeWithRemoteEntries() {
val remoteEntries: List<FooModel> = fooRemoteClient.fetch()
val remoteEntryMap = remoteEntries.associateBy(FooModel::id)
val localEntries: List<FooModel> = fooLocalDao.getAllEntries()
val localEntryMap = localEntries.associateBy(FooModel::id)
val allEntryIds = remoteEntryMap.keys + localEntryMap.keys
val (createdEntries, updatedEntries, deletedEntries) = allEntryIds.asSequence()
.map { id -> remoteEntryMap[id] to localEntryMap[id] }
.partitionByNullity()
createdEntries.forEach { createdEntry ->
... // Logic to "add" using `createdEntry`
}
updatedEntries.forEach { (remoteEntry, localEntry) ->
... // Logic to "update" using `remoteEntry` and `localEntry`
}
deletedEntries.forEach { deletedEntry ->
... // Logic to "delete" using `deletedEntry`
}
}
companion object {
/** ... */
private fun <S : Any, T : Any> Sequence<Pair<S?, T?>>.partitionByNullity():
Triple<List<S>, List<Pair<S, T>>, List<T>> {
val leftEntries: MutableList<S> = mutableListOf()
val bothEntries: MutableList<Pair<S, T>> = mutableListOf()
val rightEntries: MutableList<T> = mutableListOf()
forEach { (left, right) ->
when {
left != null && right == null -> leftEntries += left
left != null && right != null -> bothEntries += left to right
left == null && right != null -> rightEntries += right
else /* left == null && right == null */ -> Unit
}
}
return Triple(leftEntries, bothEntries, rightEntries)
}
}
By doing this, the most important code in synchronizeWithRemoteEntries, the forEach for addition, update, and deletion, can be highlighted.
In a nutshell
When data dependencies are complex, it may be possible to organize them by creating ideal intermediate data.
Keywords: data dependency, function flow, intermediate data structure