Hello, I'm Munetoshi Ishikawa, a mobile client developer for the LINE messaging app.
This article is the latest installment of our weekly series "Improving code quality". For more information about the Weekly Report, please see the first article.
Gordian variables
Suppose that data called FooData
is stored on two storages, local and remote, and you are implementing FooSynchronizer
to synchronize them.
The expected situation is as follows:
- The remote
FooData
is the original, and it is used to update the localFooData
. - The remote and local data can be obtained from
FooLocalDao
andFooRemoteClient
, respectively. - (For simplification of the problem, transactions, etc., do not need to be considered.)
In the function synchronizeWithRemoteEntries
of FooSynchronizer
, the list of IDs of FooData
is compared between local and remote, and additions, updates, and deletions are performed according to the result.
- Addition: Executed when the ID exists only in remote (
id !in localEntryIds && id in remoteEntryIds
istrue
) - Update: Executed when the ID exists in both local and remote (
id in localEntryIds && id in remoteEntryIds
istrue
) - Deletion: Executed when the ID exists only in local (
id in localEntryIds && id !in remoteEntryIds
istrue
)
The FooSynchronizer
is implemented as follows to meet this specification.
class FooSynchronizer(
private val fooRemoteClient: FooRemoteClient,
private val fooLocalDao: FooLocalDao
) {
fun synchronizeWithRemoteEntries() {
val remoteEntries: List<FooModel> = fooRemoteClient.fetch()
val remoteEntryIds = remoteEntries.map(FooModel::id).toSet()
val localEntries: List<FooModel> = fooLocalDao.getAllEntries()
val localEntryMap = localEntries.associateBy(FooModel::id)
val localEntryIds = localEntryMap.keys
val createdEntryIds = remoteEntryIds.subtract(localEntryIds)
val deletedEntryIds = localEntryIds.subtract(remoteEntryIds)
remoteEntries.forEach { remoteEntry ->
// Add entry
if (remoteEntry.id in createdEntryIds) {
... // Logic to "add" using `remoteEntry`
return@forEach
}
// Update entry
val localEntry = localEntryMap[remoteEntry.id]
?: error("This must not happen.")
... // Logic to "update" using `remoteEntry` and `localEntry`
}
localEntries.asSequence()
.filter { it.id in deletedEntryIds }
.forEach { deletedEntry ->
... // Logic to "delete" using `deletedEntry`
}
}
}
Here, List.associateBy
is a function that creates a Map
from a List
. In other words, localEntries.associateBy(FooModel::id)
returns a Map
with IDs as keys and FooData
as values. Also, subtract
is a function that returns the difference of sets (elements included in the receiver but not in the argument).
Are there any improvements that can be made to this code?
Intermediate goals to untangle the knot
Following the flow of this code requires considerable effort. One of the causes is the complex entanglement of data dependencies. For example, the flow to find the list of elements to be added, createdEntryIds
, is as follows:
- Find
remoteEntryIds
fromremoteEntries
. - Find
localEntryMap
fromlocalEntries
, and then findlocalEntryIds
from it. - Compare
remoteEntryIds
andlocalEntryIds
to findcreatedEntryIds
.
Furthermore, since createdEntryIds
only holds IDs, it is necessary to obtain FooData
to actually add it. Therefore, remoteEntries
is used again in the above code. Using the same data multiple times makes the flow of processing complicated.
Additionally, there are problems in the update and delete code due to the complexity of dependencies.
- Problem with update code: Due to dependency on
localEntryMap
, it is necessary to define a runtime error that should not occur. - Problem with delete code: Inconsistency with the addition and update code.
To solve such problems, it can be effective to imagine what the ideal intermediate data would be and reverse-engineer the function structure from there. In this case, since data is added, updated, and deleted, having a triplet of createdEntries
, updatedEntries
, and deletedEntries
might simplify the code.
To create this triplet, the following steps can be taken:
- Create a set of all IDs,
allEntryIds
, from both local and remote. - Create a pair of local and remote data,
Pair<FooData?, FooData?>
, for each ID. - Create a function
partitionByNullity
fromSequence<Pair<S?, T?>>
to create the triplet. Each element of the triplet is as follows:- 1st element
List<S>
: WhenS
is non-null andT
is null - 2nd element
List<Pair<S, T>>
: When bothS
andT
are non-null - 3rd element
List<T>
: WhenS
is null andT
is non-null
- 1st element
By using this partitionByNullity
, the flow of the function can be simplified. The general flow is as follows:
- Create a
Map
of ID to remoteFooData
,remoteEntryMap
. - Create a
Map
of ID to localFooData
,localEntryMap
. - Create the items for addition, update, and deletion,
(createdEntries, updatedEntries, deletedEntries)
, fromremoteEntryMap
andlocalEntryMap
. - Execute addition, update, and deletion for each of
createdEntries
,updatedEntries
, anddeletedEntries
.
An example implementation is as follows:
fun synchronizeWithRemoteEntries() {
val remoteEntries: List<FooModel> = fooRemoteClient.fetch()
val remoteEntryMap = remoteEntries.associateBy(FooModel::id)
val localEntries: List<FooModel> = fooLocalDao.getAllEntries()
val localEntryMap = localEntries.associateBy(FooModel::id)
val allEntryIds = remoteEntryMap.keys + localEntryMap.keys
val (createdEntries, updatedEntries, deletedEntries) = allEntryIds.asSequence()
.map { id -> remoteEntryMap[id] to localEntryMap[id] }
.partitionByNullity()
createdEntries.forEach { createdEntry ->
... // Logic to "add" using `createdEntry`
}
updatedEntries.forEach { (remoteEntry, localEntry) ->
... // Logic to "update" using `remoteEntry` and `localEntry`
}
deletedEntries.forEach { deletedEntry ->
... // Logic to "delete" using `deletedEntry`
}
}
companion object {
/** ... */
private fun <S : Any, T : Any> Sequence<Pair<S?, T?>>.partitionByNullity():
Triple<List<S>, List<Pair<S, T>>, List<T>> {
val leftEntries: MutableList<S> = mutableListOf()
val bothEntries: MutableList<Pair<S, T>> = mutableListOf()
val rightEntries: MutableList<T> = mutableListOf()
forEach { (left, right) ->
when {
left != null && right == null -> leftEntries += left
left != null && right != null -> bothEntries += left to right
left == null && right != null -> rightEntries += right
else /* left == null && right == null */ -> Unit
}
}
return Triple(leftEntries, bothEntries, rightEntries)
}
}
By doing this, the most important code in synchronizeWithRemoteEntries
, the forEach
for addition, update, and deletion, can be highlighted.
In a nutshell
When data dependencies are complex, it may be possible to organize them by creating ideal intermediate data.
Keywords: data dependency
, function flow
, intermediate data structure