Kai Moritz [Sun, 24 Mar 2024 19:34:07 +0000 (20:34 +0100)]
fix: GREEN - Postponed the resetting of the sink
* The completion of the sink of a `ChatRoomData` instance that belongs
  to a revoked partition is postponed until the next partition-
  assignment happens.
* Also, the sink is only completed and recreated if the partition it
  belongs to was not assigned to the instance again during
  `onPartitionsAssigned()`.
* That way, listeners have a little more time to receive messages that
  were sent shortly before the deactivation, and they are not disturbed
  unnecessarily if the partition is reassigned to the same instance
  again. (A sketch of this deferral follows after this list.)
* *TODO:* Listeners still have to be enabled to resume listening on the
  responsible instance where they left off on the old instance, if they
  were too slow to receive all messages before the sink was completed.
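A minimal sketch of this deferral, based on a plain `ConsumerRebalanceListener`
and Reactor sinks; the map of sinks and the replay size are illustrative
assumptions, not the actual `DataChannel` implementation:

[source,java]
----
import java.util.*;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;
import reactor.core.publisher.Sinks;

class PostponedSinkReset implements ConsumerRebalanceListener
{
  private final Map<TopicPartition, Sinks.Many<String>> sinks = new HashMap<>();
  private final Set<TopicPartition> pendingReset = new HashSet<>();

  @Override
  public void onPartitionsRevoked(Collection<TopicPartition> revoked)
  {
    // Only remember the revoked partitions -- their sinks stay open for now.
    pendingReset.addAll(revoked);
  }

  @Override
  public void onPartitionsAssigned(Collection<TopicPartition> assigned)
  {
    // Partitions that were assigned to this instance again keep their sink.
    pendingReset.removeAll(assigned);
    // Sinks of partitions that stayed revoked are completed and recreated now.
    for (TopicPartition partition : pendingReset)
    {
      Sinks.Many<String> sink = sinks.get(partition);
      if (sink != null)
      {
        sink.tryEmitComplete();
        sinks.put(partition, Sinks.many().replay().limit(10));
      }
    }
    pendingReset.clear();
  }
}
----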
Kai Moritz [Mon, 11 Mar 2024 17:32:46 +0000 (18:32 +0100)]
test: HandoverIT-POC - GREEN: The `TestListener` automatically reconnects
* If a rebalance happens, an instance that is no longer responsible
  for a specific chat-room closes the connections to listeners of that
  chat-room.
* Hence, the `TestListener`, like the real frontend, has to reconnect
  when that happens, in order to see newly sent messages (see the
  sketch below).
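A sketch of such a reconnect with `WebClient` and Reactor's `repeat()`; the
base-URL, the endpoint path and the plain `String` payload are assumptions for
illustration, not the actual `TestListener` code:

[source,java]
----
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;

class ReconnectingListener
{
  private final WebClient webClient = WebClient.create("http://localhost:8080");

  Flux<String> listen(String chatRoomId)
  {
    return webClient
        .get()
        .uri("/{chatRoomId}/listen", chatRoomId)
        .retrieve()
        .bodyToFlux(String.class)
        .repeat(); // re-subscribe after the server closes the stream (e.g. on a rebalance)
  }
}
----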
Kai Moritz [Fri, 8 Mar 2024 08:14:14 +0000 (09:14 +0100)]
test: HandoverIT-POC - RED - Started a second backend
* A second backend is started after the writers and the listener have
  been instantiated.
* The test fails, because the listener does not see messages that are
  sent by the writers after the rebalance finishes.
* Also adjusted the durations that the setup waits for newly
  started backend-instances.
Kai Moritz [Wed, 20 Mar 2024 21:02:09 +0000 (22:02 +0100)]
fix: GREEN - Dynamic changes to the sharding-map are synced back to disk
* HAProxy distinguishes managed and unmanaged maps.
** Only maps that are loaded from the directory `maps`, which lives in the
   config-directory of HAProxy, are treated as managed.
** Only managed maps are listed by default by the Data Plane API.
** And last but not least, only managed maps _can be synced back to disk_,
   which makes dynamic changes persistent, so that they survive
   a restart or reload of HAProxy.
* Also, only managed maps can be referenced by their _name_ as well as by
  their id.
* The name of a managed map is the name of the file, _without_ the path and
  without the suffix `.map`.
* What makes this more difficult for an HAProxy running inside a
  docker-container is that this mechanism only seems to work if the
  config-directory is identical to the default `/etc/haproxy`, which _is
  not_ the case in the default configuration of the official image that
  is used.
* Hence, changing the config-directory to the default (in this case)
  enables managed maps, which is crucial for the setup of the Handover-IT:
  the IT has to reload the HAProxy-process in order to let it detect newly
  added backend-instances, which are otherwise ignored, because name-
  resolution inside the docker-network only works _after_ an instance is
  started -- _without this change_, the dynamically made changes to the
  sharding-map are lost during the reload that is performed shortly
  afterwards.
* The refined configuration for HAProxy enables the following refinements
  of the `HaproxyDataPlaneApiShardingPublisherStrategy` (see the sketch
  after this list):
** Because the sharding-map is managed now, applied changes can be *synced
   back to disk* forcibly, which is not possible for unmanaged maps.
** That means: *Dynamically made changes survive a reload/restart!*
** Also, because the sharding-map is managed now, it can simply be
   referenced by its name `sharding`.
** Therefore, the `HaproxyDataPlaneApiShardingPublisherStrategyIT` no
   longer has to detect the ID of the map in order to reference it.
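A sketch of how such an update with a forced sync back to disk can look when
the Data Plane API is called via `WebClient`; the base-URL, the credentials,
the request body and the exact endpoint layout (including the `force_sync`
parameter) are assumptions and should be checked against the Data Plane API
version that is actually used:

[source,java]
----
import java.util.Map;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

class ShardingMapSyncSketch
{
  private final WebClient dataPlaneApi = WebClient
      .builder()
      .baseUrl("http://haproxy:5555/v2")
      .defaultHeaders(headers -> headers.setBasicAuth("admin", "adminpwd"))
      .build();

  Mono<Void> publishOwnership(int shard, String instanceId)
  {
    return dataPlaneApi
        .put()
        .uri(uri -> uri
            .path("/services/haproxy/runtime/maps_entries/{key}")
            .queryParam("map", "sharding")    // managed maps can be referenced by name
            .queryParam("force_sync", "true") // persist the change in the file on disk
            .build(Integer.toString(shard)))
        .bodyValue(Map.of("value", instanceId))
        .retrieve()
        .bodyToMono(Void.class);
  }
}
----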
Kai Moritz [Wed, 20 Mar 2024 22:02:22 +0000 (23:02 +0100)]
test: RED - Updated keys must survive, if HAProxy reloads
* The Handover-IT has to signal HAProxy to reload, after a new backend has
  been started for the first time.
* Otherwise, HAProxy is not able to detect newly started backend-instances,
  because with docker, the name of a backend cannot be resolved before it
  is started.
* This test formulates the expectation that dynamic changes, which were
  applied with the help of the Data Plane API, survive a reload.
* *The test fails*, because HAProxy restores all maps from disk when it
  reloads, but the dynamically made changes are never synced back to disk.
Kai Moritz [Sun, 17 Mar 2024 09:48:17 +0000 (10:48 +0100)]
test: HandoverIT-POC - FIX: writers/listeners must send the `X-shard`-header
* `@Disabled` the Handover-IT temporarily, because the fix disclosed
  that some further fixes and refactorings are needed before the IT can
  work again.
* That is because the implemented `HaproxyShardingPublisherStrategy`
  does _not_ properly update the sharding-map.
Kai Moritz [Fri, 15 Mar 2024 15:10:31 +0000 (16:10 +0100)]
test: HandoverIT-POC - FIX: `TestWriter` must use `Flux#concatMap()`
* `Flux#flatMap()` subscribes to the inner publishers concurrently. Hence,
  the enqueued messages are _not_ sent in the order in which they were
  enqueued.
* Therefore, the `TestWriter` was refactored to use `Flux#concatMap()`,
  which processes them one after another and preserves the order (see the
  sketch below).
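A small illustration of the difference (not the actual `TestWriter` code; the
simulated latencies just make the reordering visible):

[source,java]
----
import java.time.Duration;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class OrderingDemo
{
  // Stands in for the HTTP request that sends a message.
  static Mono<String> send(String message, Duration latency)
  {
    return Mono.just(message).delayElement(latency);
  }

  public static void main(String[] args)
  {
    // flatMap(): both inner publishers run concurrently, so the slow "first"
    // message is overtaken by the fast "second" one.
    System.out.println(Flux
        .just("first", "second")
        .flatMap(m -> send(m, m.equals("first") ? Duration.ofMillis(100) : Duration.ofMillis(10)))
        .collectList()
        .block()); // likely [second, first]

    // concatMap(): each inner publisher completes before the next is subscribed.
    System.out.println(Flux
        .just("first", "second")
        .concatMap(m -> send(m, m.equals("first") ? Duration.ofMillis(100) : Duration.ofMillis(10)))
        .collectList()
        .block()); // always [first, second]
  }
}
----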
Kai Moritz [Mon, 11 Mar 2024 12:08:21 +0000 (13:08 +0100)]
fix: GREEN - Implemented activation/deactivation of `ChatRoomData`
* Introduced `volatile ChatRoomData#active`, which initially is `false`.
* `ChatRoomData#listen()` throws `ChatRoomInactiveException` if inactive.
* `ChatRoomData#addMessage(..)` throws `ChatRoomInactiveException` if
inactive.
* `SimpleChatHomeService` explicitly activates restored and newly created
instances of `ChatRoomData`.
* `DataChannel` explicitly activates instances of `ChatRoomData`, if
  they are restored during partition-assignment or if a new chat-room
  is created.
* `DataChannel` explicitly _deactivates_ instances of `ChatRoomData`,
  if the associated partition is revoked (a sketch of this guard follows
  after this list).
* Also: Introduced `ChatMessageService#getChatRoomId()`.
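A sketch of the activation/deactivation guard (simplified; the real
`ChatRoomData` holds more state, and the message type and exception used here
are placeholders):

[source,java]
----
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.core.publisher.Sinks;

class ChatRoomDataSketch
{
  private final Sinks.Many<String> sink = Sinks.many().replay().limit(10);
  private volatile boolean active = false;

  void activate()   { active = true;  }
  void deactivate() { active = false; }

  Flux<String> listen()
  {
    if (!active)
      throw new IllegalStateException("chat-room is inactive"); // stands in for ChatRoomInactiveException
    return sink.asFlux();
  }

  Mono<String> addMessage(String text)
  {
    if (!active)
      return Mono.error(new IllegalStateException("chat-room is inactive"));
    sink.tryEmitNext(text);
    return Mono.just(text);
  }
}
----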
Kai Moritz [Mon, 11 Mar 2024 10:59:50 +0000 (11:59 +0100)]
test: RED - Formulated expectations for active vs. inactive `ChatRoomData`
* Introduced the methods `ChatRoomData#activate()` and
  `ChatRoomData#deactivate()`.
* Added tests to `ChatRoomDataTest` that assert the expectations.
* Added tests to `AbstractConfigurationIT` that assert the expectations.
Kai Moritz [Sat, 9 Mar 2024 10:17:33 +0000 (11:17 +0100)]
fix: GREEN - `ChatRoomData` obeys the added expectations.
* Switched `ChatRoomData` from a multicast- to a replay-sink.
* Before, listening was implemented with a multicast-sink that supported
  back-pressure.
* Now, it uses a replay-sink that enables a (configurable) limited replay
  (see the sketch below).
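A comparison of the two sink flavours (the buffer sizes are made-up examples):
the multicast-sink only buffers for back-pressure, so subscribers that arrive
after the first one start at "now", while the replay-sink re-delivers a bounded
history to late subscribers.

[source,java]
----
import reactor.core.publisher.Sinks;

public class SinkFlavours
{
  public static void main(String[] args)
  {
    // Old behaviour: multicast with a back-pressure buffer -- no replay.
    Sinks.Many<String> multicast = Sinks.many().multicast().onBackpressureBuffer(8);
    multicast.asFlux().subscribe(m -> System.out.println("early multicast: " + m));
    multicast.tryEmitNext("hello");
    multicast.asFlux().subscribe(m -> System.out.println("late multicast: " + m)); // sees nothing

    // New behaviour: replay with a limited history -- late listeners catch up.
    Sinks.Many<String> replay = Sinks.many().replay().limit(8);
    replay.asFlux().subscribe(m -> System.out.println("early replay: " + m));
    replay.tryEmitNext("hello");
    replay.asFlux().subscribe(m -> System.out.println("late replay: " + m)); // still sees "hello"
  }
}
----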
Kai Moritz [Fri, 8 Mar 2024 17:10:43 +0000 (18:10 +0100)]
test: RED - Added test for multiple parallel Listeners to `ChatRoomDataTest`
* The test formulates the expectation that late listeners should see all
  messages (for a reasonably long period before their subscription).
* The test _fails_, because the implementation only buffers messages for
  back-pressure -- _not for replay!_
* If the request that adds a message failed, the lengthy process of
  creating new chat-rooms, until one has the correct shard, was
  unnecessarily repeated.
* Different workarounds have to be applied at the same time for the
  different implementations:
** Because `ShardedChatHomeService` only throws a `ShardNotOwnedException`
   if the shard that is picked by the strategy is not owned, the request
   is retried until it succeeds by chance.
** `KafkaChatHomeService` always creates the chat-room in the
   partition that is derived from the randomly picked id.
   Hence, the loop is only left if the randomly picked partition
   matches partition `2`, which the test-instance owns.
** Since `SimpleChatHomeService` does not know the concept of sharding
   at all, the loop is also left if no shard is set (the shard is `null`).
* The first request in the awaited assertion must create a chat-room
  that is owned by the instance.
* Hence, _it has to_ assert that the shard of the created chat-room is
  `2` -- the only shard the test-instance owns -- or _empty_, if the
  instance under test does not implement sharding (see the sketch after
  this list).
Kai Moritz [Fri, 8 Mar 2024 11:27:15 +0000 (12:27 +0100)]
test: Refactored `ChatRoomDataTest` - made mocking more clear
* When the results of calls to `ChatMessageService` are mocked, the
  returned message naturally does _not_ reflect the parameters of the
  call.
* Hence, an arbitrary message is used whenever the test only asserts
  that a value returned by `ChatMessageService` is handed through by
  `ChatRoomData` as expected.
Kai Moritz [Thu, 7 Mar 2024 16:43:30 +0000 (17:43 +0100)]
test: HandoverIT-POC - Refactored the startup of backend-containers
* The backend-containers are explicitly started during the test.
* When a backend is started, the test waits until the started backend
  reports its status as `UP`, _and_ until all writers are able to
  send messages again afterwards (see the sketch below).
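A sketch of the "wait until the backend reports `UP`" part with a Testcontainers
wait-strategy; the image name, the exposed port and the actuator path are
assumptions:

[source,java]
----
import java.time.Duration;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.wait.strategy.Wait;

class BackendStartupSketch
{
  GenericContainer<?> startBackend(String instanceName)
  {
    GenericContainer<?> backend = new GenericContainer<>("juplo/chat-backend:latest")
        .withNetworkAliases(instanceName)
        .withExposedPorts(8080)
        .waitingFor(Wait
            .forHttp("/actuator/health")
            .forStatusCode(200)                      // the backend reports itself as UP
            .withStartupTimeout(Duration.ofSeconds(60)));
    backend.start();
    return backend;
  }
}
----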
Kai Moritz [Wed, 6 Mar 2024 14:34:20 +0000 (15:34 +0100)]
refactor: Simplified the configuration for the kafka-services
* Removed class `ChannelTaskRunner` and `KafkaServicesApplicationRunner`.
* Instead, the method `ChannelTaskExecutor.executeChannelTask()` is executed
  as `@Bean.initMethod` by Spring (see the sketch after this list).
* Adapted the test-cases accordingly:
** Joining the channel-tasks is no longer necessary, because that is
   done by the imported production-config.
** `KafkaConfigurationIT` has to call `executeChannelTasks()`
explicitly
** Therefore, it has to overrule the default-config for the bean
`dataChannelTaskExecutor` in order to drop the configuration of the
`initMethod`.
** Otherwise, the test might not restore the data from the topic,
   because the messages that are sent into the test-cluster might arrive
   only after the initial loading of the data is done.
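A sketch of the simplified wiring (the bean name is taken from this entry, the
rest of the class is a stand-in to keep the example self-contained): Spring
itself starts the channel-task as the bean's init-method and joins it via
`@PreDestroy` on shutdown, so no dedicated `ApplicationRunner` is needed.

[source,java]
----
import jakarta.annotation.PreDestroy;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class KafkaServicesConfigurationSketch
{
  @Bean(initMethod = "executeChannelTask")
  ChannelTaskExecutor dataChannelTaskExecutor()
  {
    return new ChannelTaskExecutor();
  }

  // Stand-in for the real class, just to make the sketch compile.
  static class ChannelTaskExecutor
  {
    public void executeChannelTask()
    {
      // start the channel's consumer-loop asynchronously
    }

    @PreDestroy
    public void join()
    {
      // wait for the consumer-loop to finish
    }
  }
}
----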
Kai Moritz [Wed, 6 Mar 2024 09:07:53 +0000 (10:07 +0100)]
refactor: Simplified shutdown - channel-tasks were joined multiple times
* `KafkaServicesApplicationRunner` does not have to join the channel-tasks.
* The channel-tasks are already joined by `ChannelTaskExecutor.join()`
automatically, because the method is annotated with `@PreDestroy`.
* Simplified the test-configuration accordingly.
Kai Moritz [Wed, 6 Mar 2024 07:26:03 +0000 (08:26 +0100)]
fix: The shutdown of the application was blocked
* The auto-configured bean `applicationTaskExecutor` must not block while
  it is shutting down, because otherwise it waits infinitely for the
  completion of the channel-tasks, which are stopped in a _later_ phase
  of the "smart" lifecycle.
* The bean is destroyed first, because it is associated with the
  lifecycle-phase `Integer.MAX_VALUE`, which apparently cannot be
  overruled by `@DependsOn` (although suggested by the spring-
  documentation).
* Joining the channel-tasks was blocking infinitely, because the tasks were
  waiting for the kafka-consumers to be closed, which only happens _after_
  the joining completes.
* Renamed attributes and method-names according to the class-renames.
* Introduced interface `Channel` and `enum ChannelState`.
* `Data`- and `InfoChannel` maintain a `ChannelState`, instead of just a
  plain boolean that only reflects the loading-state.
* The `ChannelTaskRunner` waits until both channels have entered the state
  `ChannelState.SHUTTING_DOWN` (see the sketch after this list).
* Renamed and moved `LoadInProgressException`
** Moved exception into implementation-specific package
** Renamed exception to `ChannelNotReadyException`
* Renamed `ConsumerTaskExecutor` into `ChannelTaskExecutor`
* Renamed `ConsumerTaskRunner` into `ChannelTaskRunner`
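A sketch of the state handling described above; only `SHUTTING_DOWN` is named
in this entry, the other states and the waiting loop are illustrative guesses:

[source,java]
----
enum ChannelState { STARTING, LOAD_IN_PROGRESS, READY, SHUTTING_DOWN }

interface Channel
{
  ChannelState getChannelState();
}

class ChannelShutdownAwait
{
  // Block until both channels have entered SHUTTING_DOWN.
  static void awaitShutdown(Channel infoChannel, Channel dataChannel)
      throws InterruptedException
  {
    while (infoChannel.getChannelState() != ChannelState.SHUTTING_DOWN
        || dataChannel.getChannelState() != ChannelState.SHUTTING_DOWN)
    {
      Thread.sleep(100);
    }
  }
}
----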
Kai Moritz [Sun, 3 Mar 2024 09:08:36 +0000 (10:08 +0100)]
test: HandoverIT-POC - Working fix: using `delayElements()`
* Switched from `Flux.flatMap(Mono.delay()..)` to
  `Flux.from(..).delayElements()`.
* This delays each element of the `Flux` by the same amount.
* The requests are made when the corresponding element of the `Flux` is
  emitted -- not when the `Flux` is created, as before (see the sketch
  below).
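An illustration of the difference between the two approaches (not the actual
`TestWriter` code):

[source,java]
----
import java.time.Duration;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class DelayDemo
{
  public static void main(String[] args)
  {
    // Working fix: delayElements() delays each element as it travels through
    // the pipeline, so a request is made when its element is emitted.
    Flux
        .just("a", "b", "c")
        .delayElements(Duration.ofMillis(500))
        .doOnNext(message -> System.out.println("sending " + message))
        .blockLast();

    // Non-working variant: flatMap() subscribes to all inner delays at once,
    // so they all start ticking at subscription time and the three messages
    // are sent (nearly) simultaneously after 500 ms.
    Flux
        .just("a", "b", "c")
        .flatMap(message -> Mono
            .delay(Duration.ofMillis(500))
            .thenReturn(message))
        .doOnNext(message -> System.out.println("sending " + message))
        .blockLast();
  }
}
----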
Kai Moritz [Sat, 2 Mar 2024 17:21:44 +0000 (18:21 +0100)]
test: HandoverIT-POC - Not working fix: using `delay()`
* Switched from `Mono.from(..).delayElement()` to `Mono.delay().then()`.
* This does _not_ solve the problem that all delays are calculated and
  scheduled when the `Flux` is created.
Kai Moritz [Sun, 3 Mar 2024 09:20:11 +0000 (10:20 +0100)]
WIP:test: HandoverIT-POC - Executing Flux...
* Dropped the waiting for `TestListener` altogether.
* The waiting can be dropped, because waiting for the `TestWriter`-instances
  ensures that all messages are sent (and therefore very likely received).
Kai Moritz [Wed, 28 Feb 2024 10:50:11 +0000 (11:50 +0100)]
fix: Fixed `ConcurrentModificationException` when accessing a chat-room
* If a new chat-room was created, `InfoChannel` only reacted with the
  creation of the corresponding `ChatRoomInfo`-instance.
* The creation of the accompanying `ChatRoomData`-instance through
  `DataChannel` was postponed until the new chat-room was accessed for the
  first time.
* That way, `InfoChannel` did not need to know `DataChannel`, so that a
  cyclic dependency could be avoided.
* As a downside, this approach was open to a race-condition: if several
  accesses to the newly created chat-room happened in parallel, a
  `ConcurrentModificationException` was thrown, since the instance of
  `ChatRoomData` was created multiple times in parallel.
* To avoid the locking that would be necessary to prevent this race-
  condition, the approach was refactored, so that `InfoChannel` now
  explicitly triggers the creation of the `ChatRoomData`-instance.
* To do so without introducing a cyclic dependency, the class
  `ChannelMediator` was introduced, so that `InfoChannel` and `DataChannel`
  need not know each other.
Kai Moritz [Wed, 28 Feb 2024 10:14:32 +0000 (11:14 +0100)]
refactor: Introduced `ChannelMediator`
* `InfoChannel` and `DataChannel` must not know each other directly.
* This is necessary to prevent a cyclic dependency that would otherwise
  be introduced, if `InfoChannel` also has to communicate with
  `DataChannel` (a sketch of the mediator wiring follows below).
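A sketch of the mediator wiring; only the class name `ChannelMediator` is taken
from this entry, the method names and signatures are illustrative guesses:

[source,java]
----
import java.util.UUID;

class ChannelMediatorSketch
{
  interface InfoChannel {}

  interface DataChannel
  {
    void createChatRoomData(UUID chatRoomId, int shard);
  }

  private InfoChannel infoChannel;
  private DataChannel dataChannel;

  // Both channels are wired to the mediator, not to each other.
  void setInfoChannel(InfoChannel infoChannel) { this.infoChannel = infoChannel; }
  void setDataChannel(DataChannel dataChannel) { this.dataChannel = dataChannel; }

  // Called by InfoChannel when it sees a newly created chat-room, so that
  // DataChannel can create the accompanying ChatRoomData-instance right away.
  void chatRoomCreated(UUID chatRoomId, int shard)
  {
    dataChannel.createChatRoomData(chatRoomId, shard);
  }
}
----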