[CONTROLLER-1333] Clustering : Performance issues in BGP scale test Created: 22/May/15  Updated: 10/Jun/15  Resolved: 10/Jun/15

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: Post-Helium
Fix Version/s: None

Type: Bug
Reporter: Dana Kutenicsova Assignee: Moiz Raja
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Duplicate
is duplicated by CONTROLLER-1072 Clustering: akka.pattern.AskTimeoutEx... Resolved
External issue ID: 3340

 Description   

After connecting BGP to a feed, it times out after about 3 minutes with AskTimeoutExceptions. The build is distribution-karaf-0.3.0-20150522.085838-1738. I'll send steps to reproduce in an email. Here's the exception:

2015-05-22 07:48:52,740 | INFO | oupCloseable-3-1 | AbstractBGPSessionNegotiator | 227 - org.opendaylight.bgpcep.bgp-rib-impl - 0.4.0.SNAPSHOT | BGP Session established successfully.
2015-05-22 07:53:15,620 | ERROR | lt-dispatcher-39 | LocalThreePhaseCommitCohort | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Failed to prepare transaction member-1-txn-6043 on backend
akka.pattern.AskTimeoutException: Ask timed out on ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-operational/member-1-shard-default-operational#-1012054239)] after [30000 ms]
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)[154:com.typesafe.akka.actor:2.3.10]
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)[154:com.typesafe.akka.actor:2.3.10]
at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)[151:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b]
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)[151:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b]
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)[154:com.typesafe.akka.actor:2.3.10]
at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)[154:com.typesafe.akka.actor:2.3.10]
at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)[154:com.typesafe.akka.actor:2.3.10]
at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)[154:com.typesafe.akka.actor:2.3.10]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_79]
2015-05-22 07:53:15,621 | ERROR | lt-dispatcher-38 | LocalThreePhaseCommitCohort | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Failed to prepare transaction member-1-txn-4520 on backend
akka.pattern.AskTimeoutException: Ask timed out on ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-operational/member-1-shard-default-operational#-1012054239)] after [30000 ms]
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)[154:com.typesafe.akka.actor:2.3.10]
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)[154:com.typesafe.akka.actor:2.3.10]
at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)[151:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b]
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)[151:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b]
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)[154:com.typesafe.akka.actor:2.3.10]
at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)[154:com.typesafe.akka.actor:2.3.10]
at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)[154:com.typesafe.akka.actor:2.3.10]
at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)[154:com.typesafe.akka.actor:2.3.10]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_79]
2015-05-22 07:53:15,622 | WARN | lt-dispatcher-23 | ConcurrentDOMDataBroker | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Tx: DOM-CHAIN-6-2369 Error during phase CAN_COMMIT, starting Abort
akka.pattern.AskTimeoutException: Ask timed out on ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-operational/member-1-shard-default-operational#-1012054239)] after [30000 ms]
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)[154:com.typesafe.akka.actor:2.3.10]
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)[154:com.typesafe.akka.actor:2.3.10]
at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)[151:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b]
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)[151:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b]
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)[154:com.typesafe.akka.actor:2.3.10]
at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)[154:com.typesafe.akka.actor:2.3.10]
at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)[154:com.typesafe.akka.actor:2.3.10]
at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)[154:com.typesafe.akka.actor:2.3.10]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_79]



 Comments   
Comment by Moiz Raja [ 27/May/15 ]

I enabled debug logging, grepped for TimeoutException, and then grepped for some of the transactions that were failing. I see a pattern (which I cannot explain yet): most of the transactions that fail with the timeout do so because the FindPrimary request fails with a timeout.

See this snippet -> http://pastebin.com/yBh2nrSs

A couple of curious things:

a. The AskTimeoutException shows the actor to be the shard.
b. The timeout is 30 seconds, which is the same as the commit timeout.

This does not make sense because the find primary request is sent to the ShardManager, which is the parent of the Shard.

Comment by Moiz Raja [ 27/May/15 ]

I reran the test and now I see the timeouts at 5 seconds, with the actor still being the shard - puzzling. I'm wondering if there is some mixup happening here and the timeout is being delivered to the wrong handler.

Comment by Moiz Raja [ 27/May/15 ]

Based on what I am seeing, I think the following patch (merged May 26th) will help, because it avoids looking up the primary shard unless the ShardManager actually removes it from the PrimaryShardInfo cache.

https://git.opendaylight.org/gerrit/#/c/20571/

Dana, if you could try this out, it would be good. I'm hitting some trouble building an integration distribution; otherwise I would have tried it myself.
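
Conceptually, the caching behavior that patch relies on looks something like the sketch below. This is illustrative only - findPrimaryShardAsync, primaryShardInfoCache, shardManagerActor, findPrimaryTimeout, and mapReplyToPrimaryShardInfo are stand-in names, not the actual controller code:

scala.concurrent.Future<PrimaryShardInfo> findPrimaryShardAsync(final String shardName) {
    // Hedged sketch: steady-state transactions hit the cache and never send
    // FindPrimary; the entry is only dropped when the ShardManager explicitly
    // removes it (e.g. on a leader change).
    scala.concurrent.Future<PrimaryShardInfo> cached =
            primaryShardInfoCache.getIfPresent(shardName);
    if (cached != null) {
        return cached; // cache hit: no message to the ShardManager
    }
    // Cache miss: ask the ShardManager (the Shard's parent) to locate the
    // primary. Mapping the raw reply to PrimaryShardInfo is elided here.
    scala.concurrent.Future<Object> reply = akka.pattern.Patterns.ask(
            shardManagerActor, new FindPrimary(shardName), findPrimaryTimeout);
    return mapReplyToPrimaryShardInfo(reply, shardName);
}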

Comment by Tom Pantelis [ 27/May/15 ]

That is odd. What's even odder is that the ActorContext uses the shard-initialization-timeout for FindPrimary, which is set to 5 min.

Did you have ActorContext and ShardManager debug on as well? If the PrimaryShardInfo future wasn't cached yet then we should've seen the "Sending message FindPrimary..." output from ActorContext. It seems the PrimaryShardInfo future was not yet cached b/c the "Find primary for shard default failed" output was logged from an akka dispatcher thread, which indicates it was invoked via the OnComplete callback. The AbstractTransactionContextFactory first checks Future#isCompleted to elide the async OnComplete wait on the Future, so if it was already completed then the output would've been logged on the "oupCloseable-3-1" thread.
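
For reference, that fast-path/slow-path split looks roughly like the following. This is a minimal sketch, not the actual controller code - PrimaryShardInfo, onFindPrimaryShardSuccess, and LOG are stand-ins:

void resolvePrimaryShard(scala.concurrent.Future<PrimaryShardInfo> cachedFuture,
        scala.concurrent.ExecutionContext dispatcher) {
    if (cachedFuture.isCompleted() && cachedFuture.value().get().isSuccess()) {
        // Fast path: runs synchronously on the calling thread
        // (e.g. "oupCloseable-3-1" in the logs above).
        onFindPrimaryShardSuccess(cachedFuture.value().get().get());
    } else {
        // Slow path: the callback fires later on an akka dispatcher thread -
        // which is how the thread name in the log identifies the path taken.
        cachedFuture.onComplete(new akka.dispatch.OnComplete<PrimaryShardInfo>() {
            @Override
            public void onComplete(Throwable failure, PrimaryShardInfo info) {
                if (failure != null) {
                    LOG.debug("Find primary for shard default failed", failure);
                } else {
                    onFindPrimaryShardSuccess(info);
                }
            }
        }, dispatcher);
    }
}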

Getting back to the AskTimeoutException indicating the "default" shard actor path, a possible explanation is that the call to actorSelection in onPrimaryShardFound is what threw the AskTimeoutException. However, where did the 30000 ms timeout come from? Also, the "findPrimaryShardAsync received ..." debug output should've been seen (assuming you had ActorContext debug on).

In the pastebin, the NoOpTransactionContext didn't print the AskTimeoutException stack trace - possibly b/c it had already been printed too many times earlier. Hopefully that would give an indication as to the source of the ask failure.

Comment by Moiz Raja [ 27/May/15 ]

Since bugzilla won't let me attach it, I've uploaded the whole debug-enabled log file here: https://cisco.box.com/s/eesdiopbocetz2qkv4orjy3inknde7v1.

Note that in this log you will see Akka timeouts happening at 5 seconds but the actor is still the Shard.

Comment by Tom Pantelis [ 27/May/15 ]

Notice the time interval between these 2 log entries for tx member-1-txn-4668:

2015-05-26 21:11:39,003 | DEBUG | oupCloseable-3-1 | SingleCommitCohortProxy | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Tx member-1-txn-4668 canCommit
2015-05-26 21:18:51,324 | DEBUG | ult-dispatcher-3 | bstractTransactionContextFactory | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Tx member-1-txn-4668: Find primary for shard default failed

Over 7 minutes passed between the time it was created and readied by the caller and the time it failed to find the primary shard.

The ActorContext sent out 2 FindPrimary messages ~9 sec after the tx was created:

2015-05-26 21:11:48,516 | DEBUG | lt-dispatcher-39 | ActorContext | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Sending message class org.opendaylight.controller.cluster.datastore.messages.FindPrimary to Actorakka://opendaylight-cluster-data/user/shardmanager-operational#967672581
2015-05-26 21:11:48,517 | DEBUG | ult-dispatcher-5 | ActorContext | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Sending message class org.opendaylight.controller.cluster.datastore.messages.FindPrimary to Actorakka://opendaylight-cluster-data/user/shardmanager-operational#967672581

I assume 1 of them was for this tx. It appears the 9 sec gap is primarily due to waiting for the previous ready Future to complete. The requests were processed by the ShardManager and the responses came back quickly to the ActorContext:

2015-05-26 21:11:48,517 | DEBUG | lt-dispatcher-39 | ShardManager | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | shard-manager-operational: Found primary for default: LocalPrimaryShardFound [primaryPath=akka.tcp://opendaylight-cluster-data@127.0.0.1:2550/user/shardmanager-operational/member-1-shard-default-operational#-701003399, localShardDataTree=org.opendaylight.yangtools.yang.data.impl.schema.tree.InMemoryDataTree@58bde7ca]
2015-05-26 21:11:48,517 | DEBUG | lt-dispatcher-39 | ShardManager | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | shard-manager-operational: Found primary for default: LocalPrimaryShardFound [primaryPath=akka.tcp://opendaylight-cluster-data@127.0.0.1:2550/user/shardmanager-operational/member-1-shard-default-operational#-701003399, localShardDataTree=org.opendaylight.yangtools.yang.data.impl.schema.tree.InMemoryDataTree@58bde7ca]

2015-05-26 21:11:48,517 | DEBUG | ult-dispatcher-5 | ActorContext | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | findPrimaryShardAsync received: LocalPrimaryShardFound [primaryPath=akka.tcp://opendaylight-cluster-data@127.0.0.1:2550/user/shardmanager-operational/member-1-shard-default-operational#-701003399, localShardDataTree=org.opendaylight.yangtools.yang.data.impl.schema.tree.InMemoryDataTree@58bde7ca]
2015-05-26 21:11:48,517 | DEBUG | lt-dispatcher-33 | ActorContext | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | findPrimaryShardAsync received: LocalPrimaryShardFound [primaryPath=akka.tcp://opendaylight-cluster-data@127.0.0.1:2550/user/shardmanager-operational/member-1-shard-default-operational#-701003399, localShardDataTree=org.opendaylight.yangtools.yang.data.impl.schema.tree.InMemoryDataTree@58bde7ca]

So the FindPrimary message itself was quick. Also, it succeeded, so the AskTimeoutEx didn't come from that message. So what accounts for the 7 min gap between "findPrimaryShardAsync received" and the AbstractTransactionContextFactory receiving the Future callback? There could've been a long GC pause (for that long?). Or it could've taken that long for a thread to become available to process the OnComplete callback (not likely). But neither of those accounts for the AskTimeoutEx.

I suspect it was onFindPrimaryShardFound that took 7 min. The only candidates I see are the call to actorSelection and the call to onShardInfoUpdated while holding the lock. The latter shouldn't take that long even if the lock was contended. The call to actorSelection looks suspicious. I'm not sure what it does internally - maybe it tries to "ping" the shard actor but times out and throws the AskTimeoutEx - again, that would explain the presence of the shard actor path in the AskTimeoutEx. It would be interesting to put a try/catch around actorSelection.
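
Something along these lines, purely as a diagnostic (the surrounding names - primaryShardPath, onPrimaryShardFound, localShardDataTree - are assumptions, not the real code):

try {
    // Wrap the suspect call so the log shows whether actorSelection itself
    // is what throws the AskTimeoutException.
    ActorSelection selection = actorSystem.actorSelection(primaryShardPath);
    onPrimaryShardFound(selection, localShardDataTree);
} catch (RuntimeException e) {
    LOG.error("actorSelection({}) threw unexpectedly", primaryShardPath, e);
    throw e;
}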

I agree that https://git.opendaylight.org/gerrit/#/c/20571/ should help here. It should alleviate the excessive FindPrimary messages.

Comment by Moiz Raja [ 27/May/15 ]

Another curious thing about most of the transactions failing with AskTimeout: there were at most 1 or 2 modifications in those transactions. Why would that be, when most transactions seem to have thousands?

Comment by Moiz Raja [ 27/May/15 ]

Tom, I don't think that actorSelection taking a long time is the root cause. Creating an actorSelection is supposed to be cheap. Using an actor selection is more expensive than an ActorRef. If we go with your theory then actorSelection would have to do an ask to the actor and wait for a response.

Note that the timeouts we are seeing - 5s/30s - are the kinds of timeouts we have set.

I think there is something else going on here.

Also note that a bunch of consecutive transactions failed:

550777:2015-05-26 21:18:51,324 | DEBUG | ult-dispatcher-3 | bstractTransactionContextFactory | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Tx member-1-txn-4668: Find primary for shard default failed
550905:2015-05-26 21:19:01,529 | DEBUG | ult-dispatcher-3 | bstractTransactionContextFactory | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Tx member-1-txn-4669: Find primary for shard default failed
551017:2015-05-26 21:19:01,531 | DEBUG | ult-dispatcher-3 | bstractTransactionContextFactory | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Tx member-1-txn-4670: Find primary for shard default failed
551092:2015-05-26 21:19:04,493 | DEBUG | ult-dispatcher-3 | bstractTransactionContextFactory | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Tx member-1-txn-4671: Find primary for shard default failed
551165:2015-05-26 21:19:04,495 | DEBUG | ult-dispatcher-3 | bstractTransactionContextFactory | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Tx member-1-txn-4672: Find primary for shard default failed
551288:2015-05-26 21:19:09,490 | DEBUG | ult-dispatcher-3 | bstractTransactionContextFactory | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Tx member-1-txn-4673: Find primary for shard default failed
551348:2015-05-26 21:19:09,493 | DEBUG | ult-dispatcher-3 | bstractTransactionContextFactory | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Tx member-1-txn-4674: Find primary for shard default failed

Comment by Dana Kutenicsova [ 27/May/15 ]

I tried build https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/distribution-karaf/0.3.0-SNAPSHOT/distribution-karaf-0.3.0-20150527.075221-1862.tar.gz, which should contain change 20571. From my point of view it didn't help; the logs are in ~/27-05-fail-karaf.log

Comment by Tom Pantelis [ 27/May/15 ]

An actorSelection shouldn't take that long - but then again, nothing should. But something took 7 min, and the logs indicate it wasn't FindPrimary...

I think the bunch of consecutive failed transactions is b/c they were all backed up and then timed out similarly.

I agree the 5s/30s timeouts look like ours - that part is odd. The 5s is more odd b/c that would correspond to the operation timeout, but we only use that now for remote tx operations - all these txs are local.

Comment by Tom Pantelis [ 27/May/15 ]

I can't get to ~/27-05-fail-karaf.log

Comment by Moiz Raja [ 27/May/15 ]

Here is 27-05-karaf.log: https://cisco.box.com/s/m5o616g8qsye2urcvr844zfim2l5ikt5

But it does not have debug logging. I'll enable that again, run the test, and post the log afterward.

Comment by Moiz Raja [ 27/May/15 ]

Debug logs https://cisco.box.com/s/qbw7pexrgkesgbuqpjgtij2p7kojf324

Comment by Tom Pantelis [ 27/May/15 ]

The default-operational shard sent the LeaderStateChanged event at:

2015-05-27 12:20:34,187 | INFO | er-data-akka.actor.default-dispatcher-34 | ShardManager | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | shard-manager-operational: Received LeaderStateChanged message: LeaderStateChanged [memberId=member-1-shard-default-operational, leaderId=member-1-shard-default-operational]

The FindPrimary message was sent and received 5 sec later:

2015-05-27 12:20:39,113 | DEBUG | er-data-akka.actor.default-dispatcher-16 | ActorContext | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | findPrimaryShardAsync received: LocalPrimaryShardFound [primaryPath=akka.tcp://opendaylight-cluster-data@127.0.0.1:2550/user/shardmanager-operational/member-1-shard-default-operational#-1521885687, localShardDataTree=org.opendaylight.yangtools.yang.data.impl.schema.tree.InMemoryDataTree@3eb14c15]

In fact this is the only such log entry for default-operational, which makes sense b/c it should've been cached thereafter and there were no more LeaderStateChanged events to clear it.

Also we see that the local data tree was resolved so it must have gotten put in the cache:

2015-05-27 12:20:39,114 | DEBUG | er-data-akka.actor.default-dispatcher-16 | bstractTransactionContextFactory | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Shard default resolved to local data tree - adding local factory

Yet a tx ~35 sec later failed to find the primary shard. Notice the 11 min gap:

2015-05-27 12:21:14,949 | DEBUG | er-data-akka.actor.default-dispatcher-19 | SingleCommitCohortProxy | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Tx member-1-txn-7907 canCommit
2015-05-27 12:32:15,902 | DEBUG | er-data-akka.actor.default-dispatcher-32 | bstractTransactionContextFactory | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Tx member-1-txn-7907: Find primary for shard default failed

But I see no FindPrimary message going out between 12:21:14 and 12:32:15, which indicates it got the Future from the cache. But how could the Future indicate failure when we only put immediate, successful Futures in the cache? Plus, other transactions at 12:20:39+ succeeded, b/c I see ReadyLocalTransaction messages going out.

It sort of seems like there's something funky going on with akka/scala Futures - almost as if one somehow loses its completed state and gets some unrelated failure associated with it.

Maybe we should try switching to use ListenableFutures for the cache...
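
A hedged sketch of what that alternative could look like (not what was actually merged; PrimaryShardInfo stands in for the real class):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;

// Cache an immediately-completed Guava ListenableFuture instead of a scala
// Future - its completed state is immutable by construction.
class PrimaryShardInfoListenableFutureCache {
    private final ConcurrentMap<String, ListenableFuture<PrimaryShardInfo>> cache =
            new ConcurrentHashMap<>();

    void putSuccessful(String shardName, PrimaryShardInfo info) {
        // Futures.immediateFuture returns an already-succeeded future.
        cache.put(shardName, Futures.immediateFuture(info));
    }

    ListenableFuture<PrimaryShardInfo> getIfPresent(String shardName) {
        return cache.get(shardName);
    }

    void remove(String shardName) {
        // Invoked when the ShardManager invalidates the entry (e.g. leader change).
        cache.remove(shardName);
    }
}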

Comment by Moiz Raja [ 27/May/15 ]

Could the future returned to us be mutable? Could it be reused by akka and set to something else? What if we cached the PrimaryShardInfo instead of the future?

Comment by Moiz Raja [ 27/May/15 ]

The mutable-futures theory is making less sense to me now. In PrimaryShardInfoFutureCache#putSuccessful we create a new future instead of storing the one from akka#ask.
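
That is, the cached entry is built roughly like this (sketch; names are illustrative):

// putSuccessful caches a fresh, already-completed future created from the
// value itself - not the promise-backed future returned by ask() - so akka
// reusing or mutating the ask future would not explain the corruption.
scala.concurrent.Future<PrimaryShardInfo> cached =
        akka.dispatch.Futures.successful(primaryShardInfo);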

Now it just looks like this may be a case of a confused future...

Comment by Tom Pantelis [ 27/May/15 ]

Yeah - it doesn't make sense. We need more debug logging - I'll push a patch for that. I want to try to prove that the scala/akka Future got messed up somehow or maybe I did something wrong in the isCompleted code block.

We really don't need to use a scala Future here anyway - ListenableFutures are more of a known commodity, and I feel more comfortable with them: I've read the source code and fully understand how they work. I don't really know enough scala to understand their Future implementation at this point.

Comment by Moiz Raja [ 27/May/15 ]

Tom, hold off on the ListenableFuture changes till we put in more diagnostics. I'll try to push a diagnostic patch as well.

Comment by Moiz Raja [ 04/Jun/15 ]

This appears to have been fixed by https://git.opendaylight.org/gerrit/#/c/21749/

Comment by Moiz Raja [ 09/Jun/15 ]

On my machine, running a test with 100K routes takes 15 mins for CDS vs 4.5 mins for IMDS.

Comment by Moiz Raja [ 10/Jun/15 ]

Performance is acceptable now: 1M routes ingested in 5 min.
