[ODLPARENT-49] Karaf ssh EOFError (is it due to low entropy, due to Java's default use of blocking /dev/random instead of /dev/urandom?) Created: 23/Sep/16 Updated: 24/Jan/18 Resolved: 18/Jan/17 |
|
| Status: | Resolved |
| Project: | odlparent |
| Component/s: | General |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Michael Vorburger | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: Linux |
||
| Attachments: |
|
| External issue ID: | 6790 |
| Description |
|
JamO and others (incl. dfarrell) report fairly frequently seeing EOFError in CSIT Robot suites when connecting over ssh to Karaf (Karaf's own SSH server, not the OS-level sshd). |
| Comments |
| Comment by Michael Vorburger [ 23/Sep/16 ] |
|
https://lists.opendaylight.org/pipermail/dev/2016-September/002704.html : One theory we have is that this could be due to low entropy. One solution would be to make the JVM process of the ssh server in Karaf (which apparently we've configured to use BouncyCastle) use the non-blocking /dev/urandom instead of the blocking /dev/random. https://brooklyn.apache.org/documentation/increase-entropy.html describes another Java-based system with similar issues, so maybe this is related. https://lists.opendaylight.org/pipermail/dev/2016-September/002727.html => https://git.opendaylight.org/gerrit/#/c/45749/ |
| Comment by Michael Vorburger [ 23/Sep/16 ] |
|
https://lists.opendaylight.org/pipermail/dev/2016-September/002734.html Ryan Goulding confirms that: "In our downstream internal CI, I had to |
| Comment by Michael Vorburger [ 23/Sep/16 ] |
|
https://lists.opendaylight.org/pipermail/dev/2016-September/002785.html Jamo Luhrsen reports entropy was 168 in /proc/sys/kernel/random/entropy_avail when this happened. https://git.opendaylight.org/gerrit/#/c/45760/ is a change to use -Djava.security.egd=file:/dev/./urandom on start up (based on http://stackoverflow.com/a/2325109/421602). https://lists.opendaylight.org/pipermail/dev/2016-September/002786.html disagrees that low entropy could be the root cause of this problem. This http://www.2uo.de/myths-about-urandom/ is interesting. |
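
The entropy figure quoted above can be checked on a test machine with a few lines of Python. This is only a minimal sketch: the function names and the threshold of 200 are hypothetical and chosen for illustration, not taken from this thread or from any ODL tooling.

```python
from pathlib import Path
from typing import Optional

# Kernel file mentioned in the comment above.
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def parse_entropy_avail(text: str) -> int:
    """Parse the file content (a single integer followed by a newline)."""
    return int(text.strip())


def read_entropy_avail(path: Path = ENTROPY_AVAIL) -> Optional[int]:
    """Return the reported entropy estimate, or None if it cannot be read."""
    try:
        return parse_entropy_avail(path.read_text())
    except (OSError, ValueError):
        return None


def is_low(bits: int, threshold: int = 200) -> bool:
    """Hypothetical check; the threshold is an assumption, not a kernel constant."""
    return bits < threshold


if __name__ == "__main__":
    value = read_entropy_avail()
    if value is None:
        print("entropy_avail not readable on this system")
    else:
        print(f"entropy_avail={value} low={is_low(value)}")
```

Logging this value just before the ssh step of a suite would show whether the failures line up with low readings at all.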
| Comment by Stephen Kitt [ 23/Sep/16 ] |
|
(In reply to Michael Vorburger from comment #3) But that might well be wrong, given that OpenSSH uses /dev/urandom anyway. |
| Comment by Michael Vorburger [ 23/Sep/16 ] |
|
FTR: https://bugs.openjdk.java.net/browse/JDK-4705093 = http://bugs.java.com/view_bug.do?bug_id=4705093 https://docs.oracle.com/cd/E13209_01/wlcp/wlss30/configwlss/jvmrand.html |
| Comment by Michael Vorburger [ 23/Sep/16 ] |
|
Once we have confirmation that this problem disappears with the merge of https://git.opendaylight.org/gerrit/#/c/45760/ (now cleaned up/refined, thanks Stephen Kitt!), we should also:
A. Open a bug + pull request (skitt) on upstream Karaf (4) to make the same change there.
B. Open a bug on https://bugzilla.redhat.com to suggest that perhaps OpenJDK RPM packages could "Change the default for java.security in $JAVA_HOME/jre/lib/security/java.security from file:/dev/random to file:/dev/urandom", some time. "If deemed too risky a change for a Java 8 security patch, perhaps consider this for Java 9 packages? If this is controversial, perhaps it's time to re-raise this upstream on OpenJDK?" |
| Comment by Michael Vorburger [ 11/Jan/17 ] |
|
https://git.opendaylight.org/gerrit/#/c/50327/ fixes up SingleFeatureTest with regard to this issue. Attached is a jstack of Karaf stuck because of it. |
| Comment by Michael Vorburger [ 11/Jan/17 ] |
|
Attachment karaf.jstack.txt has been added with description: jstack of Karaf stuck in netconf SSH server init due to low entropy |
| Comment by Vratko Polak [ 12/Jan/17 ] |
|
This is still affecting CSIT. See comment [0]. Either releng/builder manages to prepare machines with sufficient entropy, or we should switch to testing Karaf with the "-Djava.security.egd=file:/dev/./urandom" option added. |
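
A test harness could add that option to the JVM options string along these lines. This is a rough sketch under assumptions: the helper name is hypothetical, and how the resulting string reaches the Karaf JVM (for example through an environment variable read by the start script) is not covered by this thread and is left to the caller.

```python
import shlex

EGD_PROPERTY = "java.security.egd"
# The exact option quoted in the comment above.
URANDOM_OPTION = "-Djava.security.egd=file:/dev/./urandom"


def with_urandom_egd(java_opts: str) -> str:
    """Append the urandom option unless java.security.egd is already set.

    An existing setting is left untouched so that an explicit choice
    made elsewhere is not silently overridden.
    """
    tokens = shlex.split(java_opts)
    if any(token.startswith(f"-D{EGD_PROPERTY}=") for token in tokens):
        return java_opts
    tokens.append(URANDOM_OPTION)
    return shlex.join(tokens)


if __name__ == "__main__":
    print(with_urandom_egd("-Xmx2g"))
```

Leaving an existing java.security.egd value alone is a design choice of this sketch; a harness that must force urandom would replace the token instead.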
| Comment by Vratko Polak [ 17/Jan/17 ] |
|
More info from https://lists.opendaylight.org/pipermail/integration-dev/2017-January/008955.html > [3] https://git.opendaylight.org/gerrit/#/c/45760 Oh, that was merged a long time ago. If the option worked, we would not need more entropy on ODL_SYSTEM. Vratko. |
| Comment by Vratko Polak [ 18/Jan/17 ] |
|
https://git.opendaylight.org/gerrit/50594 claims to fix the CSIT failures. If that is true, this error was not about entropy after all. There is still some chance that the Karaf SSH server is slow because it blocks on /dev/random, but that would need closer examination to decide. |
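
One possible starting point for that closer examination is to time how long a read from a random device takes on the test machine. The sketch below is an illustration only: the function name is hypothetical, and it measures the OS device directly, not what the Karaf SSH server or its JVM actually does.

```python
import time
from typing import Tuple


def time_random_read(path: str, nbytes: int = 32) -> Tuple[float, int]:
    """Read up to nbytes from path; return (elapsed seconds, bytes read).

    The file is opened unbuffered so that only the requested number of
    bytes is pulled from the device. Reading stops early on end of file.
    """
    start = time.monotonic()
    got = 0
    with open(path, "rb", buffering=0) as dev:
        while got < nbytes:
            chunk = dev.read(nbytes - got)
            if not chunk:
                break
            got += len(chunk)
    return time.monotonic() - start, got


if __name__ == "__main__":
    # Note: on a machine that is short of entropy, the /dev/random read
    # may block for a long time, which is the effect being measured.
    for device in ("/dev/urandom", "/dev/random"):
        elapsed, got = time_random_read(device)
        print(f"{device}: {got} bytes in {elapsed:.3f}s")
```

A consistently slow /dev/random read on ODL_SYSTEM would support the blocking theory; a fast one would point elsewhere.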