Monday, 04 June 2012
Sometimes conferences can be dull, but sometimes they can be just awesome for finding the right people to collaborate with. Linaro Connect in Hong Kong last week was definitely one of the great ones!
I chaired my usual sessions (armhf status and cross-distro ARM), and we had some lively discussion in both. We're probably just about done with the armhf sessions: most distros have now accepted a hard-float ARMv7 port, and there's not enough specific work left to justify future sessions. The cross-distro work for ARM ports is likely to continue, but we're going to be concentrating on bootstrapping work for ARMv8 soon.
Talking v8, there were a lot of meetings discussing the various work topics for this new 64-bit ARM architecture: kernel, toolchains, bootstrapping etc. More to come on that soon!
On top of all this useful discussion and planning, we also found time during the week for some hacking. This was the highlight of the week for me, as I found some expert help to solve my long-standing Ruby on ARM bug (Debian bug #652674, ruby1.9.1: FTBFS on armhf: test suite segfaults). Ulrich Weigand (an IBM toolchain wizard seconded into the Linaro team) sat with me for a couple of hours while we tried to reproduce the problem, only to find that he could not reproduce it! The crash had looked very much like a pthreads locking bug, which was scaring me. After some digging, we worked out where the problem had been and what had since fixed it.
For a long time, glibc on ARM Linux did not implement the getcontext/setcontext functions (these are C library routines rather than system calls); apparently nobody really missed them, so they were never a priority. In Ruby 1.9.x, the implementation of the new "fiber" primitive wanted to use getcontext/setcontext to control stack state etc. in different threads of control. In cases where they are not available or known not to work well, Ruby has fallback code to implement similar functionality. That fallback code appears to be buggy. Maybe it was correct at some point and has bit-rotted due to not being exercised, or maybe it was buggy as written, but it clearly does not work correctly for us now. In the last few months, getcontext/setcontext have finally been implemented for ARM in glibc trunk (by Michael Hope, also in the Linaro toolchain team!) and backported into current Debian and Ubuntu eglibc versions. Re-running configure and rebuilding Ruby against the most recent code in both Sid and Precise fixed the test suite crash we were seeing earlier. Yay! We could also provoke the bug again at will by quickly hacking the Ruby source to force it back onto the fallback code, thereby verifying the fix.
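For anyone wondering what "using getcontext/setcontext for fibers" looks like in practice, here is a minimal sketch, assuming glibc on Linux. It is not Ruby's actual fiber code, which is far more involved. It uses getcontext together with its companions makecontext and swapcontext from the same <ucontext.h> family to give a function its own stack and bounce control back and forth between it and the caller. The names fiber_body and run_fiber_demo are hypothetical, made up for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

/* Hypothetical demo: a "fiber" that hands three values back to its caller. */
static ucontext_t caller_ctx;  /* where the fiber resumes its caller on each yield */
static ucontext_t fiber_ctx;   /* the fiber's own saved registers and stack */
static int shared_value;       /* value passed from the fiber to the caller */

static void fiber_body(void)
{
    for (int i = 0; i < 3; i++) {
        shared_value = i * 10;
        /* "Yield": save the fiber's state and switch back to the caller. */
        swapcontext(&fiber_ctx, &caller_ctx);
    }
    /* Returning ends the fiber; uc_link sends control back to caller_ctx. */
}

/* Runs the fiber to completion and returns the sum of the values it
   yielded, or -1 if the context functions fail (e.g. not implemented). */
int run_fiber_demo(void)
{
    enum { STACK_SIZE = 64 * 1024 };
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* getcontext() initialises fiber_ctx from the current state... */
    if (getcontext(&fiber_ctx) == -1) {
        free(stack);
        return -1;
    }
    /* ...then we give it a private stack, a return link and an entry point. */
    fiber_ctx.uc_stack.ss_sp = stack;
    fiber_ctx.uc_stack.ss_size = STACK_SIZE;
    fiber_ctx.uc_link = &caller_ctx;
    makecontext(&fiber_ctx, fiber_body, 0);

    int sum = 0;
    for (int i = 0; i < 3; i++) {
        /* Switch into the fiber; execution comes back here when it yields. */
        if (swapcontext(&caller_ctx, &fiber_ctx) == -1) {
            free(stack);
            return -1;
        }
        assert(shared_value == i * 10);  /* sanity check on the handoff */
        sum += shared_value;
    }
    /* One final switch lets fiber_body return; uc_link resumes us here. */
    if (swapcontext(&caller_ctx, &fiber_ctx) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return sum;
}
```

On a C library that implements these functions, calling run_fiber_demo() should return 30 (0 + 10 + 20). If getcontext is missing or just a stub that fails, the sketch returns -1 instead. That is the kind of situation in which a program like Ruby would have to drop back to its own fallback code.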
Why does this matter? As we're expecting ARM servers to enter the market soon, web apps written in Ruby on Rails are going to be an important part of the software stack that customers will want to run. Broken fibers and threading would not help here!
It's great to meet up and work with talented folks like this; Linaro Connect is an excellent event for getting stuff done!