Let's have more green trees

I have been working on making jobs ignore intermittent failures for mochitests (bug 1036325) on try servers, to prevent unnecessary oranges and save the resources that go into retriggering those jobs on tbpl. I am glad to announce that this has been achieved for desktop mochitests (Linux, OS X and Windows). It doesn't work for Android/B2G mochitests yet, but they will be supported in the future. This post explains how it works in detail and is a bit lengthy, so bear with me.

Let's see the patch in action. Here is an example of an almost green try push:

[Screenshot: TBPL push log]

Note: the one orange bc1 job is due to a leak (bug 1036328).

In this push the intermittents were suppressed; for example, this log shows an intermittent failure in the mochitest-4 job on Linux:

[Screenshot: TBPL log showing the intermittent failure on the Linux mochitest-4 job]

Even though there was an intermittent failure for this job, the job remains green. We can tell that a job produced an intermittent by inspecting the number of tests run for the job on tbpl, which will be much smaller than normal. For example, the intermittent mochitest-4 job above shows "mochitest-plain-chunked: 4696/0/23" as compared to the normal "mochitest-plain-chunked: 16465/0/1954". Another way is to look through the log of the particular job for "TEST-UNEXPECTED-FAIL".
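Those summary counts have the form passed/failed/todo. As a quick illustration (a hypothetical helper, not part of the harness), they can be pulled out of a summary line like this:

```python
import re

# Hypothetical helper: extract the passed/failed/todo counts from a TBPL
# summary line such as "mochitest-plain-chunked: 4696/0/23".
def parse_summary(line):
    m = re.search(r":\s*(\d+)/(\d+)/(\d+)", line)
    if not m:
        return None
    passed, failed, todo = map(int, m.groups())
    return {"passed": passed, "failed": failed, "todo": todo}
```

A pass count far below the usual ~16000 is the tell-tale sign that the run was cut short by bisection.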

<algorithm>

The algorithm behind getting a green job even in the presence of an intermittent failure is: we recognize the failing test and run it independently 10 times. If the test fails fewer than 3 times out of 10, it is marked as intermittent and ignored. If it fails 3 or more times out of 10, there is a real problem with the test, and the job is turned orange.

</algorithm>
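The decision logic above can be sketched in Python. This is a simplified illustration, not the actual harness code (which lives in mochitest's runtests.py); `run_test_once` is a hypothetical stand-in for executing the test once in isolation:

```python
# Simplified sketch of the retry heuristic described above.
# run_test_once(test_path) -> bool is a hypothetical callable that
# runs the test once in isolation and returns True if it passed.

RETRIES = 10        # how many times the failing test is re-run
FAIL_THRESHOLD = 3  # >= 3 failures out of 10 means a real failure

def classify_failure(run_test_once, test_path):
    """Re-run a failing test and decide whether it is intermittent."""
    failures = sum(1 for _ in range(RETRIES)
                   if not run_test_once(test_path))
    if failures < FAIL_THRESHOLD:
        return "intermittent"   # ignore it; the job stays green
    return "real"               # turn the job orange
```

With these thresholds, a test may fail up to 2 of its 10 re-runs (20%) and still be treated as intermittent.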

Next, to test the case of a "real" failure, I wrote a unit test and tried it out in this try push:

[Screenshot: TBPL push with an orange mochitest job]

This job is orange, and the failure summary in its log is:

[Screenshot: failure summary from the orange job's log]

In this summary, a test failed more than three times, and hence we get a real failure. The important line in this summary is:

3086 INFO TEST-UNEXPECTED-FAIL | Bisection | Please ignore repeats and look for 'Bleedthrough' (if any) at the end of the failure list

This tells us that the bisection procedure has started and that we should look out for a "Bleedthrough", that is, the test causing the failure. The last line then prints the real failure:

TEST-UNEXPECTED-FAIL | testing/mochitest/tests/Harness_sanity/test_harness_post_bisection.html | Bleedthrough detected, this test is the root cause for many of the above failures

Aha! So we have found a permanently failing test, probably due to some fault in the developer's patch. Developers can now focus on the real problem rather than getting lost in the intermittent failures.
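Picking the root cause out of a log comes down to finding the failure line that carries the "Bleedthrough detected" marker. A small illustrative scanner (the marker strings match the log excerpts above; the function itself is hypothetical, not harness code):

```python
# Sketch: scan mochitest log lines for the bisection "Bleedthrough" marker.
# Failure lines have the form:
#   TEST-UNEXPECTED-FAIL | <test path> | <message>
def find_bleedthrough(log_lines):
    """Return the test flagged as the root cause, or None."""
    for line in log_lines:
        if "TEST-UNEXPECTED-FAIL" in line and "Bleedthrough detected" in line:
            parts = [p.strip() for p in line.split("|")]
            if len(parts) >= 2:
                return parts[1]  # the test path field
    return None
```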

This patch has landed on mozilla-inbound, and I am working on enabling it as an option in trychooser (more on that in the next blog post). However, if someone wants to try it out now (it works only for desktop mochitests), one can hack in just a single line:

options.bisectChunk = 'default'

such as in this diff inside runtests.py, and test it out!

Hopefully, this will also take us a step closer to AutoLand (automatic landing of patches).

Other Bugs Solved for GSoC:

[1028226] – Clean up the code for manifest parsing
[1036374] – Adding a binary search algorithm for bisection of failing tests
[1035811] – Mochitest manifest warnings dumped at start of each robocop test

A big shout out to my mentor (Joel Maher) and other a-team members for helping me in this endeavour!


2 thoughts on "Let's have more green trees"

  1. What is it that you’re planning to do with this work? The post makes it sound like the intent is that try pushes will automatically ignore intermittent oranges. Wouldn’t that make it less likely for people to detect new intermittent oranges on try, and thus more likely to land them and make the tree more orange?

    • Thanks for your comment, David!
      Yes, the intent is to make try pushes automatically ignore intermittent oranges and get a step closer to AutoLand. We know, as a fact, that our test coverage is imperfect and that on any given run there are highly likely to be some unexpected test results that don't represent real regressions. We also know, for a fact, that we can regress many things without ever noticing. However, because the former are visible and the latter are invisible, we worry much more about the risk that a scheme to automatically filter out suspected noise will temporarily hide some real new regressions than we do about the real regressions that we can't detect in an automated way.
      The scheme I have currently implemented ignores an intermittent test if it fails less than 20% of the time and turns the job orange otherwise. This will, to some extent, ensure that the failure is not a major one. I agree that some new intermittents may be ignored on try and only be identified later on, but I think it is better not to block progress on infrastructure that will help productivity, given that our test coverage is not 100% perfect.
      I accept that my current implementation is not perfect; I would like to hear your views and suggestions on how to make it better :)
