Let’s have more green trees

I have been working on making try server jobs ignore intermittent mochitest failures (bug 1036325), to prevent unnecessary oranges and to save the resources that go into retriggering those jobs on tbpl. I am glad to announce that this now works for desktop mochitests (linux, osx and windows). It doesn’t work for android/b2g mochitests yet, but they will be supported in the future. This post explains how it works in detail and is a bit lengthy, so bear with me.

Let’s see the patch in action. Here is an example of an almost green try push:

Tbpl Push Log

 Note: one bc1 orange job is because of a leak (Bug 1036328)

In this push, the intermittents were suppressed. For example, this log shows an intermittent on the mochitest-4 job on linux:

tbpl1

Even though there was an intermittent failure for this job, the job remains green. We can tell whether a job hit an intermittent by inspecting the number of tests run for the job on tbpl, which will be much smaller than normal. For example, the intermittent mochitest-4 job above shows “mochitest-plain-chunked: 4696/0/23”, compared to the normal “mochitest-plain-chunked: 16465/0/1954”. Another way is to search the log of the particular job for “TEST-UNEXPECTED-FAIL”.
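The log check can be sketched in a few lines of Python. This is only an illustration and not part of the harness: the function name `find_unexpected_failures` and the sample log lines are made up for the example.

```python
def find_unexpected_failures(log_lines):
    """Return the log lines that report an unexpected test failure."""
    marker = "TEST-UNEXPECTED-FAIL"
    return [line.rstrip("\n") for line in log_lines if marker in line]


# Hypothetical log excerpt, invented for illustration only.
sample_log = [
    "100 INFO TEST-PASS | test_a.html | everything fine",
    "101 INFO TEST-UNEXPECTED-FAIL | test_b.html | something went wrong",
    "102 INFO TEST-PASS | test_c.html | everything fine",
]

failures = find_unexpected_failures(sample_log)
for line in failures:
    print(line)
```

For a real job, the same function could be fed the lines of a downloaded log file instead of the sample list.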

<algorithm>

The algorithm behind getting a green job even in the presence of an intermittent failure is this: we identify the failing test and run it independently 10 times. If the test fails fewer than 3 times out of 10, it is marked as intermittent and we leave it. If it fails 3 or more times out of 10, there is a real problem in the test, and the job turns orange.

</algorithm>
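Here is a minimal sketch of that rule in Python. It is not the code from the patch: `classify_failure`, `run_test` and `make_scripted_runner` are hypothetical names, and `run_test` stands in for whatever actually runs a single test and reports whether it passed.

```python
def classify_failure(run_test, test_name, attempts=10, threshold=3):
    """Rerun one test on its own and classify the failure.

    run_test is a hypothetical callable that returns True when the
    test passes and False when it fails.
    """
    failures = 0
    for _ in range(attempts):
        if not run_test(test_name):
            failures += 1
    # Fewer than `threshold` failures: intermittent, the job stays green.
    # `threshold` or more failures: real problem, the job turns orange.
    return "real" if failures >= threshold else "intermittent"


def make_scripted_runner(outcomes):
    """Build a fake runner that replays a fixed list of pass/fail results."""
    remaining = iter(outcomes)

    def run_test(test_name):
        return next(remaining)

    return run_test


flaky = make_scripted_runner([True] * 8 + [False] * 2)   # fails 2 of 10
broken = make_scripted_runner([False] * 10)              # fails 10 of 10

print(classify_failure(flaky, "test_flaky.html"))    # intermittent
print(classify_failure(broken, "test_broken.html"))  # real
```

The scripted runner only exists to make the example deterministic; a real runner would launch the test in a fresh browser each time.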

Next, to test the case of a “real” failure, I wrote a unit test and ran it in a try push:

tbpl4

This job is orange and the log for this push is:

tbpl3

In this summary, a test fails three or more times, and hence we get a real failure. The important line in this summary is:

3086 INFO TEST-UNEXPECTED-FAIL | Bisection | Please ignore repeats and look for ‘Bleedthrough’ (if any) at the end of the failure list

This tells us that the bisection procedure has started and that we should look out for a “Bleedthrough” further down, that is, the test causing the failure. The last line prints the “real failure”:

TEST-UNEXPECTED-FAIL | testing/mochitest/tests/Harness_sanity/test_harness_post_bisection.html | Bleedthrough detected, this test is the root cause for many of the above failures

Aha! So we have found a permanently failing test, and it is probably due to some fault in the developer’s patch. Developers can now focus on the real problem rather than getting lost in the intermittent failures.

This patch has landed on mozilla-inbound, and I am working on enabling it as an option on trychooser (more on that in the next blog post). However, if you want to try this out now (it works only for desktop mochitests), you can hack in just a single line:

options.bisectChunk = 'default'

such as in this diff inside runtests.py and test it out!

Hopefully, this will also take us a step closer to AutoLand (automatic landing of patches).

Other Bugs Solved for GSoC:

[1028226] – Clean up the code for manifest parsing
[1036374] – Adding a binary search algorithm for bisection of failing tests
[1035811] – Mochitest manifest warnings dumped at start of each robocop test

A big shout out to my mentor (Joel Maher) and other a-team members for helping me in this endeavour!


GSoC 2014 Progress: Coding Period Starts!

In the last two weeks, I have started coding for the “Mochitest Failure Investigator” GSoC project (Bug 1014125). The work done in these two weeks:

  • Added mach support for --bisect-chunk (this option is used if a user wants to explicitly provide the name of the failing test). This would help in faster debugging locally.
  • Wrote a prototype patch in which I coded two algorithms, namely Reverse Search and Binary Search. Reverse Search splits all the tests before the failing test into 10 chunks and iterates over each chunk until the failing chunk is found. Once the failing chunk is found, each test in that chunk is run in turn to determine the test causing the failure. Binary Search, on the other hand, splits the tests into halves and recursively iterates over each half to find the failure point.
  • The mochitest test harness did not support random test filtering but only sequential test filtering. That is, if we needed to run “test1”, “test2” and “test99”, we could not do that and had to run all the tests from 1 to 99. So I initially implemented the algorithms such that tests are run sequentially; however, this method was not optimal, as a lot of unnecessary tests were run again and again.
  • Next, I optimized both the search methods and added support for running random tests in the mochitest test harness. This was done by filtering the tests that were added to tests.json when the profile was created.
  • I refactored the patch on :jmaher’s recommendations and made the code more modular. This was followed by testing the patch on the try server against the sample test cases that I had initially developed for mochitest-plain, mochitest-chrome and browser-chrome tests.
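The two search strategies described in the list above can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the code from the patch: `fails_after(prelude, failing_test)` is a hypothetical callable that runs the given tests followed by the failing test and reports whether the failing test still fails; exactly one earlier test is assumed to be the culprit; reverse search is assumed to walk the chunks starting from the one nearest the failing test; and binary search is written as a loop rather than recursively.

```python
def split_into_chunks(items, count):
    """Split items into at most `count` contiguous, non-empty chunks."""
    size = max(1, -(-len(items) // count))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def reverse_search(candidates, failing_test, fails_after, chunk_count=10):
    """Walk the chunks backwards, then walk the tests of the bad chunk."""
    for chunk in reversed(split_into_chunks(candidates, chunk_count)):
        if fails_after(chunk, failing_test):
            for test in reversed(chunk):
                if fails_after([test], failing_test):
                    return test
    return None


def binary_search(candidates, failing_test, fails_after):
    """Halve the candidate list until a single culprit remains."""
    if not candidates or not fails_after(candidates, failing_test):
        return None
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        if fails_after(first_half, failing_test):
            candidates = first_half
        else:
            candidates = candidates[middle:]
    return candidates[0]


# Simulated harness: test99 fails only when test42 ran before it.
# All test names here are invented for the example.
tests = ["test%d.html" % i for i in range(1, 99)]
culprit = "test42.html"


def fails_after(prelude, failing_test):
    return culprit in prelude


print(reverse_search(tests, "test99.html", fails_after))
print(binary_search(tests, "test99.html", fails_after))
```

Both functions pass an arbitrary subset of tests to `fails_after`, which is why the harness needed support for running non-sequential tests before the searches could be made efficient.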

The results on try were fantastic.

A typical binary search bisection algorithm on try looks like this:

binary-search-example

A typical reverse search algorithm looks like this:

reverse-search-example

The “TEST-BLEEDTHROUGH” shows the test responsible for the failure. As we would expect, reverse search performs better than binary search when the failure point is closer to the failing test, and vice versa. The bisection algorithms took about 20 minutes on average to compute the result.

How is all of this useful?

Mozilla contributors spend a large amount of effort investigating test failures. This tool will help increase productivity, saving on average the 4-6 hours it takes to track a test case down, and will also reduce the number of unnecessary try pushes. Once this tool is hooked up on try, it will monitor the tests, and as soon as the first failure occurs, it will bisect and find the failure point. We can also use this tool to validate new tests and help reduce intermittent problems, by adding tests in chunks and verifying whether they pass and, if not, which tests are affecting the test being added. Finally, we can use this tool to find the reason for the timeout or crash of a chunk.

It has been quite exciting to tackle mochitest problems with :jmaher. He is an amazing mentor. In the coming weeks, I will be working on making the tool support intermittent problems and on incorporating the logic of auto-bisection. Happy hacking!