A-Team Contributions in 2015

It has been a while since I last blogged, but I have something interesting to share, so I finally managed to overcome my laziness. In this post, I would like to talk about some of the projects I have been involved with in 2015 on the A-Team at Mozilla. Before diving into the projects, I would like to give a shout out to @jmaher, @armenzg, @kmoir, @dminor, @ahal, @gbrown and @wlach, who were always there to answer my questions, help me find a way forward when I was stuck, and review my patches. I have worked on some exciting problems and made good progress on these projects over the past four months. They are:

1) SETA – Search for Extraneous Test Automation
We run hundreds of builds and thousands of test jobs each day on our integration branches, that is, mozilla-inbound and fx-team. As more platforms are added every month, the load on test machines keeps increasing. But do we really need so many test jobs per push? We run test jobs to catch failures, but the majority of the time they pass, and the jobs that do catch failures often catch duplicates of each other. SETA tackles this problem by being smarter about how we spend machine cycles. In SETA, we find the minimum set of jobs needed to detect all the failures that have occurred in the last six months on the integration branches. From this data, we predict that these jobs are more likely to catch failures than the others, so the remaining jobs are scheduled to run less frequently. We will certainly be wrong some of the time, and when we miss a failure the sheriffs need to backfill some jobs to find the root cause, but most of the time it works. Joel has written an excellent blog post with examples and statistics from this project.

SETA has been deployed to Mozilla RelEng production systems, and we have reduced the load from roughly 350-400 jobs/push to 150-190 jobs/push on desktop platforms (Linux, OS X, Windows), a reduction of about 50% during high-load weekdays. To put this into perspective, this past week we have seen the lowest jobs/push since January 1st on both mozilla-inbound and fx-team. I see this as a huge win: it drastically reduces the load on our machines as well as the time the sheriffs spend starring intermittents, increasing productivity for all. This data covers desktop platforms only; Android and other platforms are yet to come, after which we should see further gains.
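
To make the idea concrete, here is a minimal sketch of the kind of greedy, set-cover style selection described above: keep picking the job that detects the most not-yet-covered historical failures until every known failure is covered. This is an illustration only, not the actual SETA code; the data shapes and job names are made up.

    # Illustrative sketch, not the actual SETA implementation: given which jobs
    # detected each historical regression, greedily pick a small set of jobs
    # that still detects every one of them (a classic set-cover heuristic).

    def pick_high_value_jobs(detected_by):
        """detected_by maps a regression id to the set of job names that caught it,
        e.g. {"regression-1": {"linux64 mochitest-1", "osx mochitest-3"}, ...}."""
        all_jobs = set().union(*detected_by.values()) if detected_by else set()
        uncovered = set(detected_by)
        selected = []
        while uncovered:
            # Pick the job that catches the most still-uncovered regressions.
            best = max(all_jobs,
                       key=lambda job: sum(job in detected_by[r] for r in uncovered))
            caught = {r for r in uncovered if best in detected_by[r]}
            if not caught:
                break  # remaining regressions were caught by no job we know about
            selected.append(best)
            uncovered -= caught
        return selected

    history = {
        "regression-1": {"linux64 mochitest-1", "osx mochitest-3"},
        "regression-2": {"linux64 mochitest-1"},
        "regression-3": {"win7 reftest-2"},
    }
    print(pick_high_value_jobs(history))  # ['linux64 mochitest-1', 'win7 reftest-2']

The jobs that come out of a selection like this keep running on every push, while the rest can safely run at a lower frequency.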

[Figure: jobs/push per week since January for mozilla-inbound. Lowest during April 20 – April 26.]

Project Repo: https://github.com/dminor/ouija
Project Stats: http://alertmanager.allizom.org/dailyjobs.html
Project Information: http://alertmanager.allizom.org/seta.html

2) Mozilla CI Tools
MozCI (Mozilla CI Tools) is a Python library that allows one to trigger builds and test jobs on treeherder.mozilla.org from the command line. It has various use cases, such as triggering jobs to bisect intermittent failures and backfilling missing jobs; personally, I use it to bisect performance regressions. The tool is also used by the sheriffs and is aimed at increasing developer productivity.
Project Repo: https://github.com/armenzg/mozilla_ci_tools
Project Documentation: https://mozilla-ci-tools.readthedocs.org/en/latest/index.html

3) Firefox Performance Sheriffing
In late January 2015, I took up the role of a performance sheriff. In this role, I look at the performance data produced by the test jobs, find regressions and their root causes, and get bugs on file to track the issues and bring them to the attention of the patch authors.
Sheriff documentation: https://wiki.mozilla.org/Buildbot/Talos/Sheriffing

I have also contributed patches to some other projects, such as removing android*.json from mochitests (bug 1083347), the A-Team bootcamp, and Mozregression. If you are looking to contribute to open source projects, I think this is a great time to start contributing to the Automation and Tools team at Mozilla and make a big impact. For me, it has been one of the most productive quarters, and I plan to keep contributing. As some of you may know, this summer I will be joining Mozilla in San Francisco as an A-Team intern with @chmanchester as my mentor, and I am looking forward to doing more exciting work there!

Let's have more green trees

I have been working on making jobs ignore intermittent failures for mochitests (bug 1036325) on the try server, to prevent unnecessary oranges and save the resources that go into retriggering those jobs on TBPL. I am glad to announce that this has been achieved for desktop mochitests (Linux, OS X and Windows). It does not yet work for Android/B2G mochitests, but they will be supported in the future. This post explains how it works in detail and is a bit lengthy, so bear with me.

Let's see the patch in action. Here is an example of an almost-green try push:

Tbpl Push Log

 Note: one bc1 orange job is because of a leak (Bug 1036328)

In this push, the intermittents were suppressed. For example, this log shows an intermittent failure in the mochitest-4 job on Linux:

[Screenshot: TBPL log excerpt for the mochitest-4 job]

Even though there was an intermittent failure in this job, the job remains green. We can tell that a job hit an intermittent by inspecting the number of tests run for the job on TBPL, which will be much smaller than normal. For example, the intermittent mochitest-4 job above shows “mochitest-plain-chunked: 4696/0/23” compared to the normal “mochitest-plain-chunked: 16465/0/1954”. Another way is to look at the log of the particular job for “TEST-UNEXPECTED-FAIL”.
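
For illustration, here is a tiny sketch of reading that pass/fail/todo summary and flagging a run whose pass count is suspiciously low. The helper name, the expected count and the threshold are arbitrary examples, not part of the harness.

    import re

    # Illustrative: pull the pass/fail/todo counts out of a TBPL-style summary
    # such as "mochitest-plain-chunked: 4696/0/23" and flag a run whose pass
    # count is far below what the chunk normally executes.
    SUMMARY_RE = re.compile(r"mochitest-plain-chunked:\s*(\d+)/(\d+)/(\d+)")

    def looks_truncated(summary_line, expected_passes=16465, ratio=0.5):
        match = SUMMARY_RE.search(summary_line)
        if not match:
            return False
        passed, failed, todo = map(int, match.groups())
        return passed < expected_passes * ratio

    print(looks_truncated("mochitest-plain-chunked: 4696/0/23"))     # True
    print(looks_truncated("mochitest-plain-chunked: 16465/0/1954"))  # False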

<algorithm>

The algorithm behind getting a green job even in the presence of an intermittent failure is as follows: we recognize the failing test and re-run it independently 10 times. If the test fails fewer than 3 times out of 10, it is marked as an intermittent and we leave it at that. If it fails 3 or more times out of 10, there is a real problem in the test and the job turns orange.

</algorithm>
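
A minimal sketch of that classification rule, assuming a callable that runs the failing test once and reports whether it passed. This is illustrative only, not the harness code.

    import random

    # Sketch of the rule above: re-run the failing test ten times on its own;
    # fewer than three failures means we call it an intermittent, three or
    # more means the failure is real.

    def classify_failure(run_test_once, runs=10, threshold=3):
        """run_test_once() should return True when the test passes."""
        failures = sum(1 for _ in range(runs) if not run_test_once())
        return "intermittent" if failures < threshold else "real failure"

    # Example with a fake flaky test that fails roughly 10% of the time.
    flaky_test = lambda: random.random() > 0.1
    print(classify_failure(flaky_test))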

Next, to test the case of a “real” failure, I wrote a unit test and tried it out in this try push:

[Screenshot: TBPL view of the try push with the orange job]

This job is orange and the log for this push is:

[Screenshot: failure summary from the job log]

In this summary, a test fails three or more times, and hence we get a real failure. The important line in this summary is:

3086 INFO TEST-UNEXPECTED-FAIL | Bisection | Please ignore repeats and look for 'Bleedthrough' (if any) at the end of the failure list

This tells us that the bisection procedure has started and that we should look out for a “Bleedthrough” further down, that is, the test causing the failure. The last line prints the real failure:

TEST-UNEXPECTED-FAIL | testing/mochitest/tests/Harness_sanity/test_harness_post_bisection.html | Bleedthrough detected, this test is the root cause for many of the above failures

Aha! So we have found a permanently failing test, and it is probably due to some fault in the developer’s patch. Developers can now focus on the real problem rather than getting lost in the intermittent failures.
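
As an aside, picking the culprit out of such a log is mechanical. Here is a small, purely illustrative helper (not part of the harness) that scans for the "Bleedthrough detected" marker and returns the test named on that line:

    # Purely illustrative helper: scan mochitest log lines for the
    # "Bleedthrough detected" marker and return the test named on that line.

    def find_bleedthrough(log_lines):
        for line in log_lines:
            if "TEST-UNEXPECTED-FAIL" in line and "Bleedthrough detected" in line:
                # Line format: TEST-UNEXPECTED-FAIL | <test path> | Bleedthrough detected, ...
                parts = [part.strip() for part in line.split("|")]
                if len(parts) >= 2:
                    return parts[1]
        return None

    log = [
        "3086 INFO TEST-UNEXPECTED-FAIL | Bisection | Please ignore repeats ...",
        "TEST-UNEXPECTED-FAIL | testing/mochitest/tests/Harness_sanity/"
        "test_harness_post_bisection.html | Bleedthrough detected, this test is the root cause",
    ]
    print(find_bleedthrough(log))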

This patch has landed on mozilla-inbound, and I am working on enabling it as an option on trychooser (more on that in the next blog post). However, if you want to try it out now (it works only for desktop mochitests), you can hack in just a single line:

options.bisectChunk = 'default'

such as in this diff inside runtests.py and test it out!

Hopefully, this will also take us a step closer to AutoLand (automatic landing of patches).

Other Bugs Solved for GSoC:

[1028226] – Clean up the code for manifest parsing
[1036374] – Adding a binary search algorithm for bisection of failing tests
[1035811] – Mochitest manifest warnings dumped at start of each robocop test

A big shout out to my mentor (Joel Maher) and the other A-Team members for helping me in this endeavour!

GSoC 2014: Progress Report

In the last month, a good deal of work has been done on the Mochitest Failure Investigator project. I described the algorithms implemented for bisecting the tests in my previous post. Since then, I have worked on adding a “sanity check” once the cause of a failure is found, in which the tests are re-run with the culprit omitted. The advantage of doing this is that if a test is failing because of more than one previous test, we can find all of the tests causing the failure. I then worked on refactoring the patch to integrate it seamlessly with the existing code. Andrew and Joel gave me good suggestions to make the code cleaner and more readable, and it has been a challenging and great learning experience. The patch is now in the final review stages and is passing for all platforms on TBPL: https://tbpl.mozilla.org/?tree=Try&rev=07bbcb4a3e98. Apart from this, I worked on a small fun bug (925699) where unwanted temporary files were not being cleaned up, eventually occupying a large amount of space on my machine.
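
The sanity-check loop itself is simple. Here is a rough sketch of the idea, with made-up names and callables rather than the actual patch: keep bisecting and removing culprits until the target test finally passes.

    # Rough sketch of the sanity-check idea (illustrative only): keep bisecting
    # and removing culprits until the target test finally passes.

    def find_all_culprits(preceding_tests, target_still_fails, bisect_one):
        """preceding_tests: ordered list of tests that run before the target.
        target_still_fails(tests): True if the target fails when run after `tests`.
        bisect_one(tests): returns the single culprit found within `tests`."""
        remaining = list(preceding_tests)
        culprits = []
        while remaining and target_still_fails(remaining):
            culprit = bisect_one(remaining)
            culprits.append(culprit)
            remaining.remove(culprit)  # re-run without it: the "sanity check"
        return culprits

    # Toy example: two of the preceding tests pollute state for the target.
    bad = {"test_a.html", "test_d.html"}
    tests = ["test_a.html", "test_b.html", "test_c.html", "test_d.html"]
    print(find_all_culprits(tests,
                            lambda ts: any(t in bad for t in ts),
                            lambda ts: next(t for t in ts if t in bad)))
    # -> ['test_a.html', 'test_d.html']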

Now that the proposed project is almost over and I still have a month and a half left, after discussing with my mentor I am planning to cover a few extra things:

* support finding the cause of failure in case of a timeout/crash for a chunk

* support for intermittent problems – this case is interesting, as we may be able to support AutoLand by using this.

* validating new tests to reduce intermittent problems caused by their addition.

Until next time!

Saving time in Incremental Builds

For the past couple of days, I have been working on Mochitest, and I often have to make small changes to some files for debugging purposes. Most of the time these changes belong to a single file, and sometimes to multiple files. Previously, whenever I made a change, I did an incremental build to test it out, which took approximately 5 minutes each time. Recently, I found a way to speed up the debugging process and save time on incremental builds. Now, whenever I change a file, I just replace it in the corresponding subdirectory of {objdir}. This works because the build system copies the files in the source tree into {objdir}. For example, if I make a change to runtests.py for mochitest, I just replace that file inside {objdir}/_tests/testing/mochitest and jump directly to testing the change, saving a significant amount of time. There are other methods too, such as using ‘make’, but I tend to prefer this way.
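
Here is a small helper along those lines; the srcdir/objdir paths are examples and need to be adjusted to your own tree.

    import shutil
    from pathlib import Path

    # Small helper for the trick above. The srcdir/objdir paths are examples.
    SRCDIR = Path("~/mozilla-central").expanduser()
    OBJDIR = SRCDIR / "obj-x86_64"  # whatever your objdir is called

    def push_to_objdir(relative_src, relative_dest):
        """Copy an edited source file straight into the objdir so the change
        can be tested without waiting for an incremental build."""
        src = SRCDIR / relative_src
        dest = OBJDIR / relative_dest
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        print("copied", src, "->", dest)

    # The runtests.py example from the post:
    push_to_objdir("testing/mochitest/runtests.py",
                   "_tests/testing/mochitest/runtests.py")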

GSoC 2014 Progress: Coding Period Starts!

In the last two weeks, I have started coding for the “Mochitest Failure Investigator” GSoC project (Bug 1014125). The work done in these two weeks:

  • Added mach support for --bisect-chunk (this option is used when a user wants to explicitly provide the name of the failing test). This helps with faster local debugging.
  • Wrote a prototype patch implementing two algorithms, Reverse Search and Binary Search. As the name suggests, Reverse Search splits all the tests that run before the failing test into 10 chunks and iterates over the chunks until the failing chunk is found; each test in that chunk is then run again to determine the test causing the failure. Binary Search, on the other hand, splits the tests into halves and recursively iterates over each half to find the failure point (see the sketch after this list).
  • The mochitest test harness did not support running an arbitrary subset of tests, only sequential filtering; that is, if we needed to run “test1”, “test2” and “test99”, we could not do so directly and had to run all the tests from 1 to 99. So I initially implemented the algorithm so that tests were run sequentially, but this was not optimal, as a lot of unnecessary tests were run again and again.
  • Next, I optimized both search methods and added support for running an arbitrary subset of tests in the mochitest test harness. This was done by filtering the tests that are added to tests.json when the profile is created.
  • I refactored the patch on :jmaher’s recommendations and made the code more modular. This was followed by testing the patch on the try server against the sample test cases I had initially developed, for mochitest-plain, mochitest-chrome and browser-chrome tests.
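
Here is the sketch referenced above: toy versions of the two strategies, written against a fake "run these tests and tell me if the failure reproduces" callable. This is an illustration of the idea, not the harness implementation.

    # Toy sketches of the two strategies. `run(tests, failing)` returns True
    # when the failing test still fails after running `tests` before it.

    def reverse_search(tests, failing, run, chunks=10):
        """Split the tests that ran before `failing` into chunks and walk them
        from the end until a chunk reproduces the failure, then check that
        chunk test by test."""
        size = max(1, len(tests) // chunks)
        pieces = [tests[i:i + size] for i in range(0, len(tests), size)]
        for piece in reversed(pieces):
            if run(piece, failing):
                for test in piece:
                    if run([test], failing):
                        return test
        return None

    def binary_search(tests, failing, run):
        """Recursively halve the candidates, keeping a half that still
        reproduces the failure."""
        if len(tests) <= 1:
            return tests[0] if tests and run(tests, failing) else None
        mid = len(tests) // 2
        first, second = tests[:mid], tests[mid:]
        if run(first, failing):
            return binary_search(first, failing, run)
        if run(second, failing):
            return binary_search(second, failing, run)
        return None

    # Fake harness: test_3 pollutes state and makes the target fail.
    tests = ["test_%d" % i for i in range(10)]
    run = lambda subset, failing: "test_3" in subset
    print(reverse_search(tests, "test_target", run))  # test_3
    print(binary_search(tests, "test_target", run))   # test_3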

The results on try were fantastic.

A typical binary search bisection algorithm on try looks like this:

[Screenshot: binary search bisection example on try]

A typical reverse search algorithm looks like this:

[Screenshot: reverse search bisection example on try]

The “TEST-BLEEDTHROUGH” line shows the test responsible for the failure. As we would expect, reverse search performs better than binary search when the failure point is close to the failing test, and vice versa. The bisection algorithms took about 20 minutes on average to compute the result.

How is all of this useful?

Contributors at Mozilla spend a large amount of effort investigating test failures. This tool will help increase productivity, saving on average 4-6 hours of tracking a failing test down, and will also reduce the number of unnecessary try pushes. Once this tool is hooked up on try, it will monitor the tests, and as soon as the first failure occurs, it will bisect and find the failure point. We can also use this tool to validate new tests and help reduce intermittent problems, by adding tests in chunks and verifying whether they pass and, if not, which tests affect the test being added, as well as to find the reason for a timeout or crash of a chunk.

It has been quite exciting to tackle mochitest problems with :jmaher. He is an amazing mentor. In the coming weeks, I will be working on making the tool support intermittent failures and incorporating the logic for auto-bisection. Happy hacking!

The first month of Google Summer of Code

Already a month has passed since the start of Google Summer of Code 2014. As previously mentioned, I had developed a test case crafted to show a failure so that we can detect it. In the past month, most of my time was spent familiarizing myself with and researching the existing mochitest harness. I looked into mochitest-plain, browser-chrome and mochitest-chrome. Mochitest-plain tests are normal HTML tests that run without elevated privileges, while chrome tests run with elevated (chrome) privileges. Browser-chrome tests are similar to chrome tests, but they control and exercise the browser user interface (UI). Also, chrome tests depend on harness.xul while browser-chrome tests depend on browser-harness.xul, and browser-test-overlay.xul provides an overlay for both types of tests. These files are entry points to the harness and deal with the UI of the tests.

Next, I learnt about test chunking (how the tests are split up), manifest filtering (choosing which tests to run depending on the command-line options) and log parsing. It has been an interesting and humbling experience to learn about the different pieces that hold a framework together. Then I started working on Bug 992911. Until now, mochitests have been run per manifest; this bug deals with running mochitest per directory and displaying the cumulative results (passed, failed, todo) at the end. The advantage of doing this is that we can determine how many tests are run in each directory, and we get a good balance between runtime and test isolation. Joel Maher had already coded the basic algorithm for this functionality, but there was a problem with log parsing and summarization. On his advice, I fixed up the log parsing so that we correctly display the cumulative number of pass/fail/todo tests, and I also added mach support for the '--run-by-dir' option. The patch is almost done and we will very soon have this functionality.
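
For illustration, this is roughly the kind of per-directory tally a run-by-dir summary produces; the data shape here is made up and is not the harness's internal representation.

    from collections import Counter, defaultdict

    # Illustrative only: tally per-directory and overall pass/fail/todo counts
    # the way a run-by-dir summary would, given (directory, status) pairs.

    def summarize(results):
        per_dir = defaultdict(Counter)
        total = Counter()
        for directory, status in results:  # status in {"pass", "fail", "todo"}
            per_dir[directory][status] += 1
            total[status] += 1
        for directory, counts in sorted(per_dir.items()):
            print(f"{directory}: {counts['pass']}/{counts['fail']}/{counts['todo']}")
        print(f"TOTAL: {total['pass']}/{total['fail']}/{total['todo']}")

    summarize([
        ("dom/tests", "pass"), ("dom/tests", "fail"),
        ("layout/tests", "pass"), ("layout/tests", "todo"),
    ])
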
In the coming weeks, I will start working on the tool for bisecting the chunks to find the failure point of a failing test. Stay tuned!

Selected for Google Summer of Code 2014!

I have been selected for Google Summer of Code 2014 and the project that I will be working on is Mochitest Failure Investigator (abstract) mentored by the awesome Joel Maher. Basically, this project aims at building a tool that will help contributors find the origin of test failures for mochitest on local as well as production machines, thus leading to accelerated development and debugging.

In the past few weeks, I have hacked on and patched a couple of bugs for mochitest and am learning how the test harness works. I have started work on my GSoC project and have developed a test case crafted to show a failure so that we can detect it. It is a simple test case in which I set a dummy value on a fake preference and then check afterwards whether it is still the same; if it is, I make the test fail. The next step is to properly understand how the Python and JavaScript parts of the mochitest harness work together, and I will spend this week doing just that.

I am really excited for this project and will write about my work regarding the project regularly.