First Month of Mozilla Internship

It has been a month since I started my Mozilla internship in San Francisco, but it feels like I started just yesterday. I have been interning with the Automation and Tools team this summer and it has been a wonderful experience. As an intern at Mozilla, one gets goodies, a new laptop and free food, and there are various intern events to take part in. My mentor @chmanchester also gave me the freedom to decide what I wanted to work on, which is quite unlike some other companies.

I chose to work on tools to improve developer productivity and on making libraries that I have worked on in the past more relevant. In the past month, I have been working on getting various triggering options inside Treeherder, which is a reporting dashboard for checkins to Mozilla projects. This involved writing AngularJS code for the front-end UI and Python code for the backend. The process involved publishing the “triggering actions” to pulse, listening to those actions and then using the mozci library on the backend to trigger jobs. Currently, if developers, particularly the sheriffs, want to trigger a lot of jobs, they have to do it via the command line, which involves context switching plus typing out the syntax. To improve that, this week we deployed three new options in Treeherder that will hopefully save time and make the process easier. They are:

* Trigger_Missing_Jobs: This button ensures that there is one job for every build in a push. On integration branches we coalesce jobs to minimize load during peak hours, but when there is a regression, sheriffs often need to trigger all jobs for the push. That is when this button comes in handy, as it makes triggering those jobs easy.

* Trigger_All_Talos_Jobs: As the name suggests, this button triggers all the Talos jobs on a particular push. As a perf sheriff, I need to trigger all Talos jobs a certain number of times on a regressing push to get more data, and this button will help me and others do that.

Fig: “Trigger Missing Jobs” and “Trigger All Talos Jobs” buttons in the Treeherder UI

* Backfill_Job: This button triggers a particular type of job on every push back to the last known push on which that job ran. Due to coalescing, certain jobs are not run on a series of pushes, and when an intermittent failure or bustage is detected, sheriffs need to find the root cause and thus manually trigger those jobs on the skipped pushes. This button should help them narrow down the root regressing revision for a job that they are investigating.

Fig: “Backfill this job” button in the Treeherder UI
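
To make the backfill idea concrete, here is a minimal Python sketch of how the pushes to retrigger could be picked. It is only an illustration under assumptions, not the actual Treeherder or mozci code: the function name `pushes_to_backfill`, the revision names and the `ran_jobs` mapping are all made up for the example.

```python
def pushes_to_backfill(pushes, ran_jobs, job_name, start_revision):
    """Return the revisions on which job_name would need to be triggered.

    pushes: revisions ordered oldest to newest (hypothetical input).
    ran_jobs: dict mapping revision -> set of job names that ran on it.
    Walks backwards from the push just before start_revision and stops
    at the last known push on which the job already ran.
    """
    start = pushes.index(start_revision)
    missing = []
    for revision in reversed(pushes[:start]):
        if job_name in ran_jobs.get(revision, set()):
            break  # last known push with this job: nothing older is needed
        missing.append(revision)
    return missing


# Hypothetical data: "mochitest-1" was coalesced on c3 and d4, then failed on e5.
pushes = ["a1", "b2", "c3", "d4", "e5"]
ran_jobs = {
    "a1": {"mochitest-1", "xpcshell"},
    "b2": {"mochitest-1"},
    "c3": set(),
    "d4": {"xpcshell"},
    "e5": {"mochitest-1"},
}

print(pushes_to_backfill(pushes, ran_jobs, "mochitest-1", "e5"))  # ['d4', 'c3']
```

Running the job on the returned revisions fills the gap between the last known run and the failing push, which is what lets a sheriff pin down the regressing revision.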

All of the above features currently work only for jobs that are on buildapi, but once mozci is able to trigger tasks on taskcluster, they will work for those jobs too. Also, right now these buttons only trigger jobs that have a completed build; in the future I plan to make them also trigger jobs that don’t have an existing build. These features are in beta and have been released for sheriffs. I would love to hear your feedback! A big shout out to all the people who have reviewed, tested and given feedback on these features.


A-Team Contributions in 2015

It has been a while since I last blogged, and I had something interesting to share, so I finally managed to overcome my laziness. In this post, I would like to talk about some of the projects that I have been involved with in 2015 in the A-Team at Mozilla. Before I talk about the projects, I would like to give a shout out to @jmaher, @armenzg, @kmoir, @dminor, @ahal, @gbrown and @wlach, who were always there to answer my questions, help me find a way to solve problems when I was stuck and review my patches. I have worked on some exciting problems and made some great progress on the projects in the past 4 months. They are:

1) SETA – Search for Extraneous Test Automation
We run hundreds of builds and thousands of test jobs each day on our integration branches, that is, mozilla-inbound and fx-team. As more and more platforms are added every month, the load on test machines keeps increasing. But are so many test jobs required for each push? We run the test jobs to catch failures, but the majority of the time the test jobs pass, and the ones that do catch failures often duplicate one another. SETA tries to tackle this problem by being smart about utilizing machine cycles. In SETA, we find the minimum number of jobs that are needed to find all the failures that have occurred in the last six months on integration machines. From this data, we predict that these jobs are more likely to catch failures than others, and therefore the other test jobs are set to run less frequently. It is true that we will be wrong a certain number of times, and when we miss a failure, the sheriffs will need to backfill some jobs to find the root cause. But most of the time, it will work. Joel has written an excellent blog post giving examples and statistics from the work done in this project. This project has been deployed in Mozilla Releng production systems, and we have reduced the number of jobs from 350-400 jobs/push to roughly 150-190 jobs/push on desktop (linux, osx, win) platforms, about a 50% reduction during high-load weekdays. To put this into perspective, the past week we have seen the lowest jobs per push since January 1st on both mozilla-inbound and fx-team. I see this as a huge win, as it drastically reduces the load on our machines as well as the time the sheriffs need to star intermittents, increasing productivity for all. This data is for desktop platforms only; android and other platforms are yet to come, after which we should see more gains.

Fig: jobs/push per week since January for mozilla-inbound. Lowest on April 20th – April 26th

Project Repo: https://github.com/dminor/ouija
Project Stats: http://alertmanager.allizom.org/dailyjobs.html
Project Information: http://alertmanager.allizom.org/seta.html
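
To give a feel for what "finding the minimum number of jobs that catch all the failures" means, here is a small Python sketch. It uses a plain greedy set-cover heuristic, which I picked only because it is easy to read; it is not necessarily how SETA itself computes the job list, and the function name, failure ids and job names are all invented for the example.

```python
def minimal_job_set(failures):
    """Greedy approximation of a small job set that catches every failure.

    failures: dict mapping failure id -> set of job names that caught it
    (a hypothetical data shape, for illustration only).
    """
    uncovered = {f for f, jobs in failures.items() if jobs}
    chosen = []
    while uncovered:
        # Count how many still-uncovered failures each job would catch.
        counts = {}
        for failure in uncovered:
            for job in failures[failure]:
                counts[job] = counts.get(job, 0) + 1
        # Take the job catching the most; break ties by name for determinism.
        best = min(counts, key=lambda job: (-counts[job], job))
        chosen.append(best)
        uncovered = {f for f in uncovered if best not in failures[f]}
    return chosen


# Hypothetical history: four failures and the jobs that detected each one.
failures = {
    "bug-1": {"linux-mochitest-1", "osx-mochitest-1"},
    "bug-2": {"linux-mochitest-1", "win-xpcshell"},
    "bug-3": {"win-xpcshell"},
    "bug-4": {"osx-reftest", "linux-mochitest-1"},
}

print(minimal_job_set(failures))  # ['linux-mochitest-1', 'win-xpcshell']
```

In this toy data two of the four jobs are enough to catch every past failure, so the other two are the kind of jobs that could be scheduled less often.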

2) Mozilla CI Tools
MozCI (Mozilla CI Tools) is a Python library which allows one to trigger builds and test jobs on treeherder.mozilla.org via the command line. It has various use cases like triggering jobs for bisecting intermittent failures and backfilling missing jobs; personally, I use it for bisecting performance regressions. This tool is also used by sheriffs and is aimed at increasing developer productivity.
Project Repo: https://github.com/armenzg/mozilla_ci_tools
Project Documentation: https://mozilla-ci-tools.readthedocs.org/en/latest/index.html

3) Firefox Performance Sheriffing
In late January 2015, I took up the role of a performance sheriff. In this role, I look at the performance data produced by the test jobs, find regressions and their root causes, and get bugs on file to track issues and bring them to the attention of patch authors.
Sheriff documentation: https://wiki.mozilla.org/Buildbot/Talos/Sheriffing

I have also contributed patches to some other projects, like removing android*.json from mochitests (bug 1083347), the A-Team bootcamp and Mozregression. If you are looking to contribute to open source projects, I think this is a great time to start contributing to the Automation and Tools team at Mozilla and make a big impact. For me, it has been one of the most productive quarters and I plan to keep contributing further. As some of you may know, this summer I will be joining as the A-Team intern at Mozilla in San Francisco with @chmanchester as my mentor, and I am looking forward to doing more exciting work here!

Saving time in Incremental Builds

For the past couple of days, I have been working on Mochitest, and I often have to make small changes to some files for debugging purposes. Most of the time these changes are confined to a single file, and sometimes they span multiple files. Previously, whenever I made a change, I did an incremental build to test it out, which took approximately 5 minutes each time. Recently, I found a way to speed up the debugging process and save the time spent on incremental builds. Now, whenever I change some files, I just replace those files in the corresponding subdirectory of {objdir}. This works because the build system copies the files from the source tree to the {objdir}. For example, if I make some changes in runtests.py for mochitest, I just need to replace that file inside {objdir}/_tests/testing/mochitest and can jump directly to testing my changes, thus saving a significant amount of time. There are other methods too, such as using ‘make’, but I tend to prefer this way.
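
If you do this often, the copy step can be wrapped in a tiny Python helper. This is just a sketch of the idea, not part of the build system: the function name `sync_to_objdir` is made up, and the demo below runs against throwaway temporary directories that mimic the layout rather than a real source tree and {objdir}.

```python
import shutil
import tempfile
from pathlib import Path


def sync_to_objdir(changed_file, objdir_subdir):
    """Copy one edited source file over its copy in the object directory."""
    src = Path(changed_file)
    dest = Path(objdir_subdir) / src.name
    if not dest.exists():
        # Refuse to create new files: the target should come from a full build.
        raise FileNotFoundError(f"{dest} not found; has the tree been built?")
    if dest.resolve() == src.resolve():
        # Assumption: if the objdir entry is a link to the source file,
        # there is nothing to copy.
        return dest
    shutil.copy2(src, dest)
    return dest


# Demo with temporary directories standing in for the source tree and objdir.
with tempfile.TemporaryDirectory() as tmp:
    srcdir = Path(tmp) / "src" / "testing" / "mochitest"
    objdir = Path(tmp) / "objdir" / "_tests" / "testing" / "mochitest"
    srcdir.mkdir(parents=True)
    objdir.mkdir(parents=True)
    (srcdir / "runtests.py").write_text("print('edited')\n")
    (objdir / "runtests.py").write_text("print('stale')\n")

    updated = sync_to_objdir(srcdir / "runtests.py", objdir)
    print(updated.read_text().strip())  # print('edited')
```

In a real checkout you would pass the edited file and the matching {objdir} subdirectory, for example {objdir}/_tests/testing/mochitest for runtests.py.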