The ART of Open Source Contributions

Chloe Wang
Jul 6, 2022

I am proud to announce that my first Open Source contribution has been merged (Pull Request link) and released with ART (the Adversarial Robustness Toolbox) 1.11.0 on July 1st.

Open Source software development has become more and more popular for many reasons. For example, programmers can see the source code, spot mistakes, fix issues, propose new features, and make consensus decisions; Open Source is also a fantastic way to get help, seek guidance, and learn cutting-edge technologies.

Open Source covers the entire range, from professional software development sponsored by large corporations to code written by motivated contributors who work in their free time on a project that matters to them. This blog focuses on contributing to Open Source in our free time. It is like everything else: it is fun if you like it; otherwise, because you are volunteering, giving up is an easy path to fall into. It is an interest-driven activity, so how do you stay committed instead of only being passionate at the beginning? This blog is about what I learned and want to share with you. Let’s dive in.

Think Before You Act

First of all, choose an active GitHub repo. An active repo usually has a vibrant community and an enthusiastic team maintaining it. Most importantly, it means the project has adopters. Adopters are essential because they will give you feedback and help you make improvements.

Second, find a leader who can guide you through the journey, especially if you are new to a project or, like me, a first-time Open Source contributor. Many Open Source projects have ramp-up documents for beginners. The firstcontributions repo on GitHub has many detailed and beginner-friendly documents, tools, and tutorials.

Third, be brave enough to ask questions and remove blockers. I started from zero and didn’t know what adversarial ML was, even though I had some experience with applied Machine Learning algorithms. But is it great to be involved in a new area? My answer is YES. Learning is always the theme of Open Source contributions; asking questions and removing blockers are the steps that help us learn!

How did I find ART?

In September 2021, I reached out to my mentor, Susan Malaika, who is an Open Source Advocate at IBM, told her about my interests, and described my ideal Open Source project:

  1. It must be machine learning related. This ask was based on my interests. I have worked on a couple of applied machine learning projects for a video surveillance system. Since then, I have been working on project usage data analytics for several years. I like what I am doing and want to join a project outside of work that is about discovering insights from data.
  2. It would be great if the project lead could help me onboard.
  3. I prefer to participate in a popular project. We chose the GitHub “Star” count as a fair measurement of popularity.

Susan recommended ART, which met all my asks very nicely. Even better, it is a project originally created by IBM Research and maintained by IBMers. She introduced me to ART’s project lead, Beat Buesser.

Beat helped me ramp up and suggested an issue: add a new feature, Sign_OPT, a query-efficient hard-label adversarial evasion attack, to ART.

ART is a project with excellent documentation and a ton of examples. For Sign_OPT, I started by learning from an existing black-box attack, Boundary Attack (link). I read the papers to understand both algorithms and watched online tutorials on topics such as GANs and adversarial attacks.

Main Takeaways

I took many baby steps that can be applied to any ML-related project. I’ve summarized 6 here:

  1. understand the datasets: MNIST, CIFAR10, Iris;
  2. understand and learn how to use the deep learning frameworks: TensorFlow, PyTorch, Keras;
  3. understand Neural Network architectures and why some of them are so popular;
  4. learn how to write code with GPU compatibility;
  5. learn how to utilize free GPU resources from Google Colab;
  6. learn how to configure Neural Network layers to debug algorithm performance issues.
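As an illustration of step 4, here is a minimal sketch of device-agnostic PyTorch code. It is not taken from ART; the tiny model, the `pick_device` helper, and the random MNIST-shaped batch are assumptions made only for this example. The idea is to pick the device once and create both the model and the data on it, so the same script runs on a laptop CPU or on a Colab GPU.

```python
import torch
from torch import nn


def pick_device() -> torch.device:
    """Use the GPU when one is visible to PyTorch, otherwise fall back to the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


device = pick_device()

# Hypothetical toy classifier for 28x28 grayscale images (MNIST-shaped input).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)

# Create the batch directly on the chosen device to avoid an extra copy.
batch = torch.rand(4, 1, 28, 28, device=device)

with torch.no_grad():  # inference only, no gradients needed
    logits = model(batch)

# Move results back to the CPU before handing them to NumPy or plotting code.
predictions = logits.argmax(dim=1).cpu()
```

Keeping every `.to(device)` call tied to one `device` variable is what makes the code portable; hard-coding `"cuda"` would fail on a machine without a GPU.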

I also learned many key things that apply to Open Source projects in general. These 4 were the most significant:

  1. Always commit code with git commit -s. The “-s” adds the sign-off line required by the Developer Certificate of Origin (DCO). Since I didn’t know about the DCO, I had to re-sign all my commits, which was pretty painful.
  2. Run code-style and static-analysis tools to follow programming best practices, for example Pylint, pycodestyle, mypy, black, and more.
  3. For Python programmers, PyTest has many advanced features that keep unit tests clean and compact, such as parametrizing fixtures and test functions, for example by combining the @pytest.mark.parametrize() decorator with @pytest.fixture().
  4. Open Source licenses. It is very beneficial to understand the Open Source Definition, the set of criteria that Open Source licenses must satisfy. Check out the Open Source Initiative for details.
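To make point 3 concrete, here is a small sketch of a parametrized test combined with a parametrized fixture. It is not code from ART: `clip_pixels`, `pixel_range`, and the test name are hypothetical, chosen only to show the pattern.

```python
import pytest


def clip_pixels(values, low=0.0, high=1.0):
    """Hypothetical helper: clamp every value into the range [low, high]."""
    return [min(max(v, low), high) for v in values]


# A parametrized fixture: every test that uses it runs once per value range.
@pytest.fixture(params=[(0.0, 1.0), (-1.0, 1.0)])
def pixel_range(request):
    return request.param


# A parametrized test: pytest combines these inputs with the fixture params,
# so this single function expands into 3 x 2 = 6 test cases.
@pytest.mark.parametrize("values", [[-2.0, 0.5, 3.0], [0.25], []])
def test_clip_stays_in_range(values, pixel_range):
    low, high = pixel_range
    clipped = clip_pixels(values, low, high)
    assert len(clipped) == len(values)
    assert all(low <= v <= high for v in clipped)
```

Saved in a file such as test_clip.py (the file name is also just an example), running pytest on it collects each combination as a separate test, so a failure report names the exact input and range that broke.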

Other essential pieces of advice:

  1. Set up regular checkpoints with your collaborator. During the first month after talking to Beat and glancing at the paper, I felt I had chosen an impossible mission. There was little progress. To keep myself trying, I set up a weekly chat with Beat, a 10–15 minute Zoom meeting for asking questions, removing blockers, and setting weekly plans. The sessions with Beat inspired me to keep working. Beat always gave me clear answers and actionable to-dos, which made me feel confident and sped up my learning.
  2. Ask specific questions. This is a common suggestion but is worth emphasizing. The process of formulating specific questions, instead of very general ones, is itself part of finding the answers. This exercise helped me surface the true blockers that I needed help with; it also helped me find some answers by myself.
  3. Last but not least, ALWAYS ask for help! In my case, Beat and I also reached out to one of the authors of Sign_OPT, Minhao Cheng, with several questions, including asking permission to reuse some of the source code and asking detailed questions like “Is the adversarial MNIST image clipped or not?”, “What are the parameters for reaching the performance listed in the paper?”, and “Which Neural Network architecture was used for performance evaluation?” Minhao was always responsive and gave detailed answers. I can’t express enough gratitude to him and his co-authors.

I hope this blog gives you the courage to try Open Source contributions. Feel free to reach out to me with any questions. Thank you for reading.

Susan Malaika

Susan Malaika is an STSM (Senior Technical Staff Member) in IBM’s Cognitive Applications Group and a member of the IBM Academy of Technology Leadership Team. Susan focuses on data and AI-related technologies in IBM’s Open Technologies group, increasing IBM’s adoption of and contribution to open source and its engagement with developers. LinkedIn Profile

Beat Buesser

Beat Buesser is a Research Staff Member in the AI Security & Privacy group at the Dublin Research Laboratory of IBM Research. He is currently leading the development of the Adversarial Robustness Toolbox, and his research focuses on the security of machine learning and artificial intelligence. Before joining IBM, he worked as a postdoctoral associate at the Massachusetts Institute of Technology (MIT) and obtained his doctorate degree from ETH Zurich. LinkedIn Profile

Chloe Wang

I am an STSM (Senior Technical Staff Member) at IBM. During my workdays, I design solutions, develop applications, and do code reviews as a full-stack engineer, and I participate in client-facing activities as a technical architect. I always like to play with data and join hands-on Machine Learning projects in my free time. Please check my GitHub for multiple ML projects. LinkedIn Profile

Chloe Wang

Senior Technical Staff Member in IBM Finance and Operation team. A big fan and practitioner of applied machine learning and distributed cloud computing.