Tips and advice on creating Python software for lab members to use in academia

Jeremy John Selva

Prior knowledge:
No previous knowledge expected

Summary

I have made a software tool called MSOrganiser (https://github.com/SLINGhub/MSOrganiser) for an academic lab to tidy mass spectrometry acquisition data and calculate concentrations. While the implementation in Python may sound simple, a lot of time was spent making it user-friendly for people with limited computing experience. I hope this talk will be helpful for beginners who are working on a similar task.

Description

Introduction (1 min)

In academia, most of us are not experts in coding and software development. While it is hard for us to write software for our own research papers, it is even more challenging to create software for others to use. As one of Daniel Lemire’s blog posts puts it, “Cooking your own food is a lot easier than being a chef in a restaurant”.

Despite the many obstacles, I have managed to build MSOrganiser in Python. It started as a tool for my own project but later expanded to do more things, so that the lab can now use it with minimal supervision from me. From this experience, I have many tips and pieces of advice to share, but given the time limit of this lightning talk, I have decided to highlight the ones that have helped me the most in my work.

Advice 1: Convert Python scripts to a Web Service or Executable Software (1 min)

The first piece of advice is to invest some time in learning how to convert Python scripts into a web service or executable software. The key is to make the software more user-friendly: people prefer clicking buttons to typing commands on a command line. In my case, I opted for executable software because of my limited experience with Flask and because the lab does not have the IT expertise to maintain a server to run a web service. I am thankful to the Python packages Gooey and PyInstaller, and to Jack McKew's blog, for helping me make this a reality.
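To illustrate the idea, here is a minimal sketch of how a command-line script could be wrapped with Gooey. It is not taken from MSOrganiser; the program name, the options and the fallback decorator are assumptions made for illustration only.

```python
import argparse

try:
    # Gooey is a third-party package; if it is installed, the decorator
    # turns the argparse interface below into a simple GUI window.
    from gooey import Gooey
except ImportError:
    # Fallback (an assumption for this sketch) so the script still runs
    # from the command line when Gooey is not installed.
    def Gooey(*args, **kwargs):
        def decorator(func):
            return func
        return decorator


def build_parser():
    """Describe the inputs the user must provide (hypothetical options)."""
    parser = argparse.ArgumentParser(description="Tidy acquisition data")
    parser.add_argument("input_file", help="Path to the raw data file")
    parser.add_argument("--output-folder", default="results",
                        help="Folder where the results are saved")
    return parser


@Gooey(program_name="Lab Tool")  # hypothetical program name
def main():
    args = build_parser().parse_args()
    print(f"Reading {args.input_file}, saving to {args.output_folder}")


if __name__ == "__main__":
    main()
```

Assuming the script is saved as lab_tool.py (a hypothetical name), a command such as `pyinstaller --onefile --windowed lab_tool.py` could then bundle it into a single executable. The exact options a real project needs will likely differ.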

Advice 2: Create a cheat sheet for new users (1 min)

As more people use the software, they will request new features, and soon the software will be able to do many things. The downside is that it becomes more complex to use and understand, especially for new users who are seeing the tool for the first time. While it is possible to record everything in a user manual, a less intimidating approach is to show new users the big picture via a summary page or cheat sheet. Inspired by RStudio's cheat sheets, I have made one in the Overview page of the README. Once new users are more familiar with the tool, they can be referred to the user manual for specific questions that the cheat sheet cannot answer.

Advice 3: Give helpful and kind error messages (1 min)

No software is perfectly free from errors and bugs. Likewise, no human is perfect, and everyone makes mistakes sometimes, such as providing invalid input data. Unfortunately, users in academia, especially those with limited computing experience, are afraid of error messages because Python error messages are too technical for them to understand. From the user's perspective, it is hard to know whether the issue lies in their input data or in the software. Saadia Minhas' blog provides some good resources on how to create error messages that are helpful and kind. User feedback is also valuable. The key is to show that you are open to criticism. I usually follow Michael Lynch's advice and ask "What changes would be helpful?" instead of "What is unclear about the error message?". It works for me because it sounds less offensive to the user. It also encourages users to make suggestions and give constructive feedback, which does part of the work for you.

As an example from MSOrganiser, this is the error message users encounter most often:

  • [Errno 13] Permission denied: 'D:\MSOrganiser\dist\Results.xlsx'

After some feedback from users, the same message was changed to:

  • Unable to save Excel file due to:
    [Errno 13] Permission denied: 'D:\MSOrganiser\dist\Results.xlsx'
    Please close the file if it is still open on your computer, then try running the software again
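A minimal sketch of how such a message could be produced is shown below. The function name and the `write_file` callback are hypothetical and are not taken from MSOrganiser; the idea is simply to catch the technical error and add a sentence telling the user what to do next.

```python
def save_with_friendly_error(write_file, file_path):
    """Call write_file(file_path).

    Returns None on success, or a user-friendly message if the file
    cannot be written because of a permission problem.
    """
    try:
        write_file(file_path)
    except PermissionError as error:
        # Keep the technical detail for troubleshooting, but explain
        # the most likely cause and how the user can fix it.
        return (
            "Unable to save Excel file due to:\n"
            f"{error}\n"
            "Please close the file if it is still open on your computer, "
            "then try running the software again"
        )
    return None
```

Keeping the original error text inside the friendlier message means the developer can still see the technical cause when a user forwards the message.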

Advice 4: Output a report file and pre-processing results to show accountability, even if they seem obvious to you. (1 min)

In academia, software reproducibility and reusability are highly valued.

Here is what I have done to achieve both.

Tag each software version with a label and encourage users to cite not just the software name but the version number as well. GitHub has a nice feature for storing different releases of the same software.

Next, ensure that your software creates a report file or table that stores the user's input parameters, so that the results can be reproduced. In the case of MSOrganiser, WeasyPrint is used to create a PDF file containing a small table of the software parameters. A preview of the parameter report can be found in the software cheat sheet as well.
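As a rough sketch of the idea, the function below builds a small HTML table from the user's parameters. It is not MSOrganiser's actual code; the function name, the parameter names and the layout are assumptions for illustration.

```python
from html import escape


def build_parameter_report(parameters, software_version):
    """Return an HTML page listing each input parameter and its value."""
    rows = "\n".join(
        # Escape values so file names with special characters display safely.
        f"<tr><td>{escape(str(name))}</td><td>{escape(str(value))}</td></tr>"
        for name, value in parameters.items()
    )
    return (
        "<html><body>\n"
        f"<h1>Parameter report (version {escape(software_version)})</h1>\n"
        "<table>\n<tr><th>Parameter</th><th>Value</th></tr>\n"
        f"{rows}\n</table>\n</body></html>"
    )


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    print(build_parameter_report({"Input file": "batch1.csv"}, "1.0.0"))
```

If WeasyPrint is installed, an HTML string like this could then be turned into a PDF with something along the lines of `HTML(string=report).write_pdf("report.pdf")`. Including the version number in the report ties each result to the release that produced it.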

Pre-processing results help to explain how the software calculates the final results. MSOrganiser has a testing mode that outputs such results in separate Excel sheets. From my experience, it is easier to explain to a collaborator, who may be unfamiliar with your work, that the final result is calculated as (Sheet A / Sheet B * Sheet C) than to present a generic formula. This is also helpful when troubleshooting logical errors. In fact, the lab uses these sheets to show me where my calculations have gone wrong.
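The sketch below shows one way to keep every intermediate table alongside the final result. The function name, the sheet labels and the use of plain lists are assumptions for illustration, not MSOrganiser's implementation.

```python
def calculate_with_intermediates(sheet_a, sheet_b, sheet_c):
    """Compute A / B * C and return every step, not just the final result."""
    # Each list holds one value per sample, in the same order.
    ratio = [a / b for a, b in zip(sheet_a, sheet_b)]
    final = [r * c for r, c in zip(ratio, sheet_c)]
    return {
        "Sheet A": sheet_a,
        "Sheet B": sheet_b,
        "Sheet C": sheet_c,
        "A / B": ratio,
        "Final (A / B * C)": final,
    }


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    steps = calculate_with_intermediates([10.0, 20.0], [5.0, 4.0], [2.0, 3.0])
    for name, values in steps.items():
        print(name, values)
```

Each entry of the returned dictionary could be written to its own sheet, so a collaborator can check any step of the calculation by hand.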

Advice 5: Organise documentation into specific structures (1 min)

Documentation is important because it tells others how the software works and how it is built. It helps to prevent technical debt, especially if the person taking over the software maintenance is less experienced than you.

Resources from Benjamin D. Lee, and from Joseph D. Romano and Jason H. Moore, are good places to start.

However, as my documentation got longer and more complex, I began to realise that no single document fits all users, and that different groups of people with specific needs require different kinds of documents. I ended up with so many documents that they became hard to manage. Thankfully, I came across a website from DIVIO that significantly helped me to group my documentation into different categories. Following in its footsteps, this is what I came up with

I also added the following

Take Home Advice: Create software that has a lasting impact, even if it is only for a small group of users. (1 min)

When creating software, we tend to want to build something that impresses a lot of people or makes it into a journal paper. We tend to forget that the main purpose of creating it is not to make us popular but to help others with their problems. A small problem can be just as annoying as a big one. The more annoying the problem the software tries to solve, the more useful it is.

The things that my software does are very simple, but they spare my lab members the tedious work of manually copying and pasting data and creating hard-coded formulas across multiple Excel sheets. Don't feel discouraged when you are tasked to create a tool that does small and simple things. Instead, "do (these) small things with great love" (Mother Teresa). As a blog post from Yihui Xie says, "If you can impact a few people deeply, they will just shout from the rooftops for you. The breadth of the impact will be a matter of time".

Just be patient and continue to improve along the way. All the best and take care.