Solved – Why and when to create an R package

I understand this is quite a broad question, but I wonder what the decisive points should be when deciding whether or not to create a new package for R. To be more specific, the question is not about the reasons for using R itself, but about the decision to gather various scripts and integrate them into a new package.

Amongst the points that could lead to this decision, I have thought of (in a quite non-exhaustive fashion):

  • the non-existence of other packages in the same sub-field;
  • the need to exchange with other researchers and to allow reproducibility of experiments.

And amongst the points that could lead to the opposite decision:

  • part of the methods used are already present in other packages;
  • the number of new functions is not sufficient to justify creating a new independent package.

I might have forgotten many points that could go in either list, and these criteria also seem partly subjective. So, what would you say justifies bringing together various functions and data in a new, documented, and broadly available package, and at what point should one start?

I don't program in R, but I program otherwise, and I see no R-specific issue here.

I imagine that most people first write something because they really want it for themselves. Conversely, any feeling that one should be publishing software because it is the thing to do should be resisted strongly. Smart people can be lousy programmers, and often are.

Going public seems a matter of being confident that you have something that is as good as or better than what is already public, and that it fills a gap. Knowing that other people want to do the same thing is surely a boost.

If you are in doubt, don't publish. In many communities, there is a quality control problem: mediocre or buggy software is released by uncritical or inexperienced programmers, although how bad the problem is remains open to debate. Optimists feel that trivia can just be ignored and that users will expose bugs and limitations fast enough; pessimists feel that we are drowning in poor-quality stuff and it's hard to tell the winners from the losers. (On the other hand, the experience gained from publication is part of what allows programmers to improve.)

There could be a book on this, but a few pointers spring to mind:

  1. Good-quality documentation distinguishes good software just as good code does, indeed sometimes more obviously. Never underestimate how much work will be needed to provide the documentation that the code deserves. R programmers often seem to expect R users to know just as much as they do about the technique being implemented, and so document minimally.

  2. As far as possible, test your code so that you can reproduce published solutions with real data from elsewhere. (If you are coding up something totally new, that may be more difficult, but not impossible. Also, you may often find yourself wondering whether it's their bug or yours.)

  3. Programmers often underestimate the ability of users to throw unsuitable data at a program. So, think about what could go wrong, e.g. with missing values, or with zeros when a program assumes positive values. (The benign take here is that it's the job of the users to find the problems and improve the code through their feedback, but a program that breaks down easily won't enhance your reputation.)
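
As a rough illustration of points 2 and 3, here is a minimal sketch in R. The function `geo_mean()`, its arguments, and its error messages are hypothetical and exist only for this example. It shows one way a small function might guard against missing values and non-positive input, and how it might be checked against an answer that is known independently (here worked out by hand; in practice, a published result on real data would be better).

```r
# Hypothetical helper: geometric mean of a numeric vector.
# The name, arguments, and messages are illustrative assumptions.
geo_mean <- function(x, na.rm = FALSE) {
  if (!is.numeric(x)) {
    stop("`x` must be numeric.")
  }
  if (na.rm) {
    # Drop missing values only when the user asks for it
    x <- x[!is.na(x)]
  } else if (anyNA(x)) {
    # Same convention as base R's mean(): missing in, missing out
    return(NA_real_)
  }
  if (length(x) == 0) {
    stop("`x` has no non-missing values.")
  }
  if (any(x <= 0)) {
    # log() of zero or a negative number would give -Inf or NaN
    stop("`x` must be strictly positive; found zero or negative values.")
  }
  exp(mean(log(x)))
}

# Check against an answer that can be worked out by hand:
# (1 * 4 * 16)^(1/3) = 64^(1/3) = 4
stopifnot(isTRUE(all.equal(geo_mean(c(1, 4, 16)), 4)))

# Unsuitable input should fail loudly rather than return nonsense
result <- tryCatch(geo_mean(c(1, 0, 5)), error = function(e) e)
stopifnot(inherits(result, "error"))
```

Whether to stop with an error or return `NA` for bad input is a design choice; the point is only that the behaviour is decided and tested before users discover it for themselves.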
