Tag Archives: R

naijR 0.2.2

This post is to announce the arrival of naijR 0.2.2 on CRAN.

New S3 classes

This version of the package introduces an object-oriented style of programming, with constructors for states and lgas objects. To create an instance of either class, we pass a character vector of States or LGAs, as appropriate. These constructors are somewhat permissive and do not perform strict accuracy checks; for that we have the functions is_state and is_lga.
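To illustrate the general pattern of a permissive constructor paired with a separate, stricter checker, here is a minimal base R sketch. It is not naijR's implementation: the names new_regions and is_known_region, and the three-item reference list, are made up for the example.

```r
# Hypothetical names throughout; this is not naijR's implementation.
known_regions <- c("Lagos", "Kano", "Rivers")  # toy reference list

# Permissive constructor: checks only the type, then tags the class.
new_regions <- function(x = character()) {
  stopifnot(is.character(x))
  structure(x, class = c("regions", class(x)))
}

# Strict checker: one logical per element, TRUE only for exact matches.
is_known_region <- function(x) {
  unclass(x) %in% known_regions
}

r <- new_regions(c("Lagos", "Kanoo"))  # accepted despite the misspelling
checked <- is_known_region(r)          # the checker flags the second element
```

Keeping validation out of the constructor means that messy input can still be wrapped in the class and repaired afterwards, rather than being rejected outright.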

Check for LGAs

There are 774 LGAs in the country, and they are pivotal to any analytic task done with country data. They are also very often misspelt, as any dataset taken from the wild would reveal. I have taken pains to provide authoritative names for this tier of governance using government sources. These can be easily inspected in the built-in package dataset lgas_nigeria.

The new function is_lga scans a vector to check whether its elements are correctly spelt LGA names. Where poorly spelt ones are found, the function fix_region can be used to correct them. The method for lgas objects will attempt to do this automatically using partial matching. For example:

> library(naijR)

> mylga <- c("Amuwo-Odofin", "Bukuru", "Askira-Uba")

> is_lga(mylga)
[1] TRUE FALSE FALSE
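The idea of matching a misspelt name against a reference list can be sketched with base R's adist, which computes edit distances between strings. This is only a stand-in for illustration: the rule naijR actually applies is not shown in this post, and the helper closest_match, its max_dist threshold and the four-item reference list are all assumptions.

```r
# Toy reference list; the real one has 774 LGAs.
reference <- c("Amuwo-Odofin", "Askira/Uba", "Buruku", "Bukkuyum")

# Return the closest reference name, or NA when nothing is close enough.
# `max_dist` is an assumed threshold, not a value taken from naijR.
closest_match <- function(x, ref, max_dist = 1) {
  d <- utils::adist(x, ref)          # rows: inputs, columns: references
  best <- apply(d, 1, which.min)     # index of the nearest reference per input
  ok <- d[cbind(seq_along(x), best)] <= max_dist
  ifelse(ok, ref[best], NA_character_)
}

matched <- closest_match(c("Amuwo-Odofin", "Bukuru", "Askira-Uba"), reference)
```

With a threshold of one edit, a name that differs by a single character is resolved, while one that is further from every candidate is left as NA for the user to deal with.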

Fix misspelt regions

A major addition in the current version is the function fix_region, which helps a user to repair any misspelt administrative regions within a dataset. The function has methods for different kinds of regions, i.e. States and Local Government Areas, which are optionally represented as the S3 objects states and lgas, respectively. However, the function also has a method for base character vectors, mainly for States, since they are not that many. To repair our vector mylga, we will create an lgas object first and then pass it as an argument to fix_region.

> fixed <- fix_region(lgas(mylga))
Approximate match(es) not found for the following:
* Bukuru
Warning message:
In lgas(mylga) : One or more elements is not an LGA

> fixed
[1] "Amuwo-Odofin" "Bukuru"       "Askira/Uba"

The LGA Askira-Uba has been changed to its correct spelling, Askira/Uba. However, a match could not be found for the element Bukuru. (Bukuru is actually the name of the headquarters of Jos South LGA of Plateau State.) To continue attempting to repair our vector, we run fix_region in interactive mode:

> fixed <- fix_region(lgas(mylga), interactive = TRUE)
Approximate match(es) not found for the following:
* Bukuru
Do you want to repair interactively? (Y/N): 

The user is prompted to continue interactively. To continue, enter something like y.

Fixing ‘Bukuru’
Search pattern: buk
Select the LGA 

1: Bukkuyum
2: Retry
3: Skip
4: Quit

We are searching for options using the search term buk, and only one option was returned, i.e. Bukkuyum. Unfortunately, that’s not the one we are looking for, so we enter 2 and run the search again, this time passing only bu.

Selection: 2
Search pattern: bu
Select the LGA 

 1: Buruku                         2: Akpabuyo                    
 3: Obubra                         4: Obudu                       
 5: Burutu                         6: Abuja Municipal Area Council
 7: Babura                         8: Buji                        
 9: Bunkure                       10: Sabuwa                      
11: Bunza                         12: Kabba/Bunu                  
13: Ijebu East                    14: Ijebu North                 
15: Ijebu North East              16: Ijebu Ode                   
17: Abua/Odual                    18: Tambuwal                    
19: Bursari                       20: Bukkuyum                    
21: Bungudu                       22: Retry                       
23: Skip                          24: Quit

The LGA I wanted to select was Buruku, so I pick option 1.

Selection: 1
Warning message:
In lgas(mylga) : One or more elements is not an LGA

> fixed
[1] "Amuwo-Odofin" "Buruku"       "Askira/Uba"  
> is_lga(fixed)
[1] TRUE TRUE TRUE

We’ve fixed the LGAs! At this point, any LGAs that could not be fixed can be treated by direct manipulation of the object.
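As a minimal sketch of such direct manipulation, here is how a leftover entry could be replaced in a plain character vector. The replacement value is an assumption based on the note above that Bukuru is the headquarters of Jos South; whether the lgas class survives this kind of assignment is not something I cover here.

```r
# A plain character vector standing in for the partly repaired result.
lga_names <- c("Amuwo-Odofin", "Bukuru", "Askira/Uba")

# Replace by value rather than by position, so the code still works
# if the order of the elements changes.
lga_names[lga_names == "Bukuru"] <- "Jos South"
```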

Maps

This version of the package provides increased granularity for the Nigeria country map, currently going down to the LGA level.

map_ng(lgas())

To know more about drawing Nigeria maps with the package, see the documentation (?map_ng) or read the vignette.

Conclusion

This version of naijR brings some new functionality to aid with data cleaning and validation of LGA names, as well as LGA level mapping. I would like you to try it out and give me some feedback.


Filed under Data Science

An R package to help with RQDA

A few weeks ago, I published a package on GitHub, which I called RQDAassist. The package was inspired by a script I wrote to help RQDA users, myself included, to install the package after it was archived on CRAN when R 4.0 arrived on the scene. So, when RQDAassist was first published, that was its only real functionality.

Today, I am releasing a minor update (v. 0.2.0) that adds a few functions. It can now convert transcripts written in Word into plain text files, a desired format for RQDA projects, and it can prepare those text files into objects that can be read, in bulk, into an RQDA database. Another thing I personally needed for my work was the ability to search qualitative codes using R scripts rather than the graphical user interface; so I wrote a search function, which currently works for active RQDA projects.

This package has so far been tested on Windows 10 (x64), but it should work fine on other major platforms (any subsequent update will include the relevant tests for Linux and macOS).

There are no plans to take this package to CRAN and indeed there should be no need to do so once RQDA installation from that repository is fully restored. But I find the prospect of additional helper functions to be quite useful in my work and hope others do too. I hope to see these functionalities expand over time.

You are welcome to check out this project at the GitHub repository or try it out using the instructions in the README.


Filed under Computers & Internet

Another Excel Horror Story

I was trying to create a list of officially approved Health Maintenance Organisations (HMOs) in Nigeria. After jotting down what data I wanted to collect and creating a schema, I paused to decide on how to initiate the approach. I wanted to first of all have it as a CSV file and then figured that the cheapest way to start would be to be “graphical” about it. I opted to go for MS Excel, since I could easily save the results in the desired format. After all, I’m an Office 365 subscriber, so why not give it a try?

If you know anything about me, you are probably aware of my aversion to Excel. After a long romance, our separation was both violent and traumatic. But today I said to myself that I would not be unduly nasty and would give it a shot. I told myself, there is no doubt that Excel is a great application and it’s used by millions with great effect.

I found the website of the National Health Insurance Scheme (NHIS) and the page that lists the HMOs. Good. I could have two windows open, the web page on the left and Excel on the right, plug into some good music and in a few minutes of copy-pasting, I should be able to acquire the data.

After a few minutes, when I got to the phone numbers, Excel started off with one of our old quarrels. Somehow, we could never agree on how to handle phone numbers. First, it turned the numbers into scientific notation. Then I tried to set the input type from “General” to “Text” to allow leading zeros. Then I had to click on the action prompt to indicate that I didn’t want formatted text. Even though I applied my settings to the columns that were to accept phone numbers, whenever I hit the next row, I had to start all over again. Arrrrrgh!

I now chastised myself for thinking that Excel was a changed person. How stupid I was! So I had to vent…

Sometimes we do silly things but don’t know why. This was one of them. I’m reasonably comfortable with R, and practically kicked myself knowing that with the rvest package, and a little peeping around for HTML tags and/or CSS selectors using the SelectorGadget, I could more efficiently grab the data I so badly needed.

Here’s the code I eventually used to get the job done:

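library(rvest)

# Download and parse the page that lists the HMOs
nhisHtml <- read_html("https://www.nhis.gov.ng/hmo-contacts/")

# Collect every <table> element and convert each one to a data frame
tableTag <- html_nodes(nhisHtml, "table")
tblElements <- html_table(tableTag)

# Keep the first table and save it as a CSV file
myDf <- tblElements[[1]]
write.csv(myDf, "data.csv")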
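Since phone numbers with leading zeros were what started the quarrel, it is worth noting that R can also drop those zeros when a CSV file is read back with default settings. The sketch below is not part of the scraping code above; it uses made-up rows and a temporary file to show how declaring the column as character in read.csv keeps the numbers as written.

```r
# Made-up rows for illustration; not data from the NHIS page.
hmo <- data.frame(
  name  = c("Example HMO A", "Example HMO B"),
  phone = c("08012345678", "07098765432"),
  stringsAsFactors = FALSE
)

csv_path <- tempfile(fileext = ".csv")
write.csv(hmo, csv_path, row.names = FALSE)

# Default reading guesses the column type, so the phone numbers
# become numeric and lose their leading zero.
guessed <- read.csv(csv_path)

# Declaring the column as character keeps each number exactly as written.
kept <- read.csv(csv_path, colClasses = c(phone = "character"))
```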

What on earth was I thinking to even attempt using Excel for this task?


Filed under Computers & Internet

Help with installing RQDA

[Figure: The RQDA User Interface]

[Update – 25 Nov 2020]: In the last 3-4 days, there has been significant activity on the RQDA GitHub repository, specifically addressing the needed updates to the package. So, it’s expected that very soon, the package will once again be available for installation via the regular channels.

RQDA is software for computer-aided qualitative data analysis (CAQDAS) and is specifically tailored for use with the R programming language and statistical computing environment. Last year I was privileged to use RQDA in carrying out the data analysis for an assessment involving 4 Nigerian States. It’s a great package, and very user-friendly. I was able to engage a team of non-programmers and after a 2-hour training, they were good to go, giving me great results.

A few months ago, somebody raised an alarm on the package’s GitHub repository. RQDA was gone!

GitHub Issue #38: Package was archived on CRAN
You need to see the comments that followed after!

What followed was a long discussion – many researchers were adversely affected by this development. Fortunately, my project was properly isolated using package management powered by renv and I really had no problems at all. But others were not so fortunate, and some didn’t even know how to start solving the problem. I participated somewhat on the thread to see how I could help out a few people.

You see, what had happened was that some of the dependencies of RQDA on CRAN, the Comprehensive R Archive Network, had been upgraded, and the maintainer of RQDA, Prof. Ronggui Huang of Fudan University, China, was yet to update the project accordingly. With the upgrade of R to version 4.0, these packages were all archived on CRAN and could not be installed the regular way, i.e. with install.packages(). On a good day, installing RQDA already presents some challenges, because of the graphical user interface (GUI) libraries it uses. Now it was impossible, except for advanced R users.

One of the developers on the thread took it upon himself to work on a fork of the project and came up with a good solution. And it worked. RQDA could be downloaded and installed with little or no pain. However, when colleagues asked whether he was going to commit to maintaining the fork or even pushing to CRAN, he declined, and rightly so. Instructions for using his branch can be found here.

Given this scenario, I decided that it would be good to also develop a solution based on the last available CRAN version, even though it was archived. I therefore came up with an R script that can be used both in the shell and within an R session. With this solution, RQDA can be successfully installed from CRAN on the current version of R (v4.0.2). I tried to provide informative messages to guide would-be users in carrying out the required steps; in some cases, there might be a need to stop the script and carry out an intermediate step at the R console. This script has been uploaded here as a GitHub Gist.
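For readers curious about what installing an archived package involves in general, here is a minimal sketch. It is not the script from the Gist: the helper cran_archive_url is made up for this example, and the version string is an assumption that should be checked against the package's archive listing on CRAN.

```r
# Build the URL of an archived source tarball on CRAN, which follows the
# layout Archive/<package>/<package>_<version>.tar.gz.
cran_archive_url <- function(pkg, version) {
  sprintf("https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz",
          pkg, pkg, version)
}

# "0.3-1" is an assumed version string; check the archive listing first.
url <- cran_archive_url("RQDA", "0.3-1")

# Not run here: installing needs a network connection, build tools and the
# package's own dependencies (RGtk2 among them) already in place.
# install.packages(url, repos = NULL, type = "source")
```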

To use this script, follow these steps:

  1. Download the script and save it to disk; its name is gwdg-arch.R. Note the location where it is saved.
  2. Navigate to the directory/folder where the file is saved, either in the shell or in an R session.
  3. Run the script:
    • If in the shell, use Rscript gwdg-arch.R.
    • If in the R console, use source("gwdg-arch.R")
  4. If RGtk2 was successfully installed by the script, it will terminate. You should now go to the R console and run library(RGtk2); this will bring up a dialog, asking you to install Gtk+. Accept it.
  5. After installing Gtk+, run the script again to download and install the other packages, including RQDA.
  6. If the above steps fail, perhaps your system is lacking some extraneous dependency. Run the script in the shell, only this time add the flag --verbose. This will print out more messages to help identify the possible cause of the problem.

Feel free to give me a shout.


Filed under Computers & Internet

Quick Tip on Deleting Directories in R

When trying to delete a directory, one can encounter some unexpected problems. The function for carrying out this operation is unlink, which accepts the directory name as its first argument; other arguments are recursive (a logical vector of length 1 indicating whether we want to delete subdirectories) and force, also logical, which tries to override file permissions in most cases. It returns 0 when successful and 1 when not.
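Here is a small sketch of these arguments at work inside a temporary directory, so nothing real is deleted. The folder and file names are made up. It also shows a detail from the help page: with recursive left at FALSE, a directory is simply left in place, and that is not counted as a failure.

```r
# Work inside a throwaway directory so nothing real is deleted.
target <- file.path(tempdir(), "demo-folder")
dir.create(file.path(target, "sub"), recursive = TRUE)
file.create(file.path(target, "sub", "notes.txt"))

# Without recursive = TRUE, a directory is left alone.
unlink(target)
still_there <- dir.exists(target)

# With recursive = TRUE, the directory and its contents are removed.
status <- unlink(target, recursive = TRUE, force = TRUE)
gone <- !dir.exists(target)
```

Because of this, checking dir.exists afterwards is a more direct confirmation than relying on the return value alone.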

But there is a gotcha to using the function. First, let’s list the contents of the HOME directory:

> list.files()
 [1] "3D Objects"
 [2] "AppData"
 [3] "Contacts"
 [4] "Desktop"
 [5] "Documents"
 [6] "Downloads"
 [7] "Favorites"
 [8] "IntelGraphicsProfiles"
 [9] "Links"
[10] "MicrosoftEdgeBackups"
[11] "Music"
[12] "New folder"
[13] "NTUSER.DAT"
[14] "ntuser.dat.LOG1"
[15] "ntuser.dat.LOG2"
[16] "NTUSER.DAT{a70b1724-6bc8-11e8-a408-d0bf9c58c5d2}.TM.blf"
[17] "NTUSER.DAT{a70b1724-6bc8-11e8-a408-d0bf9c58c5d2}.TMContainer00000000000000000001.regtrans-ms"
[18] "NTUSER.DAT{a70b1724-6bc8-11e8-a408-d0bf9c58c5d2}.TMContainer00000000000000000002.regtrans-ms"
[19] "ntuser.ini"
[20] "OneDrive"
[21] "Pictures"
[22] "R"
[23] "Saved Games"
[24] "Searches"
[25] "source"
[26] "Videos"

Let’s say we want to delete the ‘New folder’ directory

> (unlink('New folder/', recursive = TRUE, force = TRUE))
[1] 1

It fails!

Even when you study the help file, the source of this failure is not apparent.

Well, it turns out that the function does not recognize the trailing slash that indicates that we are dealing with a directory. This is always added when you use tab completion for the directory name.

So, when we type

# Remove trailing slash in directory name
> (unlink('New folder', recursive = TRUE, force = TRUE))
[1] 0

The function succeeds, as evidenced by listing the directory contents

> dir()
 [1] "3D Objects"
 [2] "AppData"
 [3] "Contacts"
 [4] "Desktop"
 [5] "Documents"
 [6] "Downloads"
 [7] "Favorites"
 [8] "IntelGraphicsProfiles"
 [9] "Links"
[10] "MicrosoftEdgeBackups"
[11] "Music"
[12] "NTUSER.DAT"
[13] "ntuser.dat.LOG1"
[14] "ntuser.dat.LOG2"
[15] "NTUSER.DAT{a70b1724-6bc8-11e8-a408-d0bf9c58c5d2}.TM.blf"
[16] "NTUSER.DAT{a70b1724-6bc8-11e8-a408-d0bf9c58c5d2}.TMContainer00000000000000000001.regtrans-ms"
[17] "NTUSER.DAT{a70b1724-6bc8-11e8-a408-d0bf9c58c5d2}.TMContainer00000000000000000002.regtrans-ms"
[18] "ntuser.ini"
[19] "OneDrive"
[20] "Pictures"
[21] "R"
[22] "Saved Games"
[23] "Searches"
[24] "source"
[25] "Videos"

Watch out for this!
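One way to guard against this is to strip any trailing separator before calling unlink. The wrapper below is a hypothetical helper, not something provided by base R, and it is tried out on a folder created in the temporary directory. I saw the failure on Windows; other platforms may accept the trailing slash, in which case the wrapper is simply harmless.

```r
# Hypothetical helper: drop trailing "/" or "\" before deleting.
unlink_dir <- function(path, ...) {
  clean <- sub("[/\\\\]+$", "", path)
  unlink(clean, recursive = TRUE, ...)
}

# Try it on a throwaway folder, passing the name with a trailing slash.
demo <- file.path(tempdir(), "New folder")
dir.create(demo)
status <- unlink_dir(paste0(demo, "/"), force = TRUE)
removed <- !dir.exists(demo)
```

Note that a path consisting only of separators would be reduced to an empty string, so the helper is meant for ordinary directory names like the one above.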


Filed under Computers & Internet