Python Time

May 15, 2020

After taking this computational biology course, I have learned a number of ways to use both R and Python for data science. One of the most interesting things I’ve learned is how to make R and Python work together to manipulate list variables. I will now demonstrate how to create, sort, and combine lists in Python and then manipulate the combined list in R to create a working dataset.

Before I can begin working with Python in RStudio, I must load the reticulate package and tell it which Python installation to use.

#R

#install.packages("reticulate")
#install_miniconda()
library(reticulate)
Sys.setenv(RETICULATE_PYTHON = "/usr/bin/python")
use_python("/usr/bin/python")

Creating two variables

I will create one list for cats and one for dogs, each holding the amount of time every animal of that species spends in the shelter.

#Python

#Time in shelter for each cat (in months)
#Square brackets already create a list, so no list() call is needed
cattime = [2, 6, 2, 8, 7, 5, 3, 4, 4]

#Time in shelter for each dog (in months)
dogtime = [3, 1, 2, 4, 2, 5, 6, 3, 4]

Sorting the variables by time

Now that I’ve created these two variables in Python, let’s sort them in ascending order by time spent (in R we would use tidyverse’s arrange()).

#Python

#Cats, sorted ascending
cattime.sort()
cattime

## [2, 2, 3, 4, 4, 5, 6, 7, 8]

#Dogs, sorted ascending
dogtime.sort()
dogtime

## [1, 2, 2, 3, 3, 4, 4, 5, 6]
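The .sort() calls above reorder each list in place, so the original order is lost. As a minimal sketch of an alternative, the built-in sorted() returns a new list and leaves the input alone. The names cattime_raw, cattime_sorted, and cattime_desc are made up for this illustration and are not used elsewhere in the post.

```python
#Python
# Illustrative copy of the unsorted cat times from earlier in the post
cattime_raw = [2, 6, 2, 8, 7, 5, 3, 4, 4]

# sorted() returns a new list and leaves the original untouched
cattime_sorted = sorted(cattime_raw)

# reverse=True gives descending order, similar to arrange(desc(time)) in R
cattime_desc = sorted(cattime_raw, reverse=True)

print(cattime_raw)
print(cattime_sorted)
print(cattime_desc)
```

This is handy when the original order still matters, for example if each position corresponds to a specific animal.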

Getting summary statistics for each species

#Python

#Average amount of time cats spend in the shelter
#(dividing by a float so that Python 2 does not truncate the result)
cattime_sum = sum(cattime)
print(round(cattime_sum / float(len(cattime)), 2))

## 4.56

#Average amount of time dogs spend in the shelter
dogtime_sum = sum(dogtime)
print(round(dogtime_sum / float(len(dogtime)), 2))

## 3.33

#Maximum and Minimum shelter times for cats
maxc=max(cattime)
minc=min(cattime)
print(minc, maxc)

## (2, 8)

#Maximum and Minimum shelter times for dogs
maxd=max(dogtime)
mind=min(dogtime)
print(mind, maxd)

## (1, 6)
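The same summary steps can be written once and reused for both species. The sketch below wraps them in a small helper; summarize is a hypothetical name chosen for this example, and the lists are retyped here so the block runs on its own.

```python
#Python
def summarize(times):
    """Return the mean, minimum, and maximum of a list of numbers."""
    # float() keeps the division exact under Python 2 as well as Python 3
    mean = sum(times) / float(len(times))
    return {"mean": round(mean, 2), "min": min(times), "max": max(times)}

# Shelter times in months, retyped from the sorted lists above
cattime = [2, 2, 3, 4, 4, 5, 6, 7, 8]
dogtime = [1, 2, 2, 3, 3, 4, 4, 5, 6]

cat_summary = summarize(cattime)
dog_summary = summarize(dogtime)
print(cat_summary)
print(dog_summary)
```

Using len(times) rather than a hard-coded 9 means the helper still works if animals are added to or removed from a list.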

Combining lists

I now want to combine the list of 9 cats and the list of 9 dogs into one list of 18 animals.

#Python

animaltime = cattime+dogtime
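Because + simply appends the second list to the first, the order of the combined list tells us which entries are cats and which are dogs. As a rough sketch, a matching list of labels could also be built on the Python side; animallabel and pairs are hypothetical names that do not appear in the rest of this post.

```python
#Python
# Sorted shelter times in months, retyped so the block runs on its own
cattime = [2, 2, 3, 4, 4, 5, 6, 7, 8]
dogtime = [1, 2, 2, 3, 3, 4, 4, 5, 6]

# + concatenates: every cat time comes first, then every dog time
animaltime = cattime + dogtime

# Repeating a one-element list builds labels in the same order
animallabel = ["cat"] * len(cattime) + ["dog"] * len(dogtime)

# zip pairs each label with its time so the two lists can be checked together
pairs = list(zip(animallabel, animaltime))

print(len(animaltime))
print(pairs[8], pairs[9])
```

The last line prints the final cat and the first dog, which is a quick way to confirm where the boundary between the two species falls.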

Viewing list in R

Now that the lists are combined, let’s look at the whole thing in R

#R
time<- py$animaltime

Creating a dataset from the python list in R

I want to make a dataset from these times that has a variable determining if the animal is a cat or a dog.

#R

#The first 9 animals on the list are cats and the next 9 are dogs, so we can make a vector with 9 cats and 9 dogs and then cbind it to our time vector

animal<-rep(c("cat","dog"), each=9)


shelter<-as.data.frame(cbind(animal,time))

shelter$time<-as.character(shelter$time)
shelter$time<-as.numeric(shelter$time)

shelter$animal<-as.character(shelter$animal)

str(shelter)

## 'data.frame':    18 obs. of  2 variables:
##  $ animal: chr  "cat" "cat" "cat" "cat" ...
##  $ time  : num  2 2 3 4 4 5 6 7 8 1 ...

head(shelter)

##   animal time
## 1    cat    2
## 2    cat    2
## 3    cat    3
## 4    cat    4
## 5    cat    4
## 6    cat    5

Manipulating the dataset

Now that the lists have been turned into an R dataset we can use it to learn about the differences between cats and dogs.

#R
library(tidyverse)

## ── Attaching packages ─────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.0     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.5
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0

## ── Conflicts ────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(dplyr)

#Finding the averages, min, and max in R now
shelter %>% group_by(animal) %>% summarise(avg=mean(time))

## # A tibble: 2 x 2
##   animal   avg
##   <chr>  <dbl>
## 1 cat     4.56
## 2 dog     3.33

shelter %>% group_by(animal) %>% summarise(min=min(time), max=max(time))

## # A tibble: 2 x 3
##   animal   min   max
##   <chr>  <dbl> <dbl>
## 1 cat        2     8
## 2 dog        1     6
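For comparison, the group_by() and summarise() steps above can be approximated in plain Python with a dictionary, again without pandas or numpy. This is only a sketch: rows, groups, and summary are illustrative names, and the data is retyped from the shelter dataset.

```python
#Python
# Illustrative (animal, time) rows matching the shelter dataset above
cat_rows = [("cat", t) for t in [2, 2, 3, 4, 4, 5, 6, 7, 8]]
dog_rows = [("dog", t) for t in [1, 2, 2, 3, 3, 4, 4, 5, 6]]
rows = cat_rows + dog_rows

# Collect the times for each animal, a plain-Python analogue of group_by()
groups = {}
for animal, t in rows:
    groups.setdefault(animal, []).append(t)

# Reduce each group to a few statistics, an analogue of summarise()
summary = {}
for animal, times in groups.items():
    summary[animal] = {
        "avg": round(sum(times) / float(len(times)), 2),
        "min": min(times),
        "max": max(times),
    }

for animal in sorted(summary):
    print(animal, summary[animal])
```

The dplyr version is shorter, which is a good argument for passing the list to R once the data has been assembled.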

This shows some of the possibilities of combining R and Python without having to load additional Python packages such as pandas or NumPy (mostly because my computer force quit R every time I tried to load them). The two languages can work together in many other ways; this is just a demonstration of manipulating lists and passing them between Python and R.