Work with SASPy

Initiate a SAS session

Once SASPy is configured, the first step is to initiate a SAS session from a Python environment. This launches a SAS session in the background that is available to run statistical analyses on any input data. After the session is initiated, the general workflow is to send a Pandas data frame to SAS, submit SAS commands from the Python session, and retrieve statistical output or data from SAS into the Python environment. The approach is very similar to combining R and Python via the reticulate package.

# Data manipulation
import pandas as pd

# Module with a sample data set
import bambi as bmb

# Interface with SAS
import saspy

# Loads a custom function
from my_fx.utilities import format_pval_df
sas = saspy.SASsession(cfgname = 'autogen_winlocal')
SAS Connection established. Subprocess id is 21596

Load an example data set

data = bmb.load_data("sleepstudy")
data.head()
   Reaction  Days  Subject
0  249.5600     0      308
1  258.7047     1      308
2  250.8006     2      308
3  321.4398     3      308
4  356.8519     4      308

Send data to SAS

The next step is to send a Pandas data frame to SAS. The command below sends the data frame “data” to the background SAS session. Before sending data to SAS, it is a good idea to check that dates are formatted properly for SAS and that column names and categorical values are recoded to comply with SAS naming and value conventions. By default, the data set is named _df and is stored in the work library.

sas_data = sas.df2sd(data, verbose = False)
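The column-name check mentioned above can be sketched in plain Python. This is a minimal sketch, not part of SASPy: the helper names to_sas_name and sas_rename_map are hypothetical, and the rules assumed are the standard SAS V7 variable-name rules (at most 32 characters, only letters, digits, and underscores, and no leading digit).

```python
import re

def to_sas_name(name, max_len=32):
    """Return a version of `name` that follows SAS V7 variable-name rules."""
    # Replace anything that is not an ASCII letter, digit, or underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", str(name).strip())
    # Names must start with a letter or an underscore.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned[:max_len]

def sas_rename_map(columns):
    """Map original column names to unique SAS-safe names (hypothetical helper)."""
    mapping, seen = {}, set()
    for col in columns:
        base = to_sas_name(col)
        candidate, n = base, 1
        # SAS names are case-insensitive, so compare in lower case.
        while candidate.lower() in seen:
            suffix = f"_{n}"
            candidate = base[: 32 - len(suffix)] + suffix
            n += 1
        seen.add(candidate.lower())
        mapping[col] = candidate
    return mapping

# Illustrative column names, not taken from the sleepstudy data.
rename_map = sas_rename_map(["Reaction time (ms)", "2nd visit", "Subject"])
print(rename_map)
```

With a data frame, the mapping could be applied before the transfer, for example data = data.rename(columns=sas_rename_map(data.columns)). The sleepstudy columns already satisfy these rules, so no renaming is needed in this walkthrough.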

Submit SAS commands

The main functions for submitting SAS commands on data available in the SAS session are sas.submit() and sas.submitLST(). The primary difference is that the LST version displays the log and any output in the viewer when working in Positron. I personally use the LST version to confirm that the SAS procedures are running correctly. Once they are, I switch to sas.submit() and extract the tables from SAS to display in a Quarto document. To save the output of SAS procedures, I use ods output statements, as in the example below.

# Use sas.submitLST() to display output in viewer in an interactive session,
# but use sas.submit() when rendering a .qmd document.
c = sas.submit(
"""
ods output Tests3=type3_results;

proc mixed data = work._df;
  class Subject Days;
  model Reaction = Days;
  random intercept/subject = Subject;
run;
""")
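Because sas.submit() does not display anything, it can be useful to check the log it returns. sas.submit() returns a dictionary with the keys 'LOG' and 'LST'; the sketch below scans the log text for lines that start with ERROR. The helper name find_log_errors is hypothetical, and the example dictionary is a stand-in with placeholder log lines rather than real SAS output.

```python
def find_log_errors(result):
    """Return the log lines of a submit() result that start with ERROR."""
    log_text = result.get("LOG", "")
    return [line for line in log_text.splitlines() if line.startswith("ERROR")]

# Stand-in for the dictionary returned by sas.submit();
# the log lines are placeholders, not real SAS output.
example_result = {
    "LOG": "NOTE: placeholder note line\nERROR: placeholder error line\n",
    "LST": "",
}

errors = find_log_errors(example_result)
if errors:
    print("SAS reported errors:")
    for line in errors:
        print("  " + line)
```

In the session above, the same check would be find_log_errors(c), where c is the value returned by sas.submit().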

Retrieve ods output tables from SAS

In the code chunk above, we set ods output to save the Type 3 tests of fixed effects to a table named type3_results. We can then retrieve that table from SAS into our Python environment. Once in the Python environment, the tables can be formatted to your liking and purpose. Here’s an example of how to format the Tests3 table using a custom function to format the p-values.

type3_results = sas.sasdata("type3_results", libref = "work").to_df()

# Format the p values
type3_results["ProbF"] = format_pval_df(type3_results['ProbF'])

# Round all numerical values, set index and display
type3_results.round(2).set_index("Effect")
        NumDF  DenDF  FValue    ProbF
Effect
Days      9.0  153.0    18.7  <0.0001
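The format_pval_df function used above comes from a personal module whose source is not shown. The following is only a guess at what such a helper might do, not the author's implementation: the name format_pvals, the 0.0001 threshold, and the four-decimal display are assumptions chosen to match the output above. It accepts any iterable of numbers, including a Pandas Series, and returns a list of strings.

```python
import math

def format_pvals(values, threshold=0.0001, digits=4):
    """Format p-values as strings, reporting small values as '<threshold'."""
    formatted = []
    for p in values:
        if p is None or (isinstance(p, float) and math.isnan(p)):
            # Leave missing p-values blank.
            formatted.append("")
        elif p < threshold:
            formatted.append(f"<{threshold:.{digits}f}")
        else:
            formatted.append(f"{p:.{digits}f}")
    return formatted

# Illustrative p-values, not results from the model above.
print(format_pvals([0.00000012, 0.0312, 0.5]))
```

Because the formatted column holds strings, a later call to round(2) on the data frame leaves it unchanged and only rounds the numeric columns.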