surveydata.surveyctoplatform module

Support for SurveyCTO as a survey data platform.

class surveydata.surveyctoplatform.SurveyCTOPlatform(server: str = '', username: str = '', password: str = '', form_id: str = '', private_key: str = '')

Bases: SurveyPlatform

SurveyCTO survey data platform implementation.

__init__(server: str = '', username: str = '', password: str = '', form_id: str = '', private_key: str = '')

Initialize SurveyCTO for access to survey data.

Parameters:

server (str) – SurveyCTO server name (like “use”, without the https prefix or .surveycto.com suffix)
username (str) – Email address for API access
password (str) – Password for API access
form_id (str) – SurveyCTO form ID
private_key (str) – Full text of private key, if using encryption

If you’re not going to call sync_data(), you don’t need to supply any of the parameters to this constructor.
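
The server parameter expects only the short server name. As a minimal sketch, the helper below reduces a pasted URL or hostname to that form before it is passed to the constructor. The function short_server_name is hypothetical and not part of surveydata, and it assumes server addresses take the form https://<name>.surveycto.com, as the parameter description implies.

```python
from urllib.parse import urlparse


def short_server_name(value: str) -> str:
    """Reduce a SurveyCTO URL or hostname to its short server name.

    Hypothetical helper for illustration; not part of surveydata.
    """
    value = value.strip()
    # urlparse only fills in the hostname when a scheme is present
    with_scheme = value if "://" in value else "https://" + value
    host = urlparse(with_scheme).hostname or ""
    suffix = ".surveycto.com"
    if host.endswith(suffix):
        host = host[: -len(suffix)]
    return host


# A full URL and a bare short name both give the same result
server = short_server_name("https://use.surveycto.com")
```

The result can then be supplied as the server argument, alongside the username, password, and form_id.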

static get_submissions_df(storage: StorageSystem, sort_columns: bool = False) → DataFrame

Get all submission data from storage, organized into a Pandas DataFrame and optimized based on the platform.

Parameters:

storage (StorageSystem) – Storage system for submissions
sort_columns (bool) – True to sort columns by name

Returns:

Pandas DataFrame containing all submissions currently in storage

Return type:

pandas.DataFrame

static get_text_audit_df(storage: StorageSystem, location_string: str = '', location_strings: Optional[Series] = None) → DataFrame

Get one or more text audits from storage, organized into a Pandas DataFrame.

Parameters:

storage (StorageSystem) – Storage system for attachments
location_string (str) – Location string of single text audit to load
location_strings (pandas.Series) – Series of location strings of text audits to load

Returns:

DataFrame with either the single text audit contents or all text audit contents indexed by Series index

Return type:

pandas.DataFrame

Pass either a single location_string or a Series of location_strings.

static process_text_audits(ta_df: DataFrame, start_times: Optional[Series] = None, end_times: Optional[Series] = None, data_tz: Optional[timezone] = None, collection_tz: Optional[timezone] = None) → DataFrame

Process text audits by summarizing, transforming, and reshaping into a single row per submission.

Parameters:

ta_df (pandas.DataFrame) – DataFrame with raw text audit data, typically from get_text_audit_df()
start_times (pandas.Series) – Pandas Series with a starting date and time for each submission (indexed by submission ID)
end_times (pandas.Series) – Pandas Series with an ending date and time for each submission (indexed by submission ID)
data_tz (datetime.timezone) – Timezone of timestamps in start_times and end_times
collection_tz (datetime.timezone) – Timezone of data collection

Returns:

Pandas DataFrame, indexed by submission ID, with summary details as well as field-by-field visit summaries

Return type:

pandas.DataFrame

The returned DataFrame is indexed by submission ID and includes the following columns:

ta_duration_total - Total duration spent in form fields (ms); feature engineering recommendation: divide by max to rescale to 0-1
ta_duration_mean - Mean duration spent in form fields (ms); feature engineering recommendation: divide by max to rescale to 0-1
ta_duration_sd - Standard deviation of duration spent in form fields (ms); feature engineering recommendation: divide by max to rescale to 0-1
ta_duration_min - Min duration spent in form field (ms); feature engineering recommendation: divide by max to rescale to 0-1
ta_duration_max - Max duration spent in form field (ms); feature engineering recommendation: divide by max to rescale to 0-1
ta_fields - Number of fields visited; feature engineering recommendation: divide by max to rescale to 0-1
ta_time_in_fields - Fraction of overall calendar time spent in fields; feature engineering recommendation: leave as 0-1 scale (but note that rounding errors and device clock issues can result in values outside the 0-1 range)
ta_sessions - Number of form-filling sessions (always 1 unless eventlog-level text audit data); feature engineering recommendation: divide by max to rescale to 0-1
ta_pct_revisits - Fraction of field visits that are revisits (always 0 unless eventlog-level text audit data); feature engineering recommendation: leave as 0-1 scale
ta_start_dayofweek - Day of week submission started (0 for Sunday, only available if eventlog text audit data or timezone information supplied); feature engineering recommendation: one-hot encode
ta_start_hourofday - Hour of day submission started (only available if eventlog text audit data or timezone information supplied); feature engineering recommendation: one-hot encode
ta_field_x_visited - 1 if field x visited, otherwise 0; feature engineering recommendation: leave as 0-1 scale, fill missing with 0
ta_field_x_visit_y_start - Start time of the yth visit to field x, divided by ta_duration_total (0 if there was no such visit); feature engineering recommendation: leave as 0-1 scale, fill missing with 0
ta_field_x_visit_y_duration - Time spent on the yth visit to field x, divided by ta_duration_total (0 if there was no such visit), i.e., the share of overall form time spent on that field visit; feature engineering recommendation: leave as 0-1 scale, fill missing with 0 (or divide by max to rescale to full 0-1 range)
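
The feature engineering recommendations above reduce to three operations: dividing a column by its maximum, one-hot encoding a categorical column, and filling missing values with 0. A minimal sketch in plain Python follows. The helper names and sample values are illustrative assumptions, not part of surveydata, and in practice the same steps would more likely be applied to the returned DataFrame's columns directly.

```python
def rescale_by_max(values):
    """Divide each value by the column maximum; missing (None) stays None."""
    present = [v for v in values if v is not None]
    top = max(present, default=0)
    if top == 0:
        # Nothing to rescale (empty, all missing, or all zero)
        return list(values)
    return [None if v is None else v / top for v in values]


def one_hot(values, categories):
    """Return one 0/1 indicator list per category."""
    return {c: [1 if v == c else 0 for v in values] for c in categories}


def fill_missing(values, fill=0):
    """Replace missing (None) entries with a fill value."""
    return [fill if v is None else v for v in values]


# Illustrative values only, not real survey data
durations_ms = [30000, 60000, 15000]   # e.g. a ta_duration_total column
start_dayofweek = [0, 3, 0]            # e.g. a ta_start_dayofweek column
field_visited = [1, None, 0]           # e.g. a ta_field_x_visited column

scaled = rescale_by_max(durations_ms)             # [0.5, 1.0, 0.25]
day_flags = one_hot(start_dayofweek, range(7))    # one indicator list per day
visited = fill_missing(field_visited)             # [1, 0, 0]
```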

sync_data(storage: StorageSystem, attachment_storage: Optional[StorageSystem] = None, no_attachments: bool = False, review_statuses: Optional[list] = None) → list

Sync survey data to storage system.

Parameters:

storage (StorageSystem) – Storage system for submissions (and attachments, if supported and other options don’t override)
attachment_storage (StorageSystem) – Separate storage system for attachments (only if needed)
no_attachments (bool) – True to not sync attachments
review_statuses (list) – List of review statuses to include (any combination of “approved”, “pending”, and “rejected”; if not specified, syncs only approved submissions)

Returns:

List of new submissions stored (submission ID strings)

Return type:

list
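
Because review_statuses accepts only three values, it can help to check the list before starting a sync. The sketch below is a hypothetical pre-check, not part of surveydata. It assumes that passing None is how "not specified" is expressed, and it mirrors the documented default of syncing only approved submissions.

```python
ALLOWED_REVIEW_STATUSES = ("approved", "pending", "rejected")


def normalize_review_statuses(review_statuses=None):
    """Validate a review_statuses list before passing it to sync_data().

    Hypothetical helper for illustration; not part of surveydata.
    """
    if review_statuses is None:
        # Mirrors the documented default: approved submissions only
        return ["approved"]
    unknown = [s for s in review_statuses if s not in ALLOWED_REVIEW_STATUSES]
    if unknown:
        raise ValueError(f"Unknown review statuses: {unknown}")
    # Drop duplicates while keeping the caller's order
    return list(dict.fromkeys(review_statuses))


statuses = normalize_review_statuses(["pending", "approved", "pending"])
```

The validated list could then be passed as the review_statuses argument of sync_data().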

update_submissions(submission_updates: list)

Submit one or more submission updates, including reviews, classifications, and/or comments.

Parameters:

submission_updates (list) – List of dictionaries, one per update; each should include values for “submissionID”; “reviewStatus” (“none”, “approved”, or “rejected”); “qualityClassification” (“good”, “okay”, “poor”, or “fake”); and/or “comment” (custom text)

Warning: this method uses an undocumented SurveyCTO API that may break in future SurveyCTO releases.
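
To illustrate the shape of the submission_updates list, the sketch below builds update dictionaries with the documented keys and allowed values. The make_update helper and the submission IDs are hypothetical, not part of surveydata. The sketch assumes that keys other than “submissionID” may be omitted when not needed, as the "and/or" in the parameter description suggests.

```python
REVIEW_STATUSES = {"none", "approved", "rejected"}
QUALITY_CLASSIFICATIONS = {"good", "okay", "poor", "fake"}


def make_update(submission_id, review_status=None,
                quality_classification=None, comment=None):
    """Build one update dictionary for update_submissions().

    Hypothetical helper for illustration; not part of surveydata.
    """
    if review_status is not None and review_status not in REVIEW_STATUSES:
        raise ValueError(f"Invalid reviewStatus: {review_status!r}")
    if (quality_classification is not None
            and quality_classification not in QUALITY_CLASSIFICATIONS):
        raise ValueError(
            f"Invalid qualityClassification: {quality_classification!r}")

    update = {"submissionID": submission_id}
    # Include only the fields the caller actually supplied
    if review_status is not None:
        update["reviewStatus"] = review_status
    if quality_classification is not None:
        update["qualityClassification"] = quality_classification
    if comment is not None:
        update["comment"] = comment
    return update


# Placeholder submission IDs, for illustration only
submission_updates = [
    make_update("uuid:example-1", review_status="approved",
                quality_classification="good"),
    make_update("uuid:example-2", review_status="rejected",
                comment="Duration too short"),
]
```

A list built this way could then be passed to update_submissions().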