surveydata.surveyctoplatform module

Support for SurveyCTO as a survey data platform.

class surveydata.surveyctoplatform.SurveyCTOPlatform(server: str = '', username: str = '', password: str = '', formid: str = '', private_key: str = '')

Bases: SurveyPlatform

SurveyCTO survey data platform implementation.

__init__(server: str = '', username: str = '', password: str = '', formid: str = '', private_key: str = '')

Initialize SurveyCTO for access to survey data.

Parameters
  • server (str) – SurveyCTO server name (like “use”, without the https prefix or .surveycto.com suffix)

  • username (str) – Email address for API access

  • password (str) – Password for API access

  • formid (str) – SurveyCTO form ID

  • private_key (str) – Full text of private key, if using encryption

If you’re not going to call sync_data(), you don’t need to supply any of the parameters to this constructor.
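
For example, a minimal construction might look like the following sketch (the server name, credentials, and form ID are placeholders to replace with your own):

    from surveydata.surveyctoplatform import SurveyCTOPlatform

    # placeholder credentials; supply your own server name, login, and form ID
    platform = SurveyCTOPlatform(
        server="myserver",
        username="me@example.com",
        password="my-api-password",
        formid="my_form",
    )
    # for encrypted forms, also pass private_key with the full key text,
    # e.g. private_key=open("my_form_key.pem").read()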

static get_submissions_df(storage: StorageSystem) → DataFrame

Get all submission data from storage, organized into a Pandas DataFrame and optimized based on the platform.

Parameters

storage (StorageSystem) – Storage system for submissions

Returns

Pandas DataFrame containing all submissions currently in storage

Return type

pandas.DataFrame
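
A minimal sketch, assuming a FileStorage backend from this package holds previously synced submissions (any StorageSystem implementation can be substituted; the storage path is a placeholder):

    from surveydata.surveyctoplatform import SurveyCTOPlatform
    from surveydata.filestorage import FileStorage  # assumed storage backend

    storage = FileStorage("survey_data/submissions")  # placeholder path
    submissions_df = SurveyCTOPlatform.get_submissions_df(storage)
    print(submissions_df.shape)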

static get_text_audit_df(storage: StorageSystem, location_string: str = '', location_strings: Optional[Series] = None) → DataFrame

Get one or more text audits from storage, organized into a Pandas DataFrame.

Parameters
  • storage (StorageSystem) – Storage system for attachments

  • location_string (str) – Location string of single text audit to load

  • location_strings (pandas.Series) – Series of location strings of text audits to load

Returns

DataFrame with either the contents of the single text audit, or the contents of all text audits indexed by the location_strings Series index

Return type

pandas.DataFrame

Pass either a single location_string or a Series of location_strings.
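
A sketch of loading all text audits referenced in the submission data (assumes a FileStorage backend for attachments and a text audit form field named text_audit; both are placeholders for your own setup):

    from surveydata.surveyctoplatform import SurveyCTOPlatform
    from surveydata.filestorage import FileStorage  # assumed storage backend

    attachment_storage = FileStorage("survey_data/attachments")  # placeholder path
    submissions_df = SurveyCTOPlatform.get_submissions_df(
        FileStorage("survey_data/submissions"))

    # "text_audit" is an assumed field name holding text audit attachment locations
    ta_df = SurveyCTOPlatform.get_text_audit_df(
        attachment_storage, location_strings=submissions_df["text_audit"])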

static process_text_audits(ta_df: DataFrame, start_times: Optional[Series] = None, end_times: Optional[Series] = None, data_tz: Optional[timezone] = None, collection_tz: Optional[timezone] = None) → DataFrame

Process text audits by summarizing, transforming, and reshaping into a single row per submission.

Parameters
  • ta_df (pd.DataFrame) – DataFrame with raw text audit data, typically from get_text_audit_df()

  • start_times (pd.Series) – Pandas Series with a starting date and time for each submission (indexed by submission ID)

  • end_times (pd.Series) – Pandas Series with an ending date and time for each submission (indexed by submission ID)

  • data_tz (datetime.timezone) – Timezone of timestamps in start_times and end_times

  • collection_tz (datetime.timezone) – Timezone of data collection

Returns

Pandas DataFrame, indexed by submission ID, with summary details as well as field-by-field visit summaries

Return type

pd.DataFrame

The returned DataFrame is indexed by submission ID and includes the following columns:

  • ta_duration_total - Total duration spent in form fields (ms); feature engineering recommendation: divide by max to rescale to 0-1

  • ta_duration_mean - Mean duration spent in form fields (ms); feature engineering recommendation: divide by max to rescale to 0-1

  • ta_duration_sd - Standard deviation of duration spent in form fields (ms); feature engineering recommendation: divide by max to rescale to 0-1

  • ta_duration_min - Min duration spent in form field (ms); feature engineering recommendation: divide by max to rescale to 0-1

  • ta_duration_max - Max duration spent in form field (ms); feature engineering recommendation: divide by max to rescale to 0-1

  • ta_fields - Number of fields visited; feature engineering recommendation: divide by max to rescale to 0-1

  • ta_time_in_fields - Percent of overall calendar time spent in fields; feature engineering recommendation: leave as 0-1 scale (but note that rounding errors and device clock issues can result in values outside the 0-1 range)

  • ta_sessions - Number of form-filling sessions (always 1 unless eventlog-level text audit data is available); feature engineering recommendation: divide by max to rescale to 0-1

  • ta_pct_revisits - Percent of field visits that are revisits (always 0 unless eventlog-level text audit data is available); feature engineering recommendation: leave as 0-1 scale

  • ta_start_dayofweek - Day of week the submission started (0 for Sunday; only available with eventlog-level text audit data or when timezone information is supplied); feature engineering recommendation: one-hot encode

  • ta_start_hourofday - Hour of day the submission started (only available with eventlog-level text audit data or when timezone information is supplied); feature engineering recommendation: one-hot encode

  • ta_field_x_visited - 1 if field x was visited, otherwise 0; feature engineering recommendation: leave as 0-1 scale, fill missing with 0

  • ta_field_x_visit_y_start - Start time of the yth visit to field x, divided by ta_duration_total (0 if there was no yth visit); feature engineering recommendation: leave as 0-1 scale, fill missing with 0

  • ta_field_x_visit_y_duration - Time spent on field x during its yth visit, divided by ta_duration_total (i.e., the share of overall form time spent on that field visit; 0 if there was no yth visit); feature engineering recommendation: leave as 0-1 scale, fill missing with 0 (or divide by max to rescale to full 0-1 range)
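
A sketch of a typical call, assuming ta_df from get_text_audit_df() and a submissions_df whose “starttime” and “endtime” columns hold each submission's start and end timestamps (the column names and timezones below are assumptions to adjust for your form):

    from datetime import timezone, timedelta

    from surveydata.surveyctoplatform import SurveyCTOPlatform

    processed_df = SurveyCTOPlatform.process_text_audits(
        ta_df,
        start_times=submissions_df["starttime"],      # assumed column name
        end_times=submissions_df["endtime"],          # assumed column name
        data_tz=timezone.utc,                         # timezone of the stored timestamps
        collection_tz=timezone(timedelta(hours=-5)),  # example data-collection timezone
    )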

sync_data(storage: StorageSystem, attachment_storage: Optional[StorageSystem] = None, no_attachments: bool = False, review_statuses: Optional[list] = None) → list

Sync survey data to storage system.

Parameters
  • storage (StorageSystem) – Storage system for submissions (and attachments, if supported and other options don’t override)

  • attachment_storage (StorageSystem) – Separate storage system for attachments (only if needed)

  • no_attachments (bool) – True to not sync attachments

  • review_statuses (list) – List of review statuses to include (any combo of “approved”, “pending”, “rejected”; if not specified, syncs only approved submissions)

Returns

List of new submissions stored (submission ID strings)

Return type

list
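
An illustrative end-to-end sync, assuming FileStorage backends for submissions and attachments (the server, credentials, form ID, and storage paths are placeholders):

    from surveydata.surveyctoplatform import SurveyCTOPlatform
    from surveydata.filestorage import FileStorage  # assumed storage backend

    platform = SurveyCTOPlatform(
        server="myserver", username="me@example.com",
        password="my-api-password", formid="my_form")

    storage = FileStorage("survey_data/submissions")            # placeholder paths
    attachment_storage = FileStorage("survey_data/attachments")

    # include pending as well as approved submissions
    new_ids = platform.sync_data(
        storage, attachment_storage=attachment_storage,
        review_statuses=["approved", "pending"])
    print(f"{len(new_ids)} new submissions synced")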

update_submissions(submission_updates: list)

Submit one or more submission updates, including reviews, classifications, and/or comments.

Parameters

submission_updates (list) – List of dictionaries, one per update; each should include a value for “submissionID” plus “reviewStatus” (“none”, “approved”, or “rejected”), “qualityClassification” (“good”, “okay”, “poor”, or “fake”), and/or “comment” (custom text)

Warning: this method uses an undocumented SurveyCTO API that may break in future SurveyCTO releases.
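
For example (the submission ID and comment text below are placeholders):

    # platform constructed with valid credentials, as in __init__() above
    platform.update_submissions([
        {
            "submissionID": "uuid:example-submission-id",  # placeholder ID
            "reviewStatus": "approved",
            "qualityClassification": "good",
            "comment": "Reviewed and approved via API",
        },
    ])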