surveydata.surveyctoplatform module
Support for SurveyCTO as a survey data platform.
- class surveydata.surveyctoplatform.SurveyCTOPlatform(server: str = '', username: str = '', password: str = '', formid: str = '', private_key: str = '')
Bases:
SurveyPlatform
SurveyCTO survey data platform implementation.
- __init__(server: str = '', username: str = '', password: str = '', formid: str = '', private_key: str = '')
Initialize SurveyCTO for access to survey data.
- Parameters
server (str) – SurveyCTO server name (like “use”, without the https prefix or .surveycto.com suffix)
username (str) – Email address for API access
password (str) – Password for API access
formid (str) – SurveyCTO form ID
private_key (str) – Full text of private key, if using encryption
If you’re not going to call sync_data(), you don’t need to supply any of the parameters to this constructor.
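A minimal construction sketch, with placeholder credentials (the server name, email, password, and form ID below are all made-up examples). The imports are wrapped in functions so nothing runs unless the surveydata package is installed.

```python
def make_platform():
    """Sketch: construct a SurveyCTOPlatform ready for sync_data().

    All credential values are placeholders for illustration.
    """
    from surveydata.surveyctoplatform import SurveyCTOPlatform

    # For a server at https://myserver.surveycto.com, pass just "myserver".
    return SurveyCTOPlatform(
        server="myserver",
        username="user@example.com",
        password="********",
        formid="my_form",
    )


def make_offline_platform():
    """Sketch: if you won't call sync_data(), no arguments are needed."""
    from surveydata.surveyctoplatform import SurveyCTOPlatform

    return SurveyCTOPlatform()
```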
- static get_submissions_df(storage: StorageSystem) DataFrame
Get all submission data from storage, organized into a Pandas DataFrame and optimized based on the platform.
- Parameters
storage (StorageSystem) – Storage system for submissions
- Returns
Pandas DataFrame containing all submissions currently in storage
- Return type
pandas.DataFrame
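Because this is a static method, previously-synced data can be read without any server credentials. A sketch, assuming `storage` is a StorageSystem instance that already holds synced submissions (the choice of storage backend is left open here):

```python
def load_submissions(storage):
    """Sketch: load all synced submissions as a pandas DataFrame.

    `storage` is any surveydata StorageSystem containing synced data;
    no SurveyCTOPlatform instance or credentials are required.
    """
    from surveydata.surveyctoplatform import SurveyCTOPlatform

    return SurveyCTOPlatform.get_submissions_df(storage)
```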
- static get_text_audit_df(storage: StorageSystem, location_string: str = '', location_strings: Optional[Series] = None) DataFrame
Get one or more text audits from storage, organized into a Pandas DataFrame.
- Parameters
storage (StorageSystem) – Storage system for attachments
location_string (str) – Location string of single text audit to load
location_strings (pandas.Series) – Series of location strings of text audits to load
- Returns
DataFrame with either the single text audit contents or all text audit contents indexed by Series index
- Return type
pandas.DataFrame
Pass either a single location_string or a Series of location_strings.
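A sketch of the batch form, assuming the submissions DataFrame carries each submission's text audit location string in a column (the column name `"text_audit"` below is a hypothetical example, not a guaranteed name):

```python
def load_all_text_audits(storage, submissions_df, ta_column="text_audit"):
    """Sketch: load text audits for every submission in one call.

    Passing a pandas Series as location_strings (rather than a single
    location_string) returns one DataFrame covering all audits, indexed
    by the Series index. The column name is an assumption for this example.
    """
    from surveydata.surveyctoplatform import SurveyCTOPlatform

    return SurveyCTOPlatform.get_text_audit_df(
        storage, location_strings=submissions_df[ta_column]
    )
```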
- static process_text_audits(ta_df: DataFrame, start_times: Optional[Series] = None, end_times: Optional[Series] = None, data_tz: Optional[timezone] = None, collection_tz: Optional[timezone] = None) DataFrame
Process text audits by summarizing, transforming, and reshaping into a single row per submission.
- Parameters
ta_df (pd.DataFrame) – DataFrame with raw text audit data, typically from get_text_audit_df()
start_times (pd.Series) – Pandas Series with a starting date and time for each submission (indexed by submission ID)
end_times (pd.Series) – Pandas Series with an ending date and time for each submission (indexed by submission ID)
data_tz (datetime.timezone) – Timezone of timestamps in start_times and end_times
collection_tz (datetime.timezone) – Timezone of data collection
- Returns
Pandas DataFrame, indexed by submission ID, with summary details as well as field-by-field visit summaries
- Return type
pd.DataFrame
The returned DataFrame is indexed by submission ID and includes the following columns:
ta_duration_total - Total duration spent in form fields (ms); feature engineering recommendation: divide by max to rescale to 0-1
ta_duration_mean - Mean duration spent in form fields (ms); feature engineering recommendation: divide by max to rescale to 0-1
ta_duration_sd - Standard deviation of duration spent in form fields (ms); feature engineering recommendation: divide by max to rescale to 0-1
ta_duration_min - Minimum duration spent in a single form field (ms); feature engineering recommendation: divide by max to rescale to 0-1
ta_duration_max - Maximum duration spent in a single form field (ms); feature engineering recommendation: divide by max to rescale to 0-1
ta_fields - Number of fields visited; feature engineering recommendation: divide by max to rescale to 0-1
ta_time_in_fields - Percent of overall calendar time spent in fields; feature engineering recommendation: leave as 0-1 scale (but note that rounding errors and device clock issues can result in values outside (0, 1))
ta_sessions - Number of form-filling sessions (always 1 unless eventlog-level text audit data is available); feature engineering recommendation: divide by max to rescale to 0-1
ta_pct_revisits - Percent of field visits that are revisits (always 0 unless eventlog-level text audit data is available); feature engineering recommendation: leave as 0-1 scale
ta_start_dayofweek - Day of week the submission started (0 for Sunday; only available with eventlog-level text audit data or if timezone information is supplied); feature engineering recommendation: one-hot encode
ta_start_hourofday - Hour of day the submission started (only available with eventlog-level text audit data or if timezone information is supplied); feature engineering recommendation: one-hot encode
ta_field_x_visited - 1 if field x visited, otherwise 0; feature engineering recommendation: leave as 0-1 scale, fill missing with 0
ta_field_x_visit_y_start - Start time of the yth visit to field x, expressed as a fraction of ta_duration_total, or 0 if there was no such visit; feature engineering recommendation: leave as 0-1 scale, fill missing with 0
ta_field_x_visit_y_duration - Time spent on field x during its yth visit, expressed as a fraction of ta_duration_total (i.e., the percentage of overall form time spent on that visit), or 0 if there was no such visit; feature engineering recommendation: leave as 0-1 scale, fill missing with 0 (or divide by max to rescale to full 0-1 range)
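The divide-by-max recommendations above can be applied directly to the returned DataFrame. A sketch using a synthetic stand-in for process_text_audits() output (the two submission IDs and all values below are made up for illustration):

```python
import pandas as pd

# Synthetic stand-in for process_text_audits() output: two submissions
# with a few of the summary columns described above (values are made up).
summary = pd.DataFrame(
    {
        "ta_duration_total": [120000.0, 60000.0],
        "ta_fields": [40, 20],
        "ta_time_in_fields": [0.85, 0.40],
    },
    index=["uuid:aaa", "uuid:bbb"],
)

# Recommended rescaling: divide duration/count features by their column
# max so they land on a 0-1 scale; ta_time_in_fields is already 0-1
# and is left as-is.
for col in ["ta_duration_total", "ta_fields"]:
    summary[col] = summary[col] / summary[col].max()
```

After rescaling, the largest value in each rescaled column is 1.0 and the rest are proportional fractions of it.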
- sync_data(storage: StorageSystem, attachment_storage: Optional[StorageSystem] = None, no_attachments: bool = False, review_statuses: Optional[list] = None) list
Sync survey data to storage system.
- Parameters
storage (StorageSystem) – Storage system for submissions (and attachments, if supported and other options don’t override)
attachment_storage (StorageSystem) – Separate storage system for attachments (only if needed)
no_attachments (bool) – True to not sync attachments
review_statuses (list) – List of review statuses to include (any combo of “approved”, “pending”, “rejected”; if not specified, syncs only approved submissions)
- Returns
List of new submissions stored (submission ID strings)
- Return type
list
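An end-to-end sync sketch, with placeholder credentials and an unspecified storage backend (wrapped in a function so nothing runs unless the surveydata package is installed):

```python
def sync_form(storage):
    """Sketch: sync one form's submissions into storage and return the
    IDs of submissions that are new since the last sync.

    All credential values are placeholders; `storage` is any surveydata
    StorageSystem instance.
    """
    from surveydata.surveyctoplatform import SurveyCTOPlatform

    platform = SurveyCTOPlatform(
        server="myserver",
        username="user@example.com",
        password="********",
        formid="my_form",
    )
    # By default only approved submissions sync; request all statuses here.
    new_ids = platform.sync_data(
        storage, review_statuses=["approved", "pending", "rejected"]
    )
    return new_ids
```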
- update_submissions(submission_updates: list)
Submit one or more submission updates, including reviews, classifications, and/or comments.
- Parameters
submission_updates (list) – List of dictionaries, one per update; each should include a value for “submissionID”, plus any combination of “reviewStatus” (“none”, “approved”, or “rejected”), “qualityClassification” (“good”, “okay”, “poor”, or “fake”), and/or “comment” (custom text)
Warning: this method uses an undocumented SurveyCTO API that may break in future SurveyCTO releases.
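A sketch of assembling the submission_updates payload (the submission IDs and comment text below are made-up examples). The list itself is plain Python data; with an initialized platform it would be passed to update_submissions():

```python
# One dictionary per update; each targets a submission by ID and carries
# any combination of reviewStatus, qualityClassification, and comment.
submission_updates = [
    {
        "submissionID": "uuid:aaa",  # placeholder submission ID
        "reviewStatus": "approved",
        "qualityClassification": "good",
    },
    {
        "submissionID": "uuid:bbb",  # placeholder submission ID
        "reviewStatus": "rejected",
        "comment": "GPS location far outside the sampled cluster",
    },
]

# With an initialized SurveyCTOPlatform this would be submitted as:
#   platform.update_submissions(submission_updates)

# Basic sanity checks on the payload shape before submitting.
allowed_statuses = {"none", "approved", "rejected"}
assert all(u["submissionID"] for u in submission_updates)
assert all(
    u["reviewStatus"] in allowed_statuses
    for u in submission_updates
    if "reviewStatus" in u
)
```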