surveydata.surveyctoplatform module

Support for SurveyCTO as a survey data platform.

class surveydata.surveyctoplatform.SurveyCTOPlatform(server: str = '', username: str = '', password: str = '', formid: str = '', private_key: str = '')

Bases: SurveyPlatform

SurveyCTO survey data platform implementation.

__init__(server: str = '', username: str = '', password: str = '', formid: str = '', private_key: str = '')

Initialize SurveyCTO for access to survey data.

Parameters
  • server (str) – SurveyCTO server name (like “use”, without the https prefix or .surveycto.com suffix)

  • username (str) – Email address for API access

  • password (str) – Password for API access

  • formid (str) – SurveyCTO form ID

  • private_key (str) – Full text of private key, if using encryption

If you’re not going to call sync_data(), you don’t need to supply any of the parameters to this constructor.
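
For example, a minimal construction might look like the following sketch (the server name, credentials, and form ID are placeholders to replace with your own):

    from surveydata.surveyctoplatform import SurveyCTOPlatform

    # placeholder credentials; supply your own server name, login, and form ID
    platform = SurveyCTOPlatform(
        server="myserver",
        username="me@example.com",
        password="my-api-password",
        formid="my_form",
    )
    # for encrypted forms, also pass private_key with the full key text,
    # e.g. private_key=open("my_form_key.pem").read()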

static get_submissions_df(storage: StorageSystem) → DataFrame

Get all submission data from storage, organized into a Pandas DataFrame and optimized based on the platform.

Parameters

storage (StorageSystem) – Storage system for submissions

Returns

Pandas DataFrame containing all submissions currently in storage

Return type

pandas.DataFrame
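
A minimal sketch, assuming a FileStorage backend from this package holds previously synced submissions (any StorageSystem implementation can be substituted; the storage path is a placeholder):

    from surveydata.surveyctoplatform import SurveyCTOPlatform
    from surveydata.filestorage import FileStorage  # assumed storage backend

    storage = FileStorage("survey_data/submissions")  # placeholder path
    submissions_df = SurveyCTOPlatform.get_submissions_df(storage)
    print(submissions_df.shape)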

static get_text_audit_df(storage: StorageSystem, location_string: str = '', location_strings: Optional[Series] = None) → DataFrame

Get one or more text audits from storage, organized into a Pandas DataFrame.

Parameters
  • storage (StorageSystem) – Storage system for attachments

  • location_string (str) – Location string of single text audit to load

  • location_strings (pandas.Series) – Series of location strings of text audits to load

Returns

DataFrame with either the contents of the single text audit, or the contents of all text audits indexed by the location_strings Series index

Return type

pandas.DataFrame

Pass either a single location_string or a Series of location_strings.
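
A sketch of loading all text audits referenced in the submission data (assumes a FileStorage backend for attachments and a text audit form field named text_audit; both are placeholders for your own setup):

    from surveydata.surveyctoplatform import SurveyCTOPlatform
    from surveydata.filestorage import FileStorage  # assumed storage backend

    attachment_storage = FileStorage("survey_data/attachments")  # placeholder path
    submissions_df = SurveyCTOPlatform.get_submissions_df(
        FileStorage("survey_data/submissions"))

    # "text_audit" is an assumed field name holding text audit attachment locations
    ta_df = SurveyCTOPlatform.get_text_audit_df(
        attachment_storage, location_strings=submissions_df["text_audit"])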

static process_text_audits(ta_df: DataFrame, start_times: Optional[Series] = None, end_times: Optional[Series] = None, data_tz: Optional[timezone] = None, collection_tz: Optional[timezone] = None) → DataFrame

Process text audits by summarizing, transforming, and reshaping into a single row per submission.

Parameters
  • ta_df (pd.DataFrame) – DataFrame with raw text audit data, typically from get_text_audit_df()

  • start_times (pd.Series) – Pandas Series with a starting date and time for each submission (indexed by submission ID)

  • end_times (pd.Series) – Pandas Series with an ending date and time for each submission (indexed by submission ID)

  • data_tz (datetime.timezone) – Timezone of timestamps in start_times and end_times

  • collection_tz (datetime.timezone) – Timezone of data collection

Returns

Pandas DataFrame, indexed by submission ID, with summary details as well as field-by-field visit summaries

Return type

pd.DataFrame

The returned DataFrame is indexed by submission ID and includes the following columns:

  • ta_duration_total - Total duration spent in form fields (ms); feature engineering recommendation: divide by max to rescale to 0-1

  • ta_duration_mean - Mean duration spent in form fields (ms); feature engineering recommendation: divide by max to rescale to 0-1

  • ta_duration_sd - Standard deviation of duration spent in form fields (ms); feature engineering recommendation: divide by max to rescale to 0-1

  • ta_duration_min - Min duration spent in form field (ms); feature engineering recommendation: divide by max to rescale to 0-1

  • ta_duration_max - Max duration spent in form field (ms); feature engineering recommendation: divide by max to rescale to 0-1

  • ta_fields - Number of fields visited; feature engineering recommendation: divide by max to rescale to 0-1

  • ta_time_in_fields - Percent of overall calendar time spent in fields; feature engineering recommendation: leave as 0-1 scale (but note that rounding errors and device clock issues can result in values outside the 0-1 range)

  • ta_sessions - Number of form-filling sessions (always 1 unless eventlog-level text audit data is available); feature engineering recommendation: divide by max to rescale to 0-1

  • ta_pct_revisits - Percent of field visits that are revisits (always 0 unless eventlog-level text audit data is available); feature engineering recommendation: leave as 0-1 scale

  • ta_start_dayofweek - Day of week the submission started (0 for Sunday; only available with eventlog-level text audit data or when timezone information is supplied); feature engineering recommendation: one-hot encode

  • ta_start_hourofday - Hour of day the submission started (only available with eventlog-level text audit data or when timezone information is supplied); feature engineering recommendation: one-hot encode

  • ta_field_x_visited - 1 if field x was visited, otherwise 0; feature engineering recommendation: leave as 0-1 scale, fill missing with 0

  • ta_field_x_visit_y_start - Start time of the yth visit to field x, divided by ta_duration_total (0 if there was no yth visit); feature engineering recommendation: leave as 0-1 scale, fill missing with 0

  • ta_field_x_visit_y_duration - Time spent on field x during its yth visit, divided by ta_duration_total (i.e., the share of overall form time spent on that field visit; 0 if there was no yth visit); feature engineering recommendation: leave as 0-1 scale, fill missing with 0 (or divide by max to rescale to full 0-1 range)
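
A sketch of a typical call, assuming ta_df from get_text_audit_df() and a submissions_df whose “starttime” and “endtime” columns hold each submission's start and end timestamps (the column names and timezones below are assumptions to adjust for your form):

    from datetime import timezone, timedelta

    from surveydata.surveyctoplatform import SurveyCTOPlatform

    processed_df = SurveyCTOPlatform.process_text_audits(
        ta_df,
        start_times=submissions_df["starttime"],      # assumed column name
        end_times=submissions_df["endtime"],          # assumed column name
        data_tz=timezone.utc,                         # timezone of the stored timestamps
        collection_tz=timezone(timedelta(hours=-5)),  # example data-collection timezone
    )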

sync_data(storage: StorageSystem, attachment_storage: Optional[StorageSystem] = None, no_attachments: bool = False, review_statuses: Optional[list] = None) → list

Sync survey data to storage system.

Parameters
  • storage (StorageSystem) – Storage system for submissions (and attachments, if supported and other options don’t override)

  • attachment_storage (StorageSystem) – Separate storage system for attachments (only if needed)

  • no_attachments (bool) – True to not sync attachments

  • review_statuses (list) – List of review statuses to include (any combo of “approved”, “pending”, “rejected”; if not specified, syncs only approved submissions)

Returns

List of new submissions stored (submission ID strings)

Return type

list
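
An illustrative end-to-end sync, assuming FileStorage backends for submissions and attachments (the server, credentials, form ID, and storage paths are placeholders):

    from surveydata.surveyctoplatform import SurveyCTOPlatform
    from surveydata.filestorage import FileStorage  # assumed storage backend

    platform = SurveyCTOPlatform(
        server="myserver", username="me@example.com",
        password="my-api-password", formid="my_form")

    storage = FileStorage("survey_data/submissions")            # placeholder paths
    attachment_storage = FileStorage("survey_data/attachments")

    # include pending as well as approved submissions
    new_ids = platform.sync_data(
        storage, attachment_storage=attachment_storage,
        review_statuses=["approved", "pending"])
    print(f"{len(new_ids)} new submissions synced")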

update_submissions(submission_updates: list)

Submit one or more submission updates, including reviews, classifications, and/or comments.

Parameters

submission_updates (list) – List of dictionaries, one per update; each should include a value for “submissionID” plus “reviewStatus” (“none”, “approved”, or “rejected”), “qualityClassification” (“good”, “okay”, “poor”, or “fake”), and/or “comment” (custom text)

Warning: this method uses an undocumented SurveyCTO API that may break in future SurveyCTO releases.
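
For example (the submission ID and comment text below are placeholders):

    # platform constructed with valid credentials, as in __init__() above
    platform.update_submissions([
        {
            "submissionID": "uuid:example-submission-id",  # placeholder ID
            "reviewStatus": "approved",
            "qualityClassification": "good",
            "comment": "Reviewed and approved via API",
        },
    ])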