05 / Data Sourcing
How do we source the data?
User Input
Electronic Medical Records
Research Database
06 / Data Processing
How do we process the data?
This is the step-by-step process I'd take:
Data cleaning: handle missing values, remove duplicates, correct inaccuracies (see the sketch after this list)
Create baseline: normalize + scale data so features share a consistent range
Data encoding: turn qualitative data into quantitative values
Handle outliers
Feature engineering: create features so the AI can start to recognize patterns
Collect time periods for certain interactions (e.g., treatment duration)
Text data processing for NLP (e.g., community discussions, medical literature)
Diversify + merge multiple data sets
Privacy: ensure all data is de-identified and handled according to privacy requirements
Account for underrepresented groups in the data sets (e.g., certain locations, HHI, etc.)
Prioritize data sets that are relevant to the features; deprioritize irrelevant data sets
Data splitting: split data into 3 categories: training, validation + test sets
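A minimal pandas/scikit-learn sketch of the cleaning, encoding, outlier, and scaling steps above. The table and column names (age, treatment_type, treatment_duration_days) are hypothetical placeholders, not the actual data set:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy patient-interaction table; column names are illustrative only.
df = pd.DataFrame({
    "age": [34, 34, None, 51, 29],
    "treatment_type": ["therapy", "therapy", "medication", None, "therapy"],
    "treatment_duration_days": [30, 30, 90, 45, 400],
})

# Data cleaning: remove duplicates, impute or drop missing values.
df = df.drop_duplicates()
df["age"] = df["age"].fillna(df["age"].median())   # impute numeric gaps
df = df.dropna(subset=["treatment_type"])          # require key fields

# Data encoding: turn qualitative data into quantitative columns (one-hot).
df = pd.get_dummies(df, columns=["treatment_type"])

# Handle outliers: clip numeric values to the 1st-99th percentile range.
low, high = df["treatment_duration_days"].quantile([0.01, 0.99])
df["treatment_duration_days"] = df["treatment_duration_days"].clip(low, high)

# Baseline: normalize + scale numeric features onto a common scale.
numeric_cols = ["age", "treatment_duration_days"]
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])
```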
07 / Data Sets
How do we ensure the data sets are accurate?
1. Training Set: Train the AI; uses 80% of data set
Historical user data: past interactions, treatment outcomes, community engagement
Anonymized data from research databases + clinical studies
Features representing: user profiles, medical history + preferences
2. Validation Set: Validate + fine-tune; uses 10-15% of data set
This would help with hyperparameter tuning
3. Testing Set: Evaluate the final performance; uses 10-15% of data (see the split sketch after this list)
Represents a completely unseen dataset for the model
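One way to realize the 80% / 10-15% / 10-15% split described above (here 80/10/10) with scikit-learn; the feature matrix and labels are placeholders standing in for the processed data set:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder feature matrix and labels (assumed already assembled).
X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, size=1000)

# First carve off the 80% training set...
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y)

# ...then split the remaining 20% evenly into validation and test
# sets, giving the 80/10/10 ratio. Stratifying keeps class
# proportions consistent across all three sets.
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.50, random_state=42, stratify=y_temp)
```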
08 / Privacy Concerns
What are the privacy and data concerns? How will we resolve or combat?
This is the process I'd take for each privacy + data concern:
Concern: Difficulty in tracking and responding to security incidents.
Mitigation: Implement audit trails to log user activities and monitor the system for unusual behavior, enabling rapid response to security incidents.

Concern: Violation of data protection regulations such as GDPR, HIPAA, or other local laws.
Mitigation: Ensure strict adherence to relevant data protection laws and obtain informed consent from users regarding data usage and storage practices.

Concern: Preserving user privacy by preventing the identification of individuals.
Mitigation: Anonymize personally identifiable information (PII) in the dataset and ensure that aggregated results cannot be traced back to individual users (see the sketch after this list).

Concern: Protecting data in transit and at rest to prevent unauthorized access.
Mitigation: Implement strong encryption protocols for communication between users and the AI platform, as well as for storing data.

Concern: Lack of transparency about data usage and AI model purposes.
Mitigation: Obtain explicit informed consent from users regarding the use of their data for AI model training, research, and improvement purposes.

Concern: Unidentified vulnerabilities in the system.
Mitigation: Conduct regular security audits and penetration testing to identify and address potential vulnerabilities in the AI software.

Concern: Unauthorized access to stored data.
Mitigation: Use secure, compliant data storage solutions with access controls, regular security audits, and compliance checks.

Concern: Intercepting sensitive information during communication.
Mitigation: Ensure secure communication channels using encryption protocols, especially during virtual consultations and data exchanges.

Concern: Lack of clarity regarding how user data is used and shared.
Mitigation: Clearly communicate privacy policies to users, detailing the purpose of data collection, storage, and usage. Provide users with options to control their data preferences.

Concern: Unauthorized access to user profiles and medical information.
Mitigation: Employ robust user authentication mechanisms (e.g., multi-factor authentication) and ensure strict authorization controls to limit access based on user roles and permissions.

Concern: Lock-in of user data without options for portability.
Mitigation: Provide users with the ability to export their data, promoting data portability and transparency.
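As one illustration of the anonymization mitigation, a minimal sketch that pseudonymizes direct identifiers with salted one-way hashes so records can still be linked internally without exposing PII. The field names and the specific approach are assumptions; a production system would also need key management and checks on quasi-identifiers (e.g., k-anonymity):

```python
import hashlib
import os

# Secret salt; must stay stable across the pipeline run and be stored
# securely, separate from the data.
SALT = os.urandom(16)

def pseudonymize(value: str) -> str:
    """Return a one-way salted hash of a PII value (e.g., an email)."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()

# Hypothetical record: the direct identifier is replaced, while
# non-identifying fields are kept for modeling.
record = {"email": "user@example.com", "age": 42}
record["email"] = pseudonymize(record["email"])
```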