05 / Data Sourcing

How do we source the data?

  1. User Input

  2. Electronic Medical Records

  3. Research Database

06 / Data Processing

How do we process the data?

This is the step-by-step process I’d take:

  1. Data cleaning: handle missing values, remove duplicates, correct inaccuracies

  2. Create a baseline: normalize + scale the data to establish a baseline distribution (e.g., centered on the median)

  3. Data encoding: turn qualitative data into quantitative data

  4. Handle outliers

  5. Feature engineering: create features so the AI can start to recognize patterns

  6. Collect time periods for certain interactions (e.g., treatment duration)

  7. Text data processing for NLP (e.g., community discussions, medical literature)

  8. Diversify + merge multiple data sets

  9. Privacy: ensure all data is de-identified + access-controlled (see section 08)

  10. Account for underrepresented data sets (e.g., certain locations, HHI, etc.)

  11. Prioritize data sets that are relevant to the model’s features; deprioritize irrelevant data sets

  12. Data splitting: split data into 3 categories: training, validation + test sets
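The cleaning, encoding, and outlier steps above (1, 3, 4) can be sketched in plain Python. This is an illustrative sketch only; the record fields and the clinical age range are assumptions, not part of the actual pipeline.

```python
# Hypothetical sketch of data cleaning, imputation, outlier handling,
# and encoding. Field names ("age", "outcome") are illustrative.
from statistics import median

records = [
    {"age": 34, "outcome": "improved"},
    {"age": None, "outcome": "improved"},   # missing value
    {"age": 34, "outcome": "improved"},     # exact duplicate of the first record
    {"age": 120, "outcome": "worsened"},    # implausible outlier
]

# Step 1a: remove duplicates (dicts are unhashable, so compare sorted items).
seen, deduped = set(), []
for r in records:
    key = tuple(sorted(r.items(), key=lambda kv: kv[0]))
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# Step 1b: impute missing ages with the median of the known ages.
known = [r["age"] for r in deduped if r["age"] is not None]
med = median(known)
for r in deduped:
    if r["age"] is None:
        r["age"] = med

# Step 4: drop records outside an assumed plausible clinical range.
cleaned = [r for r in deduped if 0 < r["age"] <= 110]

# Step 3: encode the qualitative outcome as a number the model can use.
encoding = {"improved": 1, "worsened": 0}
for r in cleaned:
    r["outcome_code"] = encoding[r["outcome"]]
```

A real pipeline would typically do this with a dataframe library, but the order of operations (dedupe, impute, filter, encode) is the point here.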

07 / Data Sets

How do we ensure the data sets are accurate?

  1. Training Set: Train the AI; uses ~80% of the data set

    • Historical user data: past interactions, treatment outcomes, community engagement

    • Anonymous data from research databases + clinical studies

    • Features representing: user profiles, medical history + preferences

  2. Validation Set: Validate + fine-tune; uses 10-15% of the data set

    • This would help with hyperparameter tuning

  3. Testing Set: Evaluate the final performance; uses 10-15% of the data

    • Represents a completely unseen dataset for the model
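The three-way split above can be sketched in a few lines, assuming a simple seeded shuffle rather than any particular ML library:

```python
# Minimal sketch of splitting a data set into training, validation,
# and test sets per the ratios above. The fixed seed keeps the split
# reproducible; a real pipeline might also stratify by outcome.
import random

def split_dataset(rows, train_frac=0.8, val_frac=0.1, seed=42):
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # shuffle a copy, not the caller's data
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    return (rows[:n_train],                     # training set (~80%)
            rows[n_train:n_train + n_val],      # validation set (~10%)
            rows[n_train + n_val:])             # test set: fully held out

train_set, val_set, test_set = split_dataset(range(100))
```

The test set is whatever remains after the first two slices, which guarantees every record lands in exactly one split with no overlap.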

08 / Privacy Concerns

What are the privacy and data concerns? How will we resolve or combat them?

This is the process I’d take for each privacy + data concern:

  • Concern: Difficulty in tracking and responding to security incidents.

    Mitigation: Implement audit trails to log user activities and monitor the system for unusual behavior, enabling rapid response to security incidents.

  • Concern: Violation of data protection regulations such as GDPR, HIPAA, or other local laws.

    Mitigation: Ensure strict adherence to relevant data protection laws and obtain informed consent from users regarding data usage and storage practices.

  • Concern: Re-identification of individuals from the data.

    Mitigation: Anonymize personally identifiable information (PII) in the dataset and ensure that aggregated results cannot be traced back to individual users.

  • Concern: Protecting data in transit and at rest to prevent unauthorized access.

    Mitigation: Implement strong encryption protocols for communication between users and the AI platform, as well as for storing data.

  • Concern: Lack of transparency about data usage and AI model purposes.

    Mitigation: Obtain explicit informed consent from users regarding the use of their data for AI model training, research, and improvement purposes.

  • Concern: Unidentified vulnerabilities in the system.

    Mitigation: Conduct regular security audits and penetration testing to identify and address potential vulnerabilities in the AI software.

  • Concern: Unauthorized access to stored data.

    Mitigation: Use secure, compliant data storage solutions with access controls, regular security audits, and compliance checks.

  • Concern: Intercepting sensitive information during communication.

    Mitigation: Ensure secure communication channels using encryption protocols, especially during virtual consultations and data exchanges.

  • Concern: Lack of clarity regarding how user data is used and shared.

    Mitigation: Clearly communicate privacy policies to users, detailing the purpose of data collection, storage, and usage. Provide users with options to control their data preferences.

  • Concern: Unauthorized access to user profiles and medical information.

    Mitigation: Employ robust user authentication mechanisms (e.g., multi-factor authentication) and ensure strict authorization controls to limit access based on user roles and permissions.

  • Concern: Lock-in of user data without options for portability.

    Mitigation: Provide users with the ability to export their data, promoting data portability and transparency.
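The anonymization mitigation above could be sketched as salted pseudonymization. This is an illustrative sketch with hypothetical field names, not a complete de-identification scheme; hashing alone does not satisfy HIPAA or GDPR anonymization requirements on its own and would sit alongside access controls and aggregation thresholds.

```python
# Sketch: replace a direct identifier with a salted one-way hash so records
# can still be linked without exposing who the user is, and drop free-text
# PII fields outright. The record fields shown are hypothetical.
import hashlib
import secrets

SALT = secrets.token_bytes(16)   # per-deployment secret; store it securely

def pseudonymize(user_id: str) -> str:
    """Deterministic, non-reversible token for linking a user's records."""
    return hashlib.sha256(SALT + user_id.encode()).hexdigest()

record = {"user_id": "patient-001", "name": "Jane Doe", "age": 34}
safe = {"user_token": pseudonymize(record["user_id"]), "age": record["age"]}
# "name" is dropped entirely; user_token links records across the data set
```

Because the hash is deterministic per deployment, the same user always maps to the same token, which preserves the longitudinal structure the model needs (treatment durations, repeat interactions) without storing the identity itself.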
