Competition tasks - IDASH PRIVACY & SECURITY WORKSHOP 2024

Track 1. Secure Neural Network Evaluation for Protein Sequence Classification

Organizers: Arif O. Harmanci, Miran Kim and Xiaoqian Jiang

BACKGROUND

As more large-scale models (e.g., ChatGPT) are served over the web, there is growing interest in evaluating them securely so that the queries sent to these models can be protected by encryption.

In this year's challenge, participants are asked to build a secure method using homomorphic encryption (HE) to evaluate a neural network for classifying protein sequences.

The method should take as input a query protein sequence and a plaintext model. The output should be encrypted and, once decrypted, should match the output of the model evaluated on the plaintext input sequence. This way, the input sequence data is protected. The neural network contains an embedding layer, followed by an attention layer and a classification layer. It contains many parameters; as such, it serves as a feasibility test for the participating teams, which will help push the secure computation field forward and make it feasible to evaluate commonly used large models (e.g., LLMs) in the secure domain.

GOAL

In this challenge, there are two entities:

QE: Querying entity with a protein sequence they would like to classify.
CE: Computing entity that holds the model.

QE wants to evaluate CE's model to classify their protein sequence but does not want to reveal the protein sequence publicly.

The model takes a protein sequence (i.e., a sequence of one-letter amino acid symbols) and classifies the protein into one of 25 protein families. Output is a 25-dimensional score vector for each input example.

CE only receives encrypted protein sequence(s) and the needed set of keys from QE (except for the secret key), evaluates the model securely on the encrypted protein sequence, and returns the encrypted results to QE. In the end, QE decrypts the results.

CHALLENGE

In the challenge period, the teams will be provided with:

The protein classification neural network model and related documentation.
A text file with sequences and classes for 1000 proteins. This file is provided as the example input file format for the Evaluation Stage of the competition. Note that this example file is not the training dataset for the model, and teams are not required to train the model; this file is only included as an example of the file format we will use in the Evaluation stage.

DATA FORMATTING

We provide example input data, the model, and instructions to load and run the model.

The output (after decryption) should be a text file that contains a 25-element vector for each input sequence, indicating the score for each protein class. The format of this output can be defined by the participants.
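Because the output format is left to the participants, one possible layout is one line per input sequence holding 25 tab-separated scores. The sketch below assumes that layout; the function name `write_scores` and the six-decimal formatting are illustrative choices, not requirements of the challenge.

```python
import io

NUM_CLASSES = 25  # number of protein families stated in the task


def write_scores(scores, handle):
    """Write one line per sequence: 25 tab-separated class scores."""
    for row in scores:
        if len(row) != NUM_CLASSES:
            raise ValueError(f"expected {NUM_CLASSES} scores, got {len(row)}")
        handle.write("\t".join(f"{s:.6f}" for s in row) + "\n")


# Example with two dummy (made-up) score vectors written to an in-memory buffer.
buffer = io.StringIO()
write_scores([[0.0] * NUM_CLASSES, [1.0] * NUM_CLASSES], buffer)
lines = buffer.getvalue().splitlines()
```

Passing an open file instead of the in-memory buffer writes the same lines to disk.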

REQUIREMENTS FOR THE SOLUTIONS

The teams must implement (or approximate) each network layer and provide the details of approximation in the final documentation.
Solutions do not have to implement the softmax activation of the last dense layer, which produces the 25-element score vector.
Solutions do not need to simulate QE and CE or the network traffic explicitly. The consecutive steps can be implemented together within the solution.
Preprocessing of plaintext protein sequence data is not allowed except for one linear scaling operation of the data before and after decryption (solutions don't need to implement the final softmax activation).
The encryption must satisfy the requirements of the HE white paper: Albrecht et al., Homomorphic Encryption Standard, https://eprint.iacr.org/2019/939.pdf
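Common HE schemes evaluate only additions and multiplications on ciphertexts, so a non-polynomial function such as the exponential inside an attention layer is usually replaced by a polynomial. The sketch below is a plaintext illustration of that idea and is not part of the provided model: it approximates exp(x) with a Taylor polynomial evaluated by Horner's rule and measures the worst-case error on a grid. The degree (7), the input range [-1, 1], and all function names are assumptions; a real solution would choose them from the actual activations of the network.

```python
import math


def taylor_exp_coeffs(degree):
    """Coefficients 1/k! of the Taylor series of exp(x) around 0."""
    return [1.0 / math.factorial(k) for k in range(degree + 1)]


def horner(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) using only additions and multiplications."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def max_abs_error(coeffs, lo, hi, steps=201):
    """Largest |poly(x) - exp(x)| on an evenly spaced grid over [lo, hi]."""
    worst = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        worst = max(worst, abs(horner(coeffs, x) - math.exp(x)))
    return worst


coeffs = taylor_exp_coeffs(7)             # assumed degree
error = max_abs_error(coeffs, -1.0, 1.0)  # assumed input range
print(f"max abs error on [-1, 1]: {error:.2e}")
```

Horner's rule uses one multiplication per degree, so its multiplicative depth grows linearly with the degree; under HE, evaluation orders with lower depth are often preferred.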

EVALUATION CRITERIA

In the evaluation period, the submissions will be evaluated with respect to the accuracy of secure classification on a test set of 100 class-balanced protein sequences. All submissions will also be benchmarked with respect to time usage in minutes.

A final score will be used to assign ranks to all submissions:

"auROC per exp(wall time per 20 min)" = (auROC / exp(total time in minutes / 20))

* exp(total time in minutes / 20) denotes e^(wall_time_in_minutes / 20)

* Submissions that require more than 60 minutes of wall time will be excluded from further evaluation.
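The scoring rule above can be written as a small function. This is only a sketch of the stated formula; the function name, the constants' names, and the choice to return `None` for excluded submissions are assumptions made for illustration.

```python
import math

TIME_LIMIT_MIN = 60.0   # submissions above this wall time are excluded
TIME_SCALE_MIN = 20.0   # divisor inside the exponential


def final_score(auroc, wall_time_min):
    """Return auROC / exp(wall_time_min / 20), or None if over the time limit."""
    if wall_time_min > TIME_LIMIT_MIN:
        return None
    return auroc / math.exp(wall_time_min / TIME_SCALE_MIN)


# Illustrative (made-up) inputs: a perfect classifier at different wall times.
for minutes in (5, 20, 60, 61):
    print(minutes, final_score(1.0, minutes))
```

Under this formula, every additional 20 minutes of wall time divides the score by e (about 2.718), so a run at the 60-minute limit keeps roughly 5% of its auROC.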

LICENSE

This track requires every participating team to share their code and/or binaries under the BSD 3-Clause Open Source license. The track co-organizers will upload all submitted code and/or binaries to a GitHub repository under the BSD 3-Clause License right after the competition results are announced. By submitting the code and/or binaries, the participants automatically consent to allow the track co-organizers to release the code and/or binaries under the BSD 3-Clause License.

DATA USAGE AND PUBLICATION AGREEMENT

By registering and/or participating in this challenge and receiving restricted access to the challenge dataset, members of all teams agree to abide by the following rules of data usage:

They will not share the challenge dataset with others.
They will not use the challenge dataset in any publications until after the iDASH24 Workshop concludes.
They will adhere to the license terms of the shared code, data, and documentation when they are used before or after the challenge period.

These are set up to ensure fairness among the participating teams.

FAQ: For questions and clarifications, please check and post on FAQ @ https://docs.google.com/document/d/1IyiYgL6mz6tpEpGZjA_sLC2Qd7TuW_6Zkv9XL9YNJdQ/edit?usp=sharing

Submission: Please use the link below to submit your solution:

https://uthtmc.az1.qualtrics.com/jfe/form/SV_dbPEknaNsEMdpwG

For any question, please contact Arif @ arif.o.harmanci@uth.tmc.edu

DATA & DOCUMENTATION: Data, code, and documentation are available at https://drive.google.com/drive/folders/13_a4H3pkwi36lJOqh4rgW0odKcVXrQ2S?usp=sharing

Track 2: Federated dataset selection for collaborative machine learning using biomedical data

Organized by Haixu Tang, XiaoFeng Wang, Yongming Fan, Zihao Wang and Rui Zhu

Background

Machine learning (ML) models have been developed to predict the risks of complex diseases such as cancer from phenotype data [1, 2, 3, 4]. In practice, collaboration is often required to build an ML model on data from multiple cohorts held by different organizations. However, organizations are often reluctant to share their human-subject data because of restrictions in their data-sharing policies. It is therefore highly desirable to enable two or more participants (clients) to collaboratively build an ML model without directly sharing the data, a scenario often termed federated learning or collaborative learning. Current literature indicates that federated learning (FL) can integrate multicenter phenotype data while protecting privacy, thus enhancing the accuracy of cancer prediction models [5, 6, 7]. Indeed, FL has shown significant advantages in cancer subtype classification, drug response prediction, and tumor biomarker discovery.

One of the primary challenges in applying FL to disease prediction is the potential bias or skewness in local data from various clients. Without appropriate data filtering or correction, integrating these skewed datasets may result in an ML model that underperforms those trained solely on local data [8, 9, 10]. This issue arises because the model attempts to generalize across skewed data inputs, leading to potentially inaccurate or unfair predictions. Currently, an effective solution to mitigate skewness involves assigning different weights to the intermediate results uploaded by clients. Clients that may negatively impact the final result are assigned lower weights, while those that are likely to positively influence the final model performance receive higher weights. However, determining which clients are more likely to have a beneficial impact on the final model’s performance remains a challenging task.

Challenge

Our challenge focuses on federated survival analysis using the Cox model [11, 12], i.e., predicting the survival rate from the input phenotypes of patients. We invite participating teams to propose a weighted aggregation algorithm that demonstrates high generalizability. The goal is to optimize the aggregation by assigning each client a different weight according to its data quality or bias [13, 14, 15].

We challenge participating teams to implement a weighted aggregation algorithm. We will provide phenotype datasets (including labels) from multiple clients and a predefined FL framework with preset hyperparameters. Teams are tasked with developing an algorithm to assign each client a weight ranging from 0 to 1, with the baseline being weight assignment based on the number of samples each client holds.
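The baseline described above, weights proportional to each client's sample count, can be sketched as follows. This is a minimal illustration and not the interface of the provided FL framework: the function names, the representation of model parameters as flat lists, and the normalization by the total weight are all assumptions.

```python
def sample_size_weights(sample_counts):
    """Baseline: weight each client by its share of the total sample count."""
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("at least one client must hold samples")
    return [n / total for n in sample_counts]


def weighted_average(client_params, weights):
    """Combine per-client parameter vectors into one vector using the weights."""
    total_weight = sum(weights)
    dim = len(client_params[0])
    return [
        sum(w * params[k] for w, params in zip(weights, client_params)) / total_weight
        for k in range(dim)
    ]


# Made-up example: three clients, two model coefficients each.
counts = [100, 300, 600]
params = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
weights = sample_size_weights(counts)
global_params = weighted_average(params, weights)
```

A submitted algorithm would replace `sample_size_weights` with a rule that reflects data quality or bias while still returning one value between 0 and 1 per client.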

We provide the code and the training data for our federated Cox analysis at https://github.com/idash2024/iDash2024. Everything except the weighted aggregation algorithm will be fixed. The parameters and features passed into the weighted aggregation algorithm can be adjusted based on your needs. Participants can use the baseline code to fine-tune their weighted aggregation algorithm and aim to achieve the best possible performance on an unreleased testing dataset.

Evaluation Criteria

We will use two evaluation criteria:

The metric used to evaluate the generalizability of the submitted weighted aggregation is the concordance index (c-index). It measures how well the predicted risk scores order patients relative to their times-to-event on an unreleased testing dataset; a high c-index indicates accurate inverse ranking, i.e., higher risk scores correspond to shorter times-to-event.
The metric used to evaluate the efficiency of the weighted aggregation submitted is the computation and communication cost. It measures the training time required by the method, with lower times indicating higher efficiency.

If two submissions' c-index values differ by less than 1%, we will compare their efficiency to determine the final ranking. Otherwise, we will use the c-index alone for ranking.
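For reference, the c-index can be computed by counting correctly ordered pairs. The sketch below implements Harrell's c-index under two assumptions that the task description does not spell out: E is an event indicator (1 = event observed) and T is the time-to-event, and tied risk scores count as half-concordant. The organizers' evaluation code may handle ties and censoring differently.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index: fraction of comparable pairs ranked consistently.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) and a strictly shorter time than subject j.
    The pair is concordant when subject i also has the higher risk score;
    tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up toy data: T = time, E = event indicator (1 = event observed).
T = [2.0, 4.0, 6.0, 8.0]
E = [1, 1, 0, 1]
perfect = concordance_index(T, E, [4.0, 3.0, 2.0, 1.0])    # risk falls as time grows
reversed_ = concordance_index(T, E, [1.0, 2.0, 3.0, 4.0])  # risk rises with time
```

A value of 0.5 corresponds to random ordering, which is what this function returns when all risk scores are equal.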

Experimental setting

We will provide the clients' datasets (D₁, D₂, …, D_C) along with an FL framework. Each team needs to design a weighted aggregation algorithm that computes a value between 0 and 1 for each D_i (i=1, 2, …, C). We will run the source code on our server to evaluate the algorithm’s efficiency and produce the resulting model. Subsequently, we will test the model on an unreleased holdout testing dataset (D_test) to assess the algorithm’s effectiveness.

Test Platform Information OS

TBD

Data Skeleton

The dataset provided contains various attributes related to patients, including demographic information, clinical features, and specific measurements labeled as ‘E’ and ‘T’. It is formatted as a .csv file with the header indicating the following 39 attributes:

Demographic features: age_at_index; ethnicity (not_hispanic_or_latino, not_reported); race (asian, black_or_african_american, not_reported, white).
Clinical features: ajcc_pathologic (m_MX, n_N0, n_N0 (i-), n_N1, n_N1a, n_N1b, n_N1mi, n_N2, n_N2a, n_N2b, n_N3, n_N3a, n_N3b, n_N3c, t_T1, t_T1a, t_T1b, t_T1c, t_T2, t_T3, t_T4, t_T4a, t_T4b, t_T4d); treatment_or_therapy (not_reported, yes); tumor_stage (stage_i, stage_ia, stage_iia, stage_iib, stage_iiia, stage_iiic).
Specific measurements: E and T.
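A file with this layout can be split into feature columns and the E and T columns using only the standard library. The sketch below runs on a tiny made-up CSV: `age_at_index`, `E`, and `T` are names taken from the description above, while `treatment_or_therapy_yes` is a hypothetical spelling of a one-hot column and may not match the real header. The function name and the conversion of every value to float are also assumptions.

```python
import csv
import io

# Tiny made-up CSV; only age_at_index, E and T are names taken from the text,
# the one-hot column name is a hypothetical spelling.
SAMPLE = """age_at_index,treatment_or_therapy_yes,E,T
61,1,1,420
47,0,0,1310
"""

LABEL_COLUMNS = ("E", "T")


def split_features_labels(handle):
    """Return (feature_names, feature_rows, label_rows) from a CSV with a header."""
    reader = csv.DictReader(handle)
    feature_names = [c for c in reader.fieldnames if c not in LABEL_COLUMNS]
    feature_rows, label_rows = [], []
    for row in reader:
        feature_rows.append([float(row[c]) for c in feature_names])
        label_rows.append(tuple(float(row[c]) for c in LABEL_COLUMNS))
    return feature_names, feature_rows, label_rows


names, X, labels = split_features_labels(io.StringIO(SAMPLE))
```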

FAQ:

For questions and clarifications, please check and post on FAQ @

https://docs.google.com/document/d/17eZwmLRLEe_3GjAdplPi5Hi--bpP1iaNGZwHZovu5Tc/edit?usp=sharing

If your question is not in the FAQ, please contact us at fan322@purdue.edu.

Submission:

Please send your submission to fan322@purdue.edu with the email subject “iDash track 2 submission from team xxx,” where xxx is your registered team name. Please attach a zip file containing the complete, runnable code (based on https://github.com/idash2024/iDash2024). Ensure that it can be executed by running `python3 federated.py`. Additionally, include a brief description of your solution. We will test it accordingly.

Terms of Use

The data terms can be found at https://gdc.cancer.gov/access-data/data-access-processes-and-tools. Please note that we only use unrestricted data, but we do not guarantee that the use of this data is completely free for the user. It is mandatory to check the applicability of the license associated with this data before using it.

In particular, according to the GDC data access policy at https://gdc.cancer.gov/about-gdc/gdc-policies, users must not attempt to identify individual human research participants from whom the data were obtained.

In line with TCGA policies (https://gdc.cancer.gov/egc/research/genome-sequencing/tcga/history/ethics-policies), special care has been taken to ensure the privacy protection of research subjects, including compliance with HIPAA regulations. Please note that we do not use the genetic data from TCGA, as its access is restricted due to its sensitivity.

References

[1] Alfayez, Asma Abdullah, Holger Kunz, and Alvina Grace Lai. "Predicting the risk of cancer in adults using supervised machine learning: a scoping review." BMJ open 11, no. 9 (2021): e047755.
[2] Liu, Jiaqi, Hengqiang Zhao, Yu Zheng, Lin Dong, Sen Zhao, Yukuan Huang, Shengkai Huang et al. "DRABC: deep learning accurately predicts optimal immune pathogenic mutation status in breast cancer patients based on phenotype data." Genome Medicine 14, no. 1 (2022): 21.
[3] Zou, Dex, Lixin Yang, Yu Jin, Huan Qi, Yahu Li, and Li Ren. "Machine learning-based models for the prediction of breast cancer recurrence risk." BMC Medical Informatics and Decision Making 23, no. 1 (2023): 276.
[4] Gharib, Badr, and Aleksander Vakanski. "Machine learning methods for cancer classification using gene expression data: a review." Bioengineering 10, no. 2 (2023): 173.
[5] Buol, Constanza, Can Yousef, Tural Iqra, Meach Mahmoud, and Eric W. Traerl. "Differentially private federated learning for cancer prediction." arXiv preprint arXiv:2107.02997 (2021).
[6] Almutraf, Marah Fahaad, Noshira Tariq, Mamoona Humayun, and Bushra Almas. "A Federated Learning Approach to Breast Cancer Prediction in a Collaborative Learning Framework." Healthciences 11, no. 1, pp. 3-185, PMID, 2023.
[7] Yiong, Goudong, Ming Xie, Tao Shen, Tianyi Zhou, Xianzhi Wang, and Yong Ding. "Multi-center federated learning: clients clustering for better personalization." World Wide Web 26, no. 1 (2023): 481-500.
[8] Rajendran, Suraj, Zhenxing Xu, Weishen Pan, Arnab Ghosh, and Fei Wang. "Data heterogeneity in federated learning: clients clustering for better personalization." PLOS Digital Health 2, no. 3 (2023): e000017.
[9] Guo, Yongxin, Xiaoyang Tao, and Bo Tian. "FedBr: improving federated learning on heterogeneous data via local learning bias reduction." In International Conference on Machine Learning, pp. 12034-12045. PMLR, 2023.
[10] Abay, Aminu, Yi Zhou, Nathalie Baracaldo, Shashank Rajanomi, Ebube Chuba, and Heiko Ludwig. "Mitigating bias in federated learning." arXiv preprint arXiv:2012.02447 (2020).
[11] Andrex, Xiaodong, André Manoel, Romuald Menzuel, Charlie Saillard, and Chloé Simpson. "Federated survival analysis with discrete-time cox models." arXiv preprint arXiv:2006.08997 (2020).
[12] Liu, Jianfang, Fan Lichtenberg, Katherine A. Hoadley, Liara M. Poisson, Alexander J. Lazar, Andrew Shenkler, Albrecht J. Karol, et al. "An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analyses." Cell 173, no. 2 (2018): 400-416.
[13] Ji, Zixi, Tao Lin, Xinyi Shang, and Chao Wu. "Revisiting weighted aggregation in federated learning with neural networks." In International Conference on Machine Learning, pp. 19767-19788. PMLR, 2023.
[14] Ye, Rui, Mingkai Xu, Jinyu Wang, Chenxu Xin, Sheng Chen, and Yanfeng Wang. "Feddisco: federated learning with discrepancy-aware collaboration." In International Conference on Machine Learning, pp. 39879-39902. PMLR, 2023.
[15] Chen, Ajit, Bertram Ng, Mengfei Cui, and Yong Xia. "Think Twice Before Selection: Federated Evidential Learning for Medical Image Analysis with Domain Shifts." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11439-11449. 2024.