Turning Machine Intelligence Against Lung Cancer

The Challenge

In 2016–2017, competitors used anonymized, high-resolution lung scans from hundreds of patients, provided by the National Cancer Institute (NCI), to create algorithms that can improve lung cancer screening technology. The participants created algorithms that can accurately determine when lesions in the lungs are cancerous, thereby dramatically decreasing the false positive rate of current low-dose CT technology.

Cancer Moonshot

In the U.S., cancer will strike two in every five people in their lifetimes. But it affects all of us. That’s why, in 2016, the office of the Vice President announced the Cancer Moonshot. It’s an audacious effort to make a decade’s worth of progress in cancer prevention, diagnosis, and treatment in just five years.

The 2017 Data Science Bowl pursued one of the Cancer Moonshot’s key goals: unleashing the power of data against this deadly disease. Presented by Booz Allen and Kaggle, the competition convened the data science and medical communities to develop cancer detection algorithms and help end the disease as we know it.

The Lung Cancer Detection Challenge

Lung cancer is one of the most common types of cancer, with nearly 225,000 new cases of the disease expected in the U.S. in 2016.

Early detection is critical, as it opens a range of treatment options not available when cancer is detected at later, more advanced stages. Low-dose computed tomography (CT) is a potential breakthrough technology for early detection, with the ability to reduce deaths by 20% [1]. Often, suspicious lesions identified in screening are initially assessed as high risk for cancer, but after additional follow-up tests, they turn out to be non-cancerous (false positives from the initial screening) [2]. Can machine learning reduce the number of radiology exams flagged for potentially unnecessary follow-up and avoid patient anxiety?

[1] Aberle DR, Adams AM, Berg CD, et al.: Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 365 (5): 395-409, 2011.
[2] Low-dose CT has historically resulted in high false positive rates of around 25% (Aberle et al., N Engl J Med 365 (5): 395-409, 2011).

The Prize

The Data Science Bowl awarded a total prize purse of $1 million, provided by the Laura and John Arnold Foundation, to those who observed the right patterns, asked the right questions, and in turn, created unprecedented impact around this high-priority issue.

First Place: $500,000

Second Place: $200,000

Third Place: $100,000

4th-10th Place: $25,000

In addition, $5,000 was awarded to each of the three most highly voted Kernels (a total of $15,000), and $10,000 in prizes was awarded to three random drawing winners for sharing their Data Science Bowl journey on social media.

2017 Partners

National Cancer Institute

Data Science Bowl Sponsor (cancer.gov)

Keyvan Farahani

Program Director

Dr. Keyvan Farahani is the Program Director for Image-Guided Interventions (IGI), Cancer Imaging Program, National Cancer Institute (NCI). In this capacity, he is responsible for the development of NCI research initiatives that address diagnosis and treatment of cancer through integration of advanced imaging and minimally invasive interventions, including nanotechnologies. He has led NCI initiatives in Oncologic IGI focused on small business development, early phase clinical trials, and image-guided drug delivery research. Since 2013, in collaboration with national and international academic groups, he has led the organization of many computational challenges related to imaging, digital pathology, and radiomics of cancer, conducted through international scientific societies. He chairs the NCI Quantitative Imaging Network’s Task Force on Challenge and Collaborative Projects. Dr. Farahani obtained his PhD in Biomedical Physics from the University of California at Los Angeles in 1993.

Paul F. Pinsky

Chief of Early Detection Research

Paul Pinsky is currently chief of the Early Detection Research Group, Division of Cancer Prevention, National Cancer Institute, NIH. Two large cancer screening trials sponsored by the branch have recently been completed: the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial and the National Lung Screening Trial (NLST). Follow-up for both is ongoing. The branch has a grant portfolio related to cancer screening and early detection and advises on clinical trial design and evaluation of evidence in the cancer screening field.

Dr. Pinsky received an M.P.H. in epidemiology from Columbia University and a Ph.D. in applied mathematics from the University of Maryland, College Park. He has previously worked at the Centers for Disease Control and the Food and Drug Administration. He has published over 30 first-authored papers in cancer prevention and related fields.

Radboud University Medical Center

Diagnostic Image Analysis Group

Dr. Bram van Ginneken

Lead Scientist

Bram leads the Diagnostic Image Analysis Group at Radboud University Medical Center in Nijmegen, The Netherlands. He has twenty years of experience in machine learning with medical image data and has published numerous papers on automated analysis of chest CT scans for lung cancer screening. He has a Master’s degree in Physics and a PhD in medicine from Utrecht University. Since 2007 he has been involved in organizing over fifteen medical image analysis challenges.

2017 Sponsors

A special thank you to the National Cancer Institute for its support with this year’s data challenge.

2017 Supporting Organizations

Breathe in the future. Breathe out the past.

Submit Your Ideas

Help us build an unprecedented future.

We are searching for the next Data Science Bowl challenge—a problem with the potential to change the world. If selected, the power of the entire data science community will be harnessed against it.

Contact us to submit your ideas or email DataScienceBowl@bah.com. Include an overview of the problem, your contact information, a brief description of the data, and where it can be obtained.