Medical videos capture dynamic information about motion and disturbance, which aids in disease diagnosis and understanding. Common examples of medical videos include cardiac ultrasound to assess cardiac motion, endoscopy to screen for gastrointestinal cancer, natural videos to track human behaviors in population health, and microscopy to understand cellular interactions.
Deep learning for medical video analysis is developing rapidly, with a growing ability to extract actionable insights from these rich, complex data. Here we provide an overview of deep learning approaches to segmentation, object tracking, and motion analysis in medical videos. Using cardiac ultrasound and cellular microscopy as case studies, we highlight the unique challenges of working with videos compared with the more standard models used on still images. We further discuss available video datasets that can serve as training sets and benchmarks. We conclude by discussing future directions for the field, along with recommendations for practitioners.
Motivated by the near-human-level accuracy achieved in image classification, researchers have sought to apply similar deep learning algorithms to medical imaging tasks. From retinal images and images of skin lesions to chest X-rays and mammograms, deep learning approaches have been adapted to static medical image datasets. Although these are complex datasets, medical videos contain a much richer set of data, capturing movements and behaviors that cannot be detected in still images.
In one particularly striking example, the cardiovascular system contains many dynamic structures, with the motion of the heart muscle, the heart valves, and the blood providing important diagnostic information that is not captured in still images. Deep learning for medical videos is much less developed than deep learning for images, although it has great potential for impact. In this chapter, we review important developments in computer vision and deep learning for video tasks and highlight applications in the medical machine learning literature. We discuss case studies ranging from ultrasound of the heart (echocardiogram) to video microscopy, highlighting ways to understand dynamic systems and techniques for dealing with complex datasets.
Videos contain additional temporal information compared to still images, and many tasks cannot be solved without it. While any individual video frame can provide location and context information, many behaviors and movements can only be understood from how the scene changes over time. For example, while a still image may be sufficient to identify a door, video is required to understand whether an action consists of “closing a door” or “opening a door.”
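The door example can be made concrete with a toy sketch. It is an illustration only, not drawn from any medical dataset or from a specific method in the literature; the helper names (`make_clip`, `edge_position`, `order_free_summary`, `net_displacement`) are hypothetical. A clip and its time-reversed copy contain exactly the same frames, so any summary that ignores frame order cannot tell them apart, whereas a simple temporal feature can.

```python
# Toy "video": a list of frames, each frame a 1-D list of pixel intensities.
# A single bright pixel (standing in for a door edge) moves left to right.
def make_clip(num_frames=5, width=5):
    clip = []
    for t in range(num_frames):
        frame = [0.0] * width
        frame[t] = 1.0  # position of the door edge at time t
        clip.append(frame)
    return clip


def edge_position(frame):
    """Intensity-weighted centroid of one frame (a per-frame feature)."""
    total = sum(frame)
    return sum(i * v for i, v in enumerate(frame)) / total


def order_free_summary(clip):
    """Pool per-frame features while discarding frame order."""
    return sorted(edge_position(f) for f in clip)


def net_displacement(clip):
    """Temporal feature: edge movement from the first to the last frame."""
    return edge_position(clip[-1]) - edge_position(clip[0])


opening = make_clip()
closing = opening[::-1]  # identical frames, reversed in time

# Frame-level information is identical for the two clips...
same_frames = order_free_summary(opening) == order_free_summary(closing)
print(same_frames)                # True

# ...but the sign of the temporal feature separates the two actions.
print(net_displacement(opening))  # 4.0
print(net_displacement(closing))  # -4.0
```

Real video models learn far richer temporal features than a net displacement, but the same principle applies: the distinguishing signal lies in the ordering of frames rather than in any single frame.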
Biological behavior is very complex, often involving similar actors but different movements or actions that define the task at hand. Actions such as “patting the head” versus “braiding the hair” can appear visually similar in a still image, but the temporal information encoded in video data can easily distinguish between these behaviors. Datasets such as Kinetics [18], HMDB [19], and UCF101 [19, 20] have been designed for computer vision analysis of human behavioral videos.
There are also several forms of advanced medical imaging that capture motion in order to diagnose disease. An example of understanding movement for medical diagnosis is imaging of the heart to detect cardiovascular disease.
The heart is a highly dynamic organ, with movement in every heartbeat and often significant variation from beat to beat. While the heart can be imaged through many modalities, including ultrasound (echocardiogram), computed tomography (CT), and magnetic resonance imaging (MRI), modalities with lower temporal resolution often need to combine information collected across multiple heartbeats, taking advantage of the cyclical nature of cardiac motion. Abnormalities of the myocardium or of heart valve function can therefore be detected with several imaging methods, but all of these methods rely on temporal information.
In medicine, characterizing organ systems through medical imaging often relies on similar approaches to understand the relationship between imaging findings and disease characteristics. For example, in solid organ cancers such as prostate cancer and lung cancer, radiologists spend considerable time characterizing the size and distribution of tumors. The tumors are often distinct and easily identifiable. However, the clinical workflow involves a significant amount of subjectivity and human variation in how tumor dimensions and characteristics are measured. Such measurements are crucial for differentiating between inactive disease and progressive disease; however, human variation can lead to underdiagnosis of subtle changes in tumor burden and to small but meaningful changes in disease status being overlooked.