Amazon SageMaker Data Wrangler is a new capability of Amazon SageMaker that makes it faster for data scientists and engineers to prepare data for machine learning (ML) applications via a visual interface. Data preparation is a crucial step of the ML lifecycle, and Data Wrangler provides an end-to-end solution to import, prepare, transform, featurize, and analyze data for ML in a seamless, visual, low-code experience. It lets you easily and quickly connect to AWS components like Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon Redshift, and AWS Lake Formation, and external sources like Snowflake. Data Wrangler also supports standard data types such as CSV and Parquet.
Data Wrangler now additionally supports Optimized Row Columnar (ORC), JavaScript Object Notation (JSON), and JSON Lines (JSONL) file formats:
ORC – The ORC file format provides a highly efficient way to store Hive data. It was designed to overcome limitations of the other Hive file formats. Using ORC files improves performance when Hive is reading, writing, and processing data. ORC is widely used in the Hadoop ecosystem.
JSON – The JSON file format is a lightweight, commonly used data interchange format.
JSONL – JSON Lines, also called newline-delimited JSON, is a convenient format for storing structured data that may be processed one record at a time.
You can preview ORC, JSON, and JSONL data prior to importing the datasets into Data Wrangler. After you import the data, you can also use one of the newly launched transformers to work with columns that contain JSON strings or arrays that are commonly found in nested JSONs.
Import and analyze ORC data with Data Wrangler
Importing ORC data into Data Wrangler is easy and similar to importing files in any other supported format. Browse to your ORC file in Amazon S3, and in the DETAILS pane, choose ORC as the file type during import.
Now let’s import files in JSON format with Data Wrangler and work with columns that contain JSON strings or arrays. We also demonstrate how to deal with nested JSONs. With Data Wrangler, importing JSON files from Amazon S3 is a seamless process. This is similar to importing files in any other supported format. After you import the files, you can preview the JSON files as shown in the following screenshot. Make sure to set the file type to JSON in the DETAILS pane.
Next, let’s work on structured columns in the imported JSON file.
To deal with structured columns in JSON files, Data Wrangler is introducing two new transforms: Flatten structured column and Explode array column, which can be found under the Handle structured column option in the ADD TRANSFORM pane.
Let’s start by applying the Explode array column transform to one of the columns in our imported data. Before applying the transform, we can see the column topping is an array of JSON objects with id and type keys.
After we apply the transform, we can observe the new rows added as a result. Each element in the array is now a new row in the resulting DataFrame.
Now let’s apply the Flatten structured column transform on the topping_flattened column that was created as a result of the Explode array column transformation we applied in the previous step.
Before applying the transform, we can see the keys id and type in the topping_flattened column.
After applying the transform, we can now observe the keys id and type under the topping_flattened column as new columns topping_flattened_id and topping_flattened_type, which are created as a result of the transformation. You also have the option to flatten only specific keys by entering comma-separated key names for Keys to flatten on. If left empty, all the keys inside the JSON string or struct are flattened.
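Outside the Data Wrangler UI, the effect of these two transforms can be approximated with plain pandas. The following is a minimal sketch (not the Data Wrangler implementation) using a hypothetical topping column like the one in the example above:

import pandas as pd

# Hypothetical rows resembling the example: each row holds an array of
# JSON objects with `id` and `type` keys.
df = pd.DataFrame({
    "name": ["item_a", "item_b"],
    "topping": [
        [{"id": "5001", "type": "None"}, {"id": "5002", "type": "Glazed"}],
        [{"id": "5005", "type": "Sugar"}],
    ],
})

# Rough equivalent of "Explode array column": each array element becomes a row.
exploded = df.explode("topping").rename(columns={"topping": "topping_flattened"})

# Rough equivalent of "Flatten structured column": each key becomes a new
# column prefixed with the original column name.
flat = pd.json_normalize(exploded["topping_flattened"].tolist()).add_prefix("topping_flattened_")
result = pd.concat(
    [exploded.reset_index(drop=True).drop(columns=["topping_flattened"]), flat],
    axis=1,
)
print(result)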
Conclusion
In this post, we demonstrated how to import file formats in ORC and JSON easily with Data Wrangler. We also applied the newly launched transformations that allow us to transform any structured columns in JSON data. This makes working with columns that contain JSON strings or arrays a seamless experience.
As next steps, we recommend you replicate the demonstrated examples in your own Data Wrangler visual interface. If you have any questions related to Data Wrangler, feel free to leave them in the comment section.
About the Authors
Balaji Tummala is a Software Development Engineer at Amazon SageMaker. He helps support Amazon SageMaker Data Wrangler and is passionate about building performant and scalable software. Outside of work, he enjoys reading fiction and playing volleyball.
Arunprasath Shankar is an Artificial Intelligence and Machine Learning (AI/ML) Specialist Solutions Architect with AWS, helping global customers scale their AI solutions effectively and efficiently in the cloud. In his spare time, Arun enjoys watching sci-fi movies and listening to classical music.
Posted by Chelsea Finn, Research Adviser and Eric Jang, Senior Research Scientist, Robotics at Google
People can flexibly maneuver objects in their physical surroundings to accomplish various goals. One of the grand challenges in robotics is to successfully train robots to do the same, i.e., to develop a general-purpose robot capable of performing a multitude of tasks based on arbitrary user commands. Robots that are faced with the real world will also inevitably encounter new user instructions and situations that were not seen during training. Therefore, it is imperative for robots to be trained to perform multiple tasks in a variety of situations and, more importantly, to be capable of solving new tasks as requested by human users, even if the robot was not explicitly trained on those tasks.
Existing robotics research has made strides towards allowing robots to generalize to new objects, task descriptions, and goals. However, enabling robots to complete instructions that describe entirely new tasks has largely remained out of reach. This problem is remarkably difficult since it requires robots to both decipher the novel instructions and identify how to complete the task without any training data for that task. This goal becomes even more difficult when a robot needs to simultaneously handle other axes of generalization, such as variability in the scene and positions of objects. So, we ask the question: How can we confer noteworthy generalization capabilities onto real robots capable of performing complex manipulation tasks from raw pixels? Furthermore, can the generalization capabilities of language models help support better generalization in other domains, such as visuomotor control of a real robot?
In “BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning”, published at CoRL 2021, we present new research that studies how robots can generalize to new tasks that they were not trained to do. The system, called BC-Z, comprises two key components: (i) the collection of a large-scale demonstration dataset covering 100 different tasks and (ii) a neural network policy conditioned on a language or video instruction of the task. The resulting system can perform at least 24 novel tasks, including ones that require interaction with pairs of objects that were not previously seen together. We are also excited to release the robot demonstration dataset used to train our policies, along with pre-computed task embeddings.
The BC-Z system allows a robot to complete instructions for new tasks that the robot was not explicitly trained to do. It does so by training the policy to take as input a description of the task along with the robot’s camera image and to predict the correct action.
Collecting Data for 100 Tasks
Generalizing to a new task altogether is substantially harder than generalizing to held-out variations in training tasks. Simply put, we want robots to have more generalization all around, which requires that we train them on large amounts of diverse data.
We collect data by teleoperating the robot with a virtual reality headset. This data collection follows a scheme similar to how one might teach an autonomous car to drive. First, the human operator records complete demonstrations of each task. Then, once the robot has learned an initial policy, this policy is deployed under close supervision where, if the robot starts to make a mistake or gets stuck, the operator intervenes and demonstrates a correction before allowing the robot to resume.
This mixture of demonstrations and interventions has been shown to significantly improve performance by mitigating compounding errors. In our experiments, we see a 2x improvement in performance when using this data collection strategy compared to only using human demonstrations.
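This collection scheme can be sketched roughly as follows; the env, policy, and operator interfaces here are hypothetical stand-ins, not the actual BC-Z tooling:

# Shared-autonomy collection: the learned policy acts until the human
# operator intervenes, and the corrective actions are added to the dataset.
def collect_episode(env, policy, operator, dataset):
    obs, done = env.reset(), False
    while not done:
        action = policy.act(obs)
        if operator.wants_to_intervene(obs, action):
            action = operator.teleop_action(obs)   # human correction
            dataset.append((obs, action))          # corrections become training data
        obs, done = env.step(action)
    return dataset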
Example demonstrations collected for 12 out of the 100 training tasks, visualized from the perspective of the robot and shown at 2x speed.
Training a General-Purpose Policy
For all 100 tasks, we use this data to train a neural network policy to map from camera images to the position and orientation of the robot’s gripper and arm. Crucially, to allow this policy the potential to solve new tasks beyond the 100 training tasks, we also input a description of the task, either in the form of a language command (e.g., “place grapes in red bowl”) or a video of a person doing the task.
To accomplish a variety of tasks, the BC-Z system takes as input either a language command describing the task or a video of a person doing the task, as shown here.
By training the policy on 100 tasks and conditioning the policy on such a description, we unlock the possibility that the neural network will be able to interpret and complete instructions for new tasks. This is a challenge, however, because the neural network needs to correctly interpret the instruction, visually identify relevant objects for that instruction while ignoring other clutter in the scene, and translate the interpreted instruction and perception into the robot’s action space.
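As a rough illustration of such a conditioned policy (our simplification with hypothetical layer sizes, not the exact BC-Z architecture), the network consumes a camera image together with a task embedding from a language or video encoder and predicts the gripper/arm pose:

import torch
import torch.nn as nn

class ConditionedPolicy(nn.Module):
    # Maps (camera image, task embedding) -> gripper/arm pose.
    # Layer sizes are hypothetical; the task embedding could come from a
    # language encoder or a video encoder of a human demonstration.
    def __init__(self, task_dim=512, action_dim=7):
        super().__init__()
        self.vision = nn.Sequential(                 # toy CNN over 128x128 RGB frames
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 + task_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),              # e.g., position + orientation + gripper
        )

    def forward(self, image, task_embedding):
        feats = self.vision(image)
        return self.head(torch.cat([feats, task_embedding], dim=-1))

# Example usage with random tensors standing in for real inputs.
policy = ConditionedPolicy()
action = policy(torch.randn(1, 3, 128, 128), torch.randn(1, 512))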
Experimental Results
In language models, it is well known that sentence embeddings generalize on compositions of concepts encountered in training data. For instance, if you train a translation model on sentences like “pick up a cup” and “push a bowl”, the model should also translate “push a cup” correctly.
We study the question of whether the compositional generalization capabilities found in language encoders can be transferred to real robots, i.e., being able to compose unseen object-object and task-object pairs.
We test this method by pre-selecting a set of 28 tasks, none of which were among the 100 training tasks. For example, one of these new test tasks is to pick up the grapes and place them into a ceramic bowl, but the training tasks involve doing other things with the grapes and placing other items into the ceramic bowl. The grapes and the ceramic bowl never appeared in the same scene during training.
In our experiments, we see that the robot can complete many tasks that were not included in the training set. Below are a few examples of the robot’s learned policy.
The robot completes three instructions of tasks that were not in its training data, shown at 2x speed.
Quantitatively, we see that the robot can succeed to some degree on a total of 24 out of the 28 held-out tasks, indicating a promising capacity for generalization. Further, we see a notably small gap between the performance on the training tasks and performance on the test tasks. These results indicate that simply improving multi-task visuomotor control could considerably improve performance.
The BC-Z performance on held-out tasks, i.e., tasks that the robot was not trained to perform. The system correctly interprets the language command and translates that into action to complete many of the tasks in our evaluation.
Takeaways
The results of this research show that simple imitation learning approaches can be scaled in a way that enables zero-shot generalization to new tasks. That is, it shows one of the first indications of robots being able to successfully carry out behaviors that were not in the training data. Interestingly, language embeddings pre-trained on ungrounded language corpora make for excellent task conditioners. We demonstrated that natural language models can not only provide a flexible input interface to robots, but that pretrained language representations actually confer new generalization capabilities to the downstream policy, such as composing unseen object pairs together.
In the course of building this system, we confirmed that periodic human interventions are a simple but important technique for achieving good performance. While there is a substantial amount of work to be done in the future, we believe that the zero-shot generalization capabilities of BC-Z are an important advancement towards increasing the generality of robotic learning systems and allowing people to command robots. We have released the teleoperated demonstrations used to train the policy in this paper, which we hope will provide researchers with a valuable resource for future multi-task robotic learning research.
Acknowledgements
We would like to thank the co-authors of this research: Alex Irpan, Mohi Khansari, Daniel Kappler, Frederik Ebert, Corey Lynch, and Sergey Levine. This project was a collaboration between Google Research and the Everyday Robot Project. We would like to give special thanks to Noah Brown, Omar Cortes, Armando Fuentes, Kyle Jeffrey, Linda Luu, Sphurti Kirit More, Jornell Quiambao, Jarek Rettinghouse, Diego Reyes, Rosario Jauregui Ruano, and Clayton Tan for overseeing robot operations and collecting human videos of the tasks, as well as Jeffrey Bingham, Jonathan Weisz, and Kanishka Rao for valuable discussions. We would also like to thank Tom Small for creating animations in this post and Paul Mooney for helping with dataset open-sourcing.
We built a neural theorem prover for Lean that learned to solve a variety of challenging high-school olympiad problems, including problems from the AMC12 and AIME competitions, as well as two problems adapted from the IMO.[1] The prover uses a language model to find proofs of formal statements. Each time we find a new proof, we use it as new training data, which improves the neural network and enables it to iteratively find solutions to harder and harder statements.
We achieved a new state-of-the-art (41.2% vs 29.3%) on the miniF2F benchmark, a challenging collection of high-school olympiad problems. Our approach, which we call statement curriculum learning, consists of manually collecting a set of statements of varying difficulty levels (without proof) where the hardest statements are similar to the benchmark we target. Initially our neural prover is weak and can only prove a few of them. We iteratively search for new proofs and re-train our neural network on the newly discovered proofs, and after 8 iterations, our prover ends up being vastly superior when tested on miniF2F.
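A minimal sketch of this statement curriculum loop, written with hypothetical search_proof, verifies, and finetune helpers rather than the actual training code:

# Statement curriculum learning: search for proofs of the auxiliary statements
# with the current model, keep the ones the formal system verifies, retrain on
# them, and repeat.
def statement_curriculum_learning(model, statements, n_iterations=8):
    proved = {}                                      # statement -> verified proof
    for _ in range(n_iterations):
        for statement in statements:
            if statement in proved:
                continue
            proof = search_proof(model, statement)   # language-model-guided proof search
            if proof is not None and verifies(statement, proof):
                proved[statement] = proof            # becomes new training data
        model = finetune(model, list(proved.items()))
    return model, proved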
Formal mathematics is an exciting domain to study because of (i) its richness, letting you prove arbitrary theorems which require reasoning, creativity and insight and (ii) its similarity to games—where AI has been spectacularly successful—in that it has an automated way of determining whether a proof is successful (i.e., verified by the formal system). As demonstrated in the trivial example below, proving a formal statement requires generating a sequence of proof steps, each proof step consisting of a call to a tactic.[2] These tactics take mathematical terms as arguments, and each tactic call transforms the current statement to prove into statements that are easier to prove, until nothing is left to prove.
Problem 1
Adapted from AMC12 2000 Problem 5
Prove that if $|x - 2| = p$, where $x < 2$, then $x - p = 2 - 2p$.
theorem amc12_2000_p5 -- ← theorem name
(x p : ℝ) -- ← the statement we want
(h₀ : x < 2) -- to prove
(h₁ : abs (x - 2) = p) :
x - p = 2 - 2 * p :=
begin -- ← formal proof starts here
  -- This first tactic requires that the prover invent
  -- the term: `abs (x - 2) = -(x - 2)`.
  have h₂ : abs (x - 2) = -(x - 2), {
    apply abs_of_neg,
    linarith,
  },
  rw h₁ at h₂,
  -- At this stage the remaining goal to prove is:
  -- `x - p = 2 - 2 * p` knowing that `p = -(x - 2)`.
  linarith,
end
Since $x < 2$, $|x - 2| = -(x - 2)$. Using $p = |x - 2|$ we have $x = 2 - p$ and finally $x - p = 2 - 2p$.
We observe that the capability to generate original mathematical terms required as arguments of tactics, which cannot be done without a neural language model, emerges from our training procedure. The proof below is an example of it: the proof step use n + 1 (entirely generated by our models) proposes n + 1 as a solution, with the rest of the formal proof relying on the ring_exp tactic to verify that it is indeed valid.
Problem 2
Adapted from AMC12B 2020 Problem 6
For all integers $n ≥ 9$, prove that $((n + 2)! − (n + 1)!) / n!$ is a perfect square.
theorem amc12b_2020_p6
(n : ℕ)
(h0 : 9 ≤ n) :
∃ x : ℕ, (x:ℝ)^2 =
(nat.factorial (n + 2) - nat.factorial (n + 1))
/ nat.factorial n :=
begin
  -- The model directly proposes `n + 1` as solution.
  use n + 1,
  field_simp [nat.factorial_ne_zero, pow_succ'],
  ring_exp
end
Factoring out $(n+1)!$ we get $(n+1)(n+2-1) = (n+1)^2$, which concludes the proof.
We also observe that our models and search procedure are capable of producing proofs that chain multiple non-trivial reasoning steps. In the proof below, the model starts by using contraposition leading to the existential statement (∃ (x : ℝ), f x ≠ a * x + b). It then generates a witness for it with use (0 : ℝ) and finishes the proof by leveraging the norm_num tactic.
Problem 3
Adapted from the MATH dataset
Let $f(x) = Ax + B$ and $g(x) = Bx + A$, where $A \ne B$. If $f(g(x)) - g(f(x)) = B - A$, prove that $A + B = 0$.
theorem mathd_train_algebra_217
(a b : ℝ)
(f g : ℝ → ℝ)
(h₀ : ∀ x, f x = a * x + b)
(h₁ : ∀ x, g x = b * x + a)
(h₂ : a ≠ b)
(h₃ : ∀ x, f (g x) - g (f x) = b - a) :
a + b = 0 :=
begin
  revert h₀ h₁ h₂ h₃,
  -- Initial contraposition.
  contrapose!,
  rintro ⟨h₀, ⟨h₁, h₂⟩⟩,
  -- The model proposes `0` as witness for the current
  -- goal that consists in `∃ (x : ℝ), f x ≠ a * x + b`.
  use (0 : ℝ),
  simp only [sub_eq_iff_eq_add, h₀, mul_zero, zero_add],
  norm_num at h₀,
end
First we find that:
$$f(g(x)) = A(Bx + A) + B = ABx + A^2 + B$$
$$g(f(x)) = B(Ax + B) + A = ABx + B^2 + A$$
Now we plug this back in $f(g(x)) - g(f(x)) = B - A$ and get:
$$(ABx + A^2 + B) - (ABx + B^2 + A) = B - A$$
That is:
$$A^2 - B^2 + B - A = B - A$$
Hence:
$$A^2 - B^2 = (A-B)(A+B) = 0$$
Since we are given that $A \ne B$, necessarily, $A + B = 0$.
Our models, trained with statement curriculum learning, were able to close a variety of problems from training textbooks as well as AMC12 and AIME competitions, and 2 problems adapted from the IMO. We present below three examples of such generated proofs.
Problem 4
Adapted from IMO 1964 Problem 2
Suppose $a$, $b$, $c$ are the sides of a triangle.
Prove that $a^2(b + c - a) + b^2(c + a - b) + c^2(a + b - c) \leq 3abc$.
theorem imo_1964_p2
(a b c : ℝ)
(h₀ : 0 < a ∧ 0 < b ∧ 0 < c)
(h₁ : c < a + b)
(h₂ : b < a + c)
(h₃ : a < b + c) :
a^2 * (b + c - a) + b^2 * (c + a - b) + c^2 * (a + b - c)
≤ 3 * a * b * c :=
begin
  -- Arguments to `nlinarith` are fully invented by our model.
  nlinarith [sq_nonneg (b - a),
             sq_nonneg (c - b),
             sq_nonneg (c - a)]
end
Rearrange to get $a(a-b)(a-c) + b(b-a)(b-c) + c(c-a)(c-b) \geq 0$ which is true by Schur’s inequality.
Problem 5
Adapted from AIME 1984 Problem 1
Prove that $a_2 + a_4 + a_6 + a_8 + \cdots + a_{98} = 93$ if $a_1$, $a_2$, $a_3, \ldots$ is an arithmetic progression with common difference $1$, and $a_1 + a_2 + a_3 + \cdots + a_{98} = 137$.
theorem aime_1984_p1
(u : ℕ → ℚ)
(h₀ : ∀ n, u (n + 1) = u n + 1)
(h₁ : ∑ k in finset.range 98, u k.succ = 137) :
∑ k in finset.range 49, u (2 * k.succ) = 93 :=
begin
  rw finset.sum_eq_multiset_sum,
  dsimp [finset.range] at h₁,
  simp [h₀],
  ring,
  norm_num at h₁,
  norm_num,
  apply eq_of_sub_eq_zero,
  { simp only [*, abs_of_pos, add_zero] at *, linarith },
end
For $n \geq 1$ we have $a_{2n-1} = a_{2n} - 1$. Substituting this into the equation given, we get:
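$$\sum_{k=1}^{98} a_k = \sum_{n=1}^{49}\left(a_{2n-1} + a_{2n}\right) = \sum_{n=1}^{49}\left(2a_{2n} - 1\right) = 2\sum_{n=1}^{49} a_{2n} - 49 = 137,$$
so that $a_2 + a_4 + \cdots + a_{98} = \sum_{n=1}^{49} a_{2n} = 93$, as required.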
Problem 6
Adapted from IMO Longlist 1990 Problem 77[3]
For $a, b, c$ reals, prove that $(a^2 + ab + b^2)(b^2 + bc + c^2)(c^2 + ca + a^2) \geq (ab + bc + ca)^3$.
theorem imo_longlist_1990_p77
(a b c : ℝ) :
(a * b + b * c + c * a)^3 ≤
(a^2 + a * b + b^2) * (b^2 + b * c + c^2) *
(c^2 + c * a + a^2) :=
begin
  -- The three initial steps use Cauchy–Schwarz to prove
  -- `(a * b + b * c) ^ 2 ≤ (a ^ 2 + b ^ 2) * (b ^ 2 + c ^ 2)`
  -- which is required for the final call to `nlinarith`.
  let u : euclidean_space ℝ (fin 2) := ![a, b],
  let v : euclidean_space ℝ (fin 2) := ![b, c],
  have h₀ := real_inner_mul_inner_self_le u v,
  simp [u, v, fin.sum_univ_succ, ←pow_two, ←pow_two, le_of_lt, mul_assoc] at h₀,
  -- The model introduces another required cut (i.e. invent
  -- the term `0 ≤ (c + a) * (c + a)` and proves it).
  have h₃ : 0 ≤ (c + a) * (c + a),
  { nlinarith, },
  have h₄ := sq_nonneg (a * b + b * c + c * a),
  simp [sq, h₀, h₃, mul_add, add_mul] at h₄ ⊢,
  nlinarith [sq_nonneg (b - a),
             sq_nonneg (c - b),
             sq_nonneg (a - c)]
end
After cancelling terms appearing on both sides, we are left to prove that:
After multiplying both sides by $2$, we can rearrange the above inequality to:
$$0 \leq \sum_{cyc} (a^2b + a^2c - b^2c)^2$$
which clearly holds, giving the claim.
Formal mathematics involves two main challenges that make a naive application of reinforcement learning unlikely to succeed.
(i) Infinite action space: not only does formal mathematics have an extremely large search space (like Go for example), it also has an infinite action space. At each step of a proof search, the model must choose not from a well-behaved finite set of actions, but a complex and infinite set of tactics, involving exogenous mathematical terms that have to be generated (e.g., generating a mathematical statement to be used as a witness, an object used in steps such as “there exists an $x$ s.t. …”, or a cut, the introduction and the chaining of a lemma in the middle of a proof).
(ii) Lack of self-play: unlike 2-player games, a prover is not playing against an opponent but against a set of statements to prove. When faced with a statement that is just too hard, there is no obvious reframing that will let the prover generate intermediary easier statements to tackle first. This asymmetry prevents naive application of the self-play algorithms that were successful with 2-player games.
In our work, we address the infinite action space problem by sampling actions from a language model as we search for a proof. Language models have the capability to generate the tactic calls as well as the original mathematical terms often required as arguments. Our basis for addressing the lack of self-play is the observation that the key role of self-play in 2-player games is to provide an unsupervised curriculum. Our methodology proposes to replace this unsupervised curriculum with an auxiliary set of problem statements (without requiring proofs) of varying difficulty. We empirically show that, when the difficulty of these auxiliary problems is varied enough, our training procedure is able to solve a curriculum of increasingly difficult problems, eventually generalizing to the set of problems we care about.
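A rough sketch of how sampling from a language model drives the proof search (with hypothetical sample_tactics, apply_tactic, is_solved, and score helpers, not the actual prover):

import heapq
import itertools

# Best-first search over proof states, where candidate tactic calls (including
# the mathematical terms they take as arguments) are sampled from a language model.
def proof_search(model, initial_state, budget=512, samples_per_state=8):
    counter = itertools.count()      # tie-breaker so proof states are never compared directly
    frontier = [(0.0, next(counter), initial_state, [])]
    while frontier and budget > 0:
        _, _, state, proof = heapq.heappop(frontier)
        if is_solved(state):         # nothing left to prove
            return proof
        for tactic in sample_tactics(model, state, n=samples_per_state):
            budget -= 1
            next_state = apply_tactic(state, tactic)   # None if the formal system rejects it
            if next_state is not None:
                heapq.heappush(
                    frontier,
                    (score(model, next_state), next(counter), next_state, proof + [tactic]),
                )
    return None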
While these results are extremely exciting, as they demonstrate that deep learning models are capable of non-trivial mathematical reasoning when interacting with a formal system, we are still very far from best-student performance on these competitions, only occasionally, rather than consistently, closing challenging olympiad problems. We hope nonetheless that our work will motivate research in this domain, in particular towards the IMO Grand Challenge and that the statement curriculum learning methodology we propose will help accelerate progress in automated reasoning in general.
Sessions on multidevice scenarios, inclusive and fair speech technologies, trustworthy speech processing, and speech intelligibility prediction seek paper submissions.
As the pandemic overburdens medical facilities and clinicians become increasingly overworked, the ability to make quick decisions on providing the best possible treatment is even more critical. In urgent health situations, such decisions can mean life or death. However, certain treatment protocols can pose a considerable risk to patients who have serious medical conditions and can potentially contribute to unintended outcomes.
To build our model, we decided to use reinforcement learning—an ML framework that’s uniquely well-suited for advancing safety-critical domains such as healthcare. This is because at its core, healthcare is a sequential decision-making domain, and reinforcement learning is the formal paradigm for modeling and solving problems in such domains. In healthcare, clinicians base their treatment decisions on an overall understanding of a patient’s health; they observe how the patient responds to this treatment, and the process repeats. Likewise, in reinforcement learning, an algorithm, or agent, interprets the state of its environment and takes an action, which, coupled with the internal dynamics of the environment, causes it to transition to a new state, as shown in Figure 1. A reward signal is then assigned to account for the immediate impact of this change. For example, in a healthcare scenario, if a patient recovers or is discharged from the intensive care unit (ICU), the agent may receive a positive reward. However, if the patient does not survive, the agent receives a negative reward, or penalty.
Reinforcement learning is widely used in gaming, for example, to determine the best sequence of chess moves and maximize an AI system’s chances of winning. Over time, due to trial-and-error experimentation, the desired actions are maximized and the undesired ones are minimized until the optimal solution is identified. Normally, this experimentation is made possible by the proactive collection of extensive amounts of diverse data. However, unlike in gaming, exploratory data collection and experimentation are not possible in healthcare, and our only option in this realm is to work with previously collected datasets, providing very limited opportunities to explore alternative choices. This is where offline reinforcement learning comes into focus. A subarea of reinforcement learning, offline reinforcement learning works only with data that already exists—instead of proactively taking in new data, we’re using a fixed dataset. Even so, to propose the best course of action, an offline reinforcement learning algorithm still requires sufficient trial-and-error with alternatives, and this necessitates a very large dataset, something not feasible in safety-critical domains with limited data, like healthcare.
In the current research literature, when reinforcement learning is applied to healthcare, the focus is on what to do to support the best possible patient outcome, an infeasible objective. In our paper, we propose inverting this paradigm in offline settings to investigate high-risk treatments and identify when the state of patients’ health reaches a critical point. To enable this approach, we developed a methodology called Dead-end Discovery (DeD), which identifies treatments to avoid in order to prevent a medical dead-end—the point at which the patient is most likely to die regardless of future treatment. DeD provably requires exponentially less data than the standard methods, making it significantly more reliable in limited-data situations. By identifying known high-risk treatments, DeD could assist clinicians in making trustworthy decisions in highly stressful situations, where minutes count. Moreover, this methodology could also raise an early warning flag and alert clinicians when a patient’s condition reveals outstanding risk, often before it becomes obvious. We go into more detail on the DeD methodology later in this post.
Medical dead-ends and rescue states
At ICUs, patients experience a trajectory which sequentially tracks the state of their health. It starts with the patient’s condition upon admission, followed by the administration of treatment and then by their response to the treatment. This sequence repeats until the patient reaches a terminal state—the final observation of the patient’s condition that’s still relevant within the ICU. To learn what treatments to avoid, we focus on two types of terminal states: patient recovery and patient death. Other terminal states can also exist. For example, when playing chess, a loss or a win are not the only possible outcomes; draws can also occur. While our framework can encompass additional terminal states, this work focuses on only two possibilities: positive outcomes and negative outcomes.
Building on these two terminal states, we define medical dead-ends as patient states from which all possible future trajectories will lead to the terminal state of the patient’s death. If applied in acute care settings, it’s critical to both avoid medical dead-ends and identify the probability with which any selected treatment will lead to them. It’s also important to note that medical dead-ends can occur considerably earlier than clinicians are able to observe. This makes DeD particularly valuable, as every hour counts when it comes to critical conditions.
To contrast with medical dead-ends, we also propose the concept of rescue states, where recovery is fully reachable. At each rescue state, there exists at least one treatment that would lead, with the probability of 1, either to another rescue state or to recovery. In most cases, a patient’s condition is neither a medical dead-end nor a rescue state, as the minimum and maximum probability of future mortality or recovery is not always 0 and 1, but somewhere in between. Therefore, it’s important to have an alert when a patient is likely to enter a medical dead-end.
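Stated more formally (one possible formalization of the descriptions above, writing $\mathcal{T}_{R}$ for the recovery terminal state and $s'$ for the next state):
$$s \text{ is a medical dead-end} \iff \Pr\left(\text{reach } \mathcal{T}_{R} \mid s, \pi\right) = 0 \ \text{ for every policy } \pi,$$
$$s \text{ is a rescue state} \iff \exists\, a : \ \Pr\left(s' \text{ is a rescue state or } s' = \mathcal{T}_{R} \mid s, a\right) = 1.$$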
Treatment security: How to help doctors
To develop our model, we considered a generic condition that guarantees the merit and reliability of a given treatment-selection policy. In particular, we postulated the following condition we called treatment security:
For example, if a certain treatment leads to a medical dead-end or immediate death with a probability of more than 80 percent, that treatment should be selected for administration no more than 20 percent of the time.
While treatment security is a desired property, it’s not easy to directly enforce because the required probabilities are not known a priori, nor are they directly measurable from the data. Therefore, we developed a theoretical framework at the core of our method that enables treatment security from data by mapping it to proper learning problems.
DeD: Dead-end Discovery methodology
To precisely define the learning problems, we based our DeD methodology on three core ideas: 1) separating the outcomes, 2) learning the optimal value function of each outcome in isolation without discounting, and 3) proving important properties for these particular value functions, which enable treatment security.
We constructed two simple reward signals for independent learning problems:
-1 in the case of a negative outcome; 0 at all other transitions
+1 in the case of a positive outcome; 0 at all other transitions
Next, we learned their corresponding optimal value functions, $Q_{D}^{*}(s, a)$ and $Q_{R}^{*}(s, a)$ both with no discounting. It turns out that these value functions are intrinsically important. In fact, we show that:
Moreover, the quantity $1 + Q_{D}^{*}(s, a)$ proves to be a meaningful threshold for a policy to make it secure. We formally show that: for treatment security, it is sufficient to abide by the maximum hope of recovery.
We further proved that if the probability of treatment selection is kept no higher than $Q_{R}^{*}(s, a)$, the patient is guaranteed to remain in a rescue state when possible. Finally, we also showed that such thresholds for limiting the treatment selection probabilities exist.
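A minimal sketch of the resulting security check (our simplification, with hypothetical value estimates and treatment probabilities): with undiscounted rewards of -1/0 and +1/0, $Q_{D}(s, a)$ lies in $[-1, 0]$ and $Q_{R}(s, a)$ lies in $[0, 1]$, and a policy is kept below the thresholds $1 + Q_{D}(s, a)$ and $Q_{R}(s, a)$.

import numpy as np

# Check treatment security for a single state: every treatment's selection
# probability must stay below both thresholds derived from the value estimates.
def is_secure(pi, q_d, q_r, eps=1e-8):
    # pi, q_d, q_r: arrays of shape (n_treatments,) for one patient state.
    return bool(np.all(pi <= 1.0 + q_d + eps) and np.all(pi <= q_r + eps))

# Hypothetical example: treatment 0 looks dangerous (Q_D close to -1), so a
# secure policy must select it with near-zero probability.
q_d = np.array([-0.9, -0.1, -0.2])
q_r = np.array([ 0.1,  0.9,  0.7])
pi  = np.array([ 0.05, 0.60, 0.35])
print(is_secure(pi, q_d, q_r))   # True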
Building from these results, we defined a training and deployment pipeline, illustrated in Figure 3.
Applying the DeD methodology to sepsis
To demonstrate the utility of DeD in safety-critical domains and to honor the underlying healthcare motivations behind its development, we applied DeD on publicly available real-world medical data. Specifically, our data pertained to critically ill patients who had developed sepsis and were treated in an ICU.
Sepsis is a syndrome characterized by organ dysfunction due to a patient’s dysregulated response to an infection. In the United States alone, sepsis is responsible for more than 200,000 deaths each year, contributing to over 10 percent of in-hospital mortality, and accounting for over $23 billion in hospitalization costs. Globally, sepsis is a leading cause of mortality, with an estimated 11 million deaths each year, accounting for almost 20 percent of all deaths. It’s also an end-stage to many health conditions. In a recent retrospective study of hospitalized COVID-19 patients, all the fatal cases and more than 40 percent of survivors were septic.
In our study, we envisioned a way to help clinicians identify which subset of treatments could statistically cause further health deterioration so that they could eliminate them when deciding on the next steps. To estimate the value functions of possible treatments, we used the publicly available Medical Information Mart for Intensive Care III (MIMIC-III) dataset (v 1.4), sourced from the Beth Israel Deaconess Medical Center in Boston, Massachusetts. MIMIC-III is comprised of deidentified electronic health records (EHR) of consenting patients admitted to critical care units, collected from 53,423 distinct hospital admissions between 2001 and 2012. Following standard extraction and preprocessing methods, we derived an experimental cohort of 19,611 patients who are presumed to have developed sepsis during their initial admission to the ICU, with an observed mortality rate of approximately 10 percent. We studied 72 hours of the patients’ stay at the ICU—24 hours before the presumed onset of sepsis and 48 hours afterwards. We used 44 observation variables, including various health records and demographic information, and 25 distinct treatment options (five discrete levels for IV fluid and vasopressor volumes in combination), aggregated over four hours.
With this dataset, we sought to demonstrate that medical dead-ends exist in medical data and show the effect of treatment selection on the development of medical dead-ends. We also sought to identify whether alternative treatments were available that could have prevented the occurrence of a medical dead-end.
To flag potentially nonsecure treatments, we examined whether the values estimated ((Q_{D}(s, a)) and (Q_{R}(s, a))) for each treatment passed certain thresholds. To flag potential medical dead-end states, we looked at the median values of available treatments against these same thresholds. Using the median helped mitigate approximation errors due to generalization from potentially insufficient data and extrapolations made by the reinforcement learning formulation. With the specified thresholds, DeD identified increasing percentages of patients raising fatal flags, particularly among the subpopulation that died in the hospital. In Figure 4, note the distinctive difference between the trend of estimated values for surviving and non-surviving patients. Over the course of 72 hours in the ICU, surviving patients rarely raised a flag, while flags were raised at an increased rate for patients who did not survive as they proceeded toward the final observations of their time in the ICU.
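The flagging rule itself can be sketched as follows (our simplification; the threshold values delta_d and delta_r here are hypothetical placeholders, not the ones used in the study):

import numpy as np

def flag_treatments(q_d, q_r, delta_d=-0.75, delta_r=0.25):
    # Flag individual treatments whose estimated values cross the thresholds.
    return (q_d < delta_d) | (q_r < delta_r)          # boolean mask over treatments

def flag_state(q_d, q_r, delta_d=-0.75, delta_r=0.25):
    # Flag the state itself using the median over available treatments, which
    # helps absorb approximation error in the value estimates.
    return bool(np.median(q_d) < delta_d or np.median(q_r) < delta_r)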
To further support our hypothesis that medical dead-ends exist among septic patients and may be preventable, we aligned patients according to the point in their care when a flag was first raised by our DeD framework. As shown in Figure 5, we selected all trajectories with at least 24 hours prior to and 16 hours after this flag. The DeD estimates of $V$ and $Q$ values for administered treatments had similar behavior in both the surviving and non-surviving subpopulations prior to this first flag, but the values quickly diverged afterwards. We observed that the advent of this first flag also corresponded to a similar divergence among various clinical measures and vital signs, shown in Figure 5, sections a and b.
DeD identified a clear critical point in these patients’ care, where non-surviving patients experienced an irreversible negative change to their health, as shown in Figure 5, section c. Additionally, there was a significant gap in the estimated value between the treatments administered to the non-surviving patients and those treatments deemed to be more secure by DeD, shown in Figure 5, section e. There was a clear inflection in the estimated values four to eight hours before this first flag was raised, shown in Figure 5, section c.
Further analysis of our results, which we describe in detail in our paper, indicates that more than 12 percent of treatments given to non-surviving patients could be detrimental 24 hours before death. We also identified that 2.7 percent of non-surviving patients entered medical dead-end trajectories with a sharply increasing rate up to 48 hours before death, and close to 10 percent when we slightly relaxed our thresholds for predicting medical dead-ends. While these percentages may seem small, more than 200,000 patients die of sepsis every year in US hospitals alone, and any reduction of this rate could mean tens of thousands of additional survivors. We’re excited about the possibility that DeD could help clinicians provide their patients with the best care and that many more patients could potentially survive sepsis.
Looking ahead: Further uses of DeD and offline reinforcement learning
We view DeD as a powerful tool that could magnify human expertise in healthcare by supporting clinicians with predictive models as they make critical decisions. There is significant potential for researchers to use the DeD method to expand on this research and look at other measures, such as the relationship between patient demographics and sepsis treatment, with the goal of preventing certain treatment profiles for particular subgroups of patients.
The principles of offline reinforcement learning and the DeD methodology can also be applied to other clinical conditions, as well as to safety-critical areas beyond healthcare that also rely on sequential decision-making. For example, the domain of finance entails similar core concepts as it is analogously based on sequential decision-making processes. DeD could be used to alert financial professionals when specific actions, such as buying or selling certain assets, are likely to result in unavoidable future loss, or a financial dead-end. We hope our work will inspire active research and discussion in the community. You can learn more about the research and access the code here.
From active noise cancellation to digital assistants that are always listening for your commands, audio is perhaps one of the most important but often overlooked aspects of modern technology in our daily lives.
Audio Analytic has been using machine learning to enable a vast array of devices to make sense of the world of sound.
We spoke with Dr. Chris Mitchell, CEO and founder of Audio Analytic, about the challenges, and the fun, involved in teaching machines to listen.
Pindar Van Arman, an American artist and roboticist, designs painting robots that explore the differences between human and computational creativity. Since his first system in 2005, he has built multiple artificially creative robots. The most famous, Cloud Painter, was awarded first place at Robotart 2018.
Steven Frank is a partner at the law firm Morgan Lewis, specializing in intellectual property and commercial technology law. He’s also half of the husband-wife team that used convolutional neural networks to authenticate artistic masterpieces, including da Vinci’s Salvator Mundi, with AI’s help.
Researchers in the Department of Anthropology at Northern Arizona University are using GPU-based deep learning algorithms to categorize sherds — tiny fragments of ancient pottery.
imodels: A Python package with cutting-edge techniques for concise, transparent, and accurate predictive modeling. All sklearn-compatible and easy to use.
Recent machine-learning advances have led to increasingly complex predictive models, often at the cost of interpretability. We often need interpretability, particularly in high-stakes applications such as medicine, biology, and political science (see here and here for an overview). Moreover, interpretable models help with all kinds of things, such as identifying errors, leveraging domain knowledge, and speeding up inference.
Despite new advances in formulating/fitting interpretable models, implementations are often difficult to find, use, and compare. imodels (github, paper) fills this gap by providing a simple unified interface and implementation for many state-of-the-art interpretable modeling techniques, particularly rule-based methods.
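As a quick illustration of that unified interface (a minimal sketch assuming one of the sklearn-style estimators the package documents, such as RuleFitClassifier), fitting a rule-based model looks like fitting any scikit-learn estimator:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from imodels import RuleFitClassifier   # one of several rule-based estimators

# Fit an interpretable rule-based model with the familiar fit/predict API.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RuleFitClassifier()
model.fit(X_train, y_train)
print(model.score(X_test, y_test))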