HANDFUL: Sequential Grasp-Conditioned Dexterous Manipulation with Resource Awareness

HANDFUL tackles sequential grasp-conditioned dexterous manipulation by treating finger use and in-hand space as limited resources: it learns resource-aware one- and two-finger grasps, then uses the fingers and in-hand space left free by the grasp for downstream objectives such as pressing a button, pulling a drawer, twisting a knob, and pushing or picking up a second object.

Video

Abstract

Dexterous robot hands offer rich opportunities for multifunctional manipulation, where a robot must execute multiple skills in sequence while maintaining control over previously grasped objects. Most prior work in dexterous manipulation focuses on single-object, single-skill tasks. In contrast, our insight is that many sequential tasks require resource-aware grasps that conserve fingers for future actions. In this paper, we study sequential grasp-conditioned dexterous manipulation, where a robot first grasps an object and then performs a second, distinct manipulation subtask while preserving the initial grasp. We introduce HANDFUL, a learning framework that models finger usage as a limited resource and encourages exploration of resource-aware intermediate grasp states through finger-level contact rewards. These grasps are subsequently specialized for downstream tasks via curriculum-based policy learning. We further propose HANDFUL-Bench, a simulation benchmark that introduces sequential dexterous manipulation tasks across multiple second-subtask objectives, including pushing, pulling, and insertion, under a shared grasp-conditioned setup. Extensive simulation results demonstrate that encouraging resource-aware intermediate grasps and preserving them during downstream manipulation improves second-subtask success and robustness compared to a baseline that greedily optimizes the initial grasp before attempting the second subtask. We additionally validate our approach on a real dexterous LEAP hand. Together, this work establishes resource-aware grasp planning as a key principle for multifunctional dexterous manipulation.

Method Overview


A. We train a set of nine resource-aware one- and two-finger grasping policies that leave fingers free for our diverse suite of second subtasks.

B. We then use a curriculum-based method to select and train the best grasps for each second-subtask environment, removing grasps that perform poorly in the initial, limited randomization stages.

C. Finally, taking the best grasp for each task and its corresponding second-subtask policy, we roll out these policies and collect trajectories in each of our grasp + second-subtask environments. To transfer to the real robot, we fix the position of the second-subtask object and then replay the trajectory whose starting position for the first (grasped) block best matches the block's current position in the real world.
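The replay step in C amounts to a nearest-neighbor lookup over the collected trajectories. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Trajectory` container, its fields, the use of planar (x, y) block positions, and the Euclidean distance metric are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Trajectory:
    """Hypothetical container for one simulated rollout."""
    block_start_xy: Tuple[float, float]  # starting position of the first (grasped) block
    joint_targets: List[List[float]]     # per-step joint targets recorded in simulation


def select_trajectory(trajectories: Sequence[Trajectory],
                      observed_xy: Tuple[float, float]) -> Trajectory:
    """Return the rollout whose starting block position is closest to the observed one.

    Assumption: closeness is measured with Euclidean distance in the table plane.
    """
    if not trajectories:
        raise ValueError("no trajectories to choose from")
    return min(trajectories,
               key=lambda t: math.dist(t.block_start_xy, observed_xy))
```

With a bank of rollouts and a measured block position, `select_trajectory(bank, measured_xy).joint_targets` would then be the command sequence to replay on the hardware.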

Task Suite

We test HANDFUL with HANDFUL-Bench, which includes five sequential grasp-conditioned dexterous manipulation tasks in which the robot must grasp and hold an object while completing a second subtask. We design the second subtasks to require different combinations of fingers and in-hand space, which motivates our exploration of resource-aware grasping styles.

On the left we show sample rollouts of our tasks in the real world. On the right we show our simulation environments and their randomization levels. Not pictured are the randomizations to the grasped block (red block) dimensions or object/robot properties like friction, damping, and mass.

Grasp + Push Object

Grasp the red block, then push the blue block to a target position and rotation (ghosted blue).

Grasp + Press Button

Grasp the red block, then press the button within the beige donut.

Grasp + Twist Knob

Grasp the red block, then twist the knob a target number of degrees.

Grasp + Pull Drawer

Grasp the red block, then pull the cabinet drawer a target distance.

Grasp + Pick Second

Grasp the red block, then grasp and pick up the green block.

Curriculum Overview


Each curriculum stage C0 → C1 → C2 progressively increases the second-task object's position randomization and domain randomizations (mass, friction, etc.). We intentionally keep early stages simple so that all nine candidate grasps can quickly learn the core second-subtask strategy. This lets us identify and eliminate poorly performing grasps early, before committing to expensive training under harder conditions. Only the top-performing grasps (and their corresponding second-subtask policies) survive to each subsequent stage. Note that the first grasped object (red block) has a fixed ±10 cm position randomization during the universal first grasping subtask, which is not depicted here.
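The stage-wise elimination described above can be sketched as a simple successive-pruning loop. This is an illustrative sketch only: `train_and_evaluate` is a hypothetical callback standing in for training a second-subtask policy on a given grasp at a given stage and returning its success rate, and the per-stage survivor counts are assumptions (only the final count of three is stated on this page).

```python
from typing import Callable, List, Sequence


def run_curriculum(grasps: Sequence[str],
                   stages: Sequence[str],
                   train_and_evaluate: Callable[[str, str], float],
                   keep_per_stage: Sequence[int]) -> List[str]:
    """Train every surviving grasp at each stage and keep only the best ones.

    keep_per_stage[i] is how many grasps survive stage i (assumed values).
    """
    if len(stages) != len(keep_per_stage):
        raise ValueError("need one survivor count per stage")
    survivors = list(grasps)
    for stage, keep in zip(stages, keep_per_stage):
        # Score each surviving grasp under this stage's randomization level.
        scores = {g: train_and_evaluate(g, stage) for g in survivors}
        # Rank by score, best first, and drop the rest before the next stage.
        ranked = sorted(survivors, key=lambda g: scores[g], reverse=True)
        survivors = ranked[:keep]
    return survivors
```

A call such as `run_curriculum(nine_grasps, ["C0", "C1", "C2"], train_and_evaluate, (6, 4, 3))` would spend the cheap C0 stage on all nine grasps and reserve the hardest stage for the few that remain.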

Grasp Types

Explore our resource-aware one- and two-finger grasp strategies. Three hand poses are visualized for each grasp strategy. Each pose comes from the same grasping policy, but with a different initial start and goal position for the red block.


Block Position Diversity

Each grasp strategy produces a different distribution of block positions within the hand. We collect 16,384 hand + block pose intermediate states for each grasp strategy to initialize our second-subtask policies. Each dot here represents the block's center position relative to the hand for one of these intermediate states.

With diverse distributions of block positions for different grasp strategies, each grasp type leaves different parts of the hand free for our diverse set of second subtasks. This is especially important for tasks like picking up a second block where in-hand space is an important constraint.
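One way to obtain the block's center position relative to the hand is to express the world-frame offset between the two in the hand's own coordinate frame. The sketch below assumes, hypothetically, that each intermediate state stores the block and hand positions in world coordinates together with a 3x3 rotation matrix whose columns are the hand's axes in the world frame; the page does not specify how these states are stored.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def block_in_hand_frame(block_pos_world: Sequence[float],
                        hand_pos_world: Sequence[float],
                        hand_rot_world: Sequence[Sequence[float]]) -> Vec3:
    """Express the block center in the hand frame: p_hand = R^T (p_block - p_hand).

    hand_rot_world is assumed to be a 3x3 rotation matrix (rows of 3 floats)
    whose columns are the hand frame's x, y, z axes written in world coordinates.
    """
    # World-frame offset from the hand origin to the block center.
    d = [b - h for b, h in zip(block_pos_world, hand_pos_world)]
    # Component c of R^T d is the dot product of column c of R with d.
    x, y, z = (sum(hand_rot_world[r][c] * d[r] for r in range(3))
               for c in range(3))
    return (x, y, z)
```

Applying this to every stored intermediate state of a grasp strategy would give one point per state, which is the kind of point cloud shown in the plots.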


Top Grasp Strategies per Task

Our curriculum selects the top three performing grasp strategies for each second subtask. Below we show the three strategies selected for each task in simulation from one of the curriculum seeds and point out some patterns between the strategies.

Because we train only one seed for each grasp strategy (compared to multiple seeds for our second-subtask policies), it is hard to tell how much of each grasp's performance on a particular second subtask comes from its stability in grasping the red block and how much from its suitability to the task. In particular, in this grasping seed, the "Index Finger Only" and "Ring Finger + Thumb" grasps perform well on almost all of our tasks. This may reflect their broad compatibility with the task suite, the high stability of the grasps (which is possibly seed-specific), or both. Running more seeds for each grasp strategy and testing them on the second subtasks would likely support firmer conclusions about which grasps are best for each second subtask.

Grasp + Push Object

Index + Middle Finger

Index Finger Only

Ring Finger + Thumb

Push is the easiest task of the five. Still, grasps that allow the index finger and thumb to better control the rotation of the push block (in blue) seem to do better here.

Grasp + Press Button

Index + Ring Finger

Ring Finger Only

Ring Finger + Thumb

Press requires precise finger insertion. Grasps that can keep the block to the side or back of the hand reduce collisions between the hand and button guard (in beige).

Grasp + Twist Knob

Index Finger Only

Middle Finger + Thumb

Ring Finger + Thumb

Twist demands sustained rotational contact and strong control over the block. Here, highly stable grasps that leave more of the non-thumb fingers free succeed more often.

Grasp + Pull Drawer

Index Finger Only

Index + Ring Finger

Ring Finger + Thumb

Cabinet pulling requires hooking into the drawer handle while maintaining the initial grasp. Similar to press, this favors strategies that keep the block further back in the hand. Notably, the "Index Finger Only" grasp here hooks the cabinet with the motor geometry on the index finger's knuckle, which makes for a successful strategy in simulation but one that is difficult to transfer to real.

Grasp + Pick Second

Index Finger Only

Middle Finger + Thumb

Ring Finger + Thumb

Pick Second is the most resource-intensive task, requiring grasps with substantial in-hand space and multiple free fingers to succeed.

Comparison with Demonstration-Based Methods

A natural question is whether demonstration-based methods, such as imitation learning and its variants, could replace our curriculum-based RL method. In practice, these approaches require both a nontrivial amount of human data collection and a teleoperation system capable of expressing the full range of hand configurations the hardware affords. Additionally, demonstrations often naturally reflect human grasping intuitions, which may inject behavioral bias that limits the efficacy of the resulting policy, especially in these more complex tasks where less intuitive strategies may be better.

As for the sim-to-real gap, because HANDFUL explores many candidate strategies in simulation, we can select among multiple viable options for real-world transfer. When a strategy proves difficult to transfer with our trajectory replay approach, an alternative is often available, partially compensating for the simplicity of our sim-to-real pipeline.

Demonstration-Based (DP3)
HANDFUL (Ours)

The teleoperated strategy primarily contacts the push object using the held block itself. HANDFUL instead discovers a strategy using finger contact, affording finer control over the pushed object.

Teleoperated demonstrations favor using the index finger to press the button, consistent with natural human motor strategy. HANDFUL transfers a thumb-press strategy, reflecting a different but viable use of the available degrees of freedom.

Both approaches decompose the task across two finger groups, but with notably different strategies.

Across all tasks, teleoperated demonstrations rely on finger-only grasps for the first object, while HANDFUL discovers finger + palm grasps as well. These grasps are difficult to teleoperate because of the precision required to avoid hand-table contact (with the palm so close to the table), and they are less likely to arise from human demonstration given the intuitive preference for fingertip-based grasping.