ComfyUI Workflow: Flux Dev with LoRA
This is a complex image-to-image workflow for Flux Dev designed to restyle input images (specifically portraits) into a "Ghibli" / 2D illustration style while strictly preserving the subject's identity and the image's structure.

Key features:
- Auto-captioning: Uses the Florence-2 model to automatically generate a detailed text description of the input image, which is then combined with a hardcoded "Ghibli Style" text prompt.
- Identity preservation: Employs PuLID (specifically pulid_flux_v0.9.1) to ensure the facial identity of the source image is transferred to the output.
- Structural control: Uses DepthAnything V2 to generate a depth map, which is fed into the sampler alongside a dedicated Depth LoRA (flux1-depth-dev-lora) to maintain the original 3D composition.
- Style & refinement: Incorporates Flux Redux (via ReduxAdvanced) to further reference the input image's style, and uses a Ghibli-style checkpoint (IllustrationJuanerGhibli_v20) for the final aesthetic.

Executive Summary: The "Quadruple Lock" Strategy

This workflow is a highly constrained image-to-image (img2img) pipeline. Unlike standard img2img, which relies heavily on denoising strength to preserve structure, this workflow uses four distinct "locking" mechanisms (PuLID, Redux, the Depth LoRA, and InstructPixToPix conditioning) to force the Flux model to adhere strictly to the original image's identity and 3D geometry, changing only the texture and lighting to match a Ghibli anime aesthetic.

1. The Control Stack (How It Preserves the Image)

This is the most complex part of the workflow. It uses a multi-layered approach to ensure the output looks exactly like the input subject, just drawn differently.

Layer 1: Identity Locking (PuLID)
- Node: ApplyPulidFlux (Node 32)
- Mechanism: Uses pulid_flux_v0.9.1 and EVA-CLIP to analyze the face in the input image.
- Settings: Applied at 0.90 strength.
This is extremely high, meaning the face in the output will be nearly identical to the input, preventing the "anime face" generalization that often happens with style models.

Layer 2: Structural Locking (Depth LoRA)
- Node: Power Lora Loader (Node 4)
- Mechanism: Loads flux1-depth-dev-lora at 0.8 strength. This tells the model, "Pay attention to the depth cues."

Layer 3: Compositional Locking (Flux Redux)
- Node: ReduxAdvanced (Node 25)
- Mechanism: Acts similarly to an IP-Adapter. It takes the sigclip_vision encoding of the input image and feeds it into the conditioning.
- Analysis: This transfers the general "vibe," color palette, and composition of the original photo into the generation.

Layer 4: Pixel Conditioning (InstructPixToPix)
- Node: InstructPixToPixConditioning (Node 8)
- Unique technique: Crucially, this node takes the depth map (generated by DepthAnythingV2Preprocessor) as its "pixels" input, not the original color image. This forces the diffusion model to treat the depth map as the fundamental "instruction" for the image structure.

2. The Prompting Engine (Automated Context)

Instead of asking you to describe the image, the workflow automates this step to ensure the AI "sees" what is actually there.
- Auto-captioning: The Florence2Run node (Node 28) scans the input image and generates a text description (e.g., "A man in a suit...").
- Prompt concatenation: It combines a hardcoded style trigger (CR Text node, Node 30: "Ghibli Style, ") with the dynamic Florence-2 caption.
- Result: The final prompt sent to Flux is: "Ghibli Style, [description of your image]".

3. The Model & Inference Stack

- Base model: It does not use the standard Flux Dev UNET. It loads a custom checkpoint, IllustrationJuanerGhibli_v20.safetensors, which provides the baseline "painted" look.
- Turbo acceleration: It loads the FLUX.1-Turbo-Alpha LoRA at strength 1.0. Benefit: this allows the KSampler to run at only 12 steps, significantly speeding up generation compared to the standard 20-30 steps required for Flux Dev.
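The prompt concatenation described in section 2 can be sketched in plain Python. This is an illustrative stand-in, not the workflow's actual node code: the caption string below substitutes for Florence2Run's output, and the function name is made up for the example.

```python
# Hedged sketch of the workflow's prompt assembly: a hardcoded style
# trigger (from the CR Text node) is prepended to the dynamic caption
# (from the Florence2Run node). Names here are illustrative.

STYLE_TRIGGER = "Ghibli Style, "

def build_prompt(caption: str, trigger: str = STYLE_TRIGGER) -> str:
    """Concatenate the style trigger with an auto-generated caption."""
    return trigger + caption.strip()

# Stand-in for a Florence-2 caption of the input image:
prompt = build_prompt("A man in a suit standing near a window.")
print(prompt)  # Ghibli Style, A man in a suit standing near a window.
```

Because the caption is regenerated per input image, the style prefix stays fixed while the descriptive half of the prompt always matches what is actually in the photo.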
Sampler settings:
- Sampler: dpmpp_2m (a standard choice for quality).
- Scheduler: beta (often preferred for Flux).
- Denoising: 1.0 (because the workflow relies on the control stack for structure, it generates "from scratch" rather than iterating on existing pixels).

4. Technical Requirements & Dependencies

To run this without errors, you need:
- VRAM: High requirement (likely 16 GB+), since Flux, CLIP, T5, Florence-2, PuLID, Redux, and InsightFace are loaded simultaneously.
- Model files: You must download the specific Ghibli checkpoint, the Redux model (flux1-redux-dev), the Depth LoRA, and the PuLID models.
- Custom nodes: ComfyUI-Florence2, PuLID, ComfyUI_Comfyroll_CustomNodes (for the text concatenation), and rgthree (for the Power Lora Loader).

Conclusion

This is a high-fidelity restyling workflow. It is ideal for users who want to turn themselves into anime characters without losing their likeness or having the AI randomly change their pose. Its complexity suggests it was built by someone who understands that Flux can be "stubborn" and needs strong guidance (depth + Redux) to adhere to a specific source layout.
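For reference, the sampler configuration described in section 3 can be written down as a plain Python dict in the shape of a ComfyUI API-format KSampler node's inputs. The steps, sampler, scheduler, and denoise values come from the workflow; the seed and cfg values are assumptions added for completeness (Flux Dev is typically run near CFG 1.0, with guidance handled separately).

```python
# Sketch of the KSampler inputs described above. "seed" and "cfg" are
# illustrative assumptions; steps / sampler_name / scheduler / denoise
# are the values stated in the workflow description.
ksampler_inputs = {
    "seed": 0,                   # illustrative; any fixed seed works
    "steps": 12,                 # low step count enabled by FLUX.1-Turbo-Alpha
    "cfg": 1.0,                  # assumption: Flux Dev usually runs near CFG 1.0
    "sampler_name": "dpmpp_2m",
    "scheduler": "beta",
    "denoise": 1.0,              # full denoise: structure comes from the control stack
}

print(ksampler_inputs["steps"])  # 12
```

Seen this way, the unusual part is the combination of denoise 1.0 with only 12 steps: the control stack, not the latent of the input photo, is what keeps the output anchored to the source image.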