{"id":835,"date":"2025-07-20T12:00:00","date_gmt":"2025-07-20T12:00:00","guid":{"rendered":"https:\/\/ouyangminwei.com\/?p=835"},"modified":"2025-07-18T09:11:47","modified_gmt":"2025-07-18T09:11:47","slug":"exploring-the-principles-of-multimodal-models","status":"publish","type":"post","link":"https:\/\/ouyangminwei.com\/index.php\/2025\/07\/20\/exploring-the-principles-of-multimodal-models\/","title":{"rendered":"An Interactive Exploration of the Principles of Multimodal Models"},"content":{"rendered":"\n<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n    <meta charset=\"UTF-8\">\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n    <title>An Interactive Exploration of the Principles of Multimodal Models<\/title>\n    <script src=\"https:\/\/cdn.tailwindcss.com\"><\/script>\n    <script src=\"https:\/\/cdn.jsdelivr.net\/npm\/chart.js\"><\/script>\n    <link rel=\"preconnect\" href=\"https:\/\/fonts.googleapis.com\">\n    <link rel=\"preconnect\" href=\"https:\/\/fonts.gstatic.com\" crossorigin>\n    <link href=\"https:\/\/fonts.googleapis.com\/css2?family=Noto+Sans+TC:wght@400;500;700&#038;display=swap\" rel=\"stylesheet\">\n    <!-- Chosen Palette: Warm Neutral Harmony -->\n    <!-- Application Structure Plan: The SPA is designed as a linear, three-step narrative (Encoding, Alignment, Generation) to guide the user through the complex process logically. This replaces the report's chapter structure with a task-oriented flow focused on understanding the core mechanisms. Navigation is anchored by a persistent three-button header, allowing users to jump between the key stages. The most critical section, Generation, is further divided into two switchable views (Image-to-Text and Text-to-Image) to clearly explain these distinct, asymmetric processes. 
This structure was chosen to simplify a dense technical topic into a digestible, interactive story, directly addressing the user's goal of understanding how these models produce outputs. -->\n    <!-- Visualization & Content Choices: 1. **Encoding:** Info -> Show image\/text breakdown. Goal: Organize. Method: HTML\/CSS diagram. Interaction: Hover for detail. Justification: Simple, clear visual representation of tokenization\/patching. 2. **Alignment:** Info -> Contrastive Loss & Modality Gap. Goal: Compare\/Relationships. Method: Interactive HTML grid (Loss) & Chart.js scatter plot (Gap). Interaction: Hover on grid, view distinct clusters on chart. Justification: Makes the abstract concepts of loss and embedding space tangible. Library: Chart.js. 3. **Generation (I2T):** Info -> Cross-Attention. Goal: Relationships. Method: Interactive HTML\/CSS diagram. Interaction: Click generated words to highlight corresponding image areas. Justification: Provides a direct, intuitive \"Aha!\" moment for how the model \"sees.\" 4. **Generation (T2I):** Info -> Diffusion\/Denoising. Goal: Change. Method: JS-controlled animation. Interaction: Automatic animation loop. Justification: Visualizes the abstract concept of iterative refinement from noise. CONFIRMS NO SVG\/Mermaid. -->\n    <!-- CONFIRMATION: NO SVG graphics used. NO Mermaid JS used. 
-->\n    <style>\n        body {\n            font-family: 'Noto Sans TC', sans-serif;\n            background-color: #FDFBF8;\n            color: #4A4A4A;\n        }\n        .chart-container {\n            position: relative;\n            width: 100%;\n            max-width: 500px;\n            margin-left: auto;\n            margin-right: auto;\n            height: 300px;\n            max-height: 350px;\n        }\n        @media (min-width: 768px) {\n            .chart-container {\n                height: 350px;\n            }\n        }\n        .nav-btn {\n            transition: all 0.3s ease;\n            border-bottom: 3px solid transparent;\n        }\n        .nav-btn.active {\n            border-bottom-color: #E57373;\n            color: #2A2A2A;\n            font-weight: 700;\n        }\n        .content-section {\n            transition: opacity 0.5s ease-in-out, transform 0.5s ease-in-out;\n            will-change: opacity, transform;\n        }\n        .content-section.hidden {\n            opacity: 0;\n            transform: translateY(20px);\n            position: absolute;\n            pointer-events: none;\n        }\n        .interactive-highlight {\n            transition: all 0.3s ease;\n            cursor: pointer;\n        }\n        .img-patch-highlight {\n            box-shadow: 0 0 15px 5px rgba(229, 115, 115, 0.8);\n            transform: scale(1.05);\n            border-color: #E57373;\n        }\n        .contrast-grid-cell {\n            transition: background-color 0.2s ease;\n        }\n    <\/style>\n<\/head>\n<body class=\"antialiased\">\n\n    <div class=\"container mx-auto px-4 py-8 max-w-5xl\">\n        <header class=\"text-center mb-12\">\n            <h1 class=\"text-4xl md:text-5xl font-bold text-gray-800 mb-3\">How Do Multimodal Models Work?<\/h1>\n            <p class=\"text-lg text-gray-600 max-w-3xl 
mx-auto\">An interactive guide to how models understand and generate text and images.<\/p>\n        <\/header>\n\n        <nav class=\"sticky top-0 bg-opacity-80 backdrop-blur-md bg-[#FDFBF8] z-10 mb-12 border-b border-gray-200\">\n            <div class=\"flex justify-center space-x-4 sm:space-x-8 text-lg\">\n                <button data-section=\"encoding\" class=\"nav-btn py-4 px-2 active\">Step 1: Translation (Encoding)<\/button>\n                <button data-section=\"alignment\" class=\"nav-btn py-4 px-2\">Step 2: Alignment<\/button>\n                <button data-section=\"generation\" class=\"nav-btn py-4 px-2\">Step 3: Generation<\/button>\n            <\/div>\n        <\/nav>\n\n        <main id=\"main-content\" class=\"relative\">\n            \n            <section id=\"encoding\" class=\"content-section space-y-12\">\n                <div class=\"text-center\">\n                    <h2 class=\"text-3xl font-bold text-gray-800\">Step 1: Translating into a Common Language<\/h2>\n                    <p class=\"mt-2 text-gray-600 max-w-2xl mx-auto\">The model's first task is to &#8220;translate&#8221; images and text, which have entirely different structures, into a common format that computers can work with: vectors. This process is called encoding.<\/p>\n                <\/div>\n\n                <div class=\"grid md:grid-cols-2 gap-8 items-start\">\n                    <div class=\"bg-white p-6 rounded-xl shadow-md border border-gray-100\">\n                        <h3 class=\"text-2xl font-semibold mb-4 text-center text-rose-500\">Image Encoding (Vision Transformer)<\/h3>\n                        <p class=\"text-gray-600 mb-4 
text-center\">The image is split into small patches. Each patch is converted into a vector, and positional information is added so the model knows where the patches sit relative to one another.<\/p>\n                        <div class=\"flex flex-col items-center\">\n                            <div class=\"w-48 h-48 bg-gray-200 rounded-lg grid grid-cols-4 grid-rows-4 gap-1 p-1 mb-4\">\n                                <div class=\"bg-blue-200 rounded-sm\"><\/div><div class=\"bg-blue-300 rounded-sm\"><\/div><div class=\"bg-blue-200 rounded-sm\"><\/div><div class=\"bg-blue-300 rounded-sm\"><\/div>\n                                <div class=\"bg-blue-300 rounded-sm\"><\/div><div class=\"bg-blue-400 rounded-sm\"><\/div><div class=\"bg-blue-300 rounded-sm\"><\/div><div class=\"bg-blue-400 rounded-sm\"><\/div>\n                                <div class=\"bg-green-300 rounded-sm\"><\/div><div class=\"bg-green-400 rounded-sm\"><\/div><div class=\"bg-green-300 rounded-sm\"><\/div><div class=\"bg-green-400 rounded-sm\"><\/div>\n                                <div class=\"bg-green-200 rounded-sm\"><\/div><div class=\"bg-green-300 rounded-sm\"><\/div><div class=\"bg-green-200 rounded-sm\"><\/div><div class=\"bg-green-300 rounded-sm\"><\/div>\n                            <\/div>\n                            <div class=\"text-3xl font-bold text-gray-400 my-2\">\u2193<\/div>\n                            <div class=\"flex space-x-2\">\n                                <div class=\"w-8 h-16 bg-rose-200 rounded\"><\/div>\n                                <div class=\"w-8 h-16 bg-rose-300 rounded\"><\/div>\n                                <div class=\"w-8 h-16 bg-rose-200 rounded\"><\/div>\n                                <div class=\"w-8 h-16 bg-rose-300 rounded\"><\/div>\n                                <p class=\"self-center text-xl font-bold text-gray-500\">&#8230;<\/p>\n         
                   <\/div>\n                            <p class=\"mt-2 text-sm text-gray-500\">Converted into a sequence of vectors<\/p>\n                        <\/div>\n                    <\/div>\n                    <div class=\"bg-white p-6 rounded-xl shadow-md border border-gray-100\">\n                        <h3 class=\"text-2xl font-semibold mb-4 text-center text-sky-500\">Text Encoding (Transformer)<\/h3>\n                        <p class=\"text-gray-600 mb-4 text-center\">Text is broken into tokens. Each token is likewise converted into a vector, and self-attention lets the model capture context.<\/p>\n                         <div class=\"flex flex-col items-center\">\n                            <div class=\"w-full h-48 bg-gray-100 rounded-lg flex items-center justify-center p-4 mb-4\">\n                                <p class=\"text-xl font-serif text-gray-700\">&#8220;A photo of a cat&#8221;<\/p>\n                            <\/div>\n                            <div class=\"text-3xl font-bold text-gray-400 my-2\">\u2193<\/div>\n                            <div class=\"flex space-x-2\">\n                                <div class=\"w-12 h-16 bg-sky-200 rounded flex items-center justify-center text-xs\">A photo<\/div>\n                                <div class=\"w-12 h-16 bg-sky-300 rounded flex items-center justify-center text-xs\">of a<\/div>\n                                <div class=\"w-12 h-16 bg-sky-200 rounded flex items-center justify-center text-xs\">cat<\/div>\n                            <\/div>\n                             <p class=\"mt-2 text-sm text-gray-500\">Converted into a sequence of vectors<\/p>\n                        <\/div>\n                    <\/div>\n                <\/div>\n                <div class=\"bg-amber-50 border border-amber-200 p-4 
rounded-lg text-center\">\n                    <p class=\"text-amber-800\">\ud83d\udca1 <b>A key trend:<\/b> Vision and language processing have both adopted the Transformer architecture. As a result, their &#8220;internal languages&#8221; have become similar, which paves the way for the next step, alignment.<\/p>\n                <\/div>\n            <\/section>\n\n            <section id=\"alignment\" class=\"content-section space-y-12 hidden\">\n                <div class=\"text-center\">\n                    <h2 class=\"text-3xl font-bold text-gray-800\">Step 2: Letting Concepts Meet in a Shared Space<\/h2>\n                    <p class=\"mt-2 text-gray-600 max-w-2xl mx-auto\">With a common vector language in place, the model must learn to &#8220;align&#8221; the two modalities. The goal is a shared &#8220;latent space&#8221; in which semantically similar image and text vectors lie close together.<\/p>\n                <\/div>\n                \n                <div class=\"bg-white p-6 rounded-xl shadow-md border border-gray-100\">\n                    <h3 class=\"text-2xl font-semibold mb-4 text-center\">Core Mechanism: Contrastive Learning (CLIP)<\/h3>\n                    <p class=\"text-gray-600 mb-6 text-center max-w-2xl 
mx-auto\">The model is trained on hundreds of millions of image-text pairs. The objective is simple: maximize the similarity of correct pairs (the diagonal of the grid below) while minimizing the similarity of incorrect pairs. Hover over the grid to explore.<\/p>\n                    <div id=\"contrast-grid-container\" class=\"mx-auto\" style=\"max-width: 400px;\"><\/div>\n                    <p id=\"grid-tooltip\" class=\"text-center mt-4 h-6 text-gray-600 font-medium\"><\/p>\n                <\/div>\n\n                <div class=\"bg-white p-6 rounded-xl shadow-md border border-gray-100\">\n                    <h3 class=\"text-2xl font-semibold mb-4 text-center\">An Interesting Phenomenon: The &#8220;Modality Gap&#8221;<\/h3>\n                    <p class=\"text-gray-600 mb-6 text-center max-w-2xl mx-auto\">Even after alignment, image vectors and text vectors still tend to form separate clusters in the latent space rather than mixing perfectly. This suggests that what gets aligned is their relative relationships, not their absolute positions.<\/p>\n                    <div class=\"chart-container\">\n                        <canvas id=\"modalityGapChart\"><\/canvas>\n                    <\/div>\n                <\/div>\n            <\/section>\n\n            <section id=\"generation\" class=\"content-section space-y-12 hidden\">\n                <div class=\"text-center\">\n                    <h2 class=\"text-3xl font-bold text-gray-800\">Step 3: Generating Output from Aligned Representations<\/h2>\n                    <p class=\"mt-2 text-gray-600 max-w-2xl 
mx-auto\">Once the model has learned how images and text relate, it can perform generation tasks. These follow two very different paths: generating text from an image, or generating an image from text.<\/p>\n                <\/div>\n\n                <div class=\"w-full bg-white p-6 rounded-xl shadow-md border border-gray-100\">\n                    <div class=\"flex justify-center mb-6 border-b\">\n                        <button id=\"show-i2t\" class=\"gen-toggle-btn py-2 px-4 text-lg font-medium border-b-2 border-rose-500 text-rose-600\">Image-to-Text<\/button>\n                        <button id=\"show-t2i\" class=\"gen-toggle-btn py-2 px-4 text-lg font-medium border-b-2 border-transparent text-gray-400\">Text-to-Image<\/button>\n                    <\/div>\n\n                    <div id=\"i2t-content\">\n                        <h3 class=\"text-2xl font-semibold mb-2 text-center\">Image-to-Text: Describing What It Sees<\/h3>\n                        <p class=\"text-gray-600 mb-6 text-center\">This is a &#8220;translation&#8221; task. As the language decoder generates each word, it uses cross-attention to &#8220;look&#8221; at different regions of the image and decide what the next word should be. <b>Click the generated text below to see where the model is &#8220;looking.&#8221;<\/b><\/p>\n                        \n                        <div class=\"flex flex-col md:flex-row gap-6 items-center justify-center\">\n                            <div id=\"i2t-image\" class=\"w-64 h-64 bg-gray-200 rounded-lg grid grid-cols-2 grid-rows-2 gap-1 p-1 relative transition-all duration-300\">\n                     
           <div class=\"bg-cover bg-center rounded-sm transition-all duration-300\" style=\"background-image: url('https:\/\/placehold.co\/128x128\/a5b4fc\/4338ca?text=Sky')\"><\/div>\n                                <div class=\"bg-cover bg-center rounded-sm transition-all duration-300\" style=\"background-image: url('https:\/\/placehold.co\/128x128\/fca5a5\/7f1d1d?text=Dog')\"><\/div>\n                                <div class=\"bg-cover bg-center rounded-sm transition-all duration-300\" style=\"background-image: url('https:\/\/placehold.co\/128x128\/a7f3d0\/064e3b?text=Grass')\"><\/div>\n                                <div class=\"bg-cover bg-center rounded-sm transition-all duration-300\" style=\"background-image: url('https:\/\/placehold.co\/128x128\/fde047\/854d0e?text=Sun')\"><\/div>\n                            <\/div>\n                             <div class=\"text-2xl font-bold text-gray-400\">\u2192<\/div>\n                            <div class=\"text-2xl md:text-3xl font-serif text-gray-700 p-4 bg-gray-50 rounded-lg\">\n                                <span class=\"interactive-highlight\" data-target=\"0\">A<\/span>\n                                <span class=\"interactive-highlight\" data-target=\"1\">happy dog<\/span>\n                                <span class=\"interactive-highlight\" data-target=\"2\">on the grass<\/span>\n                                <span class=\"interactive-highlight\" data-target=\"3\">basks in the sun<\/span>\n                                <span class=\"interactive-highlight\" data-target=\"-1\">.<\/span>\n                            <\/div>\n                        <\/div>\n                    <\/div>\n\n                    <div id=\"t2i-content\" class=\"hidden\">\n                        <h3 class=\"text-2xl font-semibold mb-2 text-center\">Text-to-Image: Creating from a Text Description<\/h3>\n                        <p class=\"text-gray-600 mb-6 
text-center\">This is a &#8220;constrained creation&#8221; task. The model starts from random noise and, guided by the text vectors, removes the noise step by step until an image matching the description emerges.<\/p>\n                        <div class=\"flex flex-col items-center\">\n                             <div class=\"w-full max-w-md h-16 bg-gray-100 rounded-lg flex items-center justify-center p-4 mb-4\">\n                                <p class=\"text-lg font-serif text-gray-700\">&#8220;A castle floating above the clouds&#8221;<\/p>\n                            <\/div>\n                            <div class=\"text-3xl font-bold text-gray-400 my-2\">\u2193<\/div>\n                            <div id=\"denoise-animation\" class=\"w-64 h-64 bg-gray-300 rounded-lg flex items-center justify-center transition-all duration-500 text-white font-bold text-2xl\" style=\"background-size: cover;\">\n                                Initializing&#8230;\n                            <\/div>\n                            <button id=\"restart-denoise\" class=\"mt-4 bg-rose-500 text-white py-2 px-4 rounded-lg hover:bg-rose-600 transition\">Replay<\/button>\n                        <\/div>\n                    <\/div>\n                <\/div>\n\n            <\/section>\n\n        <\/main>\n        \n        <footer class=\"mt-16 pt-8 border-t border-gray-200 text-center text-gray-500\">\n            <p class=\"font-bold text-2xl text-gray-700 mb-4\">Summary: Three Pillars<\/p>\n            <div class=\"grid md:grid-cols-3 gap-4 max-w-4xl mx-auto\">\n                <div class=\"bg-white p-4 rounded-lg shadow-sm border border-gray-100\"><b>1. 
Convergent encoders<\/b><br>A single Transformer architecture handles the different modalities.<\/div>\n                <div class=\"bg-white p-4 rounded-lg shadow-sm border border-gray-100\"><b>2. Scalable alignment<\/b><br>Contrastive learning aligns semantics across large-scale data.<\/div>\n                <div class=\"bg-white p-4 rounded-lg shadow-sm border border-gray-100\"><b>3. Specialized decoders<\/b><br>Different generation strategies are used depending on the target output (text or image).<\/div>\n            <\/div>\n            <p class=\"mt-8 text-sm\">This interactive application was generated from a supplied technical report and aims to simplify complex multimodal concepts.<\/p>\n        <\/footer>\n\n    <\/div>\n\n<script>\ndocument.addEventListener('DOMContentLoaded', () => {\n\n    const mainContent = document.getElementById('main-content');\n    const navButtons = document.querySelectorAll('.nav-btn');\n    const contentSections = document.querySelectorAll('.content-section');\n\n    function switchTab(targetSectionId) {\n        navButtons.forEach(btn => {\n            btn.classList.toggle('active', btn.dataset.section === targetSectionId);\n        });\n        \n        mainContent.style.minHeight = mainContent.offsetHeight + 'px';\n\n        contentSections.forEach(section => {\n            if (section.id === targetSectionId) {\n                section.classList.remove('hidden', 'absolute');\n            } else {\n                section.classList.add('hidden', 'absolute');\n            }\n        });\n\n        requestAnimationFrame(() => {\n           mainContent.style.minHeight = '0px';\n        });\n    }\n\n    navButtons.forEach(button => {\n        
button.addEventListener('click', () => {\n            switchTab(button.dataset.section);\n        });\n    });\n\n    const modalityGapChartCtx = document.getElementById('modalityGapChart').getContext('2d');\n    \/\/ Illustrative data only: two randomly generated clusters, not real embeddings.\n    const modalityGapChart = new Chart(modalityGapChartCtx, {\n        type: 'scatter',\n        data: {\n            datasets: [{\n                label: 'Image Embeddings',\n                data: Array.from({length: 50}, () => ({ x: Math.random() * 0.4 + 0.1, y: Math.random() * 0.8 + 0.1 })),\n                backgroundColor: 'rgba(229, 115, 115, 0.7)',\n                pointRadius: 6,\n                pointHoverRadius: 8\n            }, {\n                label: 'Text Embeddings',\n                data: Array.from({length: 50}, () => ({ x: Math.random() * 0.4 + 0.5, y: Math.random() * 0.8 + 0.1 })),\n                backgroundColor: 'rgba(56, 189, 248, 0.7)',\n                pointRadius: 6,\n                pointHoverRadius: 8\n            }]\n        },\n        options: {\n            responsive: true,\n            maintainAspectRatio: false,\n            plugins: {\n                legend: {\n                    position: 'top',\n                },\n                tooltip: {\n                    callbacks: {\n                        label: function(context) {\n                            return `${context.dataset.label}`;\n                        }\n                    }\n                },\n                title: {\n                    display: true,\n                    text: 'The Modality Gap in Latent Space'\n                }\n            },\n            scales: {\n                x: {\n                    display: false\n                },\n                y: {\n                    display: false\n                }\n            }\n        }\n    });\n\n    const i2tImage = document.getElementById('i2t-image');\n    const i2tSpans = 
document.querySelectorAll('#i2t-content .interactive-highlight');\n    const imagePatches = i2tImage.children;\n\n    i2tSpans.forEach(span => {\n        span.addEventListener('click', () => {\n            const targetIndex = parseInt(span.dataset.target, 10);\n            \n            for (let i = 0; i < imagePatches.length; i++) {\n                imagePatches[i].classList.toggle('img-patch-highlight', i === targetIndex);\n            }\n\n            i2tSpans.forEach(s => s.classList.remove('text-rose-600', 'font-bold'));\n            span.classList.add('text-rose-600', 'font-bold');\n\n            if(targetIndex === -1){\n                 for (let i = 0; i < imagePatches.length; i++) {\n                    imagePatches[i].classList.remove('img-patch-highlight');\n                }\n            }\n        });\n    });\n\n    const genToggleButtons = document.querySelectorAll('.gen-toggle-btn');\n    const i2tContent = document.getElementById('i2t-content');\n    const t2iContent = document.getElementById('t2i-content');\n\n    document.getElementById('show-i2t').addEventListener('click', () => {\n        i2tContent.classList.remove('hidden');\n        t2iContent.classList.add('hidden');\n        document.getElementById('show-i2t').classList.add('border-rose-500', 'text-rose-600');\n        document.getElementById('show-i2t').classList.remove('text-gray-400', 'border-transparent');\n        document.getElementById('show-t2i').classList.add('text-gray-400', 'border-transparent');\n        document.getElementById('show-t2i').classList.remove('border-sky-500', 'text-sky-600');\n    });\n\n    document.getElementById('show-t2i').addEventListener('click', () => {\n        t2iContent.classList.remove('hidden');\n        i2tContent.classList.add('hidden');\n        document.getElementById('show-t2i').classList.add('border-sky-500', 'text-sky-600');\n        document.getElementById('show-t2i').classList.remove('text-gray-400', 'border-transparent');\n        
document.getElementById('show-i2t').classList.add('text-gray-400', 'border-transparent');\n        document.getElementById('show-i2t').classList.remove('border-rose-500', 'text-rose-600');\n        startDenoiseAnimation();\n    });\n    \n    const denoiseContainer = document.getElementById('denoise-animation');\n    let denoiseInterval;\n\n    const denoiseStages = [\n        { blur: '16px', text: 'Step 1: 90% noise' },\n        { blur: '12px', text: 'Step 2: 70% noise' },\n        { blur: '8px', text: 'Step 3: 50% noise' },\n        { blur: '4px', text: 'Step 4: 30% noise' },\n        { blur: '2px', text: 'Step 5: 10% noise' },\n        { blur: '0px', text: 'Done!' }\n    ];\n\n    function startDenoiseAnimation() {\n        clearInterval(denoiseInterval);\n        let currentStage = 0;\n        denoiseContainer.style.backgroundImage = `url('https:\/\/placehold.co\/256x256\/3b82f6\/ffffff?text=Castle+in+Clouds')`;\n\n        denoiseInterval = setInterval(() => {\n            if (currentStage < denoiseStages.length) {\n                const stage = denoiseStages[currentStage];\n                denoiseContainer.style.filter = `blur(${stage.blur})`;\n                denoiseContainer.textContent = stage.text;\n                currentStage++;\n            } else {\n                clearInterval(denoiseInterval);\n            }\n        }, 800);\n    }\n    \n    document.getElementById('restart-denoise').addEventListener('click', startDenoiseAnimation);\n\n    const gridContainer = document.getElementById('contrast-grid-container');\n    const gridTooltip = document.getElementById('grid-tooltip');\n    const gridSize = 8;\n    const images = ['\ud83d\udc36', '\ud83d\udc31', '\ud83d\ude97', '\ud83c\udf33', '\ud83c\udfe0', '\ud83c\udf4e', '\u2b50', '\ud83c\udf0a'];\n    const texts = ['dog', 'cat', 'car', 'tree', 'house', 'apple', 'star', 'wave'];\n 
   \n    let gridHtml = '<div class=\"grid gap-1\" style=\"grid-template-columns: repeat(' + (gridSize + 1) + ', minmax(0, 1fr));\">';\n    gridHtml += '<div><\/div>'; \n    for(let i=0; i<gridSize; i++) {\n        gridHtml += `<div class=\"text-center font-bold text-sm\">${texts[i]}<\/div>`;\n    }\n\n    for (let i = 0; i < gridSize; i++) {\n        gridHtml += `<div class=\"text-center font-bold text-2xl\">${images[i]}<\/div>`;\n        for (let j = 0; j < gridSize; j++) {\n            const isPositive = i === j;\n            gridHtml += `<div class=\"contrast-grid-cell w-full aspect-square rounded ${isPositive ? 'bg-emerald-200' : 'bg-red-100'}\" data-positive=\"${isPositive}\" data-text=\"${texts[j]}\" data-image=\"${images[i]}\"><\/div>`;\n        }\n    }\n    gridHtml += '<\/div>';\n    gridContainer.innerHTML = gridHtml;\n\n    gridContainer.querySelectorAll('.contrast-grid-cell').forEach(cell => {\n        cell.addEventListener('mouseover', (e) => {\n            const isPositive = e.target.dataset.positive === 'true';\n            const text = e.target.dataset.text;\n            const image = e.target.dataset.image;\n            e.target.style.backgroundColor = isPositive ? '#34d399' : '#f87171';\n            if (isPositive) {\n                gridTooltip.textContent = `\u2705 Correct pair: image ${image} and text \"${text}\". The model pulls them together.`;\n                gridTooltip.style.color = '#059669';\n            } else {\n                gridTooltip.textContent = `\u274c Incorrect pair: image ${image} and text \"${text}\". The model pushes them apart.`;\n                gridTooltip.style.color = '#ef4444';\n            }\n        });\n        cell.addEventListener('mouseout', (e) => {\n             const isPositive = e.target.dataset.positive === 'true';\n             e.target.style.backgroundColor = isPositive ? 
'#a7f3d0' : '#fee2e2';\n             gridTooltip.textContent = '';\n        });\n    });\n\n});\n<\/script>\n\n<\/body>\n<\/html>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>An Interactive Exploration of the Principles of Multimodal Models How Do Multimodal Models Work? An interactive guide to how models understand and generate text and &hellip; <a href=\"https:\/\/ouyangminwei.com\/index.php\/2025\/07\/20\/exploring-the-principles-of-multimodal-models\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[1],"tags":[],"post_format":[],"class_list":["post-835","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_edit_lock":"1752829907:1","_edit_last":"1","_aioseo_title":"#post_title #separator_sa 
#site_title","_aioseo_description":"#post_excerpt","_aioseo_keywords":"","_aioseo_og_title":"","_aioseo_og_description":"","_aioseo_og_article_section":"","_aioseo_og_article_tags":"","_aioseo_twitter_title":"","_aioseo_twitter_description":"","_oembed_2544c1d0cb3503ab4c4d558c3b3c8873":"","_oembed_time_2544c1d0cb3503ab4c4d558c3b3c8873":"","_oembed_99481806ecbe6ce4ee46f8588d320993":"","_oembed_db663acf973e82e6d9d80df71945dfb8":"","_oembed_16cdfab488f57db73586f4286af2704f":"","_wp_old_slug":"","_links":{"self":[{"href":"https:\/\/ouyangminwei.com\/index.php\/wp-json\/wp\/v2\/posts\/835","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ouyangminwei.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ouyangminwei.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ouyangminwei.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ouyangminwei.com\/index.php\/wp-json\/wp\/v2\/comments?post=835"}],"version-history":[{"count":2,"href":"https:\/\/ouyangminwei.com\/index.php\/wp-json\/wp\/v2\/posts\/835\/revisions"}],"predecessor-version":[{"id":838,"href":"https:\/\/ouyangminwei.com\/index.php\/wp-json\/wp\/v2\/posts\/835\/revisions\/838"}],"wp:attachment":[{"href":"https:\/\/ouyangminwei.com\/index.php\/wp-json\/wp\/v2\/media?parent=835"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ouyangminwei.com\/index.php\/wp-json\/wp\/v2\/categories?post=835"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ouyangminwei.com\/index.php\/wp-json\/wp\/v2\/tags?post=835"},{"taxonomy":"post_format","embeddable":true,"href":"https:\/\/ouyangminwei.com\/index.php\/wp-json\/wp\/v2\/post_format?post=835"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}