Job Description
To apply for this job, you need to complete both steps below:
STEP 1:
Please click the link to submit your application directly to the company:
Your application will only be received by the recruiter if it is submitted via the link above.
STEP 2:
Kindly scroll to the bottom of this page and complete the short VinUni Tracking Form.
Filling out this form alone does not count as applying. Please note that this form is not part of the company’s application process. It only helps the Careers, Alumni, Industry and Development (CAID) Department discover more opportunities and follow up in case of system issues.
Job Summary:
- The Data Architect / Solution Architect (Data Platform) will own the technical architecture of our next-generation Data Management and Processing Platform. This role is responsible for designing scalable, highly secure, and cost-effective cloud/hybrid storage systems, along with automated data pipelines capable of handling petabytes (PB) of unstructured multimedia data (e.g., large-scale egocentric video streams, audio, and sensor logs for humanoid robot training).
- The ideal candidate will bridge the gap between traditional enterprise big data infrastructure and advanced AI engineering. You will architect high-throughput pipelines that leverage Computer Vision, Vision-Language Models (VLMs), and Vision-Language-Action models (VLAs) for automated data pre-processing, semantic scene cutting, and pre-labeling. At the same time, you will ensure that the underlying data management infrastructure and data governance layers scale in parallel with this heavy compute load, while maintaining rigorous anti-copying security controls.
KEY RESPONSIBILITIES:
- End-to-End Architecture Design: Design and implement the core data platform architecture, including data ingestion, stream/batch processing, petabyte-scale storage, and seamless data delivery layers.
- AI-Driven Pipeline Integration (Vision/VLM/VLA): Architect and build high-throughput ETL/ELT pipelines that integrate state-of-the-art AI models (Computer Vision, VLMs, VLAs) to automate data pre-processing. This includes automated video curation, filtering out low-quality/redundant frames, semantic scene indexing, and automated pre-labeling of "Action Atoms" before human QA/QC verification.
- Parallel Infrastructure Scaling: Design a framework in which data processing compute (GPU/CPU clusters for AI model inference) and data management infrastructure (storage and cataloging) scale seamlessly in parallel, so that no layer becomes a bottleneck as data volume grows toward the petabyte scale.
- Storage Optimization & Cost Management: Optimize cloud storage topologies (primarily AWS S3 and hybrid cloud solutions) to achieve ultra-low storage costs for environments of 2,000 TB (2 PB) and beyond, using smart lifecycle policies, storage tiering (e.g., S3 Glacier), and efficient indexing.
- Secure Data Governance & Anti-Copying Solutions: Implement robust Data Governance, Identity and Access Management (IAM), and Confidential Computing frameworks. Design mechanisms (e.g., secure streaming, DRM, or sandboxed environments) that allow cross-functional teams (PO, BA, QA/QC, AI Engineers) to process and validate data without being able to download it or make unauthorized local copies.
- Cross-functional Alignment: Collaborate closely with the Head of Data Acquisition, POs, BAs, and AI Research teams to translate data collection business requirements and client specs into solid, future-proof technical blueprints.
JOB REQUIREMENTS:
Relevant education and experience:
- Education: Bachelor’s or Master’s degree in Computer Science, Data Science, Artificial Intelligence, Software Engineering, or a related technical field. Professional certifications (e.g., AWS Certified Data Engineer, AWS Certified Solutions Architect (Professional), or Google Cloud Professional Cloud Architect / Professional Data Engineer) are highly preferred.
- Core Technical Experience:
  - At least 5 years of experience in Data Architecture, System Architecture, or Senior Data Engineering, with a proven track record of building large-scale data platforms.
  - Strong experience with big data processing frameworks and ecosystems (such as Apache Spark/PySpark and Hadoop) and with Python programming.
  - Hands-on expertise with cloud infrastructure (AWS preferred: S3, EC2, IAM, Lambda, Athena, EMR).
- AI & Multimedia Pipeline Orchestration: Proven experience orchestrating AI inference within distributed data pipelines (deploying Vision models, VLMs, or LLMs at scale using frameworks such as Ray, Triton Inference Server, or Kubernetes) to process high-volume unstructured video/audio data.
- Enterprise Security Background: Direct experience setting up enterprise-grade data security, access control, and data loss prevention (DLP) frameworks.
Preferred Qualifications:
- Domain Experience: Experience building data platforms specifically for Autonomous Vehicles, Robotics, Computer Vision AI, or Large Multi-Modal Model (LMM) R&D startups is a significant advantage.
- Composed & Analytical Mindset: Exceptionally composed and methodical under pressure. Able to systematically diagnose distributed computing bottlenecks, model inference latency, system failures, or security vulnerabilities and deliver structural architectural fixes.
- Forward-Thinking & Scalability-Obsessed: Always designs systems with tomorrow's scale in mind, focusing on automation, infrastructure-as-code, and eliminating single points of failure through parallel scaling.
- Collaborative Communicator: Capable of breaking down highly complex architectural and AI concepts into clear, understandable business terms for non-technical stakeholders (Ops, Clients, Managers).
Personality/Attitude:
- Exceptionally Composed under Complexity & Pressure: Maintains calm, mental clarity, and a methodical approach when dealing with system failures, high-stakes security threats, model inference latencies, or tight deployment deadlines.
- Parallel & Scalability-Obsessed Mindset: A forward-thinking engineer who naturally rejects short-term, manual patches. Driven by a passion for automation, infrastructure-as-code, and designing parallel systems that eliminate single points of failure.
- Analytical & Deeply Structural Troubleshooter: Takes an uncompromising root-cause-analysis approach to problem-solving. Able to systematically isolate bottlenecks across distributed computing clusters, cloud storage layers, or complex AI models.
- Collaborative & Articulate Communicator: Exceptional ability to translate dense architectural blueprints, data governance risks, and VLM/VLA capabilities into clear, actionable business insights for non-technical stakeholders (Ops teams, Product Owners, and clients).
- Security-First Integrity: Demonstrates an unyielding commitment to data privacy, intellectual property protection, and secure infrastructure design, ensuring anti-copying frameworks are never compromised for convenience.
BENEFITS:
- Highly competitive executive salary package tailored to Senior/Lead Architect roles.
- Deep technical ownership of a cutting-edge, petabyte-scale AI and robotics data infrastructure, built from the ground up.
- Premium private health insurance, top-tier hardware provisions (including access to high-performance compute resources), and flexible work opportunities.
- Direct exposure to the most advanced AI and Humanoid Robotics training data paradigms in the region.

