AAIA Study Guide: ISACA Advanced in AI Audit™
Test yourself

Practice Questions

Scenario-style questions with full explanations, just like the real exam.

There's exactly one best answer per question: pick what an auditor should do FIRST or BEST.

Domain 1 · Governance & Risk

An auditor reviewing a newly deployed credit-scoring model finds that the data science team built and validated it, but no business or risk owner is formally accountable for its decisions. What should the auditor do FIRST?

  • A. Recommend the data science team be made accountable, since they built the model.
  • B. Raise a finding that accountability for the model has not been assigned, and recommend a named model owner in the business be established.
  • C. Disable the model until a fairness assessment is completed.
  • D. Conclude the model is acceptable because it was independently validated.
Answer: B. A core governance control is clear accountability: a named business/risk owner who answers for the model's outcomes. The gap is the root governance risk, so document it and recommend assigning ownership. A is wrong: developers shouldn't own the business decision (and that would weaken segregation of duties). C is a management action the auditor doesn't take, and it overreacts to a governance gap. D ignores the missing accountability entirely.
Domain 1 · Governance & Risk

An organization is building an AI system that assigns risk scores used to screen job applicants. Under the EU AI Act, how should this system most likely be classified, and what does that imply?

  • A. Minimal risk: no specific obligations apply.
  • B. Limited risk: only transparency notices are required.
  • C. Prohibited: employment-related AI is banned outright.
  • D. High risk: it triggers obligations such as risk management, data governance, human oversight, and conformity assessment.
Answer: D. The EU AI Act lists AI used in employment and worker management (including recruitment and candidate evaluation) as high-risk, which brings obligations like risk management, high-quality data governance, logging, transparency, human oversight, and conformity assessment. A and B understate the tier. C is wrong: most employment AI is high-risk, not prohibited; the prohibited tier covers things like social scoring and certain biometric categorization, not recruitment scoring generally.
Domain 1 · Governance & Risk

Management asks internal audit to help define which AI use cases the company should pursue and to design the model approval workflow. The CAE is concerned. What is the BEST response?

  • A. Decline to design or own the workflow, as that would impair independence; offer to advise and later audit the controls management establishes.
  • B. Accept fully: audit's involvement guarantees the controls will be strong.
  • C. Accept, but have a different auditor sign the final report.
  • D. Refuse any involvement with AI governance whatsoever.
Answer: A. Internal audit can advise, but designing and owning a control process is a management responsibility; taking it on creates a self-review threat to independence when audit later evaluates it. B and C don't fix the impairment (rotating signatures doesn't cure ownership of the design). D over-corrects: advisory input is appropriate, only ownership is the problem.
Domain 1 · Governance & Risk

A company wants a single framework to set up an auditable, certifiable management system for governing AI across its lifecycle. Which is the most appropriate primary reference?

  • A. ISO/IEC 42001, the AI management system standard.
  • B. The EU AI Act.
  • C. The OWASP Top 10 for LLMs.
  • D. GDPR.
Answer: A. ISO/IEC 42001 is a certifiable management-system standard (like ISO 27001 for security) specifically for governing AI, which is exactly what's asked. B is law, not a management system you certify against. C is a security-focused vulnerability list for LLM applications, not a governance management system. D governs personal data processing, not AI management broadly.
Domain 1 · Governance & Risk

An auditor maps the organization's AI program to the NIST AI RMF and finds strong activity in Map, Measure, and Manage, but little in the Govern function. What is the most significant implication?

  • A. None: Govern is optional once the other three functions are in place.
  • B. Govern is the cross-cutting function that establishes culture, accountability, and policies; its weakness undermines the consistency and oversight of the other three.
  • C. It only matters for generative AI systems.
  • D. The organization should drop NIST and adopt a different framework.
Answer: B. In the NIST AI RMF, Govern is the cross-cutting function that creates the policies, roles, accountability, and culture the other functions depend on. Weak governance means Map/Measure/Manage are happening without consistent oversight or risk tolerance. A and C are wrong: Govern applies to all AI and is foundational. D is an overreaction: fix the gap, don't change frameworks.
Domain 1 · Governance & Risk

A marketing team plans to deploy an AI tool that profiles customers using sensitive personal data to predict health-related interests. What governance step is MOST important before deployment?

  • A. A Data Protection Impact Assessment (DPIA) to evaluate privacy risk and necessity.
  • B. A penetration test of the hosting environment.
  • C. A marketing A/B test to confirm the tool increases conversions.
  • D. A press release describing the new capability.
Answer: A. High-risk processing of sensitive personal data for profiling is a textbook trigger for a DPIA, which assesses necessity, proportionality, and risks to data subjects before processing begins. B is a security control but doesn't address the privacy/necessity question. C measures business value, not risk or lawfulness. D is irrelevant to governance.
Domain 1 · Governance & Risk

During a governance review, an auditor finds the AI policy prohibits "unacceptable bias" but defines no metrics, thresholds, or owner for measuring fairness. How should the auditor characterize this?

  • A. Acceptable: having any policy statement is sufficient.
  • B. A minor wording issue with no control impact.
  • C. A control design weakness: the policy is not operable because it lacks measurable criteria and accountability.
  • D. Out of scope, because fairness is a Domain 2 operational concern.
Answer: C. A policy that can't be measured or enforced is a design weakness: without metrics, thresholds, and an owner, "unacceptable bias" can't be tested or governed. A and B understate the impact: an unmeasurable policy is effectively no control. D is wrong: policy and fairness criteria are governance (Domain 1), even if the testing happens operationally.
Domain 1 · Governance & Risk

A bank's AI risk register lists model-performance risks but omits third-party and supply-chain risk for the foundation model it licenses from a vendor. What is the auditor's BEST recommendation?

  • A. No action: vendor risk is the vendor's responsibility, not the bank's.
  • B. Replace the vendor model with an in-house model immediately.
  • C. Remove model-performance risks since the vendor handles performance.
  • D. Expand the risk register and due-diligence process to cover third-party/foundation-model risks, since the bank retains accountability for outcomes.
Answer: D. Outsourcing the model does not outsource accountability: the bank still owns the outcomes and must assess vendor/supply-chain risk (data provenance, updates, security, exit). A wrongly transfers accountability. B is a drastic management decision, not an audit recommendation, and ignores risk-based analysis. C removes a legitimate risk and worsens coverage.
Domain 1 · Governance & Risk

A generative AI chatbot is deployed to answer customer questions, but users are not told they are interacting with an AI system. Which concern is MOST directly raised?

  • A. Model drift.
  • B. Transparency / disclosure obligations toward affected individuals.
  • C. Insufficient compute capacity.
  • D. Lack of a rollback plan.
Answer: B. Letting people interact with AI without disclosure is a transparency failure; responsible-AI frameworks and the EU AI Act's limited-risk tier require informing users they're dealing with an AI. A (drift) and D (rollback) are operational matters not raised by the disclosure gap. C is a capacity issue unrelated to the ethical/transparency concern described.
Domain 1 · Governance & Risk

An organization claims its AI governance is mature because it has an AI ethics committee. The auditor wants evidence the committee is effective. Which evidence is MOST persuasive?

  • A. The committee's charter and member list.
  • B. A slide deck announcing the committee's creation.
  • C. Minutes showing the committee reviewed specific use cases and made documented decisions that changed outcomes.
  • D. An email from the CEO endorsing the committee.
Answer: C. Operating effectiveness is shown by evidence the control actually functioned: decisions made, use cases reviewed, outcomes influenced. A charter (A) shows design, not operation; an announcement deck (B) and a CEO email (D) show intent or tone, not that the committee did its job. Auditors weight evidence of actual operation most heavily.
Domain 2 · AI Operations

A fraud-detection model that performed well at launch is now flagging far fewer transactions, and fraud losses are rising. Production input data patterns have shifted since training. What is the MOST likely cause?

  • A. Overfitting during training.
  • B. Model/data drift: the live data distribution has diverged from the training data.
  • C. A prompt-injection attack.
  • D. Insufficient training-time hyperparameter tuning.
Answer: B. Performance degrading over time as real-world data shifts away from the training distribution is the definition of drift (data/concept drift), and it's why ongoing monitoring exists. A (overfitting) would have shown at launch, not emerged later. C (prompt injection) applies to LLM/prompt-driven systems, not a tabular fraud classifier reacting to changed data. D is a training issue that wouldn't explain a model that started strong then declined.
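One common way to put a number on the drift described in this scenario is the Population Stability Index (PSI), which compares how a feature is spread across bins in the training data versus production. The sketch below is illustrative only: the bin edges, the sample amounts, and the 0.25 alert level (a widely quoted rule of thumb, not something this guide prescribes) are all assumptions.

```python
import math
from bisect import bisect_right

def bin_shares(values, edges):
    """Share of values falling in each bin defined by sorted edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Clamp out-of-range values into the first or last bin.
        i = min(max(bisect_right(edges, v) - 1, 0), len(counts) - 1)
        counts[i] += 1
    # A small floor avoids log(0) when a bin is empty.
    return [max(c / len(values), 1e-6) for c in counts]

def psi(expected, actual, edges):
    """Population Stability Index between a baseline and a live sample."""
    e, a = bin_shares(expected, edges), bin_shares(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical transaction amounts: training baseline vs. recent production.
EDGES = [0, 25, 50, 75, 100]
train_amounts = [10, 20, 30, 40, 50, 60, 70, 80]
live_amounts = [60, 70, 80, 90, 80, 90, 95, 85]

drift_score = psi(train_amounts, live_amounts, EDGES)
needs_review = drift_score > 0.25  # assumed alert threshold
```

An auditor would typically look for evidence that a check of this kind runs on a schedule and that breaches are escalated, rather than at the specific statistic chosen.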
Domain 2 · AI Operations

An auditor wants to understand a deployed model's intended use, training data, performance across subgroups, and known limitations in one document. Which artifact should they request?

  • A. The data dictionary.
  • B. The model card.
  • C. The network architecture diagram.
  • D. The incident response runbook.
Answer: B. A model card is the standard document summarizing intended use, training data, evaluation metrics (including subgroup performance), limitations, and ethical considerations, which is exactly what's asked. A documents fields/data, not model behavior. C shows infrastructure, not model purpose/performance. D is for handling incidents, not describing the model.
Domain 2 · AI Operations

An LLM-based assistant retrieves snippets from a customer-facing knowledge base and includes them in its prompt. An attacker plants text in a public document that instructs the model to ignore its rules and reveal internal data. What threat is this?

  • A. Data poisoning of the training set.
  • B. Model inversion.
  • C. Concept drift.
  • D. (Indirect) prompt injection.
Answer: D. Malicious instructions smuggled into content the model ingests at inference time is prompt injection (indirect, via retrieved content). A (data poisoning) corrupts the training data, not runtime input. B (model inversion) reconstructs training data from outputs. C (drift) is a performance-degradation phenomenon, not an attack. Controls: input/output filtering, separating instructions from data, least-privilege on what the model can access.
Domain 2 · AI Operations

A medical-triage classifier reports 98% accuracy, and the team calls it excellent. The condition it screens for occurs in about 2% of patients. What is the auditor's BEST concern?

  • A. 98% accuracy proves the model is safe to deploy.
  • B. Accuracy should have been reported as an F1 of 98%.
  • C. With a rare positive class, accuracy is misleading; recall/precision on the positive class matter far more.
  • D. The model needs a larger learning rate.
Answer: C. On an imbalanced problem, a model that always predicts "negative" scores 98% accuracy while catching zero cases, so accuracy is the wrong headline. Recall (catching true positives) and precision matter, often summarized by F1 or AUC. A trusts the misleading metric. B confuses metrics: F1 and accuracy aren't interchangeable. D is an unrelated training tweak.
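The arithmetic behind this answer can be checked directly. The sketch below assumes a hypothetical cohort of 100 patients with a 2% positive rate and a "model" that predicts negative for everyone; the cohort size and variable names are illustrative.

```python
# Hypothetical screening cohort: 2 of 100 patients have the condition.
y_true = [1] * 2 + [0] * 98
# A useless "model" that predicts negative for every patient.
y_pred = [0] * 100

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """Fraction of true positives the model actually catches."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

acc = accuracy(y_true, y_pred)  # 0.98
rec = recall(y_true, y_pred)    # 0.0
```

The headline metric looks excellent while every sick patient is missed, which is why the positive-class metrics are the ones to request.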
Domain 2 · AI Operations

A data scientist pushed an updated model straight to production over the weekend without an approval or a record of what changed, because "it scored better offline." What control weakness is MOST evident?

  • A. Inadequate change management for models: missing approval, versioning, and documentation.
  • B. Insufficient training data.
  • C. Lack of model explainability.
  • D. Excessive human oversight.
Answer: A. Deploying without approval, version control, or a change record is a change-management failure: there's no segregation of duties, no traceability, and no ability to roll back safely. B and C describe different issues not evidenced here. D is backwards: the problem is too little oversight and process, not too much.
Domain 2 · AI Operations

A model is trained on data scraped from a source an attacker can edit. The attacker inserts mislabeled examples so the model learns a hidden, harmful behavior. What is this attack called?

  • A. Prompt injection.
  • B. Membership inference.
  • C. Data poisoning.
  • D. Denial of service.
Answer: C. Corrupting the training data so the model learns malicious or degraded behavior is data poisoning. A acts at inference time on prompts, not training. B (membership inference) tries to determine whether a record was in the training set. D is an availability attack. Controls: data provenance/integrity checks, curation, anomaly detection on training data.
Domain 2 · AI Operations

An auditor reviews the monitoring setup for a deployed recommendation model and finds dashboards for infrastructure uptime and latency only. What is the MOST important gap?

  • A. Nothing: uptime and latency fully cover model health.
  • B. There is no monitoring of model quality: prediction accuracy, drift, and data-quality signals.
  • C. The dashboards refresh too frequently.
  • D. The model lacks a public model card.
Answer: B. Infrastructure metrics show the service is up, not that the model is still correct. Effective AI monitoring tracks prediction quality, input/output drift, and data-quality issues so degradation is caught early. A is false: a fast, low-latency model can still be quietly wrong. C is trivial. D is a documentation gap, not the monitoring gap described.
Domain 2 · AI Operations

A production model starts producing clearly harmful outputs to customers. The team has no defined procedure for who decides to take it offline or how to communicate. What should the auditor recommend FIRST?

  • A. Establish an AI incident-response process with defined roles, escalation, containment (including a kill switch), and communication.
  • B. Retrain the model on more data.
  • C. Add more GPUs to improve performance.
  • D. Publish a model card.
Answer: A. The gap is the absence of an incident-response capability: defined roles, escalation, containment/rollback, and communication so harm can be stopped quickly. B may fix this instance but not the missing process. C is irrelevant to harmful outputs. D is useful documentation but doesn't address responding to live incidents.
Domain 2 · AI Operations

An auditor examines an image classifier's training data and finds it was labeled by a single annotator with no review, and labeling guidelines were never written down. What risk is MOST directly created?

  • A. Label quality risk: inconsistent or biased labels propagate into model behavior with no way to verify correctness.
  • B. Network latency risk.
  • C. Overspending on compute.
  • D. Excessive model explainability.
Answer: A. A model is only as good as its labels: a single unreviewed annotator with no documented guidelines means inconsistent, potentially biased ground truth that the model will learn, and it can't be independently verified. B and C are infrastructure/cost concerns not raised here. D isn't a risk and isn't evidenced.
Domain 2 · AI Operations

A team reports their model achieves 99% accuracy on the test set, but performance in production is much worse. The test set was created by sampling from the same cleaned file used for training. What is the MOST likely problem?

  • A. The production hardware is slower.
  • B. Data leakage / non-representative test data: the test set overlaps with or is too similar to training data, inflating offline scores.
  • C. The model is underfitting.
  • D. The model is too explainable.
Answer: B. Drawing the test set from the same cleaned file risks leakage and a non-representative evaluation, so offline metrics overstate real-world performance, which is exactly the gap seen in production. A wouldn't cause an accuracy gap. C (underfitting) would show poor scores everywhere, not 99% offline. D is not a performance cause.
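A simple first check for the overlap described here is to count how many test rows also appear verbatim in the training set. The sketch below assumes rows are small tuples and uses made-up records; it only catches exact duplicates, so near-duplicates and time-based leakage would need other tests.

```python
def overlap_rate(train_rows, test_rows):
    """Fraction of test rows that also appear verbatim in the training set."""
    seen = {tuple(r) for r in train_rows}
    return sum(tuple(r) in seen for r in test_rows) / len(test_rows)

# Hypothetical records: (customer_age, merchant_type, amount).
train = [(34, "retail", 120.0), (51, "travel", 980.5), (29, "retail", 45.0)]
test = [
    (51, "travel", 980.5),  # also in train
    (29, "retail", 45.0),   # also in train
    (40, "fuel", 60.0),
    (62, "travel", 15.0),
]

rate = overlap_rate(train, test)  # 2 of the 4 test rows were seen in training
```

Any non-trivial overlap is a reason to ask how the split was produced and whether a truly held-out or more recent sample exists.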
Domain 2 · AI Operations

In reviewing an MLOps pipeline, an auditor wants assurance that any production prediction can be traced back to the exact model version and data that produced it. Which capability is MOST important?

  • A. Auto-scaling of inference servers.
  • B. A faster GPU.
  • C. A larger marketing budget.
  • D. Reproducibility: versioning of models, data, and code plus logging that links predictions to the version that made them.
Answer: D. Traceability and reproducibility require versioning model, data, and code and logging that ties each prediction to the artifact that produced it, which is essential for investigations, audits, and rollback. A and B are scaling/performance features unrelated to traceability. C is irrelevant.
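As a rough illustration of what "logging that links predictions to the version that made them" could look like, the sketch below builds one log entry that carries a hash of the model artifact plus data and code identifiers. The field names, the identifiers, and the choice of SHA-256 are assumptions for illustration, not a prescribed schema.

```python
import hashlib
import json
from datetime import datetime, timezone

def sha256_hex(data: bytes) -> str:
    """Content hash used here as the model artifact's fingerprint."""
    return hashlib.sha256(data).hexdigest()

def prediction_record(prediction_id, model_bytes, data_version, code_commit,
                      features, output):
    """Build a log entry tying one prediction to the artifacts behind it."""
    return {
        "prediction_id": prediction_id,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "model_sha256": sha256_hex(model_bytes),
        "data_version": data_version,   # hypothetical dataset snapshot label
        "code_commit": code_commit,     # hypothetical source-control revision
        "features": features,
        "output": output,
    }

# Stand-in for the bytes of a serialized model file.
model_bytes = b"serialized-model-v3"
record = prediction_record("p-0001", model_bytes, "train-2024-06", "abc1234",
                           {"amount": 120.0}, 0.87)
log_line = json.dumps(record, sort_keys=True)
```

With entries like this, an auditor can pick a production prediction and ask the team to retrieve the exact artifact whose hash matches the log.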
Domain 2 · AI Operations

A hiring model shows equal overall accuracy for two groups, but its false-negative rate (qualified candidates rejected) is twice as high for one group. The team says the model is fair because accuracy is equal. What is the auditor's BEST position?

  • A. Agree: equal accuracy is sufficient evidence of fairness.
  • B. Disagree: equal accuracy can mask disparate error rates; the unequal false-negative rates indicate a fairness concern needing assessment.
  • C. Disagree, and demand the model be permanently banned.
  • D. Agree, but recommend a faster model.
Answer: B. Fairness has multiple definitions; equal overall accuracy can hide very different error distributions. A doubled false-negative rate for one group is a real, harmful disparity that must be assessed against the chosen fairness criteria. A accepts a misleading metric. C overreaches into a management decision and ignores assessment. D is irrelevant to fairness.
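The situation in this question can be reproduced with a small worked example. The sketch below uses two hypothetical groups of ten candidates each (1 = qualified or accepted, 0 = not); the data is invented so that accuracy is identical while the false-negative rates differ by a factor of two.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def false_negative_rate(y_true, y_pred):
    """Share of truly qualified candidates that the model rejected."""
    preds_for_qualified = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(p == 0 for p in preds_for_qualified) / len(preds_for_qualified)

# Both groups: five qualified candidates followed by five unqualified ones.
qualified = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
group_a_pred = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]  # 1 false negative, 2 false positives
group_b_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]  # 2 false negatives, 1 false positive

acc_a = accuracy(qualified, group_a_pred)             # 0.7
acc_b = accuracy(qualified, group_b_pred)             # 0.7
fnr_a = false_negative_rate(qualified, group_a_pred)  # 0.2
fnr_b = false_negative_rate(qualified, group_b_pred)  # 0.4
```

Equal accuracy arises because group B's extra false negative is offset by one fewer false positive, which is exactly the kind of trade that an accuracy-only fairness claim hides.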
Domain 2 · AI Operations

A company deploys a third-party foundation model via API and builds features on top. The vendor silently updates the model, and downstream behavior changes. What control would have MOST helped detect this?

  • A. A bigger firewall.
  • B. Disabling all logging to reduce noise.
  • C. Increasing the model's temperature setting.
  • D. Ongoing output monitoring with a regression/benchmark test suite run against the API over time.
Answer: D. When you don't control the model, you must monitor its behavior: a maintained benchmark/regression suite run regularly against the API catches behavioral shifts from silent vendor updates. A addresses network security, not behavior. B removes the very evidence you'd need. C changes randomness and would make behavior less stable, not easier to monitor.
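A regression suite of the kind described here can be as simple as a fixed set of prompts with recorded answers that is replayed on a schedule. In the sketch below, `vendor_v1` and `vendor_v2` are hypothetical stand-ins for calls to the vendor API before and after a silent update, and the exact-match comparison is an assumption that suits short classification-style answers better than free-form text.

```python
def behaviour_changes(model_fn, baseline):
    """Return the prompts whose current answer differs from the recorded one."""
    def norm(text):
        return text.strip().lower()
    return sorted(
        prompt for prompt, expected in baseline.items()
        if norm(model_fn(prompt)) != norm(expected)
    )

# Recorded answers from when the integration was approved (illustrative).
BASELINE = {
    "Classify: 'card reported stolen'": "fraud",
    "Is a refund possible after 30 days?": "no",
}

def vendor_v1(prompt):
    """Stand-in for the API at approval time."""
    return BASELINE[prompt]

def vendor_v2(prompt):
    """Stand-in for the API after a silent vendor update."""
    if prompt == "Is a refund possible after 30 days?":
        return "Yes"
    return BASELINE[prompt]

before = behaviour_changes(vendor_v1, BASELINE)  # no differences
after = behaviour_changes(vendor_v2, BASELINE)   # the refund prompt changed
```

For generative outputs, teams often replace exact matching with scored rubrics or similarity thresholds; the audit question is whether some maintained baseline exists and is actually run.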
Domain 2 · AI Operations

An auditor finds that a model's input-validation layer accepts free-text that is passed directly into a system prompt used to query a database. Which combined risk is MOST relevant?

  • A. Only slower response times.
  • B. Prompt injection leading to unauthorized data access: untrusted input is mixed with trusted instructions and privileged actions.
  • C. Reduced model accuracy on the test set.
  • D. Higher cloud storage costs.
Answer: B. Passing untrusted free-text straight into a privileged, database-querying prompt is a classic injection path: an attacker can manipulate the instructions and reach data they shouldn't. Controls include separating instructions from user data, input/output filtering, and least-privilege. A, C, and D are performance/cost issues that miss the security exposure.
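One way to picture the "separate instructions from data" and "least-privilege" controls is to let the model request only named, pre-approved queries and to bind user text as a parameter scoped to the signed-in customer. The sketch below uses Python's standard `sqlite3` module; the table, the `order_status` tool name, and the `run_tool` helper are hypothetical.

```python
import sqlite3

# The model may only request a named tool; it never writes SQL itself.
ALLOWED_QUERIES = {
    "order_status":
        "SELECT status FROM orders WHERE order_id = ? AND customer_id = ?",
}

def run_tool(conn, session_customer_id, tool, argument):
    """Run an allowlisted query on behalf of the signed-in customer."""
    if tool not in ALLOWED_QUERIES:
        raise PermissionError(f"tool not permitted: {tool}")
    # User text is bound as a parameter and scoped to the session's customer.
    rows = conn.execute(ALLOWED_QUERIES[tool],
                        (argument, session_customer_id)).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, customer_id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [("A1", 1, "shipped"), ("B7", 2, "refunded")])

own_order = run_tool(conn, 1, "order_status", "A1")            # customer 1's order
other_order = run_tool(conn, 1, "order_status", "B7")          # belongs to customer 2
injected = run_tool(conn, 1, "order_status", "A1' OR '1'='1")  # treated as plain text
```

Even if injected text persuades the model to ask for something else, the blast radius is limited to what the allowlist and the session scope permit.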
Domain 3 · Audit Techniques

An auditor concludes a model's controls are effective based solely on a verbal assurance from the lead data scientist that "testing is thorough." What is the primary problem with this conclusion?

  • A. Nothing: management inquiry is the strongest form of evidence.
  • B. The evidence is neither sufficient nor appropriate; inquiry alone, without corroboration, doesn't support an effectiveness conclusion.
  • C. The auditor should have asked two data scientists instead of one.
  • D. The conclusion is fine as long as it's documented.
Answer: B. Inquiry is the weakest evidence and must be corroborated with inspection, re-performance, or observation to be sufficient and appropriate for a conclusion. A inverts the evidence hierarchy. C still relies only on inquiry. D is wrong: documenting weak evidence doesn't make the conclusion supportable.
Domain 3 · Audit Techniques

An auditor needs to test whether a model-approval control operated for every release over the past year, where there were only 18 releases. What is the BEST testing approach?

  • A. Statistical attribute sampling of 30 items.
  • B. Test a single release and extrapolate.
  • C. Test the full population of 18 releases, since it is small enough to examine entirely.
  • D. Rely on the data scientist's summary spreadsheet only.
Answer: C. When the population is small, testing 100% is more efficient and gives complete coverage; sampling adds risk for no benefit. A would require sampling more items than exist. B (sample of one) gives no basis to conclude. D relies on unverified, second-hand evidence rather than the source records.
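With only 18 items, a full-population test is a few lines of scripting over the release records. The sketch below assumes each record carries a deployment date and an approval date (or none); the data and the two exceptions planted in it are invented for illustration.

```python
from datetime import date

def approval_exceptions(releases):
    """IDs of releases with no approval, or approval dated after deployment."""
    return [
        r["id"] for r in releases
        if r["approved_on"] is None or r["approved_on"] > r["deployed_on"]
    ]

# Hypothetical population: 18 releases, all approved five days before go-live...
releases = [
    {"id": f"R{n:02d}", "deployed_on": date(2024, 6, 15), "approved_on": date(2024, 6, 10)}
    for n in range(1, 19)
]
# ...except two planted exceptions.
releases[4]["approved_on"] = None               # R05: no approval on file
releases[11]["approved_on"] = date(2024, 6, 20)  # R12: approved after go-live

exceptions = approval_exceptions(releases)
```

The auditor would still trace each record back to source evidence (tickets, sign-offs) rather than relying on a summary extract alone.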
Domain 3 · Audit Techniques

While planning an AI audit, the auditor has limited time and must focus the engagement. Which approach to scoping is MOST appropriate?

  • A. Use a risk-based approach: prioritize the highest-risk models and controls (e.g., high-impact, customer-facing, or regulated use cases).
  • B. Test every control equally regardless of risk.
  • C. Audit whichever systems are easiest to access.
  • D. Let the data science team choose what gets audited.
Answer: A. Risk-based scoping directs scarce audit effort to where the consequences of failure are greatest, which is the core of audit planning. B spreads effort thinly and ignores materiality. C optimizes for convenience, not risk. D surrenders auditor judgment and independence to the auditee.
Domain 3 · Audit Techniques

An auditor wants to independently verify that a deployed model produces the documented outputs for a set of known inputs. Which technique provides the strongest evidence?

  • A. Reading the model's documentation.
  • B. Asking the developer whether the outputs are correct.
  • C. Re-performance: running the auditor's own test cases through the model and comparing to expected results.
  • D. Reviewing last year's audit report.
Answer: C. Re-performance (independently running test cases and comparing actual to expected outputs) is direct, auditor-generated evidence and among the strongest available. A (documentation review) and D (prior report) are indirect. B (inquiry) is the weakest and isn't independent. The strength order: re-performance/observation > inspection > inquiry.
Domain 3 · Audit Techniques

In drafting the audit report, the auditor must present a finding that a high-risk model lacks bias testing. What makes the finding MOST useful to management?

  • A. Stating only that "the model is biased."
  • B. Listing the names of the data scientists responsible.
  • C. Including the full source code in the report.
  • D. Presenting condition, criteria, cause, effect (risk/impact), and a clear, actionable recommendation.
Answer: D. A well-structured finding gives condition, criteria, cause, and effect plus an actionable recommendation so management understands the gap, why it matters, and what to do. A is an unsupported, overstated claim (and bias wasn't even tested). B blames individuals rather than addressing the control. C dumps detail that obscures the message and adds no decision value.
Domain 3 · Audit Techniques

An auditor plans to use a data-analytics script to test 100% of model decisions for policy violations. Before relying on the results, what should the auditor do?

  • A. Nothing: analytics output is always reliable.
  • B. Have the auditee write and run the script for them.
  • C. Validate the completeness and accuracy of the input data and confirm the analytics logic is correct.
  • D. Reduce the test to a sample of 10 decisions to save time.
Answer: C. Analytics conclusions are only as trustworthy as the data and logic behind them, so the auditor must verify the source data's completeness/accuracy and that the script does what it's intended to. A assumes reliability without basis. B compromises independence and reliability. D discards the advantage of full-population testing for no good reason.
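The completeness and accuracy checks mentioned in this answer are often done by reconciling the extract to control totals from the source system. The sketch below assumes the auditor has been given an expected record count and an expected sum of amounts; the field names and figures are hypothetical, and it covers only the data side, not validation of the analytics logic itself.

```python
from collections import Counter

def reconcile(extract, expected_count, expected_total):
    """Compare an extract with source-system control totals and flag gaps."""
    id_counts = Counter(row["decision_id"] for row in extract)
    total = sum(row["amount"] for row in extract)
    return {
        "count_matches": len(extract) == expected_count,
        "total_matches": abs(total - expected_total) < 0.005,
        "duplicate_ids": sorted(i for i, n in id_counts.items() if n > 1),
        "missing_outcome": sorted(
            row["decision_id"] for row in extract if row.get("outcome") is None
        ),
    }

# Hypothetical extract of model decisions handed to the auditor.
extract = [
    {"decision_id": "D1", "amount": 100.0, "outcome": "approve"},
    {"decision_id": "D2", "amount": 250.0, "outcome": "decline"},
    {"decision_id": "D2", "amount": 250.0, "outcome": "decline"},  # duplicate row
    {"decision_id": "D4", "amount": 75.5, "outcome": None},        # blank field
]

# Control totals as reported by the source system (illustrative figures).
result = reconcile(extract, expected_count=5, expected_total=925.5)
```

Here the extract fails on every check, so any policy-violation results computed from it would not yet be reliable evidence.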
Domain 1 · AI Governance & Risk

An auditor learns that several business units have independently subscribed to generative-AI tools, and no one can produce a complete list of the AI systems in use across the company. What should the auditor recommend FIRST?

  • A. Establish and maintain a central AI inventory/registry capturing each system's owner, purpose, data sources, and risk tier.
  • B. Immediately block all generative-AI tools at the network firewall.
  • C. Run a bias test on each tool the units have mentioned.
  • D. Conclude that AI risk is well managed because business units are adopting innovation.
Answer: A. You cannot risk-assess, monitor, or govern what is not catalogued; the absence of an inventory signals shadow AI, so building the registry is the foundational first step. B is a blanket management action that ignores risk-based proportionality and isn't the auditor's to take. C is premature: you can't test systems you haven't even identified. D mistakes uncontrolled adoption for governance.
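The registry fields named in option A map naturally onto a simple record type. The sketch below is one possible shape, with an illustrative completeness check; the class name, the example systems, and the risk-tier labels are assumptions rather than a required format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AISystem:
    """One entry in a central AI inventory."""
    name: str
    purpose: str
    owner: Optional[str] = None
    data_sources: List[str] = field(default_factory=list)
    risk_tier: Optional[str] = None  # e.g. "high", "limited", "minimal"

def incomplete_entries(registry):
    """Names of systems missing an owner, data sources, or a risk tier."""
    return [
        s.name for s in registry
        if not s.owner or not s.data_sources or s.risk_tier is None
    ]

# Hypothetical registry after a first pass through the business units.
registry = [
    AISystem("claims-drafting-assistant", "Draft customer letters",
             owner="Head of Claims", data_sources=["claims_db"], risk_tier="limited"),
    AISystem("marketing-copy-tool", "Generate campaign text"),  # shadow AI, unowned
]

gaps = incomplete_entries(registry)
```

A check like this gives the auditor a quick view of which catalogued systems still lack the accountability and risk-tiering information that governance depends on.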
Domain 1 · AI Governance & Risk

A retailer plans to repurpose customer purchase data, originally collected to fulfil orders, to train a new product-recommendation model. The data team argues, "It's already our data." What is the auditor's PRIMARY concern?

  • A. The storage cost of retaining the historical purchase data.
  • B. Whether the recommendation model will be accurate enough.
  • C. Purpose limitation and lawful basis: data collected for one purpose generally can't be repurposed for training without a valid basis and a DPIA.
  • D. Whether the data is stored in an encrypted database.
Answer: C. Possessing data is not the same as having the right to use it for any purpose; secondary use for training triggers purpose-limitation and lawful-basis requirements, plus a DPIA for high-risk processing. A is a cost issue, not the legal root concern. B addresses utility, not lawfulness. D is a security control that doesn't cure the repurposing problem.
Domain 1 · AI Governance & Risk

An insurer adopts a third-party generative-AI tool to draft claims correspondence. Management says, "The vendor is SOC 2 certified and contractually responsible for compliance, so no internal review is needed." What should the auditor recommend?

  • A. Accept management's position; the SOC 2 report transfers accountability to the vendor.
  • B. Ban the tool outright because vendor AI can never be controlled.
  • C. Perform an internal vendor risk assessment and establish internal controls (output review, data-use limits, inventory entry, named accountability).
  • D. Rely solely on the contract's indemnity clause as the mitigating control.
Answer: C. Outsourcing the model does not outsource accountability: the insurer still owns its customer outcomes and regulatory obligations, and a generic SOC 2 attestation rarely covers AI-specific risks like hallucination or bias. A wrongly treats an attestation as a transfer of accountability. B is a disproportionate management decision. D leans on a legal clause instead of operating controls.
Domain 1 · AI Governance & Risk

During a governance review the auditor finds that the data-science team both builds models and performs the only validation before deployment. Management calls this efficient because "the builders understand the model best." What is the MOST significant concern?

  • A. The models may take longer to deploy than necessary.
  • B. The team may not be writing enough unit tests.
  • C. Validation is happening too early in the lifecycle.
  • D. There is no independent challenge: the first line is validating its own work, collapsing the three lines of defense.
Answer: D. Independent validation (a second-line model risk management (MRM) function or a qualified independent reviewer) provides objectivity that builder familiarity cannot; self-validation defeats segregation of duties. A is a minor efficiency point, not the governance issue. B addresses technical quality but misses the independence root cause. C misstates the problem: timing isn't the issue, independence is.
Domain 1 · AI Governance & Risk

A multinational deploying an AI hiring tool in the EU tells the auditor it has adopted the NIST AI RMF and is "therefore compliant." What should the auditor point out?

  • A. NIST AI RMF adoption is sufficient evidence of legal compliance worldwide.
  • B. NIST AI RMF is a voluntary framework, not a legal compliance vehicle; an EU hiring tool is high-risk under the EU AI Act and also engages GDPR, so a regulatory mapping is needed.
  • C. The organization should drop NIST and rely only on ISO/IEC 42001.
  • D. Hiring tools are minimal risk, so no further action is required.
Answer: B. Voluntary frameworks are good practice but don't satisfy binding law; EU recruitment AI is high-risk under the AI Act and also subject to GDPR, so the auditor recommends mapping all applicable regimes. A confuses a framework with legal compliance, which is the exact trap. C is an overreaction; frameworks complement rather than replace one another. D misclassifies the tier: hiring AI is high-risk, not minimal.
Domain 1 · AI Governance & Risk

An auditor reviews an AI risk register and finds many controls listed, but no record showing who accepted the residual risk for a high-impact lending model after controls were applied. How should this be characterized?

  • A. Acceptable, because controls reduce risk and that is enough.
  • B. A finding: residual risk must be formally accepted by an accountable owner within the organization's risk appetite.
  • C. Acceptable, because inherent risk is what matters, not residual risk.
  • D. Out of scope, because risk acceptance is a Domain 3 audit-technique topic.
Answer: B. Even after mitigation, leftover (residual) risk doesn't disappear; it must be consciously accepted by a named, accountable owner within appetite, because risk accepted by no one is a governance gap. A ignores the accountability requirement. C inverts the concepts: residual risk is precisely what remains to be owned. D wrongly relocates a Domain 1 governance concept.
Domain 1 Β· AI Governance & Risk

A team proposes using a machine-learning model to automate a fully deterministic regulatory calculation that must be perfectly explainable by law. What is the auditor's BEST observation?

  • A. The model should be approved because AI improves efficiency everywhere.
  • B. The model is fine as long as it reaches high accuracy on a test set.
  • C. AI may be a poor fit: a deterministic, legally explainable problem is better served by rules; using opaque ML introduces unnecessary explainability and compliance risk.
  • D. Explainability is irrelevant because the calculation is automated.
Answer: C. When a problem is fully deterministic and decisions must be perfectly explainable, rule-based logic is appropriate; introducing an opaque model adds risk for no benefit, a classic "AI chosen for hype, not fit" red flag. A is uncritical hype. B trusts accuracy while ignoring the legal explainability requirement. D dismisses an explicit legal constraint.
Domain 1 · AI Governance & Risk

An organization's AI policy lists "fairness" and "transparency" as principles but contains no approval gates, no defined roles, and no prohibited uses. How should the auditor evaluate the policy?

  • A. Adequate: stating principles is the core purpose of a policy.
  • B. A design weakness: without acceptable/prohibited uses, approval gates, and roles, the policy is aspirational shelfware that cannot be operated or enforced.
  • C. Acceptable, because principles automatically imply the necessary controls.
  • D. A reason to delete the policy and start using AI without one.
Answer: B. A foundational AI policy must define acceptable and prohibited use, approval gates, roles, and committed principles; principles alone give nothing to enforce or test. A and C overstate what a list of values achieves. D is absurd: the fix is to strengthen the policy, not abandon it.
Domain 1 · AI Governance & Risk

A company wants to certify a management system for governing AI across its lifecycle, with defined scope, objectives, internal audits, and management review, comparable to how ISO 27001 works for security. Which standard is the most appropriate primary reference?

  • A. ISO/IEC 23894, AI risk management guidance.
  • B. ISO/IEC 42001, the AI management system (AIMS) standard.
  • C. The EU AI Act.
  • D. The OECD AI Principles.
Answer: B. ISO/IEC 42001 is the certifiable AI management-system standard with the management-system structure (scope, policy, objectives, internal audit, management review) the scenario describes. A (23894) is risk-management guidance, not a certifiable management system. C is binding law, not a certification scheme. D is a set of high-level principles, not a management system.
Domain 1 · AI Governance & Risk

An auditor wants to distinguish ISO/IEC 23894 from ISO/IEC 42001 for a report. Which statement is correct?

  • A. Both are binding regulations enforced by data-protection authorities.
  • B. 23894 is a certifiable management system and 42001 is informal guidance.
  • C. 42001 is the certifiable AI management-system standard, while 23894 provides AI risk-management guidance aligning ISO 31000 principles to AI.
  • D. They are identical and the numbers are interchangeable.
Answer: C. 42001 establishes a certifiable AI management system; 23894 is guidance that brings ISO 31000 risk principles to AI risk. A is wrong: ISO standards are voluntary, not regulations. B reverses the two. D is false; they serve distinct purposes.
Domain 1 · AI Governance & Risk

An auditor maps a company's AI program to the NIST AI RMF and must place "tracking fairness metrics and robustness test results on monitoring dashboards." Which function does this primarily belong to?

  • A. GOVERN.
  • B. MAP.
  • C. MEASURE.
  • D. MANAGE.
Answer: C. MEASURE analyzes, tests, and tracks risks (bias, robustness, and performance), which is exactly what fairness metrics and test dashboards represent. GOVERN is cross-cutting culture, policy, and accountability. MAP establishes context and identifies risks tied to the use case. MANAGE prioritizes and acts on risks (treatment, response, recovery).
Domain 1 · AI Governance & Risk

An organization treats AI governance as a single approval gate at launch, with no further checkpoints. An auditor reviewing the governance design sees what weakness?

  • A. None: a launch gate is the only governance point that matters.
  • B. Governance should span the full lifecycle (ideation → design → development → validation → deployment → monitoring → retirement) with checkpoints at each stage, not a one-time gate.
  • C. The launch gate should be removed to speed up delivery.
  • D. Governance is only needed after an incident occurs.
Answer: B. AI risk evolves after launch (drift, misuse, changing context), so governance must include lifecycle checkpoints through monitoring and retirement, ideally extending existing enterprise governance. A and D reduce governance to a single point or to reactive cleanup. C removes a needed control entirely.
Domain 1 · AI Governance & Risk

A hospital deploys a clinical-decision-support model and instructs clinicians to "always follow the model's recommendation to save time." Over months, clinicians stop questioning outputs even when they look wrong. Which AI-specific risk is MOST directly created?

  • A. Data poisoning.
  • B. Model drift.
  • C. Automation bias: humans over-trust the system and stop challenging it, hollowing out the human-oversight control.
  • D. Intellectual-property infringement.
Answer: C. Instructing humans to defer unconditionally produces automation bias, which silently defeats the "human-in-the-loop" control that high-impact decisions rely on. A is a training-data attack, not described here. B is performance decay over time, a different phenomenon. D concerns content rights, irrelevant to the oversight failure shown.
Domain 1 · AI Governance & Risk

An auditor must determine the EU AI Act classification of a system that performs real-time social scoring of citizens by a public authority. What is the correct classification and implication?

  • A. High risk: permitted with conformity assessment and human oversight.
  • B. Limited risk: only a transparency notice is required.
  • C. Unacceptable risk: social scoring of this kind is prohibited.
  • D. Minimal risk: largely unregulated, voluntary codes only.
Answer: C. The EU AI Act places social scoring (and manipulative/exploitative systems) in the unacceptable-risk tier, which is prohibited rather than merely regulated. A wrongly treats a banned use as a permitted high-risk one. B and D drastically understate the tier; transparency notices and voluntary codes apply to far lower-risk systems.
Domain 1 · AI Governance & Risk

A marketing team wants to deploy AI that profiles individuals using sensitive personal data to infer health-related interests. What governance artifact is MOST important to produce before processing begins?

  • A. A model card describing intended use.
  • B. A press release announcing the capability.
  • C. A Data Protection Impact Assessment (DPIA) evaluating necessity, proportionality, and risk to data subjects.
  • D. A penetration test of the hosting infrastructure.
Answer: C. High-risk processing of sensitive data for profiling is a textbook DPIA trigger; the DPIA assesses necessity, proportionality, and risk before processing starts. A documents the model but doesn't address privacy lawfulness. B is irrelevant to governance. D is a security control that doesn't evaluate the privacy/necessity question.
Domain 1 · AI Governance & Risk

An auditor reviewing training-data governance cannot find any documentation of where the data originated or how it was transformed before reaching the model. Which combined control gap is MOST relevant?

  • A. Missing data provenance and lineage: without them the organization cannot prove the data was lawfully obtained, suitable, or uncontaminated.
  • B. Insufficient GPU capacity for training.
  • C. Too much model explainability.
  • D. Excessive human oversight of the model.
Answer: A. Provenance (where data came from) and lineage (its journey through transformations) are needed to demonstrate lawful, suitable, contamination-free data; their absence is a core data-governance gap. B is an infrastructure concern not raised here. C is not a risk and isn't evidenced. D is backwards: oversight isn't the problem.
Domain 1 · AI Governance & Risk

An organization weighing whether to build an in-house model or license a foundation model from a vendor asks the auditor what governance factor matters most in the build-vs-buy decision. What is the BEST answer?

  • A. Buying always transfers accountability to the vendor, so buying is governance-superior.
  • B. Whichever option has the lowest license cost is the right governance choice.
  • C. Either way the deploying organization retains accountability, so buying requires vendor due diligence, contractual audit rights, model documentation, and an exit/continuity plan.
  • D. Building in-house removes all third-party and supply-chain risk permanently.
Answer: C. Buying or building never shifts accountability for outcomes away from the deployer; a buy decision therefore demands due diligence, audit rights, documentation, and an exit plan, while inheriting the vendor's data, bias, and security posture. A and D state false absolutes about transferring or eliminating risk. B reduces a governance decision to price alone.
Domain 1 · AI Governance & Risk

An AI initiative was launched by a single department with no link to enterprise strategy, no documented business objective, no success metric, and no fallback. How should the auditor characterize this?

  • A. A model of agile innovation that should be replicated company-wide.
  • B. Shadow AI lacking strategic alignment: a finding, since every initiative should trace to a business objective, measurable benefit, owner, and appropriate sign-off.
  • C. Acceptable, because business value is self-evident once AI is involved.
  • D. Only a Domain 2 operational concern, not relevant to governance.
Answer: B. AI deployed outside strategy and governance, with no objective, metric, or fallback, is shadow AI and a classic strategic-alignment finding. A and C romanticize uncontrolled adoption and assume value without evidence. D wrongly relocates a Domain 1 strategic-alignment issue to operations.
Domain 1 · AI Governance & Risk

An organization claims its AI governance is mature because it has an AI ethics committee. The auditor wants evidence the committee is operating effectively. Which evidence is MOST persuasive?

  • A. The committee's charter and member roster.
  • B. A launch announcement slide deck.
  • C. An email from the CEO endorsing the committee.
  • D. Minutes showing the committee reviewed specific high-risk use cases and made documented decisions that changed outcomes.
Answer: D. Operating effectiveness is shown by evidence the control actually functioned: use cases reviewed and decisions that influenced outcomes. A demonstrates design, not operation. B and C show intent or tone at the top, not that the committee did its job. Auditors weight evidence of actual operation most heavily.
Domain 1 · AI Governance & Risk

Management asks the CAE to have internal audit design and own the AI model-approval workflow and decide which AI use cases the company pursues. What is the BEST response to preserve independence?

  • A. Decline to design or own the workflow, since that creates a self-review threat; offer advisory input and later provide independent assurance over the controls management establishes.
  • B. Accept fully, because audit involvement guarantees the controls will be strong.
  • C. Accept, but have a different auditor sign the eventual assurance report.
  • D. Refuse any contact with AI governance whatsoever.
Answer: A. As the third line, internal audit must not own or build a control it will later assess; advising is fine, but designing and owning the workflow creates a self-review impairment. B and C don't cure the impairment: rotating the signature doesn't undo ownership of the design. D over-corrects, since advisory input is appropriate; only ownership is the problem.
Domain 2 · AI Operations

A team builds a churn model by first scaling all features (computing the mean and standard deviation) across the entire dataset, and only then splitting into train and test sets. Offline accuracy is excellent but production results lag. What is the MOST likely flaw?

  • A. The model is underfitting because scaling removed useful signal.
  • B. Concept drift between training and deployment.
  • C. The learning rate is set too high.
  • D. Data leakage from preprocessing: fitting the scaler on the full dataset lets test-set statistics influence training.
Answer: D. Computing scaling parameters before the split lets information from the test set bleed into training, a subtle preprocessing leak that inflates offline scores that then fail to hold up in production. The scaler must be fit on the training split only and then applied to test/production. A and C are unrelated training symptoms not evidenced. B would not be caused by the preprocessing order described.
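To make the ordering concrete, here is a minimal sketch using only the Python standard library. The feature values, the split point, and the helper names `fit_scaler` and `transform` are illustrative assumptions, not part of the question.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Return (mean, std) computed from the given values only."""
    return mean(values), pstdev(values)

def transform(values, params):
    """Standardize values with previously fitted (mean, std)."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]

# Hypothetical feature column; the last two rows play the role of the test split.
feature = [10.0, 12.0, 11.0, 13.0, 50.0, 55.0]
train, test = feature[:4], feature[4:]

leaky_params = fit_scaler(feature)  # leak: test rows shape the statistics
clean_params = fit_scaler(train)    # correct: statistics come from train only

# The same train-only parameters are reused for test (and production) data.
train_scaled = transform(train, clean_params)
test_scaled = transform(test, clean_params)
```

The two parameter sets differ, which is the leak in miniature: in the leaky version the training features were standardized with knowledge of the test rows.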
Domain 2 · AI Operations

An auditor reviews a facial-analysis model trained almost entirely on images of one demographic. It scores well on the held-out test set, which was drawn from the same source. What is the MOST significant concern about the evaluation?

  • A. The test set was too large.
  • B. The model used too few hyperparameters.
  • C. The test set is not representative of the deployment population, so strong metrics may not generalize to underrepresented groups.
  • D. The model card was published too early.
Answer: C. A test set that mirrors a biased training source inherits the same sampling bias, so high scores say little about real-world groups the data underrepresents. Representative evaluation across the deployment population is essential. A is not a defect. B is unrelated to representativeness. D is a documentation timing issue, not the evaluation flaw described.
Domain 2 · AI Operations

The auditor requests a document that describes a dataset's motivation, how it was collected, who is represented in it, and its recommended and discouraged uses. Which artifact is this?

  • A. A model card.
  • B. A datasheet for the dataset.
  • C. A confusion matrix.
  • D. A service-level agreement.
Answer: B. A datasheet for datasets documents a dataset's motivation, composition, collection process, and recommended uses; it is the data-side counterpart to a model card. A model card (A) describes the model, not the dataset's provenance. C is a per-classifier evaluation grid. D is a vendor performance contract, unrelated.
Domain 2 · AI Operations

An auditor finds that a single engineer trains models, approves them, and deploys them straight to production with no second party involved. What control principle is MOST clearly violated?

  • A. Least privilege on the inference API.
  • B. Data minimization.
  • C. Segregation of duties: the same person should not build, approve, and promote a model.
  • D. Differential privacy.
Answer: C. Letting one person build, approve, and deploy collapses segregation of duties: there is no independent check before a model reaches production, enabling error or abuse to go unnoticed. A concerns runtime permissions, not the approval flow. B and D are privacy techniques unrelated to the promotion-authority gap described.
Domain 2 · AI Operations

Before fully replacing a deployed model, a team routes a small percentage of live traffic to the new version and watches its real metrics, ready to ramp up only if they hold. Which deployment pattern is this?

  • A. Canary deployment.
  • B. Shadow deployment.
  • C. Big-bang cutover.
  • D. Data poisoning.
Answer: A. Serving a small slice of real traffic first and ramping up only if metrics hold is a canary deployment. Shadow (B) runs the new model on real traffic but does not use its outputs, only compares them; no live slice is served. C switches everything at once with no gradual exposure. D is an attack on training data, not a deployment strategy.
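One way a canary split might be implemented is deterministic, hash-based routing, sketched below. The `route` function, the request-ID format, and the 5% figure are assumptions for illustration; real routers usually live in a load balancer or service mesh.

```python
import hashlib

def route(request_id: str, canary_percent: int) -> str:
    """Deterministically send roughly canary_percent% of requests to the candidate."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return "candidate" if bucket < canary_percent else "current"

# Hypothetical traffic: 10,000 request IDs with a 5% canary slice.
assignments = [route(f"req-{i}", 5) for i in range(10_000)]
canary_share = assignments.count("candidate") / len(assignments)
```

Hashing the request (or user) ID keeps the assignment stable across retries, and ramping up is a one-parameter change, which makes the rollout easy to evidence and to reverse.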
Domain 2 · AI Operations

A new model runs on full production traffic, but its predictions are only logged and compared to the live model, never shown to users or acted upon. What is the PRIMARY benefit of this approach?

  • A. It evaluates the new model on real traffic with no user-facing risk, since its outputs are not used.
  • B. It eliminates the need for a test set.
  • C. It guarantees the model can never drift.
  • D. It removes the need for a rollback plan.
Answer: A. This is shadow deployment: the candidate sees real production traffic so you get realistic behavior, but because its outputs are never used, there is no customer-facing risk. B is false: shadow testing complements rather than replaces offline evaluation. C is false; any deployed model can drift. D is false; you still need rollback once the model goes live for real.
Domain 2 · AI Operations

An LLM customer-service feature behaves differently this week. Investigation shows the only change was an edit to the system prompt, made directly in production with no review, test, or record. How should the auditor classify this?

  • A. Not a change: prompts are configuration, not code, so change management does not apply.
  • B. An uncontrolled change: prompt/system-message edits alter behavior and must go through change management with testing, approval, and rollback.
  • C. A data-quality issue in the training set.
  • D. Concept drift caused by changing user behavior.
Answer: B. A prompt or system-message edit changes the model's behavior just as code does, so it is a production change requiring testing, approval, versioning, and rollback. A is the exact trap the exam warns against: prompts are changes. C and D describe unrelated phenomena; here the cause is a known, uncontrolled edit, not training data or shifting relationships.
Domain 2 · AI Operations

A team retrains its model whenever someone "feels it is getting stale," with no defined criteria. The auditor wants retraining to be controlled rather than ad-hoc. What is the BEST recommendation?

  • A. Stop retraining entirely to keep the model stable.
  • B. Retrain continuously on every new record in real time.
  • C. Let each engineer decide independently when to retrain.
  • D. Define explicit retraining triggers (scheduled, drift-based, or event-based) tied to monitored thresholds and an approval gate.
Answer: D. Controlled retraining means defined triggers (scheduled, drift-based, or event-based) linked to monitored thresholds and an approval gate, replacing gut feeling with auditable criteria. A abandons a legitimate defense against drift. B is operationally reckless and unverifiable. C recreates the ad-hoc, ungoverned problem under a different name.
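As a rough sketch of what "explicit triggers plus an approval gate" could look like in code: the `RetrainPolicy` fields, the 90-day and 0.2 thresholds, and the function names below are hypothetical choices, not values from any standard.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RetrainPolicy:
    max_age_days: int = 90        # scheduled trigger (assumed value)
    drift_threshold: float = 0.2  # drift-based trigger (assumed value)

def retrain_reasons(policy: RetrainPolicy, model_age_days: int,
                    drift_score: float, events: List[str]) -> List[str]:
    """Return the documented triggers that currently fire, if any."""
    reasons = []
    if model_age_days >= policy.max_age_days:
        reasons.append("scheduled")
    if drift_score >= policy.drift_threshold:
        reasons.append("drift")
    if events:
        reasons.append("event")
    return reasons

def may_retrain(reasons: List[str], approved_by: Optional[str]) -> bool:
    """Retraining proceeds only with a fired trigger AND a recorded approver."""
    return bool(reasons) and approved_by is not None

policy = RetrainPolicy()
fired = retrain_reasons(policy, model_age_days=30, drift_score=0.35, events=[])
```

Because both the fired trigger and the approver are recorded, an auditor can later test each retraining run against the stated criteria.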
Domain 2 · AI Operations

During an incident, a team wants to revert a misbehaving model to the previous known-good version, but they cannot, because old versions were overwritten and never stored. What earlier control failure MOST directly caused this?

  • A. Lack of adversarial testing.
  • B. No model versioning/registry, which removes rollback capability needed for containment.
  • C. Too much human-in-the-loop oversight.
  • D. Excessive logging slowing the system.
Answer: B. Without a model registry that retains prior versions, there is no known-good artifact to roll back to, so containment collapses; weak versioning directly cripples incident response. A concerns testing robustness, not rollback. C is unrelated and backwards. D is a performance concern, not the cause of the missing rollback path.
Domain 2 · AI Operations

An auditor learns that models are routinely tuned and validated directly against the live production database rather than in separate environments. What is the MOST significant risk?

  • A. Faster experimentation, which is purely beneficial.
  • B. No environment promotion (dev → test → prod), risking untested changes affecting live decisions and data integrity.
  • C. The model card will become outdated.
  • D. Increased cloud storage costs only.
Answer: B. Working directly in production bypasses environment promotion, so untested or experimental changes can affect live decisions and corrupt production data with no isolation. A wrongly treats a control failure as a pure benefit. C is a minor documentation effect, not the core risk. D understates the impact to a cost line item.
Domain 2 · AI Operations

A recommendation model's input feature distributions have shifted noticeably (a new customer segment now dominates traffic), but the relationship between features and the correct outcome appears unchanged. Which phenomenon is this?

  • A. Data drift (covariate shift): the input distribution moved while the target rule stayed the same.
  • B. Concept drift: the input-to-output relationship changed.
  • C. Data leakage.
  • D. Model extraction.
Answer: A. Inputs moving while the underlying rule holds is data drift (covariate shift); you detect it by comparing live input distributions against the training baseline. Concept drift (B) is when the input-to-outcome relationship itself changes, which is not the case here. C is a training/evaluation contamination fault. D is an attack that clones a model, unrelated to distribution shift.
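Comparing live inputs against the training baseline can be done in many ways; one commonly used measure is the Population Stability Index (PSI), sketched here over pre-binned counts. The bin counts are invented for illustration, and the PSI is swapped in as an example technique rather than something the question prescribes.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between baseline and live bin counts."""
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        p = max(e / e_total, eps)  # baseline share of this bin
        q = max(a / a_total, eps)  # live share of this bin
        score += (q - p) * math.log(q / p)
    return score

# Hypothetical counts per customer segment: training baseline vs. live traffic.
baseline = [500, 300, 200]
live = [200, 300, 500]

no_shift = psi(baseline, baseline)
shift = psi(baseline, live)
```

Identical distributions score 0 and the score grows as the live mix moves away from the baseline; the alert threshold itself is a policy decision the organization should document.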
Domain 2 · AI Operations

A predictive-policing model directs patrols to areas it flags as high-risk; those areas then generate more recorded incidents, which feed back into the next training cycle and reinforce the same flags. What dynamic is this?

  • A. Adversarial robustness.
  • B. Hyperparameter tuning.
  • C. A feedback loop: the model's own outputs become future training data and amplify its existing bias.
  • D. Differential privacy.
Answer: C. When a model's outputs shape the data it later learns from, a self-reinforcing feedback loop forms that amplifies existing bias rather than correcting it. A and B are testing/training concepts, not this self-reinforcement. D is a privacy protection technique. The control is to monitor for and break the loop, e.g., by using independent outcome data.
Domain 2 · AI Operations

A bank states its AI loan decisions are "human-reviewed," but logs show reviewers approve 99.8% of recommendations in about three seconds each. How should the auditor characterize the oversight control?

  • A. Effective: a human is in the loop, which satisfies the requirement.
  • B. Operating in form but not substance: the pattern indicates rubber-stamping, so there is no meaningful human oversight.
  • C. Irrelevant, because human oversight is never required for AI.
  • D. Effective, provided the model has high accuracy.
Answer: B. Near-100% approvals in seconds show the control exists on paper but is not genuinely exercised. This is classic rubber-stamping, so oversight is ineffective in substance. A and D mistake form for substance; an accurate model does not make human review real. C is wrong: meaningful human oversight is exactly what high-stakes decisions require. Recommend giving reviewers time, explanations, authority, and override-rate monitoring.
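The log analysis behind this conclusion can be sketched in a few lines. The review records, the function names, and the cut-offs (98% approval, 30 seconds) are hypothetical; an actual audit would set thresholds from the decision's complexity.

```python
from statistics import median

def oversight_indicators(reviews):
    """reviews: list of (approved: bool, seconds_spent: float) tuples."""
    approvals = sum(1 for approved, _ in reviews if approved)
    return {
        "approval_rate": approvals / len(reviews),
        "median_seconds": median([seconds for _, seconds in reviews]),
    }

def looks_like_rubber_stamping(indicators, max_rate=0.98, min_seconds=30.0):
    """Flag reviews that are both near-universal approvals and implausibly fast."""
    return (indicators["approval_rate"] > max_rate
            and indicators["median_seconds"] < min_seconds)

# Hypothetical log: 998 three-second approvals and 2 slower rejections.
log = [(True, 3.0)] * 998 + [(False, 40.0)] * 2
indicators = oversight_indicators(log)
flagged = looks_like_rubber_stamping(indicators)
```

Tracked over time, the same two numbers double as the override-rate monitoring the explanation recommends.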
Domain 2 · AI Operations

An auditor wants an early-warning indicator that a deployed model is becoming risky before customers are harmed. Which metric is MOST appropriate to track as a key risk indicator (KRI)?

  • A. The marketing campaign's click-through rate.
  • B. The number of GPUs in the cluster.
  • C. The size of the source code repository.
  • D. Drift scores and subgroup error rates trending against defined thresholds.
Answer: D. KRIs give early warning of emerging risk: drift scores, subgroup error rates, override frequency, and complaint volumes trending toward thresholds let teams act before harm occurs. A is a business KPI, not a risk indicator. B and C are infrastructure or code-size facts with no bearing on the model's risk posture.
Domain 2 · AI Operations

A spam classifier is tuned so that almost nothing is ever sent to the spam folder, ensuring legitimate mail is never wrongly blocked. Which metric has been prioritized, and what is the trade-off?

  • A. Recall was prioritized, at the cost of precision.
  • B. AUC was prioritized, eliminating all errors.
  • C. Precision was prioritized to avoid false positives, at the cost of recall (more spam slips through).
  • D. Accuracy was prioritized, which removes the need for any other metric.
Answer: C. Minimizing wrongly blocked legitimate mail means minimizing false positives, which favors precision; the trade-off is lower recall, so more spam reaches the inbox. A reverses the trade-off. B is wrong: AUC summarizes ranking across thresholds and does not eliminate errors. D ignores that on imbalanced or cost-asymmetric problems accuracy alone is insufficient.
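The trade-off is easy to see with numbers. The sketch below computes precision and recall from confusion-matrix counts for two hypothetical threshold settings on 100 spam messages; all counts are made up for illustration.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Strict threshold: almost nothing is flagged, so no legitimate mail is blocked,
# but 80 of the 100 spam messages reach the inbox.
strict = precision_recall(tp=20, fp=0, fn=80)

# Lenient threshold: most spam is caught, at the cost of 30 legitimate
# messages landing in the spam folder.
lenient = precision_recall(tp=90, fp=30, fn=10)
```

The strict setting yields precision 1.0 with recall 0.2, while the lenient one yields precision 0.75 with recall 0.9, which is the scenario's trade-off in miniature.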
Domain 2 · AI Operations

An auditor wants to confirm a model's individual predictions can be attributed to specific input features so reviewers can sanity-check the reasoning. Which technique is MOST relevant?

  • A. Canary deployment.
  • B. Differential privacy.
  • C. Explainability methods such as SHAP or LIME that attribute a prediction to its input features.
  • D. Blue/green deployment.
Answer: C. SHAP (Shapley-value attributions) and LIME (local surrogate explanations) attribute a prediction to its features so reviewers can check the model is not relying on spurious signals. A and D are deployment patterns, not explainability tools. B protects training-data privacy; it does not explain a prediction's drivers.
Domain 2 · AI Operations

Before launching a public LLM assistant, a team assembles people to deliberately attempt to make it produce disallowed content, leak its system prompt, and bypass guardrails. What is this activity called?

  • A. Hyperparameter tuning.
  • B. Red-teaming: adversarial testing to surface ways the model can be made to misbehave.
  • C. Cross-validation.
  • D. Feature engineering.
Answer: B. Deliberately attacking an LLM to break its guardrails and surface failure modes before launch is red-teaming, a core generative-AI testing technique. A and D are model-building activities, not adversarial evaluation. C estimates generalization by rotating validation folds; it does not probe for safety bypasses or jailbreaks.
Domain 2 · AI Operations

An LLM-powered research assistant cites a court case that does not exist and states it with full confidence. What testing technique is MOST directly aimed at catching this class of failure?

  • A. Load testing.
  • B. Penetration testing of the host server.
  • C. Hallucination / groundedness testing: checking that outputs stick to provided, verifiable facts.
  • D. Database index optimization.
Answer: C. A confidently fabricated fact is a hallucination; groundedness testing checks whether outputs are supported by provided or verifiable sources, often combining automated scoring with human review. A measures throughput under stress. B and D address infrastructure security and performance, not the factual reliability of generated content.
Domain 2 · AI Operations

A team validates only overall accuracy before deploying a high-stakes hiring model and performs no comparison of outcomes across protected groups. What is the auditor's BEST finding?

  • A. No finding: overall accuracy is sufficient for any model.
  • B. Fairness/bias testing was not performed; for a high-stakes model this is a finding regardless of how accurate it is overall.
  • C. The model should be deleted immediately by the auditor.
  • D. The only issue is that the model needs a faster inference server.
Answer: B. A model can be accurate overall yet discriminatory across subgroups, so for a high-stakes use case the absence of fairness/bias testing is itself a finding. A accepts a single misleading metric. C is a management action the auditor does not take. D substitutes an irrelevant performance concern for the real fairness gap.
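A basic cross-group comparison is small enough to sketch. The example below computes per-group selection rates and the ratio of the lowest to the highest rate; the group labels, the counts, and the choice of this particular metric are illustrative assumptions, and real fairness testing would usually look at several metrics.

```python
from collections import defaultdict

def selection_rates(records):
    """records: iterable of (group, selected: bool) pairs."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in records:
        totals[group] += 1
        selected[group] += int(was_selected)
    return {group: selected[group] / totals[group] for group in totals}

def impact_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0

# Hypothetical screening outcomes for two applicant groups.
outcomes = ([("group_a", True)] * 60 + [("group_a", False)] * 40
            + [("group_b", True)] * 30 + [("group_b", False)] * 70)
rates = selection_rates(outcomes)
ratio = impact_ratio(rates)
```

Here one group is selected at half the rate of the other, a gap that overall accuracy alone would never reveal.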
Domain 2 · AI Operations

Researchers add tiny, almost imperceptible perturbations to images that cause a vision model to confidently misclassify a stop sign as a speed-limit sign. What attack class is this?

  • A. Adversarial examples: crafted input perturbations that fool the model at inference time.
  • B. Membership inference.
  • C. Data poisoning.
  • D. Model extraction.
Answer: A. Small, often imperceptible perturbations crafted to flip a model's prediction at inference time are adversarial examples. B determines whether a record was in the training set. C corrupts training data, not inference inputs. D clones a model via queries. Mitigations include adversarial training, input sanitization, and robustness testing.
Domain 2 · AI Operations

By repeatedly querying a model's API and analyzing its responses, an attacker is able to recover information indicating whether a particular individual's record was part of the training data. Which threat is this?

  • A. Prompt injection.
  • B. Membership inference: a privacy attack revealing whether a record was in the training set.
  • C. Concept drift.
  • D. Excessive agency.
Answer: B. Determining whether a specific record was in the training set is membership inference, a privacy breach. A manipulates an LLM's instructions at runtime. C is performance degradation, not an attack. D is an LLM being given too much autonomy or permission. Mitigations include differential privacy, output limiting, and API access controls.
Domain 2 · AI Operations

A competitor sends millions of carefully chosen queries to a company's prediction API and uses the input-output pairs to train a near-copy of the model. Which threat is this, and which control MOST directly counters it?

  • A. Data poisoning; fixed by cleaning the training set.
  • B. Model extraction/stealing; countered by rate limiting, query monitoring, authentication, and abuse detection.
  • C. Model inversion; countered by adding more GPUs.
  • D. Prompt injection; countered by a longer system prompt.
Answer: B. Cloning a model's behavior through high-volume querying is model extraction/stealing; the direct defenses are rate limiting, query/abuse monitoring, authentication, and watermarking. A targets training data, not query-based cloning. C is a different attack (reconstructing training data) and GPUs do not counter it. D is unrelated, and prompt edits do not stop API extraction.
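Rate limiting is the most mechanical of these defenses. A minimal per-key sliding-window limiter might look like the sketch below; the class name, the limits, and passing the clock in as `now` (to keep the example deterministic) are all assumptions for illustration.

```python
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most max_requests per API key within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # api_key -> timestamps of accepted calls

    def allow(self, api_key: str, now: float) -> bool:
        hits = self._hits[api_key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over quota: reject (and ideally log for abuse review)
        hits.append(now)
        return True

# Hypothetical policy: 3 prediction calls per key per 60 seconds.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
decisions = [limiter.allow("key-123", now=t) for t in (0, 1, 2, 3, 61)]
```

A limiter only slows extraction; the rejected calls it records are what feed the query monitoring and abuse detection named in the answer.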
Domain 2 · AI Operations

An LLM agent is granted broad permissions: it can read any customer record and issue refunds without scoping or approval. Even with good prompt guardrails, the auditor is concerned. Which OWASP-LLM risk is MOST relevant?

  • A. Unbounded consumption.
  • B. Excessive agency: the model has too much autonomy and too many permissions, magnifying the impact of any manipulation.
  • C. Misinformation.
  • D. Vector and embedding weaknesses.
Answer: B. Giving an LLM agent powerful, unscoped permissions is excessive agency: if it is ever manipulated (e.g., by injection), the blast radius is huge. Prompt guardrails are bypassable, so the architectural fix is least-privilege tool access and human approval for consequential actions. A is resource abuse, C is false output, and D concerns retrieval/embedding security; none of these match the over-permissioned agent here.
Domain 2 · AI Operations

A company integrates a pre-trained open-source model and a third-party dataset downloaded from a public repository, with no verification of their origin or integrity. Which OWASP-LLM category does this MOST directly raise?

  • AImproper output handling.
  • BSystem prompt leakage.
  • CSupply-chain vulnerabilities β€” unvetted third-party models/datasets can be compromised or poisoned.
  • DDenial of wallet.
Answer: C. Using unvetted third-party models and datasets is a supply-chain risk: components can be compromised or poisoned upstream. Controls include vetting sources, verifying signatures/integrity, and maintaining a model bill of materials. A concerns how outputs are consumed, B concerns exposing the system prompt, and D is resource/cost abuse β€” none describe the unverified-provenance problem here.
Domain 2 · AI Operations

A deployed model has been making biased, harmful decisions for weeks, but nothing in the traditional security operations center ever alarmed. What does this MOST strongly indicate about the AI incident-response program?

  • A. Nothing; if the SOC did not alarm, no incident occurred.
  • B. The model simply needs more compute.
  • C. Detection is inadequate for AI incidents; harmful or biased model behavior must be monitored beyond classic security alerts.
  • D. The issue is purely a documentation gap in the model card.
Answer: C. AI incidents often fail silently (biased or harmful decisions trigger no traditional breach alarm), so detection must include output monitoring, drift/KRI alerts, and complaint channels, not just SOC alerts. A wrongly treats absence of an alert as absence of an incident. B is irrelevant to harmful decisions. D understates a real detection failure as mere documentation.
Domain 2 · AI Operations

After containing an AI incident in which a flawed model wrongly denied many loan applications, the auditor reviews the response. Which step is MOST important to address harm already done to affected individuals?

  • A. Add more GPUs so the next model is faster.
  • B. Delete the logs so the incident is not visible to regulators.
  • C. Remediate the affected decisions: re-adjudicate wrongly denied applications and meet disclosure/notification obligations.
  • D. Take no further action once the model is rolled back.
Answer: C. Containment stops the bleeding, but harmful outputs already issued must be remediated: re-adjudicate the wrongly denied applications and meet disclosure/notification obligations to affected people and regulators. A is irrelevant to harm done. B is unethical and likely unlawful destruction of evidence. D ignores the people already harmed and any reporting duties.
Domain 2 · AI Operations

A team repeatedly evaluates candidate models on the same holdout set, picking whichever scores highest, and then reports that top score as the model's expected real-world performance. What is wrong with this practice?

  • A. Nothing; the test set should always be used to choose the best model.
  • B. The holdout set should have been used for training instead.
  • C. The team should have skipped validation entirely.
  • D. Reusing the test set to select a model contaminates it; selection belongs on validation, and the reported test score is now optimistic.
Answer: D. You tune and select on validation; the test set is a locked holdout used once to give an honest estimate. Picking the best of many models against the same holdout overfits to it, so the reported figure overstates real-world performance. A inverts the rule. B would destroy the holdout's purpose. C is backwards: validation is exactly the set that should drive selection.
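The optimism from picking the best of many models on one holdout can be seen in a small simulation. This is a sketch under simplifying assumptions: every candidate is given the same true accuracy (0.70), and each candidate's errors are drawn independently, so any gap between the top holdout score and 0.70 is pure selection luck.

```python
import random

rng = random.Random(0)    # fixed seed so the sketch is repeatable
TRUE_ACCURACY = 0.70      # assumed: every candidate is equally good
HOLDOUT_SIZE = 200
N_CANDIDATES = 50

def measured_accuracy(n: int) -> float:
    """Accuracy observed on n examples for a model with TRUE_ACCURACY."""
    correct = sum(rng.random() < TRUE_ACCURACY for _ in range(n))
    return correct / n

# Score every candidate on the holdout and report the top score.
holdout_scores = [measured_accuracy(HOLDOUT_SIZE) for _ in range(N_CANDIDATES)]
best_reported = max(holdout_scores)

# An honest estimate for the winner needs data not used for selection.
fresh_score = measured_accuracy(10_000)

print(f"reported (best of {N_CANDIDATES}): {best_reported:.3f}")
print(f"fresh estimate for the winner: {fresh_score:.3f}")
```

The reported maximum sits above the true accuracy even though no candidate is actually better than any other, which is the contamination the answer describes.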
Domain 3 · Audit Tools & Techniques

An auditor is assigned to review three AI systems: a marketing content recommender, an internal meeting-notes summarizer, and a model that automatically declines insurance claims. With limited time, how should the auditor allocate the depth of testing across them?

  • A. Spend equal effort on all three so coverage looks balanced in the report.
  • B. Start with the summarizer because it is the simplest to understand.
  • C. Tie depth of testing to each system's risk tier: concentrate on the claim-decline model because it is an automated, consequential decision affecting individuals.
  • D. Let the business owners tell the auditor which system is most important to review.
Answer: C. Matching audit effort to the risk tier is the essence of risk-based planning: an automated, high-consequence decision warrants deep testing of bias, governance gates, and monitoring, while a marketing recommender warrants a light touch. A spreads scarce effort thinly and ignores materiality. B optimizes for convenience, not risk. D surrenders auditor judgment and independence to the auditee.
Domain 3 · Audit Tools & Techniques

During planning, management asks internal audit to "just confirm the recruitment model is ethical." No internal fairness metric, threshold, or protected-attribute list has ever been defined. What should the auditor do?

  • A. Treat the absence of agreed criteria as a reportable governance gap, and work with management to establish explicit criteria before testing, without inventing the standard the business should own.
  • B. Pick a fairness metric and threshold on the auditor's own authority and test against it.
  • C. Proceed and report a subjective opinion on whether the model "feels" ethical.
  • D. Decline the engagement because ethics cannot be audited.
Answer: A. There is no objective basis for a finding without explicit, agreed criteria; the absence of criteria is itself a governance finding, and criteria should be set with management's agreement. B impairs independence: the auditor would be authoring the very control standard the business should own, then auditing it. C produces an opinion, not an auditable conclusion. D overreacts; ethics becomes auditable once concrete criteria (metric, threshold, protected attributes) exist.
Domain 3 · Audit Tools & Techniques

An audit team is strong in control testing but has no one who can evaluate a model-robustness or fairness-metric test. The engagement covers a high-risk medical-triage model. What is the BEST course of action?

  • A. Skip the robustness and fairness testing since the team cannot perform it.
  • B. Have the data-science team that built the model run and interpret the tests for the auditors.
  • C. Engage a qualified specialist for those areas while the auditor remains responsible for scoping their work and integrating the conclusions.
  • D. Assert the model is acceptable, noting the team lacked the skills to test it.
Answer: C. The auditor need not be a data scientist, but must understand the system well enough to audit it; where skills are lacking, engaging a specialist while retaining responsibility for scope and integration preserves both competency and ownership of the conclusion. A leaves the highest risks untested on a high-risk system. B destroys independence and reliability by relying on the auditee. D is an unsupported conclusion and an audit-quality failure.
Domain 3 · Audit Tools & Techniques

An auditor reviewing the pre-deployment approval gate confirms the AI policy requires independent validation and sign-off before any model goes live, and that the workflow has those steps built in. What has the auditor established, and what is still needed?

  • A. Operating effectiveness is established; no further testing is needed.
  • B. Only design has been established; the auditor must still test operating effectiveness by confirming actual releases were validated and signed off before go-live.
  • C. Both design and operating effectiveness are established because the policy is documented.
  • D. Nothing useful; policy documents are not audit evidence.
Answer: B. Confirming the control is capable of working if it operates as intended is a test of design; it does not prove the gate actually operated for real releases over the period, which requires testing operating effectiveness against a sample of go-lives. A and C wrongly equate a documented design with proof it operated. D is too strong: policy and workflow design are valid design evidence, just not sufficient for an effectiveness conclusion.
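A test of operating effectiveness against a sample of go-lives might look like the sketch below. The release records, field names, and dates are invented for illustration; a real test would pull them from the deployment workflow's own logs.

```python
from datetime import datetime

# Hypothetical release records sampled from the deployment workflow.
releases = [
    {"model": "score-v1", "validated": datetime(2024, 1, 10),
     "signed_off": datetime(2024, 1, 12), "go_live": datetime(2024, 1, 15)},
    {"model": "score-v2", "validated": datetime(2024, 4, 2),
     "signed_off": None, "go_live": datetime(2024, 4, 3)},
    {"model": "score-v3", "validated": datetime(2024, 7, 9),
     "signed_off": datetime(2024, 7, 20), "go_live": datetime(2024, 7, 18)},
]

def gate_exceptions(records):
    """Return models that went live before validation and sign-off were done."""
    exceptions = []
    for r in records:
        steps = (r["validated"], r["signed_off"])
        # A missing step, or a step dated after go-live, is an exception.
        if any(s is None or s > r["go_live"] for s in steps):
            exceptions.append(r["model"])
    return exceptions

exceptions = gate_exceptions(releases)
```

Here the gate is well designed on paper, yet two of the three sampled releases bypassed it, which is the kind of result a design review alone cannot reveal.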
Domain 3 · Audit Tools & Techniques

An auditor draws a simple random sample of 75 of 80,000 lending decisions to assess whether a model treats a protected subgroup fairly, finds no errors, and prepares to conclude the model is fair. A reviewer objects. What is the flaw?

  • A. The sample size of 75 is too large and wasted effort.
  • B. Random sampling can never be used in an AI audit.
  • C. A simple random sample may contain almost no members of the protected subgroup, so it is not representative of the fairness risk; the sample should be stratified by subgroup and time.
  • D. The auditor should have asked the model owner whether the model is fair instead of sampling.
Answer: C. Representativeness is the heart of AI sampling: a sample that omits the very subgroup whose treatment is the risk can hide exactly the bias being investigated, so the sample must be stratified by subgroup (and time/decision type) to mirror the risks. A misreads the issue: the problem is design, not size. B is false; random sampling is legitimate when representative. D replaces evidence with the weakest source, inquiry from the auditee.
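To make the representativeness problem concrete, the sketch below assumes the protected subgroup is 2% of the 80,000 decisions. That share is not stated in the question and is chosen purely for illustration; the record layout and `stratified_sample` helper are likewise hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, per_stratum, seed=0):
    """Draw up to per_stratum records at random from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)
    sample = []
    for group in sorted(strata):
        members = strata[group]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

# Hypothetical population: 80,000 decisions, 2% from the protected subgroup.
population = [{"id": i, "group": "protected" if i % 50 == 0 else "other"}
              for i in range(80_000)]

# Expected number of protected-group records in a simple random sample of 75.
protected_total = sum(r["group"] == "protected" for r in population)
expected_in_srs = 75 * protected_total / len(population)

# A stratified draw guarantees the subgroup is actually examined.
sample = stratified_sample(population, "group", per_stratum=75)
protected_in_sample = sum(r["group"] == "protected" for r in sample)
```

Under this assumption a simple random sample of 75 would contain only one or two protected-group decisions on average, too few to support any fairness conclusion.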
Domain 3 · Audit Tools & Techniques

To corroborate a model's reported validation results, an auditor asks the team to reproduce the result. The team cannot, because the exact model version, data snapshot, and random seed used at validation were never recorded. How should the auditor treat this?

  • A. Accept the original validation report at face value, since reproduction is optional.
  • B. Note it only as a minor housekeeping observation with no control impact.
  • C. Re-run the current production model and treat any matching number as confirmation.
  • D. Raise the inability to reproduce as a control finding, because unpinned versions, data, and seeds make results unverifiable and reproducibility itself is evidence.
Answer: D. Reproducibility is itself evidence; if validation results cannot be reproduced because the version, data, or seed was not pinned, the inability is a finding about governance and controls, not a footnote. A accepts unverifiable, self-reported evidence. B understates impact β€” unverifiable validation undermines reliance on the result. C compares against the wrong artifact (the current model is not what was validated) and a coincidental match proves nothing.
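One way a team could make validation reproducible is to record a small manifest at validation time. The sketch below is illustrative only: `run_validation` is a stand-in for a real validation job, and the version tag, data, and seed are made up.

```python
import hashlib
import json
import random

def run_validation(data, seed):
    """Stand-in for a validation run: score a seeded random subsample."""
    rng = random.Random(seed)
    subsample = rng.sample(data, k=5)
    return sum(subsample) / len(subsample)

def data_digest(data):
    """SHA-256 of the serialized data snapshot."""
    return hashlib.sha256(json.dumps(data).encode()).hexdigest()

data_snapshot = list(range(100))        # hypothetical validation data
manifest = {
    "model_version": "model-1.4.2",     # hypothetical version tag
    "data_sha256": data_digest(data_snapshot),
    "seed": 1234,
}
manifest["result"] = run_validation(data_snapshot, manifest["seed"])

def reproduce(manifest, data):
    """Re-run validation only if the data matches the pinned snapshot."""
    if data_digest(data) != manifest["data_sha256"]:
        raise ValueError("data snapshot differs from the one validated")
    return run_validation(data, manifest["seed"])
```

With the version, data hash, and seed pinned, an auditor can re-run the job and expect the same number; without them, a mismatch cannot be told apart from a change in inputs.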
Domain 3 · Audit Tools & Techniques

An auditor wants to test whether a deployed scoring model still behaves as documented. The auditor independently runs a controlled set of inputs through the production model on a pinned version and compares the outputs to the documented expected results. Which evidence technique is this, and how reliable is it?

  • A. Reperformance: among the strongest evidence, because the auditor generates it directly, though it is valid only for the version and data tested.
  • B. Inquiry: the strongest evidence because it comes straight from the system owner.
  • C. Observation: reliable because the auditor watched a process occur.
  • D. Inspection: weakest evidence because it only reviews documents.
Answer: A. Independently re-scoring inputs and comparing to expected results is reperformance, the strongest technique because the auditor produces the evidence, with the caveat that it speaks only to the version and data run. B mislabels the act and inverts the hierarchy: inquiry is the weakest evidence. C describes watching, not redoing. D both mislabels the act (this is not document review) and misranks inspection.
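Reperformance of this kind reduces to running controlled inputs and comparing against documented outputs within a tolerance. In the sketch below, `production_score` is a hypothetical stand-in for a call to the pinned production model, and the test cases and tolerance are invented.

```python
import math

def production_score(income, debt):
    """Hypothetical stand-in for calling the pinned production model."""
    return round(0.6 * income / 1000 - 0.4 * debt / 1000, 2)

# Controlled inputs paired with the outputs the documentation says to expect.
documented_cases = [
    ({"income": 50_000, "debt": 10_000}, 26.0),
    ({"income": 30_000, "debt": 20_000}, 10.0),
    ({"income": 80_000, "debt": 5_000}, 45.0),  # documentation disagrees here
]

def reperform(score_fn, cases, tol=0.01):
    """Return (inputs, expected, actual) for every case outside tolerance."""
    mismatches = []
    for inputs, expected in cases:
        actual = score_fn(**inputs)
        if not math.isclose(actual, expected, abs_tol=tol):
            mismatches.append((inputs, expected, actual))
    return mismatches

mismatches = reperform(production_score, documented_cases)
```

Any mismatch is evidence about the version that was run and nothing else, so the pinned version identifier belongs in the workpapers next to the results.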
Domain 3 · Audit Tools & Techniques

An auditor must decide which single piece of evidence to rely on most heavily to conclude a fairness threshold was met. The available items are: an interview with the lead data scientist, the model card, the version-control history, and the auditor's own reperformance of the fairness calculation on a pinned snapshot. Which is most reliable?

  • A. The interview, because it gives the most context.
  • B. The model card, because it is an official internal document.
  • C. The version-control history, because it shows who changed what.
  • D. The auditor's reperformance of the fairness calculation, because auditor-generated evidence ranks highest.
Answer: D. The reliability hierarchy puts evidence the auditor produces (reperformance) above independent system logs, above auditee documents, above what someone says. A is inquiry, the weakest, and from the party that owns the model. B is self-reported and must be verified against reality. C is high-reliability for code lineage but proves what changed, not that the fairness threshold was met.
Domain 3 · Audit Tools & Techniques

An auditor needs to determine whether an automated pricing model has ever charged a protected group more than others. The decision logs hold 3 million fully digital, complete records. What is the BEST testing approach?

  • A. Take a judgmental sample of 50 high-value transactions and extrapolate.
  • B. Use CAATs to test the full population, since the data is digital and complete and full-population analysis detects even rare disparities a sample would miss.
  • C. Rely on the pricing team's internal fairness summary.
  • D. Take a simple random sample of 30 and conclude if no disparity appears.
Answer: B. Sampling is a response to populations that cannot economically be tested in full; when data is digital and complete, CAATs/analytics enable full-population testing, which gives stronger, more defensible coverage and surfaces rare disparities. A and D both sample and risk missing rare-but-material bias, exactly the risk in scope. C relies on the unverified, second-hand assertion of the auditee.
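A full-population test can be very simple once the log is available in digital form. The sketch below uses a small synthetic log in which 30 of 30,000 records carry an inflated price; the field names, group labels, and prices are hypothetical, and a real analysis would also control for legitimate pricing factors before calling a difference a disparity.

```python
from collections import defaultdict

def mean_price_by_group(records):
    """Average price per group, computed over every record."""
    totals = defaultdict(lambda: [0.0, 0])   # group -> [price sum, count]
    for rec in records:
        totals[rec["group"]][0] += rec["price"]
        totals[rec["group"]][1] += 1
    return {g: s / n for g, (s, n) in totals.items()}

# Synthetic log: every 10th record is in the protected group, and every
# 1,000th record (always a protected one) is overcharged.
log = []
for i in range(30_000):
    group = "protected" if i % 10 == 0 else "other"
    price = 150.0 if i % 1000 == 0 else 100.0
    log.append({"id": i, "group": group, "price": price})

means = mean_price_by_group(log)
overcharged = [r for r in log if r["price"] > 100.0]
```

A sample of 30 or 50 drawn from this log would almost certainly contain none of the 30 overcharged records, while the full pass surfaces all of them.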
Domain 3 · Audit Tools & Techniques

An external auditor wants to find anomalies in a client's customer dataset and pastes the data into a public generative-AI chatbot, then reports the items the tool flagged. Which problems are MOST relevant?

  • A. Only that the chatbot might be slow to respond.
  • B. Breach of client data confidentiality, plus over-reliance with no explainability or independent verification of the flags.
  • C. Only that the tool may cost money to use at scale.
  • D. Nothing; AI tool output can be reported directly as audit findings.
Answer: B. Two serious issues arise: confidential client data should never be sent to an uncontrolled external model, and the auditor cannot explain, reproduce, or verify why items were flagged, so reporting them is over-reliance on an unvalidated tool. A and C raise trivial operational concerns and miss the substance. D is wrong: the AI tool only assists, and the auditor must validate its output, protect the data, and retain accountability for the conclusion.
Domain 3 · Audit Tools & Techniques

An auditor assesses the training data behind a churn model and finds the latest customer records are eighteen months old, even though customer behavior and product mix have changed substantially since then. Which data-quality dimension is MOST directly at issue, and why does it matter?

  • A. Validity: values no longer conform to expected formats.
  • B. Completeness: entire fields are missing from the dataset.
  • C. Timeliness: stale data combined with a shifting world drives drift and degraded decisions.
  • D. Accuracy: the recorded values do not reflect reality at the time they were captured.
Answer: C. The records may be correctly formatted, complete, and accurate as captured, but they are out of date; timeliness asks whether data is current enough, and stale data in a changed environment is a classic driver of drift and degraded predictions. A concerns format/range conformance, not freshness. B concerns missing records or fields, not described here. D concerns whether values were true when captured, which is not the issue when the data is simply old.
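A timeliness check is usually a comparison between the newest record's date and an agreed freshness limit. The sketch below assumes a six-month limit and invented dates; the actual threshold is a criterion management would need to define.

```python
from datetime import date

def months_between(older: date, newer: date) -> int:
    """Whole calendar months from older to newer (day of month ignored)."""
    return (newer.year - older.year) * 12 + (newer.month - older.month)

def is_stale(latest_record: date, as_of: date, max_age_months: int) -> bool:
    """True if the newest record is older than the agreed freshness limit."""
    return months_between(latest_record, as_of) > max_age_months

# Hypothetical dates: newest training record vs. the audit date.
latest_record = date(2023, 1, 15)
audit_date = date(2024, 7, 15)
age_months = months_between(latest_record, audit_date)
stale = is_stale(latest_record, audit_date, max_age_months=6)
```

The check passes or fails independently of format, completeness, or accuracy, which is why timeliness is assessed as its own dimension.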
Domain 3 · Audit Tools & Techniques

A draft finding reads in full: "The fraud model has had no fairness re-test since launch 14 months ago. We recommend scheduling re-tests." Management asks why this should rank above other items. According to the 4 Cs structure, what is the finding missing?

  • A. The condition: the finding never states what was observed.
  • B. The criteria, cause, and effect: the benchmark, the root cause, and the business impact that would let management weigh severity.
  • C. The recommendation: there is no suggested action.
  • D. The names of the staff responsible for the lapse.
Answer: B. A well-formed finding gives condition, criteria, cause, and effect plus a recommendation; this draft has the condition (no re-test in 14 months) and a recommendation, but omits the criteria (e.g., policy requiring annual/per-retrain re-tests), the root cause (e.g., no owner assigned), and the effect (e.g., undetected drift causing biased declines with regulatory and reputational exposure), all of which drive severity. A and C are present in the text. D would blame individuals rather than address the control, which is poor practice.
Domain 3 · Audit Tools & Techniques

While auditing an AI governance program, the auditor finds the bias-testing process is weak and offers to author the new bias-testing standard so the team "doesn't have to." The same auditor is scheduled to provide assurance over that standard next year. What is the core problem?

  • A. No problem; helping management is always appropriate and improves the controls.
  • B. The only issue is that authoring a standard is outside the auditor's technical competence.
  • C. Authoring the standard makes the auditor the owner of the control, impairing independence and creating a self-review threat when they later audit it; the auditor should recommend content but let management own it.
  • D. The problem is purely that the work was offered for free rather than billed.
Answer: C. The auditor recommends, evaluates, and reports, while management designs, owns, and operates controls; writing the standard makes the auditor the control owner and creates a self-review threat that impairs objectivity over next year's assurance. A ignores the independence boundary. B misdiagnoses it as a competency issue rather than an independence issue. D is irrelevant to objectivity. If management insists on help, it becomes a disclosed advisory engagement and a different auditor should perform the later assurance review.

✅
How to read your score

Aim to consistently clear the mid-70s percent across all domains before exam day, and make sure you can explain why each wrong option is wrong, because that's the skill the scenario format rewards. Then loop back to the study plan to shore up weak areas.