Practice Questions
Scenario-style questions with full explanations, just like the real exam.
There's exactly one best answer per question: pick what an auditor should do FIRST or BEST.
An auditor reviewing a newly deployed credit-scoring model finds that the data science team built and validated it, but no business or risk owner is formally accountable for its decisions. What should the auditor do FIRST?
- A. Recommend the data science team be made accountable, since they built the model.
- B. Raise a finding that accountability for the model has not been assigned, and recommend a named model owner in the business be established.
- C. Disable the model until a fairness assessment is completed.
- D. Conclude the model is acceptable because it was independently validated.
An organization is building an AI system that assigns risk scores used to screen job applicants. Under the EU AI Act, how should this system most likely be classified, and what does that imply?
- A. Minimal risk: no specific obligations apply.
- B. Limited risk: only transparency notices are required.
- C. Prohibited: employment-related AI is banned outright.
- D. High risk: it triggers obligations such as risk management, data governance, human oversight, and conformity assessment.
Management asks internal audit to help define which AI use cases the company should pursue and to design the model approval workflow. The CAE is concerned. What is the BEST response?
- A. Decline to design or own the workflow, as that would impair independence; offer to advise and later audit the controls management establishes.
- B. Accept fully; audit's involvement guarantees the controls will be strong.
- C. Accept, but have a different auditor sign the final report.
- D. Refuse any involvement with AI governance whatsoever.
A company wants a single framework to set up an auditable, certifiable management system for governing AI across its lifecycle. Which is the most appropriate primary reference?
- A. ISO/IEC 42001, the AI management system standard.
- B. The EU AI Act.
- C. The OWASP Top 10 for LLMs.
- D. GDPR.
An auditor maps the organization's AI program to the NIST AI RMF and finds strong activity in Map, Measure, and Manage, but little in the Govern function. What is the most significant implication?
- A. None; Govern is optional once the other three functions are in place.
- B. Govern is the cross-cutting function that establishes culture, accountability, and policies; its weakness undermines the consistency and oversight of the other three.
- C. It only matters for generative AI systems.
- D. The organization should drop NIST and adopt a different framework.
A marketing team plans to deploy an AI tool that profiles customers using sensitive personal data to predict health-related interests. What governance step is MOST important before deployment?
- A. A Data Protection Impact Assessment (DPIA) to evaluate privacy risk and necessity.
- B. A penetration test of the hosting environment.
- C. A marketing A/B test to confirm the tool increases conversions.
- D. A press release describing the new capability.
During a governance review, an auditor finds the AI policy prohibits "unacceptable bias" but defines no metrics, thresholds, or owner for measuring fairness. How should the auditor characterize this?
- A. Acceptable; having any policy statement is sufficient.
- B. A minor wording issue with no control impact.
- C. A control design weakness: the policy is not operable because it lacks measurable criteria and accountability.
- D. Out of scope, because fairness is a Domain 2 operational concern.
A bank's AI risk register lists model-performance risks but omits third-party and supply-chain risk for the foundation model it licenses from a vendor. What is the auditor's BEST recommendation?
- A. No action; vendor risk is the vendor's responsibility, not the bank's.
- B. Replace the vendor model with an in-house model immediately.
- C. Remove model-performance risks since the vendor handles performance.
- D. Expand the risk register and due-diligence process to cover third-party/foundation-model risks, since the bank retains accountability for outcomes.
A generative AI chatbot is deployed to answer customer questions, but users are not told they are interacting with an AI system. Which concern is MOST directly raised?
- A. Model drift.
- B. Transparency / disclosure obligations toward affected individuals.
- C. Insufficient compute capacity.
- D. Lack of a rollback plan.
An organization claims its AI governance is mature because it has an AI ethics committee. The auditor wants evidence the committee is effective. Which evidence is MOST persuasive?
- A. The committee's charter and member list.
- B. A slide deck announcing the committee's creation.
- C. Minutes showing the committee reviewed specific use cases and made documented decisions that changed outcomes.
- D. An email from the CEO endorsing the committee.
A fraud-detection model that performed well at launch is now flagging far fewer transactions, and fraud losses are rising. Production input data patterns have shifted since training. What is the MOST likely cause?
- A. Overfitting during training.
- B. Model/data drift: the live data distribution has diverged from the training data.
- C. A prompt-injection attack.
- D. Insufficient training-time hyperparameter tuning.
An auditor wants to understand a deployed model's intended use, training data, performance across subgroups, and known limitations in one document. Which artifact should they request?
- A. The data dictionary.
- B. The model card.
- C. The network architecture diagram.
- D. The incident response runbook.
An LLM-based assistant retrieves snippets from a customer-facing knowledge base and includes them in its prompt. An attacker plants text in a public document that instructs the model to ignore its rules and reveal internal data. What threat is this?
- A. Data poisoning of the training set.
- B. Model inversion.
- C. Concept drift.
- D. (Indirect) prompt injection.
A medical-triage classifier reports 98% accuracy, and the team calls it excellent. The condition it screens for occurs in about 2% of patients. What is the auditor's BEST concern?
- A. 98% accuracy proves the model is safe to deploy.
- B. Accuracy should have been reported as an F1 of 98%.
- C. With a rare positive class, accuracy is misleading; recall/precision on the positive class matter far more.
- D. The model needs a larger learning rate.
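To see the arithmetic behind this scenario, here is a minimal sketch. It assumes a hypothetical cohort of 1,000 patients at the 2% prevalence stated in the question, and a degenerate "model" that never predicts the condition; both are illustrative, not the team's actual classifier.

```python
# Hypothetical cohort: 1,000 patients at the 2% prevalence from the scenario.
total = 1000
positives = 20
negatives = total - positives

# A degenerate "model" that predicts "no condition" for every patient.
tp, fn = 0, positives   # every true case is missed
tn, fp = negatives, 0   # every healthy patient is classified correctly

accuracy = (tp + tn) / total
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2%} recall={recall:.2%}")
```

Under these assumptions the headline accuracy matches the reported 98% even though no positive case is ever detected, which is why recall and precision on the positive class are the numbers to ask for.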
A data scientist pushed an updated model straight to production over the weekend without an approval or a record of what changed, because "it scored better offline." What control weakness is MOST evident?
- A. Inadequate change management for models: missing approval, versioning, and documentation.
- B. Insufficient training data.
- C. Lack of model explainability.
- D. Excessive human oversight.
A model is trained on data scraped from a source an attacker can edit. The attacker inserts mislabeled examples so the model learns a hidden, harmful behavior. What is this attack called?
- A. Prompt injection.
- B. Membership inference.
- C. Data poisoning.
- D. Denial of service.
An auditor reviews the monitoring setup for a deployed recommendation model and finds dashboards for infrastructure uptime and latency only. What is the MOST important gap?
- A. Nothing; uptime and latency fully cover model health.
- B. There is no monitoring of model quality: prediction accuracy, drift, and data-quality signals.
- C. The dashboards refresh too frequently.
- D. The model lacks a public model card.
A production model starts producing clearly harmful outputs to customers. The team has no defined procedure for who decides to take it offline or how to communicate. What should the auditor recommend FIRST?
- A. Establish an AI incident-response process with defined roles, escalation, containment (including a kill switch), and communication.
- B. Retrain the model on more data.
- C. Add more GPUs to improve performance.
- D. Publish a model card.
An auditor examines an image classifier's training data and finds it was labeled by a single annotator with no review, and labeling guidelines were never written down. What risk is MOST directly created?
- A. Label quality risk: inconsistent or biased labels propagate into model behavior with no way to verify correctness.
- B. Network latency risk.
- C. Overspending on compute.
- D. Excessive model explainability.
A team reports their model achieves 99% accuracy on the test set, but performance in production is much worse. The test set was created by sampling from the same cleaned file used for training. What is the MOST likely problem?
- A. The production hardware is slower.
- B. Data leakage / non-representative test data: the test set overlaps with or is too similar to training data, inflating offline scores.
- C. The model is underfitting.
- D. The model is too explainable.
In reviewing an MLOps pipeline, an auditor wants assurance that any production prediction can be traced back to the exact model version and data that produced it. Which capability is MOST important?
- A. Auto-scaling of inference servers.
- B. A faster GPU.
- C. A larger marketing budget.
- D. Reproducibility: versioning of models, data, and code plus logging that links predictions to the version that made them.
A hiring model shows equal overall accuracy for two groups, but its false-negative rate (qualified candidates rejected) is twice as high for one group. The team says the model is fair because accuracy is equal. What is the auditor's BEST position?
- A. Agree; equal accuracy is sufficient evidence of fairness.
- B. Disagree: equal accuracy can mask disparate error rates; the unequal false-negative rates indicate a fairness concern needing assessment.
- C. Disagree, and demand the model be permanently banned.
- D. Agree, but recommend a faster model.
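The scenario's claim that equal accuracy can coexist with unequal false-negative rates can be checked with a small worked example. The confusion-matrix counts below are invented for illustration (100 candidates per group), and `rates` is a hypothetical helper, not part of any library.

```python
def rates(tp, fn, tn, fp):
    """Return (accuracy, false-negative rate) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    fnr = fn / (tp + fn)  # share of qualified candidates who were rejected
    return accuracy, fnr

# Hypothetical counts: 50 qualified and 50 unqualified candidates per group.
group_a = rates(tp=40, fn=10, tn=40, fp=10)
group_b = rates(tp=30, fn=20, tn=50, fp=0)

print("group A accuracy, FNR:", group_a)
print("group B accuracy, FNR:", group_b)
```

With these counts both groups have the same overall accuracy, yet group B's qualified candidates are rejected at twice the rate of group A's, because errors of different types offset each other in the accuracy figure.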
A company deploys a third-party foundation model via API and builds features on top. The vendor silently updates the model, and downstream behavior changes. What control would have MOST helped detect this?
- A. A bigger firewall.
- B. Disabling all logging to reduce noise.
- C. Increasing the model's temperature setting.
- D. Ongoing output monitoring with a regression/benchmark test suite run against the API over time.
An auditor finds that a model's input-validation layer accepts free-text that is passed directly into a system prompt used to query a database. Which combined risk is MOST relevant?
- A. Only slower response times.
- B. Prompt injection leading to unauthorized data access: untrusted input is mixed with trusted instructions and privileged actions.
- C. Reduced model accuracy on the test set.
- D. Higher cloud storage costs.
An auditor concludes a model's controls are effective based solely on a verbal assurance from the lead data scientist that "testing is thorough." What is the primary problem with this conclusion?
- A. Nothing; management inquiry is the strongest form of evidence.
- B. The evidence is neither sufficient nor appropriate; inquiry alone, without corroboration, doesn't support an effectiveness conclusion.
- C. The auditor should have asked two data scientists instead of one.
- D. The conclusion is fine as long as it's documented.
An auditor needs to test whether a model-approval control operated for every release over the past year, where there were only 18 releases. What is the BEST testing approach?
- A. Statistical attribute sampling of 30 items.
- B. Test a single release and extrapolate.
- C. Test the full population of 18 releases, since it is small enough to examine entirely.
- D. Rely on the data scientist's summary spreadsheet only.
While planning an AI audit, the auditor has limited time and must focus the engagement. Which approach to scoping is MOST appropriate?
- A. Use a risk-based approach: prioritize the highest-risk models and controls (e.g., high-impact, customer-facing, or regulated use cases).
- B. Test every control equally regardless of risk.
- C. Audit whichever systems are easiest to access.
- D. Let the data science team choose what gets audited.
An auditor wants to independently verify that a deployed model produces the documented outputs for a set of known inputs. Which technique provides the strongest evidence?
- A. Reading the model's documentation.
- B. Asking the developer whether the outputs are correct.
- C. Re-performance: running the auditor's own test cases through the model and comparing to expected results.
- D. Reviewing last year's audit report.
In drafting the audit report, the auditor must present a finding that a high-risk model lacks bias testing. What makes the finding MOST useful to management?
- A. Stating only that "the model is biased."
- B. Listing the names of the data scientists responsible.
- C. Including the full source code in the report.
- D. Presenting condition, criteria, cause, effect (risk/impact), and a clear, actionable recommendation.
An auditor plans to use a data-analytics script to test 100% of model decisions for policy violations. Before relying on the results, what should the auditor do?
- A. Nothing; analytics output is always reliable.
- B. Have the auditee write and run the script for them.
- C. Validate the completeness and accuracy of the input data and confirm the analytics logic is correct.
- D. Reduce the test to a sample of 10 decisions to save time.
An auditor learns that several business units have independently subscribed to generative-AI tools, and no one can produce a complete list of the AI systems in use across the company. What should the auditor recommend FIRST?
- A. Establish and maintain a central AI inventory/registry capturing each system's owner, purpose, data sources, and risk tier.
- B. Immediately block all generative-AI tools at the network firewall.
- C. Run a bias test on each tool the units have mentioned.
- D. Conclude that AI risk is well managed because business units are adopting innovation.
A retailer plans to repurpose customer purchase data, originally collected to fulfil orders, to train a new product-recommendation model. The data team argues, "It's already our data." What is the auditor's PRIMARY concern?
- A. The storage cost of retaining the historical purchase data.
- B. Whether the recommendation model will be accurate enough.
- C. Purpose limitation and lawful basis: data collected for one purpose generally can't be repurposed for training without a valid basis and a DPIA.
- D. Whether the data is stored in an encrypted database.
An insurer adopts a third-party generative-AI tool to draft claims correspondence. Management says, "The vendor is SOC 2 certified and contractually responsible for compliance, so no internal review is needed." What should the auditor recommend?
- A. Accept management's position; the SOC 2 report transfers accountability to the vendor.
- B. Ban the tool outright because vendor AI can never be controlled.
- C. Perform an internal vendor risk assessment and establish internal controls (output review, data-use limits, inventory entry, named accountability).
- D. Rely solely on the contract's indemnity clause as the mitigating control.
During a governance review the auditor finds that the data-science team both builds models and performs the only validation before deployment. Management calls this efficient because "the builders understand the model best." What is the MOST significant concern?
- A. The models may take longer to deploy than necessary.
- B. The team may not be writing enough unit tests.
- C. Validation is happening too early in the lifecycle.
- D. There is no independent challenge: the first line is validating its own work, collapsing the three lines of defense.
A multinational deploying an AI hiring tool in the EU tells the auditor it has adopted the NIST AI RMF and is "therefore compliant." What should the auditor point out?
- A. NIST AI RMF adoption is sufficient evidence of legal compliance worldwide.
- B. NIST AI RMF is a voluntary framework, not a legal compliance vehicle; an EU hiring tool is high-risk under the EU AI Act and also engages GDPR, so a regulatory mapping is needed.
- C. The organization should drop NIST and rely only on ISO/IEC 42001.
- D. Hiring tools are minimal risk, so no further action is required.
An auditor reviews an AI risk register and finds many controls listed, but no record showing who accepted the residual risk for a high-impact lending model after controls were applied. How should this be characterized?
- A. Acceptable, because controls reduce risk and that is enough.
- B. A finding: residual risk must be formally accepted by an accountable owner within the organization's risk appetite.
- C. Acceptable, because inherent risk is what matters, not residual risk.
- D. Out of scope, because risk acceptance is a Domain 3 audit-technique topic.
A team proposes using a machine-learning model to automate a fully deterministic regulatory calculation that must be perfectly explainable by law. What is the auditor's BEST observation?
- A. The model should be approved because AI improves efficiency everywhere.
- B. The model is fine as long as it reaches high accuracy on a test set.
- C. AI may be a poor fit: a deterministic, legally explainable problem is better served by rules; using opaque ML introduces unnecessary explainability and compliance risk.
- D. Explainability is irrelevant because the calculation is automated.
An organization's AI policy lists "fairness" and "transparency" as principles but contains no approval gates, no defined roles, and no prohibited uses. How should the auditor evaluate the policy?
- A. Adequate; stating principles is the core purpose of a policy.
- B. A design weakness: without acceptable/prohibited uses, approval gates, and roles, the policy is aspirational shelfware that cannot be operated or enforced.
- C. Acceptable, because principles automatically imply the necessary controls.
- D. A reason to delete the policy and start using AI without one.
A company wants to certify a management system for governing AI across its lifecycle, with defined scope, objectives, internal audits, and management review, comparable to how ISO 27001 works for security. Which standard is the most appropriate primary reference?
- A. ISO/IEC 23894, AI risk management guidance.
- B. ISO/IEC 42001, the AI management system (AIMS) standard.
- C. The EU AI Act.
- D. The OECD AI Principles.
An auditor wants to distinguish ISO/IEC 23894 from ISO/IEC 42001 for a report. Which statement is correct?
- A. Both are binding regulations enforced by data-protection authorities.
- B. 23894 is a certifiable management system and 42001 is informal guidance.
- C. 42001 is the certifiable AI management-system standard, while 23894 provides AI risk-management guidance aligning ISO 31000 principles to AI.
- D. They are identical and the numbers are interchangeable.
An auditor maps a company's AI program to the NIST AI RMF and must place "tracking fairness metrics and robustness test results on monitoring dashboards." Which function does this primarily belong to?
- A. GOVERN.
- B. MAP.
- C. MEASURE.
- D. MANAGE.
An organization treats AI governance as a single approval gate at launch, with no further checkpoints. An auditor reviewing the governance design sees what weakness?
- A. None; a launch gate is the only governance point that matters.
- B. Governance should span the full lifecycle (ideation → design → development → validation → deployment → monitoring → retirement) with checkpoints at each stage, not a one-time gate.
- C. The launch gate should be removed to speed up delivery.
- D. Governance is only needed after an incident occurs.
A hospital deploys a clinical-decision-support model and instructs clinicians to "always follow the model's recommendation to save time." Over months, clinicians stop questioning outputs even when they look wrong. Which AI-specific risk is MOST directly created?
- A. Data poisoning.
- B. Model drift.
- C. Automation bias: humans over-trust the system and stop challenging it, hollowing out the human-oversight control.
- D. Intellectual-property infringement.
An auditor must determine the EU AI Act classification of a system that performs real-time social scoring of citizens by a public authority. What is the correct classification and implication?
- A. High risk: permitted with conformity assessment and human oversight.
- B. Limited risk: only a transparency notice is required.
- C. Unacceptable risk: social scoring of this kind is prohibited.
- D. Minimal risk: largely unregulated, voluntary codes only.
A marketing team wants to deploy AI that profiles individuals using sensitive personal data to infer health-related interests. What governance artifact is MOST important to produce before processing begins?
- A. A model card describing intended use.
- B. A press release announcing the capability.
- C. A Data Protection Impact Assessment (DPIA) evaluating necessity, proportionality, and risk to data subjects.
- D. A penetration test of the hosting infrastructure.
An auditor reviewing training-data governance cannot find any documentation of where the data originated or how it was transformed before reaching the model. Which combined control gap is MOST relevant?
- A. Missing data provenance and lineage: without them the organization cannot prove the data was lawfully obtained, suitable, or uncontaminated.
- B. Insufficient GPU capacity for training.
- C. Too much model explainability.
- D. Excessive human oversight of the model.
An organization weighing whether to build an in-house model or license a foundation model from a vendor asks the auditor what governance factor matters most in the build-vs-buy decision. What is the BEST answer?
- A. Buying always transfers accountability to the vendor, so buying is governance-superior.
- B. Whichever option has the lowest license cost is the right governance choice.
- C. Either way the deploying organization retains accountability, so buying requires vendor due diligence, contractual audit rights, model documentation, and an exit/continuity plan.
- D. Building in-house removes all third-party and supply-chain risk permanently.
An AI initiative was launched by a single department with no link to enterprise strategy, no documented business objective, no success metric, and no fallback. How should the auditor characterize this?
- A. A model of agile innovation that should be replicated company-wide.
- B. Shadow AI lacking strategic alignment: a finding, since every initiative should trace to a business objective, measurable benefit, owner, and appropriate sign-off.
- C. Acceptable, because business value is self-evident once AI is involved.
- D. Only a Domain 2 operational concern, not relevant to governance.
An organization claims its AI governance is mature because it has an AI ethics committee. The auditor wants evidence the committee is operating effectively. Which evidence is MOST persuasive?
- A. The committee's charter and member roster.
- B. A launch announcement slide deck.
- C. An email from the CEO endorsing the committee.
- D. Minutes showing the committee reviewed specific high-risk use cases and made documented decisions that changed outcomes.
Management asks the CAE to have internal audit design and own the AI model-approval workflow and decide which AI use cases the company pursues. What is the BEST response to preserve independence?
- A. Decline to design or own the workflow, since that creates a self-review threat; offer advisory input and later provide independent assurance over the controls management establishes.
- B. Accept fully, because audit involvement guarantees the controls will be strong.
- C. Accept, but have a different auditor sign the eventual assurance report.
- D. Refuse any contact with AI governance whatsoever.
A team builds a churn model by first scaling all features (computing the mean and standard deviation) across the entire dataset, and only then splitting into train and test sets. Offline accuracy is excellent but production results lag. What is the MOST likely flaw?
- A. The model is underfitting because scaling removed useful signal.
- B. Concept drift between training and deployment.
- C. The learning rate is set too high.
- D. Data leakage from preprocessing: fitting the scaler on the full dataset lets test-set statistics influence training.
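The ordering problem in this scenario (compute scaling statistics, then split) can be shown with a short sketch using only the standard library. The feature values and the split point are hypothetical; the last row plays the role of the held-out test set.

```python
import statistics

# Hypothetical single feature; the final row is the held-out test row.
values = [1.0, 2.0, 3.0, 4.0, 100.0]
train, test = values[:4], values[4:]

# Leaky ordering: statistics are computed on every row, test row included,
# so information about the test set shapes how the training data is scaled.
leaky_mean = statistics.mean(values)
leaky_std = statistics.pstdev(values)

# Sound ordering: split first, fit the scaler on training rows only,
# then apply those same training statistics to both splits.
mean = statistics.mean(train)
std = statistics.pstdev(train)
train_scaled = [(v - mean) / std for v in train]
test_scaled = [(v - mean) / std for v in test]

print("leaky mean:", leaky_mean, "train-only mean:", mean)
```

The two means differ sharply here because the test row is an outlier; in the sound ordering the test row is scaled with training statistics, which is also what a production system would have to do with unseen data.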
An auditor reviews a facial-analysis model trained almost entirely on images of one demographic. It scores well on the held-out test set, which was drawn from the same source. What is the MOST significant concern about the evaluation?
- A. The test set was too large.
- B. The model used too few hyperparameters.
- C. The test set is not representative of the deployment population, so strong metrics may not generalize to underrepresented groups.
- D. The model card was published too early.
A document the auditor wants describes a dataset's motivation, how it was collected, who is represented in it, and its recommended and discouraged uses. Which artifact is this?
- A. A model card.
- B. A datasheet for the dataset.
- C. A confusion matrix.
- D. A service-level agreement.
An auditor finds that a single engineer trains models, approves them, and deploys them straight to production with no second party involved. What control principle is MOST clearly violated?
- A. Least privilege on the inference API.
- B. Data minimization.
- C. Segregation of duties: the same person should not build, approve, and promote a model.
- D. Differential privacy.
Before fully replacing a deployed model, a team routes a small percentage of live traffic to the new version and watches its real metrics, ready to ramp up only if they hold. Which deployment pattern is this?
- A. Canary deployment.
- B. Shadow deployment.
- C. Big-bang cutover.
- D. Data poisoning.
A new model runs on full production traffic, but its predictions are only logged and compared to the live model, never shown to users or acted upon. What is the PRIMARY benefit of this approach?
- A. It evaluates the new model on real traffic with no user-facing risk, since its outputs are not used.
- B. It eliminates the need for a test set.
- C. It guarantees the model can never drift.
- D. It removes the need for a rollback plan.
An LLM customer-service feature behaves differently this week. Investigation shows the only change was an edit to the system prompt, made directly in production with no review, test, or record. How should the auditor classify this?
- A. Not a change; prompts are configuration, not code, so change management does not apply.
- B. An uncontrolled change: prompt/system-message edits alter behavior and must go through change management with testing, approval, and rollback.
- C. A data-quality issue in the training set.
- D. Concept drift caused by changing user behavior.
A team retrains its model whenever someone "feels it is getting stale," with no defined criteria. The auditor wants retraining to be controlled rather than ad-hoc. What is the BEST recommendation?
- A. Stop retraining entirely to keep the model stable.
- B. Retrain continuously on every new record in real time.
- C. Let each engineer decide independently when to retrain.
- D. Define explicit retraining triggers (scheduled, drift-based, or event-based) tied to monitored thresholds and an approval gate.
During an incident, a team wants to revert a misbehaving model to the previous known-good version, but they cannot, because old versions were overwritten and never stored. What earlier control failure MOST directly caused this?
- A. Lack of adversarial testing.
- B. No model versioning/registry, which removes rollback capability needed for containment.
- C. Too much human-in-the-loop oversight.
- D. Excessive logging slowing the system.
An auditor learns that models are routinely tuned and validated directly against the live production database rather than in separate environments. What is the MOST significant risk?
- A. Faster experimentation, which is purely beneficial.
- B. No environment promotion (dev → test → prod), risking untested changes affecting live decisions and data integrity.
- C. The model card will become outdated.
- D. Increased cloud storage costs only.
A recommendation model's input feature distributions have shifted noticeably (a new customer segment now dominates traffic), but the relationship between features and the correct outcome appears unchanged. Which phenomenon is this?
- A. Data drift (covariate shift): the input distribution moved while the target rule stayed the same.
- B. Concept drift: the input-to-output relationship changed.
- C. Data leakage.
- D. Model extraction.
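A shift in input distributions like the one described can be quantified without looking at outcomes at all. The sketch below uses the population stability index (PSI), one common drift statistic that the question itself does not name; the segment shares and the 0.25 alert level are assumptions for illustration (0.25 is a frequently quoted rule of thumb, not a standard).

```python
import math

def psi(expected, actual):
    """Population stability index over matching bins of proportions.

    Both inputs are lists of bin shares that each sum to 1 and contain
    no zeros (a zero share would make the logarithm undefined).
    """
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

training_share = [0.50, 0.30, 0.20]  # hypothetical segment mix at training time
live_share = [0.20, 0.30, 0.50]      # hypothetical mix seen in production

score = psi(training_share, live_share)
print(f"PSI = {score:.3f}")
print("drift alert" if score > 0.25 else "stable")  # 0.25: assumed threshold
```

PSI only compares input distributions, so it can flag the kind of shift in this scenario but says nothing about whether the relationship between inputs and the correct outcome has changed; that requires labelled outcomes.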
A predictive-policing model directs patrols to areas it flags as high-risk; those areas then generate more recorded incidents, which feed back into the next training cycle and reinforce the same flags. What dynamic is this?
- A. Adversarial robustness.
- B. Hyperparameter tuning.
- C. A feedback loop: the model's own outputs become future training data and amplify its existing bias.
- D. Differential privacy.
A bank states its AI loan decisions are "human-reviewed," but logs show reviewers approve 99.8% of recommendations in about three seconds each. How should the auditor characterize the oversight control?
- A. Effective; a human is in the loop, which satisfies the requirement.
- B. Operating in form but not substance: the pattern indicates rubber-stamping, so there is no meaningful human oversight.
- C. Irrelevant, because human oversight is never required for AI.
- D. Effective, provided the model has high accuracy.
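The log pattern in this scenario (99.8% approvals at about three seconds each) is the kind of evidence an auditor could derive from review records directly. The sketch below assumes a hypothetical log of `(approved, seconds_spent)` pairs and illustrative thresholds; neither comes from the source, and real thresholds belong in the control design.

```python
import statistics

# Hypothetical review log: (reviewer_approved, seconds_spent) per decision.
reviews = [(True, 3)] * 499 + [(False, 40)]

approval_rate = sum(approved for approved, _ in reviews) / len(reviews)
median_seconds = statistics.median(seconds for _, seconds in reviews)

# Illustrative thresholds only (assumptions, not a standard).
rubber_stamp_suspected = approval_rate > 0.95 and median_seconds < 10

print(f"approval rate={approval_rate:.1%}, median review time={median_seconds}s")
print("rubber-stamping suspected:", rubber_stamp_suspected)
```

A flag like this is an indicator to investigate rather than proof; corroborating it with override examples and reviewer interviews would strengthen the finding.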
An auditor wants an early-warning indicator that a deployed model is becoming risky before customers are harmed. Which metric is MOST appropriate to track as a key risk indicator (KRI)?
- A. The marketing campaign's click-through rate.
- B. The number of GPUs in the cluster.
- C. The size of the source code repository.
- D. Drift scores and subgroup error rates trending against defined thresholds.
A spam classifier is tuned so that almost nothing is ever sent to the spam folder, ensuring legitimate mail is never wrongly blocked. Which metric has been prioritized, and what is the trade-off?
- A. Recall was prioritized, at the cost of precision.
- B. AUC was prioritized, eliminating all errors.
- C. Precision was prioritized to avoid false positives, at the cost of recall (more spam slips through).
- D. Accuracy was prioritized, which removes the need for any other metric.
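The trade-off in this scenario can be made concrete by moving a decision threshold over a handful of scored messages. The scores, labels, and thresholds below are invented for illustration, and `precision_recall` is a hypothetical helper written for this sketch.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall for the spam class at a given score threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(predicted, labels))
    fp = sum(p and y == 0 for p, y in zip(predicted, labels))
    fn = sum((not p) and y == 1 for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0  # nothing flagged: no false alarms
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical spam scores and true labels (1 = spam, 0 = legitimate).
scores = [0.95, 0.90, 0.70, 0.60, 0.40, 0.20]
labels = [1, 1, 1, 0, 1, 0]

strict = precision_recall(scores, labels, threshold=0.9)   # flags very little
lenient = precision_recall(scores, labels, threshold=0.5)  # flags more

print("strict  (precision, recall):", strict)
print("lenient (precision, recall):", lenient)
```

Raising the threshold means fewer messages are sent to the spam folder: legitimate mail stops being blocked, but more spam is missed, which is the pattern the question describes.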
An auditor wants to confirm a model's individual predictions can be attributed to specific input features so reviewers can sanity-check the reasoning. Which technique is MOST relevant?
- A. Canary deployment.
- B. Differential privacy.
- C. Explainability methods such as SHAP or LIME that attribute a prediction to its input features.
- D. Blue/green deployment.
Before launching a public LLM assistant, a team assembles people to deliberately attempt to make it produce disallowed content, leak its system prompt, and bypass guardrails. What is this activity called?
- A. Hyperparameter tuning.
- B. Red-teaming: adversarial testing to surface ways the model can be made to misbehave.
- C. Cross-validation.
- D. Feature engineering.
An LLM-powered research assistant cites a court case that does not exist and states it with full confidence. What testing technique is MOST directly aimed at catching this class of failure?
- A. Load testing.
- B. Penetration testing of the host server.
- C. Hallucination / groundedness testing: checking that outputs stick to provided, verifiable facts.
- D. Database index optimization.
A team validates only overall accuracy before deploying a high-stakes hiring model and performs no comparison of outcomes across protected groups. What is the auditor's BEST finding?
- A. No finding; overall accuracy is sufficient for any model.
- B. Fairness/bias testing was not performed; for a high-stakes model this is a finding regardless of how accurate it is overall.
- C. The model should be deleted immediately by the auditor.
- D. The only issue is that the model needs a faster inference server.
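One simple form of the group comparison this question refers to is a selection-rate check. The sketch below is only an illustration: the group labels, the counts, and the helper name `selection_rates` are all invented, and what ratio counts as acceptable is a criterion the organization would have to define.

```python
from collections import defaultdict

def selection_rates(records):
    """Return {group: share of positive outcomes} for (group, selected) pairs."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in records:
        totals[group] += 1
        selected[group] += int(was_selected)
    return {g: selected[g] / totals[g] for g in totals}

# Hypothetical outcomes: group labels and counts are invented for illustration.
records = (
    [("group_a", True)] * 60 + [("group_a", False)] * 40
    + [("group_b", True)] * 30 + [("group_b", False)] * 70
)
rates = selection_rates(records)
# Ratio of the lowest selection rate to the highest (1.0 means parity).
impact_ratio = min(rates.values()) / max(rates.values())
print(rates, round(impact_ratio, 2))
```

A single overall accuracy figure would hide this gap entirely, which is why the comparison has to be run per group.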
Researchers add tiny, almost imperceptible perturbations to images that cause a vision model to confidently misclassify a stop sign as a speed-limit sign. What attack class is this?
- A. Adversarial examples: crafted input perturbations that fool the model at inference time.
- B. Membership inference.
- C. Data poisoning.
- D. Model extraction.
By repeatedly querying a model's API and analyzing its responses, an attacker is able to recover information indicating whether a particular individual's record was part of the training data. Which threat is this?
- A. Prompt injection.
- B. Membership inference: a privacy attack revealing whether a record was in the training set.
- C. Concept drift.
- D. Excessive agency.
A competitor sends millions of carefully chosen queries to a company's prediction API and uses the input-output pairs to train a near-copy of the model. Which threat is this, and which control MOST directly counters it?
- A. Data poisoning; fixed by cleaning the training set.
- B. Model extraction/stealing; countered by rate limiting, query monitoring, authentication, and abuse detection.
- C. Model inversion; countered by adding more GPUs.
- D. Prompt injection; countered by a longer system prompt.
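Rate limiting, one of the controls named in this question, can be sketched as a per-client sliding window. This is a minimal in-memory illustration, not a production design: the class name `SlidingWindowLimiter`, the quota, and the client ID are hypothetical, and a real deployment would also need authentication, persistent storage, and query monitoring.

```python
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most max_calls per client within any window_seconds span."""

    def __init__(self, max_calls: int, window_seconds: float) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = defaultdict(deque)  # client_id -> recent call timestamps

    def allow(self, client_id: str, now: float) -> bool:
        calls = self._calls[client_id]
        # Drop timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False  # over quota: reject, and ideally log for abuse review
        calls.append(now)
        return True

# Hypothetical quota: 3 calls per 60 seconds for a single client.
limiter = SlidingWindowLimiter(max_calls=3, window_seconds=60.0)
decisions = [limiter.allow("client-1", now=t) for t in (0.0, 1.0, 2.0, 3.0, 61.0)]
print(decisions)  # [True, True, True, False, True]
```

Passing `now` in explicitly keeps the sketch deterministic; rejected calls are the ones an abuse-detection process would want to review.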
An LLM agent is granted broad permissions: it can read any customer record and issue refunds without scoping or approval. Even with good prompt guardrails, the auditor is concerned. Which OWASP-LLM risk is MOST relevant?
- A. Unbounded consumption.
- B. Excessive agency: the model has too much autonomy and too many permissions, magnifying the impact of any manipulation.
- C. Misinformation.
- D. Vector and embedding weaknesses.
A company integrates a pre-trained open-source model and a third-party dataset downloaded from a public repository, with no verification of their origin or integrity. Which OWASP-LLM category does this MOST directly raise?
- A. Improper output handling.
- B. System prompt leakage.
- C. Supply-chain vulnerabilities: unvetted third-party models/datasets can be compromised or poisoned.
- D. Denial of wallet.
A deployed model has been making biased, harmful decisions for weeks, but nothing in the traditional security operations center ever alarmed. What does this MOST strongly indicate about the AI incident-response program?
- A. Nothing; if the SOC did not alarm, no incident occurred.
- B. The model simply needs more compute.
- C. Detection is inadequate for AI incidents; harmful or biased model behavior must be monitored beyond classic security alerts.
- D. The issue is purely a documentation gap in the model card.
After containing an AI incident in which a flawed model wrongly denied many loan applications, the auditor reviews the response. Which step is MOST important to address harm already done to affected individuals?
- A. Add more GPUs so the next model is faster.
- B. Delete the logs so the incident is not visible to regulators.
- C. Remediate the affected decisions: re-adjudicate wrongly denied applications and meet disclosure/notification obligations.
- D. Take no further action once the model is rolled back.
A team repeatedly evaluates candidate models on the same holdout set, picking whichever scores highest, and then reports that top score as the model's expected real-world performance. What is wrong with this practice?
- A. Nothing; the test set should always be used to choose the best model.
- B. The holdout set should have been used for training instead.
- C. The team should have skipped validation entirely.
- D. Reusing the test set to select a model contaminates it; selection belongs on validation, and the reported test score is now optimistic.
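Why the top holdout score overstates real-world performance can be seen in a small simulation. The sketch below rests on stated assumptions: every candidate has the same true accuracy of 0.70, scores are drawn independently (on a shared holdout set they would in practice be correlated), and all constants are invented for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_ACCURACY = 0.70   # assumed: every candidate is equally good
HOLDOUT_SIZE = 200     # assumed holdout size
N_CANDIDATES = 50      # assumed number of models compared

def observed_accuracy(n_examples: int) -> float:
    """Simulate scoring a model with TRUE_ACCURACY on n_examples cases."""
    hits = sum(random.random() < TRUE_ACCURACY for _ in range(n_examples))
    return hits / n_examples

# Score every candidate on the holdout set and keep the top score.
holdout_scores = [observed_accuracy(HOLDOUT_SIZE) for _ in range(N_CANDIDATES)]
best_holdout_score = max(holdout_scores)

# Score the "winner" once on fresh, untouched data.
fresh_score = observed_accuracy(10_000)

print(f"best holdout score: {best_holdout_score:.3f}")
print(f"score on fresh data: {fresh_score:.3f}")
```

Because the winner is chosen for having the luckiest holdout score, its reported number sits above its true accuracy, while the score on fresh data falls back toward 0.70.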
An auditor is assigned to review three AI systems: a marketing content recommender, an internal meeting-notes summarizer, and a model that automatically declines insurance claims. With limited time, how should the auditor allocate the depth of testing across them?
- A. Spend equal effort on all three so coverage looks balanced in the report.
- B. Start with the summarizer because it is the simplest to understand.
- C. Tie depth of testing to each system's risk tier: concentrate on the claim-decline model because it is an automated, consequential decision affecting individuals.
- D. Let the business owners tell the auditor which system is most important to review.
During planning, management asks internal audit to "just confirm the recruitment model is ethical." No internal fairness metric, threshold, or protected-attribute list has ever been defined. What should the auditor do?
- A. Treat the absence of agreed criteria as a reportable governance gap, and work with management to establish explicit criteria before testing, without inventing the standard the business should own.
- B. Pick a fairness metric and threshold on the auditor's own authority and test against it.
- C. Proceed and report a subjective opinion on whether the model "feels" ethical.
- D. Decline the engagement because ethics cannot be audited.
An audit team is strong in control testing but has no one who can evaluate a model-robustness or fairness-metric test. The engagement covers a high-risk medical-triage model. What is the BEST course of action?
- A. Skip the robustness and fairness testing since the team cannot perform it.
- B. Have the data-science team that built the model run and interpret the tests for the auditors.
- C. Engage a qualified specialist for those areas while the auditor remains responsible for scoping their work and integrating the conclusions.
- D. Assert the model is acceptable, noting the team lacked the skills to test it.
An auditor reviewing the pre-deployment approval gate confirms the AI policy requires independent validation and sign-off before any model goes live, and that the workflow has those steps built in. What has the auditor established, and what is still needed?
- A. Operating effectiveness is established; no further testing is needed.
- B. Only design has been established; the auditor must still test operating effectiveness by confirming actual releases were validated and signed off before go-live.
- C. Both design and operating effectiveness are established because the policy is documented.
- D. Nothing useful; policy documents are not audit evidence.
An auditor draws a simple random sample of 75 of 80,000 lending decisions to assess whether a model treats a protected subgroup fairly, finds no errors, and prepares to conclude the model is fair. A reviewer objects. What is the flaw?
- A. The sample size of 75 is too large and wasted effort.
- B. Random sampling can never be used in an AI audit.
- C. A simple random sample may contain almost no members of the protected subgroup, so it is not representative of the fairness risk; the sample should be stratified by subgroup and time.
- D. The auditor should have asked the model owner whether the model is fair instead of sampling.
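The sampling problem in this question can be quantified. The population (80,000) and sample size (75) come from the question; the subgroup share of 2% is purely an assumption made for illustration, since the question does not state it.

```python
from math import comb

POPULATION = 80_000        # from the question
SAMPLE_SIZE = 75           # from the question
SUBGROUP_SHARE = 0.02      # assumption: not stated in the question

subgroup_size = round(POPULATION * SUBGROUP_SHARE)
expected_in_sample = SAMPLE_SIZE * subgroup_size / POPULATION

# Hypergeometric probability that the sample contains no subgroup members.
p_none = comb(POPULATION - subgroup_size, SAMPLE_SIZE) / comb(POPULATION, SAMPLE_SIZE)

print(f"expected subgroup members in sample: {expected_in_sample:.1f}")
print(f"chance of zero subgroup members: {p_none:.0%}")
```

Under that assumed share, the sample would be expected to hold only one or two subgroup decisions, and roughly one sample in five would hold none, so "no errors found" says little about how the subgroup is treated.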
To corroborate a model's reported validation results, an auditor asks the team to reproduce the result. The team cannot, because the exact model version, data snapshot, and random seed used at validation were never recorded. How should the auditor treat this?
- A. Accept the original validation report at face value, since reproduction is optional.
- B. Note it only as a minor housekeeping observation with no control impact.
- C. Re-run the current production model and treat any matching number as confirmation.
- D. Raise the inability to reproduce as a control finding, because unpinned versions, data, and seeds make results unverifiable and reproducibility itself is evidence.
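The three items missing in this scenario (model version, data snapshot, random seed) are cheap to record at validation time. The following is a minimal sketch of such a record; the function name `build_run_manifest`, the version string, and the sample data are hypothetical and do not describe any particular tool.

```python
import hashlib
import json

def build_run_manifest(model_version: str, data_bytes: bytes, seed: int) -> dict:
    """Capture the three items needed to reproduce a validation run."""
    return {
        "model_version": model_version,
        # A content hash pins the exact data snapshot that was used.
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "random_seed": seed,
    }

# Hypothetical values for illustration only.
snapshot = b"applicant_id,income,label\n1,52000,0\n2,87000,1\n"
manifest = build_run_manifest("credit-model-1.4.2", snapshot, seed=1234)
print(json.dumps(manifest, indent=2))
```

Stored alongside the validation report, a record like this lets an auditor confirm that a later re-run used the same model, data, and seed.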
An auditor wants to test whether a deployed scoring model still behaves as documented. The auditor independently runs a controlled set of inputs through the production model on a pinned version and compares the outputs to the documented expected results. Which evidence technique is this, and how reliable is it?
- A. Reperformance: among the strongest evidence, because the auditor generates it directly, though it is valid only for the version and data tested.
- B. Inquiry: the strongest evidence because it comes straight from the system owner.
- C. Observation: reliable because the auditor watched a process occur.
- D. Inspection: weakest evidence because it only reviews documents.
An auditor must decide which single piece of evidence to rely on most heavily to conclude a fairness threshold was met. The available items are: an interview with the lead data scientist, the model card, the version-control history, and the auditor's own reperformance of the fairness calculation on a pinned snapshot. Which is most reliable?
- A. The interview, because it gives the most context.
- B. The model card, because it is an official internal document.
- C. The version-control history, because it shows who changed what.
- D. The auditor's reperformance of the fairness calculation, because auditor-generated evidence ranks highest.
An auditor needs to determine whether an automated pricing model has ever charged a protected group more than others. The decision logs hold 3 million fully digital, complete records. What is the BEST testing approach?
- A. Take a judgmental sample of 50 high-value transactions and extrapolate.
- B. Use CAATs to test the full population, since the data is digital and complete and full-population analysis detects even rare disparities a sample would miss.
- C. Rely on the pricing team's internal fairness summary.
- D. Take a simple random sample of 30 and conclude if no disparity appears.
An external auditor wants to find anomalies in a client's customer dataset and pastes the data into a public generative-AI chatbot, then reports the items the tool flagged. Which problems are MOST relevant?
- A. Only that the chatbot might be slow to respond.
- B. Breach of client data confidentiality, plus over-reliance with no explainability or independent verification of the flags.
- C. Only that the tool may cost money to use at scale.
- D. Nothing; AI tool output can be reported directly as audit findings.
An auditor assesses the training data behind a churn model and finds the latest customer records are eighteen months old, even though customer behavior and product mix have changed substantially since then. Which data-quality dimension is MOST directly at issue, and why does it matter?
- A. Validity: values no longer conform to expected formats.
- B. Completeness: entire fields are missing from the dataset.
- C. Timeliness: stale data combined with a shifting world drives drift and degraded decisions.
- D. Accuracy: the recorded values do not reflect reality at the time they were captured.
A draft finding reads in full: "The fraud model has had no fairness re-test since launch 14 months ago. We recommend scheduling re-tests." Management asks why this should rank above other items. According to the 4 Cs structure, what is the finding missing?
- A. The condition: the finding never states what was observed.
- B. The criteria, cause, and effect: the benchmark, the root cause, and the business impact that would let management weigh severity.
- C. The recommendation: there is no suggested action.
- D. The names of the staff responsible for the lapse.
While auditing an AI governance program, the auditor finds the bias-testing process is weak and offers to author the new bias-testing standard so the team "doesn't have to." The same auditor is scheduled to provide assurance over that standard next year. What is the core problem?
- A. No problem; helping management is always appropriate and improves the controls.
- B. The only issue is that authoring a standard is outside the auditor's technical competence.
- C. Authoring the standard makes the auditor the owner of the control, impairing independence and creating a self-review threat when they later audit it; the auditor should recommend content but let management own it.
- D. The problem is purely that the work was offered for free rather than billed.
Aim to consistently clear the mid-70s percent across all domains before exam day, and make sure you can explain why each wrong option is wrong; that's the skill the scenario format rewards. Then loop back to the study plan to shore up weak areas.