Martech Monitoring

Author: Martech Monitoring Team

  • How to Set Up SFMC Automation Error Alerts That Actually Work

    Why Most SFMC Automation Alerts Fail Before They Start

    You’ve set up email notifications in Automation Studio. You feel covered. Then one Monday morning, you discover a nightly data sync has been failing silently for four days — and your alert emails were sitting unread in a shared inbox alongside dozens of other routine notifications nobody checks anymore.

    This is the real problem with SFMC automation alerts: it’s not that the tools aren’t there, it’s that most teams configure them once and assume the job is done. Effective alerting is a system, not a checkbox. This guide walks you through building that system properly — from native SFMC configuration to routing strategies that ensure the right person sees the right error at the right time.

    Understanding What SFMC Actually Gives You Natively

    Automation Studio provides built-in error notification settings at two levels: the account level and the individual automation level. Both matter, and many teams configure only one.

    Account-Level Notification Settings

    In Setup, under Automation Studio Settings, you can define a default notification email address that receives alerts whenever any automation in your account encounters an error. This is a useful catch-all, but it’s also where alert fatigue begins if you’re not careful. Every skipped record warning, every benign timeout retry, every low-severity issue lands in the same inbox as your critical payment data imports.

    Navigate here via: Setup → Platform Tools → Apps → Automation Studio → Settings. The field labeled Error Notifications accepts a single email address or distribution list. Use a distribution list — never a single person’s inbox — so coverage survives vacations and role changes.

    Automation-Level Notifications

    Inside each individual automation, the Notifications tab lets you configure email alerts specific to that workflow. You can set recipients for both errors and skipped records separately. This granularity is powerful and underused. A high-stakes revenue reporting automation should notify your senior data engineer directly. A low-priority preference center sync can notify a shared team alias. Map your notification recipients to the business criticality of the automation, not just who built it.

    The Four Failure Modes You Need to Alert On

    Native SFMC notifications cover activity-level errors, but there are failure patterns that won’t trigger any built-in alert at all. Know all four:

    • Hard activity errors: A SQL query fails, an import file is missing, a script activity throws an exception. These are caught by native notifications and are the most visible failures.
    • Silent skipped records: An import activity processes but skips rows due to validation errors. The automation reports as “complete” — no error notification fires. Your data is silently incomplete.
    • Automation never starts: A schedule drift, a UI save error, or a dependency issue means the automation simply doesn’t run. No error is thrown because nothing executed. This is the ghost failure.
    • Partial completion: Step 1 of 5 completes, Step 2 errors and stops. Downstream activities never run. Native alerts catch the error on Step 2 but won’t tell you what downstream impact occurred.

    For failures in categories 2, 3, and 4, you need monitoring logic beyond what SFMC provides out of the box — which is why teams increasingly rely on external tools like Martech Monitoring to watch for automations that don’t run on schedule, not just automations that error when they do.

    Building an Alert Routing Strategy That Scales

    The goal is simple: the right person gets paged for a P1 failure, and nobody gets paged at 2am for a warning-level skipped record report. Here’s how to structure it.

    Tier Your Automations by Business Impact

    Before touching any notification settings, classify every automation in your instance into three tiers:

    • Tier 1 – Critical: Revenue-impacting, compliance-related, or feeding downstream systems (e.g., transactional sends, CRM syncs, suppression list imports). Failure requires immediate response.
    • Tier 2 – Important: Operational but recoverable within a business day (e.g., lead nurture programs, daily reporting). Failure should surface within hours.
    • Tier 3 – Low Priority: Nice-to-have automations where failure has minimal immediate business impact. Weekly digest, preference data aggregation, etc.

    Document this classification in a shared spreadsheet or your team’s wiki. It becomes the foundation for every alerting decision you make.

    Route Alerts by Tier, Not by Sender

    Once tiers are defined, configure notification recipients accordingly:

    • Tier 1 automations: Alert a distribution list that triggers a PagerDuty or Opsgenie incident, or at minimum routes to a Slack channel that has an on-call rotation. If your team doesn’t have an on-call process for marketing data, this is the moment to build one.
    • Tier 2 automations: Alert a team email alias that someone reviews every morning. Consider a dedicated sfmc-automation-alerts@yourcompany.com address that feeds into a monitored ticketing queue.
    • Tier 3 automations: Log the error but don’t alert urgently. A weekly digest review of Tier 3 failures is often sufficient.
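    In middleware, this tier-to-channel mapping can live in a small routing table so it is defined once and versioned. A plain JavaScript sketch (the channel names, targets, and the `routeAlert` helper are all illustrative, not an SFMC feature):

```javascript
// Tier-to-channel routing table. Channel names and targets are
// placeholders for your real PagerDuty/Slack/email integrations.
const ROUTES = {
  1: { channel: 'pagerduty', target: 'sfmc-oncall', page: true },
  2: { channel: 'email', target: 'sfmc-automation-alerts@yourcompany.com', page: false },
  3: { channel: 'digest', target: 'weekly-tier3-review', page: false },
};

// Resolve where an alert for a given automation should be delivered.
function routeAlert(automationName, tier) {
  const route = ROUTES[tier];
  if (!route) throw new Error('Unknown tier: ' + tier);
  return { automation: automationName, ...route };
}
```

    Keeping the table in code (or config) means a tier change is a one-line edit rather than a hunt through individual automation notification screens.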

    Defeating Alert Fatigue: The Practical Approach

    Alert fatigue is the silent killer of monitoring programs. When every notification looks the same — regardless of severity — humans learn to ignore them all. Here are specific tactics to prevent this in SFMC environments.

    Suppress Noise at the Source

    Audit your Automation Studio error logs for the last 30 days. Identify recurring errors that your team has already assessed as non-actionable. Common culprits include:

    • FTP import automations that error on weekends when source files aren’t generated (expected behavior, not a real failure)
    • SQL queries that return zero rows and are configured to error on empty results unnecessarily
    • Script activities with overly broad try/catch blocks that escalate warnings as errors

    Fix these at the automation level first. Change SQL activities to handle empty results gracefully. Adjust schedule windows to match when source data is actually available. Every non-actionable alert you eliminate is one fewer cry-wolf notification eroding your team’s trust in the system.

    Use Meaningful Subject Lines

    SFMC’s native notification emails have generic subject lines. When these arrive in a shared inbox, no one knows at a glance whether to escalate or ignore. If you’re routing alerts through a middleware tool or webhook (see below), customize the subject line to include:

    • Automation name
    • Failure tier (e.g., [CRITICAL] or [LOW])
    • Error type in plain language

    Example: [CRITICAL] Revenue Data Import – Import Activity Failed – Missing Source File tells the recipient everything they need to triage before opening the email.
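    If your middleware composes these subjects, a small helper keeps the format consistent. A plain JavaScript sketch (the function and its exact format are suggestions, not an SFMC convention):

```javascript
// Build a triage-friendly subject: [TIER] Name - Error Type - Detail.
// The tier labels mirror the tiering scheme in this guide.
function buildSubject(tierLabel, automationName, errorType, detail) {
  const parts = [automationName, errorType];
  if (detail) parts.push(detail);
  return '[' + tierLabel.toUpperCase() + '] ' + parts.join(' - ');
}
```

    For example, `buildSubject('critical', 'Revenue Data Import', 'Import Activity Failed', 'Missing Source File')` yields `[CRITICAL] Revenue Data Import - Import Activity Failed - Missing Source File`.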

    Extending Alerts Beyond Native SFMC: The API Approach

    For teams that need richer alerting logic, the SFMC REST API opens up significant options. You can use a Script Activity at the end of each automation to make an API call that logs completion status to an external system or triggers a conditional alert.

    <script runat="server">
      Platform.Load("Core", "1.1.1");

      // Script Activity - Automation Heartbeat to External Webhook
      var endpoint = 'https://your-monitoring-endpoint.com/sfmc/heartbeat';
      var payload = {
        automationName: 'Nightly Revenue Sync',
        status: 'complete',
        timestamp: Platform.Function.SystemDateToLocalDate(Now()),
        environment: 'Production'
      };

      var req = new Script.Util.HttpRequest(endpoint);
      req.emptyContentHandling = 0;
      req.retries = 2;            // property is "retries", not "retryCount"
      req.continueOnError = true; // don't let a webhook hiccup error the automation
      req.encoding = 'UTF-8';
      req.method = 'POST';
      req.contentType = 'application/json';
      req.postData = Stringify(payload);

      var resp = req.send();
    </script>
    

    Place this Script Activity as the final step in your Tier 1 automations. If the webhook doesn’t receive a heartbeat within the expected window, your external monitoring layer fires an alert. This catches the ghost failure scenario — automations that never start — which SFMC’s native tools cannot detect on their own.
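    On the monitoring side, the freshness check is just elapsed time against the expected window plus some grace for schedule jitter. A plain JavaScript sketch (this is the external receiver, not SFMC code; the window and grace values are illustrative):

```javascript
// Decide whether a missing heartbeat should fire an alert.
// lastHeartbeat: Date of the most recent heartbeat (null if never seen).
// expectedEveryMs: how often the automation should report in.
// graceMs: tolerance for normal schedule jitter.
function isHeartbeatOverdue(lastHeartbeat, expectedEveryMs, graceMs, now) {
  if (!lastHeartbeat) return true; // never reported: the ghost-failure case
  const elapsed = now.getTime() - lastHeartbeat.getTime();
  return elapsed > expectedEveryMs + graceMs;
}

const DAY = 24 * 60 * 60 * 1000;
const HOUR = 60 * 60 * 1000;
```

    A nightly automation checked with `isHeartbeatOverdue(lastSeen, DAY, HOUR, new Date())` alerts once it is more than an hour past its usual slot.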

    Platforms like Martech Monitoring are purpose-built for this pattern, monitoring automation run schedules and surfacing missed executions automatically without requiring you to build and maintain custom webhook infrastructure.

    Operationalizing Your Alert System: What Good Looks Like

    A mature SFMC alerting setup has these characteristics:

    • Every Tier 1 automation has a documented expected run window — not just an error alert, but a “this should have run by X time” check.
    • Alert recipients are role-based distribution lists, not individual email addresses. When someone leaves, the alert coverage doesn’t leave with them.
    • There’s a monthly alert audit where the team reviews which alerts fired, which were acted on, and which were noise. Anything generating recurring noise gets investigated and fixed.
    • Runbooks exist for Tier 1 failures. When an alert fires at 11pm, the on-call person shouldn’t have to guess what to do. A short runbook per automation — what the failure likely means, what to check first, who to escalate to — dramatically reduces mean time to resolution.
    • Alerts are tested deliberately. At least once a quarter, intentionally break a Tier 1 automation in a sandboxed way to verify the full alert chain fires correctly and reaches the right people.

    Conclusion

    Effective SFMC automation alerting is less about enabling a notification email and more about building a system your team actually trusts and responds to. That means tiering your automations, routing alerts with purpose, eliminating noise at the source, and monitoring for failures that SFMC’s native tools simply can’t see — like automations that never run.

    The teams that get this right catch failures before they impact customer sends or downstream data quality. The teams that don’t are still discovering four-day-old failures on Monday mornings.

    Want to automate your SFMC monitoring without building custom infrastructure? Check out Martech Monitoring — built specifically to give SFMC teams visibility into automation health, missed runs, and deliverability issues before they become business problems.

  • The Complete SFMC Health Check Checklist: Daily, Weekly, and Monthly

    Whether you’re an SFMC admin managing a handful of automations or an operations team responsible for hundreds across multiple business units, you need a repeatable process to ensure nothing falls through the cracks. This checklist covers the checks every SFMC team should be running — daily, weekly, and monthly.

    Bookmark this page. You’ll use it more than you think.

    Daily Checks

    Automation Studio

    • Review all automation statuses — Look for any automation showing “Error” or “Stopped” status. Filter by “Last Run” to catch automations that should have run but didn’t.
    • Verify file drop automations fired — These are the most commonly missed failures because “no file” means “no run” which means “no error.”
    • Check query activity results — Confirm that SQL queries returned expected row counts. A query that normally returns 10,000 rows returning 0 is a red flag.
    • Review import activity logs — Look for partial imports, rejected rows, or unexpected zero-row imports.

    Journey Builder

    • Verify active journeys are injecting — Check that journeys showing “Running” are actually processing contacts, not just sitting idle.
    • Review journey error logs — Look for contacts stuck in wait steps longer than expected or falling to error paths.
    • Check entry source populations — Ensure data extensions feeding journeys are being populated as expected.

    Sends and Deliverability

    • Review triggered send statuses — Confirm all critical triggered sends (welcome, transactional, password reset) are active.
    • Check bounce rates — A sudden spike in bounces can indicate a list quality issue or a blocklisting event.
    • Monitor send volumes — Verify daily send counts are within expected ranges.

    Weekly Checks

    Data Health

    • Audit data extension row counts — Compare current counts to the previous week. Significant deviations warrant investigation.
    • Review subscriber growth/churn — Track new subscribers vs. unsubscribes to spot trends early.
    • Check data extension retention policies — Ensure DEs with retention policies are properly purging old records.

    Performance Trending

    • Review automation run times — An automation that used to complete in 5 minutes now taking 30 minutes is an early warning sign of data growth issues.
    • Check API usage — Monitor REST and SOAP API call volumes against your allocation.
    • Review error trends — Are the same automations failing repeatedly? Address root causes, not just symptoms.
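    The run-time check in the first bullet is easy to automate: keep recent durations and flag a run that blows past a multiple of the median. A plain JavaScript sketch (the 3x factor used in the example below is an illustrative threshold, not an SFMC setting):

```javascript
// Flag a run whose duration exceeds `factor` times the median of
// recent run times (all durations in minutes).
function isRuntimeRegression(recentMinutes, latestMinutes, factor) {
  const sorted = [...recentMinutes].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
  return latestMinutes > median * factor;
}
```

    With a history of [5, 5, 6, 5, 7] minutes, a 30-minute run trips a 3x check while an 8-minute run does not.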

    Security and Access

    • Review user login activity — Check for failed login attempts or logins from unexpected locations.
    • Audit API integration credentials — Verify that all active integrations are authorized and credentials haven’t expired.

    Monthly Checks

    Capacity Planning

    • Review send limit utilization — If you’re consistently using 80%+ of your send limit, plan for an upgrade before you hit the ceiling.
    • Audit automation inventory — Identify and deactivate automations that are no longer needed. Zombie automations waste processing resources and make monitoring harder.
    • Review data extension storage — SFMC has storage limits. Track growth trends to avoid hitting them unexpectedly.

    Documentation and Process

    • Update automation documentation — Ensure your team knows what each automation does, who owns it, and what to do when it fails.
    • Review alert routing — Are alerts going to the right people? Has the team changed since alerts were configured?
    • Test disaster recovery procedures — Can you restore a critical automation or data extension from backup if needed?

    Automating This Checklist

    If you’re reading this and thinking “there’s no way my team has time to do all of this manually” — you’re right. That’s exactly the problem this checklist exposes.

    Most teams realistically cover about 20% of these checks. The other 80% are either done sporadically or not at all. That’s how silent failures happen.

    The solution is automation. Tools like Martech Monitoring can handle the daily and weekly checks automatically — verifying automation statuses, tracking data extension health, monitoring journey injection rates, and alerting your team the moment something deviates from expected behavior.

    Our free tier covers up to 5 automations with daily checks — enough to protect your most critical workflows while you evaluate whether full monitoring is right for your team.

    Download the Checklist

    We’ll be publishing a downloadable PDF version of this checklist soon. Drop us a note if you’d like us to send it to you when it’s ready.


    Take Action on Your SFMC Monitoring

    Download the free SFMC Monitoring Checklist — 27 critical items to monitor, with recommended frequencies and alert thresholds for each.

    Or watch the product demo to see how Martech Monitoring automates all of this for you — catching Journey failures, Automation errors, and Data Extension issues in minutes, not days.

    Start monitoring free — no credit card required.

  • 7 Common SFMC Automation Failures and How to Prevent Them

    You open Automation Studio on Monday morning and see it: a red “Error” status on an automation that was supposed to run all weekend. Customer welcome emails haven’t sent since Friday. Three days of new signups are sitting in a data extension, waiting for a journey that never triggered.

    Sound familiar? You’re not alone. Here are the 7 most common SFMC automation failures we see, and exactly how to prevent each one.

    1. File Drop Automations That Never Fire

    What happens: A file drop automation is waiting for a file from an external system (CRM, data warehouse, FTP). The file never arrives, so the automation never starts. No error is logged because technically nothing “failed” — it just never ran.

    How to prevent it: Monitor for absence of activity, not just errors. If a file drop automation that normally runs daily hasn’t triggered in 24 hours, you need an alert. This is where most manual monitoring fails — you can’t check for something that didn’t happen unless you’re tracking expected schedules.

    2. SQL Query Errors in Query Activities

    What happens: A query activity references a field that was renamed, a data extension that was deleted, or uses syntax that worked in a previous SFMC release but now throws an error. The automation runs, the query fails, and downstream activities operate on stale or empty data.

    How to prevent it: Test queries after any schema change. Monitor data extension row counts after query activities run — if a DE that normally has 50,000 rows suddenly has 0, the query likely failed. Automated monitoring can flag these anomalies instantly.
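    That row-count guard reduces to comparing the post-run count against a baseline. A plain JavaScript sketch (where the baseline comes from, and the floor fraction, are assumptions you would tune per data extension):

```javascript
// Flag a data extension whose row count fell far below its baseline,
// which usually means the upstream query or import silently failed.
// minFraction is a tunable floor, e.g. 0.1 = alert below 10% of normal.
function rowCountDropped(baselineRows, currentRows, minFraction) {
  if (baselineRows === 0) return false; // nothing to compare against yet
  return currentRows < baselineRows * minFraction;
}
```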

    3. Expired SFTP or API Credentials

    What happens: File transfer activities fail because SFTP credentials expired or were rotated by IT. This is especially common in enterprise environments where security policies mandate credential rotation every 60-90 days.

    How to prevent it: Maintain a credential rotation calendar and test connections proactively. When monitoring detects a file transfer failure, the alert should include enough context to immediately identify credential expiry as the likely cause.

    4. Data Extension Schema Mismatches

    What happens: An import activity fails because the source file has a new column, a changed column order, or a data type mismatch. This often happens when upstream systems change their export format without notifying the SFMC team.

    How to prevent it: Set up validation checks that verify imported row counts match expectations. Monitor for partial imports — an automation might “succeed” but only import 100 of 10,000 expected rows because of a schema issue in row 101.

    5. Journey Builder Entry Source Depletion

    What happens: A journey’s entry source data extension stops receiving new records. The journey shows “Running” but isn’t injecting anyone. From the Journey Builder UI, everything looks fine — you only notice when campaign metrics drop to zero.

    How to prevent it: Monitor journey injection rates alongside entry source data extension populations. If the entry DE’s row count flatlines, or if the journey’s injection count drops below historical averages, trigger an alert. This requires looking at the system holistically, not just at individual components.

    6. Send Throttling and Deliverability Hits

    What happens: An automation triggers a send to a larger-than-expected audience (e.g., a segmentation query returns too many results due to a missing WHERE clause). This blows through your hourly send limit, causes throttling on subsequent sends, and can damage your sender reputation with ISPs.

    How to prevent it: Monitor send volumes against expected ranges. Flag any send where the audience size exceeds the historical average by more than 2x. This simple check can prevent accidental mass sends and the deliverability problems they cause.
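    The 2x rule described above is a one-line comparison against the historical average. A plain JavaScript sketch (the history window and factor are assumptions):

```javascript
// Flag a send whose audience exceeds `factor` times the historical
// average, e.g. a segmentation query that lost its WHERE clause.
function audienceSpike(historicalSizes, currentSize, factor) {
  const avg =
    historicalSizes.reduce((sum, n) => sum + n, 0) / historicalSizes.length;
  return currentSize > avg * factor;
}
```

    Running this check before releasing a send (rather than after) is what turns it from a post-mortem metric into an actual safeguard.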

    7. Triggered Send Definition Deactivation

    What happens: A triggered send gets paused or deactivated — sometimes by a team member, sometimes by SFMC itself due to excessive errors. Journeys and automations that reference this triggered send continue to run, but no emails actually send. SFMC doesn’t alert on this.

    How to prevent it: Regularly audit triggered send statuses. If a triggered send that handles critical communications (welcome emails, order confirmations, password resets) goes inactive, you need to know within minutes, not days.

    The Common Thread

    Notice the pattern? Most of these failures are silent. SFMC won’t page you at 2 AM because a journey stopped injecting contacts. It won’t send a Slack message when an automation hasn’t run in 24 hours. It just… continues, quietly broken.

    That’s why purpose-built monitoring exists. Martech Monitoring checks your automations, journeys, and data extensions on a schedule, and alerts you the moment something deviates from expected behavior. You can start monitoring for free — no credit card required.

    Because the only thing worse than an SFMC failure is an SFMC failure nobody knows about.



  • Why Your SFMC Automations Are Failing Silently (And How to Fix It)

    If you manage Salesforce Marketing Cloud (SFMC), you’ve probably experienced it: an automation silently fails, emails stop sending, and nobody notices until a stakeholder asks why campaign numbers tanked. By then, the damage is done — missed revenue, angry customers, and a fire drill to figure out what went wrong.

    The truth is, SFMC gives you very little warning when things break. Beyond basic error notification emails, there’s no built-in alerting for stalled journeys, automations that never run, or data extension anomalies. You’re expected to check manually — and in a platform running dozens of automations across multiple business units, that’s a full-time job nobody signed up for.

    What Can Go Wrong in SFMC?

    More than you’d think. Here are the most common silent failures we see across SFMC instances:

    1. Automation Failures

    Automations can fail for dozens of reasons — expired credentials, schema mismatches, file drops that never arrived, SQL query errors. SFMC logs these failures, but unless someone checks Automation Studio daily, they go unnoticed.

    2. Journey Builder Stalls

    Journeys can stop injecting contacts without throwing a visible error. A misconfigured entry source, a depleted data extension, or a deactivated triggered send can all cause a journey to silently stop working while still showing a “Running” status.

    3. Data Extension Anomalies

    When a data extension that normally receives 10,000 records per day suddenly receives 500 — or 50,000 — something has changed upstream. Without monitoring, you won’t catch this until the downstream effects cascade through your campaigns.

    4. Send Limit Approaching

    SFMC enforces send limits per business unit. If you’re approaching your limit and don’t know it, sends will start failing with cryptic errors that are difficult to debug in the moment.

    Why Manual Monitoring Doesn’t Scale

    Most teams handle this with some combination of:

    • A shared spreadsheet of “things to check”
    • A junior admin logging into Automation Studio each morning
    • Hoping someone notices when numbers look off in reports

    This works when you have 5 automations. It falls apart at 20. At 50+, it’s impossible — especially across multiple business units.

    What Proactive Monitoring Looks Like

    Proactive SFMC monitoring means you get an alert before a stakeholder asks “why didn’t that email go out?” Here’s what an effective monitoring setup should do:

    • Check automation status on a schedule — every hour, or every 15 minutes for critical automations
    • Alert on failures immediately — via email, Slack, or Teams, depending on your team’s workflow
    • Track data extension row counts — flag anomalies based on historical patterns
    • Monitor journey health — verify that journeys are actively injecting and processing contacts
    • Log everything — maintain a historical record for troubleshooting and audit purposes

    Build vs. Buy

    Some teams build internal monitoring using WSProxy, SSJS, and webhook integrations. This works, but requires ongoing developer time to maintain, and often breaks when SFMC updates its API or when the developer who built it leaves the team.
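    Whichever route you take, the core logic is the same: pull automation summaries (internal builds commonly hit the REST Automation API, e.g. `/automation/v1/automations`, though treat that endpoint as an assumption since it isn’t part of the formally documented public contract) and filter for anything needing attention. The filtering step might look like this in plain JavaScript (the object shape is assumed; verify against your instance’s actual response):

```javascript
// Given automation summaries, return those needing attention:
// errored, stopped, never run, or not run within `staleAfterMs`.
function needsAttention(automations, staleAfterMs, now) {
  return automations.filter(function (a) {
    if (a.status === 'Error' || a.status === 'Stopped') return true;
    if (!a.lastRunTime) return true; // never ran: the ghost-failure case
    return now - Date.parse(a.lastRunTime) > staleAfterMs;
  });
}
```

    Note the third condition: checking staleness as well as status is what catches the automation that never started, which a status-only check misses.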

    Purpose-built monitoring tools like Martech Monitoring provide this out of the box — automated checks, intelligent alerts, and historical trending — without the maintenance burden. We offer a free tier that monitors up to 5 automations with daily checks, so you can see the value before committing.

    The Bottom Line

    SFMC is a powerful platform, but it’s not designed to tell you when things go wrong. If your team is spending hours each week manually checking automation status, or worse, finding out about failures from stakeholders, it’s time to automate your monitoring.

    The cost of undetected failures — in missed revenue, damaged sender reputation, and team stress — far exceeds the cost of monitoring. Start with the basics: know what’s running, know when it breaks, and fix it before anyone else notices.

