Martech Monitoring

Blog

  • The Complete SFMC Health Check Checklist: Daily, Weekly, and Monthly

    Whether you’re an SFMC admin managing a handful of automations or an operations team responsible for hundreds across multiple business units, you need a repeatable process to ensure nothing falls through the cracks. This checklist covers the checks every SFMC team should be running — daily, weekly, and monthly.

    Bookmark this page. You’ll use it more than you think.

    Daily Checks

    Automation Studio

    • Review all automation statuses — Look for any automation showing “Error” or “Stopped” status. Filter by “Last Run” to catch automations that should have run but didn’t (a scripted version of this check is sketched after this list).
    • Verify file drop automations fired — These are the most commonly missed failures, because “no file” means “no run,” which means “no error.”
    • Check query activity results — Confirm that SQL queries returned expected row counts. A query that normally returns 10,000 rows returning 0 is a red flag.
    • Review import activity logs — Look for partial imports, rejected rows, or unexpected zero-row imports.
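
    If you want to script this sweep rather than eyeball Automation Studio every morning, the sketch below is one way to start. It assumes an installed package with client credentials and your tenant subdomain (placeholders here). The token call uses the documented v2/token route; the /automation/v1/automations listing route is widely used but not officially documented, so verify the response fields against your own instance before relying on it.

    ```python
    import requests

    SUBDOMAIN = "your-subdomain"          # placeholder: tenant-specific subdomain
    CLIENT_ID = "your-client-id"          # placeholder: installed package credentials
    CLIENT_SECRET = "your-client-secret"  # placeholder

    def get_token() -> str:
        """Fetch an OAuth access token via the documented v2/token endpoint."""
        resp = requests.post(
            f"https://{SUBDOMAIN}.auth.marketingcloudapis.com/v2/token",
            json={
                "grant_type": "client_credentials",
                "client_id": CLIENT_ID,
                "client_secret": CLIENT_SECRET,
            },
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["access_token"]

    def list_automation_statuses(token: str) -> list:
        """Return (name, status) pairs for every automation in the business unit.
        NOTE: /automation/v1/automations is not officially documented; the exact
        response fields may differ in your instance."""
        resp = requests.get(
            f"https://{SUBDOMAIN}.rest.marketingcloudapis.com/automation/v1/automations",
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        resp.raise_for_status()
        return [(a.get("name"), str(a.get("status"))) for a in resp.json().get("items", [])]

    if __name__ == "__main__":
        token = get_token()
        for name, status in list_automation_statuses(token):
            print(f"{status:>10}  {name}")  # eyeball, or alert on anything errored/stopped
    ```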

    Journey Builder

    • Verify active journeys are injecting — Check that journeys showing “Running” are actually processing contacts, not just sitting idle (see the sketch after this list for a scripted starting point).
    • Review journey error logs — Look for contacts stuck in wait steps longer than expected or falling to error paths.
    • Check entry source populations — Ensure data extensions feeding journeys are being populated as expected.
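
    The documented Journey Builder REST endpoint will list every journey with its publish status, which makes a useful first pass before digging into injection counts. A minimal sketch follows; it assumes you already have an access token (see the earlier sketch), and note that a “Published” status alone doesn’t prove contacts are actually being injected, which is why the entry-source check above still matters.

    ```python
    import requests

    def unhealthy_journeys(subdomain: str, token: str) -> list:
        """List journeys via the documented /interaction/v1/interactions endpoint
        and return (name, status) for any journey that is not currently published."""
        resp = requests.get(
            f"https://{subdomain}.rest.marketingcloudapis.com/interaction/v1/interactions",
            headers={"Authorization": f"Bearer {token}"},
            params={"$page": 1, "$pageSize": 200},
            timeout=30,
        )
        resp.raise_for_status()
        flagged = []
        for journey in resp.json().get("items", []):
            status = journey.get("status", "Unknown")
            if status != "Published":          # e.g. Draft, Stopped, Paused
                flagged.append((journey.get("name", "?"), status))
        return flagged
    ```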

    Sends and Deliverability

    • Review triggered send statuses — Confirm all critical triggered sends (welcome, transactional, password reset) are active.
    • Check bounce rates — A sudden spike in bounces can indicate a list quality issue or a blocklisting event.
    • Monitor send volumes — Verify daily send counts are within expected ranges.

    Weekly Checks

    Data Health

    • Audit data extension row counts — Compare current counts to the previous week; significant deviations warrant investigation (a small comparison script is sketched after this list).
    • Review subscriber growth/churn — Track new subscribers vs. unsubscribes to spot trends early.
    • Check data extension retention policies — Ensure DEs with retention policies are properly purging old records.
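
    However you collect the counts (a rowset API call, a SOAP retrieve, or a scheduled export), the week-over-week comparison itself is a few lines of code. A minimal sketch, with hypothetical data extension names and a 25% deviation threshold you would tune to your own data:

    ```python
    def flag_row_count_deviations(current: dict, previous: dict, threshold: float = 0.25):
        """Return DEs whose row count moved by more than `threshold` (25% default)
        week over week. Counts can come from a rowset API call, a SOAP retrieve,
        or a scheduled export; the comparison logic is the same either way."""
        flagged = []
        for de_name, count in current.items():
            baseline = previous.get(de_name)
            if not baseline:
                continue  # new DE or empty baseline: review it manually instead
            change = (count - baseline) / baseline
            if abs(change) > threshold:
                flagged.append((de_name, baseline, count, round(change * 100, 1)))
        return flagged

    # Hypothetical example: Abandoned_Cart dropped 92% week over week -> flagged.
    print(flag_row_count_deviations(
        current={"Subscribers_Master": 103_000, "Abandoned_Cart": 400},
        previous={"Subscribers_Master": 100_000, "Abandoned_Cart": 5_000},
    ))
    ```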

    Performance Trending

    • Review automation run times — An automation that used to complete in 5 minutes but now takes 30 is an early warning sign of data growth issues (a simple trend check is sketched after this list).
    • Check API usage — Monitor REST and SOAP API call volumes against your allocation.
    • Review error trends — Are the same automations failing repeatedly? Address root causes, not just symptoms.
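
    Run durations are visible in each automation’s activity log; once you record them somewhere, flagging a regression is straightforward. A sketch, assuming you keep a simple per-automation history of durations in minutes (the names below are hypothetical):

    ```python
    from statistics import median

    def runtime_regressions(history: dict, factor: float = 3.0) -> dict:
        """Flag automations whose most recent run took `factor` times longer than
        the median of their earlier runs. Durations are in minutes, pulled from
        whatever run log you already keep."""
        flagged = {}
        for name, durations in history.items():
            if len(durations) < 4:
                continue  # too little history to call it a trend
            *past, latest = durations
            baseline = median(past)
            if baseline > 0 and latest > baseline * factor:
                flagged[name] = {"typical_minutes": baseline, "latest_minutes": latest}
        return flagged

    # The 5-minute automation that now takes 30 gets flagged; the stable one does not.
    print(runtime_regressions({
        "Nightly_Segmentation": [5.1, 4.8, 5.3, 5.0, 30.2],
        "Hourly_Sync": [2.0, 2.1, 1.9, 2.2, 2.0],
    }))
    ```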

    Security and Access

    • Review user login activity — Check for failed login attempts or logins from unexpected locations.
    • Audit API integration credentials — Verify that all active integrations are authorized and credentials haven’t expired.

    Monthly Checks

    Capacity Planning

    • Review send limit utilization — If you’re consistently using 80%+ of your send limit, plan for an upgrade before you hit the ceiling (a tiny utilization check is sketched after this list).
    • Audit automation inventory — Identify and deactivate automations that are no longer needed. Zombie automations waste processing resources and make monitoring harder.
    • Review data extension storage — SFMC has storage limits. Track growth trends to avoid hitting them unexpectedly.
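
    SFMC doesn’t hand you contract utilization as a single number, so most teams compute it from their own monthly send totals. A tiny sketch of the 80% warning line mentioned above (the figures are made up):

    ```python
    def send_limit_utilization(sends_this_period: int, contracted_limit: int,
                               warn_at: float = 0.80):
        """Return (percent used, over-warning-line?) for the current period.
        Both inputs come from your own tracking; SFMC does not expose this
        as a single ready-made number."""
        pct = sends_this_period / contracted_limit
        return round(pct * 100, 1), pct >= warn_at

    # 4.2M sends against a 5M/month contract -> (84.0, True): time to plan ahead.
    print(send_limit_utilization(4_200_000, 5_000_000))
    ```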

    Documentation and Process

    • Update automation documentation — Ensure your team knows what each automation does, who owns it, and what to do when it fails.
    • Review alert routing — Are alerts going to the right people? Has the team changed since alerts were configured?
    • Test disaster recovery procedures — Can you restore a critical automation or data extension from backup if needed?

    Automating This Checklist

    If you’re reading this and thinking “there’s no way my team has time to do all of this manually” — you’re right. That’s exactly the problem this checklist exposes.

    Most teams realistically cover about 20% of these checks. The other 80% are either done sporadically or not at all. That’s how silent failures happen.

    The solution is automation. Tools like Martech Monitoring can handle the daily and weekly checks automatically — verifying automation statuses, tracking data extension health, monitoring journey injection rates, and alerting your team the moment something deviates from expected behavior.

    Our free tier covers up to 5 automations with daily checks — enough to protect your most critical workflows while you evaluate whether full monitoring is right for your team.

    Download the Checklist

    We’ll be publishing a downloadable PDF version of this checklist soon. Drop us a note if you’d like us to send it to you when it’s ready.

  • 7 Common SFMC Automation Failures and How to Prevent Them

    You open Automation Studio on Monday morning and see it: a red “Error” status on an automation that was supposed to run all weekend. Customer welcome emails haven’t sent since Friday. Three days of new signups are sitting in a data extension, waiting for a journey that never triggered.

    Sound familiar? You’re not alone. Here are the 7 most common SFMC automation failures we see, and exactly how to prevent each one.

    1. File Drop Automations That Never Fire

    What happens: A file drop automation is waiting for a file from an external system (CRM, data warehouse, FTP). The file never arrives, so the automation never starts. No error is logged because technically nothing “failed” — it just never ran.

    How to prevent it: Monitor for absence of activity, not just errors. If a file drop automation that normally runs daily hasn’t triggered in 24 hours, you need an alert. This is where most manual monitoring fails — you can’t check for something that didn’t happen unless you’re tracking expected schedules.
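
    In practice, that means recording when each automation last completed and comparing the timestamp against its expected cadence. A minimal sketch, with hypothetical automation names and a 24-hour expectation:

    ```python
    from datetime import datetime, timedelta, timezone

    def overdue_automations(last_runs: dict, max_gap: timedelta = timedelta(hours=24)) -> dict:
        """Return automations whose last successful run is older than `max_gap`.
        For a file drop automation, "no run at all" is exactly the failure this
        catches, because the platform never logs an error for it."""
        now = datetime.now(timezone.utc)
        return {name: now - ran_at for name, ran_at in last_runs.items() if now - ran_at > max_gap}

    # Hypothetical: the CRM feed last ran three days ago, so it shows up as overdue.
    print(overdue_automations({
        "CRM_Daily_Import": datetime.now(timezone.utc) - timedelta(days=3),
        "Warehouse_Nightly_Sync": datetime.now(timezone.utc) - timedelta(hours=2),
    }))
    ```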

    2. SQL Query Errors in Query Activities

    What happens: A query activity references a field that was renamed, a data extension that was deleted, or uses syntax that worked in a previous SFMC release but now throws an error. The automation runs, the query fails, and downstream activities operate on stale or empty data.

    How to prevent it: Test queries after any schema change. Monitor data extension row counts after query activities run — if a DE that normally has 50,000 rows suddenly has 0, the query likely failed. Automated monitoring can flag these anomalies instantly.
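
    One way to script the row-count check is the documented data extension rowset endpoint, which returns a count field alongside the rows. A sketch, assuming an access token from the v2/token flow and a hypothetical external key of Target_Audience_DE; confirm the response shape against your own instance:

    ```python
    import requests

    def de_row_count(subdomain: str, token: str, external_key: str) -> int:
        """Return the row count for a data extension using the documented
        /data/v1/customobjectdata rowset endpoint (the `count` field)."""
        resp = requests.get(
            f"https://{subdomain}.rest.marketingcloudapis.com"
            f"/data/v1/customobjectdata/key/{external_key}/rowset",
            headers={"Authorization": f"Bearer {token}"},
            params={"$pageSize": 1},   # only the count is needed, not the rows
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json().get("count", 0)

    # Hypothetical check: this DE normally holds ~50,000 rows after the query
    # activity runs, so a count near zero right afterward means the query failed.
    # if de_row_count("your-subdomain", token, "Target_Audience_DE") < 10_000:
    #     raise RuntimeError("Segmentation query may have failed; target DE is nearly empty")
    ```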

    3. Expired SFTP or API Credentials

    What happens: File transfer activities fail because SFTP credentials expired or were rotated by IT. This is especially common in enterprise environments where security policies mandate credential rotation every 60-90 days.

    How to prevent it: Maintain a credential rotation calendar and test connections proactively. When monitoring detects a file transfer failure, the alert should include enough context to immediately identify credential expiry as the likely cause.

    4. Data Extension Schema Mismatches

    What happens: An import activity fails because the source file has a new column, a changed column order, or a data type mismatch. This often happens when upstream systems change their export format without notifying the SFMC team.

    How to prevent it: Set up validation checks that verify imported row counts match expectations. Monitor for partial imports — an automation might “succeed” but only import 100 of 10,000 expected rows because of a schema issue in row 101.
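
    The comparison itself is trivial once you capture both numbers: the row count in the source file and the rows SFMC reports as imported. A sketch with a 1% tolerance you would tune to your data:

    ```python
    def import_looks_partial(source_rows: int, imported_rows: int, tolerance: float = 0.01) -> bool:
        """Return True when the imported row count falls short of the source file's
        row count by more than `tolerance` (1% by default). Both numbers come from
        your import logs or a post-import row count query."""
        if source_rows == 0:
            return False  # an empty source file is a separate check
        shortfall = (source_rows - imported_rows) / source_rows
        return shortfall > tolerance

    # The import that "succeeded" but loaded 100 of 10,000 rows is caught here.
    assert import_looks_partial(source_rows=10_000, imported_rows=100)
    assert not import_looks_partial(source_rows=10_000, imported_rows=9_995)
    ```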

    5. Journey Builder Entry Source Depletion

    What happens: A journey’s entry source data extension stops receiving new records. The journey shows “Running” but isn’t injecting anyone. From the Journey Builder UI, everything looks fine — you only notice when campaign metrics drop to zero.

    How to prevent it: Monitor journey injection rates alongside entry source data extension populations. If the entry DE’s row count flatlines, or if the journey’s injection count drops below historical averages, trigger an alert. This requires looking at the system holistically, not just at individual components.
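
    One way to express “flatlined” in code is to compare the latest day’s injection count against a trailing average and to treat back-to-back zero days as an automatic alert. A sketch, assuming you log daily injection counts (or daily entry-DE row deltas) somewhere:

    ```python
    from statistics import mean

    def injection_flatlined(daily_injections: list, window: int = 7, floor: float = 0.5) -> bool:
        """Return True when the latest day's injection count drops below `floor`
        (50%) of the trailing `window`-day average, or when the last two days are
        both zero. Counts can come from journey stats or daily entry-DE deltas."""
        if len(daily_injections) <= window:
            return False  # not enough history to establish a baseline yet
        if daily_injections[-2:] == [0, 0]:
            return True
        latest = daily_injections[-1]
        baseline = mean(daily_injections[-(window + 1):-1])
        return baseline > 0 and latest < baseline * floor

    # A journey that normally injects ~1,000 contacts a day dropping to 0 -> True.
    print(injection_flatlined([980, 1020, 1004, 995, 1010, 987, 1001, 0]))
    ```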

    6. Send Throttling and Deliverability Hits

    What happens: An automation triggers a send to a larger-than-expected audience (e.g., a segmentation query returns too many results due to a missing WHERE clause). This blows through your hourly send limit, causes throttling on subsequent sends, and can damage your sender reputation with ISPs.

    How to prevent it: Monitor send volumes against expected ranges. Flag any send where the audience size exceeds the historical average by more than 2x. This simple check can prevent accidental mass sends and the deliverability problems they cause.
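
    The 2x rule is easy to apply just before a send fires, provided you keep a short history of audience sizes per send definition. A sketch with made-up numbers:

    ```python
    def audience_size_suspicious(current_size: int, historical_sizes: list, multiplier: float = 2.0) -> bool:
        """Return True when the upcoming send's audience is more than `multiplier`
        times the historical average, the classic symptom of a segmentation query
        missing its WHERE clause."""
        if not historical_sizes:
            return False  # no baseline yet, nothing to compare against
        average = sum(historical_sizes) / len(historical_sizes)
        return current_size > average * multiplier

    # A segment that normally returns ~50k subscribers suddenly returning 600k.
    assert audience_size_suspicious(600_000, [48_000, 52_000, 49_500, 51_000])
    assert not audience_size_suspicious(51_500, [48_000, 52_000, 49_500, 51_000])
    ```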

    7. Triggered Send Definition Deactivation

    What happens: A triggered send gets paused or deactivated — sometimes by a team member, sometimes by SFMC itself due to excessive errors. Journeys and automations that reference this triggered send continue to run, but no emails actually send. SFMC doesn’t alert on this.

    How to prevent it: Regularly audit triggered send statuses. If a triggered send that handles critical communications (welcome emails, order confirmations, password resets) goes inactive, you need to know within minutes, not days.

    The Common Thread

    Notice the pattern? Most of these failures are silent. SFMC won’t page you at 2 AM because a journey stopped injecting contacts. It won’t send a Slack message when an automation hasn’t run in 24 hours. It just… continues, quietly broken.

    That’s why purpose-built monitoring exists. Martech Monitoring checks your automations, journeys, and data extensions on a schedule, and alerts you the moment something deviates from expected behavior. You can start monitoring for free — no credit card required.

    Because the only thing worse than an SFMC failure is an SFMC failure nobody knows about.

  • Why Your SFMC Automations Are Failing Silently (And How to Fix It)

    If you manage Salesforce Marketing Cloud (SFMC), you’ve probably experienced it: an automation silently fails, emails stop sending, and nobody notices until a stakeholder asks why campaign numbers tanked. By then, the damage is done — missed revenue, angry customers, and a fire drill to figure out what went wrong.

    The truth is, SFMC doesn’t tell you when things break. There’s no built-in alerting for failed automations, stalled journeys, or data extension anomalies. You’re expected to check manually — and in a platform running dozens of automations across multiple business units, that’s a full-time job nobody signed up for.

    What Can Go Wrong in SFMC?

    More than you’d think. Here are the most common silent failures we see across SFMC instances:

    1. Automation Failures

    Automations can fail for dozens of reasons — expired credentials, schema mismatches, file drops that never arrived, SQL query errors. SFMC logs these failures, but unless someone checks Automation Studio daily, they go unnoticed.

    2. Journey Builder Stalls

    Journeys can stop injecting contacts without throwing a visible error. A misconfigured entry source, a depleted data extension, or a deactivated triggered send can all cause a journey to silently stop working while still showing a “Running” status.

    3. Data Extension Anomalies

    When a data extension that normally receives 10,000 records per day suddenly receives 500 — or 50,000 — something has changed upstream. Without monitoring, you won’t catch this until the downstream effects cascade through your campaigns.

    4. Send Limit Approaching

    SFMC enforces send limits per business unit. If you’re approaching your limit and don’t know it, sends will start failing with cryptic errors that are difficult to debug in the moment.

    Why Manual Monitoring Doesn’t Scale

    Most teams handle this with some combination of:

    • A shared spreadsheet of “things to check”
    • A junior admin logging into Automation Studio each morning
    • Hoping someone notices when numbers look off in reports

    This works when you have 5 automations. It falls apart at 20. At 50+, it’s impossible — especially across multiple business units.

    What Proactive Monitoring Looks Like

    Proactive SFMC monitoring means you get an alert before a stakeholder asks “why didn’t that email go out?” Here’s what an effective monitoring setup should do:

    • Check automation status on a schedule — every hour, or every 15 minutes for critical automations
    • Alert on failures immediately — via email, Slack, or Teams, depending on your team’s workflow (a minimal Slack example follows this list)
    • Track data extension row counts — flag anomalies based on historical patterns
    • Monitor journey health — verify that journeys are actively injecting and processing contacts
    • Log everything — maintain a historical record for troubleshooting and audit purposes
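
    If you build any of this yourself, the alerting half is the easy part. Here’s a minimal sketch that pushes a failure into a Slack channel via an incoming webhook; the webhook URL is a placeholder you would create in Slack, and you can swap the call for email or Teams if that’s where your team actually looks:

    ```python
    import requests

    # Placeholder URL: create a real one with a Slack "Incoming Webhooks" app.
    SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"

    def send_alert(check_name: str, detail: str) -> None:
        """Post a failure alert to the team's Slack channel. Swap this call for
        email or Teams if that's where your team actually pays attention."""
        message = f":rotating_light: SFMC check failed: *{check_name}* - {detail}"
        resp = requests.post(SLACK_WEBHOOK_URL, json={"text": message}, timeout=10)
        resp.raise_for_status()

    # Example: wired to the overdue-run check sketched earlier on this page.
    send_alert("CRM_Daily_Import", "no successful run in the last 24 hours")
    ```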

    Build vs. Buy

    Some teams build internal monitoring using WSProxy, SSJS, and webhook integrations. This works, but requires ongoing developer time to maintain, and often breaks when SFMC updates its API or when the developer who built it leaves the team.

    Purpose-built monitoring tools like Martech Monitoring provide this out of the box — automated checks, intelligent alerts, and historical trending — without the maintenance burden. We offer a free tier that monitors up to 5 automations with daily checks, so you can see the value before committing.

    The Bottom Line

    SFMC is a powerful platform, but it’s not designed to tell you when things go wrong. If your team is spending hours each week manually checking automation status, or worse, finding out about failures from stakeholders, it’s time to automate your monitoring.

    The cost of undetected failures — in missed revenue, damaged sender reputation, and team stress — far exceeds the cost of monitoring. Start with the basics: know what’s running, know when it breaks, and fix it before anyone else notices.