Heartbeat + auto_failback on + DRBD

Alles was in keine andere Systemkategorie passt
preussner
Posts: 19
Joined: 2006-02-15 16:16
 

Heartbeat + auto_failback on + DRBD

Post by preussner »

servus.

ich hab folgendes Problem.
ich muss einen Server auf auto_fallback on umstellen. Da eine der beiden Maschinen stärker ist als die andere.
Heartbeat funkt einwandfrei und schaltet auch zurück wenn der db01 wieder da ist.

ABER.
ich hab das Problem das drbd noch nicht mit dem Syncen fertig ist.
ich hab mir das ganze nochmal als testsystem aufgesetzt.
Wenn ich db01 abschalte werden alle Dienste nach db02 verschoben. und laufen da problemlos weiter.
zusätzlich erzeuge ich noch eine große datei auf dem Server db02 auf der gespiegelten drbd platte.

wenn das geschehen ist schalte ich db01 wieder ein und er fängt auch an die änderungen zu syncronisieren.
Leider bekomm ich drbd nicht dazu den boot prozess solange aufzuhalten bis er fertig ist mit syncen.
Problem hier ist das dann Heartbeat startet und durch die auto_fallback funktion wieder alle dienste zurückholt obwohl die Platte noch nicht 100% die ganzen daten hat.

drbd8 config

Code: Select all

#
# drbd.conf example
#
# parameters you _need_ to change are the hostname, device, disk,
# meta-disk, address and port in the "on <hostname> {}" sections.
#
# you ought to know about the protocol, and the various timeouts.
#
# you probably want to set the rate in the syncer sections

#
# NOTE common pitfall:
# rate is given in units of _byte_ not bit
#

#
# increase timeout and maybe ping-int in net{}, if you see
# problems with "connection lost/connection established"
# (or change your setup to reduce network latency; make sure full
#  duplex behaves as such; check average roundtrip times while
#  network is saturated; and so on ...)
#

skip {
  As you can see, you can also comment chunks of text
  with a 'skip[optional nonsense]{ skipped text }' section.
  This comes in handy, if you just want to comment out
  some 'resource <some name> {...}' section:
  just precede it with 'skip'.

  The basic format of option assignment is
  <option name><linear whitespace><value>;

  It should be obvious from the examples below,
  but if you really care to know the details:

  <option name> :=
        valid options in the respective scope
  <value>  := <num>|<string>|<choice>|...
              depending on the set of allowed values
              for the respective option.
  <num>    := [0-9]+, sometimes with an optional suffix of K,M,G
  <string> := (<name>|"([^"\n]*|\.)*")+
  <name>   := [/_.A-Za-z0-9-]+
}

#
# At most ONE global section is allowed.
# It must precede any resource section.
#
global {
    # By default we load the module with a minor-count of 32. In case you
    # have more devices in your config, the module gets loaded with
    # a minor-count that ensures that you have 10 minors spare.
    # In case 10 spare minors are too little for you, you can set the
    # minor-count exeplicit here. ( Note, in contrast to DRBD-0.7 an
    # unused, spare minor has only a very little overhead of allocated
    # memory (a single pointer to be exact). )
    #
    # minor-count 64;

    # The user dialog counts and displays the seconds it waited so
    # far. You might want to disable this if you have the console
    # of your server connected to a serial terminal server with
    # limited logging capacity.
    # The Dialog will print the count each 'dialog-refresh' seconds,
    # set it to 0 to disable redrawing completely. [ default = 1 ]
    #
     dialog-refresh 5; # 5 seconds

    # You might disable one of drbdadm's sanity check.
    # disable-ip-verification;

    # Participate in DRBD's online usage counter at http://usage.drbd.org
    # possilbe options: ask, yes, no. Default is ask. In case you do not
    # know, set it to ask, and follow the on screen instructions later.
    usage-count yes;
}


#
# The common section can have all the sections a resource can have but
# not the host section (started with the "on" keyword).
# The common section must precede all resources.
# All resources inherit the settings from the common section.
# Whereas settings in the resources have precedence over the common
# setting.
#

common {
  syncer { rate 680M; }
}

#
# this need not be r#, you may use phony resource names,
# like "resource web" or "resource mail", too
#

resource drbd0 {

  # transfer protocol to use.
  # C: write IO is reported as completed, if we know it has
  #    reached _both_ local and remote DISK.
  #    * for critical transactional data.
  # B: write IO is reported as completed, if it has reached
  #    local DISK and remote buffer cache.
  #    * for most cases.
  # A: write IO is reported as completed, if it has reached
  #    local DISK and local tcp send buffer. (see also sndbuf-size)
  #    * for high latency networks
  #
  #**********
  # uhm, benchmarks have shown that C is actually better than B.
  # this note shall disappear, when we are convinced that B is
  # the right choice "for most cases".
  # Until then, always use C unless you have a reason not to.
  #     --lge
  #**********
  #
  protocol C;

  handlers {
    # what should be done in case the node is primary, degraded
    # (=no connection) and has inconsistent data.
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";

    # The node is currently primary, but lost the after split brain
    # auto recovery procedure. As as consequence it should go away.
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";

    # In case you have set the on-io-error option to "call-local-io-error",
    # this script will get executed in case of a local IO error. It is
    # expected that this script will case a immediate failover in the
    # cluster.
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";

    # Commands to run in case we need to downgrade the peer's disk
    # state to "Outdated". Should be implemented by the superior
    # communication possibilities of our cluster manager.
    # The provided script uses ssh, and is for demonstration/development
    # purposis.
    # outdate-peer "/usr/lib/drbd/outdate-peer.sh on amd 192.168.22.11 192.168.23.11 on alf 192.168.22.12 192.168.23.12";
    #
    # Update: Now there is a solution that relies on heartbeat's
    # communication layers. You should really use this.
    #outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";

    # The node is currently primary, but should become sync target
    # after the negotiating phase. Alert someone about this incident.
    #pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root";

    # Notify someone in case DRBD split brained.
    #split-brain "echo split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' root";
  }

  startup {
    # Wait for connection timeout.
    # The init script blocks the boot process until the resources
    # are connected. This is so when the cluster manager starts later,
    # it does not see a resource with internal split-brain.
    # In case you want to limit the wait time, do it here.
    # Default is 0, which means unlimited. Unit is seconds.
    #
     wfc-timeout  0;

    # Wait for connection timeout if this node was a degraded cluster.
    # In case a degraded cluster (= cluster with only one node left)
    # is rebooted, this timeout value is used.
    #
    degr-wfc-timeout 60;    # 2 minutes.

    # In case there was a split brain situation the devices will
    # drop their network configuration instead of connecting. Since
    # this means that the network is working, the cluster manager
    # should be able to communicate as well. Therefore the default
    # of DRBD's init script is to terminate in this case. To make
    # it to continue waiting in this case set this option.
    #
     wait-after-sb;

    # In case you are using DRBD for GFS/OCFS2 you want that the
    # startup script promotes it to primary. Nodenames are also
    # possible instead of "both".
    #  become-primary-on both;
  }

  disk {
    # if the lower level device reports io-error you have the choice of
    #  "pass_on"  ->  Report the io-error to the upper layers.
    #                 Primary   -> report it to the mounted file system.
    #                 Secondary -> ignore it.
    #  "call-local-io-error"
    #             ->  Call the script configured by the name "local-io-error".
    #  "detach"   ->  The node drops its backing storage device, and
    #                 continues in disk less mode.
    #
    on-io-error   detach;

    # Controls the fencing policy, default is "dont-care". Before you
    # set any policy you need to make sure that you have a working
    # outdate-peer handler. Possible values are:
    #  "dont-care"     -> Never call the outdate-peer handler. [ DEFAULT ]
    #  "resource-only" -> Call the outdate-peer handler if we primary and
    #                     loose the connection to the secondary. As well
    #                     whenn a unconnected secondary wants to become
    #                     primary.
    #  "resource-and-stonith"
    #                  -> Calls the outdate-peer handler and freezes local
    #                     IO immediately after loss of connection. This is
    #                     necessary if your heartbeat can STONITH the other
    #                     node.
    # fencing resource-only;

    # In case you only want to use a fraction of the available space
    # you might use the "size" option here.
    #
    # size 10G;

    # In case you are sure that your storage subsystem has battery
    # backed up RAM and you know from measurements that it really honors
    # flush instructions by flushing data out from its non volatile
    # write cache to disk, you have double security. You might then
    # reduce this to single security by disabling disk flushes with
    # this option. It might improve performance in this case.
    # ONLY USE THIS OPTION IF YOU KNOW WHAT YOU ARE DOING.
    # no-disk-flushes;
    # no-md-flushes;
  }

  net {
    # this is the size of the tcp socket send buffer
    # increase it _carefully_ if you want to use protocol A over a
    # high latency network with reasonable write throughput.
    # defaults to 2*65535; you might try even 1M, but if your kernel or
    # network driver chokes on that, you have been warned.
    # sndbuf-size 512k;

     timeout       60;    #  6 seconds  (unit = 0.1 seconds)
     connect-int   10;    # 10 seconds  (unit = 1 second)
     ping-int      10;    # 10 seconds  (unit = 1 second)
     ping-timeout   5;    # 500 ms (unit = 0.1 seconds)

    # Maximal number of requests (4K) to be allocated by DRBD.
    # The minimum is hardcoded to 32 (=128 kByte).
    # For high performance installations it might help if you
    # increase that number. These buffers are used to hold
    # datablocks while they are written to disk.
    #
    # max-buffers     2048;

    # When the number of outstanding requests on a standby (secondary)
    # node exceeds bdev-threshold, we start to kick the backing device
    # to start its request processing. This is an advanced tuning
    # parameter to get more performance out of capable storage controlers.
    # Some controlers like to be kicked often, other controlers
    # deliver better performance when they are kicked less frequently.
    # Set it to the value of max-buffers to get the least possible
    # number of run_task_queue_disk() / q->unplug_fn(q) calls.
    #
    # unplug-watermark   128;

    # The highest number of data blocks between two write barriers.
    # If you set this < 10 you might decrease your performance.
    # max-epoch-size  2048;

    # if some block send times out this many times, the peer is
    # considered dead, even if it still answers ping requests.
    # ko-count 4;

    # If you want to use OCFS2/openGFS on top of DRBD enable
    # this optione, and only enable it if you are going to use
    # one of these filesystems. Do not enable it for ext2,
    # ext3,reiserFS,XFS,JFS etc...
    # allow-two-primaries;

    # This enables peer authentication. Without this everybody
    # on the network could connect to one of your DRBD nodes with
    # a program that emulates DRBD's protocoll and could suck off
    # all your data.
    # Specify one of the kernel's digest algorithms, e.g.:
    # md5, sha1, sha256, sha512, wp256, wp384, wp512, michael_mic ...
    # an a shared secret.
    # Authentication is only done once after the TCP connection
    # is establised, there are no disadvantages from using authentication,
    # therefore I suggest to enable it in any case.
    # cram-hmac-alg "sha1";
    # shared-secret "FooFunFactory";

    # In case the nodes of your cluster nodes see each other again, after
    # an split brain situation in which both nodes where primary
    # at the same time, you have two diverged versions of your data.
    #
    # In case both nodes are secondary you can control DRBD's
    # auto recovery strategy by the "after-sb-0pri" options. The
    # default is to disconnect.
    #    "disconnect" ... No automatic resynchronisation, simply disconnect.
    #    "discard-younger-primary"
    #                     Auto sync from the node that was primary before
    #                     the split brain situation happened.
    #    "discard-older-primary"
    #                     Auto sync from the node that became primary
    #                     as second during the split brain situation.
    #    "discard-least-changes"
    #                     Auto sync from the node that touched more
    #                     blocks during the split brain situation.
    #    "discard-node-NODENAME"
    #                     Auto sync _to_ the named node.
    after-sb-0pri discard-least-changes;

    # In one of the nodes is already primary, then the auto-recovery
    # strategie is controled by the "after-sb-1pri" options.
    #    "disconnect" ... always disconnect
    #    "consensus"  ... discard the version of the secondary if the outcome
    #                     of the "after-sb-0pri" algorithm would also destroy
    #                     the current secondary's data. Otherwise disconnect.
    #    "violently-as0p" Always take the decission of the "after-sb-0pri"
    #                     algorithm. Even if that causes case an erratic change
    #                     of the primarie's view of the data.
    #                     This is only usefull if you use an 1node FS (i.e.
    #                     not OCFS2 or GFS) with the allow-two-primaries
    #                     flag, _AND_ you really know what you are doing.
    #                     This is DANGEROUS and MAY CRASH YOUR MACHINE if you
    #                     have a FS mounted on the primary node.
    #    "discard-secondary"
    #                     discard the version of the secondary.
    #    "call-pri-lost-after-sb"  Always honour the outcome of the "after-sb-0pri"
    #                     algorithm. In case it decides the the current
    #                     secondary has the right data, it panics the
    #                     current primary.
    #    "suspend-primary" ???
    after-sb-1pri discard-secondary;

    # In case both nodes are primary you control DRBD's strategy by
    # the "after-sb-2pri" option.
    #    "disconnect" ... Go to StandAlone mode on both sides.
    #    "violently-as0p" Always take the decission of the "after-sb-0pri".
    #    "call-pri-lost-after-sb" ... Honor the outcome of the "after-sb-0pri"
    #                     algorithm and panic the other node.
    after-sb-2pri disconnect;

    # To solve the cases when the outcome of the resync descissions is
    # incompatible to the current role asignment in the cluster.
    #    "disconnect" ... No automatic resynchronisation, simply disconnect.
    #    "violently" .... Sync to the primary node is allowed, violating the
    #                     assumption that data on a block device is stable
    #                     for one of the nodes. DANGEROUS, DO NOT USE.
    #    "call-pri-lost"  Call the "pri-lost" helper program on one of the
    #                     machines. This program is expected to reboot the
    #                     machine. (I.e. make it secondary.)
    rr-conflict disconnect;

    # DRBD-0.7's behaviour is equivalent to
    #   after-sb-0pri discard-younger-primary;
    #   after-sb-1pri consensus;
    #   after-sb-2pri disconnect;
  }

  syncer {
    al-extents 257;
  }

  on db01 {
    device     /dev/drbd0;
    disk       /dev/sda3;
    address    10.0.0.10:7788;
    flexible-meta-disk  internal;
  }

  on db02 {
    device    /dev/drbd0;
    disk      /dev/sda3;
    address   10.0.0.11:7788;
    flexible-meta-disk  internal;
  }
}
heartbeat ha.cf

Code: Select all

debugfile       /var/log/heartbeat/debug
logfile         /var/log/heartbeat/log
logfacility     local0
deadtime        10
initdead        20
udpport         694
bcast           eth1

mcast eth1 225.0.0.1 694 1 0
auto_failback on

node    db01
node    db02
ich hoffe das jemand schon mal das problem hatt und mir helfen kann.

danke schon mal.
User avatar
kindascared
Posts: 8
Joined: 2008-11-28 10:30
 

Re: Heartbeat + auto_failback on + DRBD

Post by kindascared »

wieso machst du das ganze in heartbeat version 1 über config files, und nicht wie empfohlen über crm aka heartbeat v2?

du könntest dann "theoretisch" ein master/slave setup einrichten (das zufällig genau für drbd/heartbeat konzipiert wurde :wink: ). Dann würde er beim failback merken das das promoten zum primary auf dem einen node nicht bzw noch nicht klappt und würde wieder umswitchen. anschliessend switcht du dann zu bestimmten zeiten oder manuell auf den stärkeren node.

autofailback kannst du in version 2 vergessen...hier heisst der spass u.a. default_resource_stickiness
preussner
Posts: 19
Joined: 2006-02-15 16:16
 

Re: Heartbeat + auto_failback on + DRBD

Post by preussner »

ganz einfach weil das system schon mit heartbeat version 1 seit über 2 jahren läuft.

na wenn das bei der 2er version auch ned geht dann kann ich ja bei der 1er bleiben.
da schalte ich den autofall back aus und machs dann per hand. aber das ist keine lösung das muss auch so gehen.