October 2010

Checking the Status of a MongoDB Replica Set

To check the status of a Replica Set, simply run rs.status():

rs.status()
{
        "set" : "set1",
        "date" : "Thu Oct 28 2010 10:09:28 GMT+0800 (CST)",
        "myState" : 2,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.x.x:10000",
                        "health" : 1,
                        "state" : 1,
                        "uptime" : 2486042,
                        "lastHeartbeat" : "Thu Oct 28 2010 10:09:26 GMT+0800 (CST)"
                },
                {
                        "_id" : 1,
                        "name" : "zjm-hadoop-slave217:10000",
                        "health" : 1,
                        "state" : 2,
                        "self" : true
                },
                {
                        "_id" : 2,
                        "name" : "192.168.x.x:10001",
                        "health" : 1,
                        "state" : 7,
                        "uptime" : 2486042,
                        "lastHeartbeat" : "Thu Oct 28 2010 10:09:27 GMT+0800 (CST)"
                }
        ],
        "ok" : 1
}

Here, a health value of 1 means the server is up, and 0 means the server is down.

A state of 1 means Primary, 2 means Secondary, 3 means Recovering, 7 means Arbiter, and 8 means Down.
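As a quick illustration, the snippet below (a minimal sketch run in the mongo shell, relying only on the fields shown in the rs.status() output above) prints each member with a readable state name:

stateNames = {1: "PRIMARY", 2: "SECONDARY", 3: "RECOVERING", 7: "ARBITER", 8: "DOWN"};
rs.status().members.forEach(function (m) {
    // name, health and state are the member fields shown in the output above
    print(m.name + "  health=" + m.health + "  state=" + (stateNames[m.state] || m.state));
});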

To run find queries on a Secondary, you first need to execute the following:

db.getMongo().setSlaveOk();
db.suv.find().count()

To view information about a collection, use db.suv.stats():

{
        "ns" : "suv.suv",
        "count" : 9519581,
        "size" : 573341552,
        "avgObjSize" : 60.2276037149114,
        "storageSize" : 736285696,
        "numExtents" : 21,
        "nindexes" : 3,
        "lastExtentSize" : 129238272,
        "paddingFactor" : 1,
        "flags" : 1,
        "totalIndexSize" : 1612344128,
        "indexSizes" : {
                "_id_" : 462365632,
                "Lt_1" : 790758336,
                "Cl.Cid_1" : 359220160
        },
        "ok" : 1
}

Where:

  1. size is the dataSize, i.e. the space occupied by the data itself
  2. storageSize "includes free space allocated to this collection", i.e. it also counts space already allocated but still unused
  3. totalIndexSize is the space occupied by all indexes
  4. totalSize is the total space occupied by the collection; it equals storageSize + totalIndexSize (see the quick check after the commands below)

These values can also be queried with the following commands:

db.suv.dataSize()
db.suv.storageSize()
db.suv.totalIndexSize()
db.suv.totalSize()
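For instance, the storageSize + totalIndexSize = totalSize relationship can be sanity-checked in the mongo shell (a small sketch using the suv collection from above):

// Quick check: totalSize should equal storageSize + totalIndexSize
var s = db.suv.storageSize();
var i = db.suv.totalIndexSize();
print("storageSize + totalIndexSize = " + (s + i) + ", totalSize = " + db.suv.totalSize());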

Modifying the Configuration of a MongoDB Replica Set

In MongoDB 1.6, adding members to a Replica Set is straightforward:

# Add a new member
rs.add("192.168.x.210:10000");
# Add a new arbiter (voting-only) member
rs.addArb("192.168.x.216:10001");

Once this is configured, Mongo will automatically sync data to the new member.

Version 1.6 does not provide a command for removing a member; reportedly such a command exists in 1.7.
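Presumably, on versions that do ship the helper, removing a member would look something like the sketch below (not available in 1.6; the host:port is just a placeholder):

// Hypothetical, 1.7+ only: remove a member by its host:port (placeholder address)
rs.remove("192.168.x.x:10000")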

In 1.6, however, we can accomplish this with replSetReconfig.

Below we remove one member from the set and add a new one:

# Stop the MongoDB service on the member being removed
kill -2 `cat /opt/mongodb_data/mongod.lock`
# scp the old member's data files to the new server, to speed up the rs_sync process
scp suv.* 10.x.x.x:/pvdata/
# Start the MongoDB service on the new member
/usr/local/mongodb/bin/mongod --fork --shardsvr --port 10000 --replSet set2 --dbpath /pvdata/mongodb_data  --logpath /pvdata/mongodb_log/mongod.log --logappend
# Run the following on the set's Primary member
config = {_id: 'set2', members: [
        {_id: 0, host: '192.168.x.218:10000'},
        {_id: 1, host: '10.x.x.x:10000'},
        {_id: 2, host: '192.168.x.216:10001', arbiterOnly: true}
    ]}
use local
old_config = db.system.replset.findOne();
# Note: the new config's version must be set, otherwise you get a "version number wrong" error
config.version = old_config.version + 1;
use admin
db.runCommand({ replSetReconfig : config })

This completes the reconfiguration of the Replica Set: one old server was removed and the new server (15.39) was added.

Looking at mongod.log on the new server, we can see:

Wed Oct 27 14:24:19 done allocating datafile /pvdata/mongodb_data/local.ns, size: 16MB,  took 0.049 secs
...
Wed Oct 27 17:55:12 [rs_sync] replSet initialSyncOplogApplication 66500000
Wed Oct 27 17:58:02 [rs_sync] replSet initial sync finishing up
Wed Oct 27 17:58:02 [rs_sync] replSet set minValid=4cc7f49a:7b
Wed Oct 27 17:58:02 [rs_sync] building new index on { _id: 1 } for local.replset.minvalid
Wed Oct 27 17:58:02 [rs_sync] done for 0 records 0.052secs
Wed Oct 27 17:58:02 [rs_sync] replSet initial sync done
Wed Oct 27 17:58:04 [rs_sync] replSet SECONDARY

The last lines show that the new server has finished syncing and has successfully come up as a secondary.

rs.status() also shows the status of the new set:

{
        "set" : "set2",
        "date" : "Wed Oct 27 2010 17:57:39 GMT+0800 (CST)",
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "zjm-hadoop-slave218:10000",
                        "health" : 1,
                        "state" : 1,
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "10.x.x.x.x:10000",
                        "health" : 1,
                        "state" : 3,
                        "uptime" : 1231,
                        "lastHeartbeat" : "Wed Oct 27 2010 17:57:39 GMT+0800 (CST)"
                },
                {
                        "_id" : 2,
                        "name" : "192.168.x,216:10001",
                        "health" : 1,
                        "state" : 7,
                        "uptime" : 1237,
                        "lastHeartbeat" : "Wed Oct 27 2010 17:57:39 GMT+0800 (CST)"
                }
        ],
        "ok" : 1
}

Implementing a Custom UDF in Hive

Hive UDFs are quite similar to user-defined functions in MySQL and the like.

The difference is that they are written in Java rather than in plain SQL.

The steps to implement a UDF are:

  1. Implement a Java class that extends UDF
  2. Package it into a jar and add it to Hive's classpath
  3. Create the temporary function and run the select
  4. Drop the temporary function created above

The UDF below is a function I added for Hive arrays.

It checks whether an array contains a given value; Hive's built-in functions do not offer this.

package com.sohu.hadoop.hive.udf;

import java.util.ArrayList;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.BooleanWritable;
import org.apache.hadoop.io.Text;

public final class ArrayContains extends UDF {

    // Returns true if the array contains the given element, false otherwise
    public BooleanWritable evaluate(ArrayList<String> arr, Text ele) {
        BooleanWritable rtn = new BooleanWritable(false);
        if (arr == null || arr.size() < 1) {
            return rtn;
        }
        try {
            String cstr = ele.toString();
            for (String str : arr) {
                if (str.equals(cstr)) {
                    rtn = new BooleanWritable(true);
                    break;
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return rtn;
    }
}

Then compile and package it:

javac -classpath /opt/hadoop_client/hadoop/hadoop-0.20.2+228-core.jar:/opt/hadoop_client/hive/lib/hive-exec-0.5.0.jar src/com/sohu/hadoop/hive/udf/ArrayContains.java -d build
jar -cvf hadooop-mc-udf.jar -C build .

Finally, run the Hive QL query:

hive -e "add jar /opt/ysz/udf/hadooop-mc-udf.jar;drop temporary function array_contains;create temporary function array_contains as 'com.sohu.hadoop.hive.udf.ArrayContains';select suv,channelid from pvlog_pre where array_contains(channelid,'2')"
