If only the second-to-last column has to be unique:
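#!/usr/bin/perl
# test.pl
use strict;
use warnings;

my %keys;    # second-to-last column values that were already printed
while (<>) {
    my @ary = split /\t/;
    next if @ary < 2;    # skip lines with fewer than two columns
    if (!$keys{$ary[-2]}) {
        $keys{$ary[-2]} = 1;
        print "$ary[-2]\t$ary[-1]";    # $ary[-1] still carries the newline
    }
}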
If the pair of both columns has to be unique:
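#!/usr/bin/perl
use strict;
use warnings;

my %keys;    # (second-to-last, last) pairs that were already printed
while (<>) {
    my @ary = split /\t/;
    next if @ary < 2;    # skip lines with fewer than two columns
    # Join with a tab: the fields were split on tabs, so the separator
    # cannot occur inside a field and two different pairs never collide.
    my $key = "$ary[-2]\t$ary[-1]";
    if (!$keys{$key}) {
        $keys{$key} = 1;
        print "$ary[-2]\t$ary[-1]";
    }
}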
Usage: test.pl < source.txt > target.txt
Result: tested with the data given in the question, both scripts produce:
中文xxxxxxx1 中文字符1
中文xxxxxxx2 中文字符1
中文xxxxxxx3 中文字符1
中文xxxxxxx4 中文字符2
中文xxxxxxx5 中文字符2
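The two scripts give the same output on the data from the question, so here is a minimal sketch of where they would differ. The helper name `dedupe` and the sample rows are made up for illustration and are not part of the scripts above; the sketch only assumes tab-separated input, as the scripts do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the last two tab-separated columns of each
# line, keeping only the first line seen for each key.
#   $both_columns false -> the key is the second-to-last column only
#   $both_columns true  -> the key is the pair (second-to-last, last)
sub dedupe {
    my ($both_columns, @lines) = @_;
    my (%seen, @out);
    for my $line (@lines) {
        chomp $line;
        my @ary = split /\t/, $line;
        next if @ary < 2;    # skip lines with fewer than two columns
        my ($k, $v) = @ary[-2, -1];
        my $key = $both_columns ? "$k\t$v" : $k;
        push @out, "$k\t$v" unless $seen{$key}++;
    }
    return @out;
}

# Made-up sample rows: id, key column, value column.
my @rows = ("1\ta\tx", "2\ta\ty", "3\ta\tx", "4\tb\tx");

print "unique by second-to-last column:\n";
print "  $_\n" for dedupe(0, @rows);

print "unique by both columns:\n";
print "  $_\n" for dedupe(1, @rows);
```

With these rows the first mode keeps two lines (one per key column value) and the second keeps three, because `a x` and `a y` count as different pairs.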
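For some reason none of the answers I posted today are showing up, so I am trying an anonymous post instead. (jasonqwu)
I wrote a script and tested it on my machine: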
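use 5.016;
use warnings;
use utf8;

my %target;
my $source_file = 'original.txt';
my $target_file = 'target.txt';
my $source_file_fh;    # your source file handle
my $target_file_fh;    # your target file handle
my $key;               # key item in target file
my $content;           # last content item in target file

open($source_file_fh, "<:utf8", $source_file) or die "Can't open $source_file : $!\n";
open($target_file_fh, ">:utf8", $target_file) or die "Can't open $target_file : $!\n";

while (<$source_file_fh>) {
    $content = get_last_item($_);
    next unless defined $content;    # line has no space-separated items
    $key = get_key($_, $content);
    # A later line with the same key overwrites the earlier one.
    $target{$key} = $content if ($key);
}

for (sort keys %target) {
    say $target_file_fh "$_ $target{$_}";
}

close $target_file_fh;
close $source_file_fh;

# Return the last space-separated item of the line, or undef if none.
sub get_last_item {
    my $str = shift;
    return $str =~ /.*[ ]+(.*)/ ? $1 : undef;
}

# Return the item just before $content, or undef if the line does not match.
sub get_key {
    my $str     = shift;
    my $content = shift;
    # \Q...\E keeps regex metacharacters in $content literal.
    return $str =~ /.*[ ]+(.*)[ ]+\Q$content\E/ ? $1 : undef;
}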
I have changed the code to match the new requirements; please check it.
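Can we assume that every line has exactly 3 columns (separated by spaces)? If so, all that really needs to be done is to drop the first column.
In that case this is enough:
perl -lane 'print "@F[1,2]"' source.txt > target.txt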
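For readers less used to Perl's command-line switches, the column-dropping one-liner can be written out as a short script. This is only a sketch: the function name `drop_first_column` and the sample lines are hypothetical, and it assumes whitespace-separated input with at least 3 columns per line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a line on whitespace (the same split that the
# -a switch performs) and return columns 2 and 3 joined by a single space.
# Returns undef for lines with fewer than 3 columns.
sub drop_first_column {
    my ($line) = @_;
    my @F = split ' ', $line;    # also discards the trailing newline
    return undef if @F < 3;
    return "@F[1,2]";
}

# Made-up sample lines standing in for source.txt.
my @lines = ("1 a x\n", "2 b y\n", "short\n");

for my $line (@lines) {
    my $out = drop_first_column($line);
    print "$out\n" if defined $out;
}
```

Note that this only removes the first column. It does not remove duplicates, so it fits the question only if the remaining two columns are already unique.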