
Perl Memo - URL Regular Expressions (Punycode = Japanese Domain Name Support)


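  • Perl Memo - URL Regular Expressions (Punycode = Japanese Domain Name Support)
    • URL regular expressions
      • Basic version
      • With the file:// scheme
      • With Punycode (in other words, Japanese domain names)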

URL regular expressions

Basic version

q(s?(?:https?|ftp|news)://[-_.!~*'a-zA-Z0-9;/?:@&=$,%#]+);
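
As a minimal usage sketch, a pattern string of this shape can be compiled with qr// and applied with a global match to pull URLs out of running text. The variable names and the sample text are illustrative assumptions, not part of the original memo. The string is quoted with q() rather than qq() so that "$," is not interpolated, and the class carries a "+" quantifier so the match extends over the whole URL.

```perl
use strict;
use warnings;

# Pattern string in the style of the basic version; q() keeps "$," and "@&" literal.
my $url_re_str = q(s?(?:https?|ftp|news)://[-_.!~*'a-zA-Z0-9;/?:@&=$,%#]+);
my $url_re     = qr/$url_re_str/;

# Hypothetical sample text.
my $text = "See http://example.com/a?b=1#top and ftp://ftp.example.org/pub/ for details.";

# In list context, /g with one capture group returns every captured URL.
my @urls = $text =~ /($url_re)/g;
print "$_\n" for @urls;
```

Note that a URL followed directly by punctuation such as "." or "," would have that character captured too, because both are in the character class.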

With the file:// scheme

q(s?(?:(?:(?:https?|ftp|news)://)|(?:file:[/\x5c][/\x5c]))(?:[-\x5c_.!~*'a-zA-Z0-9;/?:@&=$,%#])+);
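
The following sketch shows what the file:// variant is meant to accept: \x5c is the backslash, so both the slash form and the Windows-style backslash form pass. The candidate strings and variable names are assumptions made up for illustration.

```perl
use strict;
use warnings;

# q() leaves "\x5c" untouched, so the regex engine sees it as a backslash escape.
my $re_str = q(s?(?:(?:(?:https?|ftp|news)://)|(?:file:[/\x5c][/\x5c]))(?:[-\x5c_.!~*'a-zA-Z0-9;/?:@&=$,%#])+);
my $re     = qr/$re_str/;

# Hypothetical candidates; in single quotes, "\\" stands for one backslash.
my @candidates = (
    'file:///home/user/memo.txt',
    'file:\\\\server\\share\\memo.txt',
    'http://example.com/',
    'mailto:someone@example.com',
);

# \A and \z require the whole string to be a URL.
my @matched = grep { /\A$re\z/ } @candidates;
print "$_\n" for @matched;
```

Only the mailto: entry is rejected, because its scheme is not listed in the pattern.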

With Punycode (in other words, Japanese domain names)

  • UTF8

q{(\b(?:https?|ftp|news)://(?:(?:[-_.!~*'()a-zA-Z0-9;:&=$,]|%[0-9A-Fa-f][0-9A-Fa-f])*@)?(?:(?:(?:[a-zA-Z0-9](?:[-_a-zA-Z0-9]*[a-zA-Z0-9])?|[-_0-9a-zA-Z\x80-\xfd](?:[-_0-9a-zA-Z\x80-\xfd]*[-_0-9a-zA-Z\x80-\xfd])?)\.)*[a-zA-Z](?:[-a-zA-Z0-9]*[a-zA-Z0-9])?\.?|[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)(?::[0-9]*)?(?:/(?:[-_.!~*'a-zA-Z0-9:@&=$,]|%[0-9A-Fa-f][0-9A-Fa-f])*(?:;(?:[-_.!~*'a-zA-Z0-9:@&=$,]|%[0-9A-Fa-f][0-9A-Fa-f])*)*(?:/(?:[-_.!~*'a-zA-Z0-9:@&=$,]|%[0-9A-Fa-f][0-9A-Fa-f])*(?:;(?:[-_.!~*'a-zA-Z0-9:@&=$,]|%[0-9A-Fa-f][0-9A-Fa-f])*)*)*)?(?:\?(?:[-_.!~*'a-zA-Z0-9;/?:@&=$,]|%[0-9A-Fa-f][0-9A-Fa-f])*)?(?:\x23(?:[-_.!~*'a-zA-Z0-9;/?:@&=$,]|%[0-9A-Fa-f][0-9A-Fa-f])*)?)};

To test for multibyte (Japanese) bytes, use:

q{[\x80-\xfd]};
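
A small sketch of how such a detection class might be used: it only answers whether a URL contains any non-ASCII byte, which is enough to tell a Japanese-domain URL from a plain ASCII one. The byte range 0x80 to 0xfd, the hash names and the sample URLs are assumptions for illustration.

```perl
use strict;
use warnings;

# Assumed detection class: any byte from 0x80 to 0xfd.
my $multibyte_class = q{[\x80-\xfd]};

# Hypothetical samples, written as raw UTF-8 bytes so the script itself stays ASCII.
my %samples = (
    japanese_host => "http://\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e.jp/",
    ascii_host    => "http://example.jp/",
);

my %has_multibyte;
for my $name (sort keys %samples) {
    # Record 1 when at least one byte falls inside the class, 0 otherwise.
    $has_multibyte{$name} = $samples{$name} =~ /$multibyte_class/ ? 1 : 0;
    print "$name: ", ($has_multibyte{$name} ? "multibyte bytes found" : "ASCII only"), "\n";
}
```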

  • EUC

q{(\b(?:https?|ftp|news)://(?:(?:[-_.!~*'()a-zA-Z0-9;:&=$,]|%[0-9A-Fa-f][0-9A-Fa-f])*@)?(?:(?:(?:[a-zA-Z0-9](?:[-_a-zA-Z0-9]*[a-zA-Z0-9])?|[-_0-9a-zA-Z\xa1-\xfe](?:[-_0-9a-zA-Z\xa1-\xfe]*[-_0-9a-zA-Z\xa1-\xfe])?)\.)*[a-zA-Z](?:[-a-zA-Z0-9]*[a-zA-Z0-9])?\.?|[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)(?::[0-9]*)?(?:/(?:[-_.!~*'a-zA-Z0-9:@&=$,]|%[0-9A-Fa-f][0-9A-Fa-f])*(?:;(?:[-_.!~*'a-zA-Z0-9:@&=$,]|%[0-9A-Fa-f][0-9A-Fa-f])*)*(?:/(?:[-_.!~*'a-zA-Z0-9:@&=$,]|%[0-9A-Fa-f][0-9A-Fa-f])*(?:;(?:[-_.!~*'a-zA-Z0-9:@&=$,]|%[0-9A-Fa-f][0-9A-Fa-f])*)*)*)?(?:\?(?:[-_.!~*'a-zA-Z0-9;/?:@&=$,]|%[0-9A-Fa-f][0-9A-Fa-f])*)?(?:\x23(?:[-_.!~*'a-zA-Z0-9;/?:@&=$,]|%[0-9A-Fa-f][0-9A-Fa-f])*)?)};

To test for multibyte (Japanese) bytes, use:

q{[\x80-\xfe]};

The source that generates it (this is the EUC version):

# Building blocks; the names follow the URI grammar of RFC 2396.
$digit = q{[0-9]};
$alpha = q{[a-zA-Z]};
$alphanum = q{[a-zA-Z0-9]};
$hex = q{[0-9A-Fa-f]};
$escaped = qq{%$hex$hex};
$uric = q{(?:[-_.!~*'a-zA-Z0-9;/?:@&=$,]} . qq{|$escaped)};
#$uric = q{(?:[-_.!~*'()a-zA-Z0-9;/?:@&=$,]} . qq{|$escaped)};
$fragment = qq{$uric*};
$query = qq{$uric*};
$pchar = q{(?:[-_.!~*'a-zA-Z0-9:@&=$,]} . qq{|$escaped)};
#$pchar = q{(?:[-_.!~*'()a-zA-Z0-9:@&=$,]} . qq{|$escaped)};
$param = qq{$pchar*};
$segment = qq{$pchar*(?:;$param)*};
$path_segments = qq{$segment(?:/$segment)*};
$abs_path = qq{/$path_segments};
$port = qq{$digit*};
# Each IPv4 octet is one or more digits.
$IPv4address = qq{$digit+\\.$digit+\\.$digit+\\.$digit+};
$toplabel = qq{$alpha(?:} . q{[-a-zA-Z0-9]*} . qq{$alphanum)?};
$domainlabel = qq{$alphanum(?:} . q{[-_a-zA-Z0-9]*} . qq{$alphanum)?};
# Host labels that may contain EUC multibyte bytes (0xa1-0xfe).
$domainlabel_rfc_class = q{[-_0-9a-zA-Z\xa1-\xfe]};
$domainlabel_rfc_punyonly_class = q{[\xa1-\xfe]};
$domainlabel_rfc = qq{$domainlabel_rfc_class(?:}
   . qq{$domainlabel_rfc_class*} . qq{$domainlabel_rfc_class)?};
# "punyonly" labels must begin or end with a multibyte byte.
$domainlabel_rfc_punyonly =
                qq{$domainlabel_rfc_class(?:}
                 . qq{$domainlabel_rfc_class*}
                         . qq{$domainlabel_rfc_punyonly_class)?} .
 '|' .
                qq{$domainlabel_rfc_punyonly_class(?:}
                 . qq{$domainlabel_rfc_class*}
                         . qq{$domainlabel_rfc_class)?};# .
# '|' .
#               qq{$domainlabel_rfc_class(?:}
#                . qq{$domainlabel_rfc_punyonly_class?};
#                        . qq{$domainlabel_rfc_class)*};
$hostname = qq{(?:(?:$domainlabel|$domainlabel_rfc)\\.)*$toplabel\\.?};
$hostname_punyonly = qq{(?:(?:$domainlabel_rfc_punyonly)\\.)$toplabel\\.?};
#$hostname = qq{(?:$domainlabel\\.)*$toplabel\\.?};
$host = qq{(?:$hostname|$IPv4address)};
$host_punyonly = qq{(?:$hostname_punyonly)};
$hostport = qq{$host(?::$port)?};
$hostport_punyonly = qq{$host_punyonly(?::$port)?};
$userinfo = q{(?:[-_.!~*'()a-zA-Z0-9;:&=$,]|} . qq{$escaped)*};
$server = qq{(?:$userinfo\@)?$hostport};
$server_punyonly = qq{(?:$userinfo\@)?$hostport_punyonly};
$authority = qq{$server};
$authority_punyonly = qq{$server_punyonly};
#$scheme = q{(?:https?|shttp)};
$scheme = q{(?:https?|ftp)};
$net_path = qq{//$authority(?:$abs_path)?};
$net_path_punyonly = qq{//$authority_punyonly(?:$abs_path)?};
$hier_part = qq{$net_path(?:\\?$query)?};
$hier_part_punyonly = qq{$net_path_punyonly(?:\\?$query)?};
$absoluteURI = qq{$scheme:$hier_part};
$absoluteURI_punyonly = qq{$scheme:$hier_part_punyonly};
# \x23 is "#", the fragment delimiter.
$URI_reference = qq{$absoluteURI(?:\\x23$fragment)?};
$URI_reference_punyonly = qq{$absoluteURI_punyonly(?:\\x23$fragment)?};
$http_URL_regex = q{\b} . $URI_reference;
$http_URL_regex_punyonly = q{\b} . $URI_reference_punyonly;

######################################################################

$test=<
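
The fragments in the source are plain strings, so they only become usable once they are compiled into a regex. The sketch below takes a reduced subset of them (host and port only, ASCII labels only), declares them with "my", compiles the result with qr//, and checks a few candidates. The subset, the candidate strings and the names $hostport_re, @candidates and @accepted are assumptions for illustration, not the author's test data.

```perl
use strict;
use warnings;

# A reduced subset of the fragments, declared with "my" so it runs under strict.
my $digit       = q{[0-9]};
my $alpha       = q{[a-zA-Z]};
my $alphanum    = q{[a-zA-Z0-9]};
my $port        = qq{$digit*};
my $IPv4address = qq{$digit+\\.$digit+\\.$digit+\\.$digit+};
my $toplabel    = qq{$alpha(?:} . q{[-a-zA-Z0-9]*} . qq{$alphanum)?};
my $domainlabel = qq{$alphanum(?:} . q{[-_a-zA-Z0-9]*} . qq{$alphanum)?};
my $hostname    = qq{(?:$domainlabel\\.)*$toplabel\\.?};
my $host        = qq{(?:$hostname|$IPv4address)};
my $hostport    = qq{$host(?::$port)?};

# Compile once; \A and \z force the whole candidate to match.
my $hostport_re = qr/\A$hostport\z/;

# Hypothetical candidates: a host with a port, an IPv4 address, and a string with a space.
my @candidates = ('www.example.com:8080', '192.168.0.1', 'bad host');
my @accepted   = grep { /$hostport_re/ } @candidates;
print "$_\n" for @accepted;
```

Building the pattern from named pieces keeps each grammar rule readable on its own, and printing $hostport shows the flattened one-line form of the kind listed earlier on this page.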
              
